I have run some tests and verified that on more recent Intel and AMD processors, the cache-line prefetcher behaves differently depending on whether a line belongs to a base page or a huge page. How exactly is this information passed down to the hardware, given that it is process-specific? Which bits are reserved in hardware for this purpose, and does anyone know what algorithm is used to train the stride prefetcher?