Maximum achievable memory throughput
I am interested in finding the maximum achievable throughput between DRAM and the registers on the GA102 chip. NVIDIA reports the DRAM bandwidth; however, I could not find code that achieves it.
What is the maximum throughput that can be achieved, and what code achieves it?