Back in 2009, it was debatable whether or not general purpose computing on GPUs would ever become mainstream. They were tricky to program, you had to rewrite your application, and there was a lot of overhead transferring data over the PCIe bus. Nonetheless, GPUs' power efficiency and low cost (back then, anyway) couldn't be denied.
At the time I was a relatively new member of the Future Technologies Group at ORNL, and one of my first projects was the Scalable HeterOgeneous Computing Benchmark Suite (SHOC).
Given CUDA’s dominance in today’s GPU programming, SHOC’s original purpose now seems quaint—to try to compare CUDA and OpenCL versions of common kernels in scientific applications. But, at the time, it was unclear which programming model would win over developers. SHOC was the first benchmark suite to combine these new programming models (and eventually OpenACC) with MPI to mimic how most ORNL scientific codes would be structured.
SHOC was organized into three “levels” of benchmarks—level zero (speeds and feeds), one (common kernels), and two (application-specific kernels). It became one of the group's most useful tools for system planning and deployment. I used level zero all the time when bringing up the Keeneland system. Running its GEMM benchmark in a tight loop works well as a burn-in tool for GPUs.
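To illustrate the burn-in idea (this is not SHOC's code, just a minimal sketch using cuBLAS, with the matrix size and iteration count as arbitrary assumptions): repeatedly launching large GEMMs back to back keeps the GPU at sustained load long enough to shake out thermal or memory problems.

```cpp
// Minimal burn-in sketch: hammer the GPU with large SGEMMs in a tight loop.
// Matrix size and iteration count are illustrative, not taken from SHOC.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int n = 8192;        // large enough to saturate the GPU
    const int iters = 10000;   // run long enough to reach thermal steady state

    float *a, *b, *c;
    cudaMalloc(&a, sizeof(float) * n * n);
    cudaMalloc(&b, sizeof(float) * n * n);
    cudaMalloc(&c, sizeof(float) * n * n);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    for (int i = 0; i < iters; ++i) {
        // C = alpha * A * B + beta * C, launched back to back with no host work in between
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, a, n, b, n, &beta, c, n);
    }
    cudaDeviceSynchronize();   // surface any runtime errors before exiting
    printf("burn-in complete\n");

    cublasDestroy(handle);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```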
While I actively maintained SHOC for a few years, today it is in need of some serious updates. But, given that CUDA has essentially won the language wars, it’s hard to invest a lot of time in refreshing the OpenCL code. I suspect SPEChpc 2021 will fill this space over time. It may be a better fit for ORNL overall, given the tendency of scientific app teams to rely on directive-based acceleration.