Over the last two years large clusters comprising 1,000s of commodity CPUs, running Hadoop MapReduce, have powered the analytical processing of “Big data” involving hundreds of terabytes of information.
Now a new generation of CUDA GPUs based on the Fermi (see figure 1) and the Kepler (Kepler is due in 2011) have created the potential for a new disruptive technology for “Big data” analytics based on the use of much smaller hybrid CPU-GPU clusters.
The GPU (Graphics Prossessing Unit) is changing the face of large scale data mining by significantly speeding up the processing of data mining algorithms. For example, using the K-Means clustering algorithm, the GPU-accelerated version was found to be 200x-400x faster than the popular benchmark program MimeBench running on a single core CPU, and 6x-12x faster than a highly optimised CPU-only version running on an 8 core CPU workstation.
These GPU-accelerated performance results also hold for large data sets. For example in 2009 data set with 1 billion 2-dimensional data points and 1,000 clusters, the GPU-accelerated K-Means algorithm took 26 minutes (using a GTX 280 GPU with 240 cores) whilst the CPU-only version running on a single-core CPU workstation, using MimeBench, took close to 6 days (see research paper “Clustering Billions of Data Points using GPUs” by Ren Wu, and Bin Zhang, HP Laboratories). Substantial additional speed-ups are expected were the tests conducted today on the latest Fermi GPUs with 480 cores and 1 TFLOPS performance.
Over the last two years hundreds of research papers have been published, all confirming the substantial improvement in data mining that the GPU delivers. (more…)