Over the last two years large clusters comprising 1,000s of commodity CPUs, running Hadoop MapReduce, have powered the analytical processing of “Big data” involving hundreds of terabytes of information.
Now a new generation of CUDA GPUs based on the Fermi (see figure 1) and the Kepler (Kepler is due in 2011) have created the potential for a new disruptive technology for “Big data” analytics based on the use of much smaller hybrid CPU-GPU clusters.
(more…)
The GPU (Graphics Prossessing Unit) is changing the face of large scale data mining by significantly speeding up the processing of data mining algorithms. For example, using the K-Means clustering algorithm, the GPU-accelerated version was found to be 200x-400x faster than the popular benchmark program MimeBench running on a single core CPU, and 6x-12x faster than a highly optimised CPU-only version running on an 8 core CPU workstation.
These GPU-accelerated performance results also hold for large data sets. For example in 2009 data set with 1 billion 2-dimensional data points and 1,000 clusters, the GPU-accelerated K-Means algorithm took 26 minutes (using a GTX 280 GPU with 240 cores) whilst the CPU-only version running on a single-core CPU workstation, using MimeBench, took close to 6 days (see research paper “Clustering Billions of Data Points using GPUs” by Ren Wu, and Bin Zhang, HP Laboratories). Substantial additional speed-ups are expected were the tests conducted today on the latest Fermi GPUs with 480 cores and 1 TFLOPS performance.
Over the last two years hundreds of research papers have been published, all confirming the substantial improvement in data mining that the GPU delivers. (more…)