Azinta Systems Blog

October 16, 2010

Scaling-up GPUs for "Big Data" Analytics – MapReduce and Fat Nodes

Filed under: GPU, GPU Analytics, LinkedIn, MapReduce — Tags: , , , — Suleiman Shehu @ 16:36

Over the last two years large clusters comprising 1,000s of commodity CPUs, running Hadoop MapReduce, have powered the analytical processing of “Big data” involving hundreds of terabytes of information.

Now a new generation of CUDA GPUs based on the Fermi (see figure 1) and the Kepler (Kepler is due in 2011) have created the potential for a new disruptive technology for “Big data” analytics based on the use of much smaller hybrid CPU-GPU clusters.

(more…)

GPU and Large Scale Data Mining

Filed under: GPU Analytics — Tags: , — Suleiman Shehu @ 14:27

The GPU (Graphics Prossessing Unit) is changing the face of large scale data mining by significantly speeding up the processing of data mining algorithms.  For example, using the K-Means clustering algorithm, the GPU-accelerated version was found to be 200x-400x faster than the popular benchmark program MimeBench running on a single core CPU, and 6x-12x faster than a highly optimised CPU-only version running on an 8 core CPU workstation.

These GPU-accelerated performance results also hold for large data sets.  For example in 2009 data set with 1 billion 2-dimensional data points and 1,000 clusters, the GPU-accelerated K-Means algorithm took 26 minutes (using a GTX 280 GPU with 240 cores) whilst the CPU-only version running on a single-core CPU workstation, using MimeBench, took close to 6 days (see research paper “Clustering Billions of Data Points using GPUs” by Ren Wu, and Bin Zhang, HP Laboratories).  Substantial additional speed-ups are expected were the tests conducted today on the latest Fermi GPUs with 480 cores and 1 TFLOPS performance.

Over the last two years hundreds of research papers have been published, all confirming the substantial improvement in data mining that the GPU delivers.   (more…)

Powered by WordPress