
Pavlo Comparison Essay


The paper “A Comparison of Approaches to Large-Scale Data Analysis” by Pavlo et al. compares and analyzes the MapReduce framework against parallel DBMSs for large-scale data analysis. It benchmarks the open-source Hadoop, built on MapReduce, against two parallel SQL databases, Vertica and a second system from a major relational vendor (DBMS-X), and concludes that parallel databases clearly outperform Hadoop on the same hardware at 100 nodes. Averaged across 5 tasks on 100 nodes, Vertica was 2.3 times faster than DBMS-X, which in turn was 3.2 times faster than MapReduce. In general, the parallel SQL DBMSs were significantly faster and required less code to implement each task, but took longer to tune and to load the data. Finally, the paper talks about …

These arguments make it clear that MapReduce performs best when used over a larger number of nodes, which is where parallel databases start to degrade and prove too complex and costly.

I would also like to cite the P3 Project paper “Effective Data Management in HealthCare Industry”, submitted by team Phoenix [7]. This paper compares the performance of three different data storage systems on data sets ranging from a thousand to a hundred million records. The experimental results in the paper clearly show that it is better to work with small data sets on an RDBMS, such as the Oracle 10g they used; for large data sets, however, an RDBMS is not a good option because it requires a huge amount of processing time, as their experiments show. They used Hive for large data sets, which proved to be much faster and very cost efficient. Although they did not use a parallel DBMS, the efficiency of Hive, in terms of speed, cost and complexity, could be easily estimated. Another P3 project paper, “Data Analysis using Cloud Computing” [8] by team Nimbus, makes use of PigLatin, which is built over Hadoop. Their experimental analysis was done on 40 GB of data stored in Amazon S3, and took 5 minutes with 15 instances running. They claim PigLatin provides a fast and
