Investigation into Deriving an Efficient Hybrid Model of a MapReduce + Parallel-Platform Data Warehouse Architecture
Shrujan Kotturi (skotturi@uncc.edu)
College of Computing and Informatics, Department of Computer Science
University of North Carolina at Charlotte, North Carolina, USA
Under the Supervision of
Dr. Yu Wang (yu.wang@uncc.edu), Professor, Computer Science
Abstract—Parallel databases are the high-performance databases of the RDBMS world and can be used to build data-intensive enterprise data warehouses, but they lack scalability. The MapReduce paradigm, by contrast, scales very well, yet its query performance does not match that of parallel databases. This work investigates an architectural hybrid model that combines the best of both worlds to deliver high performance and scalability at the same time.
Keywords—Data Warehouse; Parallel databases; MapReduce; Scalability
I. INTRODUCTION
A parallel-platform data warehouse is one built on a parallel-processing database such as Teradata or IBM Netezza, which supports a Massively Parallel Processing (MPP) architecture for data read/write operations. This is unlike non-parallel databases such as Oracle, MySQL and SQL Server, which perform sequential, row-wise read/write operations without parallelism in the DBMS. The MapReduce paradigm was popularized by Google, Inc.
Real-time data warehousing creates special issues that data warehouse management must solve. These issues arise from the extensive technical work involved not only in planning the system but also in managing problems as they appear. Two aspects of the BI system that need to be organized in order to avoid technical problems are the architecture design and query workload balancing.
Hadoop \cite{white2012hadoop} is an open-source framework for distributed storage and data-intensive processing, first developed at Yahoo!. It has two core projects: the Hadoop Distributed File System (HDFS) and the MapReduce programming model \cite{dean2008mapreduce}. HDFS is a distributed file system that splits data and stores it on nodes throughout a cluster, with a number of replicas. It provides an extremely reliable, fault-tolerant, consistent, efficient and cost-effective way to store a large amount of data. The MapReduce model consists of two key functions: the Mapper and the Reducer. The Mapper processes input data splits in parallel through different map tasks and sends sorted, shuffled output to the Reducers, which in turn group the records and process each group with a reduce task.
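As a concrete illustration of these two key functions, the listing below is a minimal word-count Mapper and Reducer written against the Hadoop Java API; the class names (WordCountMapper, WordCountReducer) are illustrative choices, not taken from this paper.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: processes one input split in parallel with other map tasks;
// emits a (word, 1) pair for every token it sees.
public class WordCountMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer tokens = new StringTokenizer(value.toString());
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      context.write(word, ONE);
    }
  }
}

// Reducer: after the sort/shuffle phase it receives all values for one key
// and aggregates them into a single count.
class WordCountReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable(sum));
  }
}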
Hadoop implements the MapReduce parallel programming model. A Hadoop cluster contains two kinds of nodes: a Master node and Slave nodes. The Master node runs the Namenode, Datanode, Jobtracker and Tasktracker processes, while a Slave node runs only the Datanode and Tasktracker processes. The Namenode manages the partitioning of the input dataset into blocks and decides on which nodes those blocks are stored. Finally, Hadoop has two core layers: the HDFS layer and the MapReduce layer; the MapReduce layer reads from and writes to HDFS storage and processes data in parallel.
Hadoop provides a distributed filesystem and a framework for the analysis and transformation of very large data sets using the MapReduce paradigm [DG04]. While the interface to HDFS is patterned after the Unix filesystem, faithfulness to standards was sacrificed in favor of improved performance for the applications at hand.
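Because HDFS exposes a filesystem-like Java API rather than a POSIX interface, client code interacts with it roughly as sketched below; the namenode address and file path are assumed example values, not drawn from this paper.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
  public static void main(String[] args) throws Exception {
    // Configuration normally picks up core-site.xml / hdfs-site.xml from the
    // classpath; fs.defaultFS below is an assumed example address.
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020");
    FileSystem fs = FileSystem.get(conf);

    // Write a small file; HDFS splits it into blocks and replicates each block.
    Path file = new Path("/user/demo/hello.txt");
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
    }

    // Read it back through the same filesystem-style API.
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
      System.out.println(in.readLine());
    }
  }
}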
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a parallel, distributed computing environment on commodity hardware. It is highly scalable and fault tolerant, runs on a cluster, and eliminates the need for a supercomputer. Hadoop is the most widely used big data processing engine, with a simple master-slave setup: in most companies, big data jobs are submitted to the Master, which distributes the work across the cluster and runs the map and reduce tasks. Nowadays, however, growing data needs and competition between service providers lead to many jobs being submitted to the Master concurrently. This concurrent job submission forces us to schedule work on the Hadoop cluster so that the response time remains acceptable for each job.
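The driver sketched below shows, under assumed command-line input/output paths, how a client typically packages the Mapper and Reducer from the earlier listing into a job and submits it to the Master; the class name WordCountDriver is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountDriver.class);

    // Wire in the Mapper/Reducer sketched earlier and declare output types.
    job.setMapperClass(WordCountMapper.class);
    job.setCombinerClass(WordCountReducer.class); // local pre-aggregation
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Assumed HDFS input/output paths supplied on the command line.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Submit the job to the cluster and wait for completion.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}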
A data warehouse is a large database organized for reporting. It preserves history, integrates data from multiple sources, and is typically not updated in real time. The key components of data warehousing are the ability to access data of the operational systems, the data staging area, the data presentation area, and the data access tools (HIMSS, 2009). The goal of the data warehouse platform is to improve decision-making for clinical, financial, and operational purposes.
Hadoop is an open source framework that can be very resourceful for processing complex data systems, and it has been used heavily in the recent past for query processing over complex databases containing millions of records. The major advantage of Hadoop is that it partitions the records into blocks across the cluster, runs the query on each block in parallel, and presents the compiled results effectively.
Data warehouses, in contrast, are targeted for decision support. Historical, summarized and consolidated data is more important than detailed, individual records. Since data warehouses contain consolidated data, perhaps from several operational databases, over potentially long periods of time, they tend to be orders of magnitude larger than operational databases; enterprise data warehouses are projected to be hundreds of gigabytes to terabytes in size. The workloads are query intensive with mostly ad hoc, complex queries that can access millions of records and perform a lot of scans, joins, and aggregates. Query throughput and response times are more important than transaction throughput.
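To connect this workload to the MapReduce side of the proposed hybrid, the sketch below shows how one such aggregate (total amount per region over an assumed comma-separated fact table of the form "region,product,amount") could be expressed as a Mapper/Reducer pair; the schema and class names are illustrative assumptions, not part of the paper.

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Assumed input: one sale record per line, e.g. "region,product,amount".
public class SalesByRegionMapper
    extends Mapper<LongWritable, Text, Text, DoubleWritable> {
  private final Text region = new Text();
  private final DoubleWritable amount = new DoubleWritable();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] fields = value.toString().split(",");
    if (fields.length < 3) {
      return; // skip malformed records encountered during the full scan
    }
    region.set(fields[0]);
    amount.set(Double.parseDouble(fields[2]));
    context.write(region, amount);
  }
}

// Reducer computes the aggregate (SUM of amount) for each region key.
class SalesByRegionReducer
    extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
  @Override
  protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
      throws IOException, InterruptedException {
    double total = 0.0;
    for (DoubleWritable v : values) {
      total += v.get();
    }
    context.write(key, new DoubleWritable(total));
  }
}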
The primary objective of the document is to detail the steps necessary to implement a Big Data database platform to achieve what is called an Enterprise Data Lake, and to improve the Enterprise Data Hub, where all data is stored in one place and integrated with the existing infrastructure and tools.
These days, data-intensive problems are so prevalent that various organizations across the IT industry face them in their business operations. It is often crucial for enterprises to have the capability of analyzing massive volumes of data in an effective and timely manner. MapReduce and its open-source implementation Hadoop dramatically simplified the development of parallel data-intensive computing applications for ordinary users, and the combination of Hadoop and cloud computing made large-scale parallel data computing much more accessible and reliable to all potential users than ever before. Although Hadoop has become the most popular data management framework for parallel data-intensive computing in the cloud, the Hadoop scheduler is not an ideal match for cloud environments. In this paper, we discuss the issues of the Hadoop task assignment scheme and present an improved scheme for heterogeneous computing environments using Apache Ambari with Parallel Fast Fourier Transform. We conducted in-depth simulations to evaluate the performance of the proposed scheme compared with the Hadoop scheme in two kinds of heterogeneous computing environments that are typical of public cloud platforms.
The Hadoop Distributed File System (HDFS), a Java-based file system, provides reliable and scalable storage for data. It is the key component for understanding how a Hadoop cluster can be scaled over hundreds or thousands of nodes. The large amount of data in a Hadoop cluster is broken down into smaller blocks and distributed across small, inexpensive servers using HDFS. MapReduce functions are then executed on these smaller blocks of data, providing the scalability needed for big data processing. In this paper I will discuss Hadoop in detail: the architecture of HDFS, how it functions, and its advantages.
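As a rough worked example, assuming a 128 MB HDFS block size and a replication factor of 3 (common configurable defaults, not figures from this paper), the snippet below estimates how many blocks, map tasks, and physical block copies a 10 GB input file would produce.

public class BlockEstimateSketch {
  public static void main(String[] args) {
    // Assumed, configurable values (dfs.blocksize and dfs.replication).
    long blockSizeBytes = 128L * 1024 * 1024;        // 128 MB per HDFS block
    int replicationFactor = 3;                       // copies kept of each block
    long fileSizeBytes = 10L * 1024 * 1024 * 1024;   // a 10 GB input file

    // Number of blocks = ceiling(fileSize / blockSize); each block usually
    // becomes one input split, i.e. one map task.
    long blocks = (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    long storedCopies = blocks * replicationFactor;

    System.out.println("HDFS blocks (and map tasks): " + blocks);              // 80
    System.out.println("Block replicas stored cluster-wide: " + storedCopies); // 240
  }
}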
Data has always been analyzed within companies and used to help benefit the future of businesses. However, how data is stored, combined, analyzed and used to predict the patterns and tendencies of consumers has evolved as technology has seen numerous advancements throughout the past century. In the 1900s databases began as “computer hard disks,” and in 1965, after many other discoveries including voice recognition, “the US Government plans the world’s first data center to store 742 million tax returns and 175 million sets of fingerprints on magnetic tape.” The evolution of data into large databases continued in 1991, when the internet began to spread and “digital storage became more cost effective than paper.” With the constant increase of digitally supplied data, Hadoop was created in 2005, and from that point forward “14.7 Exabytes of new information are produced this year”; this number is rapidly increasing with the many mobile devices people in our society have today (Marr). The evolution of the internet, and then the expansion of the number of mobile devices society has access to, led data to evolve, and companies now need large central database management systems in order to run an efficient and successful business.
Before discussing the current data warehouse architecture in place at ICICI Bank and the issues associated with it, especially those due to immense data growth and the different modalities of data sources, it is appropriate to take a quick look at data warehouse history and architectural frameworks, and at how ICICI Bank’s data warehouse has evolved over the years. Back in 2008 ICICI Bank used Teradata and depended on it for its data warehouse; at that time the data warehouse was 3 TB in size. The dramatic growth in the amount of data, the user population and the source systems, coupled with the cost of scaling and maintenance as well as system availability, posed a problem for the bank in continuing with its legacy data warehouse solution. The bank felt that its legacy data warehouse solution posed scalability issues, and one of the major issues that the bank faced was with their current
MapReduce is a simple and powerful programming model that enables the development of scalable parallel applications to process large amounts of data scattered across a cluster of machines. The original implementations of the MapReduce framework had some limitations, which much research follow-up work has addressed since its introduction. It is attracting a lot of attention in both the research and industrial communities because of its capacity to process large data. The MapReduce framework is used in different applications and for different purposes.
The aim of this paper is to explore different aspects of the MapReduce framework. The primary focus will be on how the MapReduce framework follows the principles and techniques of distributed and parallel programming in the context of concurrent, parallel and distributed computing. The following sections of the report briefly introduce the MapReduce platform and how it relates to distributed and parallel computing; discuss the phases and job life cycle of MapReduce-based programming and the functionalities of the different components of a MapReduce job; and describe the implementation of MapReduce and the challenges in such implementations. Hence, the paper covers the methodology, implementations, issues and examples of implementation of the MapReduce framework.