ANALYSIS NOSQL DATABASE MANAGEMENT DEPENDING ON THE FEATURES AND DIFFERENTIATION OF RDBMS
ZAHRAA MUSTAFA ABDULRAHMAN AL-ANI
JUNE 2015
ANALYSIS NOSQL DATABASE MANAGEMENT DEPENDING ON THE FEATURES AND DIFFERENTIATION OF RDBMS
A THESIS SUBMITTED TO
THE GRADUATE SCHOOL OF NATURAL AND APPLIED
SCIENCES OF
ÇANKAYA UNIVERSITY
BY
ZAHRAA MUSTAFA ABDULRAHMAN AL-ANI
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF
MASTER OF SCIENCE
IN
THE DEPARTMENT OF
MATHEMATICS AND COMPUTER SCIENCE INFORMATION TECHNOLOGY PROGRAM
JUNE 2015
ABSTRACT
ANALYSIS NOSQL DATABASE MANAGEMENT DEPENDING ON THE FEATURES AND DIFFERENTIATION OF RDBMS
AL-ANI, Zahraa Mustafa Abdulrahman
M.Sc., Department of Mathematics and Computer Science Information Technology Program
Supervisor: Assist. Prof. Dr. Abdül Kadir GÖRÜR
June 2015, 53 Pages
Nowadays, two major kinds of database management systems are used to handle data. The first is the Relational Database Management System (RDBMS), the traditional relational database, which deals with structured data and has been popular for decades, since 1970. The second is the Not only Structured Query Language (NoSQL) database, which deals with semi-structured and unstructured data; NoSQL databases have been gaining popularity with the development of the internet and social media since April 2009. NoSQL databases are intended to overcome the drawbacks of RDBMSs, such as fixed
In order to overcome these limitations, a new database model known as the Not Only SQL (NoSQL) database emerged with a set of new features. The main objective of NoSQL is not to discard SQL, but to serve as an alternative data model that offers new features [1] [2] [3]. NoSQL databases aim to improve on the performance of relational databases through a set of new characteristics and advantages. In contrast to relational databases, NoSQL databases provide flexible, horizontal scalability and can take advantage of clusters. The rise of NoSQL provides cost-effective management of data in modern web applications. With its new features, NoSQL can be used with applications that have large transaction volumes and require low-latency access to huge datasets and service availability while
Though non-relational databases have been around since the 1960s, many companies have used relational databases to store data [2]. Over the past decade, however, with companies generating vast amounts of data, relational databases have been unable to manage these large data collections effectively [1]. An ever-increasing number of companies are therefore turning to non-relational databases, known as NoSQL databases, because they are more effective at handling these large amounts of data, which explains their rise in popularity over the past decade [2]. The term NoSQL database, which stands for Not Only SQL [3], is defined as a database that
For the purpose of this paper, we focus on three types of NoSQL database: BigTable, Cassandra, and DynamoDB.
Relational databases play a major role in making many apps and programs work. They provide an easy way to store large amounts of data in a consistent, non-duplicating, and maintainable way for developers to use in analytics or software ("Advantages of a relational database", n.d.). However, more and more applications and companies with a tremendous amount of data, such as search engines, social networks, and e-commerce sites, require a level of speed and scalability that relational databases cannot provide ("Why NoSQL?", n.d.). NoSQL is the name given to a quickly growing type of database, the non-relational database, which is used to store and manage huge amounts of structured, semi-structured, and unstructured data known as "Big Data" ("Why NoSQL?", n.d.). With the advent of social networks and apps with millions of users, unstructured and semi-structured data is growing exponentially, and the value of being able to quickly traverse it, analyze it, and use it for development is also growing quickly (McGuire, Manyika, & Chui, 2012).
In order to understand NoSQL databases, chapter two will describe the most significant features of NoSQL databases for meeting the above-mentioned requirements. Since the relational data model is not suitable for some use cases, chapter three will explain the structure and flexibility of the different data models offered by NoSQL databases. Chapter four will compare two widely used NoSQL databases, MongoDB and Cassandra.
Relational databases are the most prevalent choice for today's database needs across numerous applications. These databases follow specific rules, and adding many different attributes adds complexity to the system. In web applications especially, it becomes harder for relational databases to handle the required capacity and the range of possibilities. The web domain thus becomes the main motivator for NoSQL.
A relational database is a collection of data organized into a set of tables that can be accessed in several ways without having to reorganize the tables. The relational database was proposed by Edgar Codd around 1969, and it has since become very prevalent in commercial applications. During the 20th century numerous Relational Database Management Systems (RDBMS) appeared, for instance IBM DB2 and Oracle.
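To make the idea of tables that can be accessed in several ways without reorganization concrete, the following is a minimal sketch using Python's built-in sqlite3 module rather than DB2 or Oracle. The table names, columns, and sample rows are assumptions chosen purely for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory relational database with two related tables.

    The schema and rows are hypothetical, used only to illustrate the idea.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE vendor (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE product (
            id        INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            vendor_id INTEGER NOT NULL REFERENCES vendor(id)
        );
        INSERT INTO vendor VALUES (1, 'IBM'), (2, 'Oracle');
        INSERT INTO product VALUES (1, 'DB2', 1), (2, 'Oracle Database', 2);
        """
    )
    return conn


def products_with_vendor(conn):
    """Combine both tables at query time with a JOIN; the tables stay unchanged."""
    rows = conn.execute(
        "SELECT product.name, vendor.name FROM product "
        "JOIN vendor ON vendor.id = product.vendor_id "
        "ORDER BY product.id"
    )
    return rows.fetchall()


def products_of(conn, vendor_name):
    """Access the same tables a second way: filter products by vendor name."""
    rows = conn.execute(
        "SELECT product.name FROM product "
        "JOIN vendor ON vendor.id = product.vendor_id "
        "WHERE vendor.name = ? ORDER BY product.id",
        (vendor_name,),
    )
    return [row[0] for row in rows]
```

Both query functions read the same two tables; only the query changes, which is the property the paragraph above describes.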
MongoDB is a cross-platform, document-oriented database. Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster. Released under a combination of the GNU Affero General Public License and the Apache License, MongoDB is free and open-source software.
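The notion of JSON-like documents without a fixed schema can be sketched in plain Python. This example deliberately does not use the MongoDB driver or BSON: the "collection" is an ordinary list of dictionaries, and the helper names find and field_names are hypothetical, introduced only for illustration.

```python
import json

# A "collection" modeled as a plain list of dicts (a stand-in for a document
# store, not MongoDB itself). Note that the two documents have different fields.
articles = [
    {"_id": 1, "title": "Intro to NoSQL", "tags": ["nosql", "databases"]},
    {"_id": 2, "title": "Scaling out", "author": {"name": "A. Writer"}, "views": 120},
]


def find(collection, **criteria):
    """Return the documents whose top-level fields equal all given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def field_names(collection):
    """Collect every top-level field used by any document, sorted by name."""
    return sorted({key for doc in collection for key in doc})


# Documents of this shape serialize directly to JSON text and back.
round_tripped = json.loads(json.dumps(articles))
```

Because no schema is declared up front, the second document can carry an embedded author object and a views counter that the first one lacks, and queries simply skip documents that do not have the requested field.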
2. NoSQL: NoSQL encompasses a wide variety of different databases that were developed to cope with the increase in the volume of data stored about clients, items, and products, the rate at which data is accessed, and performance and processing needs. Many technologies fall under NoSQL, for example document-oriented databases, graph databases, and big table structures. It has the advantage of compatibility with many systems.
Modern RDBMS technologies are not capable of supporting unstructured data with optimal space requirements. The design becomes complex and is hence difficult for developers. The need for unstructured data management is a pressing problem with conventional RDBMS solutions (Big data in financial services industry: Market trends, challenges, and prospects 2013 - 2018). Moreover, an RDBMS turns out to be an expensive solution for developing agile web applications with moderate data analysis requirements. NoSQL is emerging as an efficient alternative in this scenario, one that bridges the issues associated with RDBMS technology. The market growth can be attributed to innovative launches of NoSQL solutions and collaborative efforts by NoSQL vendors and customers. The efforts of companies to improve their market offerings are creating demand for NoSQL as back-end support (Big data in financial services industry: Market trends, challenges, and prospects 2013 - 2018). The emergence of agile software development is also creating demand for NoSQL (Big data in financial services industry: Market trends, challenges, and prospects 2013 - 2018). NoSQL databases offer users many more avenues for accepting data in many different forms. NoSQL is as adaptable as SQL but offers many more uses that can apply to many organizations.
Another much-discussed database is Cassandra. It was originally developed by Facebook and open-sourced in 2008 [6]. Facebook first used the system for its inbox search feature, whose data it manages and stores on disk, and, given the system's high performance within its service level agreement requirements, other companies such as Netflix and Twitter adopted Cassandra as their storage engine and as a backend for their streaming services [9]. What is Cassandra? Based on many definitions, Cassandra is an open-source distributed database that is highly scalable and high-performance, designed to handle large amounts of data across many commodity servers while providing high availability with no single point of failure. Its main strengths are high performance, robust support for clusters that span several data centers, and low-latency operation for its clients, which is why businesses favor it. It is written in Java. Research conducted on NoSQL systems concluded that Cassandra's scalability surpasses that of the other database management systems studied, particularly at the largest number of nodes. It is designed as a distributed system that supports replication, including replication across multiple data centers, as well as replacing failed nodes without downtime [2]. Cassandra integrates with other open-source projects such as Hadoop and Apache Pig. It is similar to a relational database since
The RDBMS is so widely used because it makes relationships between data simple to understand. The data can also be analyzed in many ways using queries, reports, etc. A database management system covers all functions of a business and is essential for running the business efficiently. Since the introduction of this type of database management system in 1970, it has triumphed over many earlier types of widely used databases and fended off new types of databases to remain the most common type.
With the appearance of Big Data, there was clearly a need for more flexible databases. In this paper, we review one graph database (Neo4j) and compare it with one traditional relational database (MySQL) on features such as ACID support, replication, and the query language each uses. MySQL has become almost synonymous with relational databases and has been in use for a long time. Neo4j, a graph database that is part of the emerging technology called NoSQL, is now trying to prove that there is a need for NoSQL.
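The difference between the two data models can be illustrated with a small sketch that answers the same question ("who is reachable in exactly two hops?") in both styles. It does not use Neo4j or MySQL: the relational side uses Python's built-in sqlite3 with a self-join, and the graph side uses a plain adjacency dictionary. The edge data and function names are assumptions made for illustration.

```python
import sqlite3

# Hypothetical "follows" relationships, shared by both approaches.
EDGES = [("ann", "bob"), ("bob", "cid"), ("bob", "dee"), ("cid", "eve")]


def two_hops_relational(person):
    """Relational style: store edges as rows and join the table with itself."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE follows (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO follows VALUES (?, ?)", EDGES)
    rows = conn.execute(
        "SELECT DISTINCT f2.dst FROM follows AS f1 "
        "JOIN follows AS f2 ON f2.src = f1.dst "
        "WHERE f1.src = ? ORDER BY f2.dst",
        (person,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def two_hops_graph(person):
    """Graph style: follow edges from node to node through an adjacency map."""
    adjacency = {}
    for src, dst in EDGES:
        adjacency.setdefault(src, []).append(dst)
    found = set()
    for neighbor in adjacency.get(person, []):
        found.update(adjacency.get(neighbor, []))
    return sorted(found)
```

Each extra hop in the relational version requires another join over the whole edge table, whereas the graph version only walks the neighbors of the nodes it has already reached. This is a simplified picture of why graph databases are often proposed for relationship-heavy queries.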
The past thirty years have seen increasingly rapid advances in the field of databases. Moreover, the amount of data stored in electronic format has increased dramatically, and data is accumulating at a very quick rate. In addition, the volume of information in the world has been projected to double every two years. Health care and financial database systems, for example, are notable instances of the types of data that are being collected in dramatically increasing amounts. In fact, we live in a world where vast amounts of data are collected daily, and we cannot go about our lives without interacting with data, because we are living in the age of data. There are Terabytes or
Currently, two major kinds of database management systems are used to handle data. The first is the Relational Database Management System (RDBMS), the traditional relational database, which deals with structured data and has been popular for decades, since 1970. The second is the Not only Structured Query Language (NoSQL) database, which deals with semi-structured and unstructured data. The term NoSQL was first introduced in 1998 by Carlo Strozzi, and Eric Evans reintroduced it in early 2009; NoSQL databases are now gaining popularity with the development of the internet and social media. NoSQL databases are intended to overcome the drawbacks of RDBMSs, such as fixed schemas, JOIN operations, and scalability problems. With the appearance of Big Data,