Introduction
This report describes the design of NoSQL systems for data persistence, the implementation of those designs, and the solutions to the assigned tasks. The dataset is a music dataset from Last.fm, and the designs for MongoDB, HBase and Neo4j are based on the dataset's features and the given queries. The implementation covers creating the databases, setting up the schemas and running the queries, followed by performance testing. Each system's design was also iterated on to improve performance.
The report contains five sections. This section gives a brief introduction, and each system has two sections that present its schema and query design. At the end of the report, a section for
Figure 2. Schema design for solving queries (Schema2)
Schema2 consists of three collections: “UserArtistInfo”, “Friends”, and “Artist”. The “UserArtistInfo” collection has the fields UserID, ArtistID, ArtistName, Weight, Tag YN, TagId, TagValue and TimeStamp, and is not structured as an embedded document. As in an RDBMS, each row holds an independent user – artist – weight – tag record, which is easier to update and read. The “Artist” collection is set aside because it is rarely used. The “Friends” collection is also kept separate so that it can be linked (joined) when needed.
Schema 1 and Schema 2 share the same data structure, shown in the figure below.
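As a rough illustration of the flat layout described above, the three collections might hold documents shaped like the following. This is only a sketch: every value is a made-up placeholder, the key "TagYN" stands in for the "Tag YN" field, and the field names of the “Friends” and “Artist” collections are assumptions, since the report does not list them.

```python
# Illustrative document shapes for Schema2. All values are placeholders.

# One flat row per user - artist - weight - tag combination.
user_artist_info_doc = {
    "UserID": 2,
    "ArtistID": 51,
    "ArtistName": "Example Artist",
    "Weight": 100,
    "TagYN": "Y",          # assumed spelling of the "Tag YN" field
    "TagId": 13,
    "TagValue": "example tag",
    "TimeStamp": 1238536800000,
}

# Assumed field names: the report does not specify them.
friends_doc = {"UserID": 2, "FriendID": 275}
artist_doc = {"ArtistID": 51, "ArtistName": "Example Artist"}


def is_flat(doc):
    """Return True if no field holds an embedded document or array."""
    return all(not isinstance(value, (dict, list)) for value in doc.values())
```

Because no field nests another document, a single row can be updated or read without touching an embedded array, which is the trade-off the schema description points to.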
Figure 1. Data structure for schema 1 and 2
Query Design
The eight given queries fall into two groups: those that need a join across two collections and those that do not. Queries 1, 7 and 8 depend on the user – friends relationship, so they need a join, performed with the $lookup aggregation stage. The other queries (2, 3, 4, 5, 6) can be answered using only the “UserArtistInfo” collection. Furthermore, simple queries such as queries 2 and 4 do not need an aggregation pipeline, whereas queries 1, 3, 5, 6, 7 and 8 need a more complex aggregation pipeline built from the stages “$lookup”, “$unwind”, “$match”, “$group”, “$project”, “$sort”, and “$limit”.
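The difference between a simple query and an aggregation pipeline can be sketched as follows. Since the wording of queries 2 to 6 is not reproduced here, both examples are hypothetical: the first is a plain filter on “UserArtistInfo”, and the second counts rows per tag for one user and keeps the most frequent tags. The field names follow the schema above.

```python
def simple_filter(user_id):
    """Filter document for a plain find() on UserArtistInfo (no pipeline)."""
    return {"UserID": user_id}


def tag_counts_pipeline(user_id, n):
    """Hypothetical pipeline: the n most frequent tags for one user."""
    return [
        {"$match": {"UserID": user_id}},                          # keep one user's rows
        {"$group": {"_id": "$TagValue", "count": {"$sum": 1}}},   # count rows per tag
        {"$sort": {"count": -1}},                                 # most frequent first
        {"$limit": n},                                            # keep the top n
    ]


# With a PyMongo database handle `db` (assumed, not created here):
#   db.UserArtistInfo.find(simple_filter(2))
#   db.UserArtistInfo.aggregate(tag_counts_pipeline(2, 5))
```

The pipeline is an ordered list of stages, so placing “$match” first reduces the number of documents that the later stages have to process.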
Execution
Simple queries
Query 1: given a user ID, find all artists that the user’s friends listen to
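One possible shape for this query is a $lookup join from “Friends” into “UserArtistInfo”. This is a sketch and not necessarily the pipeline used in the report: the field name FriendID and the output field "listened" are assumptions. A plain-Python equivalent is included to make the intended logic explicit.

```python
def query1_pipeline(user_id):
    """Aggregation pipeline to run against the Friends collection."""
    return [
        # Keep only the friend links of the given user.
        {"$match": {"UserID": user_id}},
        # Join each friend to that friend's rows in UserArtistInfo.
        {"$lookup": {
            "from": "UserArtistInfo",
            "localField": "FriendID",      # assumed field name
            "foreignField": "UserID",
            "as": "listened",
        }},
        # One output document per joined row.
        {"$unwind": "$listened"},
        # Deduplicate artists across friends and tags.
        {"$group": {
            "_id": "$listened.ArtistID",
            "ArtistName": {"$first": "$listened.ArtistName"},
        }},
        {"$project": {"_id": 0, "ArtistID": "$_id", "ArtistName": 1}},
    ]


def query1_in_memory(friends, user_artist_info, user_id):
    """Plain-Python equivalent over lists of dicts, for checking the logic."""
    friend_ids = {f["FriendID"] for f in friends if f["UserID"] == user_id}
    return sorted({row["ArtistID"] for row in user_artist_info
                   if row["UserID"] in friend_ids})
```

Grouping by ArtistID after the join matters because the flat schema repeats an artist once per tag, so the same artist would otherwise appear several times.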
Phase 3: Sketch the star schema for the database and develop the database based on the star schema.
Since the 1960s, the need for efficient data management and retrieval has been a persistent issue, driven by growing demand in business and academia. To address it, a number of database models have been created. Relational databases allow data storage, retrieval and manipulation using the standard Structured Query Language (SQL). Until recently, relational databases were the optimal choice for enterprise storage. However, as the volume of stored and analyzed data has grown, relational databases have shown a variety of limitations, notably in scalability, storage and query efficiency on large volumes of data [1] [2].
Provide reasoning to support the use of the NoSQL database as the database of choice to solve the problem faced by TWC. Identify one strength and one weakness for each of the other three kinds of databases to solve the problem for TWC.
The relational model, which uses predefined tabular relations to store data, has remained the preeminent model for data storage since it was first implemented in the early 1980s. However, due to the proliferation of the Internet, data now flows in and out of organizations quickly, and most of this data is in a semi-structured state designed for communication over HTTP. It is difficult to fit this complex data into a flat two-dimensional array. For that reason, it is imperative that companies be able to store data in a semi-structured format compatible with modern network communications as well as various platforms and devices. The market has realized this and responded with document stores that support formats,
The rapid growth of technology has influenced the way we communicate, shop, learn, and share information. It has also led database analysts and administrators to look for more convenient ways to store large amounts of data. “Big data” is a common expression in the tech world. It is defined as a huge collection of data that cannot be managed by relational databases (Moniruzzaman and Hossain 1). As a result, developers have started to use non-relational (NoSQL) databases to organize and store big data. To understand how developers solve the problem of storing large amounts of data and provide systems that can sync data between multiple devices, we need to start with a brief background on NoSQL databases before turning to the Couchbase system. The purpose of this paper is to define NoSQL databases, compare them with SQL databases, define Couchbase, and describe how Couchbase, especially Couchbase Mobile, synchronizes data between multiple devices.
NoSQL databases are designed to scale out transparently and horizontally, taking advantage of new nodes, and to run on low-cost hardware. SQL databases have problems with scalability.
For example, Facebook, the most popular social networking website, recently announced its adoption of a NoSQL-based graph data store for efficient storage of user data. In other words, NoSQL has already made its way into the enterprise. However, like every other widely accepted technology, NoSQL has its own set of advantages and disadvantages. It is important for an enterprise to weigh the pros and cons of a particular new database technology against existing solutions based on its own requirements. For example, legacy enterprise applications may require extensive support from their database vendors, and traditional relational database vendors such as Oracle are well established as providers of excellent support. On the other hand, NoSQL has grown rapidly over the past few years and is continually evolving in terms of big data handling, data warehousing and lower complexity. Hence, there is a need to study the current market of data stores, focusing on the most popular NoSQL data stores and how well they fare against the widely accepted traditional database systems. This requires a study of the commonly used NoSQL data stores.
“NoSQL practitioners focus on physical data model design rather than the traditional conceptual / logical data model process” (Hsieh, 2014). The mindset of data modelers has changed in recent years. The flexibility and scalability of NoSQL databases, and their ability to handle a variety of structured and unstructured data, have led data modelers to think in more business-centric terms.
The tables and joins are complex because they are normalized (for RDBMS). Normalization is carried out to reduce redundant data and to save space.
“Hadoop” is not a language or a technology; it is a framework developed by Yahoo and maintained by Apache for big data problems. Data on the web comes in different formats, such as images, text and media files, and may be structured, unstructured or semi-structured. Today there is more streaming data than static data, and capturing it is a challenge.
NoSQL databases were created to solve the Big Data problem by using a distributed system to deliver excellent performance in data storage and retrieval at very large scale. At this scale, parts of the system often fail, and NoSQL is designed to handle these failures (Chow, 2013) (Ron, Shulman-Peleg, & Bronshtein, 2015). Various companies have adopted different sorts of non-relational databases, commonly referred to as
The demands on database technology have been ever expanding since its introduction in the 1960s. Today, traffic on the internet requires that millions upon millions of records be stored and queried each second. Data must be highly available and quickly retrievable. Together, these requirements have given rise to new forms of database technology collectively called “NoSQL” or “Not Only SQL”. NoSQL eschews the strict guidelines that govern the creation and function of traditional relational databases. These guidelines are put aside in order to meet the new demands of an increasingly interconnected world. The rigorous standards and data definitions of relational databases give way in order to provide the ability to rapidly
Information is stored in a triplestore and retrieved using a database query language called SPARQL. SPARQL is a query language for RDF data. It is a basic method for querying remote databases over HTTP. SPARQL can perform graph pattern matching queries and allows users to specify types of accessibility and navigational queries (the shortest distance between two nodes or how two nodes are connected) in triplestores. SPARQL supports powerful queries and reasoning over the data in a triplestore, which also allows computers to reason about that data.
Modern RDBMS technologies cannot support unstructured data with optimal space requirements. The design becomes complex and is therefore difficult for developers. Managing unstructured data is cumbersome with conventional RDBMS solutions (Big data in financial services industry: Market trends, challenges, and prospects 2013 - 2018). Moreover, an RDBMS turns out to be a costly solution for developing agile web applications with moderate data analysis requirements. NoSQL is emerging as an efficient alternative in this situation, addressing the issues associated with RDBMS technology. The market growth can be attributed to innovative launches of NoSQL solutions and to collaborative efforts by NoSQL vendors and customers. The efforts of companies to improve their market offerings are creating demand for NoSQL as back-end support (Big data in financial services industry: Market trends, challenges, and prospects 2013 - 2018). The emergence of agile software development is also creating demand for NoSQL (Big data in financial services industry: Market trends, challenges, and prospects 2013 - 2018). NoSQL databases offer users many more ways to accept data in many different forms. NoSQL is as adaptable as SQL but offers many more uses that can apply to many organizations.
NoSQL databases are used in social media applications and big data processing portals that handle huge, heterogeneous and unstructured data formats. They are used at the back end for faster access to records in large datasets. The AADHAAR Card implementation in India was done using NoSQL databases, as a huge amount of information is involved, including text data, images, thumb impressions and iris detection. A classical database system cannot handle datasets of such different types (image, text, video, audio, thumb impressions for pattern recognition, iris samples) simultaneously.