
Design of NoSQL Systems for Data Persistence and Implementation of the Design


Introduction
This report describes the design of NoSQL systems for data persistence, the implementation of those designs, and our solutions to the required tasks. The dataset is a music dataset from Last.fm, and the designs for MongoDB, HBase and Neo4j are based on the features of the dataset and the given queries. The implementation covers creating the databases, setting up the schemas and running the queries, followed by performance testing. Each system's design was also iterated on to achieve higher performance.
The report contains five sections. This brief introduction comes first, and each system has two sections that present its schema and query design. At the end of the report, a section for …

Figure 2. Schema design for solving queries (Schema2)

Schema2 consists of three collections: “UserArtistInfo”, “Friends”, and “Artist”. The “UserArtistInfo” collection has the fields UserID, ArtistID, ArtistName, Weight, Tag YN, TagId, TagValue and TimeStamp, and is not structured as an embedded document. As in an RDBMS, each row holds one independent user – artist – weight – tag record, which makes it easier to update and read. The “Artist” collection is kept separate because it is rarely used. The “Friends” collection is also kept separate so that it can be linked (joined) when needed.
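To make the difference between an embedded document and the flat rows of “UserArtistInfo” concrete, here is a minimal sketch in Python. Only the eight field names come from the schema description above. The embedded shape, the nested key names ("Artists", "Tags"), the "Y"/"N" encoding of Tag YN and every value are assumptions made up for illustration.

```python
from typing import Any, Dict, List

# Hypothetical embedded shape (NOT the Schema2 design): one document per user
# with artists and tags nested inside. All values are invented.
embedded_user: Dict[str, Any] = {
    "UserID": 2,
    "Artists": [
        {
            "ArtistID": 51,
            "ArtistName": "Example Artist",
            "Weight": 13883,
            "Tags": [
                {"TagId": 13, "TagValue": "chillout", "TimeStamp": 1238536800000},
                {"TagId": 15, "TagValue": "electronic", "TimeStamp": 1238536800000},
            ],
        },
        {"ArtistID": 52, "ArtistName": "Another Artist", "Weight": 11690, "Tags": []},
    ],
}


def flatten_user(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one embedded user document into flat user-artist-weight-tag rows."""
    rows: List[Dict[str, Any]] = []
    for artist in doc["Artists"]:
        base = {
            "UserID": doc["UserID"],
            "ArtistID": artist["ArtistID"],
            "ArtistName": artist["ArtistName"],
            "Weight": artist["Weight"],
        }
        if not artist["Tags"]:
            # Untagged artist: still one row, with empty tag fields.
            # The "Y"/"N" values for "Tag YN" are an assumption.
            rows.append({**base, "Tag YN": "N", "TagId": None,
                         "TagValue": None, "TimeStamp": None})
        for tag in artist["Tags"]:
            rows.append({**base, "Tag YN": "Y", **tag})
    return rows


flat_rows = flatten_user(embedded_user)
```

Each element of `flat_rows` is independent of the others, so a single tag can be updated or read without touching a nested array, at the cost of repeating UserID, ArtistID, ArtistName and Weight on every row.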

Schema 1 and Schema 2 share the same data structure, shown in the figure below.

Figure 1. Data structure for schema 1 and 2

Query Design
The eight given queries fall into two groups: those that need a join across two collections and those that do not. Queries 1, 7 and 8 rely on the user – friends relationship, so they need a join, performed with the join aggregation stage ($lookup). The other queries (2, 3, 4, 5, 6) can be answered using only the “UserArtistInfo” collection. Furthermore, simple queries such as queries 2 and 4 do not need an aggregation pipeline, whereas queries 1, 3, 5, 6, 7 and 8 need an aggregation pipeline, which is more complex and uses the stages “$lookup”, “$unwind”, “$match”, “$group”, “$project”, “$sort”, and “$limit”.

Execution
Simple queries
Query 1: given a user ID, find all artists that the user’s friends listen to
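One possible shape for Query 1 is sketched below as a Python list of aggregation stages. It is not the report's actual query. It assumes that each “Friends” document holds one pair in two fields named UserID and FriendID (the report does not list the fields of that collection) and that the pipeline runs against “Friends”. The function name and the "listened" array name are likewise hypothetical. The stage operators themselves are standard MongoDB ones.

```python
from typing import Any, Dict, List


def build_query1_pipeline(user_id: int) -> List[Dict[str, Any]]:
    """Build stages that list the distinct artists a user's friends listen to.

    Assumes "Friends" documents look like {"UserID": ..., "FriendID": ...};
    these field names are an assumption, not taken from the report.
    """
    return [
        # Keep only the friendship rows of the given user.
        {"$match": {"UserID": user_id}},
        # Join each friend to that friend's rows in "UserArtistInfo".
        {"$lookup": {
            "from": "UserArtistInfo",
            "localField": "FriendID",
            "foreignField": "UserID",
            "as": "listened",
        }},
        # One output document per (friend, listened row) pair.
        {"$unwind": "$listened"},
        # Collapse duplicates so each artist appears once.
        {"$group": {
            "_id": "$listened.ArtistID",
            "ArtistName": {"$first": "$listened.ArtistName"},
        }},
        # Rename _id back to ArtistID for readability.
        {"$project": {"_id": 0, "ArtistID": "$_id", "ArtistName": 1}},
        {"$sort": {"ArtistName": 1}},
    ]


pipeline = build_query1_pipeline(2)
```

With PyMongo and a connected database handle (here hypothetically called `db`), the list could be passed as `db["Friends"].aggregate(pipeline)`. The $group stage is needed because the flat “UserArtistInfo” rows repeat an artist once per tag and once per friend who listens to that artist.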
