1.13
Solution:
a. Mainframe or minicomputer systems: In a mainframe or minicomputer system, the resources to be managed carefully are:
1. memory and CPU resources, since many users share the machine;
2. storage, since the system deals with large amounts of data;
3. network bandwidth, since multiple users access data over the network.
b. Workstations: In workstations, the resources to be managed carefully are:
1. memory and CPU resources, since there are many processes to be executed.
c. Handheld computers: In handheld computers, the resources to be managed carefully are:
1. power consumption, since they run on batteries and are mobile;
2. memory resources, since memory is limited.
1.15
Solution:
Symmetric multiprocessing: here all the processors are treated as peers, and I/O operations can be processed on any of them.
Disadvantages: 1. They are more complex than uniprocessor systems with respect to both hardware and software.
1.17
Solution:
The cluster software can access the data on the disk in two ways: asymmetric clustering and parallel clustering.
Asymmetric clustering: one host runs the database application while the other host simply monitors it. If the server fails, the monitoring host becomes the active server. This is appropriate for providing redundancy, but it does not utilize the potential processing power of both hosts.
Parallel clustering: the database application runs in parallel on both hosts. The difficulty in implementing parallel clusters is providing some form of distributed locking mechanism for files on the shared disk. A toy sketch of that locking idea follows.
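To make the distributed-locking difficulty concrete, here is a minimal, hypothetical Java sketch that approximates mutual exclusion between two hosts by locking a file on the shared disk; the path /shared/db/lockfile is invented for illustration. A real parallel cluster would use a full distributed lock manager rather than plain file locks, which are not reliable on many shared file systems.

import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class SharedDiskLockDemo {
    public static void main(String[] args) throws Exception {
        // /shared/db/lockfile is a hypothetical file on the shared disk
        try (RandomAccessFile raf = new RandomAccessFile("/shared/db/lockfile", "rw");
             FileChannel channel = raf.getChannel()) {
            // non-blocking attempt to acquire the lock; returns null if another host holds it
            FileLock lock = channel.tryLock();
            if (lock != null) {
                try {
                    System.out.println("lock held: this host may update the shared data");
                    // ... perform writes to the shared database files here ...
                } finally {
                    lock.release();
                }
            } else {
                System.out.println("another host holds the lock; retry later");
            }
        }
    }
}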
1.19
Solution:
An interrupt is a hardware-generated change in the flow of execution within the system. An interrupt handler deals with the cause of the interrupt first, and then control is returned to the interrupted context and instruction. A trap is a software-generated interrupt. An interrupt is used in a hardware context, i.e., it can be used to signal the completion of an I/O operation to obviate the need for device polling. A trap can be used to call operating system routines or to catch arithmetic errors.
When a file is written to HDFS, it is divided into fixed-size blocks. The client first contacts the NameNode, which returns the list of DataNodes where the blocks can be stored. The data blocks are then distributed across the Hadoop cluster. Figure \ref{fig.clusternode} shows the architecture of the Hadoop cluster node used for both computation and storage. The MapReduce engine (running inside a Java virtual machine) executes the user application. When the application reads or writes data, requests are passed through the Hadoop \textit{org.apache.hadoop.fs.FileSystem} class, which provides a standard interface for distributed file systems, including the default HDFS. An HDFS client is then responsible for retrieving data from the distributed file system by contacting a DataNode that holds the desired block. In the common case, the DataNode is running on the same node, so no external network traffic is necessary. The DataNode, also running inside a Java virtual machine, accesses the data stored on local disk using normal file I/O.
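As a concrete illustration of this client-side path, the following minimal Java sketch writes and reads a file through the org.apache.hadoop.fs.FileSystem interface. The NameNode address hdfs://namenode:9000 and the path /user/demo/example.txt are placeholder values, not part of the original text.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS points at the NameNode; this address is hypothetical
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/user/demo/example.txt"); // hypothetical path
        // the write is split into blocks and shipped to DataNodes behind this stream
        try (FSDataOutputStream out = fs.create(path)) {
            out.writeUTF("hello HDFS");
        }
        // the read contacts a DataNode holding the block, ideally a local one
        try (FSDataInputStream in = fs.open(path)) {
            System.out.println(in.readUTF());
        }
        fs.close();
    }
}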
Improved performance: A distributed DBMS fragments the database to keep data closer to where it is needed most. This helps to avoid unnecessary data transfer.
MapReduce is the parallel programming model used in Hadoop. A Hadoop cluster running the algorithm has two kinds of nodes: master nodes and slave nodes. The master node runs the NameNode, DataNode, JobTracker, and TaskTracker processes; a slave node runs the DataNode and TaskTracker processes. The NameNode manages the partitioning of the input dataset into blocks and decides on which nodes the blocks are stored. Lastly, Hadoop has two core components: the HDFS layer and the MapReduce layer. The MapReduce layer reads from and writes to HDFS storage and processes data in parallel, as the example below shows.
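A sketch based on the well-known Hadoop WordCount example illustrates the model: map tasks run in parallel on the blocks stored in HDFS and emit (word, 1) pairs, and reduce tasks combine the per-word counts.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // map: runs on each input split, emitting (word, 1) for every token
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // reduce: sums all counts that were emitted for the same word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) sum += val.get();
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}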
Shared memory is the model we suggest for developing a parallel system for large-scale analysis of social data stored in multiple locations. Shared memory is memory that is accessed by multiple programs simultaneously, either to provide communication among them or to avoid redundant copies. We suggest shared memory as the model because the programs may run on a single processor or on multiple separate processors while still using the shared-memory model.
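As a small illustration of the shared-memory idea, the hypothetical Java sketch below (the data and names are invented) has several threads scan slices of a data set while updating a single counter that lives in memory shared by all of them; the same program runs on one core or many.

import java.util.concurrent.atomic.AtomicLong;

public class SharedMemoryDemo {
    // one counter in shared memory, visible to every worker thread
    private static final AtomicLong mentions = new AtomicLong();

    public static void main(String[] args) throws InterruptedException {
        String[] posts = {"hadoop rocks", "spark and hadoop", "just coffee"};
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                // each thread scans its slice of the data and updates the shared counter
                for (int j = id; j < posts.length; j += 4) {
                    if (posts[j].contains("hadoop")) mentions.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println("hadoop mentions: " + mentions.get());
    }
}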
Clustered file systems: Clustered file systems provide features like location-independent addressing and redundancy which improve reliability or reduce the complexity of the other parts of the cluster.
Hadoop is an open-source framework that can be very resourceful for data processing in complex data systems, and it has been widely used in the recent past for query processing in complex databases containing millions of records. The major advantage of Hadoop is that it splits the records into blocks distributed across the cluster, runs the query on each block in parallel, and combines the results effectively.
Distributed computing refers to manipulating, configuring, and accessing applications over the web. It offers online data storage, infrastructure, and applications.
Spark is an open-source cluster computing framework. It was originally developed at UC Berkeley's AMPLab and was later donated to the Apache Software Foundation. Spark is built on the concept of the resilient distributed dataset (RDD), a read-only multiset of data items distributed over the cluster. Spark Core, Spark SQL, Spark MLlib, and Spark Streaming are the modules in Spark. Spark provides an API with in-memory data processing, which makes it a faster engine to run on. It enables information specialists to
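To illustrate the RDD concept, here is a minimal sketch using Spark's Java API; the local[*] master and the sample data are placeholders for illustration. Note that transformations such as map do not modify an RDD: they produce a new one, which reflects the read-only nature described above.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("rdd-demo").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // create an RDD partitioned across the cluster (or local cores here)
            JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
            // map yields a new RDD; the original is never mutated
            JavaRDD<Integer> squares = nums.map(x -> x * x);
            long count = squares.filter(x -> x > 4).count();
            System.out.println("values > 4: " + count);
        }
    }
}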
A file system that manages storage across a network of machines is called a distributed file system. Hadoop comes with a distributed file system called HDFS (Hadoop Distributed File System).
In this article, the authors propose a new technique, "resource bricolage," to solve the low-performance problem caused by unbalanced workloads in parallel database systems.
Abstract - The Hadoop Distributed File System, a Java-based file system, provides reliable and scalable storage for data. It is the key component for understanding how a Hadoop cluster can be scaled over hundreds or thousands of nodes. The large amounts of data in a Hadoop cluster are broken down into smaller blocks and distributed across small, inexpensive servers using HDFS. MapReduce functions are then executed on these smaller blocks of data, providing the scalability needed for big data processing. In this paper I discuss Hadoop in detail: the architecture of HDFS, how it functions, and its advantages.
Applications that run on HDFS require streaming access to their data sets; they are designed for batch processing rather than interactive use by users.
Abstract—The Hadoop framework is a solution to the big data problem. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. Big data is not only about storing data; it is also about executing and analyzing it.
A cluster is defined as a group of independent servers (nodes) in a computer system that are accessed and presented as a single system, enabling high availability and, in some cases, load balancing and parallel processing. Since it is hard to predict the number of requests that will be sent to a networked server, clustering also plays a vital role in load balancing, distributing processing and communications activity evenly across the network so that no single server is overloaded. Server clusters can be used to make sure that users have constant access to important server-based resources.
A server cluster is a set of independent servers running Windows Server 2003, Enterprise Edition, or Windows Server 2003, Datacenter Edition, working together as a single system to provide highly available services for clients. When a failure occurs on one computer in a cluster, resources are redirected and the workload is redistributed to another computer in the cluster. We can use server clusters to ensure that users have consistent access to crucial server-based resources.