Literature Review on the MapReduce Programming Model

2021-07-01

7 pages, 1726 words

Categories: Computer Science and IT

University/College: Boston College

Type of paper: Literature review

MongoDB and Hadoop are two data analysis platforms that support the capture, storage, and manipulation of an organization's semi-structured data. Organizations today struggle to handle the large volumes of data generated by internet platforms such as social media marketing and business platforms. MongoDB is an open-source, document-oriented NoSQL database supported by 10gen. As a non-relational system, it is characterized by sorting, secondary indexing, and range queries. Hadoop is an open-source framework for storing and analyzing big data on clusters; it attains data locality by combining data with the servers that process it. The performance of both frameworks is a central issue that an organization should weigh against the other available alternatives for manipulating semi-structured data. This paper evaluates the performance of the two frameworks in terms of storage, efficiency, reliability, and the read and write aspects of the programming model.

The performance of semi-structured data frameworks such as MongoDB and Hadoop under the MapReduce programming model can be evaluated on these elements. The MapReduce programming model lets the open-source software process complex data: a computation takes a set of input key/value pairs and produces a set of output key/value pairs. Hadoop comprises several parts, including the Hadoop Distributed File System (HDFS) and a data processing component built mainly on the MapReduce programming model. Analyzing significant content from social media platforms such as Facebook and Twitter has been challenging for many marketing businesses, since some of the semi-structured data analysis frameworks they use, such as AsterixDB, underperform and are time-consuming. MongoDB and Hadoop are the available and suitable alternatives for these organizations, but their suitability should be judged on the basis of performance evaluation.

The choice of the best platform for scientific data analysis should rest on performance, which this paper reports from several scholarly sources. An organization's ability to achieve its goals depends on the performance of the data analysis platform it uses, and the best platform is identified by evaluating the performance of the available alternatives.

Main ideas:

Efficiency:

As discussed with reference to Jeffrey Dean and Sanjay Ghemawat [3], efficiency is a primary consideration in evaluating the performance of NoSQL systems. MongoDB and Hadoop are widely acknowledged NoSQL systems whose potential is judged by their efficiency. Read and write intensiveness are the primary elements used to assess the consistency of these systems, and the most efficient system is the one that best supports both reading and writing. According to Dean and Ghemawat [3], MongoDB and Hadoop differ greatly in their ability to process and store limited and unlimited data flows, and these differences are evident in their performance.

MongoDB is mainly characterized by being document-oriented. It is also schema-less, which allows it to be adjusted dynamically as more documents are added. These characteristics contribute to its being less efficient than Hadoop. What makes Hadoop more efficient is that it supports both read- and write-intensive workloads, while MongoDB is primarily a storage platform. MongoDB is less effective at processing data because it is less consistent than Hadoop, whose efficiency is mainly attributed to its being hyper-tabulated.

The central point of this article is that performance evaluation is a demanding process that should center on the efficiency of scientific data analysis systems, and that efficiency should be assessed in terms of consistency and partition tolerance. Data analysis systems with these features can offer both read- and write-intensive services. In the analysis attributed to Dean and Ghemawat [3], Hadoop is more efficient than MongoDB because it supports both read- and write-intensive workloads by combining data and servers. Hyper-tabulation marks the more capable platform, and incorporating the MapReduce programming model boosts its efficiency further.
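The schema-less, document-oriented model described in this section can be illustrated with a small sketch. It uses plain Python dictionaries as a stand-in for documents and a list as a stand-in for a collection. It is not MongoDB's API: the names `collection`, `insert`, and `find` are hypothetical helpers introduced only for illustration.

```python
# In-memory stand-in for a document collection (an assumption; not MongoDB itself).
collection = []

def insert(document):
    """Store a document as-is; no schema is declared or checked."""
    collection.append(dict(document))

def find(field, value):
    """Return every document that has `field` equal to `value`."""
    return [doc for doc in collection if doc.get(field) == value]

# Documents in the same collection may carry different fields.
insert({"user": "ana", "post": "Launch day!", "likes": 12})
insert({"user": "ben", "post": "New pricing", "tags": ["sales"]})
insert({"user": "ana", "comment": "Thanks for the feedback"})

ana_docs = find("user", "ana")
```

The third document has neither a `post` nor a `likes` field, yet it is stored and queried alongside the others, which is what "dynamically adjustable" amounts to in practice.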

Cost-effectiveness: This is the main factor considered when evaluating the efficiency of a scientific data analysis system. The choice between MongoDB and Hadoop depends on the cost incurred in bringing each system's read and write performance to the required level. Read and write intensiveness are the fundamental aspects that characterize NoSQL systems and that mark the line between MongoDB and Hadoop. According to Dede et al. [5], MongoDB is read and write intensive and can perform as well as Hadoop, but reaching a significant level of performance is expensive. This difference in cost-effectiveness makes Hadoop the advisable choice for many organizations that provide services such as business intelligence. Because Hadoop can offer both read and write services at an affordable cost, it is preferred over MongoDB.

Productivity rate: As Dede et al. [5] discuss, efficiency depends fundamentally on the productivity rate of the data analysis system used in a given situation. The ability of Hadoop and MongoDB to read and write is the differentiating factor that reflects productivity. HDFS, a core component of Hadoop, is more productive than MongoDB. HDFS can write 74.2 records per minute, compared with 3.2 million records per minute for MongoDB, when attached to two data nodes and two servers respectively. These experimental differences indicate that Hadoop is more productive and cost-effective than MongoDB. The measurement also shows that productivity rate is an indispensable factor in evaluating the performance of the two systems. Read intensiveness further widens the gap between the two platforms: based on Dean and Ghemawat (2008), the read performance of HDFS and MongoDB differs by a ratio of 24:1, which is much larger than the difference in write performance. In conclusion, productivity rate is a central aspect of efficiency that should be considered when interpreting the performance evaluation results for MongoDB and Hadoop.

Storage pattern:

According to Soyeop Yoo, Taesoo Park, Jein Song, and Okran Jeong [1], storage is best assessed by considering the time and location characteristics of a scientific data analysis platform. To determine the performance of distributed data management systems such as MongoDB and Hadoop, speed is recorded as the time taken to transmit input and store it. The time each system takes is moderated by the complexity of the data arriving from sources such as users of an organization's social media marketing platforms. Processing is an unavoidable phase in the sequence of steps that distributed data management systems follow, and it largely determines how long they take to complete the storage process. The indexing method is an indispensable moderator of this time, and the systems depend on it to minimize the time consumed.
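The effect of indexing on lookup time can be sketched as follows. This is a minimal illustration under assumed data: the records, the `user` field, and the helper names `scan` and `build_index` are hypothetical, and a production system would typically rely on a persistent index structure rather than an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical sample records standing in for stored social media data.
records = [
    {"id": 1, "user": "ana", "text": "first post"},
    {"id": 2, "user": "ben", "text": "hello"},
    {"id": 3, "user": "ana", "text": "second post"},
]

def scan(records, user):
    """Without an index: examine every record, so time grows with data size."""
    return [r["id"] for r in records if r["user"] == user]

def build_index(records, field):
    """Build the index once: map each field value to the matching record ids."""
    index = defaultdict(list)
    for r in records:
        index[r[field]].append(r["id"])
    return index

user_index = build_index(records, "user")

# With the index, a lookup is a single dictionary access.
indexed_result = user_index["ana"]
scanned_result = scan(records, "ana")
```

Both approaches return the same record ids. The difference is that the scan repeats its full pass for every query, while the index pays that cost once when it is built.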

Reliability:

Accuracy is a central pillar in determining the performance of scientific data analysis systems. It rests mainly on the system's ability to incorporate appropriate programming models that produce accurate output. Laurent Bonnet [2] noted that the MapReduce programming model is a primary moderator of a system's ability to produce precise output after thorough processing and manipulation of input. The MapReduce model parallelizes computation across many machines by means of the map and reduce functions. It takes a set of key/value pairs as input and produces a set of key/value pairs as output. The model is highly compatible with scientific data analysis systems such as Hadoop, which use it to process data accurately in the shortest time possible. It also enhances read and write performance, since it facilitates data manipulation and speeds up data processing.
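The map and reduce functions can be made concrete with the familiar word-count example. The sketch below runs in a single process and only imitates the structure of the model (map, group by key, reduce). It is not Hadoop's API, and the names `map_fn`, `reduce_fn`, and `map_reduce` are assumptions chosen for illustration.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map: emit an intermediate (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: merge all values that share a key into one output pair."""
    return word, sum(counts)

def map_reduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)
    # Map phase: apply map_fn to every input (key, value) pair.
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            # Shuffle: group intermediate values by key.
            groups[out_key].append(out_value)
    # Reduce phase: one call per distinct intermediate key.
    return dict(reduce_fn(k, v) for k, v in groups.items())

# Input pairs: (line number, line text).
lines = [(0, "big data big clusters"), (1, "data locality")]
word_counts = map_reduce(lines, map_fn, reduce_fn)
```

Because each call to `map_fn` depends only on its own input pair, and each call to `reduce_fn` only on one key's values, a framework is free to spread those calls across machines.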

Compatibility with programming models, as proposed in the LEMO-MR concept, is of great significance for the reliability of scientific data analysis systems. Zacharia Fadika and Madhusudhan Govindaraju [4] noted that MapReduce is the central programming model behind LEMO-MR, a high-speed and reliable approach to data manipulation. Through MapReduce, this approach executes data processing in parallel, which makes it suitable for CPU-intensive applications. These aspects reflect the performance of scientific data analysis systems in terms of their ability to incorporate programming models such as MapReduce for processing and manipulating data. Accuracy benefits as well, since the platforms reduce the time consumed in data processing and analysis. Hadoop is the most compatible platform, since it has the MapReduce programming model as its founding processing component. Performance evaluation of a scientific data analysis system should therefore consider its compatibility with programming models, since they create the foundation for improved accuracy. Accuracy cannot be ignored in such an evaluation, since it is among the primary factors that guide the selection of the best platform for research and operational fields such as marketing and other business activities.
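The idea of executing map tasks in parallel and then merging their results can be sketched with a worker pool. This is not LEMO-MR or Hadoop; it is a generic illustration, and `count_chunk` and `parallel_word_count` are hypothetical names. One caveat applies: in standard CPython builds, threads generally do not execute Python bytecode in parallel, so a CPU-intensive job would use separate processes or machines instead. A thread pool is used here only to keep the sketch short and self-contained.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_chunk(lines):
    """Map task: count words in one chunk independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def parallel_word_count(lines, workers=2):
    # Split the input into one chunk per worker (every workers-th line).
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_chunk, chunks))
    # Reduce step: merge the partial results into one total.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

posts = ["map reduce map", "reduce data", "map data data"]
totals = parallel_word_count(posts)
```

The chunks share no state, so the order in which workers finish does not affect the merged result.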

Conclusion

Fadika and Govindaraju [4] and Bonnet [2] make a similar argument about the importance of the MapReduce programming model in enhancing reliability, a central moderator of the performance of scientific data analysis systems. MapReduce is an essential programming model that enables the LEMO-MR concept and thereby supports fast and reliable data analysis. Bonnet supports this view by arguing that the MapReduce programming model is a primary facilitator of a system's potential to enhance accuracy. Yoo, Soyeop, Tae...
