History Of Hadoop

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware, and it is widely used in Big Data implementations.

The project’s creator, Doug Cutting, named it after his son’s stuffed yellow toy elephant, which was called “Hadoop.”

Following are some important milestones in the history of Hadoop.

Origins in Nutch – History of Hadoop

In 2002, internet researchers were looking for a better search engine, preferably one that was open source.

That was when Doug Cutting and Mike Cafarella started a project named “Nutch.” Hadoop was originally designed as part of the Nutch infrastructure and was introduced in 2005.

Google’s distributed filesystem

In 2003, Google published a paper describing the architecture of its distributed filesystem, called GFS, which was being used in production at Google to meet the storage needs of the very large files generated by the web crawl and indexing process.

In 2004, Google published another paper, this one on MapReduce, its architecture for processing those large datasets.

For Doug Cutting and Mike Cafarella, that paper supplied the other half of the solution for their Nutch project. At the time, both techniques (GFS and MapReduce) existed at Google only as papers, with no released implementation.
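The idea in that MapReduce paper can be sketched in a few lines. The word-count example below is a toy, single-process illustration of the map, shuffle, and reduce phases written in plain Python; it is not Hadoop’s actual API, and the function names are chosen here only for illustration:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate the list of values for each key."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real Hadoop cluster, the map and reduce phases run in parallel across many machines, and the framework handles the shuffle, fault tolerance, and data locality.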

Drawing on his experience from Apache Lucene, Doug Cutting, together with Mike Cafarella, began implementing Google’s techniques (GFS and MapReduce) as open source within the Apache Nutch project.

Cluster in production – History of Hadoop

On January 28th, 2006, the first Hadoop cluster was put into production at Yahoo. Since 2006, Hadoop’s adoption has grown rapidly, making it a cornerstone technology that businesses use to power world-class products.

In 2007, Yahoo successfully tested Hadoop on a 1,000-node cluster and started using it in production.

In February 2008, Yahoo’s production search index was being generated by a 10,000-core Hadoop cluster.

In late 2008, Yahoo released Hadoop as an open-source project. Today, Hadoop’s framework and its ecosystem of technologies are managed and maintained by the non-profit Apache Software Foundation (ASF).

In May 2009, Yahoo used Hadoop to sort one terabyte of data in 62 seconds.

History of Hadoop version 1.0

In December 2011, the Apache Software Foundation (ASF) released Hadoop version 1.0.

In March 2013, YARN was deployed in production at Yahoo.

In August 2013, the ASF released Hadoop version 2.0.6.

In December 2017, Hadoop 3.0 was released.

Apache Hadoop 3.0

For details, see the official Hadoop website.

Hadoop Consulting Services

Nub8 Hadoop Consulting Services go beyond just “business”: we know what it takes to deliver value for your business, and we seek to build lasting partnerships with our customers by delivering value for money.

Nub8’s expert team of consultants provides the following services:

  • Administration, data integrations and architecture refinement
  • Deployment, installation and setup
  • Architecture and development
  • Cluster planning, analysis and design
  • Solutions starting from POC to production implementations
  • Proactive and reactive monitoring and support

