History Of Hadoop

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware, and it is widely used in Big Data implementations.

The project’s creator, Doug Cutting, named it after his son’s stuffed yellow toy elephant, which was called “Hadoop.”

Following are some important milestones in the history of Hadoop.

Origins in Nutch – History of Hadoop

In 2002, internet researchers were looking for a better search engine, preferably one that was open source.

That was when Doug Cutting and Mike Cafarella started a project named “Nutch.” Hadoop was originally designed as part of the Nutch infrastructure and was introduced in 2005.

Google’s distributed filesystem

In 2003, Google published a paper describing the architecture of its distributed filesystem, called GFS, which was being used in production at Google to meet the storage needs of the very large files generated by the web crawl and indexing process.

In 2004, Google published another paper, this one on MapReduce, its architecture for processing those large datasets.

For Doug Cutting and Mike Cafarella, that paper supplied the other half of the solution for their Nutch project. At the time, both techniques (GFS and MapReduce) existed at Google only as papers, with no released implementation.
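The idea in that MapReduce paper can be sketched in a few lines. The word-count example below is a toy, single-process illustration of the map, shuffle, and reduce phases written in plain Python; it is not Hadoop’s actual API, and the function names are chosen here only for illustration:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate the list of values for each key."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["hadoop stores data", "hadoop processes data"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real Hadoop cluster, the map and reduce phases run in parallel across many machines, and the framework handles the shuffle, fault tolerance, and data locality.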

Drawing on his experience from Apache Lucene, Doug Cutting, together with Mike Cafarella, began implementing Google’s techniques (GFS and MapReduce) as open source within the Apache Nutch project.

Cluster in production – History of Hadoop

On January 28th, 2006, the first Hadoop cluster was put into production at Yahoo. Since 2006, Hadoop’s adoption has grown rapidly, making it a cornerstone technology that businesses use to power world-class products.

In 2007, Yahoo successfully tested Hadoop on a 1,000-node cluster and started using it in production.

In February 2008, Yahoo’s production search index was being generated by a 10,000-core Hadoop cluster.

In late 2008, Yahoo released Hadoop as an open-source project. Today, Hadoop’s framework and its ecosystem of technologies are managed and maintained by the non-profit Apache Software Foundation (ASF).

In May 2009, Yahoo used Hadoop to sort one terabyte of data in 62 seconds.

History of Hadoop version 1.0

In December 2011, the Apache Software Foundation (ASF) released Hadoop version 1.0.

In March 2013, YARN was deployed in production at Yahoo.

In August 2013, the ASF released Hadoop version 2.0.6.

In December 2017, Hadoop 3.0 was released.

Apache Hadoop 3.0

For details, see the official Hadoop website.

Hadoop Consulting Services

Nub8 Hadoop Consulting Services go beyond just “business”: we know what it takes to deliver value for your business, and we seek to build lasting partnerships with our customers by delivering value for money.

Nub8’s expert team of consultants provides the following services:

  • Administration, data integrations and architecture refinement
  • Deployment, installation and setup
  • Architecture and development
  • Cluster planning, analysis and design
  • Solutions starting from POC to production implementations
  • Proactive and reactive monitoring and support

