What is Apache Pig-Hadoop?

Apache Pig, was established by Yahoo Research in the year 2006. This language practices a multi-query method that decreases the time in data scanning. It typically runs on a client side of clusters of Hadoop. Pig usages a language called Pig Latin to make scripts that handle data. The Pig Scripts are give in to the Pig Engine that convert the Pig Latin scripts into MapReduce jobs.
Pig also contracts users a ton of operators such as joins, sorts, filters, and other operations that are similar to SQL. Along with the comfort of software design that comes with Pig, Pig also mechanically enhances the implementation of the Pig scripts for the end user so they don’t have to worry about the uninteresting stuff.

Key concepts

A runtime engine

The runtime engine is a compiler that crops sequences of MapReduce programs. It uses HDFS to stock and regain data. It is also helps to interrelate with the Hadoop system (HDFS and MapReduce). The runtime engine plans, analyses, validates, and compiles the script operations into a sequence of MapReduce jobs.

Parser

At original, all the Pig Scripts are controlled by the Parser. Basically, Parser orders the syntax of the script, perform type checking, and other various checks. Subsequently, Parser’s output will be a DAG (directed acyclic graph). That signifies the Pig Latin statements as well as logical operators.
Fundamentally, the logical operators of the script are signified as the nodes and the data flows are characterized as edges, in the DAG (the logical plan).

Optimizer

Supplementary, DAG is accepted to the logical optimizer. That conveys out the logical optimizations, like projection and push down.

Use cases of Pig Latin

For execution tasks connecting ad-hoc dispensation and quick prototyping, data scientists usually use Apache Pig. Common use cases of Pig Latin, are:
• In order to procedure huge data sources.
• Also, to do data processing for search platforms.
• Furthermore, to procedure time sensitive data loads.

Pig vs. SQL

1. Definition

Pig is a scripting linguistic used to interrelate with HDFS. SQL is a query language that helps to interrelate with databases lying in the database engine.

2. Query Style

Pig bids a step-by-step implementation style. SQL deals the single block execution style.

3. Relational

The data model in Apache Pig is nested relational. The data model used in SQL is flat relational.

4. Query optimization

Apache Pig delivers limited chance for Query optimization. There is more accidental for query optimization in SQL.

Official web site of Apache Pig

About Apache Hadoop