When should you use Pig Latin and when should you use Hive?
Depending on where you work, you may need to simply use whatever standards your company has established.

For example, Hive is commonly used at Facebook for analytical purposes. Facebook promotes the Hive language and their employees frequently speak about Hive at Big Data and Hadoop conferences.
However, Yahoo! is a big advocate for Pig Latin. Yahoo! has one of the biggest Hadoop clusters in the world. Their data engineers use Pig for data processing on their Hadoop clusters.
Alternatively, you may have a choice of Pig or Hive at your organization, especially if no standards have yet been established, or perhaps multiple standards have been set up.

If you know SQL, then Hive will be very familiar to you. Since Hive uses SQL, you will feel at home with all the familiar select, where, group by, and order by clauses similar to SQL for relational databases. You do, however, lose some ability to optimize the query, by relying on the Hive optimizer. This seems to be the case for any implementation of SQL on any platform, Hadoop or traditional RDBMS, where hints are sometimes ironically needed to teach the automatic optimizer how to optimize properly.

However, compared to Hive, Pig needs some mental adjustment for SQL users to learn. Pig Latin has many of the usual data processing concepts that SQL has, such as filtering, selecting, grouping, and ordering, but the syntax is a little different from SQL (particularly the group by and flatten statements!). Pig requires more verbose coding, although it’s still a fraction of what straight Java MapReduce programs require. Pig also gives you more control and optimization over the flow of the data than Hive does.

Hadoop expert Alan Gates words on comparing the differences between Pig Latin and Hive and when to use each of them.


Published on

Leave a comment

Leave a Reply

Your email address will not be published. Required fields are marked *


Full Stack Cloudiologist Mind

Back to Home