Hadoop is an Apache framework that allows distributed processing of large datasets across clusters of servers using simple programming models. Hadoop was designed to scale to thousands of machines, each offering local computation and storage.
In this training students learn what Pig, Hive, and Impala have to offer for data retrieval, storage and analysis. Students will also understand how to implement real-time, complex queries on different datasets. This course also covers the fundamentals concepts of data ETL (extract, transform, load) using Pig, and explains how to import/export data using Sqoop.