Hadoop Analysis

HDP-105

Hadoop is an Apache framework that allows distributed processing of large datasets across clusters of servers using simple programming models. Hadoop was designed to scale to thousands of machines, each offering local computation and storage.

In this course, students learn what Pig, Hive, and Impala offer for data retrieval, storage, and analysis. Students will also learn how to run complex, interactive queries on different datasets. The course also covers the fundamental concepts of data ETL (extract, transform, load) using Pig, and explains how to import and export data using Sqoop.

Audience

Target Audience:
This course is mainly intended for Database Administrators, Database Developers, Business Intelligence professionals, QA professionals, Data Analysts, and other roles responsible for developing Hadoop solutions.

Prerequisites:
– Basic working knowledge of databases
– Familiarity with the basic Linux command line
– Familiarity with SQL concepts and syntax
– Prior knowledge of Hadoop is not required

Course Topics

Module 1 – Introduction to Big Data & Hadoop

  • Basic concepts

Module 2 – Data Analysis Using Pig

  • Introduction, Pig vs. SQL, using the Grunt shell, executing HDFS commands

Module 3 – Implementing ETL processes with Pig

  • Data types (scalar and complex), case sensitivity, comments, LOAD, STORE, DUMP, FOREACH, FILTER, GROUP, ORDER BY, JOIN, LIMIT, Pig functions, FLATTEN, nested FOREACH, COGROUP, UNION, SPLIT, using parameters, macros, and ILLUSTRATE

Module 4 – Pig Tuning and Optimization 

  • Advanced tips and techniques

Module 5 – Analyzing Your Data Using Hive and Impala

  • Introduction to the Hive and Impala architectures, data types, schema on read, databases, table management, internal vs. external tables, using partitions and different storage formats, HiveQL

Module 6 – Moving data into the cluster using Sqoop

  • Importing and exporting data using Sqoop

Detailed Course Outline

Hadoop Analysis detailed syllabus

© Copyright - Skilit - Site by Dweb