Course Details

Spark with Scala

Hadoop is an open-source, Java-based programming framework that supports the storage and processing of extremely large data sets in a distributed computing environment. It is an Apache project sponsored by the Apache Software Foundation.

Hadoop Course Content: Concepts

Understanding Big Data and Hadoop

  1. Big Data: Limitations and Solutions of Existing Data Analytics Architecture
  2. Hadoop
  3. Hadoop Features
  4. Hadoop Ecosystem
  5. Hadoop 2.x Core Components
  6. Hadoop Storage: HDFS
  7. Hadoop Processing
  8. MapReduce Framework
  9. Different Hadoop Distributions

Hadoop requirements

  1. Linux commands
  • 30 Essential Basic Linux Commands You Must Know
  2. VMware
  • Basics
  • Installations
  • Backups
  3. SQL basics
  • Introduction to SQL
  • MySQL Essentials
  • Database Fundamentals
  4. Hands-on exercises and assignments

Hadoop Architecture and HDFS

  1. Hadoop 2.x Cluster Architecture
  2. Federation and High Availability
  3. A Typical Production Hadoop Cluster
  4. Hadoop Cluster Modes
  5. Common Hadoop Shell Commands
  6. Hadoop 2.x Configuration Files
  7. Single-Node and Multi-Node Cluster Setup
  8. Hadoop Administration
Hands-on exercises and assignments

Hadoop MapReduce Framework

  1. MapReduce Use Cases
  2. Traditional Way vs. MapReduce Way
  3. Why MapReduce
  4. Hadoop 2.x MapReduce Architecture
  5. Hadoop 2.x MapReduce Components
  6. YARN MR Application Execution Flow
  7. YARN Workflow
  8. Anatomy of a MapReduce Program
  9. Demo on MapReduce
  10. Input Splits
  11. Relation between Input Splits and HDFS Blocks
  12. MapReduce Combiner & Partitioner
Hands-on exercises and assignments
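
As a rough illustration of the mapper, combiner, partitioner and reducer roles listed in this module, here is a minimal word-count sketch. It is a plain Python simulation, not the Hadoop API: the function names (map_phase, combine, partition, reduce_phase, run_job) and the sample input splits are assumptions made up for this example.

```python
from collections import defaultdict
import zlib


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate counts on the mapper side to reduce shuffle traffic."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner: pick a reducer for a key using a deterministic hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_phase(key, values):
    """Reducer: sum all partial counts that arrived for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    # One bucket per simulated reducer, mapping key -> list of partial counts.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # each input split is processed by one simulated mapper
        pairs = []
        for line in split:
            pairs.extend(map_phase(line))
        for word, count in combine(pairs):
            buckets[partition(word, num_reducers)][word].append(count)
    result = {}
    for bucket in buckets:
        for key in sorted(bucket):  # reducers receive their keys in sorted order
            word, total = reduce_phase(key, bucket[key])
            result[word] = total
    return result


# Hypothetical input: two splits, each holding one line of text.
splits = [["big data big cluster"], ["data node data"]]
counts = run_job(splits)
print(counts)
```

In a real Hadoop job the framework performs the split handling, shuffle and sort itself; the sketch only shows where each user-supplied piece fits in that flow.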


Pig

  1. About Pig
  2. MapReduce vs. Pig
  3. Pig Use Cases
  4. Programming Structure in Pig
  5. Pig Running Modes
  6. Pig Components
  7. Pig Execution
  8. Pig Latin Program
  9. Data Models in Pig
  10. Pig Data Types
  11. Shell and Utility Commands
  12. Pig Latin Relational Operators
  13. File Loaders
  14. Group Operator
  15. COGROUP Operator
  16. Joins and COGROUP
  17. Union
  18. Diagnostic Operators
  19. Specialized Joins in Pig
Hands-on exercises and assignments


Hive

  1. Hive Background
  2. Hive Use Case
  3. About Hive
  4. Hive vs. Pig
  5. Hive Architecture and Components
  6. Metastore in Hive
  7. Limitations of Hive
  8. Comparison with Traditional Databases
  9. Hive Data Types and Data Models
  10. Partitions and Buckets
  11. Hive Tables (Managed Tables and External Tables)
  12. Importing Data
  13. Querying Data
  14. Managing Outputs
  15. Hive Script
  16. Hive UDF
  17. Retail Use Case in Hive
Hands-on exercises and assignments

Advanced Hive and HBase

  1. HiveQL: Joining Tables
  2. Dynamic Partitioning
  3. Custom Map/Reduce Scripts
  4. Hive Indexes and Views
  5. Hive Query Optimizers
  6. User-Defined Functions
  7. HBase: Introduction to NoSQL Databases and HBase
  8. HBase vs. RDBMS
  9. HBase Components
  10. HBase Architecture
  11. Run Modes & Configuration
  12. HBase Cluster Deployment
Hands-on exercises and assignments

Advanced HBase

  1. HBase Data Model
  2. HBase Shell
  3. HBase Client API
  4. Data Loading Techniques
  5. ZooKeeper
  6. Demos on Bulk Loading
  7. Getting and Inserting Data
  8. Filters in HBase
Hands-on exercises and assignments

Sqoop

  1. Import Data
  2. Export Data
  3. Sqoop Syntax
  4. Database Connections
Hands-on exercises and assignments


Impala

  1. Introduction to Impala
  2. Impala Configuration
  3. Comparison between Hive and Impala
  4. Impala Commands
Hands-on exercises and assignments

Processing Distributed Data with Apache Spark

  1. What is Apache Spark?
  2. Spark Ecosystem
  3. Spark Components
  4. History of Spark
  5. Spark Versions/Releases
  6. What is Scala?
  7. Why Scala?
  8. SparkContext
  9. Spark SQL
Hands-on exercises and assignments

Flume & Solr

  1. Configuration and Setup
  2. Flume Sink with Example
  3. Channel
  4. Flume Source with Example
  5. Complex Flume Architecture
  6. Storing Streaming Data in Solr
  7. Customization of Solr
Hands-on exercises and assignments


Hue

  1. Introduction to Hue
  2. Advantages of Hue
  3. Hue Web Interface
  4. Ecosystems in Hue
Hands-on exercises and assignments


Oozie

  1. Oozie
  2. Oozie Components
  3. Oozie Workflow
  4. Scheduling with Oozie
  5. Demo on Oozie Workflow
  6. Oozie Coordinator
  7. Oozie Commands
  8. Oozie Web Console
  9. Oozie for MapReduce, Pig, Hive, and Sqoop
  10. Combined Flow of MapReduce, Pig, and Hive in Oozie
Hands-on exercises and assignments


Tableau

  1. Tableau Fundamentals
  2. Tableau Analytics
  3. Visual Analytics
Hands-on exercises and assignments



Hadoop Project

Hadoop-Tableau live integration

Topics: This project gives you the opportunity to work on retail data analytics.
  • Hadoop integration with Tableau



Multi-node cluster setup

Topics: This project gives you the opportunity to work on a real-world Hadoop multi-node cluster setup in a distributed environment.
  • Running a multi-node Hadoop setup on a 4-node cluster
  • Deploying a MapReduce job on the Hadoop cluster
  • A complete demonstration of working with the Hadoop cluster master and slave nodes, installing Java as a prerequisite for running Hadoop, installing Hadoop, and mapping the nodes in the Hadoop cluster

Hadoop Project 3

Social media analytics

Topics: This project gives you the opportunity to work on social media analytics.
  • Streaming Twitter data
  • Storing the data in Hadoop
  • Processing social media data
  • Sentiment analysis on Twitter data
  • Storing the final result in a table
  • Connecting a BI tool


  • Instructor-led Sessions.
  • Real Time Case Studies
  • Assignments
  • 24 x 7 Expert Support
