+91-9916812177 | contact@beingdatum.com

Data Engineering and Big Data Development

This course teaches data engineering and big data from scratch to expert level.

Key Features

  • Instructor-Led Training: 26 Hrs
  • Exercises & Project Work: 36 Hrs
  • Certification and Job Assistance
  • Flexible Schedule
  • Lifetime free upgrade
  • 24 x 7 Lifetime Support & Access
  • Certified by top MNCs
  • Cloud computing enabled real industry projects
  • CV and Profile revamp
  • Job referrals and assistance


Below is the curriculum of the course:

1. Hadoop, MapReduce, and Pig

Why Is Data So Important?

Pre-Requisite – Data Scale

What Is Big Data?

Big Bank: Big Challenge

Common Problems

3 Vs Of Big Data

Defining Big Data

Sources Of Data Flood

Exploding Data Problem

Redefining The Challenges Of Big Data

Possible Solutions: Scaling Up Vs. Scaling Out

Challenges Of Scaling Out

Solution For Data Explosion-Hadoop

Hadoop: Introduction

Hadoop In Layman’s Terms

Hadoop Ecosystem

Evolutionary Features Of Hadoop

Hadoop Timeline

Why Learn Big Data Technologies?

Who Is Using Big Data?

HDFS: Introduction

Design Of HDFS

Why Hadoop Cluster?

HDFS Blocks

Components Of Hadoop 1.X

NameNode And Hadoop Cluster

Arrangement Of Racks

Arrangement Of Machines And Racks

Local FS And HDFS

NameNode

Checkpointing

Replica Placement

Benefits-Replica Placement And Rack Awareness

URI

URL And URN

HDFS Commands

Problems With HDFS In Hadoop 1.X

HDFS Federation (Included In Hadoop 2.X)

HDFS Federation

High Availability

Anatomy Of File Read From HDFS

Data Read Steps

Important Java Classes To Write Data To HDFS

Anatomy Of File Write To HDFS

Writing File To HDFS: Steps

Building Principles

Introduction To MapReduce

MR Demo

Pseudo Code

Mapper Class

Reducer Class

Driver Code

InputSplit

InputSplit And Data Blocks – Difference

Why Is The Block Size 128 MB?

RecordReader

InputFormat

Default InputFormat: TextInputFormat

OutputFormat

Using A Different OutputFormat

Important Points

Partitioner

Using Partitioner

Map Only Job

Flow Of Operations In MapReduce

 

2. Hive

Serialization In MapReduce

Custom Writable In MapReduce

Custom WritableComparable In MapReduce

Schedulers In YARN

FIFO Scheduler

Capacity Scheduler

Fair Scheduler

Differences Between Hadoop 1.X And Hadoop 2.X

Introduction To Apache Pig

Why Pig?

Apache Pig Architecture

Simple Data Types

Complex Data Types

Sample Execution

Pig Operators Demo

Parameter Substitution

Macros

Anatomy Of Reduce-Side-Join

Job Optimizations In Pig

UDFs In Pig

Execution Of XML And CSV Files In Pig

Introduction

Hive DDL

Demo: Databases.Ddl

Demo: Tables.Ddl

Hive Views

Demo: Views.Ddl

Architecture

Primary Data Types

Data Load

Demo: ImportExport.Dml

Demo: HiveQueries.Dml

Demo: Explain.Hql

Table Types

Demo: ExternalTable.Ddl

Complex Data Types

Demo: Working With Complex Datatypes

Hive Variables

Demo: Working With Hive Variables

Hive Variables And Execution Customisation

 

3. Advanced Hive and HBase

Working With Arrays

Sort By And Order By

Distribute By And Cluster By

Partitioning

Static And Dynamic Partitioning

Bucketing Vs Partitioning

Joins And Types

Bucket-Map Join

Sort-Merge-Bucket-Map Join

Left Semi Join

Demo: Join Optimisations

Input Formats In Hive

Sequence Files In Hive

RC File In Hive

File Formats In Hive

ORC Files In Hive

Inline Index In ORC Files

ORC File Configurations In Hive

SerDe In Hive

Demo: CSVSerDe

JSONSerDe

RegexSerDe

Analytic And Windowing In Hive

Demo: Analytics.Hql

HCatalog In Hive

Demo: Using_HCatalog

Accessing Hive With JDBC

Demo: HiveQueries.Java

HiveServer2 and Beeline

Demo: Beeline

UDF In Hive

Demo: ToUpper.Java And Working_with_UDF

Optimizations In Hive

Demo: Optimizations

Challenges With Traditional RDBMS

Features Of NoSQL Databases

NoSQL Database Types

CAP Theorem

What Are HBase Regions

HBase HMaster And ZooKeeper

HBase First Read

HBase Meta Table

Region Split

Apache HBase Architecture Benefits

HBase Vs. RDBMS

Shell Commands

 

4. Oozie and Sqoop

Introduction To Oozie

Oozie Architecture

Oozie Workflow Nodes

Oozie Server

Oozie Workflow

Sqoop Architecture

Sqoop Features

Sqoop Hands-On

 

Introduction To Functional Programming And Scala

Functional Vs OOP

Variable

Functions

Using If And While To Define Logic

Loops In Scala

Collections In Scala

 

5. Scala

Object-Oriented Programming

Classes And Objects

Traits In Scala

Constructors In Scala

Method Overloading

Implicit Parameter Usage

Inheritance – OOP

Override Modifier

Polymorphism

Invoking Superclass Methods

Final Members

Traits In Detail

Control Structures In Detail

Exception Handling

Coding Without Break And Continue

Coding The Functional Way

Case Classes In Scala

Implicit Conversions And Implicit Parameters In Depth

 

 

6. Spark

Introduction To Apache Spark

MapReduce Limitations

RDDs

Spark Context – SQLContext And HiveContext

Programming With RDDs

Creating RDDs From Text Files

Transformations And Actions

How Does Spark Execution Work

RDD APIs – Filter

FlatMap

Fold

Foreach

Glom

GroupBy

Map

ReduceByKey

Zip

Persist

Unpersist

Read/Write From Storage

RDD Examples

RDD APIs – Aggregate

Cartesian

Checkpoint

Coalesce

Repartition

Cogroup

CollectAsMap

CombineByKey

Count And CountApprox Functions

More RDD Examples

Schema – StructType

StructFields

DataType

DataFrame APIs And Examples

 

7. Spark SQL, Machine Learning, and Spark GraphX

Create Temporary Tables

SparkSQL

Parquet Vs Avro

Examples And Problem Solving On Real Data Using RDDs, And Converting The Same To DataFrames

Create A Spark Project

SBT / Maven

How Maven Repositories Work

Accumulators

Broadcast Variables

Query Execution Plan

Internal Workings Of Spark

8. Industry Projects and Deployment

9. Kafka

10. Advanced Topics

Cloud Computing for Big Data Services in AWS, Azure, and GCP

INDUSTRY PROJECTS

Real-world industry projects and deployment on AWS and Azure

 



TAKE THIS COURSE
  • 15,000.00 5,000.00
  • 2 months
  • Course Certificate
31 STUDENTS ENROLLED
© BeingDatum. All rights reserved.