Course: PySpark for Big Data

In the PySpark for Big Data course, participants learn to use Apache Spark from Python.

Spark Architecture

The course PySpark for Big Data discusses the architecture of Spark, the Spark Cluster Manager and the difference between Batch and Stream Processing.

Hadoop

The course PySpark for Big Data then covers the Hadoop Distributed File System, parallel operations and working with Resilient Distributed Datasets (RDDs). The configuration of PySpark applications via SparkConf and SparkContext is also explained.

MapReduce and SQL

Extensive consideration is given to the possible operations on RDDs, including map and reduce. The use of SQL in Spark, DataFrames and the GraphX library are also discussed, and iterative algorithms are treated.
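As a rough illustration of the map and reduce operations mentioned above, the sketch below mimics them in plain Python, with no Spark installation assumed. The helper `partition` and the sample data are hypothetical and not part of the course material; on a real RDD the corresponding PySpark calls are `rdd.map(...)` and `rdd.reduce(...)`.

```python
from functools import reduce

# Hypothetical sample data; in Spark this would typically be an RDD
# created with sc.parallelize(...) or read from a file.
words = ["spark", "hadoop", "rdd", "python", "sql"]


def partition(items, num_partitions):
    """Split a list into chunks, loosely mimicking RDD partitions."""
    return [items[i::num_partitions] for i in range(num_partitions)]


partitions = partition(words, 2)

# map: apply a function to every element, partition by partition.
mapped = [[len(word) for word in part] for part in partitions]

# reduce: combine the values inside each partition first,
# then combine the partial results into one value.
partial_sums = [reduce(lambda a, b: a + b, part) for part in mapped]
total_length = reduce(lambda a, b: a + b, partial_sums)

print(total_length)
```

Reducing per partition before merging the partial results only works because addition is associative and commutative, which is the same property a function passed to `reduce` on an RDD needs to have.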

MLlib library

Finally, the course PySpark for Big Data pays attention to machine learning with the MLlib library.

Audience PySpark for Big Data

The course PySpark for Big Data is intended for developers and aspiring data analysts who want to learn how to use Apache Spark from Python.

Prerequisites course PySpark for Big Data

Some programming experience helps in following this course. Prior knowledge of Python or of big data handling with Apache Spark is not required.

Format course PySpark for Big Data

The theory is presented through presentations, and illustrative demos clarify the concepts discussed. Theory and practice alternate, with ample opportunity to practice. Course hours are from 9:30 am to 4:30 pm.

Certification course PySpark for Big Data

Participants receive an official PySpark for Big Data certificate after successful completion of the course.

Modules

Module 1 : Python Primer

Python Syntax
Python Data Types
Lists, Tuples, Dictionaries
Python Control Flow
Functions and Parameters
Modules and Packages
Comprehensions
Iterators and Generators
Python Classes
Anaconda Environment
Jupyter Notebooks

Module 2 : Spark Intro

What is Apache Spark?
Spark and Python
PySpark
Py4j Library
Data Driven Documents
RDDs
Real Time Processing
Apache Hadoop MapReduce
Cluster Manager
Batch versus Stream Processing
PySpark Shell

Module 3 : HDFS

Hadoop Environment
Environment Setup
Hadoop Stack
Hadoop YARN
Hadoop Distributed File System
HDFS Architecture
Parallel Operations
Working with Partitions
RDD Partitions
HDFS Data Locality
DAG (Directed Acyclic Graph)

Module 4 : SparkConf

SparkConf Object
Setting Configuration Properties
Uploading Files
SparkContext.addFile
Logging Configuration
Storage Levels
Serialize RDD
Replicate RDD partitions
DISK_ONLY
MEMORY_AND_DISK
MEMORY_ONLY

Module 5 : SparkContext

Main Entry Point
Executor
Worker Nodes
LocalFS
SparkContext Parameters
Master
RDD serializer
batchSize
Gateway
JavaSparkContext instance
Profiler

Module 6 : RDDs

Resilient Distributed Datasets
Key-Value pair RDDs
Parallel Processing
Immutability and Fault Tolerance
Transformation Operations
Filter, groupBy and Map
Action Operations
Caching and persistence
PySpark RDD Class
count, collect, foreach, filter
map, reduce, join, cache
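
The split between transformation and action operations listed above can be pictured with plain Python generators: a transformation only describes work, and nothing runs until an action asks for a result. This is an analogy under that assumption, not Spark's implementation. The class `LazyDataset` and its methods are hypothetical names chosen to echo `map`, `filter`, `collect` and `count` on a PySpark RDD.

```python
class LazyDataset:
    """Toy stand-in for an RDD: transformations are lazy, actions evaluate."""

    def __init__(self, source):
        # source is a zero-argument callable that returns a fresh iterator,
        # so the dataset can be evaluated more than once.
        self._source = source

    @classmethod
    def from_list(cls, items):
        return cls(lambda: iter(items))

    # Transformations: return a new dataset and do no work yet.
    def map(self, fn):
        return LazyDataset(lambda: (fn(x) for x in self._source()))

    def filter(self, predicate):
        return LazyDataset(lambda: (x for x in self._source() if predicate(x)))

    # Actions: run the whole pipeline and return a concrete result.
    def collect(self):
        return list(self._source())

    def count(self):
        return sum(1 for _ in self._source())


calls = []


def square(x):
    calls.append(x)  # record when the function actually runs
    return x * x


numbers = LazyDataset.from_list([1, 2, 3, 4, 5])
pipeline = numbers.map(square).filter(lambda x: x % 2 == 1)

evaluated_before_action = len(calls)  # nothing has run at this point
result = pipeline.collect()           # the action triggers evaluation
```

In this toy version every action re-runs the full pipeline, so calling `pipeline.count()` after `collect()` invokes `square` five more times. Avoiding that kind of recomputation is the motivation for caching and persistence.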

Module 7 : Spark Processing

SQL support in Spark
Spark 2.0 DataFrames
Defining tables
Importing datasets
Querying DataFrames using SQL
Storage formats
JSON / Parquet
GraphX
GraphX library overview
GraphX APIs

Module 8 : Broadcast and Accumulator

Performance Tuning
Serialization
Network Traffic
Disk Persistence
MarshalSerializer
Data Type Support
Python’s Pickle Serializer
DStreams
Sliding Window Operations
Multi Batch and State Operations

Module 9 : Algorithms

Iterative Algorithms
Graph Analysis
Machine Learning API
mllib.classification
Random Forest
Naive Bayes
Decision Tree
mllib.clustering
mllib.linalg
mllib.regression


Price: €2,450 excl. VAT
Offered by: SpiralTrain
Subject: Big Data
Level:
Duration: 3 days
Course period: 18 days
Language:
Product type: course
Teaching format: Classroom
Number of participants: Max: 12
Time of day: Daytime

Times and locations

Mon 15 Jun 2026: Amsterdam, Eindhoven, Houten, Rotterdam, Utrecht, Zwolle
Mon 17 Aug 2026: Amsterdam, Eindhoven, Houten, Rotterdam, Utrecht, Zwolle
Mon 12 Oct 2026: Amsterdam, Eindhoven, Houten, Rotterdam, Utrecht, Zwolle
Mon 14 Dec 2026: Amsterdam, Eindhoven, Houten, Rotterdam, Utrecht, Zwolle
Mon 15 Feb 2027: Amsterdam, Eindhoven, Houten, Rotterdam, Utrecht, Zwolle
Mon 12 Apr 2027: Amsterdam, Eindhoven, Houten, Rotterdam, Utrecht, Zwolle
Mon 14 Jun 2027: Amsterdam, Eindhoven, Houten, Rotterdam, Utrecht, Zwolle
Mon 16 Aug 2027: Amsterdam, Eindhoven, Houten, Rotterdam, Utrecht, Zwolle
Mon 11 Oct 2027: Amsterdam, Eindhoven, Houten, Rotterdam, Utrecht, Zwolle
Mon 13 Dec 2027: Amsterdam, Eindhoven, Houten, Rotterdam, Utrecht, Zwolle
Mon 14 Feb 2028: Amsterdam, Eindhoven, Houten, Rotterdam, Utrecht, Zwolle
Mon 17 Apr 2028: Amsterdam, Eindhoven, Houten, Rotterdam, Utrecht, Zwolle
Mon 12 Jun 2028: Amsterdam, Eindhoven, Houten, Rotterdam, Utrecht, Zwolle

Provider quality marks

NRTO

UWV training voucher