Course: Advanced Apache Spark for Data Engineers

Get a deeper understanding of Apache Spark in order to optimize your data workflow.

In this course, you will explore techniques and best practices for optimizing Apache Spark applications. As you study the architectural elements of Spark, you will learn to work with the Spark UI. You will identify and address common performance issues caused by shuffles and skew. Advanced optimization strategies for join, union, and merge operations, data formats, caching mechanisms, garbage collector settings, data partitioning, bucketing, and Delta Lake optimizations are also covered. Additionally, you will explore regular maintenance tasks for Spark applications and learn how to customize Spark session configurations for optimal performance.
- Describe the architecture of a Spark application. [Remember]
- Explain the structure and functionality of the Spark UI. [Understand]
- Predict common performance issues caused by shuffling and data skew. [Apply]
- Optimize join, union, and merge operations in Spark. [Analyze]
- Change the data format for optimal performance. [Apply]
- Implement caching mechanisms and garbage collector settings for enhanced performance. [Apply]
- Use data partitioning and bucketing in Spark workloads. [Apply]
- Apply Delta Lake optimiz…

€1,530
excl. VAT
Offered by
Info Support
Subject
Apache Spark
Level
Duration
2 days
Language
English
Product type
training
Format
Classroom
Number of participants
Max: 12
Time of day
Daytime
Dates and locations
Veenendaal
Mon 29 Sep 2025
Veenendaal
Mon 24 Nov 2025
Provider accreditations
Microsoft Learning Partner
Cedeo
Cedeo Open
Cedeo Maatwerk