Course: IBM InfoSphere QualityStage Essentials v11.7 [2M214G]
OVERVIEW
This course teaches how to build QualityStage parallel jobs that investigate, standardize, match, and consolidate data records. Topics include common data quality issues, QualityStage architecture, the QualityStage clients and their functions, importing metadata, running jobs and reviewing results, building Investigate jobs, using the Standardize stage and rule sets, identifying matching records and applying multiple Match passes, building a Survive job, and using a Two-Source match.
Students will gain experience by building an application that combines customer data from three source systems into a single master customer record.
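For readers new to the terminology, the investigate, standardize, match, and survive flow named above can be sketched in plain Python. This is only a conceptual illustration, not QualityStage code: every function name, field name, and sample record is hypothetical, the exact-key match stands in for the weighted matching a Match stage would perform, and the "longest value wins" rule is just one possible survivorship rule.

```python
from collections import defaultdict

def pattern(value):
    """Investigate: map letters to 'a' and digits to 'n'; keep other characters."""
    return "".join("a" if c.isalpha() else "n" if c.isdigit() else c for c in value)

def standardize(record):
    """Standardize: trim, collapse whitespace, and uppercase every field."""
    return {k: " ".join(v.split()).upper() for k, v in record.items()}

def match_key(record):
    """Match: exact key on name and ZIP (an assumption made for brevity)."""
    return (record["name"], record["zip"])

def survive(group):
    """Survive: for each field, keep the longest value found in the group."""
    fields = group[0].keys()
    return {f: max((r[f] for r in group), key=len) for f in fields}

def build_master(records):
    """Group standardized records by match key, then consolidate each group."""
    groups = defaultdict(list)
    for rec in map(standardize, records):
        groups[match_key(rec)].append(rec)
    return [survive(g) for g in groups.values()]

# Hypothetical customer records, as if drawn from different source systems.
records = [
    {"name": "john  smith", "zip": "02134", "phone": ""},
    {"name": "John Smith", "zip": "02134", "phone": "555-0100"},
    {"name": "Ann Lee", "zip": "10001", "phone": "555-0199"},
]
master = build_master(records)
```

Here the two "John Smith" variants collapse into one master record that keeps the non-empty phone number, while "Ann Lee" passes through unchanged.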
OBJECTIVES
After completing this course, learners should be able to:
- List common data quality contaminants
- Describe QualityStage architecture, clients, and their functions
- Build and run DataStage and QualityStage jobs and review results
- Use Character Discrete, Concatenate, and Word Investigations to analyze data fields
- Build jobs using the Standardize stage
- Build a QualityStage job to identify matching records
- Interpret, improve, and consolidate match results
CONTENT
Unit 1: Data Quality Issues
Unit 2: QualityStage Overview
- Exercise 1: QualityStage Logon
Unit 3: Developing with QualityStage
- Exercise 1: Import Table Definition Metadata
- Exercise 2: Build a QualityStage Job
Unit 4: Investigate
- Exercise 1: Build Investigate Jobs
Unit 5: Standardize
- Exercise 1: Standardize Country
- Exercise 2: Select US Records
- Exercise 3: Standardize USPREP
- Exercise 4: Standardize USNAME, USADDR, and USAREA
- Exercise 5: Investigate Unhandled Patterns
- Exercise 6: Apply Rule Set Override
Unit 6: Match
- Exercise 1: Create Match Frequency Job
- Exercise 2: One-Source Match Specification
- Exercise 3: Build One-Source Job Using Match Specification
Unit 7: Survive
- Exercise 1: Survivorship
- Exercise 2: Create Customer Master Load File
Unit 8: Two-Source Match
- Exercise 1: Read the Case Study
- Exercise 2: Prepare the Data Environment
- Exercise 3: Run the Two-Source Match Job
