CORE DQ Hub
Data Quality Workflows · Azure Databricks
Bayer · CORE
Data Quality Workflows
Select a workflow. Download your CSV, open the Databricks link, run the notebook, and get your results.

Vendor / Customer

CVI + CDQ

Paste IDs → download the input CSV → run the CVI notebook in Databricks → download the output → upload to CDQ for duplicate matching.

Open workflow

Email Cleaning

Manual Steps

Paste partner IDs → download the input CSV → run the Email Cleaning notebook in Databricks → download the output report.

Open workflow

Contact

Manual Steps

Paste partner IDs → download the input CSV → run the Contact notebook in Databricks → download the output report.

Open workflow

Full Process Guide

Reference

End-to-end reference: Databricks extraction → General Mapping → CVI linkage → CDQ duplicate matching and download rules.

View guide
Vendor / Customer
Fill in your IDs below → download the input CSV → run the notebook in Databricks → download CVI output → upload to CDQ.
Business Input · Step 1

Enter IDs below and click Download Input CSV. Then go to Databricks, upload the CSV, run the notebook, and download the output.
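As a rough illustration of what Step 1 produces, the sketch below turns a block of pasted IDs into one-column CSV text. The header name `BP_NUMBER`, the accepted separators, and the function name `ids_to_csv` are assumptions made for this example; the input format the CVI notebook expects may differ.

```python
import csv
import io
import re

def ids_to_csv(pasted: str, header: str = "BP_NUMBER") -> str:
    """Turn pasted IDs (split on commas, semicolons, or whitespace) into CSV text.

    The header name is hypothetical; use whatever the notebook expects.
    """
    seen = set()
    ids = []
    for token in re.split(r"[\s,;]+", pasted.strip()):
        if token and token not in seen:  # skip blanks and repeated IDs
            seen.add(token)
            ids.append(token)

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([header])
    writer.writerows([i] for i in ids)  # IDs stay strings, so leading zeros survive
    return buf.getvalue()

csv_text = ids_to_csv("0001000123, 0001000456\n0001000123")
```

Keeping the IDs as strings throughout avoids losing leading zeros, which is the same concern behind the STRING casting rule in the Full Process Guide.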

Next Steps · Steps 2–4
2
Upload CSV to Databricks & Run Notebook

Log into Databricks. Go to your workspace, upload the downloaded CSV to the FileStore, and run the CVI Extraction notebook.

Open Databricks Workspace
3
Download CVI Output from FileStore

After the notebook finishes, go to Databricks FileStore and download the CVI output CSV file.

Open Databricks FileStore
4
Upload CVI Output to CDQ

Go to CDQ → Data Mirror Management → Upload using General Mapping. Then run Duplicate Matching and download Duplicate Consolidation.

Open CDQ Portal
CDQ Duplicate Matching · Steps 4–5
1
Upload to CDQ Data Mirror Management

Go to CDQ → Data Mirror Management → Upload using General Mapping. NULL values must be included where applicable; do not replace them with blank strings.

2
Configure Duplicate Matching

Go to Duplicate Matching → Upload Configuration File. Adjust the fuzzy matching thresholds if results are over- or under-matching.

3
Select matching mode

Self-match: select one data source. Cross-system (PMD vs P08): select Pattern + Candidate sources.

4
US data rule

Remove Tax Number 1–5 columns from the upload template for any US records before running matching.

5
Download → Duplicate Consolidation

Always select Duplicate Consolidation as the download option. Validate and block/delete confirmed duplicates before migration.
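One possible reading of the US data rule is that US records go into their own upload file without the Tax Number 1–5 columns. The sketch below follows that reading. The column names `COUNTRY` and `TAX_NUMBER_1` to `TAX_NUMBER_5`, the sample rows, and the function name are hypothetical and should be replaced with the names in your General Mapping template.

```python
# Hypothetical column names; match them to the General Mapping template.
TAX_COLUMNS = [f"TAX_NUMBER_{n}" for n in range(1, 6)]

def split_for_upload(rows):
    """Split rows into (us_rows, other_rows); US rows lose the Tax Number 1-5 keys."""
    us_rows, other_rows = [], []
    for row in rows:
        if (row.get("COUNTRY") or "").strip().upper() == "US":
            us_rows.append({k: v for k, v in row.items() if k not in TAX_COLUMNS})
        else:
            other_rows.append(dict(row))
    return us_rows, other_rows

# Invented sample data, for illustration only.
rows = [
    {"EXTERNAL_ID": "A", "COUNTRY": "US", "TAX_NUMBER_1": "12-3456789"},
    {"EXTERNAL_ID": "B", "COUNTRY": "DE", "TAX_NUMBER_1": "DE123"},
]
us_rows, other_rows = split_for_upload(rows)
```

If US and non-US records must share one file, setting the tax fields to NULL for US rows instead of dropping the keys would be the alternative.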
Email Address Cleaning
Enter partner IDs → download the input CSV → run the Email Cleaning notebook in Databricks → download the output.
Manual workflow. Download the CSV below, log into Databricks, upload and run the notebook, then download the output report from FileStore.
Enter Partner IDs · Step 1

Enter the BP / Partner numbers whose email addresses need cleaning, then download the CSV.

Run & Download · Steps 2–4
2
Upload CSV & Run Email Notebook

Log into Databricks. Upload the CSV to FileStore and run the Email notebook in your workspace.

Open Databricks Workspace
3
Download Email Output

After the notebook completes, download the output CSV from Databricks FileStore.

Open Databricks FileStore
4
Validate & Action

Review the output. Block or merge duplicate email records in the source system before migration.

Contact Person
Enter partner IDs → download the input CSV → run the Contact notebook in Databricks → download the output.
Manual workflow. Download the CSV below, log into Databricks, upload and run the notebook, then download the output report from FileStore.
Enter Partner IDs · Step 1

Enter BP / Partner numbers to identify duplicate contact persons, then download the CSV.

Run & Download · Steps 2–4
2
Upload CSV & Run Contact Notebook

Log into Databricks. Upload the CSV to FileStore and run the Contact notebook.

Open Databricks Workspace
3
Download Contact Output

After the notebook finishes, download the output CSV from Databricks FileStore.

Open Databricks FileStore
4
Validate & Action

Review the output. Delete or merge duplicate contacts in the source system. Document findings before migration.

Full Process Guide
End-to-end reference — Databricks extraction through CDQ validation.
End-to-End Steps · Reference
1
Load Golden List into Databricks

Upload the Excel golden list to Databricks FileStore. Load with Pandas → convert to Spark DataFrame.

2
Map to General Mapping Template (CDQ Schema)

Cast and rename columns: Name (CONCAT NAME1–4), Country, City, Postal Code (STRING), Tax Numbers 1–5 (STRING — empty for US data), VAT Number.

3
CVI Linkage — Join Tables

Join PMD_but000_view + PMD_cvi_cust_link_view + PMD_cvi_vend_link_view. Always filter: OPtype ≠ 'D' to exclude deleted records.

4
Generate External ID

Formula: BP_Number + '_C' + Customer_Number + '_V' + Vendor_Number. Use COALESCE on all fields.

5
Create Final Template View

UNION ALL of Customer_View (KNA1) and Vendor_View (LFA1) into Final_Template.

6
Export CSV & Upload to CDQ

Export Final_Template as CSV. Upload via CDQ → Data Mirror Management → General Mapping.

Open CDQ Portal
7
Configure Duplicate Matching

Upload Configuration File. Set thresholds. Mode: self-match (one source) or linkage (Pattern + Candidate).

8
Download Results

Always select Duplicate Consolidation. Validate and block/delete duplicates before migration.
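Steps 3 and 4 (CVI linkage and External ID) can be sketched with Python's built-in `sqlite3` as a local stand-in for Spark SQL. The three view names come from the guide, but the column names `PARTNER`, `CUSTOMER`, and `VENDOR`, the use of LEFT JOIN, and the sample rows are assumptions for illustration, not the actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Column names and rows below are hypothetical sample data.
con.executescript("""
CREATE TABLE PMD_but000_view (PARTNER TEXT, OPtype TEXT);
CREATE TABLE PMD_cvi_cust_link_view (PARTNER TEXT, CUSTOMER TEXT, OPtype TEXT);
CREATE TABLE PMD_cvi_vend_link_view (PARTNER TEXT, VENDOR TEXT, OPtype TEXT);
INSERT INTO PMD_but000_view VALUES ('1001', 'I'), ('1002', 'U'), ('1003', 'D');
INSERT INTO PMD_cvi_cust_link_view VALUES ('1001', '2001', 'I'), ('1002', '2002', 'D');
INSERT INTO PMD_cvi_vend_link_view VALUES ('1002', '3001', 'I');
""")

# OPtype <> 'D' is applied to every table in the join.
# Note: rows whose OPtype is NULL are also dropped by <>; adjust if that matters.
query = """
SELECT COALESCE(bp.PARTNER, '') || '_C' || COALESCE(c.CUSTOMER, '')
       || '_V' || COALESCE(v.VENDOR, '') AS EXTERNAL_ID
FROM PMD_but000_view AS bp
LEFT JOIN PMD_cvi_cust_link_view AS c
       ON c.PARTNER = bp.PARTNER AND c.OPtype <> 'D'
LEFT JOIN PMD_cvi_vend_link_view AS v
       ON v.PARTNER = bp.PARTNER AND v.OPtype <> 'D'
WHERE bp.OPtype <> 'D'
ORDER BY bp.PARTNER
"""
external_ids = [row[0] for row in con.execute(query)]
print(external_ids)  # ['1001_C2001_V', '1002_C_V3001']
```

Partner 1003 is excluded because its own record is deleted, and the deleted customer link for partner 1002 is ignored. Without COALESCE, a single NULL operand would turn the whole concatenated ID into NULL.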

Key Rules
OPtype filter

Always exclude rows with OPtype = 'D' (deleted records) in all CVI joins.

US Tax Data

Tax Number 1–5 must be NULL / empty for all US records before CDQ upload.

Data types

Cast Postal Code and all Tax Numbers to STRING — no numeric formatting.

Name format

CONCAT(COALESCE(NAME1,''), ' ', COALESCE(NAME2,''), ' ', COALESCE(NAME3,''), ' ', COALESCE(NAME4,''))

External ID NULLs

Always use COALESCE to handle NULL values in the External ID concat formula.

Download option

Always select "Duplicate Consolidation" when downloading results from CDQ.
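
The Name format and Data types rules can be checked locally with a small Python sketch. The function names `concat_name` and `as_text` are hypothetical helpers, not part of the notebooks. The optional tidy-up at the end goes beyond the rule as written and is shown only as a possible extra.

```python
def concat_name(name1, name2, name3, name4):
    """Mirror CONCAT(COALESCE(NAMEn, ''), ' ', ...): None becomes '', joined by single spaces."""
    parts = (name1, name2, name3, name4)
    return " ".join(part if part is not None else "" for part in parts)

def as_text(value):
    """Cast a value to str without numeric formatting; None stays None."""
    return None if value is None else str(value)

full_name = concat_name("Acme", None, "GmbH", None)
print(repr(full_name))  # 'Acme  GmbH ' (empty parts leave extra spaces)

# Optional extra, not part of the rule: collapse repeated and trailing spaces.
tidy_name = " ".join(full_name.split())
print(repr(tidy_name))  # 'Acme GmbH'

# A postal code kept as text preserves its leading zero.
print(as_text("01234"))  # 01234
```

The formula as written leaves double or trailing spaces when a NAME field is empty, so it is worth confirming whether CDQ matching is sensitive to that before adding any trimming.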