Purpose of the Module
This module is used to create and customize the conditions for matching records.
Step-by-Step: Create Configuration
Step 1: Activation
The activation function controls whether a configuration participates in the record linkage process. This is particularly useful if you have prepared multiple configurations but only want to execute some of them at the moment.
Important Notes:
- Deactivation does not delete the configuration – you can reactivate it at any time.
- Only active configurations are considered by the Record Linkage Runner.
- Check all parameters before activation to avoid faulty runs.
How to activate/deactivate a configuration:
- Open the desired configuration in the Record Linkage Configuration module.
- Set the Active switch to “Yes” or “No”.
- Save the changes.
Step 2: Select Source and Target Module
In this step, you define which data sources should be compared.
- Source Module: Contains the records to be matched.
- Target Module: Contains the records to compare against (e.g., Golden Records).
How to select the modules:
- Open the desired configuration in the Record Linkage Configuration module.
- In the Source Module field, select the data source to be matched.

- In the Target Module field, select the target data source (e.g., Golden Records). Important: There must be a relationship between the Target Module and the Source Module!

Step 3: Mapping Settings
With the mapping settings, you define how individual fields are considered during comparison. Here you specify which attributes are relevant, how strongly they are weighted, and which algorithms are used for matching.
Available options:
- Fields for comparison: Select only attributes that are highly significant for identification (e.g., name, email, address).
- Weighting (0–100): Determine the importance of each field for the overall score. Higher values = stronger influence.
- Data Type: String or Numeric.
- Normalize Entity: The kind of value the field holds (Name, Street, etc.). Incoming data is normalized into a uniform format before comparison, e.g., Str/Str./Strasse to Straße.
- Comparison algorithm: Choose the appropriate algorithm for the respective data type. A detailed description can be found in the Algorithms section.
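To make the role of the weighting more concrete, the following is a minimal sketch that assumes the overall score is a weighted average of the per-field similarities. The formula the module actually uses is not stated here, and `FieldMapping`, `exact`, and `overall_score` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FieldMapping:
    """Hypothetical representation of one mapping row."""
    field: str
    weight: int  # 0-100, as in the Weighting option
    compare: Callable[[str, str], float]  # returns a similarity in [0, 1]


def exact(a: str, b: str) -> float:
    """Placeholder comparison: 1.0 if equal ignoring case and outer spaces."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def overall_score(source: Dict[str, str], target: Dict[str, str],
                  mappings: List[FieldMapping]) -> float:
    """Weighted average of field similarities (assumed formula)."""
    total_weight = sum(m.weight for m in mappings)
    if total_weight == 0:
        return 0.0
    weighted = sum(
        m.weight * m.compare(source.get(m.field, ""), target.get(m.field, ""))
        for m in mappings
    )
    return weighted / total_weight


mappings = [FieldMapping("name", 60, exact), FieldMapping("email", 40, exact)]
source = {"name": "Anna Meier", "email": "a@example.com"}
target = {"name": "anna meier ", "email": "b@example.com"}
print(overall_score(source, target, mappings))  # name matches, email does not
```

In this sketch the name field contributes 60 of 100 weight points, so a record pair that agrees only on the name scores 0.6. Doubling every weight would leave the score unchanged; only the ratio between weights matters under this assumption.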
Step 4: Source Record Relation Filter
The Source Record Relation Filter defines which source records should be considered in the record linkage process.
Available options:
- “All”: Matching is performed for all source records for complete data cleansing.
- “Unlinked”: Only records that do not yet have a link to a target record are checked. This is useful for identifying new duplicates without changing existing links.
- “Linked”: Only records that already have at least one link are considered. This is helpful when you want to review or update existing links.
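The three filter options can be sketched as a simple selection function. This is an illustration only: it assumes each source record is a dictionary with a `links` list of target record IDs, and `select_source_records` is a hypothetical name, not part of the module.

```python
from typing import Dict, List


def select_source_records(records: List[Dict], mode: str) -> List[Dict]:
    """Return the source records to process for the given filter mode.

    Assumption: each record carries a 'links' list of target record IDs.
    """
    if mode == "All":
        return list(records)
    if mode == "Unlinked":
        # Records without any link to a target record
        return [r for r in records if not r.get("links")]
    if mode == "Linked":
        # Records that already have at least one link
        return [r for r in records if r.get("links")]
    raise ValueError(f"Unknown filter mode: {mode!r}")


records = [
    {"id": 1, "links": []},
    {"id": 2, "links": ["G7"]},
    {"id": 3},  # no 'links' key is treated as unlinked
]
print([r["id"] for r in select_source_records(records, "Unlinked")])
```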
Step 5: Define Match Thresholds
Match thresholds determine the similarity score at which two records are considered a match. These thresholds control whether matching occurs automatically or requires manual review.
Threshold details:
- Upper threshold: All results with a score equal to or higher than this value are automatically accepted as matches and linked.
- Lower threshold: Results between the lower and upper threshold are considered potential matches and must be manually reviewed.
- Scores below the lower threshold are ignored (no match).
Best Practice: Choose thresholds so that true duplicates are automatically detected, while uncertain cases are flagged for manual review.
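The threshold rules above can be summarized in a few lines. The sketch below is not the module's implementation; `classify` is a hypothetical function, and the score scale (0 to 1 here) is an assumption, since the same logic works on any scale as long as both thresholds use it.

```python
def classify(score: float, lower: float, upper: float) -> str:
    """Map a similarity score to 'match', 'review', or 'no_match'."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score >= upper:
        return "match"     # automatically accepted and linked
    if score >= lower:
        return "review"    # potential match, needs manual review
    return "no_match"      # ignored


for s in (0.95, 0.90, 0.75, 0.40):
    print(s, classify(s, lower=0.6, upper=0.9))
```

Raising the upper threshold moves more pairs into manual review; raising the lower threshold discards more weak candidates without review.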
Step 6: Define Actions
With actions, you specify how the system should respond to the results of the record linkage process. There are two main areas:
On Match
This setting determines what happens when two records are recognized as a match (score ≥ upper threshold).
- Link: Automatically creates a link between the source record and the matching target record.
  Example: If a customer record from the source module matches a Golden Record, a relationship is created.
- Do Nothing: No action is taken; the records remain unchanged.
Note: The “Link” option is ideal if you want to automatically merge duplicates.
On No Match
This setting applies when no matching target record is found (score < lower threshold).
- Create New: Creates a new record in the target module based on the configuration fields and links it to the source record.
  Example: A new Golden Record is created if no matching entry exists.
- No Action: Nothing is done; the source record remains unchanged.
Note: Use “Create New” if you want to ensure that all relevant data exists in the target module.
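How the two action settings interact with the thresholds can be sketched as follows. `resolve_action` is a hypothetical helper, the "Manual Review" label for the middle band is an assumption, and `None` stands for the case where no candidate target record was found at all.

```python
from typing import Optional


def resolve_action(score: Optional[float], lower: float, upper: float,
                   on_match: str = "Link",
                   on_no_match: str = "Create New") -> str:
    """Return the action for the best candidate score (None = no candidate)."""
    if score is not None and score >= upper:
        return on_match          # "Link" or "Do Nothing"
    if score is not None and score >= lower:
        return "Manual Review"   # between the thresholds: neither setting applies
    return on_no_match           # "Create New" or "No Action"


print(resolve_action(0.95, lower=0.6, upper=0.9))
print(resolve_action(0.70, lower=0.6, upper=0.9))
print(resolve_action(0.20, lower=0.6, upper=0.9, on_no_match="No Action"))
```

Note that the On Match and On No Match settings only cover the outer score ranges in this sketch; pairs that land between the thresholds are left for manual review regardless of either setting.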
Algorithms
Choosing the right algorithm is crucial for the accuracy of the record linkage process. Depending on the data type (string or numeric), different methods are available. Below is an overview of the most important algorithms and recommendations for their use.
String Algorithms
These algorithms are used for text fields such as names, addresses, or company names:
| Algorithm | Description | When to use? |
|---|---|---|
| Jaro | Measures similarity based on matching characters and their order. | Short strings like first or last names with minor typos. |
| Jaro-Winkler | Extension of Jaro, gives more weight to common prefixes. | Names where initial letters are particularly important. |
| Levenshtein | Counts the minimum number of edits (insert, delete, replace). | Fields with minor typos or missing characters. |
| Damerau-Levenshtein | Like Levenshtein, but also considers transposed characters. | Frequent typos caused by letter swaps. |
| Q-Gram | Compares overlapping substrings (e.g., bigrams). | Longer texts or addresses with variable word order. |
| Cosine Similarity | Compares token frequencies and calculates the angle between vectors. | Product descriptions or addresses with similar words but different order. |
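To give a feel for how two of the string algorithms in the table behave, here is a small, self-contained sketch of Levenshtein and a bigram-based Q-Gram comparison. The function names and the normalization to a 0 to 1 similarity are assumptions made for illustration; the module's own implementation may normalize differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, and replacements."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # replace (or keep if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Scale the distance to [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def qgram_similarity(a: str, b: str, q: int = 2) -> float:
    """Jaccard overlap of the sets of q-grams (one common Q-Gram variant)."""
    grams_a = {a[i:i + q] for i in range(len(a) - q + 1)}
    grams_b = {b[i:i + q] for i in range(len(b) - q + 1)}
    if not grams_a and not grams_b:
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(levenshtein("Meier", "Meyer"))             # one replacement: i -> y
print(levenshtein_similarity("Meier", "Meyer"))
print(qgram_similarity("night", "nacht"))
```

A single typo in a five-letter name still yields a high Levenshtein similarity, while the bigram comparison drops quickly for short strings because one changed character affects two bigrams. This is one reason the table recommends Q-Gram for longer texts.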
Numeric Algorithms
These algorithms are used for numeric fields such as prices, age, or measurements:
| Algorithm | Description | When to use? |
|---|---|---|
| Step | Binary logic: 1 (match) if the difference lies within a defined range, otherwise 0. | When small deviations should be tolerated (e.g., ±5 years for age). |
| Linear | Similarity decreases evenly with difference. | For values where every deviation is weighted equally (e.g., prices). |
| Gaussian | Decrease follows a bell curve – small differences barely matter, large ones heavily penalized. | Measurements like weight or height where moderate differences are acceptable. |
| Squared | Penalizes large differences more strongly than linear. | When even moderate deviations should have a strong impact. |
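The four numeric methods can be compared side by side with a short sketch. The exact formulas and parameter names used by the module are not documented here, so everything below is an assumption: `tolerance`, `scale`, and `sigma` are hypothetical parameters, and Squared is interpreted as the square of the linear similarity, which makes it drop faster than Linear for the same deviation.

```python
import math


def step(a: float, b: float, tolerance: float) -> float:
    """1.0 if the values differ by at most `tolerance`, otherwise 0.0."""
    return 1.0 if abs(a - b) <= tolerance else 0.0


def linear(a: float, b: float, scale: float) -> float:
    """Similarity falls evenly from 1.0 to 0.0 as the difference reaches `scale`."""
    return max(0.0, 1.0 - abs(a - b) / scale)


def gaussian(a: float, b: float, sigma: float) -> float:
    """Bell curve: near 1.0 for small differences, close to 0.0 for large ones."""
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))


def squared(a: float, b: float, scale: float) -> float:
    """Square of the linear similarity (assumed reading of 'Squared')."""
    return linear(a, b, scale) ** 2


# Compare a price of 100 against 110 with the same scale for each method
print(step(100, 110, tolerance=5))
print(linear(100, 110, scale=40))
print(squared(100, 110, scale=40))
print(gaussian(100, 110, sigma=10))
```

With these assumed formulas, a deviation of 10 on a scale of 40 keeps 75% similarity under Linear but only about 56% under Squared, while Step rejects it outright because it exceeds the tolerance of 5.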



