---
title: "Duplicate Handling Technical Documentation"
slug: "duplicate-handling-technical-documentation"
updated: 2026-01-29T16:10:42Z
published: 2026-01-29T16:10:42Z
canonical: "knowledge-base.rossum.ai/duplicate-handling-technical-documentation"
---


# Duplicate Handling Technical Documentation

Detailed documentation of the duplicate handling extension configuration options.

You can define a list of configurations. All of them are wrapped in the top-level `configurations` attribute:

```json
{
  "configurations": [
    {...},
    {...},
    {...}
  ]
}
```

## Configuration parameters

Each configuration can use the following attributes.

| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| **queues** | list | false | IDs of queues where the duplicate detection should be performed. You can assign the extension to multiple queues and define different actions for different queues in a single extension instance. If not present, duplicate detection is performed on all queues to which the extension is assigned, unless `excluded_queues` is set. |
| **excluded_queues** | list | false | IDs of queues where the duplicate detection won't be performed. This parameter cannot be set together with `queues`. |
| **trigger_events** | list | false | List of Rossum events that trigger this configuration. Corresponds to the selected [hook events](https://elis.rossum.ai/api/docs/#webhook-events). If not specified, duplicate detection is performed for all allowed `trigger_events`. Possible values: `annotation_status`, `annotation_content`. |
| **trigger_actions** | list | false | List of Rossum actions that trigger this configuration. Corresponds to the selected [hook actions](https://elis.rossum.ai/api/docs/#webhook-events). If not specified, duplicate detection is performed for all actions relevant to the selected `trigger_events`. Possible values: `changed`, `initialize`, `updated`, `started`, `confirm`, `export`, `user_update`. |
| **statuses** | list | false | List of annotation status transitions in the format `{previous status}.{current status}` or `{previous status}->{current status}` (for example, `importing->to_review`) that define when the logic is evaluated. If not present, duplicate detection is performed for all allowed status changes. |
| **logic** | list | false | List of logic objects. More details are in the section below. |
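
As a sketch of how these attributes fit together, the configuration below lets a single extension instance treat two queues differently. The queue IDs `111` and `222`, the `invoice_id` schema ID, and the message text are hypothetical placeholders. The second configuration uses the `{previous status}.{current status}` form of `statuses`.

```json
{
    "configurations": [
        {
            "queues": [111],
            "trigger_events": ["annotation_content"],
            "trigger_actions": ["initialize"],
            "logic": [
                {
                    "rules": [
                        {"id": 1, "attribute": "field", "field_schema_id": "invoice_id"}
                    ],
                    "actions": [
                        {"type": "show_message", "message_type": "warning", "message": "Possible duplicate: %ANNOTATION_ID%"}
                    ]
                }
            ]
        },
        {
            "queues": [222],
            "trigger_events": ["annotation_status"],
            "trigger_actions": ["changed"],
            "statuses": ["importing.to_review"],
            "logic": [
                {
                    "rules": [
                        {"id": 1, "attribute": "filename"}
                    ],
                    "actions": [
                        {"type": "mark_duplicate"}
                    ]
                }
            ]
        }
    ]
}
```

To run detection on every assigned queue except a few, use `excluded_queues` instead of `queues`; the two cannot be set in the same configuration.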

## Logic parameters

Detailed specification of the logic object attributes.

| Attribute | Type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| **rules** | list | false | `[]` | List of rules based on which the duplicates will be detected. More details are in the section below. |
| **scope** | object | false |  | Defines where the extension looks for duplicates. See the scope parameters below. If not specified, the default `scope` is used. |
| **timestamp** | object | false | `{}` | How far back in time the extension looks for duplicates. See the timestamp parameters below. |
| **matching_flow** | list | false |  | Defines how the duplicates detected by individual rules are combined. Each list element is either a single rule ID (for example, `"4"`) or rule IDs joined by a logical AND (for example, `"1and2and3"`). The list elements are combined with a logical OR. |
| **search_query** | object | false |  | **[Beta]** An alternative to the combination of `rules`, `matching_flow`, `scope`, and `timestamp`. See the search query parameters below. If defined, it takes **precedence** and those attributes are ignored. |
| **actions** | list | true |  | List of actions to be performed if duplicates were detected. |
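
The logic object below is a minimal sketch of `matching_flow` semantics, assuming the schema IDs `invoice_id` and `sender_name` exist in your queue schema. Following the AND/OR behavior described above, another annotation is reported as a duplicate if it matches rules 1 and 2 together, or if it matches rule 3 alone.

```json
{
    "matching_flow": ["1and2", "3"],
    "rules": [
        {"id": 1, "attribute": "field", "field_schema_id": "invoice_id"},
        {"id": 2, "attribute": "field", "field_schema_id": "sender_name"},
        {"id": 3, "attribute": "filename"}
    ],
    "actions": [
        {"type": "show_message", "message_type": "warning", "message": "Detected {duplicate_ids | length} duplicates"}
    ]
}
```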

## Rule parameters

Detailed specification of the rule object attributes.

| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| **id** | int | true | ID of the rule object. This can be referred to in the `matching_flow` field. IDs must be unique. |
| **attribute** | enum | true | Type of the rule. Possible values: `field`, `relation`, `filename`. |
| **field_schema_id** | string | conditional | Required for `field` rules. The `schema_id` of the field whose values are compared when looking for duplicates. |

## Scope parameters

Detailed specification of the scope object attributes.

| Attribute | Type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| **object** | enum | false | `queue` | The type of object in which the extension looks for duplicates. Using the `queue` scope with as few queues as possible is recommended; detecting duplicates across a whole workspace or organization can be very expensive. Possible values: `queue`, `workspace`, `organization`. |
| **ids** | list | false |  | IDs of the objects (queues or workspaces) that should be considered during detection. **Default** is the context of the incoming annotation. |
| **statuses** | list | false |  | Statuses of the annotations that should be considered during detection. **Default** is all statuses. Possible values: `created`, `importing`, `failed_import`, `split`, `to_review`, `reviewing`, `in_workflow`, `confirmed`, `rejected`, `exporting`, `exported`, `failed_export`, `postponed`, `deleted`, `purged`. |
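
For example, a scope that searches two workspaces but only among confirmed and exported annotations might look like this. The workspace IDs `333` and `444` are hypothetical. Given the cost warning above, a `queue` scope with a short list of IDs remains the cheaper option.

```json
{
    "scope": {
        "object": "workspace",
        "ids": [333, 444],
        "statuses": ["confirmed", "exported"]
    }
}
```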

## Timestamp parameters

Detailed specification of the timestamp object attributes. If the timestamp object is not specified, the search is not limited in time.

| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| **action** | enum | false | The [annotation](https://elis.rossum.ai/api/docs/#api-reference) date condition to which `timespan` is applied. Possible values: `arrived_at_before`, `arrived_at_after`, `assigned_at_before`, `assigned_at_after`, `confirmed_at_before`, `confirmed_at_after`, `modified_at_before`, `modified_at_after`, `exported_at_before`, `exported_at_after`. |
| **timespan** | int | false | The time span in days for the selected date condition. |
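
A sketch of a timestamp object that limits the search to recently confirmed annotations. It assumes `confirmed_at_after` counts back `timespan` days from now, in the same way `arrived_at_after` with a `timespan` of 60 covers the last 60 days in the complex example below; verify this on a test queue.

```json
{
    "timestamp": {
        "action": "confirmed_at_after",
        "timespan": 30
    }
}
```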

## Search query parameters

**[Beta]** Search query for the duplicate detection. Uses the more powerful search based on the [/search endpoint](https://elis.rossum.ai/api/docs/#search-for-annotations).

If this object is defined, it takes **precedence** for duplicate detection, and the `matching_flow`, `rules`, `scope`, and `timestamp` objects are ignored.

You can reference the following values in the query using this syntax:

- [annotation](https://elis.rossum.ai/api/docs/#annotation) and [document](https://elis.rossum.ai/api/docs/#document) object attributes using dot notation: `{annotation.queue}` or `{document.created_at}`
- datapoint values using the `@` operator: `@{invoice_id}`
- datetime features
  - `now` - current timestamp
  - `timedelta` - python standard [timedelta object](https://docs.python.org/3/library/datetime.html#timedelta-objects)

| Attribute | Type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| **query** | object | false | `{}` | A subset of the MongoDB Query Language. See the [Rossum API docs](https://elis.rossum.ai/api/docs/#search-for-annotations) for details. |
| **query_string** | object | false | `{}` | Object with the configuration for full-text search. See the [Rossum API docs](https://elis.rossum.ai/api/docs/#search-for-annotations) for details. |

## Actions parameters

All actions share one required attribute, `type`, which specifies the action to perform. The other attributes depend on the type. Possible action types:

- `apply_label`
- `fill_field`
- `forward_annotation`
- `mark_duplicate`
- `show_message`
- `stop_automation`

## Apply label action

Apply a [label](https://elis.rossum.ai/api/docs/#label) to the annotation that triggered the hook.

| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| **type** | enum | true | The type of action that should be performed. Possible value: `apply_label`. |
| **label** | string | true | `name` of the [label](https://elis.rossum.ai/api/docs/#label) that should be applied |
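
A minimal `apply_label` action might look as follows. `possible-duplicate` is a hypothetical label name, assumed to match the `name` of a label that already exists in your organization.

```json
{
    "type": "apply_label",
    "label": "possible-duplicate"
}
```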

## Fill field action

Fill a field with a defined value.

The following placeholders can be used in the `value_to_fill` attribute:

- `duplicate_ids` - list of IDs of detected duplicate annotations
- `now` - current datetime object. Additional operations can follow.
- `timedelta` - timedelta object. Allows for operations like `+` or `-` together with `now`.
- `annotation` - nested object containing attributes of the [annotation](https://elis.rossum.ai/api/docs/#annotation) that triggered the hook.
- `%ANNOTATION_ID%` - placeholder that is replaced with the list of detected duplicates

**Example**: `Detected {duplicate_ids | length} duplicates.<br> Date of detection: {now.strftime('%m/%d/%Y')}.`

Detailed specification of the `fill_field` action attributes:

| Attribute | Type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| **type** | string | true |  | The type of action that should be performed. Possible value: `fill_field` |
| **fill_only_if_empty** | bool | false | false | If true, the field will only be filled if it was originally empty. |
| **field_to_fill** | string | true |  | Schema ID of the datapoint to fill with a custom value when a duplicate is detected. |
| **value_to_fill** | string | true |  | The custom value to fill into the datapoint in `field_to_fill`. Various placeholders can be used to build the value. |
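
The action below sketches how the placeholders can be combined. `duplicate_note` is a hypothetical schema ID, and the `now + timedelta(days=3)` expression assumes the same expression syntax as the `now - timedelta(days=200)` search query example further below. With `fill_only_if_empty` set, a note a user has already written is not overwritten.

```json
{
    "type": "fill_field",
    "field_to_fill": "duplicate_note",
    "fill_only_if_empty": true,
    "value_to_fill": "Detected {duplicate_ids | length} duplicates on {now.strftime('%m/%d/%Y')}. Review by {(now + timedelta(days=3)).strftime('%m/%d/%Y')}."
}
```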

## Forward annotation action

Move the annotation to a different queue, a different status, or both.

Detailed specification of the `forward_annotation` action attributes:

| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| **type** | string | true | The type of action that should be performed. Possible value: `forward_annotation`. |
| **target_queue** | int | false | ID of the queue to which the duplicate annotation should be moved. If not specified, the annotation stays in its current queue. |
| **target_status** | string | false | The status the annotation should have after it is forwarded. If not specified, the annotation keeps its current status. |
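
Because both target attributes are optional, an action that only changes the status and leaves the annotation in its current queue can be sketched as:

```json
{
    "type": "forward_annotation",
    "target_status": "postponed"
}
```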

## Mark duplicate action

Create a new [relation](https://elis.rossum.ai/api/docs/#relation) object. Ignored if the relation already exists.

| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| **type** | string | true | The type of action that should be performed. Possible value: `mark_duplicate` |

## Show message action

Show a message to the user in the UI. The type and content of the message can be customized.

The following placeholders can be used in the `message` attribute:

- `duplicate_ids` - list of IDs of detected duplicate annotations
- `now` - current datetime object. Additional operations can follow.
- `timedelta` - timedelta object. Allows for operations like `+` or `-` together with `now`.
- `annotation` - nested object containing attributes of the [annotation](https://elis.rossum.ai/api/docs/#annotation) that triggered the hook.
- `%ANNOTATION_ID%` - placeholder that is replaced with the list of detected duplicates

Detailed specification of the `show_message` action attributes:

| Attribute | Type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| **type** | string | true |  | The type of action that should be performed. Possible value: `show_message`. |
| **message** | string | true |  | Content of the message that will be displayed. Various placeholders can be used to create the message. |
| **message_type** | enum | true | info | Type of the displayed message. Possible values: `error`, `info`, `warning`. |
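
A sketch of a non-blocking warning that uses the `annotation` placeholder. It assumes the triggering annotation's `id` attribute is exposed as `{annotation.id}`, following the dot notation used for `{annotation.queue}` in the search query section.

```json
{
    "type": "show_message",
    "message_type": "warning",
    "message": "Annotation {annotation.id} may duplicate: %ANNOTATION_ID%"
}
```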

## Stop automation action

Stop the automation by returning an error message. Only applicable to the `initialize` action of the `annotation_content` event.

| Attribute | Type | Required | Description |
| --- | --- | --- | --- |
| **type** | string | true | The type of action that should be performed. Possible value: `stop_automation` |
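
Because `stop_automation` only applies to the `initialize` action, a configuration using it might restrict its triggers accordingly, as in this sketch. The `invoice_id` schema ID is an assumption, and pairing it with `mark_duplicate` is only one possible combination.

```json
{
    "configurations": [
        {
            "trigger_events": ["annotation_content"],
            "trigger_actions": ["initialize"],
            "logic": [
                {
                    "rules": [
                        {"id": 1, "attribute": "field", "field_schema_id": "invoice_id"}
                    ],
                    "actions": [
                        {"type": "stop_automation"},
                        {"type": "mark_duplicate"}
                    ]
                }
            ]
        }
    ]
}
```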

## Configuration examples

### Simple detection with error message as an action

This configuration is triggered only on the `initialize`, `started`, or `updated` actions of the `annotation_content` event. It detects duplicates based on the `invoice_id` field. If any other annotation in the same queue has the same `invoice_id`, an error is shown on the annotation screen.

```json
{
    "configurations": [
        {
            "trigger_events": ["annotation_content"],
            "trigger_actions": ["initialize", "started", "updated"],
            "logic": [
                {
                    "rules": [
                        {
                            "id": 1,
                            "attribute": "field",
                            "field_schema_id": "invoice_id"
                        }
                    ],
                    "actions": [
                        {
                            "type": "show_message",
                            "message_type": "error",
                            "message": "Detected {duplicate_ids | length} duplicates"
                        }
                    ]
                }
            ]
        }
    ]
}
```

### Complex detection logic with multiple actions

This configuration is triggered only on the `changed` action of the `annotation_status` event, and only if the status changes from `importing` to `to_review` or from `to_review` to `postponed`.

It detects duplicates based on a combination of several rules: another annotation is considered a duplicate if it matches rules 1, 2, and 3 together, or if it matches rule 4. The detection is scoped to the queue with ID `12345`, and only annotations in the `confirmed`, `exported`, or `deleted` status are considered. The search is also limited to annotations that arrived in Rossum in the last 60 days. If duplicates are detected, the actions are performed in the order in which they are defined.

```json
{
    "configurations": [
        {
            "trigger_events": ["annotation_status"],
            "trigger_actions": ["changed"],
            "statuses": ["importing->to_review", "to_review->postponed"],
            "logic": [
                {
                    "matching_flow": ["1and2and3", "4"],
                    "rules": [
                        {
                            "id": 1,
                            "attribute": "field",
                            "field_schema_id": "invoice_id"
                        },
                        {
                            "id": 2,
                            "attribute": "field",
                            "field_schema_id": "sender_name"
                        },
                        {
                            "id": 3,
                            "attribute": "filename"
                        },
                        {
                            "id": 4,
                            "attribute": "relation"
                        }
                    ],
                    "scope": {
                        "object": "queue",
                        "ids": [12345],
                        "statuses": ["confirmed", "exported", "deleted"]
                    },
                    "timestamp": {
                        "action": "arrived_at_after",
                        "timespan": 60
                    },
                    "actions": [
                        {
                            "type": "fill_field",
                            "field_to_fill": "duplicate",
                            "value_to_fill": "Duplicate of %ANNOTATION_ID%"
                        },
                        {
                            "type": "forward_annotation",
                            "target_queue": 123456,
                            "target_status": "postponed"
                        },
                        {
                            "type": "mark_duplicate"
                        },
                        {
                            "type": "show_message",
                            "message_type": "error",
                            "message": "Detected {duplicate_ids | length} duplicates"
                        }
                    ]
                }
            ]
        }
    ]
}
```

### Detection based on Search Query

This configuration is triggered only on the `initialize`, `started`, or `updated` actions of the `annotation_content` event. It detects duplicates based on the `invoice_id` field among annotations in the same queue that were created in the last 200 days (`{(now - timedelta(days=200)).isoformat()}`).

```json
{
    "configurations": [
        {
            "trigger_events": ["annotation_content"],
            "trigger_actions": ["initialize", "started", "updated"],
            "logic": [
                {
                    "search_query": {
                        "query": {
                            "$and": [
                                {
                                    "field.invoice_id.string": {
                                        "$eq": "@{invoice_id}"
                                    }
                                },
                                {
                                    "queue": {
                                        "$in": ["{annotation.queue}"]
                                    }
                                },
                                {
                                    "created_at": {
                                        "$gt": "{(now - timedelta(days=200)).isoformat()}"
                                    }
                                }
                            ]
                        }
                    },
                    "actions": [
                        {
                            "type": "show_message",
                            "message_type": "error",
                            "message": "Duplicates detected: %ANNOTATION_ID%"
                        }
                    ]
                }
            ]
        }
    ]
}
```

## Detection based on Search Query — Gotchas

- **Rossum API reference:** `POST /annotations/search` (query syntax, meta fields, field typing)
  - [https://rossum.app/api/docs/#tag/Annotation/operation/annotations_search](https://rossum.app/api/docs/#tag/Annotation/operation/annotations_search)

### Normalized vs. raw datapoint values (the most common pitfall)

Rossum content datapoints can contain both:

- `value` (raw, as seen on the document, e.g. `"1 500.00"`, `"15.1.2024"`)
- `normalized_value` (machine-normalized, e.g. `"1500.00"`, `"2024-01-15"`)

In this extension:

- The JSON templating placeholder `@{schema_id}` resolves to `normalized_value` **when available**, otherwise to `value`.
- The search index behind `field.<schema_id>.string` matches the **raw** `value` formatting.

Result:

- `field.amount_due.string == "@{amount_due}"` can become `"1 500.00" == "1500.00"` → **no match**.
- `field.date_issue.string == "@{date_issue}"` can become `"15.1.2024" == "2024-01-15"` → **no match**.

**Fix**: Use typed fields for normalized matching:

- Amounts: use `.number`
- Dates: use `.date`

**Example**:

```json
{
  "search_query": {
    "query": {
      "$and": [
        {"field.invoice_id.string": {"$eq": "@{invoice_id}"}},
        {"field.amount_due.number": {"$eq": "@{amount_due}"}},
        {"field.date_issue.date": {"$eq": "@{date_issue}"}},
        {"queue": {"$in": ["{annotation.queue}"]}}
      ]
    }
  }
}
```

