Extraction Schema Editor in Rossum

The extraction schema lets you define the data to capture from documents uploaded to Rossum and other values to enrich extraction results (e.g., fields for values added by extensions). Usually, each queue has its own schema, but you can link one schema with multiple queues if needed.

How to access and use our schema editor

To customize a schema, go to Queue Settings and find the editor under the Fields tab.

This tab lists the fields in your extraction schema, organised by section. For example, the Basic Information section might include fields such as Document Type, Document Language, and Document ID.

You can configure several elements in the main tab, described below.

Section

Hover over a section name to see five icons to adjust or access the section.

  • Eye: Make all fields in the section visible/hidden on the validation screen

  • Square with arrow: Include/exclude all fields in the section from export results

  • Asterisk/OPT: Make all fields in the section required or optional

  • Three dots: Delete section

  • Arrow: Go to the section details / edit the section

You can also add a new section.

Fields

Hover over a field to see options for:

  • Dragging and dropping the field to change its position in the schema

  • Checking the field type

  • Eye: Making the field hidden/visible

  • Square with arrow: Including/excluding the field from export

  • Asterisk/OPT: Making the field required/optional

  • Three dots: Edit JSON / Delete field

  • Arrow: Go to the field details / edit the field

  • Adding a new field

JSON Editor

This is an alternative way to adjust your schema for users who prefer working directly with the code.

Adding and editing a section

A section is a container that holds fields in your schema and represents a part of the document, such as Amounts or Vendor Information. Each schema should have at least one section. When adding or editing a section, provide:

  • Label: The section name, identifying the type of information it contains

  • ID: A unique identifier used in exports and integrations

  • Description (optional): An internal description of the section

To edit an existing section, go to the section details (indicated by the arrow icon) or select a section from the list visible on the “Add Section” screen.

After completing the section configuration, save your changes. If needed, you can also delete the section, or open it directly in the JSON schema editor by clicking the “Edit JSON” button.
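For readers who prefer the JSON editor, a section with the attributes listed above might look roughly like the object built below. This is a minimal Python sketch, not the definitive schema format: the key names (`category`, `id`, `label`, `description`, `children`) and the helper `make_section` are assumptions for illustration, so compare them with what the “Edit JSON” view shows for your own schema.

```python
import json


def make_section(section_id: str, label: str, description: str = "") -> dict:
    """Build a section object (hypothetical helper; key names are assumed)."""
    if not section_id or not label:
        raise ValueError("A section needs both an ID and a label")
    section = {
        "category": "section",  # assumed marker for a section container
        "id": section_id,       # unique identifier used in exports and integrations
        "label": label,         # section name
        "children": [],         # fields of the section would be listed here
    }
    if description:
        section["description"] = description  # optional internal description
    return section


amounts = make_section("amounts_section", "Amounts", "Totals and tax values")
print(json.dumps(amounts, indent=2))
```

Keeping the description optional mirrors the form: only the label and the ID are needed to create a section.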

Adding and editing a field

When adding or editing a field, you can set it to be visible/hidden, included in/excluded from export, and required or optional.

You can then make selections in the Type & Source section:

Field type

  • Simple value (field holding one value)

  • Multivalue (field holding multiple values)

  • Line items (table)

  • Button

Value source

  • Captured – value extracted from the document

  • Formula – value calculated using a formula

  • Data – value added by an extension

  • Manual – value manually specified by an annotator during a document review

Editing (controls whether users can edit the value)

  • Enabled

  • Enabled without warning

  • Disabled

And the Identification section:

  • Label: The name visible to annotators on the validation screen and in export results (.csv and .xlsx files)

  • ID: A unique field identifier used in exports and integrations

  • Description (optional): A description of the field

  • Data type:

    • text

    • number

    • date

    • enum (allows you to select a value from a predefined list of options)

For date and number fields, you can also specify a format (please find more information here).

If the field is captured by AI, provide:

AI Engine Field ID: This attribute determines which value is presented in the field. Our AI engine is pre-trained to recognise certain fields (you can find a full list here). For example, if you want to capture a bank account number, create an “Account Number” field and set “account_num” as the AI Engine Field ID.

Important: For custom fields that our AI is not pre-trained to recognise, leave the AI Engine Field ID attribute empty.

You can also define a confidence score threshold to automate document processing based on AI prediction confidence.

Additional options include:

  • dropdown options for enum fields

  • formula definition for fields with a “Formula” value source

After completing the field configuration, save your changes with the blue save button at the top of the screen. If needed, you can also delete the field, or open it directly in the JSON schema editor by clicking the “Edit JSON” button.
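The field options described above can be summarised in a small Python sketch. It only illustrates how the choices fit together: the helper `make_field` and the dictionary keys mirror the labels in the editor and are assumptions, not the key names used by the JSON schema editor. Treating an enum without dropdown options as an error is also an assumption made for this example.

```python
VALUE_SOURCES = {"captured", "data", "manual", "formula"}
EDIT_MODES = {"enabled", "enabled_without_warning", "disabled"}
DATA_TYPES = {"text", "number", "date", "enum"}


def make_field(field_id, label, data_type="text", source="captured",
               edit="enabled", ai_engine_field_id=None, options=None):
    """Build a simple-value field (hypothetical helper; key names are assumed)."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"Unknown data type: {data_type}")
    if source not in VALUE_SOURCES:
        raise ValueError(f"Unknown value source: {source}")
    if edit not in EDIT_MODES:
        raise ValueError(f"Unknown editing mode: {edit}")
    # Assumption: an enum field without dropdown options is a mistake.
    if data_type == "enum" and not options:
        raise ValueError("Enum fields need a list of dropdown options")
    return {
        "id": field_id,
        "label": label,
        "data_type": data_type,
        "value_source": source,
        "editing": edit,
        # Stays empty (None) for custom fields the AI is not pre-trained on.
        "ai_engine_field_id": ai_engine_field_id,
        "options": options or [],
    }


# Predefined field: the AI Engine Field ID names the pre-trained value.
account = make_field("account_num", "Account Number",
                     ai_engine_field_id="account_num")

# Manually filled field: no AI Engine Field ID.
notes = make_field("notes", "Notes", source="manual")
```

The “Account Number” example follows the one given earlier in this article; the “Notes” field is purely illustrative.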

Before You Start

General rules:

  • If using a dedicated engine, consult with Rossum before making schema changes to ensure optimal AI training results.

  • Ensure that the field ID remains consistent across all your queues. For example, if you have a field for the supplier name, its ID should be the same in all schemas (e.g., sender_name). Only this specific value should be annotated in that field.

  • Do not change the field ID if you have already annotated documents. If necessary, contact support@rossum.ai for assistance.
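One way to spot inconsistent field IDs across queues is to compare exported schemas offline. The Python sketch below is a heuristic under stated assumptions: each schema is represented as a flat list of fields with the hypothetical keys `id` and `ai_engine_field_id`, and two fields are considered "the same" when they share an AI Engine Field ID. The function `find_inconsistent_ids` and the queue names are made up for this example.

```python
from collections import defaultdict


def find_inconsistent_ids(schemas):
    """Return AI Engine Field IDs that map to more than one field ID.

    `schemas` maps a queue name to a list of field dicts
    (hypothetical layout with `id` and `ai_engine_field_id` keys).
    """
    seen = defaultdict(set)
    for fields in schemas.values():
        for field in fields:
            engine_id = field.get("ai_engine_field_id")
            if engine_id:  # custom fields without an engine ID are skipped
                seen[engine_id].add(field["id"])
    return {engine_id: sorted(ids)
            for engine_id, ids in seen.items() if len(ids) > 1}


schemas = {
    "Invoices EU": [{"id": "sender_name", "ai_engine_field_id": "sender_name"}],
    "Invoices US": [{"id": "supplier", "ai_engine_field_id": "sender_name"}],
}
print(find_inconsistent_ids(schemas))
```

A non-empty result is only a hint to review the schemas; if documents have already been annotated, do not rename the field yourself.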

Set the correct value source for each field:

  • Captured if added by AI

  • Data if added by extension

  • Manual if added manually by an annotator

  • Formula if using a formula for calculations or transformations

Predefined fields:

  • For predefined fields (recognised by our AI), edit only the label. If you need to modify the field ID for business reasons, keep the AI Engine Field ID unchanged.

Custom fields:

  • For custom fields that our AI is not pre-trained to recognise, leave the AI Engine Field ID attribute empty.

Hidden fields:

  • Remove hidden fields from the schema if they are not used.

Line item fields:

  • Field IDs should start with “item_” (e.g., “item_code”), and AI Engine Field IDs should start with “table_column_” (e.g., “table_column_description”).

  • Set up the tuple correctly, and add the AI Engine Field ID of the tuple (usually “line_items”).
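The naming rules for line item fields lend themselves to a quick automated check. The following Python sketch assumes each column is available as a dict with the hypothetical keys `id` and `ai_engine_field_id`; the function name `check_line_item_columns` is likewise an illustration, not part of any Rossum tooling.

```python
def check_line_item_columns(columns):
    """Return warnings for line item columns that break the naming rules.

    `columns` is a list of dicts with the hypothetical keys
    `id` and `ai_engine_field_id`.
    """
    warnings = []
    for col in columns:
        field_id = col.get("id", "")
        engine_id = col.get("ai_engine_field_id")
        if not field_id.startswith("item_"):
            warnings.append(f"{field_id}: field ID should start with 'item_'")
        # Custom columns may leave the AI Engine Field ID empty, so only
        # non-empty values are checked for the 'table_column_' prefix.
        if engine_id and not engine_id.startswith("table_column_"):
            warnings.append(
                f"{field_id}: AI Engine Field ID should start with 'table_column_'"
            )
    return warnings


columns = [
    {"id": "item_description", "ai_engine_field_id": "table_column_description"},
    {"id": "code", "ai_engine_field_id": "column_code"},
]
for warning in check_line_item_columns(columns):
    print(warning)
```

In this example the first column passes and the second produces two warnings, one for each prefix.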

Fields filled by extension:

  • Keep the AI Engine Field ID for fields filled by extensions empty, as predictions are not needed.

  • If the value can be both present on the document and filled by an extension, use two fields: one for the value annotated on the document and one for the value populated by the extension.

  • Set the value source to Data.
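The rules for extension-filled fields can be checked in the same spirit. This Python sketch assumes fields are represented as dicts with the hypothetical keys `id`, `value_source`, and `ai_engine_field_id`; `check_extension_fields` and the field names `sender_name_matched` and `vendor_code` are invented for the example.

```python
def check_extension_fields(fields):
    """Return IDs of Data-source fields that still carry an AI Engine Field ID."""
    return [
        field["id"]
        for field in fields
        if field.get("value_source") == "data" and field.get("ai_engine_field_id")
    ]


fields = [
    # Two-field pattern: one field annotated from the document...
    {"id": "sender_name", "value_source": "captured",
     "ai_engine_field_id": "sender_name"},
    # ...and one populated by an extension, with no AI Engine Field ID.
    {"id": "sender_name_matched", "value_source": "data",
     "ai_engine_field_id": None},
    # Misconfigured: filled by an extension but still requests a prediction.
    {"id": "vendor_code", "value_source": "data",
     "ai_engine_field_id": "account_num"},
]
print(check_extension_fields(fields))  # ['vendor_code']
```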