The extraction schema lets you define the data to capture from documents uploaded to Rossum and other values to enrich extraction results (e.g., fields for values added by extensions). Usually, each queue has its own schema, but you can link one schema with multiple queues if needed.
How to access and use our schema editor
To customize a schema, go to Queue Settings and find the editor under the Fields tab.
This tab shows a list of fields in your extraction schema, organised by section. For example, the Basic Information section might include fields such as Document Type, Document Language, and Document ID.
You can configure several elements in the main tab, described below.
Section
Hover over a section name to see five icons to adjust or access the section.
Eye: Make all fields in the section visible/hidden on the validation screen
Square with arrow: Include/exclude all fields in the section from export results
Asterisk/OPT: Make all fields in the section required or optional
Three dots: Delete section
Arrow: Go to the section details / edit the section
You can also add a new section.
Fields
Hover over a field to see options for:
Dragging and dropping the field to change its position in the schema
Checking the field type
Eye: Making the field hidden/visible
Square with arrow: Including/excluding the field from export
Asterisk/OPT: Making the field required/optional
Three dots: Edit JSON / Delete field
Arrow: Go to the field details / edit the field
Adding a new field
JSON Editor
This is an alternative way to adjust your schema for users who prefer working directly with the code.
Adding and editing a section
A section is a container that holds fields in your schema and represents a part of the document, e.g. Amounts or Vendor Information. Each schema should have at least one section. When adding or editing a section, provide:
Label: The section name, identifying the type of information it contains
ID: A unique identifier used in exports and integrations
Description (optional): An internal description of the section
To edit an existing section, go to the section details (indicated by the arrow icon) or select a section from the list visible on the “Add Section” screen.
After completing the section configuration, save your changes. If needed, you can also delete a section. You can also open the section directly in the JSON schema editor by clicking the “Edit JSON” button.
Adding and editing a field
When adding or editing a field, you can set it to be visible/hidden, included in/excluded from export, and required or optional.
You can then make selections in the Type & Source section:
Field type
Simple value (field holding one value)
Multivalue (field holding multiple values)
Line items (table)
Button
Value source
Captured – value extracted from the document
Data – value added by an extension
Manual – value manually specified by an annotator during a document review
Formula – value calculated using a formula
Editing (sets whether users can edit the value)
Enabled
Enabled without warning
Disabled
And the Identification section:
Label: The name visible to annotators on the validation screen and in export results (.csv and .xlsx files)
ID: A unique field identifier used in exports and integrations
Description (optional): A description of the field
Data type:
text
number
date
enum (allows you to select a value from a predefined list of options)
For date and number fields, you can also specify a format (please find more information here).
If the field is captured by AI, provide:
AI Engine Field ID: This attribute determines which value is presented in the field. Our AI engine is pre-trained to recognise certain fields (you can find a full list here). For example, to capture a bank account number, create an “Account Number” field and set “account_num” as the AI Engine Field ID.
Important: For custom fields our AI is not pre-trained to recognise, leave the AI Engine Field ID attribute empty.
You can also define a confidence score threshold to automate document processing based on AI prediction confidence.
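As a rough illustration of how a confidence score threshold might gate automation, the sketch below sends a field to manual review whenever its prediction confidence falls below the threshold. The function name, the 0 to 1 score scale, the example scores, and the threshold value are all assumptions made for illustration, not a description of how Rossum implements this internally.

```python
def needs_review(confidence: float, threshold: float) -> bool:
    """Return True when a predicted value should be checked by an annotator."""
    # Assumption: both values are expressed on a 0-1 scale.
    if not 0.0 <= confidence <= 1.0 or not 0.0 <= threshold <= 1.0:
        raise ValueError("confidence and threshold are expected in [0, 1]")
    return confidence < threshold


# Hypothetical predictions: field ID -> AI confidence.
predictions = {"account_num": 0.98, "sender_name": 0.71}
THRESHOLD = 0.9  # illustrative value only

to_review = sorted(
    field_id
    for field_id, confidence in predictions.items()
    if needs_review(confidence, THRESHOLD)
)
print(to_review)
```

A higher threshold sends more fields to review and a lower one automates more, so the value is a trade-off between accuracy and manual effort.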
Additional options include:
dropdown options for enum fields
formula definition for fields with a “Formula” value source
After completing the field configuration, save all changes with the blue save button at the top of the screen. If needed, you can also delete a field. You can also open the field directly in the JSON schema editor by clicking the “Edit JSON” button.
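To make the options above easier to reason about, here is a minimal sketch that models a field configuration and checks it against the choices listed in this article. The class, its attribute names, and the lowercase option strings are hypothetical and do not mirror the keys used in the actual JSON schema; only the option lists themselves come from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Option lists taken from this article; the spellings are illustrative.
VALUE_SOURCES = {"captured", "data", "manual", "formula"}
EDITING_MODES = {"enabled", "enabled_without_warning", "disabled"}
DATA_TYPES = {"text", "number", "date", "enum"}


@dataclass
class FieldConfig:
    """Hypothetical in-memory model of one schema field."""
    label: str
    field_id: str
    data_type: str = "text"
    value_source: str = "captured"
    editing: str = "enabled"
    ai_engine_field_id: Optional[str] = None
    options: List[str] = field(default_factory=list)
    formula: Optional[str] = None

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means none were found."""
        issues = []
        if self.data_type not in DATA_TYPES:
            issues.append(f"unknown data type: {self.data_type}")
        if self.value_source not in VALUE_SOURCES:
            issues.append(f"unknown value source: {self.value_source}")
        if self.editing not in EDITING_MODES:
            issues.append(f"unknown editing mode: {self.editing}")
        if self.data_type == "enum" and not self.options:
            issues.append("enum field needs dropdown options")
        if self.value_source == "formula" and not self.formula:
            issues.append("formula field needs a formula definition")
        return issues


account = FieldConfig(
    label="Account Number",
    field_id="account_num",
    ai_engine_field_id="account_num",
)
print(account.problems())
```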
Before You Start
General rules:
If using a dedicated engine, consult with Rossum before making schema changes to ensure optimal AI training results.
Ensure that the field ID remains consistent across all your queues. For example, if you have a field for the supplier name, its ID should be the same in all schemas (e.g., sender_name). Only this specific value should be annotated in that field.
Do not change the field ID if you have already annotated documents. If necessary, contact support@rossum.ai for assistance.
Set the correct value source for each field:
Captured if added by AI
Data if added by extension
Manual if added manually by an annotator
Formula if using a formula for calculations or transformations
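The rule about keeping field IDs consistent across queues can be checked with a short script. The sketch below assumes you have already collected, for each queue, a mapping of field label to field ID (for example by copying values from the schema editor), and that the same label denotes the same business value in every queue. The queue names, the data layout, and the `vendor_name` ID are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def inconsistent_ids(schemas: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Return labels that map to more than one field ID across queues."""
    ids_by_label = defaultdict(set)
    for fields in schemas.values():
        for label, field_id in fields.items():
            ids_by_label[label].add(field_id)
    return {label: ids for label, ids in ids_by_label.items() if len(ids) > 1}


# Hypothetical queues: queue name -> {field label: field ID}.
schemas = {
    "Invoices EU": {"Supplier Name": "sender_name", "Document ID": "document_id"},
    "Invoices US": {"Supplier Name": "vendor_name", "Document ID": "document_id"},
}

for label, ids in inconsistent_ids(schemas).items():
    print(f"{label}: {sorted(ids)}")
```

Here the supplier name would be flagged because it uses two different IDs, while Document ID passes.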
Predefined fields:
For predefined fields (recognised by our AI), edit only the label. If you need to modify the field ID for business reasons, keep the AI Engine Field ID unchanged.
Custom fields:
For custom fields our AI is not pre-trained to recognise, leave the AI Engine Field ID attribute empty.
Hidden fields:
Remove hidden fields from the schema if they are not used.
Line item fields:
Field IDs should start with “item_” (e.g., “item_code”), and AI Engine Field IDs should start with “table_column_” (e.g., “table_column_description”).
Set up the tuple correctly, and add the AI Engine Field ID of the tuple (usually “line_items”).
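The naming conventions for line item fields lend themselves to a simple automated check. The following sketch takes pairs of (field ID, AI Engine Field ID) and reports any that do not follow the prefixes described above. The function name and the second sample pair are hypothetical; an empty AI Engine Field ID is skipped on the assumption that the column is a custom field.

```python
from typing import List, Tuple


def check_line_item_columns(columns: List[Tuple[str, str]]) -> List[str]:
    """Return warnings for line item columns that break the naming rules."""
    warnings = []
    for field_id, ai_engine_field_id in columns:
        if not field_id.startswith("item_"):
            warnings.append(f"{field_id}: field ID should start with 'item_'")
        # An empty AI Engine Field ID is allowed (custom field), so skip it.
        if ai_engine_field_id and not ai_engine_field_id.startswith("table_column_"):
            warnings.append(
                f"{field_id}: AI Engine Field ID '{ai_engine_field_id}' "
                "should start with 'table_column_'"
            )
    return warnings


# Sample pairs: the first follows the convention, the second (made up) does not.
columns = [
    ("item_description", "table_column_description"),
    ("quantity", "column_quantity"),
]
for warning in check_line_item_columns(columns):
    print(warning)
```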
Fields filled by extension:
Keep the AI Engine Field ID for fields filled by extensions empty, as predictions are not needed.
If the value can be both present on the document and filled by an extension, use two fields: one annotated (from the document) and one for the value populated by the extension.
Set the value source to Data.
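The rules for extension-filled fields can be sketched as a small check: any field whose value source is Data should have an empty AI Engine Field ID. The dictionary keys, the `sender_name_matched` field, and the AI Engine Field ID values below are illustrative assumptions, not the keys or IDs used in the real JSON schema.

```python
from typing import Dict, List, Optional


def extension_field_issues(fields: List[Dict[str, Optional[str]]]) -> List[str]:
    """Flag extension-filled fields that still carry an AI Engine Field ID."""
    issues = []
    for f in fields:
        if f["value_source"] == "data" and f.get("ai_engine_field_id"):
            issues.append(
                f"{f['id']}: clear the AI Engine Field ID for extension-filled fields"
            )
    return issues


# Two-field pattern: one field annotated from the document,
# one populated by an extension (hypothetical IDs).
fields = [
    {"id": "sender_name", "value_source": "captured", "ai_engine_field_id": "sender_name"},
    {"id": "sender_name_matched", "value_source": "data", "ai_engine_field_id": None},
]
print(extension_field_issues(fields))
```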