Annotations Guide

What are the annotations, and why do you need them?

Annotations refer to all the captured data from a document. You can recognize them by the blue bounding boxes (b-boxes) that appear on the document after it is processed in Rossum.

To ensure that the AI engine will learn what data you need to capture, you need good-quality annotations.

How to ensure high-quality annotations?

You should always keep a few simple rules in mind if you want high-quality annotations. Following them will help optimize and improve the accuracy of your AI engine.

1. Keep the annotations consistent and precise

Consistency is essential for the engine to learn to capture the data correctly. On documents with the same layout and/or from the same vendor, capture each value from the same place every time.

The bounding box borders should go around the data, not through it. You should also avoid lines or other characters that do not belong to the correct value.
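The "around the data, not through it" rule can be illustrated with a little geometry. The sketch below is not part of Rossum; it assumes a hypothetical representation in which the annotation and each word on the page are rectangles given as (x1, y1, x2, y2) page coordinates, and it flags words that the annotation border cuts through.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in page coordinates (hypothetical format)."""
    x1: float
    y1: float
    x2: float
    y2: float


def intersects(a: Box, b: Box) -> bool:
    """True when the two rectangles share some interior area."""
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2


def contains(outer: Box, inner: Box) -> bool:
    """True when `inner` lies completely inside `outer`."""
    return (outer.x1 <= inner.x1 and outer.y1 <= inner.y1
            and outer.x2 >= inner.x2 and outer.y2 >= inner.y2)


def cut_words(annotation: Box, words: list[Box]) -> list[Box]:
    """Words the annotation touches without fully enclosing them."""
    return [w for w in words
            if intersects(annotation, w) and not contains(annotation, w)]


annotation = Box(10, 10, 100, 30)
words = [
    Box(12, 12, 50, 28),    # fully inside the annotation
    Box(90, 12, 120, 28),   # the right border passes through this word
    Box(200, 12, 240, 28),  # untouched by the annotation
]
cut = cut_words(annotation, words)
```

An empty result means the borders go around every word they touch; any returned box marks a place where the annotation should be widened or narrowed.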

2. Always annotate all values on the document

If there is data on the document with a corresponding field in the schema for its extraction, you should annotate it on each document where it is present. Please be sure to do it, even if you may not need it to be extracted for a particular vendor or in a specific case.

Amounts should also always be annotated, even if the value on the document is “0”.

3. Only annotate values that appear on the document

If there is a value on the invoice, it should be captured. Please do not manually enter a value that does not appear on the invoice. The engine cannot learn to populate values from your manual input.

4. Annotate related data from the same location

It is always preferable to annotate logically connected data from locations that are close to each other rather than capturing it from two distant locations.

For example, it’s better to annotate the supplier name that appears next to the supplier address rather than one that appears far from it.

5. Annotate data from preferred locations

Data should be annotated from the preferred location whenever possible.

We recommend annotating the value the first time it appears on the document. For example, annotate vendor information in the header rather than the footer. Also, annotate values on the first page rather than on later pages.

6. Avoid overlapping annotations

While it is occasionally fine to extract data from the same place, you should avoid overlapping annotations. Taking the same value for several fields may confuse the engine and lead to lower confidence in predicting those fields.
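An overlap between two annotations can be detected with a simple rectangle test. This is a minimal sketch, not a Rossum feature: it assumes boxes are (x1, y1, x2, y2) tuples, and the field names used here are made up for illustration.

```python
from itertools import combinations


def overlap_area(a, b):
    """Shared area of two (x1, y1, x2, y2) rectangles; 0 if they do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def overlapping_fields(annotations):
    """Return pairs of field names whose boxes overlap, in alphabetical order."""
    return [
        (name_a, name_b)
        for (name_a, box_a), (name_b, box_b)
        in combinations(sorted(annotations.items()), 2)
        if overlap_area(box_a, box_b) > 0
    ]


# Hypothetical field names and coordinates.
annotations = {
    "invoice_id": (10, 10, 60, 20),
    "order_id": (40, 12, 90, 22),      # overlaps invoice_id
    "amount_total": (10, 100, 60, 110),
}
pairs = overlapping_fields(annotations)
```

Each reported pair is a candidate for review: either the two fields really do share a value, or one of the boxes should be moved or resized.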

7. Focus on each field and annotations of correct values

When capturing the data, pay attention to each field and annotate the correct values. Always check that the values predicted by the engine are correct.

If you find any typos or other errors, try adjusting the bounding box to get the correct value.

8. Annotating tax details

When annotating tax data (such as tax rates, tax amounts, and base amounts), make sure to annotate the related values together. All tax data should be in the tax tables.

Values that are in the document total table (usually, the total base amount or subtotal, total tax amount, and total amount with tax) should be captured in the corresponding header fields.

9. Annotate the data values

Annotate only the data values, not the labels. When annotating a PO number written as “PO. no.: AB1234”, for example, only annotate “AB1234”.
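The split between label and value in the PO number example can be sketched as follows. This is purely illustrative and assumes that the label ends at the last colon in the text, which will not hold for every document (a value that itself contains a colon, such as a time, would be truncated).

```python
import re

# Assumed label pattern: everything up to and including the last colon.
LABEL = re.compile(r"^.*:\s*")


def value_only(text: str) -> str:
    """Drop a leading label so only the data value remains."""
    return LABEL.sub("", text.strip())


po_number = value_only("PO. no.: AB1234")
```

The remaining text, "AB1234", is the part the bounding box should cover.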

10. Choose values in a standard font and size

If the same value appears in the logo, footer, and body of the document, choose the one with the most standard font and size.

How to capture table data in Rossum

Annotate table data in the Line Items and header data in the header fields

Unless otherwise instructed by Rossum, you should annotate the header fields as header fields and line items as line items. If the invoice has only one line item, the total amounts are usually the same as the line item amounts. Even so, annotate the total from the table footer rather than from within the line item.

Use the Magic Grid to annotate structured line item tables

The Magic Grid is a helpful tool for quickly annotating structured line item tables. These are tables with data placed in separate columns, with one data type per column, and each line item in a different row.

It is possible to annotate them by pointing and clicking, but if the table contains many values, it will be faster to use the Magic Grid. Use it for all the data you can extract from such a table.

Drag the grid over the data and adjust it as needed. You can adjust the grid by moving the separators up or down, adding or removing labels, ignoring rows you don’t want to capture, and so on.
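Conceptually, a grid of row and column separators assigns every word in a structured table to exactly one cell. The sketch below illustrates that idea only; it is not how the Magic Grid is implemented, and the (text, x, y) word format, separator positions, and function names are all assumptions.

```python
from bisect import bisect_right


def cell_of(x, y, column_separators, row_separators):
    """Return the (row, column) index for a point, given sorted separators."""
    return bisect_right(row_separators, y), bisect_right(column_separators, x)


def grid_to_rows(words, column_separators, row_separators, ignored_rows=()):
    """Group (text, x, y) words into rows of cell strings, skipping ignored rows."""
    n_cols = len(column_separators) + 1
    rows = {}
    # Reading order: top to bottom, then left to right.
    for text, x, y in sorted(words, key=lambda w: (w[2], w[1])):
        r, c = cell_of(x, y, column_separators, row_separators)
        if r in ignored_rows:
            continue
        cells = rows.setdefault(r, [[] for _ in range(n_cols)])
        cells[c].append(text)
    return [[" ".join(cell) for cell in rows[r]] for r in sorted(rows)]


words = [
    ("Description", 10, 80), ("Qty", 120, 80), ("Price", 200, 80),
    ("Widget", 10, 105), ("A", 40, 105), ("2", 120, 105), ("10.00", 200, 105),
    ("Gadget", 10, 125), ("1", 120, 125), ("5.00", 200, 125),
]
line_items = grid_to_rows(
    words,
    column_separators=[100, 180],  # vertical separators between columns
    row_separators=[95, 115],      # horizontal separators between rows
    ignored_rows={0},              # skip the table header row
)
```

Moving a separator changes which cell a word falls into, which is why a small adjustment to the grid can fix a whole column at once.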

You can find more information on how to use Rossum’s Magic Grid in our user guide article.

Aurora for Complex Tables

Tables in documents range from simple layouts to complex structures, and capturing them can be challenging. Aurora for complex tables simplifies the process, making it easy to extract data from tables quickly. It handles structured, straightforward cases and also lets you extract information from complex tables. Importantly, no additional add-ons, such as the Magic Items extension, are needed.

More information can be found here.