Documents can be imported into Rossum through the web app, the REST API, or the email gateway. To ensure that documents are imported and then processed successfully, files must meet certain criteria. We divide these criteria into requirements (which a file must meet to be processed by Rossum at all) and recommendations (which, when followed, give the highest possible processing reliability).
Rossum extracts data from all pages of a document; extraction can optionally be limited to the first few pages. It is generally not necessary to remove additional pages of other types (for example, a purchase order appended after an invoice). Documents can be split manually in the UI or automatically by inserting a special Separator page.
File Requirements
Import channels
You can import documents via the web app (manual upload), REST API, or email.
Supported file formats
PDF, PNG, JPEG, TIFF, XLSX/XLS, DOCX/DOC. (Rossum also accepts .zip archives as a container; see ZIP rules below.)
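A client that wants to reject unsupported files before uploading could check each file name against the list above. The sketch below assumes the common extensions for each format (for example `.jpg` and `.jpeg` for JPEG, `.tif` and `.tiff` for TIFF). It only looks at file names, while Rossum itself may judge a file by its content, so treat it as a convenience filter rather than a guarantee of acceptance. The function name `classify_file` is hypothetical.

```python
from pathlib import Path

# Extensions assumed to correspond to the supported formats listed above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff",
    ".xlsx", ".xls", ".docx", ".doc",
}
# ZIP archives are accepted as containers for the formats above.
CONTAINER_EXTENSIONS = {".zip"}


def classify_file(filename: str) -> str:
    """Return 'document', 'archive', or 'unsupported' based on the extension only."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    if suffix in SUPPORTED_EXTENSIONS:
        return "document"
    if suffix in CONTAINER_EXTENSIONS:
        return "archive"
    return "unsupported"
```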
File size limits
Max 40 MB per file.
For .zip uploads, the uncompressed total of all contained files must not exceed 40 MB.
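The size limits can be checked locally before uploading. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and reading "40 MB" as 40,000,000 bytes (decimal megabytes, the stricter of the two common readings) is an assumption. For ZIP files it sums the uncompressed sizes recorded in the archive's central directory, which are declared by the archive rather than measured by decompressing it.

```python
import io
import zipfile

# Assumption: "40 MB" means 40,000,000 bytes (decimal MB, the stricter reading).
MAX_UPLOAD_BYTES = 40 * 1000 * 1000


def file_within_limit(data: bytes) -> bool:
    """True if a single (non-ZIP) file is no larger than the per-file limit."""
    return len(data) <= MAX_UPLOAD_BYTES


def zip_within_limit(data: bytes) -> bool:
    """True if the ZIP file and the uncompressed total of its contents both fit."""
    if len(data) > MAX_UPLOAD_BYTES:
        return False
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # file_size is the uncompressed size declared for each member.
        uncompressed_total = sum(info.file_size for info in archive.infolist())
    return uncompressed_total <= MAX_UPLOAD_BYTES
```

A highly compressible archive can be small on disk yet fail the uncompressed-total rule, which is why the second check matters.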
ZIP archives
Fewer than 1000 files per archive.
Only files in the root of the archive are processed. If the root contains nothing but a single directory, the files inside that directory are processed instead.
The same ZIP rules apply to uploads via API, email, and web app.
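The ZIP rules above can be mirrored in a pre-upload check that lists which members would be picked up. This is a sketch of one reading of the rules, not Rossum's implementation: it counts only file entries (not directory entries) toward the 1000-file limit, and in the single-directory case it assumes that only files directly inside that directory are processed, not files in deeper subfolders. The function name `files_to_process` is hypothetical.

```python
import io
import zipfile
from typing import List

MAX_FILES = 1000  # an archive must contain fewer than this many files


def files_to_process(data: bytes) -> List[str]:
    """List the ZIP members that the root-level rule would pick up.

    Assumptions: directory entries do not count toward MAX_FILES, and in the
    single-directory case only its direct children are taken.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
    files = [n for n in names if not n.endswith("/")]  # skip directory entries
    if len(files) >= MAX_FILES:
        raise ValueError(f"{len(files)} files in archive; must be fewer than {MAX_FILES}")

    root_files = [n for n in files if "/" not in n]
    top_level_items = {n.split("/", 1)[0] for n in names}
    if not root_files and len(top_level_items) == 1:
        # The archive's only top-level item is a directory: use its direct children.
        prefix = top_level_items.pop() + "/"
        return [n for n in files if n.startswith(prefix) and "/" not in n[len(prefix):]]
    return root_files
```

For example, an archive containing `a.pdf` and `dir/b.pdf` yields only `a.pdf`, so documents placed in subfolders next to root-level files would be skipped.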
Email import limits
Email size limit: 50 MB (raw email, including base64-encoded attachments).
Rossum extracts PDFs, images, and ZIP files from incoming emails.
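Because attachments are base64-encoded, the raw email is noticeably larger than the files it carries. The sketch below estimates the encoded size assuming standard MIME base64 (4 output characters per 3 input bytes, wrapped at 76 characters per line with CRLF line endings) plus a fixed, guessed allowance for headers and body text; both that allowance and reading "50 MB" as 50,000,000 bytes are assumptions, and the function names are hypothetical. By this estimate, a single 40,000,000-byte file, although within the per-file limit, encodes to about 54.7 MB, so such a file would be better uploaded through the API or web app.

```python
# Assumption: "50 MB" means 50,000,000 bytes (decimal MB).
EMAIL_LIMIT_BYTES = 50 * 1000 * 1000


def base64_mime_size(n_bytes: int, line_length: int = 76) -> int:
    """Size of n_bytes after MIME base64 encoding with CRLF line breaks.

    Base64 turns every 3 input bytes (rounded up) into 4 characters; MIME
    wraps the output at 76 characters per line, each line ending in CRLF.
    """
    encoded = 4 * ((n_bytes + 2) // 3)
    lines = (encoded + line_length - 1) // line_length
    return encoded + 2 * lines


def estimated_email_size(attachment_sizes, other_bytes: int = 64 * 1024) -> int:
    """Encoded attachments plus a guessed allowance for headers and body text."""
    return sum(base64_mime_size(n) for n in attachment_sizes) + other_bytes


def fits_email_limit(attachment_sizes, other_bytes: int = 64 * 1024) -> bool:
    """True if the estimated raw email size stays within the gateway limit."""
    return estimated_email_size(attachment_sizes, other_bytes) <= EMAIL_LIMIT_BYTES
```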
Document Specification
Scans and photos should have a resolution of at least 150 DPI
The minimum font size on a document should be 6 pt
Documents should be in A4 or Letter format (small documents such as receipts should be scanned on top of a blank A4 page)
Each page may contain only one document (e.g., two receipts on one page cannot be extracted separately)
Scans should not have extremely large dimensions; ideally, no more than 3000 pixels on each side
A maximum of 50,000 extracted fields of all types per document is supported; for master data enum fields, every two enum rows count as one additional field
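Two of the specifications above reduce to simple arithmetic. If a scan covers a full A4 (210 × 297 mm) or Letter (8.5 × 11 in) page, its pixel dimensions determine its effective resolution: 150 DPI on A4 corresponds to roughly 1240 × 1754 pixels, and the 3000-pixel guideline caps the long side of an A4 scan at about 256 DPI. The field limit can likewise be estimated by adding half the number of enum rows to the field count. This sketch assumes the image covers the whole page and that an odd number of enum rows is rounded up; both are assumptions, and all names are hypothetical.

```python
MM_PER_INCH = 25.4

# Physical page sizes in inches as (short side, long side).
PAGE_SIZES_IN = {
    "A4": (210 / MM_PER_INCH, 297 / MM_PER_INCH),
    "Letter": (8.5, 11.0),
}
MIN_DPI = 150        # minimum resolution for scans/photos
MAX_SIDE_PX = 3000   # soft guideline ("ideally") for each side
MAX_FIELDS = 50_000  # extracted fields per document


def effective_dpi(width_px: int, height_px: int, page: str = "A4") -> float:
    """Resolution implied by the pixel size, assuming the image covers the full page."""
    page_short, page_long = PAGE_SIZES_IN[page]
    # Pair the shorter image side with the shorter page side so that
    # landscape-oriented scans are handled the same way as portrait ones.
    img_short, img_long = sorted((width_px, height_px))
    return min(img_short / page_short, img_long / page_long)


def check_scan(width_px: int, height_px: int, page: str = "A4") -> list:
    """Return human-readable warnings; an empty list means no issues found."""
    warnings = []
    dpi = effective_dpi(width_px, height_px, page)
    if dpi < MIN_DPI:
        warnings.append(f"resolution about {dpi:.0f} DPI is below {MIN_DPI} DPI")
    if max(width_px, height_px) > MAX_SIDE_PX:
        warnings.append(f"longest side exceeds {MAX_SIDE_PX} px")
    return warnings


def field_budget_used(fields: int, enum_rows: int) -> int:
    """Fields counted toward the limit: every two enum rows add one field.

    Rounding an odd number of enum rows up is an assumption.
    """
    return fields + (enum_rows + 1) // 2


def within_field_limit(fields: int, enum_rows: int = 0) -> bool:
    """True if the estimated field count stays within MAX_FIELDS."""
    return field_budget_used(fields, enum_rows) <= MAX_FIELDS
```

A 300 DPI A4 scan (about 2480 × 3508 pixels) passes the resolution check but exceeds the 3000-pixel guideline, so downscaling it to roughly 200 to 250 DPI would satisfy both.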