Rossum lets you process documents in many different languages. Here’s a complete list of the supported languages and the features available for each one.
Legacy Data Capture
Before Aurora, Rossum could recognize the following languages: Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Lithuanian, Norwegian, Polish, Portuguese, Brazilian Portuguese, Romanian, Slovak, Slovenian, Spanish, Catalan, and Swedish. We also provided support for Japanese and Chinese.
With the release of Aurora 1.5, the list of supported languages expanded significantly. We now offer full or partial support for 276 languages. This article provides all the details.
Rossum Aurora – Fully Supported Languages
For all the languages listed below, Document Language Classification is available, meaning Rossum automatically predicts the language of each document. Instant Learning is also supported, which means our AI can learn from annotations in these languages.
Most of these languages support pre-trained fields, meaning the AI can provide predictions for certain fields right away, without needing extra training or previous annotations. You can find the full list of fields here.
For some languages, we support full or partial recognition of handwritten text. Partial handwriting recognition means that some diacritics may not be recognized.
✅ – Available
❌ – Not available
Language | Instant Learning | Document Language Classification | Pre-trained fields | Handwritten text recognition* |
---|---|---|---|---|
Albanian | ✅ | ✅ | ✅ | PARTIAL |
Arabic | ✅ | ✅ | ❌ | ❌ |
Azerbaijani | ✅ | ✅ | ❌ | ❌ |
Basque | ✅ | ✅ | ✅ | PARTIAL |
Belarusian | ✅ | ✅ | ❌ | ❌ |
Bosnian | ✅ | ✅ | ❌ | ❌ |
Bulgarian | ✅ | ✅ | ❌ | ❌ |
Catalan | ✅ | ✅ | ✅ | PARTIAL |
Chinese Simplified | ✅ | ✅ | ✅ | ✅ |
Chinese Traditional | ✅ | ✅ | ❌ | ❌ |
Croatian | ✅ | ✅ | ✅ | PARTIAL |
Czech | ✅ | ✅ | ✅ | PARTIAL |
Danish | ✅ | ✅ | ✅ | PARTIAL |
Dutch | ✅ | ✅ | ✅ | PARTIAL |
English | ✅ | ✅ | ✅ | ✅ |
Estonian | ✅ | ✅ | ✅ | PARTIAL |
Finnish | ✅ | ✅ | ✅ | PARTIAL |
French | ✅ | ✅ | ✅ | ✅ |
German | ✅ | ✅ | ✅ | ✅ |
Greek | ✅ | ✅ | ❌ | ❌ |
Hebrew | ✅ | ✅ | ❌ | ❌ |
Hindi | ✅ | ✅ | ❌ | ❌ |
Hungarian | ✅ | ✅ | ✅ | PARTIAL |
Icelandic | ✅ | ✅ | ✅ | PARTIAL |
Indonesian | ✅ | ✅ | ✅ | PARTIAL |
Italian | ✅ | ✅ | ✅ | ✅ |
Japanese | ✅ | ✅ | ❌ | ✅ |
Kazakh | ✅ | ✅ | ❌ | ❌ |
Korean | ✅ | ✅ | ❌ | ✅ |
Latvian | ✅ | ✅ | ✅ | PARTIAL |
Lithuanian | ✅ | ✅ | ✅ | PARTIAL |
Malay | ✅ | ✅ | ❌ | ❌ |
Norwegian | ✅ | ✅ | ✅ | PARTIAL |
Persian | ✅ | ✅ | ❌ | ❌ |
Polish | ✅ | ✅ | ✅ | PARTIAL |
Portuguese | ✅ | ✅ | ✅ | PARTIAL |
Romanian | ✅ | ✅ | ✅ | PARTIAL |
Russian | ✅ | ✅ | ❌ | ❌ |
Serbian | ✅ | ✅ | ❌ | ❌ |
Slovak | ✅ | ✅ | ✅ | PARTIAL |
Slovenian | ✅ | ✅ | ✅ | PARTIAL |
Spanish | ✅ | ✅ | ✅ | PARTIAL |
Swedish | ✅ | ✅ | ✅ | PARTIAL |
Thai | ✅ | ✅ | ❌ | ❌ |
Turkish | ✅ | ✅ | ✅ | PARTIAL |
Ukrainian | ✅ | ✅ | ❌ | ❌ |
Vietnamese | ✅ | ✅ | ✅ | PARTIAL |
*To enable recognition of handwritten text, the queue locale must be set to auto (Document regional format -> Detect automatically).
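The way the handwriting column and the footnote combine can be sketched in a few lines of Python. This is only an illustrative lookup, not part of any Rossum SDK: the names `SUPPORT`, `Handwriting`, and `effective_handwriting` are hypothetical, the four rows are copied from the table above, and the locale value `en_GB` is an assumed example of a non-auto setting.

```python
from enum import Enum


class Handwriting(Enum):
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


# A few rows copied from the table above (hypothetical structure):
# language -> (pre-trained fields available, handwriting support level)
SUPPORT = {
    "English": (True, Handwriting.FULL),
    "Czech": (True, Handwriting.PARTIAL),
    "Japanese": (False, Handwriting.FULL),
    "Arabic": (False, Handwriting.NONE),
}


def effective_handwriting(language: str, queue_locale: str) -> Handwriting:
    """Return the handwriting level that applies for a language,
    given the queue locale setting described in the footnote."""
    if language not in SUPPORT:
        raise KeyError(f"{language!r} is not in this sample of the table")
    # Per the footnote, handwriting recognition requires the locale to be "auto".
    if queue_locale != "auto":
        return Handwriting.NONE
    return SUPPORT[language][1]


print(effective_handwriting("Czech", "auto"))   # partial support applies
print(effective_handwriting("Czech", "en_GB"))  # locale not auto, so none
```

The point of the sketch is that the table alone is not enough to decide whether handwritten text will be read: the queue's regional format setting also has to be taken into account.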
Rossum Aurora – Instant Learning Support
For the languages listed below, we support Instant Learning, meaning our AI will learn from annotations made in these languages. Currently, no additional features are available.
Afrikaans, Abaza, Abkhazian, Achinese, Acoli, Adangme, Adyghe, Akan, Algonquin, Angika, Asturian, Asu, Avaric, Awadhi, Aymara, Bafia, Bagheli, Bambara, Bashkir, Bemba, Bena, Bhojpuri, Bikol, Bini, Bislama, Bodo, Brajbha, Breton, Bundeli, Buryat, Cebuano, Chamling, Chamorro, Chechen, Chhattisgarhi, Chiga, Choctaw, Chukot, Chuvash, Cornish, Corsican, Creek, Crimean Tatar, Crow, Dargwa, Dari, Dhimal, Dogri, Duala, Dungan, Efik, Erzya, Faroese, Fijian, Filipino, Fon, Friulian, Ga, Gagauz, Galician, Ganda, Gayo, Gilbertese, Greenlandic, Guarani, Gurung, Gusii, Haitian Creole, Halbi, Hani, Haryanvi, Hawaiian, Herero, Hiligaynon, Hmong Daw, Ho, Iban, Igbo, Iloko, Inari Sami, Ingush, Interlingua, Irish, Jaunsari, Jola-Fonyi, K’iche’, Kabardian, Kabuverdianu, Kachin, Kalenjin, Kalmyk, Kangri, Kanuri, Kara-Kalpak, Karachay-Balkar, Kashubian, Khakas, Khaling, Khasi, Kikuyu, Kildin Sami, Kinyarwanda, Komi, Kongo, Korku, Koryak, Kosraean, Kpelle, Kuanyama, Kumyk, Kurdish, Kurukh, Kyrgyz, Lak, Lakota, Latin, Lezghian, Lingala, Lower Sorbian, Lozi, Lule Sami, Luo, Luxembourgish, Luyia, Macedonian, Machame, Madurese, Mahasu Pahari, Makhuwa-Meetto, Makonde, Malagasy, Maltese, Malto, Mandinka, Manx, Maori, Mapudungun, Marathi, Mari, Masai, Mende, Meru, Meta’, Minangkabau, Mohawk, Mongondow, Montenegrin, Morisyen, Mundang, Nahuatl, Navajo, Ndonga, Neapolitan, Nepali, Ngomba, Niuean, Nogay, North Ndebele, Northern Sami, Nyanja, Nyankole, Nzima, Occitan, Ossetic, Pampanga, Pangasinan, Papiamento, Pashto, Pedi, Quechua, Ripuarian, Romansh, Rundi, Rwa, Sadri, Sakha, Samburu, Samoan, Sango, Sangu, Sanskrit, Scots, Scottish Gaelic, Sena, Shambala, Shona, Siksika, Sirmauri, Skolt Sami, Soga, Somali, Songhai, South Ndebele, Southern Altai, Southern Sami, Southern Sotho, Swahili, Swati, Tabassaran, Tahitian, Taita, Tajik, Tatar, Teso, Tetum, Thangmi, Tok Pisin, Tongan, Tsonga, Tswana, Turkmen, Tuvan, Udmurt, Uighur, Upper Sorbian, Urdu, Uzbek, Volapük, Vunjo, Walser, Welsh, Western Frisian, Wolof, Xhosa, Yucatec Maya, Zapotec, Zarma, Zhuang, Zulu