Import Data

Datasaur supports a wide variety of import formats. The available formats depend on the task type, as shown in the table below. Click on any format to see a detailed explanation of the file structure Datasaur expects.

Available Formats

Task Type                  Import Formats
Token-based                .txt, .tsv, .json
Token-based with arrows    .txt, .tsv, .json
Row-based                  .tsv, .csv, .xls, .xlsx, .txt
Document-based *           .md, .pdf, .jpeg, .jpg, .png, .gif, .svg, .bmp, .tiff, .tif, .webp
OCR **                     Media: .pdf, .jpeg, .jpg, .png
                           Transcription: .txt

Important Notes

* Document-based formats only work if you import the files through the Project Creation Wizard.

** When uploading pairs of OCR documents, make sure each image file and its corresponding transcription have the same file name apart from the extension, for example unicef.jpg and unicef.txt.
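
The OCR naming rule can be checked locally before uploading. The following is a minimal sketch, not part of Datasaur: the function name find_unpaired_ocr_files is hypothetical, the media extensions are the ones listed for OCR above, and the comparison assumes the base names must match exactly, including letter case.

```python
from pathlib import Path

# Media extensions listed for OCR imports; transcriptions are .txt files.
MEDIA_EXTENSIONS = {".pdf", ".jpeg", ".jpg", ".png"}
TRANSCRIPTION_EXTENSION = ".txt"


def find_unpaired_ocr_files(filenames):
    """Return (media files without a transcription, transcriptions without media).

    Hypothetical helper: pairs files by base name (the name without its
    extension), assuming an exact, case-sensitive match is required.
    """
    media_by_stem = {}
    transcription_by_stem = {}
    for name in filenames:
        path = Path(name)
        suffix = path.suffix.lower()
        if suffix in MEDIA_EXTENSIONS:
            media_by_stem[path.stem] = name
        elif suffix == TRANSCRIPTION_EXTENSION:
            transcription_by_stem[path.stem] = name

    missing_transcription = sorted(
        name for stem, name in media_by_stem.items()
        if stem not in transcription_by_stem
    )
    missing_media = sorted(
        name for stem, name in transcription_by_stem.items()
        if stem not in media_by_stem
    )
    return missing_transcription, missing_media


if __name__ == "__main__":
    files = ["unicef.jpg", "unicef.txt", "report.pdf", "notes.txt"]
    print(find_unpaired_ocr_files(files))
```

In this example unicef.jpg and unicef.txt form a valid pair, while report.pdf and notes.txt are each reported as missing their counterpart.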