Import Data

Datasaur supports a wide variety of import formats. The available formats depend on the task type, as shown in the table below. Click on any format to see a detailed explanation of the file structure Datasaur expects.

Available Formats

Task Type                  Import Formats
Token-based                .txt, .tsv, .json
Token-based with arrows    .txt, .tsv, .json
Row-based                  .tsv, .csv, .xls, .xlsx, .txt
Document-based*            .md, .pdf, .jpeg, .jpg, .png, .gif, .svg, .bmp, .tiff, .tif, .webp
OCR**                      Media: .pdf, .jpeg, .jpg, .png
                           Transcription: .txt
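As a rough illustration, the table above can be turned into a local pre-upload check. The sketch below is not part of Datasaur; the dictionary keys and the `is_supported` helper are hypothetical names, and treating extensions as case-insensitive is an assumption.

```python
from pathlib import Path

# Extensions per task type, transcribed from the table above.
# The keys are illustrative labels, not Datasaur identifiers.
IMPORT_FORMATS = {
    "token-based": {".txt", ".tsv", ".json"},
    "token-based with arrows": {".txt", ".tsv", ".json"},
    "row-based": {".tsv", ".csv", ".xls", ".xlsx", ".txt"},
    "document-based": {
        ".md", ".pdf", ".jpeg", ".jpg", ".png", ".gif",
        ".svg", ".bmp", ".tiff", ".tif", ".webp",
    },
    "ocr-media": {".pdf", ".jpeg", ".jpg", ".png"},
    "ocr-transcription": {".txt"},
}


def is_supported(filename: str, task_type: str) -> bool:
    """Return True if the file's extension is listed for the task type.

    Assumption: extensions are compared case-insensitively.
    Raises KeyError for a task type that is not in IMPORT_FORMATS.
    """
    return Path(filename).suffix.lower() in IMPORT_FORMATS[task_type]
```

For example, `is_supported("reviews.csv", "row-based")` returns `True`, while `is_supported("reviews.csv", "token-based")` returns `False`.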

Important Notes

*Document-based formats only work if you import the files through the Project Creation Wizard.

**When uploading pairs of OCR documents, make sure each image file and its corresponding transcription have the same file name apart from the extension, for example unicef.jpg and unicef.txt.
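The OCR naming rule can be checked locally before uploading. The following is a minimal sketch, not a Datasaur tool: the `find_unpaired` helper is a hypothetical name, and it assumes that names are matched exactly (case-sensitive) on the part before the extension and that each base name has at most one media file.

```python
from pathlib import Path

# OCR extensions taken from the table of available formats.
MEDIA_EXTENSIONS = {".pdf", ".jpeg", ".jpg", ".png"}
TRANSCRIPTION_EXTENSION = ".txt"


def find_unpaired(filenames):
    """Return (media_without_transcription, transcriptions_without_media).

    Files are paired by their name without the extension (the stem).
    Files with any other extension are ignored.
    """
    media = {}
    transcriptions = {}
    for name in filenames:
        path = Path(name)
        suffix = path.suffix.lower()
        if suffix in MEDIA_EXTENSIONS:
            media[path.stem] = name
        elif suffix == TRANSCRIPTION_EXTENSION:
            transcriptions[path.stem] = name

    missing_txt = sorted(media[s] for s in media.keys() - transcriptions.keys())
    missing_media = sorted(
        transcriptions[s] for s in transcriptions.keys() - media.keys()
    )
    return missing_txt, missing_media
```

Calling `find_unpaired(["unicef.jpg", "unicef.txt", "who.png", "notes.txt"])` returns `(["who.png"], ["notes.txt"])`: the unicef pair matches, who.png has no transcription, and notes.txt has no media file.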