Import Data

Datasaur supports a wide variety of import formats. Which formats are available depends on the task type, as listed in the table below. Each format has its own page describing the file structure Datasaur expects.

Available Formats

| Task Type | Import Formats |
|---|---|
| Token-based | .txt, .tsv, .json |
| Token-based with arrows | .txt, .tsv, .json, .conllu |
| Row-based | .tsv, .csv, .xls, .xlsx, .txt |
| Document-based* | .md, .pdf, .jpeg, .jpg, .png, .gif, .svg, .bmp, .tiff, .tif, .webp |
| OCR** | Media: .pdf, .jpeg, .jpg, .png<br>Transcription: .txt, .json, .tsv |
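If you prepare files with a script, the format table can be mirrored as a lookup so unsupported files are caught before upload. The following is a minimal sketch, not part of any Datasaur API: the task-type keys and the `is_supported` helper are hypothetical names chosen for illustration, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Import formats per task type, copied from the format table.
# The keys are hypothetical labels for this sketch, not Datasaur identifiers.
IMPORT_FORMATS: dict[str, set[str]] = {
    "token-based": {".txt", ".tsv", ".json"},
    "token-based-arrows": {".txt", ".tsv", ".json", ".conllu"},
    "row-based": {".tsv", ".csv", ".xls", ".xlsx", ".txt"},
    "document-based": {".md", ".pdf", ".jpeg", ".jpg", ".png", ".gif",
                       ".svg", ".bmp", ".tiff", ".tif", ".webp"},
    "ocr-media": {".pdf", ".jpeg", ".jpg", ".png"},
    "ocr-transcription": {".txt", ".json", ".tsv"},
}


def is_supported(filename: str, task_type: str) -> bool:
    """Return True if the file's extension is listed for the given task type."""
    allowed = IMPORT_FORMATS.get(task_type)
    if allowed is None:
        raise ValueError(f"unknown task type: {task_type!r}")
    # Assumption: extensions are compared case-insensitively ("Scan.PNG" -> ".png").
    return Path(filename).suffix.lower() in allowed
```

For example, `is_supported("sentences.conllu", "token-based-arrows")` is `True`, while the same file checked against `"token-based"` is `False`, because .conllu is only listed for tasks with arrows.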

Important Notes

*Document-based formats are only supported when you import the files through the Project Creation Wizard.

**When uploading pairs of OCR documents, make sure each image file and its corresponding transcription have the same file name, for example unicef.jpg and unicef.txt.
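Because OCR media and transcriptions are matched by file name, a quick local check can flag files that have no partner before you upload. This is a minimal sketch under stated assumptions: the `pair_ocr_files` helper is hypothetical, it treats "same file name" as an exact match on the name without its extension, and it uses the OCR extensions from the format table.

```python
from pathlib import Path

# OCR extensions from the format table.
MEDIA_EXTS = {".pdf", ".jpeg", ".jpg", ".png"}
TRANSCRIPTION_EXTS = {".txt", ".json", ".tsv"}


def pair_ocr_files(filenames: list[str]) -> tuple[dict[str, tuple[str, str]], list[str]]:
    """Match each media file to a transcription with the same base name.

    Returns (pairs, unmatched): pairs maps base name -> (media, transcription);
    unmatched lists files without a counterpart or with a non-OCR extension.
    """
    media: dict[str, str] = {}
    transcripts: dict[str, str] = {}
    unmatched: list[str] = []
    for name in filenames:
        path = Path(name)
        ext = path.suffix.lower()
        # Assumption: base names must match exactly ("unicef" == "unicef").
        # In this sketch a later file with the same base name replaces an earlier one.
        if ext in MEDIA_EXTS:
            media[path.stem] = name
        elif ext in TRANSCRIPTION_EXTS:
            transcripts[path.stem] = name
        else:
            unmatched.append(name)  # not an OCR import format
    pairs = {stem: (media[stem], transcripts[stem])
             for stem in media.keys() & transcripts.keys()}
    unmatched += [n for stem, n in media.items() if stem not in transcripts]
    unmatched += [n for stem, n in transcripts.items() if stem not in media]
    return pairs, unmatched
```

With `["unicef.jpg", "unicef.txt", "report.pdf"]`, the sketch pairs unicef.jpg with unicef.txt and reports report.pdf as unmatched, since it has no transcription with the same name.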