Import Data

Datasaur supports a wide variety of import data formats. The available formats depend on the task type, as described in the table below. Click on any format to see a detailed explanation of the file structure Datasaur expects.

Available Formats

Task Type
Import Formats
Token-based
.txt, .tsv, .json
Token-based with arrows
Row-based
.tsv, .csv, .xls, .xlsx, .txt
Document-based*
OCR
Media: .pdf, .jpeg, .jpg, .png, .tiff, .tif
Transcription: .txt, .json, .tsv
Audio
Media: .mp3, .flac, .wav
Transcription: .srt, .vtt
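
The formats listed above can be checked before you upload anything. The following is a minimal sketch, not part of Datasaur: the dictionary keys and the `is_supported` helper are hypothetical names, task types with no formats listed above are left out, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions copied from the table above. The keys are hypothetical labels
# chosen for this sketch; they are not Datasaur identifiers.
IMPORT_FORMATS = {
    "token-based": {".txt", ".tsv", ".json"},
    "row-based": {".tsv", ".csv", ".xls", ".xlsx", ".txt"},
    "ocr-media": {".pdf", ".jpeg", ".jpg", ".png", ".tiff", ".tif"},
    "ocr-transcription": {".txt", ".json", ".tsv"},
    "audio-media": {".mp3", ".flac", ".wav"},
    "audio-transcription": {".srt", ".vtt"},
}


def is_supported(filename: str, task_type: str) -> bool:
    """Return True if the file's extension is listed for the given task type."""
    # Assumption: extensions are compared case-insensitively.
    return Path(filename).suffix.lower() in IMPORT_FORMATS[task_type]
```

For example, `is_supported("reviews.csv", "row-based")` is true, while `is_supported("reviews.csv", "token-based")` is false.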

Size limit

  • Document: 50 MB per file
  • Audio project: 100 MB per file
  • Project size: 125 MB
If you would like to create projects with larger files, you can do so with Robosaur. Please reach out to us at [email protected] for further assistance.
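
The limits above can be checked locally before creating a project. The sketch below rests on several assumptions that this page does not confirm: 1 MB is taken as 1024 * 1024 bytes, the audio limit is applied to files with an audio media extension, and the project size is taken as the sum of all file sizes. The `check_sizes` helper is a hypothetical name.

```python
from pathlib import Path
from typing import Dict, List

# Assumption: this page does not say whether MB means 10^6 or 2^20 bytes.
MB = 1024 * 1024

DOCUMENT_LIMIT = 50 * MB   # per document file
AUDIO_LIMIT = 100 * MB     # per file in an audio project
PROJECT_LIMIT = 125 * MB   # whole project

AUDIO_EXTENSIONS = {".mp3", ".flac", ".wav"}


def check_sizes(sizes: Dict[str, int]) -> List[str]:
    """Take a mapping of file name to size in bytes and return a list of problems."""
    problems = []
    for name, size in sizes.items():
        # Assumption: the audio limit applies to audio media files only.
        is_audio = Path(name).suffix.lower() in AUDIO_EXTENSIONS
        limit = AUDIO_LIMIT if is_audio else DOCUMENT_LIMIT
        if size > limit:
            problems.append(f"{name} exceeds the per-file limit of {limit // MB} MB")
    # Assumption: project size is the sum of the individual file sizes.
    if sum(sizes.values()) > PROJECT_LIMIT:
        problems.append(f"project exceeds the limit of {PROJECT_LIMIT // MB} MB")
    return problems
```

The sizes can be collected with `os.path.getsize` for each file. An empty result means none of the limits, as interpreted here, is exceeded.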

Important Notes

Document-based formats only work if you import the files through the Project Creation Wizard.
When uploading pairs of OCR documents, make sure each image file and its corresponding transcription file have the same file name apart from the extension. For example, unicef.jpg and unicef.txt.
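
The naming rule for OCR pairs can be verified before upload. This is a minimal sketch under stated assumptions: the `find_unpaired` helper is a hypothetical name, file names are compared by their stem (the name without its extension), and the comparison is case-sensitive.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Extensions copied from the OCR formats listed on this page.
MEDIA_EXTENSIONS = {".pdf", ".jpeg", ".jpg", ".png", ".tiff", ".tif"}
TRANSCRIPTION_EXTENSIONS = {".txt", ".json", ".tsv"}


def find_unpaired(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Report media files and transcription files that have no partner."""
    media: Dict[str, str] = {}
    transcriptions: Dict[str, str] = {}
    for name in filenames:
        path = Path(name)
        ext = path.suffix.lower()
        if ext in MEDIA_EXTENSIONS:
            media[path.stem] = name
        elif ext in TRANSCRIPTION_EXTENSIONS:
            transcriptions[path.stem] = name
    # Assumption: a pair matches when the stems are identical, case included.
    return {
        "media_without_transcription": sorted(
            media[stem] for stem in media.keys() - transcriptions.keys()
        ),
        "transcription_without_media": sorted(
            transcriptions[stem] for stem in transcriptions.keys() - media.keys()
        ),
    }
```

Calling `find_unpaired(["unicef.jpg", "unicef.txt", "report.png"])` reports `report.png` as a media file without a transcription, while the unicef pair passes.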