Data Samples

This page provides sample datasets so you can create a project right away and test the labeling interface. Datasaur supports several project types, and samples are provided here for the following: span-based, span-based with arrows, row-based, bounding-box, and document-based. If you would like to create an Audio or LLM project, select their respective links; each of those pages contains its own sample data for you to upload.

Span-based

The following zip files include a sample dataset and label sets. The .txt files in each zip contain the dataset to be labeled; upload them in Step 1 of the project creation wizard (PCW). The .csv file is the label set (taxonomy) to be applied to the dataset; upload it in Step 3 of the PCW.
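If you prefer to inspect a sample zip before uploading, the split described above (dataset files for Step 1, label set files for Step 3) can be sketched in a few lines of Python. This is only an illustrative helper, not part of Datasaur: the function name `split_sample_zip`, the result keys, and the file name `sample.zip` are hypothetical, and the sketch assumes the files can be told apart by extension alone.

```python
import zipfile
from pathlib import Path


def split_sample_zip(zip_path, dataset_ext=".txt", labelset_ext=".csv"):
    """Group the file names inside a sample zip by the wizard step they belong to.

    Hypothetical helper: it only reads the archive listing and sorts names
    by extension. It does not upload anything.
    """
    dataset_files, labelset_files = [], []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            suffix = Path(name).suffix.lower()
            if suffix == dataset_ext:
                dataset_files.append(name)
            elif suffix == labelset_ext:
                labelset_files.append(name)
    return {
        "step_1_dataset": sorted(dataset_files),
        "step_3_labelset": sorted(labelset_files),
    }
```

For example, `split_sample_zip("sample.zip")` returns the .txt names under `step_1_dataset` and the .csv names under `step_3_labelset`. Passing `dataset_ext=".tsv"` adapts the same sketch to the span-based with arrows sample.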

Span-based with arrows

The following zip files include a sample dataset and label sets. The .tsv files in each zip contain the dataset to be labeled; upload them in Step 1 of the project creation wizard (PCW). The .csv file is the label set (taxonomy) to be applied to the dataset; upload it in Step 3 of the PCW.

Row-based (textual classification)

Upload this .csv in Step 1 of the PCW. In Step 3, you can create your question set either through the UI or by uploading a .csv. This example is a sentiment analysis task.

Document-based (document/image classification)

The following zip files include images and PDFs for you to create a document-based project. Make sure to choose only one file type when uploading the dataset in Step 1 of the PCW. You can create your question set either in the UI or by uploading a .csv.
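The "one file type" rule can be checked locally before you start the wizard. The sketch below is a hypothetical helper, not a Datasaur API: the names `file_types` and `is_single_type` are assumptions, and it compares extensions only, so `.jpg` and `.jpeg` would count as two different types.

```python
from pathlib import Path


def file_types(paths):
    """Return the set of lowercase file extensions found in paths."""
    return {Path(p).suffix.lower() for p in paths}


def is_single_type(paths):
    """True when every file in the selection shares one extension."""
    return len(file_types(paths)) == 1
```

For instance, `is_single_type(["a.pdf", "b.pdf"])` is true, while a selection mixing `a.pdf` and `b.jpg` is not. The same check applies to any upload that should contain a single file type.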

Bounding-box based

If you would like to create a bounding-box project, you can use the datasets below. They include PDFs and .jpg images; please upload only one file type in Step 1 of the PCW. In Step 3, you can provide your label set (taxonomy) either by uploading a .csv or by creating it in the UI.
