Datasaur

Task Types

Token-based

Token-based projects allow you to label individual tokens or token spans in a text document. This type is well suited to tasks such as named entity recognition (NER) and part-of-speech (POS) tagging.
Before submitting your project, you can also adjust the additional settings described below.
  • Limit selection to a maximum of 1 token: restricts each selection to a single token. This is useful when every token in the document must be labeled individually.
  • Tokens and token spans should have at most one label: prevents you from adding multiple labels to the same token or token span.
  • Allow arrows to be drawn between labels: lets you draw arrows from one label to another to annotate relationships between words, for example to show that an adjective modifies a noun or that a pronoun refers to a person.
  • Default text selection: lets you choose between token selection and character selection. Some languages, e.g. Mandarin, Korean, or Thai, may require character selection.
Note: If you have already created a project, you can change the configurations through Settings.
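As a rough illustration of the settings above, the sketch below models token spans, arrows between labels, and the at-most-one-label rule in plain Python. The names `SpanLabel`, `Arrow`, and `violates_single_label` are hypothetical and are not part of any Datasaur API; the last lines only hint at why whitespace-based token selection may not suit text written without spaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanLabel:
    """Hypothetical record for one labeled token span."""
    start: int   # index of the first token in the span
    end: int     # index of the last token in the span (inclusive)
    label: str


@dataclass(frozen=True)
class Arrow:
    """Hypothetical record for a relationship drawn between two labels."""
    source: SpanLabel
    target: SpanLabel
    relation: str


def violates_single_label(labels):
    """Return True if the same span carries more than one label."""
    seen = set()
    for item in labels:
        key = (item.start, item.end)
        if key in seen:
            return True
        seen.add(key)
    return False


tokens = "Maria said she likes green tea".split()

maria = SpanLabel(0, 0, "PERSON")
she = SpanLabel(2, 2, "PRONOUN")
green = SpanLabel(4, 4, "ADJ")
tea = SpanLabel(5, 5, "NOUN")
labels = [maria, she, green, tea]

# A pronoun referring to a person, and an adjective related to a noun.
arrows = [Arrow(she, maria, "refers_to"), Arrow(green, tea, "modifies")]

# Whitespace splitting yields a single "token" for text without spaces,
# whereas character-level selection can address each character.
mandarin = "我喜欢绿茶"
whitespace_tokens = mandarin.split()
characters = list(mandarin)
```

With the single-label setting in mind, adding a second label to the span already labeled `PERSON` would make `violates_single_label` return `True`.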

Row-based

Row-based projects allow you to label tabular data on a row-by-row basis. This type is commonly used for classification or data extraction.
Unlike token-based labeling, you label entire sentences by answering questions about each row. If your data is already in a table format (e.g. .xls or .csv), you will be automatically guided to select row-based.
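To make the row-by-row idea concrete, here is a minimal sketch that reads a small CSV and attaches an answer to each row. The column names, the `questions` dictionary, and the `answer_row` helper are all made up for illustration and do not reflect how Datasaur stores labels.

```python
import csv
import io

# Hypothetical input: a tiny CSV held in memory instead of a file on disk.
raw = io.StringIO(
    "id,text\n"
    "1,The delivery arrived two days late.\n"
    "2,Great support and a quick refund.\n"
)

# Hypothetical question set: each question has a fixed list of allowed answers.
questions = {"sentiment": ["positive", "negative", "neutral"]}


def answer_row(row, answers):
    """Return a copy of the row with the validated answers attached."""
    for question, value in answers.items():
        if value not in questions[question]:
            raise ValueError(f"{value!r} is not a valid answer for {question!r}")
    return {**row, **answers}


rows = list(csv.DictReader(raw))
labeled = [
    answer_row(rows[0], {"sentiment": "negative"}),
    answer_row(rows[1], {"sentiment": "positive"}),
]
```

Each entry in `labeled` keeps the original columns and adds one key per answered question, so the whole row, not a token inside it, is the unit being labeled.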

Document-based

Document-based projects ask you to label an entire document at a time. This can be useful when answering questions about a document such as a .pdf, or when classifying images.

Bounding box-based

Bounding box-based projects allow you to label specific areas of an image or document file by enclosing them in a bounding box. This helps you mark the relevant parts of the document or image. You can also add a corresponding text transcription to each bounding box, which is commonly used for OCR use cases.
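One way to picture a bounding box label with a transcription is the sketch below. The `BoundingBox` class, its pixel-based fields, and the `label_at` helper are assumptions chosen for illustration, not Datasaur's actual data format.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Hypothetical labeled region; coordinates are pixels from the top-left corner."""
    x: float
    y: float
    width: float
    height: float
    label: str
    transcription: str = ""  # optional text read from the region (OCR use case)

    def contains(self, px, py):
        """Return True if the point (px, py) lies inside the box."""
        return (
            self.x <= px <= self.x + self.width
            and self.y <= py <= self.y + self.height
        )


boxes = [
    BoundingBox(40, 30, 200, 24, "invoice_number", "INV-0042"),
    BoundingBox(40, 70, 120, 24, "total", "$118.50"),
]


def label_at(px, py):
    """Return the label of the first box containing the point, or None."""
    for box in boxes:
        if box.contains(px, py):
            return box.label
    return None
```

Here `label_at(50, 40)` falls inside the first box and `label_at(0, 0)` falls outside both, which mirrors how a label applies only to the enclosed area.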