Datasaur

Let's Get Labeling!

Label projects fall under three broad categories: token-based, row-based, and document-based. We'll take a look at examples of each below.

Token-based

In token-based labeling, you apply labels to individual tokens or to spans of tokens. It is well suited to tasks such as named entity recognition (NER) and part-of-speech (POS) tagging. Here is what you need to know before you start labeling your project. (See this YouTube video for a visual guide to token-based labeling.)

Keyboard shortcuts

The label box appears when you click a token. You can click a label directly or press its keyboard shortcut, the number shown next to it (1, 2, 3, 4, and so on).
Because shortcuts use the numeral keys 1 to 9, they are available only for the first 9 labels.

Search for labels

You can search for a label in the label box by typing part of its name. For example, you could type "NNP" and then press 2 to apply the NNPS label.
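The NNP example suggests that the number shortcuts are renumbered over the filtered list of labels. The following is a hypothetical Python model of that behavior, not Datasaur's code: it assumes the search is a case-sensitive substring match, and the label list and helper names are illustrative only.

```python
def filter_labels(labels, query):
    """Return labels whose names contain the query (assumed substring match)."""
    return [label for label in labels if query in label]


def label_for_key(labels, key):
    """Map a numeral key '1'..'9' to a label; labels past the ninth get no shortcut."""
    if key not in set("123456789"):
        return None
    index = int(key) - 1
    return labels[index] if index < len(labels) else None


labels = ["NN", "NNS", "NNP", "NNPS", "VB", "VBD"]  # illustrative POS tag set
matches = filter_labels(labels, "NNP")
print(matches)                      # ['NNP', 'NNPS']
print(label_for_key(matches, "2"))  # NNPS
```

Because only keys 1 to 9 map to positions, any label after the ninth in the visible list has no shortcut, which matches the limit described above.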

Apply overlapping or multiple labels

You can apply overlapping labels, or even multiple labels, to the same token or span in two ways:
  • Select the token or span, hold shift, and click the appropriate label.
  • Use keyboard shortcuts: select the span, use the up and down arrow keys to find the right label, then press shift + enter (or shift + return).

Edit sentence

To edit a sentence, double-click its row. While you edit, the original sentence is shown for reference. Note that the edited sentence is tokenized on the server. To apply your changes, do one of the following:
  • Press shift + enter to use spaces as token separators instead of Datasaur's default tokenizer.
  • Click Save to apply the changes with the default tokenizer.

Personalization

Datasaur aims to balance comfort and control for all users. You can adjust the font type and size under File > Settings.

Insert new lines

To add a new line, right-click a row and choose Insert Line Above or Insert Line Below.

Delete lines

To delete a line, right-click the row and choose Delete Line.

Delete sentence labels

You can delete all the labels on a given sentence by right-clicking anywhere in the sentence and choosing Delete Sentence Labels.

Draw arrows

Once you have labeled tokens, you can draw arrows between labels:
  1. Click the source label.
  2. Hold the mouse button down.
  3. Drag to the other label.
  • You can also apply labels to the arrows themselves: double-click an arrow and select the appropriate label.
  • Right-click an arrow to reverse it, delete it, or delete its labels.

Go Menu

You can move to the desired line via the Go menu.
  • Go to Start will take you to the first line.
  • Go to End will take you to the last line.
  • Go to Line will take you to a specific line.
  • Go to Next Unlabeled Token will take you to the next unlabeled token.
  • Go to Previous Unlabeled Token will take you to the previous unlabeled token.
  • Go to Next Unlabeled Line will take you to the next unlabeled line.
  • Go to Previous Unlabeled Line will take you to the previous unlabeled line.
  • Go to Next File will take you to the next file.
  • Go to Previous File will take you to the previous file.

Delete labels

You can delete labels in two ways:
  • Right-click the label and click Delete label.
  • Press delete or backspace on your keyboard.

Paragraph/Sentence labeling

Paragraph/sentence labeling optimizes the interface for applying labels to long sentences or entire paragraphs. You can show labels as an index bar on the left-hand side and hide them above the text to reduce clutter.
To enable this in a token-based project:
  • Click File in the top left, then click Settings > Personalization.
  • Check Show index bar for labels.

Character selection

Character selection lets you select and label individual characters, so you don't have to select the entire token. To enable it:
  • Click File in the top left, then click Settings > Task settings.
  • Open Default text selection.
  • Choose Character selection.
  • Once enabled, you can select characters in two ways:
    • Select the desired characters with your mouse.
    • Use the keyboard shortcut shift + right arrow.
Task settings are available in the personal workspace, and in Reviewer Mode in the team workspace.

Highlight an entire sentence

To label an entire sentence, click its line number.

Select multiple lines at once

To select multiple lines at once, hold shift and click the desired line numbers.
The selection can start at any line number; for example, you can select lines 4-8.

Drawing bounding boxes in the OCR interface

In addition to labeling the transcription in the OCR interface, you can draw bounding boxes in the viewer and bind them to the corresponding text in the transcription.
Before drawing, click the drawing icon to enable drawing. Once the icon turns blue, you can start drawing bounding boxes.

Auto-scroll in the OCR interface

After you create bounding boxes and bind them to the text:
  1. Clicking a bounding box on a page automatically scrolls the text editor to the corresponding text.
  2. Clicking a span of text or a label automatically scrolls the media viewer to the corresponding bounding box.
This is especially helpful for PDFs with multiple pages.

Mark as complete

Once you have finished labeling, click Mark as complete. This signals to your team that you are done with the project and that it is ready for Review or Export.

Row-based and Document-based

If you choose row-based or document-based labeling as the task type, labeling means answering the project's questions. You answer them in the Document Labeling extension on the right side. (See this YouTube video for visual instructions on labeling row-based projects.)
  • To move to the next question, use your mouse or press Tab.

Go Menu

You can move to the desired row via the Go menu.
  • Go to Start will take you to the first row.
  • Go to End will take you to the last row.
  • Go to Line will take you to a specific row.
  • Go to Next Unlabeled Line will take you to the next unlabeled line.
  • Go to Previous Unlabeled Line will take you to the previous unlabeled line.
  • Go to Next File will take you to the next file.
  • Go to Previous File will take you to the previous file.

Required question

An asterisk (*) next to a question means that the question requires an answer; leaving a required field blank triggers an error.

Sort and filter column

If you create Text Field, Text Area, or Date questions, you can sort and filter their columns.
For Text Field and Text Area columns, you can filter by keyword.
For Date columns, you can filter by date range.
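To make the two filter types concrete, here is a hedged Python sketch of how such filters behave. The matching rules are assumptions, since the exact behavior isn't specified here: keyword matching is treated as a case-insensitive substring search, and the date range is treated as inclusive at both ends. The column names and rows are hypothetical.

```python
from datetime import date


def filter_by_keyword(rows, column, keyword):
    """Keep rows whose text column contains the keyword (case-insensitive, assumed)."""
    needle = keyword.lower()
    return [row for row in rows if needle in (row.get(column) or "").lower()]


def filter_by_date_range(rows, column, start, end):
    """Keep rows whose date column falls within [start, end] (inclusive, assumed)."""
    return [
        row for row in rows
        if row.get(column) is not None and start <= row[column] <= end
    ]


# Hypothetical answers: a Text Field column and a Date column.
rows = [
    {"comment": "Great battery life", "reviewed_on": date(2023, 3, 1)},
    {"comment": "Battery died quickly", "reviewed_on": date(2023, 4, 10)},
    {"comment": "Nice screen", "reviewed_on": None},  # unanswered date
]

print(len(filter_by_keyword(rows, "comment", "battery")))  # 2
print(len(filter_by_date_range(rows, "reviewed_on", date(2023, 4, 1), date(2023, 4, 30))))  # 1
```

Rows with an empty date are excluded from any date range in this sketch; whether Datasaur does the same is not stated in this guide.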

Keyboard shortcuts for dropdown questions

When the answer type is Dropdown, keyboard shortcuts are displayed in the extension. For example, if the POSITIVE option is shown with shortcut 1, press 1 on your keyboard to apply POSITIVE as the answer.

Filter rows

Use the View menu to show all rows or only unlabeled rows. This is helpful if your project has many rows.
📌 The Conflicts only option appears only in Reviewer Mode.

Hide and rename the headers

You can hide and rename headers by right-clicking on the header.

Mark as complete

Once you have finished labeling, click Mark as complete. This signals to your team that you are done with the project and that it is ready for Review or Export.

Row-based with URL view

You can label multiple images by providing their URLs in a column of your row-based file.
Prepare a Row-based file that contains a URL column
You can store your images in any storage service you prefer, as long as the URLs are accessible. You can also add information about each image as additional columns (this data can't be edited later).
url-viewer-sample.csv
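If you generate the row-based file from a script, a minimal sketch using Python's standard csv module might look like the following. The column names (image_url, source, captured_on), the example.com URLs, and the output filename are hypothetical placeholders; use whichever column name you later select under URL Columns.

```python
import csv
import io


def build_url_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical data: one URL column plus extra attribute columns.
# Attribute columns are shown read-only in the Document Labeling extension.
rows = [
    {"image_url": "https://example.com/images/cat-001.jpg", "source": "camera-a", "captured_on": "2023-01-15"},
    {"image_url": "https://example.com/images/cat-002.jpg", "source": "camera-b", "captured_on": "2023-01-16"},
]

csv_text = build_url_csv(rows, ["image_url", "source", "captured_on"])

# newline="" keeps the csv module's own line endings intact on every platform.
with open("url-viewer-rows.csv", "w", newline="", encoding="utf-8") as f:
    f.write(csv_text)
```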
Check and set the preview of the row-based labeling
You can choose how media is previewed on the labeling page. The options include:
  • Don't expand: the image from the URL is not previewed.
  • Thumbnail: a smaller preview of the image from the URL.
  • Large: a larger preview of the image from the URL.
Set the viewer setting to URL view
Change the Viewer Setting from Tabular View to URL View. Then set URL Columns to the name of the column in your row-based file that contains the URLs.
Labeling page of a row-based project with URL View
The labeling page displays the content retrieved from the URLs in your row-based file, so you can do row-based labeling with the help of URL View.
See additional information in the Document Labeling extension
Any additional information in the columns of your row-based file is shown in the Document Labeling extension.

Applying answers for multiple rows

If you notice patterns in your dataset, you may want to apply the same answer to multiple rows at once. Datasaur supports this.
You can select multiple rows and apply an answer to all of them. There are two ways to select rows:
  1. Tick the checkbox on each row.
  2. Hold ctrl on your keyboard and click the rows you want.
After selecting the rows, choose the answers in the Document Labeling extension, then click Submit to apply them.

How it works

  • If you select multiple rows that have no answers and bulk answer them, all rows receive the same answers.
  • If you select two rows where one has an answer and the other has none, then bulk answer them, both rows receive the same answers.
  • If you select two rows where one has an answer and the other has none, then answer only question one and leave the rest unchanged, only question one receives the same answer on both rows.
  • If the selected rows have different answers to a question, Mixed is shown below that question:
    Case 1: If you change the answer and submit, the new value overrides the answers to that question for all selected rows.
    Case 2: A Reset button appears in case you change your mind and want to revert to the Mixed value.
  • Bulk answering is only available in labeler mode.
  • Bulk answering isn't available for row labeling projects with either of these settings:
    • Number of rows per page: 1
    • URL viewer
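The rules above can be summarized in a small Python model. This is an illustrative sketch, not Datasaur's implementation: each row is assumed to be a dict from question name to answer (None when unanswered), the helper names are hypothetical, and only the questions you actually set in the bulk form are written to the selected rows.

```python
MIXED = "Mixed"


def displayed_value(rows, question):
    """Value shown under a question when several rows are selected."""
    values = {row.get(question) for row in rows}
    if len(values) == 1:
        return values.pop()  # every selected row agrees (possibly None)
    return MIXED


def bulk_submit(rows, answers):
    """Apply only the questions present in `answers` to every selected row."""
    for row in rows:
        for question, value in answers.items():
            row[question] = value  # overrides any existing answer
    return rows


selected = [
    {"q1": "POSITIVE", "q2": "yes"},
    {"q1": None, "q2": None},
]
print(displayed_value(selected, "q1"))  # Mixed
bulk_submit(selected, {"q1": "NEGATIVE"})
print([row["q1"] for row in selected])  # ['NEGATIVE', 'NEGATIVE']
print([row["q2"] for row in selected])  # ['yes', None]
```

In this model, pressing Reset corresponds to leaving a question out of `answers`, so the rows keep their differing values and the question still shows Mixed.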

Project settings

After you create a project, you can change its configuration in Settings.
Click the File menu and choose Settings.

Token-based projects

  • The Personalization tab lets you adjust the font type and size and show the index bar for labels.
  • The Task settings tab lets you edit token labeling settings, such as the default text selection.
  • The Assignment tab lets you change the assigned labelers. This is only available for projects created in Datasaur Teams.
  • The Administrator tab lets you change project-level settings.

Row-based projects

  • The Task settings tab lets you change the number of rows displayed per page, configure how media is expanded, and enable markdown parsing.
  • The Assignment tab lets you change the assigned labelers. This is only available for projects created in Datasaur Teams.
  • The Administrator tab lets you change project-level settings.
In row-based project settings, "label set" refers to the question set.

Document-based projects

Document-based projects have settings similar to those of row-based projects. The difference is that the Administrator tab also includes an Auto mark document as complete toggle.

Role

Datasaur has five team roles, which you can find in team management.

Admin

You automatically become an admin when you create a team. As an admin, you can invite team members, create projects, assign team members as labelers or reviewers, and promote team members to admin. You can also access a high-level overview of your team's projects and progress through the Overview page.

Team reviewer

Reviewers have either team scope or project scope. If an admin assigns you as a team reviewer, you can see and review all projects.

Project reviewer

A project reviewer can be assigned in Step 4 of the project creation wizard. As a project reviewer, you will only see and review the projects that are assigned to you.

Labeler

Labeler is the most common role: anyone an admin invites to a team is assigned as a labeler. As a labeler, you will only see the projects that are assigned to you.

Labeler + reviewer

If you're a team admin, you can also be a labeler. This role is selected automatically if you assign yourself in Step 4 of the project creation wizard.