Let's Get Labeling!
Label projects fall under three broad categories: token-based, row-based, and document-based. We'll take a look at examples of each below.
In token-based labeling, you label individual tokens or spans of tokens. Token-based labeling is well-suited for projects such as named entity recognition (NER) and part-of-speech (POS) tagging. Here are the things that are important for you to know before labeling your project. (See this YouTube video for a visual guide on token-based labeling.)
The label box will appear when you click on the tokens. You can click manually on the labels or use the corresponding keyboard shortcuts by typing 1, 2, 3, or 4.
Due to a limited number of numerals on the keyboard, keyboard shortcuts are only available for the first 9 labels.
You can search for labels in the label box by starting to type out parts of the label. For example, you could type "NNP" followed by 2 to apply the NNPS label.
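The search-then-shortcut behavior can be pictured with a small sketch. This is an illustration only: the label list, the substring matching rule, and the function names `filter_labels` and `pick_by_shortcut` are assumptions made for the example, not a description of how Datasaur implements the label box.

```python
# Hypothetical label set for the example (Penn Treebank style POS tags).
LABELS = ["NN", "NNS", "NNP", "NNPS", "VB"]


def filter_labels(labels, query):
    """Return the labels that contain the query (case-insensitive), in their original order."""
    q = query.lower()
    return [label for label in labels if q in label.lower()]


def pick_by_shortcut(visible_labels, key):
    """Map a numeric key '1'..'9' to the label at that position, or None if there is none."""
    if len(key) != 1 or key not in "123456789":
        return None  # only the first 9 visible labels have a shortcut
    index = int(key) - 1
    return visible_labels[index] if index < len(visible_labels) else None


matches = filter_labels(LABELS, "NNP")   # labels left after typing "NNP"
chosen = pick_by_shortcut(matches, "2")  # pressing 2 picks the second match
print(matches, chosen)
```

Under these assumptions, typing "NNP" narrows the list to two labels, and pressing 2 selects the second one.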
It is possible to apply an overlapping label or even multiple labels to the same token or span.
- The first way is to select the token or span, hold Shift, and click the appropriate label.
- The second way is to use keyboard shortcuts. Select the span, use Up and Down to find the right label, then press Shift + Enter or Shift + Return.
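To make the idea of overlapping and multiple labels concrete, here is a minimal sketch of one way such annotations could be represented. The `SpanLabel` class, its fields, and the example labels are hypothetical; this is not Datasaur's internal model or export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanLabel:
    start: int  # index of the first token in the span
    end: int    # index of the last token in the span (inclusive)
    label: str


def labels_covering(token_index, span_labels):
    """Return every label whose span includes the given token index."""
    return [s.label for s in span_labels if s.start <= token_index <= s.end]


tokens = ["New", "York", "City", "is", "large"]
annotations = [
    SpanLabel(0, 2, "LOC"),  # "New York City"
    SpanLabel(0, 1, "GPE"),  # "New York", overlapping the span above
    SpanLabel(0, 2, "NNP"),  # a second label on the same span as LOC
]

print(labels_covering(1, annotations))  # labels that cover "York"
print(labels_covering(2, annotations))  # labels that cover "City"
```

Storing each label as its own (start, end, label) record is what lets two labels share a span or partially overlap without conflict.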
You can edit a sentence by double-clicking on its row. While you are editing, the original sentence is shown for reference. Please note that the sentence is tokenized on the server. To apply your changes, do one of the following:
- Press Shift + Enter if you want to use spaces as the token separator instead of Datasaur's default tokenizer.
- Click Save after editing.
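The difference between splitting on spaces and using a tokenizer can be illustrated with a short sketch. Datasaur's default tokenizer is not described here, so the regex-based `tokenize_with_punctuation` below is only a stand-in chosen for the example; both function names are hypothetical.

```python
import re


def tokenize_by_space(sentence):
    """Split on whitespace only, so punctuation stays attached to words."""
    return sentence.split()


def tokenize_with_punctuation(sentence):
    """Stand-in tokenizer (an assumption, not Datasaur's): words and punctuation become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sentence = "Dr. Smith isn't here."
print(tokenize_by_space(sentence))
print(tokenize_with_punctuation(sentence))
```

Space-based tokenization is useful when the text is already pre-tokenized or when you want full control over token boundaries.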
Datasaur strives to provide a balance of comfort and control to all users. You can adjust the font type and size to your liking under the File > Settings menu.
You can add new lines by right-clicking on the row then choosing Insert Line Above or Insert Line Below.
You can delete lines by right-clicking on the row then choosing Delete Line.
You can delete all the labels on a given sentence by right-clicking anywhere in the sentence and choosing Delete Sentence Labels.
Once you have labeled tokens, you can draw arrows between labels by following these steps:
- 1. Click the label.
- 2. Hold the mouse button down.
- 3. Drag the pointer to the other label.
- You can even apply labels to the arrows themselves. To do so, double-click the arrow and select the appropriate label.
- You can also reverse arrows, delete arrows, and delete labels by right-clicking on the arrow.
You can move to the desired line via the Go menu.
- Go to Start will take you to the first line.
- Go to End will take you to the last line.
- Go to Line will take you to a specific line.
- Go to Next Unlabeled Token will take you to the next unlabeled token.
- Go to Previous Unlabeled Token will take you to the previous unlabeled token.
- Go to Next Unlabeled Line will take you to the next unlabeled line.
- Go to Previous Unlabeled Line will take you to the previous unlabeled line.
- Go to Next File will take you to the next file.
- Go to Previous File will take you to the previous file.
You can delete a label in two ways:
- Right-click the label and click Delete label.
- Press Delete or Backspace on your keyboard.
Paragraph/sentence labeling optimizes the interface for when you are applying labels to longer sentences or entire paragraphs. You will have the option to show the label as an index bar on the left-hand side, and hide the label above the text to avoid clutter.
You can enable this by altering the project settings in token-based projects:
- Click File on the top left, then click Settings --> Personalization.
- Check Show index bar for labels.
Character selection allows you to select and apply labels on a character-level basis, so you don't have to select the entire token.
- Click File on the top left, then click Settings --> Task settings.
- Open Default text selection.
- Choose `Character selection`.
- Labeling characters can be done in two ways:
  - Select the desired characters using your mouse.
  - Select the characters using the keyboard shortcut Shift + Right.
Task settings will show in the personal workspace and in the reviewer mode under the team workspace.
If you want to label the entire sentence, you can simply click on the line number.
You can select multiple lines at once by holding Shift and clicking the desired line numbers. The selection can start from any line number, for example lines 4-8.
In addition to labeling the transcription in the OCR interface, you can also draw bounding boxes on the viewer and bind them to the corresponding text in the text transcription.
Before drawing the bounding boxes, click the icon shown in the screenshot below to enable the drawing capability. Once the icon's color turns blue, you are ready to begin drawing the bounding boxes!

After you create bounding boxes and bind them to the text:
- 1. Clicking a bounding box on a specific page will automatically scroll the text editor to the corresponding text.
- 2. Clicking a span of text or a label will automatically scroll the media viewer to the corresponding bounding box.
This feature is especially helpful for PDFs with multiple pages.
Once you have finished labeling, click Mark as complete. This will signify to your team you are done with the project, and it is ready for Review or Export.
If you choose row-based or document-based labeling as the task type, the goal of labeling is to answer the questions. You can answer the questions in the Document Labeling extension on the right side. (See this YouTube video for visual instructions on how to label row-based projects.)
- You can navigate to the next question by using your mouse or pressing Tab on the keyboard.
Row-based project
You can move to the desired row via the Go menu.
- Go to Start will take you to the first row.
- Go to End will take you to the last row.
- Go to Line will take you to a specific row.
- Go to Next Unlabeled Line will take you to the next unlabeled line.
- Go to Previous Unlabeled Line will take you to the previous unlabeled line.
- Go to Next File will take you to the next file.
- Go to Previous File will take you to the previous file.
The asterisk (*) next to a question indicates that the question requires an answer. Leaving a required field blank will trigger an error.
If you create Text Field, Text Area, or Date type questions, you can sort and filter their columns.
For the Text Field and Text Area columns, you can filter by searching for a keyword.
For the Date column, you can filter by date range.
When the answer type is Dropdown, keyboard shortcuts are displayed in the extension. For example, you can press 1 on your keyboard to apply POSITIVE as the answer.
You can choose to see all rows or only the unlabeled rows from the View menu. This is helpful if your project has many rows.
📌 The Conflicts only option is shown only in Reviewer Mode.
You can hide and rename headers by right-clicking on the header.
Once you have finished labeling, click Mark as complete. This will signify to your team you are done with the project, and it is ready for Review or Export.
You can label multiple images by providing their URLs in a column of your row-based file.
Prepare a Row-based file that contains a URL column
You can store your images in your preferred storage option, as long as the URLs are accessible. You can also add information about each image by putting its attributes in additional columns. Note that this data can't be edited later.
Sample file: url-viewer-sample.csv (821 B)
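A row-based file of this kind could be produced with a few lines of Python. This is a minimal sketch: the column names (`url`, `source`, `captured_on`), the example.com URLs, and the `build_csv` helper are all made up for illustration and are not taken from the sample file.

```python
import csv
import io

# Hypothetical rows: one URL column plus extra attribute columns per image.
rows = [
    {"url": "https://example.com/images/cat.jpg", "source": "camera-a", "captured_on": "2023-01-15"},
    {"url": "https://example.com/images/dog.jpg", "source": "camera-b", "captured_on": "2023-01-16"},
]


def build_csv(rows, fieldnames):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_csv(rows, ["url", "source", "captured_on"])
print(csv_text)
```

In practice you would write the text to a .csv file and upload it. Whatever name you give the URL column here is the name you would later select as the URL column in the viewer settings.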
Check and set the preview of the row-based labeling
You can set how the media will be previewed on the labeling page. Here are some of the options:
- Don't expand: The image from the URL is not previewed.
- Thumbnail: A smaller version of the image from the URL is previewed.
- Large: A larger version of the image from the URL is previewed.
Set the viewer setting to URL view
Make sure to change the Viewer Setting from the Tabular View to the URL View. Also, set URL Columns to the name of the column in your row-based file that contains the URLs.
Labeling page of the row-based with URL View
Here you can see the content retrieved from the URLs that you provided in the row-based file. You can now do row-based labeling with the help of the URL View.
See additional information on the document labeling extension
The additional information in the columns of the row-based file can be found in the Document Labeling extension.
Some users may notice patterns in their dataset and need to apply the same answer to multiple rows at once. The good news is that Datasaur supports that!
You can select multiple rows and apply an answer to all of the selected rows. There are two ways to select rows:
- 1. Tick the checkbox that is available on each row.
- 2. Hold Ctrl on the keyboard while selecting multiple rows.
After selecting the rows, select answers for the questions in the Document Labeling extension, then click Submit to apply the answers.
- Select multiple rows that have no answer, then bulk answer.
  - All rows should have the same answers.
- Select two rows where one row has an answer and one row has no answer, then bulk answer.
  - All rows should have the same answers.
- Select two rows where one row has an answer and one row has no answer, then answer question one and leave the rest.
  - The same answer should be applied for question one only.
- If you select multiple rows whose answers to a question differ, the Mixed value will be shown below the question.
  - Case 1: If you change the answer value and submit, it will override all answers for the selected question.
  - Case 2: A Reset button is shown in case you change your mind and wish to change back to the Mixed value.
- This capability is only available in labeler mode.
- This capability doesn't apply to row labeling projects with any of these settings:
  - Number of rows per page: 1
  - URL viewer
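The bulk-answer rules above can be summarized in a small sketch. It is an illustration under stated assumptions, not Datasaur's implementation: rows are modeled as plain dictionaries, `None` stands for "no answer", an answered row next to an unanswered row is treated as differing values, and the names `displayed_value` and `bulk_answer` are hypothetical.

```python
MIXED = "Mixed"


def displayed_value(rows, question):
    """Value shown for a question when several rows are selected."""
    values = {row.get(question) for row in rows}
    if len(values) == 1:
        return values.pop()  # every selected row agrees (possibly on "no answer")
    return MIXED             # the selected rows disagree


def bulk_answer(rows, submitted):
    """Apply only the submitted answers; questions left untouched keep each row's own value."""
    for row in rows:
        row.update(submitted)


rows = [
    {"sentiment": "POSITIVE", "topic": "billing"},
    {"sentiment": None, "topic": None},
]

print(displayed_value(rows, "sentiment"))  # the rows differ before submitting
bulk_answer(rows, {"sentiment": "NEGATIVE"})  # answer question one only
print(rows)
```

After the submit, both rows share the new answer for the first question, while the second question keeps whatever each row had before.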
If you have already created a project, you can change its configuration through Settings. To do so, click the File menu and choose Settings.
For token-based projects:
- The Personalization tab allows you to adjust the font type and font size, show the index bar for labels, and change task settings.
- The Task settings tab allows you to edit the token labeling settings.
- The Assignment tab allows you to change the assigned labelers. This is only available for projects created in Datasaur Teams.
- The Administrator tab allows you to change the settings related to your project.
For row-based projects:
- The Task settings tab allows you to change the number of rows displayed per page, configure how media should be expanded, and enable markdown parsing.
- The Assignment tab allows you to change the assigned labelers. This is only available for projects created in Datasaur Teams.
- The Administrator tab allows you to change the settings related to your project.
"Label set" here means a question set
Document-based projects have settings similar to those of row-based projects. The difference is that there is an Auto mark document as complete toggle in the Administrator tab.
You will automatically become an admin when you create a team. As an admin, you can invite team members, create projects, assign team members as labelers or reviewers, and promote team members to admins. You can also access a high-level overview of your team's projects and progress through the Overview page.
We divide reviewers into team scope and project scope. If the admin assigns you as a team reviewer, you are able to see all projects and review them.
A project reviewer can be assigned in Step 4 of the project creation wizard. As a project reviewer, you will only see and review the projects that are assigned to you.
Labeler is the most common role: anyone an admin invites to a team is assigned as a labeler. As a labeler, you will only see the projects that are assigned to you.
If you're a team admin, you can also be a labeler. This role will be selected automatically if you assign yourself in Step 4.