Span Based

In span-based labeling, you label individual tokens or spans of tokens. Span-based labeling is well-suited for projects such as named entity recognition (NER) and part-of-speech (POS) tagging. Here is what you need to know before labeling your project. (See this YouTube video for a visual guide on span-based labeling.)

To begin labeling, use your pointer to select a word or token in your dataset. A list of your labels will appear, and you can select a label with your pointer.

However, Datasaur is designed for you to be a power user, so there are more efficient ways to label manually. This documentation explains how to use hotkeys to apply labels and how to apply multiple labels to the same token span.

Keyboard shortcuts

The label box will appear when you click on the tokens. You can click manually on the labels or use the corresponding keyboard shortcuts by typing 1, 2, 3, or 4.

Due to a limited number of numerals on the keyboard, keyboard shortcuts are only available for the first 9 labels. Do you have more than 9 labels? Read the next section!

Search for labels

If you have a long list of labels, you can search for a specific label in the label box by typing part of its name. For example, you could type "date" and then immediately select the matching label with its hotkey, "1".
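The interaction between searching and hotkeys can be pictured with a small sketch. It is only an illustration, not Datasaur's implementation: the function names are hypothetical, and it assumes that search is a case-insensitive substring match and that the number keys are reassigned to whichever labels are currently visible.

```python
def search_labels(labels, query):
    """Return the labels that contain the query (assumed: case-insensitive substring match)."""
    needle = query.casefold()
    return [label for label in labels if needle in label.casefold()]


def assign_hotkeys(visible_labels):
    """Map the number keys "1".."9" to the first nine visible labels."""
    return {str(key): label for key, label in enumerate(visible_labels[:9], start=1)}


# Eleven example labels: DATE is the tenth, so it has no hotkey in the full list.
labels = [
    "PERSON", "ORGANIZATION", "LOCATION", "EVENT", "MONEY", "PERCENT",
    "TIME", "QUANTITY", "PRODUCT", "DATE", "BIRTH_DATE",
]

matches = search_labels(labels, "date")   # ['DATE', 'BIRTH_DATE']
hotkeys = assign_hotkeys(matches)         # {'1': 'DATE', '2': 'BIRTH_DATE'}
```

Under these assumptions, typing "date" narrows the list so that a label beyond the ninth position becomes reachable with the "1" key.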

Apply multiple labels to the same span

You can apply multiple labels to the same token span. Here are three methods:

1) Apply a label to a span. Select the same span and hold shift while you select an additional label. You can keep applying additional labels as long as you hold down shift when selecting the additional label.

2) Use keyboard shortcuts. Select the span, use the up and down arrow keys to find the right label, then press shift + enter or shift + return.

3) We have a new feature that enables you to apply multiple labels to the same token without having to hold down shift. See our next discussion below!

Enable multiple labels selection

We know that sometimes you need to label a token or span with multiple labels. Previously, we supported this capability by having you hold down the SHIFT key while selecting the appropriate labels. But now, we’ve taken things up a notch and made the process even smoother with our new feature!

The “Enable multiple labels selection” setting allows you to select multiple labels and apply them to the same token or span without holding down the SHIFT key. It's a real time-saver and simplifies the labeling process.

Enabling the feature

As a user, you have the flexibility to choose your preferred method, whether it’s using keyboard shortcuts or directly enabling the setting in the interface.

You can find the setting under File menu > Settings > Personalization.

The Personalization setting can be accessed once the project is created. Please note that each user needs to enable this setting for their own project, as it does not carry over to other users.

Once the setting is enabled, follow these simple steps to apply multiple labels to a token or a span:

  1. Select the desired labels by clicking on them. You can select as many labels as needed. There are other ways to select the labels:

    1. Use keyboard shortcuts (1-9)

    2. Navigate between labels with the arrow keys (up and down) and press Enter

  2. Click the “Apply Labels” button, and all your selected labels will be automatically applied to the token or span.

    1. You can also use the TAB key to move between labels until you reach the “Apply labels” button, and then press Enter.

Modifying the labels applied

If you ever need to modify the labels applied to a token or a span, do the following:

  • Click on the label you wish to change

  • Unselect the label you want to change, then select the correct label

  • Click “Apply labels”

Adding a new label class

You may sometimes want to add a new label class to the label box. When you add one, the new label class is automatically selected inside the label box.

Applying labels to multiple tokens or spans

To do this, hold CTRL and select the tokens or spans. Let’s say there are two spans to which we would like to apply the same labels: Sherlock Holmes and Dr. John Watson.

  • If Sherlock Holmes and Dr. John Watson don’t have any labels applied, you can simply select the appropriate labels, then click the “Apply labels” button. Those labels will be applied to both spans.

  • If Sherlock Holmes and Dr. John Watson already have PERSON as their label:

    1. Checkboxes will be reset: the PERSON label will not be selected

      1. “Reset to mixed labels” button will show, but it’s disabled

    2. Select ORGANIZATION and BOOK TITLE labels as we want to change the current label applied

      1. “Reset to mixed labels” button will be clickable

        1. Clicking this keeps PERSON as the label for both spans

    3. Click “Apply labels”

    4. ORGANIZATION and BOOK TITLE labels will be applied to Sherlock Holmes and Dr. John Watson

  • If Sherlock Holmes and Dr. John Watson already have PERSON as their label, and you would like to add a new label class:

    1. Checkboxes will be reset: the PERSON label will not be selected

      1. “Reset to mixed labels” button will show, but it’s disabled

    2. Type a new label, say CHARACTER, then click Add new label

    3. CHARACTER will be automatically selected

      1. “Reset to mixed labels” button will be clickable

        1. Clicking this keeps PERSON as the label for both spans

    4. Click “Apply labels”

    5. CHARACTER will be applied to Sherlock Holmes and Dr. John Watson

A couple of notes on the multiple labels selection feature:

  • Multiple labels selection is only available for tokens and spans, not for arrows

  • If you enable multiple labels selection under Personalization while labeling arrows,

    • the checkbox will still be displayed in the arrow label box

    • you will not be able to select multiple labels

  • If you have enabled the “Tokens and token spans should have at most one label” setting in Step 3 (most likely for a Part of Speech use case),

    • the checkbox will still be displayed in the arrow label box

    • you will not be able to select multiple labels

Edit sentence

You can edit a sentence by double-clicking its row. While you edit, the original sentence is shown for reference. Please note that the sentence is tokenized on the server. To apply your changes, do one of the following:

  • Press shift + enter if you want to use spaces as the token separator instead of Datasaur's default tokenizer.

  • Click Save after editing.

When you make significant edits to sentences or use Shift+Enter to tokenize by spaces, particularly at the beginning or end of sentences where labels are already present, those labels might be removed. Please take extra care with these kinds of changes to avoid unintended label removal.
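To see why the choice of tokenizer matters, here is a minimal sketch comparing space-based tokenization with a punctuation-aware one. The punctuation-aware function is only a stand-in chosen for illustration; Datasaur's default tokenizer is not described here and may behave differently. Both function names are hypothetical.

```python
import re


def tokenize_by_space(sentence):
    """Split on whitespace only, so punctuation stays attached to words."""
    return sentence.split()


def tokenize_with_punctuation(sentence):
    """Stand-in for a default tokenizer (an assumption): punctuation becomes its own token."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sentence = "Dr. Watson met Holmes."

space_tokens = tokenize_by_space(sentence)
# ['Dr.', 'Watson', 'met', 'Holmes.']

punct_tokens = tokenize_with_punctuation(sentence)
# ['Dr', '.', 'Watson', 'met', 'Holmes', '.']
```

The two results have different token counts and boundaries, which suggests why a label attached to a token near the start or end of a sentence may no longer line up after the sentence is re-tokenized.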

Personalization

Datasaur strives to provide a balance of comfort and control to all users. You can adjust the font and its size to your liking under the File > Settings menu.

Insert new lines

You can add new lines by right-clicking on the row then choosing Insert Line Above or Insert Line Below.

Delete lines

You can delete lines by right-clicking on the row then choosing Delete Line.

Delete sentence labels

You can delete all the labels on a given sentence by right-clicking anywhere in the sentence and choosing Delete Sentence Labels.

Draw arrows

Once you have labeled tokens, you can draw arrows between labels by following these steps:

  1. Click the label

  2. Hold it

  3. Point it at the other label

  • You can even apply labels to the arrows themselves. In order to do so, double-click the arrow and select the appropriate label.

  • You can also reverse arrows, delete arrows, and delete labels by right-clicking on the arrow.

Go Menu

You can move to the desired line via the Go menu.

  • Go to Start will take you to the first line.

  • Go to End will take you to the last line.

  • Go to Line will take you to a specific line.

  • Go to Next Unlabeled Token will take you to the next unlabeled token.

  • Go to Previous Unlabeled Token will take you to the previous unlabeled token.

  • Go to Next Unlabeled Line will take you to the next unlabeled line.

  • Go to Previous Unlabeled Line will take you to the previous unlabeled line.

  • Go to Next File will take you to the next file.

  • Go to Previous File will take you to the previous file.

Delete labels

Deleting labels can be done in two ways:

  • Right-click the label and click Delete label.

  • Press delete or backspace on your keyboard.

Paragraph/Sentence labeling

Paragraph/sentence labeling optimizes the interface for when you are applying labels to longer sentences or entire paragraphs. You will have the option to show the label as an index bar on the left-hand side, and hide the label above the text to avoid clutter.

You can enable this by altering the project settings in token-based projects:

  • Click File on the top left, then click Settings > Personalization.

  • Check Show index bar for labels.

Character selection

Character selection allows you to select and apply labels on a character-level basis, so you don't have to select the entire token.

  • Click File on the top left, then click Settings > Task settings.

  • Open Default text selection.

  • Choose `Character selection`

  • Labeling the character can be done in two ways:

    • Select the desired character using your mouse.

    • Select the character using keyboard shortcuts shift + right.
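The difference between character selection and token selection can be sketched with character offsets. This is an illustrative model only: the helper names are hypothetical, tokens are assumed to be whitespace-delimited, and token selection is assumed to expand a range to whole tokens.

```python
import re


def token_bounds(text):
    """Return (start, end) character offsets of each whitespace-delimited token."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def snap_to_tokens(text, start, end):
    """Expand the range [start, end) so it covers every token it touches."""
    touched = [(s, e) for s, e in token_bounds(text) if s < end and e > start]
    if not touched:
        return start, end
    return touched[0][0], touched[-1][1]


text = "Sherlock Holmes lives here"

# Character selection keeps exactly the characters that were dragged over.
print(text[9:13])                       # Holm

# Token selection (as modeled here) grows the same range to the whole token.
start, end = snap_to_tokens(text, 9, 13)
print(text[start:end])                  # Holmes
```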

Highlight an entire sentence

If you want to label the entire sentence, you can simply click on the line number.

Select multiple lines at once

You can select multiple lines at once by holding shift and clicking the desired line numbers.

You can also select multiple lines starting with any line number, for example selecting lines 4-8.

Adjust span selection

This feature allows you to adjust the selection of an already-applied span label, so you don’t have to delete and reapply the label.

Please note that you will need the ability to modify the applied label in order to adjust span selection.

Enabling the adjusting mode

To start adjusting any label selection, right-click on the label that you want to adjust. You should see the “Adjust span selection” option.

The “Adjust span selection” option won't be accessible in Reviewer mode if the label is still conflicted and hasn't been resolved.

Inside adjusting mode

After you’ve selected the option, you will enter Adjusting Mode. In this mode, you can move the selection handle to create the new selection.

Shortcut, extension, and title bar functionality will be disabled while the user is in Adjusting Mode.

You can change the selection by dragging and dropping the selection handle to the desired position.

All span labeling settings in Task settings, such as “Spans should have at most one label” and “Limit selection to a span of 1 token,” will also be applied.

Click anywhere besides the handle to exit the mode. Exiting the mode automatically saves the selection. A saving indicator appears below the label selection and disappears after the save succeeds.

Drawing bounding boxes in the OCR interface

In addition to labeling the transcription in the OCR interface, you can also draw bounding boxes on the viewer and bind them to the corresponding text in the text transcription.

Before drawing the bounding boxes, click the icon shown in the screenshot below to enable the drawing capability. Once the icon's color turns blue, you are ready to begin drawing the bounding boxes!

Auto-scroll in the OCR interface

After you create bounding boxes and bind them to the text,

  1. Clicking a bounding box on a specific page will automatically scroll the text editor to find the corresponding text

  2. Clicking a span of text or a label will also automatically scroll the media viewer to find the corresponding bounding box

This feature can be helpful for PDFs with multiple pages.

Synchronized Scrolling

Synchronized scrolling is a feature available in OCR Span Labeling projects.

This feature reduces the effort of manually scrolling through a PDF file and a transcription file by synchronizing the scroll position between the two viewers. This way, you won’t have to scroll through both viewers to ensure that each viewer is aligned.

This feature is currently only available for OCR labeling projects with PDF files.

How to enable

Follow these steps to enable this feature:

  1. Make sure you are working in an OCR Span Labeling Project. Read how to create one here.

  2. On the bottom bar, you will see a button to toggle synchronized scrolling.

  3. Click on the button, and wait until the mapping process is finished.

    • You can check the mapping progress by hovering on the button or checking on the progress indicator on the top part of the editor.

    • While the mapping is in progress, you can still interact with the document and do any labeling. However, any action that modifies the sentence content, such as editing the sentence, will be disabled.

  4. When the mapping process is finished, you will be notified by a success snack bar message.

  5. Finally, try scrolling in one of the viewers; the other viewer should scroll automatically.

Triggering the Auto-scroll

To trigger the auto-scroll, you can simply scroll on the transcription or the document viewer. Scrolling can be done with your mouse wheel, touchpad scroll gesture, or using the up-down arrow keys.

You can also click-to-highlight any span on the transcription to trigger the auto-scroll.

Disabling the Feature

You can also disable this feature temporarily by clicking the Synchronized Scrolling button at the bottom of the page.

The good news is that you can immediately re-enable it and don’t need to wait for the mapping process again, as long as the mapping is still available.

Some actions can discard the mapping result, which requires the mapping process to be redone. See the “Limitations” section below to learn which actions may require the mapping process to be redone.

Getting the Best Experience

To create scroll points between texts in the PDF and the transcription, the app maps the text content of your file to your transcription using text matching. To enhance the accuracy of our mapping, ensure that you follow these points.

  1. Use native PDFs rather than scanned PDFs. Native PDFs have their text content embedded in the file, which can simply be extracted and used for the mapping process. Scanned PDFs, by contrast, contain images of text, which the app is currently unable to extract.

  2. For your transcription file, avoid having one line with content that spans across multiple lines from your PDF document. A rule of thumb is to have one line from your PDF file as one line in the transcription file.

  3. The auto-scrolling behavior works best if the Document viewer is at 100% zoom.
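The idea of mapping by text matching can be illustrated with a rough sketch. This is not the app's actual algorithm: the function name, the similarity measure (Python's standard `difflib.SequenceMatcher`), and the 0.8 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def map_lines(pdf_lines, transcription_lines, threshold=0.8):
    """Map each transcription line index to its most similar PDF line index.

    Lines whose best similarity falls below the (assumed) threshold are left unmapped.
    """
    mapping = {}
    for t_index, t_line in enumerate(transcription_lines):
        best_index, best_score = None, 0.0
        for p_index, p_line in enumerate(pdf_lines):
            score = SequenceMatcher(None, t_line.lower(), p_line.lower()).ratio()
            if score > best_score:
                best_index, best_score = p_index, score
        if best_score >= threshold:
            mapping[t_index] = best_index
    return mapping


pdf_lines = [
    "Invoice number: 10023",
    "Billed to: Sherlock Holmes",
    "Total due: 42.00 GBP",
]
# The second transcription line is masked, as PII anonymization might do.
transcription_lines = [
    "Invoice number: 10023",
    "Billed to: ******** ******",
    "Total due: 42.00 GBP",
]

mapping = map_lines(pdf_lines, transcription_lines)
print(mapping)  # {0: 0, 2: 2}
```

In this sketch, lines that correspond one-to-one match cleanly, while the masked line finds no match above the threshold and gets no scroll point. That is consistent with the advice to keep one PDF line per transcription line.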

Limitations

  1. Enabling PII Anonymization may break the mapping process, as parts of the transcription will be masked, and the app may fail to find matches for the masked transcription lines.

  2. Synchronized scrolling will be temporarily disabled when the document is rotated.

    It will be enabled again when the document is back to its original orientation.

  3. Scrolling by dragging the scroll bar does not trigger the auto-scroll.

  4. The mapping result is stored locally in your browser. This means that the mapping is stored in-device and never leaves your browser. However, this also means certain actions can cause the mapping result to be discarded, after which the mapping process needs to be redone. Those actions are:

    • Navigating out of the current document, such as to another project or document, or switching between reviewer mode and labeler mode.

    • Refreshing the page or browser.

    • Using a different browser or device.

Mark as complete

Once you have finished labeling, click Mark as complete. This will signify to your team you are done with the project, and it is ready for Review or Export.
