Extensions and Shortcuts

Making your labeling process even easier.

Datasaur provides extensions and keyboard shortcuts that speed up the labeling process.

Extensions

Labels

A core extension to any project, the Labels extension contains the label set or taxonomy used for the project. There are three ways to add a label set:

  1. Upload a label set in .csv or .tsv format

  2. Choose from a library of label sets

  3. Type the labels in manually

Note: maximum label set size is 500 KB
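
As a rough illustration of the upload route, the sketch below writes a small label set to a .csv file and checks its size before upload. The one-label-per-row layout, the file name, the helper names, and the reading of 500 KB as 500,000 bytes are all assumptions made for this example and are not taken from Datasaur's specification.

```python
import csv
import os
import tempfile

MAX_LABEL_SET_BYTES = 500 * 1000  # assumption: 500 KB read as 500,000 bytes


def write_label_set(path, labels):
    """Write one label per row (hypothetical layout) and return the file size in bytes."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for label in labels:
            writer.writerow([label])
    return os.path.getsize(path)


def fits_size_limit(size_in_bytes, limit=MAX_LABEL_SET_BYTES):
    """Return True when the file is within the assumed size limit."""
    return size_in_bytes <= limit


labels = ["PERSON", "LOCATION", "ORGANIZATION"]
path = os.path.join(tempfile.gettempdir(), "label_set_example.csv")
size = write_label_set(path, labels)
print(f"{path}: {size} bytes, within limit: {fits_size_limit(size)}")
```

Checking the size locally is only a convenience; the limit that actually applies is the one enforced at upload time.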

Built-in Label Sets

The library has three label sets loaded by default: INACL, Stanford, and Default NER. If you have created a project before, previous label sets will be automatically added to the library.

📌 One important thing to note is that uploading or choosing a new label set will delete all existing labels in the document.

While labeling your project, it is possible to add or remove labels dynamically. Furthermore, you can edit a label or the label color by clicking on the pencil icon as shown below. We provide twelve colors that you can choose from.

You can also move a label by clicking the dots and dragging the label up and down.

💡 Best practice: Some users find that colors can help them memorize labels and reduce human error while labeling.

List of Files

The List of Files extension lists out all the files that you upload to a project. The files will be arranged alphabetically.

The extension provides several controls, each with a different function.

  1. The Up/Down buttons allow you to go to the previous or the next file.

  2. The Number Field button allows you to jump to a specific file number by typing the number of the desired file and either clicking on the Go (-->) button or pressing Enter on your keyboard.

Dashboard

The Dashboard extension contains basic statistics for the labeling process and helps track progress.

ML-assisted Labeling

The ML-assisted Labeling extension allows you to label semi-automatically. By default, spaCy is loaded. Use the dropdown to switch to DistilBERT OPIEC, HuggingFace, or a Custom API instead.

📌 Before using this feature, please ensure that your project already has a label set containing the appropriate label set items. For example, if you want to use spaCy, add a label set in the Labels extension that contains PERSON, NORP, FACILITY, etc.
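
One way to check this before switching the extension on is to compare the labels a model emits with the labels in your project. The sketch below is a minimal illustration, not part of Datasaur: the function name is hypothetical, the model labels are only the three named in the text, and exact, case-sensitive matching is an assumption.

```python
def missing_labels(model_labels, project_labels):
    """Return model labels that are absent from the project's label set, in order."""
    available = set(project_labels)
    return [label for label in model_labels if label not in available]


# Only the labels named in the text; a real model emits more than these three.
model_labels = ["PERSON", "NORP", "FACILITY"]
project_labels = ["PERSON", "LOCATION"]

print(missing_labels(model_labels, project_labels))  # ['NORP', 'FACILITY']
```

An empty result suggests the label set covers everything in the list you supplied.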

Dictionary

The Dictionary extension contains two dictionaries - English and Bahasa Indonesia. The English dictionary sources are obtained from Merriam-Webster and WordNet. Meanwhile, the Indonesian dictionary is from Kamus Besar Bahasa Indonesia (KBBI).

The extension becomes active when you click a token. It shows the part of speech, the definition entries, and the definition. To change the dictionary, click the triple-dots menu in the top-right corner.

Search

The Search extension allows you to search for words and label tokens that match the search. When you search for a word or phrase, Datasaur identifies all matches in the document. The matches are shown in the results list at the bottom of the extension.

There are additional settings that can be accessed by clicking on the settings gear icon.

Regex ON

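Regex ON allows you to search using regular expressions. For example, \bmen\w* matches every word that starts with men, such as men and mentioned. In standard regex syntax, men* on its own means me followed by zero or more n characters, so it would also match inside memo.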
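
The difference between a trailing * and a word-prefix pattern is easy to check outside the tool. The sketch below uses Python's re module; whether Datasaur's search follows exactly the same regex dialect is an assumption, and the sample sentence is made up for illustration.

```python
import re

text = "The men mentioned a memo about the amendment."

# men* means "me" followed by zero or more "n", so it also matches inside "memo".
loose = re.findall(r"men*", text)

# \bmen\w* starts at a word boundary and consumes the rest of the word,
# so it returns only whole words that begin with "men".
prefix = re.findall(r"\bmen\w*", text)

print(loose)   # ['men', 'men', 'me', 'men', 'men']
print(prefix)  # ['men', 'mentioned']
```

Testing a pattern on a short sample like this before running it against a document helps avoid labeling unintended matches.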

Exact words ON

Exact words ON shows exact matches only. For example, men matches men but not mentioned.

Label All

Label All allows you to label all matching tokens with the same label.

  • For example, type Walt Disney in the search box. It will show the number of Walt Disney instances in the file. Select Person on the label box and press the Label All button. All the Walt Disney instances in the document will be labeled as Person.
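
Conceptually, Label All finds every occurrence of the search phrase and attaches the same label to each one. The sketch below illustrates that idea on a plain string; the function name, the character-offset span format, and the sample sentence are hypothetical and do not describe how Datasaur stores labels.

```python
import re


def label_all(text, phrase, label):
    """Return (start, end, label) spans for every occurrence of phrase in text."""
    pattern = re.compile(re.escape(phrase))  # escape so the phrase is matched literally
    return [(m.start(), m.end(), label) for m in pattern.finditer(text)]


text = "Walt Disney founded a studio. Years later, Walt Disney opened a theme park."
spans = label_all(text, "Walt Disney", "Person")

print(len(spans), "matches")
for start, end, label in spans:
    print(start, end, text[start:end], label)
```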

Search by Labels

To search by label, type label:[desired label] in the Search field.

For example, to find tokens with the Location label, type label:location. The search results will list the tokens labeled Location.
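
The query format can be illustrated with a small filter over labeled tokens. In the sketch below, the function name and the (token, label) pair format are hypothetical. Case-insensitive matching is an assumption inferred from the example, where label:location finds the Location label.

```python
def search_by_label(query, labeled_tokens):
    """Return tokens whose label matches a 'label:<name>' query, ignoring case."""
    prefix = "label:"
    if not query.lower().startswith(prefix):
        raise ValueError("query must start with 'label:'")
    wanted = query[len(prefix):].strip().lower()
    return [token for token, label in labeled_tokens if label.lower() == wanted]


labeled_tokens = [
    ("Paris", "Location"),
    ("Walt Disney", "Person"),
    ("Jakarta", "Location"),
]

print(search_by_label("label:location", labeled_tokens))  # ['Paris', 'Jakarta']
```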

Guidelines

The Guidelines extension contains the guidelines that labelers should follow. The guidelines must be in .txt or .md format.

There are three ways to add guidelines.

  1. Drag and drop a file to the guidelines box

  2. Upload a file by clicking on the blue box

  3. Choose from the library by clicking on the triple-dots in the top-right corner

Uploading new guidelines will replace any existing guidelines.

Note: maximum guideline size is 1 MB
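
The format and size constraints above can be checked before uploading. The sketch below is a minimal pre-upload check. The function name is hypothetical, and reading 1 MB as 1,000,000 bytes is an assumption.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".txt", ".md"}
MAX_GUIDELINE_BYTES = 1_000_000  # assumption: 1 MB read as 1,000,000 bytes


def guideline_problems(filename, size_in_bytes):
    """Return a list of problems; an empty list means the file passes both checks."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or 'no extension'}")
    if size_in_bytes > MAX_GUIDELINE_BYTES:
        problems.append(f"file is {size_in_bytes} bytes, above the assumed limit")
    return problems


print(guideline_problems("guidelines.md", 20_000))      # []
print(guideline_problems("guidelines.pdf", 2_000_000))  # two problems reported
```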

Legend Color

The Legend Color extension is available only in Reviewer Mode and Read-only Mode.

Metadata

The Metadata extension allows you to see metadata that has been set previously with a custom import script.

Metadata in the labeling interface

Keyboard Shortcuts

We recognize that many Datasaur users are power users. We provide keyboard shortcuts to help make labeling more efficient.

The most commonly used shortcuts are shown in the label boxes that appear while you label a token. In the example below, you can press 1 to apply the date label.

Source: https://www.gradesaver.com/a-study-in-scarlet/study-guide/summary

In document-labeling, shortcuts are displayed in the extension as shown below. These shortcuts only appear if you choose Dropdown as the question type.

You can access the full version of the keyboard shortcuts by clicking the Help button.