Extensions and Shortcuts
Making your labeling process even easier.
Datasaur provides a number of extensions and shortcuts to make labeling even easier.



Labels

A core extension to any project, the Labels extension contains the label set or taxonomy used for the project. There are three ways to add a label set:
  1. Upload a label set in .csv or .tsv format
  2. Choose from a library of label sets
  3. Type the labels in manually
Note: maximum label set size is 500 KB
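The upload constraints above can be checked locally before you upload a file. This is a minimal sketch, not part of Datasaur: the function name check_label_set_file is hypothetical, and it assumes that 500 KB means 500 × 1024 bytes.

```python
from pathlib import Path

# Assumption: "500 KB" is interpreted as 500 * 1024 bytes.
MAX_LABEL_SET_BYTES = 500 * 1024
ALLOWED_EXTENSIONS = {".csv", ".tsv"}


def check_label_set_file(path):
    """Return a list of problems found; an empty list means the file looks fine."""
    file_path = Path(path)
    problems = []
    # The documentation lists .csv and .tsv as the accepted formats.
    if file_path.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {file_path.suffix or '(none)'}")
    if not file_path.is_file():
        problems.append("file does not exist")
    elif file_path.stat().st_size > MAX_LABEL_SET_BYTES:
        problems.append(f"file is larger than {MAX_LABEL_SET_BYTES} bytes")
    return problems
```

An empty result means the file has an accepted extension and is within the size limit. The sketch does not inspect the file contents.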

Built-in Label Sets

The library has three label sets loaded by default: INACL, Stanford, and Default NER. If you have created a project before, previous label sets will be automatically added to the library.
📌 One important thing to note is that uploading or choosing a new label set will delete all existing labels in the document.
While labeling your project, you can add or remove labels dynamically. You can also edit a label or its color by clicking the pencil icon. There are twelve colors to choose from.
You can also move a label by clicking the dots and dragging the label up and down.
💡 Best practice: Some users find that colors can help them memorize labels and reduce human error while labeling.

List of Files

The List of Files extension lists out all the files that you upload to a project. The files will be arranged alphabetically.
The extension has several button controls, each with a different function.
  1. The Up/Down buttons allow you to go to the previous or the next file.
  2. The Number Field allows you to jump to a specific file: type the number of the desired file, then either click the Go (-->) button or press Enter on your keyboard.


Dashboard

The Dashboard extension contains basic statistics for the labeling process and helps track progress.

ML-assisted Labeling

The ML-assisted Labeling extension allows you to label semi-automatically. spaCy is loaded by default. From the dropdown, you can switch to built-in service providers, HuggingFace, or a Custom API instead.
Before using this feature, make sure your project already has a label set containing the appropriate labels. For example, to use spaCy, add a label set in the Labels extension containing PERSON, NORP, FACILITY, etc.

Multiple label sets

Because a single project can have multiple label sets, ML-assisted Labeling applies to the active label set. The feature works properly only when the label set you want to use is the active one.
For example, suppose your token-based project has an NER label set and a POS label set, and you want a built-in spaCy model to help you label the project. Before using the feature, switch the label set dropdown to the NER label set. spaCy can then detect your labels.


Dictionary

The Dictionary extension contains two dictionaries: English and Bahasa Indonesia. The English dictionary draws on Merriam-Webster and WordNet, and the Indonesian dictionary draws on Kamus Besar Bahasa Indonesia (KBBI).
The extension becomes active when you click a token. It shows the part of speech, the definition entries, and the definition. You can change the dictionary by clicking the triple-dots menu in the top-right corner.

Search

The Search extension allows you to search for words and label the tokens that match. When you search for a word or phrase, Datasaur identifies all matches in the document and shows them in the results list at the bottom of the extension. (Watch this YouTube video for visual instruction on how to use the Search extension.)
There are additional settings that can be accessed by clicking on the settings gear icon.

Regex ON

Regex ON allows you to search using regular expressions. For example, \bmen\w* will show all words that start with men.

Exact words ON

Exact words ON will show exact matches only. For example, men will match men but not mentioned.
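The difference between pattern matching and exact-word matching can be illustrated with Python's re module. This is a sketch only: Datasaur's own regex engine and matching rules may differ, and the sample sentence is made up for illustration.

```python
import re

text = "The men mentioned a memo to the mentor."
words = re.findall(r"\w+", text)

# A common pitfall: in standard regex syntax, "men*" means "me" followed by
# zero or more "n", so it also matches inside "memo".
loose = [w for w in words if re.search(r"men*", w)]

# Words that start with "men": a word boundary, then "men", then any
# further word characters.
starts_with_men = re.findall(r"\bmen\w*", text)

# Exact-word match: "men" with a word boundary on both sides.
exact = re.findall(r"\bmen\b", text)

print(loose)            # ['men', 'mentioned', 'memo', 'mentor']
print(starts_with_men)  # ['men', 'mentioned', 'mentor']
print(exact)            # ['men']
```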

Label All

Label All allows you to apply the same label to all matching tokens.
  • For example, type Walt Disney in the search box. The extension shows the number of Walt Disney instances in the file. Select Person in the label box and press the Label All button. All the Walt Disney instances in the document will be labeled as Person.
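Conceptually, Label All finds every occurrence of the search text and applies one label to each. The sketch below shows that idea in plain Python. The function label_all and the (start, end, label) span format are hypothetical illustrations, not Datasaur's implementation or export format.

```python
import re


def label_all(text, query, label):
    """Return one (start, end, label) span for every occurrence of `query`."""
    # re.escape treats the query as literal text rather than as a pattern.
    pattern = re.compile(re.escape(query))
    return [(m.start(), m.end(), label) for m in pattern.finditer(text)]


document = "Walt Disney founded the studio. Later, Walt Disney opened a park."
spans = label_all(document, "Walt Disney", "Person")
print(len(spans))  # 2
```

Each span records where a match starts and ends, so document[start:end] recovers the labeled text.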

Search by Labels

To search by label, type label:[desired label] in the Search field.
For example, to find tokens with the Location label, type label:location. The search results will show a list of tokens labeled Location.
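The label:[desired label] syntax can be thought of as a prefix followed by a label name. The following sketch parses such a query and filters a list of labeled tokens. The function names and the (token, label) pair format are hypothetical, and the case-insensitive comparison is an assumption inferred from the label:location example matching Location.

```python
def parse_label_query(query):
    """Return the label name if `query` has the form label:<name>, else None."""
    prefix = "label:"
    if query.lower().startswith(prefix):
        name = query[len(prefix):].strip()
        return name or None
    return None


def search_by_label(labeled_tokens, query):
    """Return the tokens whose label matches the query, ignoring case."""
    wanted = parse_label_query(query)
    if wanted is None:
        return []
    return [
        token
        for token, label in labeled_tokens
        if label.lower() == wanted.lower()
    ]


tokens = [("Paris", "Location"), ("Walt Disney", "Person"), ("Tokyo", "Location")]
print(search_by_label(tokens, "label:location"))  # ['Paris', 'Tokyo']
```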


Guidelines

The Guidelines extension contains the guidelines that the labelers should follow. The guidelines must be in .txt or .md format.
There are three ways to add guidelines.
  1. Drag and drop a file to the guidelines box
  2. Upload a file by clicking on the blue box
  3. Choose from the library by clicking on the triple-dots in the top-right corner
Uploading new guidelines will replace any existing guidelines.
Note: maximum guideline size is 1 MB

Legend Color

The Legend Color extension can be viewed only in Reviewer Mode and Read-only Mode.


Metadata

The Metadata extension allows you to see metadata that has been set previously with a custom import script.
Metadata in the labeling interface

Grammar Checker

Grammar Checker is a Datasaur extension for token-based projects that finds word and grammar mistakes in the dataset.

How to Use Grammar Checker:

  1. Add the Grammar Checker from the Extension
  2. Select the Service Provider
  3. Click "Enable Checker" to enable the Grammar Checker
  4. Click "Disable Checker" to disable the Grammar Checker

How It Works:

  • The Grammar Checker checks the entire document
  • After you enable it, it lists the grammar mistakes in the Grammar Checker extension and underlines the mistaken text in red in the token editor
  • Clicking an item in the list of grammar mistakes jumps to the row it refers to
  • While the Grammar Checker is enabled, it always rechecks the line where a text modification happens
About LanguageTool:
LanguageTool is a free online proofreading service for English, Spanish, and 20 other languages. Instantly check your text for grammar and style mistakes.
  • We use the open-source version of LanguageTool
  • See the list of LanguageTool rules

Keyboard Shortcuts

We recognize that many Datasaur users are power users. We provide keyboard shortcuts to help make labeling more efficient.
The most commonly used shortcuts are available in the label boxes that are displayed while labeling a token. For example, you can press 1 to apply the date label.
In document labeling, shortcuts are displayed in the extension. These shortcuts only appear if you choose Dropdown as the question type.
You can access the full version of the keyboard shortcuts by clicking the Help button.