Datasaur provides a number of extensions and shortcuts to make labeling even easier.
A core extension to any project, the Labels extension contains the label set or taxonomy used for the project. There are three ways to add a label set:
Upload a label set with a
Choose from a library of label sets
Type the labels in manually
The library has three label sets loaded by default:
Default NER
If you have created a project before, your previous label sets are automatically added to the library. Note that uploading or choosing a new label set deletes all existing labels in the document.
Note: maximum label set size is 500 KB
While labeling your project, it is possible to add or remove labels dynamically. Furthermore, you can edit a label or the label color by clicking on the pencil icon as shown below. We provide twelve colors that you can choose from.
You can also move a label by clicking the dots and dragging the label up and down.
💡 Best practice: Some users find that colors can help them memorize labels and reduce human error while labeling.
The List of Files extension lists out all the files that you upload to a project. The files will be arranged alphabetically.
The extension has several button controls, each with a different function.
The Up/Down buttons allow you to go to the previous or the next file.
The Number Field button allows you to jump to a specific file number by typing the number of the desired file and either clicking on the Go (-->) button or pressing Enter on your keyboard.
The Dashboard extension contains basic statistics for the labeling process and helps track progress.
The ML-assisted Labeling extension allows you to label semi-automatically. By default, spaCy is loaded. If you click the dropdown, you can change it to use DistilBERT OPIEC or a Custom API instead.
If you choose Custom, you can simply enter a custom API URL and load in any API of your own choice. You can see a Custom API sample preview by clicking the button beside the text field.
If you click the Apply labels button, the project will automatically apply labels to the document based on the loaded model.
The Dictionary extension contains two dictionaries - English and Bahasa Indonesia. The English dictionary sources are obtained from Merriam-Webster and WordNet. Meanwhile, the Indonesian dictionary is from Kamus Besar Bahasa Indonesia (KBBI).
This extension becomes active when you click a token. It shows the part of speech, the definition entries, and the definition. You can change the dictionary set by clicking the triple-dot menu in the top-right corner.
The Search extension allows you to search for words and label tokens that match the search. By searching for a word or phrase, Datasaur will identify all matches in the document. These results are shown in the results list at the bottom of the extension.
There are additional settings that can be accessed by clicking on the settings gear icon.
Regex ON allows you to search using regular expressions. For example, men* will show all words that start with
Exact words ON will show exact matches only. For example, men will match men but not
Label All allows you to label all matching tokens with the same label.
For example, type Walt Disney in the search box. It will show the number of Walt Disney instances in the file. Select Person in the label box and press the Label All button. All the Walt Disney instances in the document will be labeled as Person.
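The search settings above can be pictured with a small sketch. This is only an illustration written with Python's standard re module, not Datasaur's implementation: the function names find_matches and label_all, the sample text, and the use of word boundaries for exact-word matching are all assumptions, and Datasaur's regex dialect may differ from Python's.

```python
import re


def find_matches(text, query, regex=False, exact_words=False):
    """Return (start, end) spans for every match of query in text.

    Hypothetical helper: regex=True treats the query as a pattern,
    exact_words=True only accepts matches bounded by word boundaries.
    """
    pattern = query if regex else re.escape(query)
    if exact_words:
        pattern = r"\b(?:" + pattern + r")\b"
    return [m.span() for m in re.finditer(pattern, text)]


def label_all(text, query, label, **options):
    """Pair every matched span with the same label, like a Label All action."""
    return [(text[start:end], label)
            for start, end in find_matches(text, query, **options)]


text = "Walt Disney founded the studio. The men met Walt Disney's mentor."

# Plain search finds "men" inside "mentor" as well as the standalone word.
plain = find_matches(text, "men")
# Exact-word search keeps only the standalone word.
exact = find_matches(text, "men", exact_words=True)
# Every "Walt Disney" occurrence receives the Person label.
labeled = label_all(text, "Walt Disney", "Person")
```

In this sketch, plain search returns two spans, exact-word search returns one, and label_all produces one (text, label) pair per occurrence.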
To search by label, type label:[desired label] in the Search field.
For example, to search for tokens with the Location label, type label:location. The search results will show a list of tokens labeled Location.
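As a rough illustration of how a label: query differs from a plain text query, the sketch below filters a list of already labeled tokens. It is not Datasaur's code: the search function, the (token, label) tuple format, and the case-insensitive comparison are assumptions made for the example.

```python
LABEL_PREFIX = "label:"


def search(query, labeled_tokens):
    """Return tokens matching the query.

    labeled_tokens is a hypothetical list of (token_text, label) pairs.
    A query starting with "label:" filters by label; anything else is
    treated as a case-insensitive substring search on the token text.
    """
    if query.lower().startswith(LABEL_PREFIX):
        wanted = query[len(LABEL_PREFIX):].strip().lower()
        return [tok for tok, lab in labeled_tokens if lab.lower() == wanted]
    needle = query.lower()
    return [tok for tok, _ in labeled_tokens if needle in tok.lower()]


tokens = [
    ("Walt Disney", "Person"),
    ("Burbank", "Location"),
    ("California", "Location"),
]

locations = search("label:location", tokens)
```

Here locations holds the two tokens labeled Location, while a query such as "disney" falls through to the plain text search.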
The Guidelines extension contains the guidelines that the labelers should follow. The guidelines must be in
There are three ways to add guidelines.
Drag and drop a file to the guidelines box
Upload a file by clicking on the blue box
Choose from the library by clicking the triple-dot menu in the top-right corner
Uploading new guidelines will replace any existing guidelines.
Note: maximum guideline size is 1 MB
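If you prepare files with a script, a quick size check before uploading can save a failed attempt. The sketch below encodes the two limits mentioned in this document (500 KB for label sets, 1 MB for guidelines). It assumes decimal units (1 KB = 1000 bytes) and that a file exactly at the limit is accepted; neither detail is confirmed here, and the helper names are hypothetical.

```python
import os

# Limits taken from this document; decimal units are an assumption.
LIMITS_BYTES = {
    "label_set": 500 * 1000,
    "guidelines": 1 * 1000 * 1000,
}


def fits_limit(size_bytes, kind):
    """Return True if a file of size_bytes is within the limit for kind."""
    return size_bytes <= LIMITS_BYTES[kind]


def check_file(path, kind):
    """Check a file on disk against the limit for kind."""
    return fits_limit(os.path.getsize(path), kind)
```

For example, check_file("labels.csv", "label_set") would report whether a local file (the file name is only an example) stays under the label set limit.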
We recognize that many Datasaur users are power users. We provide keyboard shortcuts to help make labeling more efficient.
The most commonly used shortcuts are available in the label boxes that are displayed while labeling a token. In the example below, you can select 1 to apply the
In document-labeling, shortcuts are displayed in the extension as shown below. These shortcuts only appear if you choose Dropdown as the question type.
You can access the full version of the keyboard shortcuts by clicking the Help button.