Extensions and Shortcuts
Making your labeling process even easier.
Datasaur provides a number of extensions and shortcuts to make labeling even easier.
A core extension to any project, the Labels extension contains the label set or taxonomy used for the project. There are three ways to add a label set:
- 1. Upload a label set with a
- 2. Choose from a library of label sets
- 3. Type the labels in manually
Note: maximum label set size is 500 KB
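A quick local check before uploading can catch an oversized label set file. The sketch below is only an illustration: it assumes "500 KB" means 500 × 1024 bytes (the limit's exact definition is not stated here), and the function name `label_set_fits` is hypothetical, not part of Datasaur.

```python
import os

# Assumption: "500 KB" is interpreted as 500 * 1024 bytes.
MAX_LABEL_SET_BYTES = 500 * 1024


def label_set_fits(path: str, limit: int = MAX_LABEL_SET_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

Calling `label_set_fits` on a local label set file before uploading tells you whether it stays within the assumed limit.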
The library has three label sets loaded by default:
Default NER. If you have created a project before, previous label sets will be automatically added to the library.
📌 Note: uploading or choosing a new label set will delete all existing labels in the document.
Modify label set items
While labeling your project, you can add or remove label items dynamically. To edit or add label items, or to change a label's color, click the triple dots and then click Edit label set, as shown below. Twelve colors are available to choose from.
You can also reorder a label by clicking the dots and dragging the label up or down.
💡 Best practice: Some users find that colors can help them memorize labels and reduce human error while labeling.
Show instances by labels
As you can see in the screenshots above, Datasaur counts all labels applied to a span of tokens and to a relation. If you click on the counter, you can see the instances.
Clicking one of the instances takes you directly to its location in the text editor.
The List of Files extension lists all the files that you upload to a project. The files will be arranged alphabetically.
As you can see, there are multiple button controls in the picture above, and each has a different function.
- 1. The Search bar allows you to find specific files in the project.
- 2. The Filter button allows you to filter documents by status:
  - All Files
  - Only show the In Progress documents
  - Only show the Completed documents
- 3. The Document Icon on the right side of the filename marks the document as complete. Press it when you finish labeling the document.
- 4. The Up/Down buttons allow you to go to the previous or the next file.
- 5. The Number Field at the bottom right allows you to jump directly to a file by typing its position in the list.
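The behavior of these controls can be modeled in a few lines. The sketch below is not Datasaur code: the `ProjectFile` class and both functions are hypothetical, and it assumes a case-insensitive alphabetical sort, substring matching for the search bar, and a 1-based file order in the number field.

```python
from dataclasses import dataclass


@dataclass
class ProjectFile:
    name: str
    completed: bool = False  # toggled by the document icon


def list_files(files, status="all", search=""):
    """Return files sorted alphabetically, filtered by status and search text."""
    result = sorted(files, key=lambda f: f.name.lower())
    if status == "in_progress":
        result = [f for f in result if not f.completed]
    elif status == "completed":
        result = [f for f in result if f.completed]
    if search:
        # Assumption: the search bar matches any part of the filename.
        result = [f for f in result if search.lower() in f.name.lower()]
    return result


def jump_to(files, order):
    """Return the file at 1-based position `order`, as typed in the number field."""
    if not 1 <= order <= len(files):
        raise IndexError(f"order must be between 1 and {len(files)}")
    return files[order - 1]
```

For example, `jump_to(list_files(files), 2)` returns the second file in alphabetical order, which mirrors typing 2 into the number field.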
For example, suppose your token-based project has both an NER label set and a POS label set, and you want to use a built-in spaCy model to help you label the project. Before you use the feature, make sure the NER label set is selected in the Label Set Dropdown. spaCy can then detect your labels.
This extension becomes active when you click a token. It shows the part of speech, the definition entries, and the definition. You can change the dictionary set by clicking the triple-dots menu in the top-right corner.
The Search extension allows you to search for words and labeled tokens. When you search for a word or phrase, Datasaur identifies all matches in the document or project. The matches are shown in the results list at the bottom of the extension.
There are additional settings that can be accessed by clicking on the Settings gear icon.
Search by Labels
To search by label, select the dropdown beside the keyword field and choose the label option. You can then type the label you want to search for.
Search Options: by Text or Label
Search All Files
Search All Files allows you to search through every file in the project. The results list will then show matches from all files.
Regex allows you to search using regular expressions. For example, men* will show all words that start with
Exact words will show exact matches only. For example, men will match with men but not to
Label All allows you to label all matching tokens in a token-based project.
For example, type Walt Disney in the search box. It will show the number of Walt Disney instances in the file. Select Person in the label box and press the Label All button. All the Walt Disney instances in the document will be labeled as
This applies to any setting above.
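The Regex, Exact words, and Label All settings can be approximated with Python's `re` module. This is a rough sketch, not Datasaur's implementation: the function names are hypothetical, matching is assumed to be case-sensitive, "exact" is assumed to mean a match bounded by word boundaries, and Python's regex syntax may differ from the syntax Datasaur accepts.

```python
import re


def find_matches(text, query, use_regex=False, exact_word=False):
    """Return (start, end) character spans of every match of `query` in `text`."""
    # Plain-text queries are escaped so special characters match literally.
    pattern = query if use_regex else re.escape(query)
    if exact_word:
        # Assumption: an exact match must be bounded by word boundaries.
        pattern = rf"\b(?:{pattern})\b"
    # Skip zero-length matches, which a pattern such as "a*" can produce.
    return [m.span() for m in re.finditer(pattern, text) if m.end() > m.start()]


def label_all(text, query, label, **options):
    """Attach `label` to every match, mimicking the Label All button."""
    return [(start, end, label) for start, end in find_matches(text, query, **options)]
```

With `exact_word=True`, searching for `men` in `"men mend women"` returns only the span `(0, 3)`. Without it, the substrings inside `mend` and `women` are matched as well.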
Go to next/previous result
To make checking the results more convenient, you can use the arrow down to go to the next or previous result. You can also click the up and down arrows at the bottom of the Search extension to go to the next or previous result.
- The Search by label, Search All Files, Regex, and Exact words settings are available for token-based and row-based projects.
- The Label All functionality is only available for token-based projects.
The Labeling Guidelines extension contains the guidelines that labelers should follow. The guidelines must be in
There are three ways to add guidelines.
- 1. Drag and drop a file to the guidelines box
- 2. Upload a file by clicking on the blue box
- 3. Choose from the library by clicking on the triple-dots in the top-right corner
Uploading new guidelines will replace any existing guidelines.
Note: maximum guideline size is 1 MB
The color legend can be viewed at the bottom right of the page, in both Reviewer Mode and Labeler Mode.
Reviewer Mode Legend:
The code above is useful for providing additional info that describes a line. Users can choose to show the additional info only in the extension or above the related row.
Metadata in labeling interface
Grammar Checker is a Datasaur extension for token-based projects that corrects word mistakes in the dataset.
- 1. Add the Grammar Checker from the Extension
- 2. Select the Service Provider
- 3. Click "Enable Checker" to enable the Grammar Checker
- 4. Click "Disable Checker" to disable the Grammar Checker
- The Grammar Checker will check the entire document
- It will list the grammar mistakes in the Grammar Checker extension and underline them in red in the token editor
The grammatically mistaken text will be given a red underline
The list of grammar mistakes after enabling the Grammar Checker
- Clicking an item in the list of grammar mistakes jumps to the row it refers to
- While the Grammar Checker is enabled, it rechecks any line whose text is modified
We recognize that many Datasaur users are power users. We provide keyboard shortcuts to help make labeling more efficient.
The most commonly used shortcuts are available in the label boxes that are displayed while labeling a token. In the example below, you can select 1 to apply the
In document-labeling, shortcuts are displayed in the extension as shown below. These shortcuts only appear if you choose Dropdown as the question type.
You can access the full list of keyboard shortcuts by clicking the Help button or by pressing CTRL + / on your keyboard.