Datasaur

Extensions and Shortcuts

Making your labeling process even easier.
Datasaur provides a number of extensions and shortcuts to make labeling even easier.

Extensions

Labels

A core extension to any project, the Labels extension contains the label set or taxonomy used for the project. There are three ways to add a label set:
  1. Upload a label set in .csv or .tsv format
  2. Choose from a library of label sets
  3. Type the labels in manually
Note: maximum label set size is 500 KB
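The format and size constraints above can be checked before uploading. The following is a minimal sketch, not part of Datasaur's API: the function name, the result shape, and the reading of 500 KB as 500 × 1024 bytes are all assumptions made for illustration.

```typescript
// Hypothetical pre-upload check. Nothing here is a Datasaur API;
// the 1 KB = 1024 bytes convention is an assumption.
const MAX_LABEL_SET_BYTES = 500 * 1024;
const ALLOWED_EXTENSIONS = [".csv", ".tsv"];

interface UploadCheck {
  ok: boolean;
  reason?: string;
}

function checkLabelSetFile(fileName: string, sizeInBytes: number): UploadCheck {
  const lower = fileName.toLowerCase();
  // Compare the tail of the file name against each allowed extension.
  const hasAllowedExtension = ALLOWED_EXTENSIONS.some(
    (ext) => lower.slice(-ext.length) === ext
  );
  if (!hasAllowedExtension) {
    return { ok: false, reason: "Label sets must be .csv or .tsv files" };
  }
  if (sizeInBytes > MAX_LABEL_SET_BYTES) {
    return { ok: false, reason: "Label set exceeds the 500 KB limit" };
  }
  return { ok: true };
}
```

For instance, `checkLabelSetFile("labels.csv", 2048)` passes, while a `.json` file or a file larger than the limit is rejected with a reason.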

Built-in label sets

The library has three label sets loaded by default: INACL, Stanford, and Default NER. If you have created a project before, previous label sets will be automatically added to the library.
📌 One important thing to note is that uploading or choosing a new label set will delete all existing labels in the document.
Modify label set items
While labeling your project, it is possible to add or remove label items dynamically. Furthermore, you can edit or add label items and edit the label color by clicking on the triple dots and clicking Edit label set, as shown below. We provide twelve colors that you can select from.
You can also move a label by clicking the dots and dragging the label up and down.
💡 Best practice: Some users find that colors can help them memorize labels and reduce human error while labeling.
Show instances by labels
As you can see in the screenshots above, Datasaur counts all labels applied to a span of tokens and to a relation. If you click on the counter, you can see the instances.
Clicking one of the instances takes you directly to its location in the text editor.

List of Files

The List of Files extension lists all the files that you upload to a project. The files will be arranged alphabetically.
As shown in the picture above, the extension has several controls, each with its own function.
  1. The Search bar allows you to find specific files in the project.
  2. The Filter button allows you to filter documents by status:
    • All Files
    • Only show the In Progress documents
    • Only show the Completed documents
  3. The Document Icon on the right side of the filename marks the document as complete. Press it when you finish labeling the document.
  4. The Up/Down buttons take you to the previous or the next file.
  5. The Number Field at the bottom right lets you jump directly to a file by typing its position in the list.
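The controls above amount to filtering, searching, and indexing an alphabetically sorted list. The sketch below models that behavior under stated assumptions: the type names, the status values, and the case-insensitive substring search are hypothetical and do not describe Datasaur's internals.

```typescript
// Hypothetical model of the List of Files controls.
type FileStatus = "IN_PROGRESS" | "COMPLETED";
type StatusFilter = "ALL" | FileStatus;

interface ProjectFile {
  name: string;
  status: FileStatus;
}

// Apply the status filter and the search query, then sort alphabetically.
function listFiles(files: ProjectFile[], filter: StatusFilter, query: string): ProjectFile[] {
  const needle = query.toLowerCase();
  return files
    .filter((f) => filter === "ALL" || f.status === filter)
    .filter((f) => f.name.toLowerCase().indexOf(needle) !== -1)
    .sort((a, b) => a.name.localeCompare(b.name));
}

// Jump to a file by its 1-based position, as with the Number Field.
function jumpTo(files: ProjectFile[], order: number): ProjectFile | undefined {
  return files[order - 1];
}

const exampleFiles: ProjectFile[] = [
  { name: "b.txt", status: "COMPLETED" },
  { name: "a.txt", status: "IN_PROGRESS" },
  { name: "c.txt", status: "COMPLETED" },
];
```

With these example files, `listFiles(exampleFiles, "COMPLETED", "")` returns `b.txt` and `c.txt`, and `jumpTo(listFiles(exampleFiles, "ALL", ""), 2)` returns `b.txt`.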

ML-assisted Labeling

The ML-assisted Labeling extension allows you to label semi-automatically. By default, spaCy is loaded. From the dropdown, you can switch to another built-in service provider, HuggingFace, or a Custom API instead.
Before using this feature, make sure that your project already has a label set containing the appropriate label items. For example, to use spaCy, add a label set in the Labels extension that contains PERSON, NORP, FACILITY, etc.

Multiple label sets

A single project can contain multiple label sets, and ML-assisted Labeling takes this into account: it works on the label set that is currently active.
For example, suppose your token-based project has an NER label set and a POS label set, and you want to use the built-in spaCy model to help you label the project. Before you run it, switch the Label Set dropdown to the NER label set. spaCy can then detect your labels.
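One way to picture the requirement is as a check that every label the model can predict exists in the active label set. This is a minimal sketch under assumptions: the function name is hypothetical, the label names are the ones mentioned in this document, and exact, case-sensitive matching is assumed rather than confirmed.

```typescript
// Hypothetical helper: which model labels are absent from the active label set?
// Exact, case-sensitive name matching is an assumption.
function missingLabels(modelLabels: string[], activeLabelSet: string[]): string[] {
  return modelLabels.filter((label) => activeLabelSet.indexOf(label) === -1);
}

// Label names taken from the spaCy example in this document.
const spacyLabels = ["PERSON", "NORP", "FACILITY"];
const nerLabelSet = ["PERSON", "NORP"];
const posLabelSet = ["NOUN", "VERB"];

const missingInNer = missingLabels(spacyLabels, nerLabelSet); // ["FACILITY"]
const missingInPos = missingLabels(spacyLabels, posLabelSet); // all three labels
```

If the POS label set is active, none of the example labels are available, which illustrates why the NER label set should be selected first.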

Dictionary

The Dictionary extension contains two dictionaries - English and Bahasa Indonesia. The English dictionary sources are obtained from Merriam-Webster and WordNet. Meanwhile, the Indonesian dictionary is from Kamus Besar Bahasa Indonesia (KBBI).
The extension activates when you click a token and shows the part of speech, the definition entries, and the definition. You can change the dictionary by clicking the triple-dots menu in the top-right corner.

Search

The Search extension allows you to search for words and labeled tokens. When you search for a word or phrase, Datasaur identifies all matches in the document or project. These results are shown in the results list at the bottom of the extension.
There are additional settings that can be accessed by clicking on the Settings gear icon.
Search by Labels
Search by labels can be done by selecting the dropdown beside the keyword field. Once you choose to search by label, you can type out the label you want to search.
Search Options: by Text or Label
Search All Files
Search All Files allows you to search through all files. The results that are displayed in the results list will be from all files in the project.
Regex
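Regex allows you to search using regular expressions. For example, in standard regular-expression syntax, \bmen\w* matches every word that starts with men, whereas men* matches me followed by zero or more n characters.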
Exact words
Exact words shows exact matches only. For example, men matches men but not mentioned.
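The difference between plain text, Regex, and Exact words searches can be sketched with JavaScript regular expressions. This is an illustration only: the function names are hypothetical, and it assumes case-sensitive matching and JavaScript regex syntax, which may differ from the engine Datasaur uses.

```typescript
type SearchMode = "text" | "regex" | "exact";

// Escape regex metacharacters so a plain query is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function buildPattern(query: string, mode: SearchMode): RegExp {
  if (mode === "regex") {
    return new RegExp(query, "g");
  }
  const escaped = escapeRegExp(query);
  // Exact mode wraps the query in word boundaries.
  return mode === "exact"
    ? new RegExp("\\b" + escaped + "\\b", "g")
    : new RegExp(escaped, "g");
}

function findMatches(text: string, query: string, mode: SearchMode): string[] {
  return text.match(buildPattern(query, mode)) || [];
}

const sampleText = "Tell me when the men mentioned the menu.";
const exactHits = findMatches(sampleText, "men", "exact");        // ["men"]
const textHits = findMatches(sampleText, "men", "text");          // ["men", "men", "men"]
const prefixHits = findMatches(sampleText, "\\bmen\\w*", "regex"); // ["men", "mentioned", "menu"]
const starHits = findMatches(sampleText, "men*", "regex");        // ["me", "men", "men", "men"]
```

Note how `men*` also matches the standalone word `me`, which is why a prefix search is better written as `\bmen\w*`.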
Label All
Label All allows you to label all matching tokens in the token-based project.
For example, type Walt Disney in the search box. It will show the number of Walt Disney instances in the file. Select Person on the label box and press the Label All button. All the Walt Disney instances in the document will be labeled as Person.
This applies to any setting above.
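Conceptually, Label All finds every occurrence of the query and attaches the chosen label to each one. The sketch below shows the idea on a plain string with character offsets. The `Span` shape and function name are hypothetical, and Datasaur itself operates on tokens rather than raw character positions.

```typescript
// Hypothetical span shape; offsets are character positions, end is exclusive.
interface Span {
  start: number;
  end: number;
  label: string;
}

function labelAll(text: string, query: string, label: string): Span[] {
  const spans: Span[] = [];
  if (query.length === 0) {
    return spans; // an empty query would otherwise match everywhere
  }
  let from = text.indexOf(query);
  while (from !== -1) {
    spans.push({ start: from, end: from + query.length, label: label });
    // Continue after the current match so matches never overlap.
    from = text.indexOf(query, from + query.length);
  }
  return spans;
}

const disneyText = "Walt Disney founded the studio. Walt Disney was born in 1901.";
const personSpans = labelAll(disneyText, "Walt Disney", "Person");
```

Here `personSpans` contains two spans, one for each occurrence of Walt Disney, both labeled Person.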
Go to next/previous result
To step through the results, use the up and down arrow keys to move between results. You can also click the up and down arrows at the bottom of the Search extension.
Notes
  • The Search by label, Search All Files, Regex, and Exact words settings are available in token-based and row-based projects.
  • Label All is only available in token-based projects.

Labeling Guidelines

The Labeling Guidelines extension contains the guidelines that the labelers should follow. The guidelines must be in .txt or .md format.
There are three ways to add guidelines.
  1. Drag and drop a file to the guidelines box
  2. Upload a file by clicking on the blue box
  3. Choose from the library by clicking on the triple-dots in the top-right corner
Uploading new guidelines will replace any existing guidelines.
Note: maximum guideline size is 1 MB

Legend Color

Legend color can be viewed at the bottom right of the page. It can be viewed in Reviewer Mode and Labeler Mode.
Reviewer Mode Legend:
Labeler Mode:

Metadata

The Metadata extension lets you see metadata that was set earlier by an import file transformer. Each metadata entry has the following shape:
interface CellMetadata {
  key: string;
  value: string;
  type?: string;
  pinned?: boolean;
  config?: {
    backgroundColor: string;
    color: string;
    borderColor: string;
  };
}
The structure above provides additional information that describes a line. Users can choose whether that information appears only in the extension or above the related row.
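As an illustration, the following sketch builds two metadata entries that conform to the shape above. The keys, values, and colors are invented for the example. Reading `pinned` as "show above the related row" and passing colors as CSS hex strings are assumptions, not documented behavior.

```typescript
// Same shape as the CellMetadata definition in this section.
interface CellMetadata {
  key: string;
  value: string;
  type?: string;
  pinned?: boolean;
  config?: {
    backgroundColor: string;
    color: string;
    borderColor: string;
  };
}

// Example entries; all values are invented for illustration.
const sourceInfo: CellMetadata = {
  key: "source",
  value: "support-ticket",
  pinned: true, // assumption: pinned entries appear above the related row
  config: {
    backgroundColor: "#fff3cd", // assumption: CSS hex colors are accepted
    color: "#664d03",
    borderColor: "#ffecb5",
  },
};

const languageInfo: CellMetadata = { key: "language", value: "en" };

// Hypothetical helper: select the entries flagged as pinned.
function pinnedEntries(entries: CellMetadata[]): CellMetadata[] {
  return entries.filter((entry) => entry.pinned === true);
}

const pinned = pinnedEntries([sourceInfo, languageInfo]);
```

Only `key` and `value` are required, so `languageInfo` is a valid entry even without styling.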
Metadata in labeling interface

Grammar Checker

Grammar Checker is a Datasaur extension for token-based projects that finds grammar and word mistakes in the dataset.

How to Use Grammar Checker:

  1. Add the Grammar Checker from the Extension
  2. Select the Service Provider
  3. Click "Enable Checker" to enable the Grammar Checker
  4. Click "Disable Checker" to disable the Grammar Checker

How It Works:

  • The Grammar Checker checks the entire document.
  • It lists the grammar mistakes in the Grammar Checker extension and underlines them in red in the token editor.
The grammatically mistaken text will be given a red underline
The list of grammar mistakes after enabling the Grammar Checker
  • Clicking an item in the list of grammar mistakes jumps to the row it refers to.
  • While the Grammar Checker is enabled, it rechecks any line whose text is modified.
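The recheck-on-edit behavior described above can be sketched as replacing only the mistakes that belong to the edited line. This is a hypothetical model, not Datasaur's implementation: `LanguageToolMatch` lists a small subset of the fields returned by LanguageTool's check endpoint, while `Mistake`, `toMistakes`, and `recheckLine` are names invented for this example.

```typescript
// Subset of the fields in a LanguageTool check result (assumed sufficient here).
interface LanguageToolMatch {
  message: string;
  offset: number; // character offset of the mistake within the checked text
  length: number; // number of characters to underline
  replacements: { value: string }[];
}

// Hypothetical entry in the list of grammar mistakes.
interface Mistake {
  line: number;
  start: number;
  end: number;
  message: string;
  suggestion: string | undefined;
}

function toMistakes(lineIndex: number, matches: LanguageToolMatch[]): Mistake[] {
  return matches.map((m) => {
    const first = m.replacements[0];
    return {
      line: lineIndex,
      start: m.offset,
      end: m.offset + m.length,
      message: m.message,
      suggestion: first ? first.value : undefined,
    };
  });
}

// Drop the stale mistakes for the edited line and add the fresh ones.
function recheckLine(current: Mistake[], lineIndex: number, fresh: LanguageToolMatch[]): Mistake[] {
  return current
    .filter((m) => m.line !== lineIndex)
    .concat(toMistakes(lineIndex, fresh));
}
```

Keeping the mistakes of untouched lines avoids rechecking the whole document after every edit.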
About LanguageTool:
LanguageTool is a free online proofreading service for English, Spanish, and 20 other languages. Instantly check your text for grammar and style mistakes.
  • We use the open-source version of LanguageTool
  • Here is the link of LanguageTool rules

Keyboard Shortcuts

We recognize that many Datasaur users are power users. We provide keyboard shortcuts to help make labeling more efficient.
The most commonly used shortcuts are available in the label boxes that are displayed while labeling a token. In the example below, you can select 1 to apply the date label.
Source: https://www.gradesaver.com/a-study-in-scarlet/study-guide/summary
In document-labeling, shortcuts are displayed in the extension as shown below. These shortcuts only appear if you choose Dropdown as the question type.
You can access the full list of keyboard shortcuts by clicking the Help button or by pressing Ctrl + / on your keyboard.