Creating a Project

After signing in, you will be automatically directed to your personal workspace. From there, switch to your team workspace: click your avatar in the top right, choose “Switch Workspace”, and then select your team workspace. You will be brought to the Projects page of the workspace, where the Admin of the team can create projects. On this page you can see the Project Template shortcuts and the list of Projects that you are working on.

You can create a Project by clicking the Create project button, which lets you create any type of project. You can also create a project by selecting one of the Project Template shortcuts, which come with pre-selected settings for a specific use case. In this article, we will walk through creating a new project.

Ready to make our first project? We are going to create a Token-Based project together (span-based labeling). This type of project supports use cases such as Named Entity Recognition, Part-of-Speech tagging, and more. Any workflow that requires labeling specific words or phrases can be done with Datasaur’s Token-Based project. If you would like a tutorial on creating Row-Based (text classification), Audio, OCR, Bounding Box, or Document-Based projects, please watch their corresponding YouTube videos. Once you have selected “Create Project”, you will find yourself in the Project Creation Wizard.

Project Creation Wizard

The Project Creation Wizard is a tool for creating custom Projects. It has five basic steps: Upload, Preview, Labeler's tasks, Assignment, and Project settings.

Step 1: Upload your files (Video Tutorial)

You can see a list of the file formats that Datasaur natively supports for each project type by expanding the 'Supported File Types' section.

As an example, we will create a Span Labeling project by uploading several .txt files.

When uploading multiple files, ensure they are all in the same file format.

Files can be uploaded in three ways: by dragging and dropping, by browsing files on your hard drive, or by fetching files from external object storage. Note: the maximum file size allowed is 50 MB. If you are interested in creating a project via the API, you can find the documentation here.
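If you prepare many files locally, it can help to check them against the two upload rules above (one file format per batch, 50 MB per file) before opening the wizard. The following is a minimal sketch, not part of Datasaur: the function name check_upload_batch is hypothetical, and it assumes "50 MB" means 50 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Assumption: the 50 MB limit is interpreted as 50 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def check_upload_batch(paths):
    """Return a list of problems found; an empty list means the batch looks fine."""
    problems = []

    # Rule 1: all files in one upload should share the same file format.
    extensions = {Path(p).suffix.lower() for p in paths}
    if len(extensions) > 1:
        problems.append("mixed file formats: " + ", ".join(sorted(extensions)))

    # Rule 2: no single file may exceed the maximum size.
    for p in paths:
        size = Path(p).stat().st_size
        if size > MAX_FILE_SIZE_BYTES:
            problems.append(f"{Path(p).name} exceeds 50 MB ({size} bytes)")

    return problems
```

Calling check_upload_batch with the paths of your .txt files returns an empty list when every file has the same extension and is within the size limit.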

Add Project Tags

You can also add one or more project tags by selecting from the available tags or creating a new one.

If you forget to select the project tags at this step, you can also add them through the project management page. Simply follow the steps outlined here.

Step 2: Preview the uploaded file (Video Tutorial)

In this step, we get to decide two different options for our data: separation of lines and the tokenizer.

Line Separator

Line Separator decides how your rows in the labeling interface are split. There are two native options available:

  1. New Line will create a new row for every new line that was in your original data.

  2. Dot (.) will create a new row after each “.” in your data.
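To see how the two options differ on the same text, here is a rough sketch in Python. It only illustrates the behavior described above and is not Datasaur's implementation; the function name split_rows and the option names "new_line" and "dot" are hypothetical, and collapsing leftover whitespace in Dot mode is an assumption.

```python
import re


def split_rows(text, separator="new_line"):
    """Split text into rows using one of the two line separator options."""
    if separator == "new_line":
        # One row per line of the original data.
        return text.splitlines()
    if separator == "dot":
        # Split right after every "." (requires Python 3.7+ for empty-match splits).
        # Note: this splits after every dot, so "3.14" would also be split.
        parts = re.split(r"(?<=\.)", text)
        # Assumption: leftover whitespace and newlines are collapsed, empty rows dropped.
        rows = [" ".join(part.split()) for part in parts]
        return [row for row in rows if row]
    raise ValueError(f"unknown separator: {separator}")


sample = "One. Two.\nThree"
print(split_rows(sample, "new_line"))  # ['One. Two.', 'Three']
print(split_rows(sample, "dot"))       # ['One.', 'Two.', 'Three']
```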

Tokenizer

Datasaur offers two options for your tokenizer: whitespace and wink.
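The choice of tokenizer changes what counts as a single selectable token. The sketch below contrasts plain whitespace splitting with a simple punctuation-aware split. The second function is only a rough stand-in, assuming the wink option separates punctuation from words; it does not reproduce wink's actual rules, and both function names are hypothetical.

```python
import re


def whitespace_tokenize(text):
    """Split on runs of whitespace; punctuation stays attached to words."""
    return text.split()


def punctuation_aware_tokenize(text):
    """Rough stand-in for a punctuation-aware tokenizer (not wink's real rules).

    Matches either a run of word characters or a single punctuation character.
    """
    return re.findall(r"\w+|[^\w\s]", text)


print(whitespace_tokenize("Hello, world!"))         # ['Hello,', 'world!']
print(punctuation_aware_tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

With whitespace tokenization, a label applied to "Hello," would include the comma; a punctuation-aware tokenizer lets labelers select the word on its own.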

Step 3: Labeler's tasks (Video Tutorial)

In this step, you must choose which task type you would like to work on. A detailed explanation of each task type can be found here.

Since we have previously uploaded .txt files, the available task types to choose from are Span and Document Labeling. In this case, we will proceed with choosing Span Labeling. Therefore, we need to provide labels to be used later in the project.

There are three different ways to upload or create our labels.

  1. Create Labels in the UI: Select 'Create your own' to begin manually typing in your labels (as shown in the image below). You can also manually select the color for each of your labels.

  2. Upload Labels from File: Select the white space to upload a CSV of your labels from your local drive. The format of the CSV is very simple: your first label goes in cell A1, and subsequent labels continue down column A (A2, A3, A4, A5, etc.).

  3. Upload Labels from Your Team’s Saved Library: In your team workspace, there is a page called Label Management where you can create, edit, and delete label sets. This lets your team save all of its label sets, so you do not have to re-upload or re-create a label set each time you create a new project.
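For the file upload option, a CSV in the described layout (one label per row, all in column A, no header) can be produced with a few lines of Python. This is a minimal sketch: the function name write_label_csv, the file name labels.csv, and the example labels are illustrative only.

```python
import csv


def write_label_csv(labels, path):
    """Write one label per row so that they fill cells A1, A2, A3, and so on."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for label in labels:
            writer.writerow([label])  # a single-column row lands in column A


# Example labels for a Named Entity Recognition project (illustrative only).
write_label_csv(["PERSON", "ORGANIZATION", "LOCATION"], "labels.csv")
```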

Configuring Span Labeling Settings

At the bottom of the page, you'll see a section called 'Span Labeling' where you will be able to configure several things.

Limit selection to a span of 1 token: each label can only be applied to a single token at a time. This is useful when you need every token in the document labeled.

Spans should have at most one label: assigning multiple labels to a single token or span of tokens will not be allowed.

Allow arrows to be drawn between labels: allows you to draw arrows from one label to another to annotate relationships between words. For example, this is useful for showing that an adjective is related to a noun, or a pronoun is referring to a person.

Default text selection: select whether labelers will perform a token or character selection for their labels. In other words, will labelers apply labels to whole words at once (token), or will they be able to label the individual letters within a word (character)? Note: some languages may require you to change the selection to character selection, e.g. Mandarin, Korean, or Thai.
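The note about languages is easiest to see with text that has no spaces between words. The sketch below approximates token selection with whitespace splitting, which is an assumption about how tokens are formed; the function name selectable_units is hypothetical.

```python
def selectable_units(text, mode="token"):
    """Return the units a labeler could select under each selection mode."""
    if mode == "token":
        # Assumption: tokens are approximated by whitespace-separated chunks.
        return text.split()
    if mode == "character":
        # Every non-whitespace character can be selected on its own.
        return [ch for ch in text if not ch.isspace()]
    raise ValueError(f"unknown mode: {mode}")


# A Mandarin phrase without spaces forms one large token,
# so individual words cannot be labeled with token selection.
print(selectable_units("我爱北京", "token"))      # ['我爱北京']
print(selectable_units("我爱北京", "character"))  # ['我', '爱', '北', '京']
```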

Step 4: Assignment (Video Tutorial)

In this step, we get to choose who will be a labeler and who will be a reviewer. When assigning personnel, you will have three roles available: Labeler, Labeler & Reviewer, and Reviewer.

Admins will have only two choices: Labeler & Reviewer, and Reviewer. An admin will always have, at a minimum, access to Reviewer Mode for any project.

Peer Review Consensus

Here we set how many labelers need to agree on a label for it to be automatically accepted by Datasaur. The Peer review consensus slider allows you to determine the threshold at which labels will be automatically accepted. For highly sensitive projects where there is no room for error, you may want to require unanimity from all assigned labelers. For less sensitive projects where efficiency and cost are more important than accuracy, a majority vote may be sufficient. Any label where the threshold is not met will need to be manually reviewed by you, the project creator or reviewer.

If you check No consensus, all of your labelers’ labels will be treated as conflicting labels.
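The consensus rule can be pictured as a simple vote count per labeled span. The sketch below is only an illustration of the idea described above, not Datasaur's actual algorithm; the function name resolve_span and its parameters are hypothetical, and treating a tie between two labels that both reach the threshold as a conflict is an assumption.

```python
from collections import Counter


def resolve_span(labels_by_labeler, consensus, no_consensus=False):
    """Return the auto-accepted label for one span, or None if it needs manual review.

    labels_by_labeler: mapping of labeler name -> label applied to the span.
    consensus: number of labelers that must agree (the slider value).
    no_consensus: when True, every label is treated as conflicting.
    """
    if no_consensus:
        return None

    counts = Counter(labels_by_labeler.values())
    # Labels that reached the required number of votes.
    winners = [label for label, votes in counts.items() if votes >= consensus]

    # Assumption: accept only when exactly one label meets the threshold.
    return winners[0] if len(winners) == 1 else None


votes = {"alice": "PERSON", "bob": "PERSON", "carol": "ORGANIZATION"}
print(resolve_span(votes, consensus=2))                     # PERSON
print(resolve_span(votes, consensus=3))                     # None
print(resolve_span(votes, consensus=2, no_consensus=True))  # None
```

With three labelers, a threshold of 2 corresponds to a majority vote and a threshold of 3 to unanimity.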

Allow dynamic review assignment automatically assigns a team member as a reviewer when the labelers have conflicts in a project. More detailed information can be found here.

Step 5: Configuring project settings (Video Tutorial)

In this step, we choose some final, advanced admin settings for the project.

Keep in mind that most of these choices are intended for advanced requirements.

Labeling Settings

Label set modification allows your labelers to add, edit, or remove labels in the project through the labeling interface.

Text modification enables your labelers to edit the text of the dataset.

Mask Personally Identifiable Information (PII) allows the admin to decide whether sensitive information should be covered by asterisks or random characters. It also allows the admin to decide what types of information should be masked (for example: addresses, social security numbers, company names, etc.).
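To make the two masking styles concrete, here is a small sketch that masks US social security numbers in a string. It is not how Datasaur detects or masks PII; the regular expression, the function name mask_pii, and the mode names "asterisk" and "random" are assumptions chosen for illustration.

```python
import random
import re
import string

# Assumption: SSNs appear in the common 123-45-6789 format.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_pii(text, mode="asterisk", rng=None):
    """Replace each matched SSN with asterisks or random characters of equal length."""
    if mode not in ("asterisk", "random"):
        raise ValueError(f"unknown mode: {mode}")
    rng = rng or random.Random()
    alphabet = string.ascii_letters + string.digits

    def replace(match):
        value = match.group(0)
        if mode == "asterisk":
            return "*" * len(value)
        return "".join(rng.choice(alphabet) for _ in value)

    return SSN_PATTERN.sub(replace, text)


print(mask_pii("SSN: 123-45-6789"))  # SSN: ***********
```

Passing mode="random" keeps the length of the masked value but replaces it with random letters and digits.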

Allow marking unapplied label classes as N/A will present all the labels that were not applied in the project and allow the labelers to mark them as not applicable (N/A).

Reviewing Settings

Show labeler names in Review Mode: if you would like to reduce the chance of bias, you can choose not to show labeler names in Reviewer Mode.

Show rejected labels in Review Mode allows reviewers to see all labels that they have rejected.

Show labels from inactive label set in Review Mode: if your project has multiple label sets, this enables the reviewers to see the labels from every label set all at once.

Show original sentences in Review Mode means that the reviewers will see all the original sentences compared to any edits the labelers have made.

Set notification for labeler's project completion: by default, reviewers will be notified that the project is ready when all of the labelers have marked their work as complete. This slider allows you to manually set how many labelers must mark their work as complete to trigger the email notification.

At this point, we have finished configuring the project and can click the 'Launch Project' button to create the project. Happy Labeling!
