Reviewing NLP Projects

Overview

The “Reviewer Mode” is designed to facilitate efficient and effective oversight of the labeling process. As a reviewer, your role involves ensuring the accuracy and consistency of labeled data while maintaining a smooth workflow for labelers. This mode provides you with the tools and insights you need to uphold the quality standards of your project.

How it works?

You must have the Reviewer role first to use the Reviewer Mode. Roles in Datasaur can be viewed at the following link.

Span Labeling Reviewer Mode

You can see how conflicts in token labeling look. We have three types of conflicts in token labeling:

Contents conflict
Spans conflict
Arrows conflict

For more information about the difference between the three types of conflict, please refer to this link.

You can also hover over the conflicting label between two labelers and choose the best label answer by clicking the label.

You can also Go to the next conflict and the previous conflict in the Go toolbar. Or by clicking Alt+Shift+Right for going to the next conflict and Alt+Shift+Left for going to the previous conflict.

We also differentiate the label color based on the label's status

Your label color will be gray when your label has already reached consensus among the labelers.

Your label color will be yellow when you use our Assisted Labeling Functionality; it will show you that this label comes from the Assisted Labeling Extension by making the color yellow.

Your label color will be blue when your labeler and reviewer have different answers; it will show you that this label has an incorrect or rejected status by the reviewer.

Your label color will be purple when your reviewer labels the token; it will show you that the label is labeled by the reviewer.

Your label color will be red when the status is unresolved or conflicted with another labeler; it will show you that the label is unresolved or conflicted.

In token labeling, you can also see the number of token labels applied, the last labeled row, and the total solved rows. You can see it in the lower-right corner of the table display

Row Labeling Reviewer Mode

Unlike token labeling, the reviewing process in row labeling involves accepting answers within the Document Labeling extension. When reviewing a row labeling project, there are two primary things:

Table

Line color
- White color: Rows containing a consensus or those already resolved by the reviewer.
- Yellow color: Rows containing both a consensus answer and a conflicting answer. This occurs when a question has multiple values enabled.
- Red color: Rows without any consensus
- Blue color: Selected rows.
Answer in the table
Submitting answers in the Document labeling extension will trigger the display of answers in the table.
- Empty answers in the table
  - No consensus
- Answers displayed in the table
  - Meet consensus
  - Mix consensus and conflict
    Only display the consensus answers. The conflicted answer will be displayed after the reviewer resolves it.
  - Resolved answers by a reviewer

Document labeling extensions

Consensus rows
- Answers are in shown blue-colored labels
- Answers are selected in the question field
Mix consensus and conflict rows

Conflict answers are shown in the red-colored labels
Consensus answers are shown in blue-colored labels and selected

Document Labeling Reviewer Mode

In essence, the behavior is similar to row labeling, but let's delve into the specifics of the Document labeling extension. ✨

Bounding Box Labeling Reviewer Mode

We are planning to enable the review feature for bounding box labeling in the near future. It’s coming soon!

LLM Evaluation Reviewer Mode

Reviewer can input their own rating and answer by click the Edit button and edit the answer (and rating) the click the submit button on the bottom.

Reviewer also able to select the labelers’ answer as shown on the image below. Click the submit button after you select one of the labelers’ answer.

LLM Evaluation Report

When you click “Mark Project as Complete” in the LLM Evaluation project on Reviewer mode, a window appears. This window includes the LLM Evaluation Report Card and a prompt to download the 'LLM Evaluation Format.csv'. The calculations within this report are derived from the reviewers' answers.

LLM Evaluation Report Card

The LLM Evaluation Report Card consists of:

Average Rating Score (0.00 - 1.00): This score is the average of all prompts rated across all files. Datasaur rounds the LLM Evaluation Score: values equal to or greater than 0.5 are rounded up, while values less than 0.5 are rounded down.
Rating Distribution Bar Chart: This chart visualizes the distribution of 1–5 star ratings and includes a section for unrated items.
Unrated Definition: “Unrated” refers to prompts that have not received any reviewers' answers.