Common Terminology

A glossary of words we frequently use

Natural Language Processing (NLP)

The field of artificial intelligence focused on text and language.

Labeler

The person doing the labeling. Labelers are sometimes also referred to as annotators or taggers.

Reviewer

Someone assigned to review the labels of another colleague.

Project

At Datasaur, every labeling task starts with a project. A project can have multiple files, and each file will likely contain many labels.

Cabinet

A cabinet is an isolated version of a project where individual team members can work independently. This approach allows labelers to work on their own set of documents, which are later consolidated in Reviewer mode if any conflicts arise. Each labeler has their own cabinet. Reviewers, on the other hand, have a single cabinet that is shared by all reviewers.

  • For example, if a project has two labelers and one reviewer, there will be two labeler cabinets and one reviewer cabinet. Each cabinet will contain its respective assigned documents.

Consensus

The level of agreement required among labelers before a label is automatically accepted. This mechanism ensures that multiple labelers reach a mutual agreement on the annotation.

  • For example, if the consensus is set at 2, and Labeler 1 (L1) labels a span as a PERSON, it will not be automatically accepted since it hasn't met the minimum requirement of two agreements. However, if Labeler 2 (L2) labels the same span as a PERSON, thereby meeting the consensus requirement, the label will be automatically accepted.
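
The consensus rule above can be sketched in a few lines of Python. This is only an illustration of the counting logic, not Datasaur's implementation: the function name `accepted_labels` and the input shape (one label per labeler for the same span) are assumptions made for the example.

```python
from collections import Counter

def accepted_labels(votes, consensus):
    """Return the labels whose agreement count meets the consensus threshold.

    votes: one label per labeler for the same span (hypothetical input shape).
    consensus: minimum number of labelers who must agree.
    """
    counts = Counter(votes)
    return [label for label, n in counts.items() if n >= consensus]

# Only Labeler 1 has labeled the span: one agreement, below the threshold of 2.
print(accepted_labels(["PERSON"], consensus=2))

# Labeler 2 applies the same label: two agreements, so the label is accepted.
print(accepted_labels(["PERSON", "PERSON"], consensus=2))
```

With a consensus of 2, the first call returns an empty list and the second returns a list containing PERSON, matching the example above.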

Entity

A conceptual person, object, or location mentioned in a document. Often, it is the token or span of tokens to be labeled in an NER (named entity recognition) project.

Cell

A Cell is a box that displays data in the Editor. For example, in the above picture, the box that contains the text "Sherlock Holmes become widely popular in 1891" is a Cell. Cells are arranged in a matrix-like structure.

A Cell's Line is its position along the vertical axis, numbered from 0 starting at the top.

A Cell's Index is its position along the horizontal axis, numbered from 0 starting at the left. We refer to a Cell by its Line and Index. For example, the Cell with Line 3 and Index 0 is the Cell that contains the text "All but one are set in the Victorian or Edwardian eras, between about 1880 and 1914."
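
As a rough illustration of Line and Index addressing, the matrix of Cells can be modeled as a list of rows. The nested-list representation, the `get_cell` helper, and the placeholder text for Lines 0 to 2 are assumptions made for this sketch; only the text at Line 3, Index 0 comes from the example above.

```python
# Each inner list is one Line (row); each item in it is a Cell at some Index (column).
cells = [
    ["(placeholder text for line 0)"],
    ["(placeholder text for line 1)"],
    ["(placeholder text for line 2)"],
    ["All but one are set in the Victorian or Edwardian eras, between about 1880 and 1914."],
]

def get_cell(cells, line, index):
    """Look up a Cell by its Line (vertical, 0 at the top) and Index (horizontal, 0 at the left)."""
    return cells[line][index]

print(get_cell(cells, line=3, index=0))
```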

Spans and Characters

A Span is typically the atomic unit in a document. It can be a single word, but it can also be a punctuation mark such as '.'.

Spans are indexed starting from 0 within each Cell, and the characters within each Span are indexed starting from 0 as well. For example, the Span "popularity" has index 1 in the current Cell, and the character "u" has index 3 in "popularity".
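
The zero-based indexing of Spans and characters can be sketched as follows. The cell text "His popularity grew." is made up for the example, and the regular-expression split (words and individual punctuation marks) is an assumed stand-in for whatever tokenizer a project actually uses.

```python
import re

def to_spans(cell_text):
    """Split a Cell's text into Spans: runs of word characters or single punctuation marks.

    This tokenization rule is an assumption for illustration only.
    """
    return re.findall(r"\w+|[^\w\s]", cell_text)

spans = to_spans("His popularity grew.")  # hypothetical Cell text

span_index = spans.index("popularity")    # position of the Span within the Cell
char_index = spans[span_index].index("u") # position of the character within the Span

print(spans)
print(span_index, char_index)
```

Here "popularity" is the Span at index 1, "u" is the character at index 3 within it, and the trailing '.' is a Span of its own.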

Label Set

A Label Set is a set of labels that are related to one another. Each project in Datasaur can have up to 5 different Label Sets. Label Sets are indexed from 0 to 4. We used to refer to the Label Set Index as the Layer.
