Cohen's Kappa Calculation

This page explains how Datasaur implements the Cohen's Kappa algorithm.

Cohen's Kappa is one of the algorithms supported by Datasaur to calculate agreement while taking into account the possibility of chance agreement. We will take a deep dive into how Datasaur collects all labels from labelers and reviewers in a project and processes them into an Inter-annotator Agreement matrix.
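
For reference, Cohen's Kappa is computed as:

$$
\kappa = \frac{p_o - p_e}{1 - p_e}
$$

where p_o is the observed proportionate agreement between the two annotators and p_e is the probability of random (chance) agreement.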

Sample Data

Suppose there are 2 labelers—Labeler A and Labeler B—who labeled the same sentences.

There is also a reviewer who labeled the same sentences.

Calculating the Data

Agreement Records

Based on the screenshots above, we map those labels into the agreement records below:

Position in sentence  | Labeler A | Labeler B | Reviewer
The Tragedy of Hamlet | EVE       | TITLE     | TITLE
Prince of Denmark     | PER       | <EMPTY>   | <EMPTY>
Hamlet                | PER       | TITLE     | PER
William Shakespeare   | PER       | PER       | PER
1599                  | YEAR      | YEAR      | YEAR
1601                  | YEAR      | YEAR      | YEAR
Shakespeare           | ORG       | ORG       | PER
30,557                | <EMPTY>   | <EMPTY>   | QTY

Agreement Table / Confusion Matrix

Then, we arrange the records into an agreement table, also known as a confusion matrix. For this walkthrough, we use the Labeler A and Labeler B data.
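
Based on the agreement records above, the agreement table for Labeler A (rows) against Labeler B (columns) looks as follows; the "30,557" record is excluded because neither labeler applied a label to it:

Labeler A \ Labeler B | EVE | TITLE | PER | YEAR | ORG | <EMPTY>
EVE                   |  0  |   1   |  0  |  0   |  0  |   0
TITLE                 |  0  |   0   |  0  |  0   |  0  |   0
PER                   |  0  |   1   |  1  |  0   |  0  |   1
YEAR                  |  0  |   0   |  0  |  2   |  0  |   0
ORG                   |  0  |   0   |  0  |  0   |  1  |   0
<EMPTY>               |  0  |   0   |  0  |  0   |  0  |   0

The 4 agreements sit on the diagonal (1 PER, 2 YEAR, and 1 ORG) out of 7 records in total.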

Calculating the Kappa

From the table above, there are 7 records with 4 agreements.

The observed proportionate agreement is:
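
$$
p_o = \frac{4}{7} \approx 0.571
$$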

To calculate the probability of random agreement, we note that:

  • Labeler A labeled EVE once (1 of 7 records) and Labeler B didn't label EVE (0 of 7 records). Therefore, the probability of random agreement on the label EVE is 1/7 × 0/7 = 0.

  • We compute the probability of random agreement for all labels in the same way:
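
Using the label counts from the agreement records (Labeler A applied 1 EVE, 3 PER, 2 YEAR, and 1 ORG; Labeler B applied 2 TITLE, 1 PER, 2 YEAR, 1 ORG, and 1 <EMPTY>), the per-label probabilities are:

$$
\begin{aligned}
p_{\text{EVE}}   &= \tfrac{1}{7} \times \tfrac{0}{7} = 0 \\
p_{\text{TITLE}} &= \tfrac{0}{7} \times \tfrac{2}{7} = 0 \\
p_{\text{PER}}   &= \tfrac{3}{7} \times \tfrac{1}{7} = \tfrac{3}{49} \approx 0.061 \\
p_{\text{YEAR}}  &= \tfrac{2}{7} \times \tfrac{2}{7} = \tfrac{4}{49} \approx 0.082 \\
p_{\text{ORG}}   &= \tfrac{1}{7} \times \tfrac{1}{7} = \tfrac{1}{49} \approx 0.020 \\
p_{\text{EMPTY}} &= \tfrac{0}{7} \times \tfrac{1}{7} = 0
\end{aligned}
$$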

The full random agreement probability is the sum of the probability of random agreement for all labels:
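
$$
p_e = 0 + 0 + \tfrac{3}{49} + \tfrac{4}{49} + \tfrac{1}{49} + 0 = \tfrac{8}{49} \approx 0.163
$$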

Finally, we can calculate Cohen's Kappa:
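
$$
\kappa = \frac{p_o - p_e}{1 - p_e} = \frac{4/7 - 8/49}{1 - 8/49} = \frac{20}{41} \approx 0.488
$$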

Kappa for Labeler A and Reviewer
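
Applying the same steps to the Labeler A and Reviewer columns gives the following sketch, under the assumption that <EMPTY> counts as a label value of its own (in line with the summary below). No record is empty on both sides, so all 8 records are kept and 4 of them agree; only PER, YEAR, and <EMPTY> contribute to the chance agreement:

$$
p_o = \frac{4}{8} = 0.5, \qquad
p_e = \frac{3 \times 3 + 2 \times 2 + 1 \times 1}{8 \times 8} = \frac{14}{64} \approx 0.219, \qquad
\kappa = \frac{0.5 - 0.219}{1 - 0.219} = \frac{18}{50} = 0.36
$$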

Kappa for Labeler B and Reviewer
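
For the Labeler B and Reviewer columns, the "Prince of Denmark" record is excluded because both sides left it empty, leaving 7 records with 4 agreements. Under the same assumption, only TITLE, PER, and YEAR contribute to the chance agreement:

$$
p_o = \frac{4}{7} \approx 0.571, \qquad
p_e = \frac{2 \times 1 + 1 \times 3 + 2 \times 2}{7 \times 7} = \frac{9}{49} \approx 0.184, \qquad
\kappa = \frac{4/7 - 9/49}{1 - 9/49} = \frac{19}{40} \approx 0.475
$$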

Summary

  • We apply the same calculation for agreement between labelers, and between the reviewer and labelers.

  • If a label is missing from only one labeler, that labeler is counted as having applied an empty (<EMPTY>) label.

  • The percentage of chance agreement will vary depending on:

    • The number of labels in a project.

    • The number of label options.

  • When both labelers agree but the reviewer rejects the labels:

    • The agreement between the two labelers increases.

    • The agreement between the labelers and the reviewer decreases.
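
The pairwise calculation described on this page can be reproduced with a short script. The sketch below is illustrative only: it uses the sample records from this page, and the cohens_kappa function, its record layout, and the treatment of <EMPTY> as a label value are assumptions rather than Datasaur's internal implementation.

```python
from collections import Counter

EMPTY = "<EMPTY>"

# Agreement records from the table above: span -> (Labeler A, Labeler B, Reviewer).
records = {
    "The Tragedy of Hamlet": ("EVE", "TITLE", "TITLE"),
    "Prince of Denmark": ("PER", EMPTY, EMPTY),
    "Hamlet": ("PER", "TITLE", "PER"),
    "William Shakespeare": ("PER", "PER", "PER"),
    "1599": ("YEAR", "YEAR", "YEAR"),
    "1601": ("YEAR", "YEAR", "YEAR"),
    "Shakespeare": ("ORG", "ORG", "PER"),
    "30,557": (EMPTY, EMPTY, "QTY"),
}

def cohens_kappa(pairs):
    """Compute Cohen's Kappa for a list of (annotator_1, annotator_2) labels.

    Records where both annotators are empty are excluded; records where only
    one annotator is empty count as disagreements, with <EMPTY> treated as a
    label value of its own (an assumption based on the summary above).
    """
    pairs = [(a, b) for a, b in pairs if not (a == EMPTY and b == EMPTY)]
    n = len(pairs)
    # Observed proportionate agreement: fraction of records with identical labels.
    p_o = sum(a == b for a, b in pairs) / n
    # Chance agreement: for each label, the product of each annotator's
    # probability of applying that label, summed over all labels.
    count_1 = Counter(a for a, _ in pairs)
    count_2 = Counter(b for _, b in pairs)
    p_e = sum(count_1[label] * count_2[label] for label in count_1) / (n * n)
    return (p_o - p_e) / (1 - p_e)

a, b, r = zip(*records.values())
print(f"Labeler A vs Labeler B: {cohens_kappa(list(zip(a, b))):.3f}")  # ~0.488
print(f"Labeler A vs Reviewer:  {cohens_kappa(list(zip(a, r))):.3f}")  # ~0.360
print(f"Labeler B vs Reviewer:  {cohens_kappa(list(zip(b, r))):.3f}")  # ~0.475
```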
