# Inter-annotator Agreement (IAA) Calculation

Explains how Datasaur turns labelers' and reviewers' labels into an IAA matrix.

In Datasaur, we use Cohen's Kappa to calculate the agreement while taking into account the possibility of chance agreement. We will take a deep dive into how Datasaur collects all labels from labelers and reviewers in a project and processes them into an Inter-annotator Agreement matrix.

Suppose two labelers and a reviewer labeled the same sentences:

(Screenshots of the labels applied by Labeler A, Labeler B, and the Reviewer.)

Based on the screenshots above, we map those labels into the agreement records below:

Position in sentence | Labeler A | Labeler B | Reviewer |
---|---|---|---|
The Tragedy of Hamlet | EVE | TITLE | TITLE |
Prince of Denmark | PER | `<EMPTY>` | `<EMPTY>` |
Hamlet | PER | TITLE | PER |
William Shakespeare | PER | PER | PER |
1599 | YEAR | YEAR | YEAR |
1601 | YEAR | YEAR | YEAR |
Shakespeare | ORG | ORG | PER |
30,557 | `<EMPTY>` | `<EMPTY>` | QTY |

Then, we arrange the records into an agreement table. We use the data from Labeler A and Labeler B for this walkthrough.
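As a rough illustration of this step, the Python sketch below tallies the Labeler A and Labeler B records into an agreement table. It is not Datasaur's actual implementation: the names `RECORDS`, `pairs`, and `agreement_table` are hypothetical, and skipping positions that neither labeler touched is an assumption inferred from the record count used in this walkthrough.

```python
from collections import Counter

EMPTY = "<EMPTY>"

# (Labeler A, Labeler B) labels per position, copied from the records above.
RECORDS = {
    "The Tragedy of Hamlet": ("EVE", "TITLE"),
    "Prince of Denmark": ("PER", EMPTY),
    "Hamlet": ("PER", "TITLE"),
    "William Shakespeare": ("PER", "PER"),
    "1599": ("YEAR", "YEAR"),
    "1601": ("YEAR", "YEAR"),
    "Shakespeare": ("ORG", "ORG"),
    "30,557": (EMPTY, EMPTY),
}

# Assumption: positions that neither labeler labeled are not counted.
pairs = [pair for pair in RECORDS.values() if pair != (EMPTY, EMPTY)]

# Agreement table: (label from A, label from B) -> number of positions.
agreement_table = Counter(pairs)

# Print the table with Labeler A as rows and Labeler B as columns.
labels = sorted({label for pair in pairs for label in pair})
print("A \\ B".ljust(8) + "".join(label.ljust(8) for label in labels))
for a in labels:
    row = "".join(str(agreement_table[(a, b)]).ljust(8) for b in labels)
    print(a.ljust(8) + row)

# The diagonal cells (same label from both labelers) are the agreements.
total = len(pairs)
agreements = sum(n for (a, b), n in agreement_table.items() if a == b)
```

With the records above, `total` is 7 and `agreements` is 4.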

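From the table above, there are **7** records with **4** agreements. The `30,557` position is not counted because neither Labeler A nor Labeler B labeled it. The observed proportionate agreement is:

$$p_o = \frac{4}{7} \approx 0.571$$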

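To calculate the probability of random agreement, we note that:

- Labeler A labeled `EVE` once and Labeler B didn't label `EVE`. Therefore, the probability of random agreement on the label `EVE` is:

  $$p_{\text{EVE}} = \frac{1}{7} \times \frac{0}{7} = 0$$

- Compute the probability of random agreement for all labels:

  $$
  \begin{aligned}
  p_{\text{PER}} &= \frac{3}{7} \times \frac{1}{7} = \frac{3}{49} \\
  p_{\text{YEAR}} &= \frac{2}{7} \times \frac{2}{7} = \frac{4}{49} \\
  p_{\text{ORG}} &= \frac{1}{7} \times \frac{1}{7} = \frac{1}{49} \\
  p_{\text{TITLE}} &= \frac{0}{7} \times \frac{2}{7} = 0 \\
  p_{\text{EMPTY}} &= \frac{0}{7} \times \frac{1}{7} = 0
  \end{aligned}
  $$

The full random agreement probability is the sum of the probability of random agreement for all labels:

$$p_e = p_{\text{EVE}} + p_{\text{PER}} + p_{\text{YEAR}} + p_{\text{ORG}} + p_{\text{TITLE}} + p_{\text{EMPTY}} = \frac{8}{49} \approx 0.163$$

Finally, we can calculate Cohen's Kappa, using the observed agreement $p_o = \frac{4}{7}$:

$$\kappa = \frac{p_o - p_e}{1 - p_e} = \frac{\frac{4}{7} - \frac{8}{49}}{1 - \frac{8}{49}} = \frac{20}{41} \approx 0.488$$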
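The arithmetic for Labeler A and Labeler B can be checked with a short Python sketch. This is a minimal illustration of the standard Cohen's Kappa formula, not Datasaur's code; `PAIRS` and `cohen_kappa` are hypothetical names, and exact fractions are used only to make the intermediate values easy to compare by hand.

```python
from collections import Counter
from fractions import Fraction

# (Labeler A, Labeler B) labels for the 7 counted records.
PAIRS = [
    ("EVE", "TITLE"),
    ("PER", "<EMPTY>"),
    ("PER", "TITLE"),
    ("PER", "PER"),
    ("YEAR", "YEAR"),
    ("YEAR", "YEAR"),
    ("ORG", "ORG"),
]


def cohen_kappa(pairs):
    """Return (p_o, p_e, kappa) as exact fractions for a list of label pairs."""
    n = len(pairs)
    if n == 0:
        raise ValueError("no records to compare")
    # Observed agreement: share of records where both labels match.
    p_o = Fraction(sum(a == b for a, b in pairs), n)
    # How often each annotator used each label.
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    # Chance agreement: sum over labels of P(A uses label) * P(B uses label).
    # Labels that A never used contribute 0, so iterating over A is enough.
    p_e = sum(Fraction(count_a[label] * count_b[label], n * n) for label in count_a)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


p_o, p_e, kappa = cohen_kappa(PAIRS)
print(p_o, p_e, kappa, round(float(kappa), 3))  # 4/7 8/49 20/41 0.488
```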

- We apply the same calculation for agreement between labelers, and between the reviewer and the labelers.
- Missing labels from a single labeler are counted as empty labels.
- The percentage of chance agreement varies depending on:
  - The number of labels in a project.
  - The number of label options.
- When both labelers agree but the reviewer rejects the labels:
  - The agreement between the two labelers increases.
  - The agreement between the labelers and the reviewer decreases.
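To make these notes concrete, the sketch below applies the same calculation to every pair of annotators and fills in `<EMPTY>` wherever only one side labeled a position. It is an illustration under stated assumptions rather than Datasaur's implementation: `LABELS`, `kappa`, and `iaa_matrix` are hypothetical names, positions are keyed by their text for brevity, and positions that neither annotator in a pair labeled are assumed to be left out of that pair's comparison.

```python
from collections import Counter
from itertools import combinations

EMPTY = "<EMPTY>"

# Sparse labels per annotator: only positions that were actually labeled.
LABELS = {
    "Labeler A": {
        "The Tragedy of Hamlet": "EVE", "Prince of Denmark": "PER",
        "Hamlet": "PER", "William Shakespeare": "PER",
        "1599": "YEAR", "1601": "YEAR", "Shakespeare": "ORG",
    },
    "Labeler B": {
        "The Tragedy of Hamlet": "TITLE", "Hamlet": "TITLE",
        "William Shakespeare": "PER",
        "1599": "YEAR", "1601": "YEAR", "Shakespeare": "ORG",
    },
    "Reviewer": {
        "The Tragedy of Hamlet": "TITLE", "Hamlet": "PER",
        "William Shakespeare": "PER",
        "1599": "YEAR", "1601": "YEAR", "Shakespeare": "PER",
        "30,557": "QTY",
    },
}


def kappa(first, second):
    """Cohen's Kappa for two sparse {position: label} mappings."""
    # Positions labeled by at least one side; the missing side counts as EMPTY.
    positions = sorted(set(first) | set(second))
    pairs = [(first.get(p, EMPTY), second.get(p, EMPTY)) for p in positions]
    n = len(pairs)
    p_o = sum(a == b for a, b in pairs) / n
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    # Edge cases (no positions, p_e == 1) are not handled in this sketch.
    return (p_o - p_e) / (1 - p_e)


# One kappa value per unordered pair of annotators.
iaa_matrix = {
    (x, y): kappa(LABELS[x], LABELS[y]) for x, y in combinations(LABELS, 2)
}
for (x, y), value in iaa_matrix.items():
    print(f"{x} vs {y}: {value:.3f}")
```

In this data, the `Shakespeare` position shows the last note in action: both labelers chose `ORG`, so it counts as an agreement between them, while the reviewer's `PER` makes it a disagreement in both labeler-reviewer comparisons.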
