Krippendorff's Alpha Calculation

This page explains how Datasaur implements the Krippendorff's Alpha algorithm.

Krippendorff's Alpha is one of the algorithms supported by Datasaur for calculating agreement while taking the possibility of chance agreement into account. We will dive into how Datasaur collects all labels from labelers and reviewers in a project and processes them into an Inter-annotator Agreement matrix.

Sample Data

Suppose there are 2 labelers and 1 reviewer — Labeler A, Labeler B, and Reviewer — who labeled the same spans. Labeler A's work is visualized in Image 1, Labeler B's work in Image 2, and the Reviewer's work in Image 3.

Calculating the Agreement

In this section, we will walk through the calculation in detail for Labeler A and the Reviewer.

1. Arranging the data

First, we need to arrange the sample data into Table 1 for better visualization.

Table 1. Sample Data

| Span | Labeler A | Reviewer |
| --- | --- | --- |
| The Tragedy of Hamlet | EVE | TITLE |
| Prince of Denmark | PER | |
| Hamlet | PER | PER |
| William Shakespeare | PER | PER |
| 1599 | YEAR | YEAR |
| 1601 | YEAR | YEAR |
| Shakespeare | ORG | PER |
| 30557 | QTY | |

2. Cleaning the data

Second, we need to remove spans that have only one label, i.e., Prince of Denmark and 30557. They must be removed because spans with a single label would introduce a division-by-zero error later in the calculation (the denominator of Formula (7) is zero when a span has only one label). Even after removal, the calculation result still reflects the agreement level between the two annotators. The cleaned data is shown in Table 2.

Table 2. Cleaned Data

| Span | Labeler A | Reviewer |
| --- | --- | --- |
| The Tragedy of Hamlet | EVE | TITLE |
| Hamlet | PER | PER |
| William Shakespeare | PER | PER |
| 1599 | YEAR | YEAR |
| 1601 | YEAR | YEAR |
| Shakespeare | ORG | PER |
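To make the arrangement and cleaning step concrete, here is a minimal Python sketch. The dictionary layout and variable names are illustrative assumptions, not Datasaur's internal code, and the choice of which annotator skipped the single-label spans is arbitrary since those spans are removed anyway.

```python
# Hypothetical representation of Table 1: span -> {annotator: label}.
# None marks a span that an annotator did not label.
table_1 = {
    "The Tragedy of Hamlet": {"Labeler A": "EVE",  "Reviewer": "TITLE"},
    "Prince of Denmark":     {"Labeler A": "PER",  "Reviewer": None},
    "Hamlet":                {"Labeler A": "PER",  "Reviewer": "PER"},
    "William Shakespeare":   {"Labeler A": "PER",  "Reviewer": "PER"},
    "1599":                  {"Labeler A": "YEAR", "Reviewer": "YEAR"},
    "1601":                  {"Labeler A": "YEAR", "Reviewer": "YEAR"},
    "Shakespeare":           {"Labeler A": "ORG",  "Reviewer": "PER"},
    "30557":                 {"Labeler A": "QTY",  "Reviewer": None},
}

# Keep only spans that received a label from both annotators (Table 2).
table_2 = {
    span: annotations
    for span, annotations in table_1.items()
    if all(label is not None for label in annotations.values())
}
```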

3. Creating the agreement table

Third, we need to create an agreement table based on the cleaned data: for each span, it counts how many annotators assigned each label. The table is visualized in Table 3.
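As a sketch of what the agreement table contains, the cleaned data can be turned into a matrix of label counts, one row per span and one column per label. The variable names below are assumptions made for illustration.

```python
# Cleaned data from Table 2: each span with the labels it received.
spans = [
    ("The Tragedy of Hamlet", ["EVE", "TITLE"]),
    ("Hamlet",                ["PER", "PER"]),
    ("William Shakespeare",   ["PER", "PER"]),
    ("1599",                  ["YEAR", "YEAR"]),
    ("1601",                  ["YEAR", "YEAR"]),
    ("Shakespeare",           ["ORG", "PER"]),
]
labels = ["EVE", "ORG", "PER", "TITLE", "YEAR"]

# r[i][k] = how many annotators assigned label k to span i.
r = [[assigned.count(k) for k in labels] for _, assigned in spans]

for (span, _), row in zip(spans, r):
    print(f"{span:22}  {row}")
# The Tragedy of Hamlet   [1, 0, 0, 1, 0]
# Hamlet                  [0, 0, 2, 0, 0]
# William Shakespeare     [0, 0, 2, 0, 0]
# 1599                    [0, 0, 0, 0, 2]
# 1601                    [0, 0, 0, 0, 2]
# Shakespeare             [0, 1, 1, 0, 0]
```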

Based on the table, 5 values are calculated: $n$, $r_i$, $r_k$, $r$, and $r'$.

Total spans in the data

  • $n$ is the total number of spans in the data.

    • Here, $n=6$ because there are 6 spans.

Total labels in each span

$$r_i=\sum\limits_{k=1}^{m}r_{ik} \quad (1)$$

  • $r_i$ is the total number of labels that span $i$ has.

  • $m$ is the total number of labels.

    • Here, $m=5$ because there are 5 labels.

  • $r_{ik}$ is the number of label $k$ in span $i$.

Here is the calculation result.

  • $r_1=r_{1,EVE}+r_{1,ORG}+r_{1,PER}+r_{1,TITLE}+r_{1,YEAR}=1+0+0+1+0=2$

  • $r_2=r_{2,EVE}+r_{2,ORG}+r_{2,PER}+r_{2,TITLE}+r_{2,YEAR}=0+0+2+0+0=2$

  • $r_3=r_{3,EVE}+r_{3,ORG}+r_{3,PER}+r_{3,TITLE}+r_{3,YEAR}=0+0+2+0+0=2$

  • $r_4=r_{4,EVE}+r_{4,ORG}+r_{4,PER}+r_{4,TITLE}+r_{4,YEAR}=0+0+0+0+2=2$

  • $r_5=r_{5,EVE}+r_{5,ORG}+r_{5,PER}+r_{5,TITLE}+r_{5,YEAR}=0+0+0+0+2=2$

  • $r_6=r_{6,EVE}+r_{6,ORG}+r_{6,PER}+r_{6,TITLE}+r_{6,YEAR}=0+1+1+0+0=2$

Total of each label

$$r_k=\sum\limits_{i=1}^{n}r_{ik} \quad (2)$$

  • $r_k$ is the total count of label $k$ in the data.

  • $n$ is the total number of spans in the data.

  • $r_{ik}$ is the number of label $k$ in span $i$.

Here is the calculation result.

  • $r_{EVE}=r_{1,EVE}+r_{2,EVE}+r_{3,EVE}+r_{4,EVE}+r_{5,EVE}+r_{6,EVE}=1+0+0+0+0+0=1$

  • $r_{ORG}=r_{1,ORG}+r_{2,ORG}+r_{3,ORG}+r_{4,ORG}+r_{5,ORG}+r_{6,ORG}=0+0+0+0+0+1=1$

  • $r_{PER}=r_{1,PER}+r_{2,PER}+r_{3,PER}+r_{4,PER}+r_{5,PER}+r_{6,PER}=0+2+2+0+0+1=5$

  • $r_{TITLE}=r_{1,TITLE}+r_{2,TITLE}+r_{3,TITLE}+r_{4,TITLE}+r_{5,TITLE}+r_{6,TITLE}=1+0+0+0+0+0=1$

  • $r_{YEAR}=r_{1,YEAR}+r_{2,YEAR}+r_{3,YEAR}+r_{4,YEAR}+r_{5,YEAR}+r_{6,YEAR}=0+0+0+2+2+0=4$

Total labels in the data

$$r=\sum\limits_{i=1}^{n}r_i \quad (3)$$

  • $r$ is the total number of labels in the data.

  • $n$ is the total number of spans in the data.

  • $r_i$ is the total number of labels that span $i$ has.

Here is the calculation result.

  • $r=r_1+r_2+r_3+r_4+r_5+r_6=12$

Average number of labels per span

$$r'=\frac{r}{n} \quad (4)$$

  • $r'$ is the average number of labels per span.

  • $n$ is the total number of spans in the data.

Here is the calculation result.

  • $r'=\frac{r}{n}=\frac{12}{6}=2$
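All five quantities follow directly from the count matrix. Here is a minimal sketch; the matrix is re-declared so the snippet runs on its own.

```python
# r[i][k]: label counts per span, in label order EVE, ORG, PER, TITLE, YEAR.
r = [
    [1, 0, 0, 1, 0],  # The Tragedy of Hamlet
    [0, 0, 2, 0, 0],  # Hamlet
    [0, 0, 2, 0, 0],  # William Shakespeare
    [0, 0, 0, 0, 2],  # 1599
    [0, 0, 0, 0, 2],  # 1601
    [0, 1, 1, 0, 0],  # Shakespeare
]

n = len(r)                                           # total spans: 6
m = len(r[0])                                        # total labels: 5
r_i = [sum(row) for row in r]                        # Formula (1): [2, 2, 2, 2, 2, 2]
r_k = [sum(row[k] for row in r) for k in range(m)]   # Formula (2): [1, 1, 5, 1, 4]
r_total = sum(r_i)                                   # Formula (3): 12
r_avg = r_total / n                                  # Formula (4): 2.0
```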

4. Choosing weight function

Fourth, we need a weight function to weight the labels. The labels here are nominal: every label is treated equally, and no label is considered closer to one label than to another. Hence, the weight between two labels is 1 when they are identical and 0 otherwise, as stated in Formula (5).

$$w_{kl}=\begin{cases}1 & \text{if } k=l\\0 & \text{otherwise}\end{cases} \quad (5)$$

  • $w_{kl}$ is the weight between label $k$ and label $l$.
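A short sketch of Formula (5): because the labels are nominal, the weight matrix is simply the identity matrix over the label set.

```python
labels = ["EVE", "ORG", "PER", "TITLE", "YEAR"]

# Formula (5): w[k][l] = 1 when two labels are identical, 0 otherwise.
w = [[1 if k == l else 0 for l in labels] for k in labels]
```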

5. Calculating Pa

Fifth, the observed weighted percent agreement is calculated.

Weighted number of labels

We will start by calculating the weighted number of labels using Formula (6).

$$r_{ik+}=\sum\limits_{l=1}^{m} w_{kl}r_{il} \quad (6)$$

  • $r_{ik+}$ is the weighted number of label $k$ in span $i$.

  • $m$ is the total number of labels.

  • $w_{kl}$ is the weight between label $k$ and label $l$.

  • $r_{il}$ is the number of label $l$ in span $i$.

For example, we can apply Formula (6) to calculate the weighted number of the EVE label in span 1.

$$r_{1,EVE+}=\sum\limits_{l=1}^{5} w_{EVE,l}\,r_{1,l}=1\times1+0\times0+0\times0+0\times1+0\times0=1$$

We need to calculate this for every span and label combination. The complete calculation result is visualized in Table 4.
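Formula (6) is a row-by-row product of the counts with the weight matrix. A sketch under the same assumptions as the earlier snippets; with identity weights, $r_{ik+}$ equals $r_{ik}$.

```python
# Label counts per span and the identity weight matrix from the earlier sketches.
r = [
    [1, 0, 0, 1, 0], [0, 0, 2, 0, 0], [0, 0, 2, 0, 0],
    [0, 0, 0, 0, 2], [0, 0, 0, 0, 2], [0, 1, 1, 0, 0],
]
m = 5
w = [[1 if k == l else 0 for l in range(m)] for k in range(m)]

# Formula (6): r_plus[i][k] = sum over l of w[k][l] * r[i][l].
r_plus = [
    [sum(w[k][l] * row[l] for l in range(m)) for k in range(m)]
    for row in r
]
assert r_plus == r  # identity weights leave the counts unchanged
```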

Agreement percentage

After we have the weighted number of labels, we need to calculate the agreement percentage for a single span and label using Formula (7).

$$p_{a|ik}=\frac{r_{ik}(r_{ik+}-1)}{r'(r_i-1)} \quad (7)$$

  • $p_{a|ik}$ is the agreement percentage of label $k$ in span $i$.

  • $r_{ik}$ is the number of label $k$ in span $i$.

  • $r_{ik+}$ is the weighted number of label $k$ in span $i$.

  • $r'$ is the average number of labels per span.

  • $r_i$ is the total number of labels that span $i$ has.

For example, we can apply Formula (7) to calculate the agreement percentage of the EVE label in span 1.

$$p_{a|1,EVE}=\frac{r_{1,EVE}(r_{1,EVE+}-1)}{r'(r_1-1)}=\frac{1(1-1)}{2(2-1)}=0$$

We need to calculate this for every span and label combination. The complete calculation result is visualized in Table 5.

Agreement percentage of a single span

We can aggregate the result into the agreement percentage of a single span using Formula (8).

$$p_{a|i}=\sum\limits_{k=1}^{m} p_{a|ik} \quad (8)$$

  • $p_{a|i}$ is the agreement percentage of span $i$.

  • $m$ is the total number of labels.

  • $p_{a|ik}$ is the agreement percentage of label $k$ in span $i$.

For example, we can apply Formula (8) to calculate the agreement percentage of span 1.

$$p_{a|1}=\sum\limits_{k=1}^{5} p_{a|1,k}=0+0+0+0+0=0$$

We need to calculate the agreement percentage of all spans. The complete calculation result is visualized in Table 6.

Average agreement percentage

From the previous calculation, we can calculate the average agreement percentage using Formula (9).

$$p_a'=\frac{1}{n}\sum\limits_{i=1}^{n}p_{a|i} \quad (9)$$

  • $p_a'$ is the average agreement percentage.

  • $n$ is the total number of spans in the data.

  • $p_{a|i}$ is the agreement percentage of span $i$.

We can apply Formula (9) to calculate the average agreement percentage.

$$p_a'=\frac{1}{6}\sum\limits_{i=1}^{6}p_{a|i}=\frac{1}{6}(0+1+1+1+1+0)\approx0.6666$$

Calculating Pa

Finally, the observed weighted percent agreement is calculated using Formula (10).

$$p_a=p_a'\left(1-\frac{1}{nr'}\right)+\frac{1}{nr'} \quad (10)$$

  • $p_a$ is the observed weighted percent agreement.

  • $p_a'$ is the average agreement percentage.

  • $n$ is the total number of spans in the data.

  • $r'$ is the average number of labels per span.

We can apply Formula (10) to calculate the observed weighted percent agreement.

$$p_a=p_a'\left(1-\frac{1}{nr'}\right)+\frac{1}{nr'}=0.6666\left(1-\frac{1}{6\times2}\right)+\frac{1}{6\times2}\approx0.6944$$
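Here is a sketch tying Formulas (7) through (10) together for this example, with the matrices re-declared so the snippet runs standalone.

```python
# Label counts per span (identity weights make the weighted counts equal the raw counts).
r = [
    [1, 0, 0, 1, 0], [0, 0, 2, 0, 0], [0, 0, 2, 0, 0],
    [0, 0, 0, 0, 2], [0, 0, 0, 0, 2], [0, 1, 1, 0, 0],
]
n, m = len(r), len(r[0])
r_i = [sum(row) for row in r]                 # labels per span
r_avg = sum(r_i) / n                          # average labels per span: 2.0
r_plus = r                                    # Formula (6) with identity weights

# Formula (7): agreement percentage per span and label.
p_ik = [
    [r[i][k] * (r_plus[i][k] - 1) / (r_avg * (r_i[i] - 1)) for k in range(m)]
    for i in range(n)
]

# Formula (8): agreement percentage per span.
p_i = [sum(row) for row in p_ik]              # [0.0, 1.0, 1.0, 1.0, 1.0, 0.0]

# Formula (9): average agreement percentage.
p_a_avg = sum(p_i) / n                        # 0.6666...

# Formula (10): observed weighted percent agreement.
p_a = p_a_avg * (1 - 1 / (n * r_avg)) + 1 / (n * r_avg)
print(round(p_a, 4))                          # 0.6944
```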

6. Calculating Pe

Sixth, the chance weighted percent agreement is calculated.

Classification probability

We start by calculating the classification probability for each label using Formula (11).

$$\pi_k=\frac{r_k}{r} \quad (11)$$

  • $\pi_k$ is the classification probability of label $k$.

  • $r_k$ is the total count of label $k$ in the data.

  • $r$ is the total number of labels in the data.

Here is the calculation result.

  • $\pi_{EVE}=\frac{r_{EVE}}{r}=\frac{1}{12}\approx0.0833$

  • $\pi_{ORG}=\frac{r_{ORG}}{r}=\frac{1}{12}\approx0.0833$

  • $\pi_{PER}=\frac{r_{PER}}{r}=\frac{5}{12}\approx0.4166$

  • $\pi_{TITLE}=\frac{r_{TITLE}}{r}=\frac{1}{12}\approx0.0833$

  • $\pi_{YEAR}=\frac{r_{YEAR}}{r}=\frac{4}{12}\approx0.3333$

Calculating Pe

To calculate the chance weighted percent agreement, the classification probabilities from Formula (11) are plugged into Formula (12).

$$p_e=\sum\limits_{k=1}^{m}{\pi_k}^2 \quad (12)$$

  • $p_e$ is the chance weighted percent agreement.

  • $m$ is the total number of labels.

  • $\pi_k$ is the classification probability of label $k$.

Here is the chance weighted percent agreement calculation.

$$p_e=\sum\limits_{k=1}^{m}{\pi_k}^2$$

$$p_e={\pi_{EVE}}^2+{\pi_{ORG}}^2+{\pi_{PER}}^2+{\pi_{TITLE}}^2+{\pi_{YEAR}}^2$$

$$p_e=0.0833^2+0.0833^2+0.4166^2+0.0833^2+0.3333^2$$

$$p_e\approx0.3055$$
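A sketch of Formulas (11) and (12) using the label totals computed earlier:

```python
# Total of each label in the cleaned data: EVE, ORG, PER, TITLE, YEAR.
r_k = [1, 1, 5, 1, 4]
r_total = sum(r_k)                       # 12

# Formula (11): classification probability of each label.
pi = [count / r_total for count in r_k]

# Formula (12): chance weighted percent agreement.
p_e = sum(p ** 2 for p in pi)
print(round(p_e, 4))                     # 0.3056 (0.3055 when truncated, as above)
```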

7. Calculating the Alpha

Finally, Krippendorff's alpha is calculated using Formula (13).

$$\alpha=\frac{p_a-p_e}{1-p_e} \quad (13)$$

  • $\alpha$ is Krippendorff's alpha between Labeler A and the Reviewer.

  • $p_a$ is the observed weighted percent agreement.

  • $p_e$ is the chance weighted percent agreement.

We can get $\alpha$ by applying $p_a$ and $p_e$ to Formula (13).

$$\alpha=\frac{p_a-p_e}{1-p_e}=\frac{0.6944-0.3055}{1-0.3055}=0.56$$
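Finally, Formula (13) ties everything together. Below is a compact end-to-end sketch of the whole calculation for two annotators, under the same assumptions as the snippets above (nominal labels, identity weights). It mirrors this walkthrough, not Datasaur's internal implementation.

```python
def krippendorff_alpha(counts):
    """Krippendorff's alpha for nominal labels, given a span-by-label count matrix.

    Assumes every span has at least two labels (i.e. the cleaning step was applied).
    """
    n, m = len(counts), len(counts[0])
    r_i = [sum(row) for row in counts]
    r_k = [sum(row[k] for row in counts) for k in range(m)]
    r_total = sum(r_i)
    r_avg = r_total / n

    # Observed agreement, Formulas (6)-(10); identity weights keep the counts as-is.
    p_i = [
        sum(row[k] * (row[k] - 1) for k in range(m)) / (r_avg * (r_i[i] - 1))
        for i, row in enumerate(counts)
    ]
    p_a_avg = sum(p_i) / n
    p_a = p_a_avg * (1 - 1 / (n * r_avg)) + 1 / (n * r_avg)

    # Chance agreement, Formulas (11)-(12).
    p_e = sum((rk / r_total) ** 2 for rk in r_k)

    # Formula (13).
    return (p_a - p_e) / (1 - p_e)


counts = [
    [1, 0, 0, 1, 0], [0, 0, 2, 0, 0], [0, 0, 2, 0, 0],
    [0, 0, 0, 0, 2], [0, 0, 0, 0, 2], [0, 1, 1, 0, 0],
]
print(round(krippendorff_alpha(counts), 2))  # 0.56
```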

Summary

  • We apply the same calculation for the agreement between labelers and for the agreement between the reviewer and each labeler.

  • Spans that are missing a label from one of the annotators will be removed, as in the cleaning step.

  • The percentage of chance agreement will vary depending on:

    • The number of labels in a project.

    • The number of label options.

  • When both labelers agree but the reviewer rejects the labels:

    • The agreement between the two labelers increases.

    • The agreement between the labelers and the reviewer decreases.
