Labeling Function Analysis
Labeling function analysis lets users view the results of their labeling functions, including coverage, overlaps, and conflicts, and improve performance by training the label model.
To get the results from labeling function analysis, you need to have some labeling functions and predict labels.
There are two ways to open the labeling function analysis window and view the results: click the labeling function button, or click the "See labeling function analysis" button after predicting the labels.
Data programming extension
If you haven't predicted the labels yet, the labeling function analysis page shows empty values.
After you click predict labels, the results are shown in the labeling function analysis. There are three metrics: coverage, overlaps, and conflicts.
- Coverage is the fraction of the dataset that a labeling function labels.
- Overlaps is the fraction of the dataset that a labeling function and at least one other labeling function both label.
- Conflicts is the fraction of the dataset that a labeling function and at least one other labeling function both label, and where their labels disagree.
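The three definitions above can be sketched in a few lines of plain Python. This is only an illustration, not the extension's actual implementation: the label matrix layout, the `ABSTAIN = -1` marker, and the `lf_analysis` helper are all assumptions made for the example.

```python
ABSTAIN = -1  # assumed marker meaning "this labeling function did not label the row"


def lf_analysis(label_matrix):
    """Return coverage, overlaps, and conflicts for each labeling function.

    label_matrix[i][j] is the label that labeling function j gave to row i.
    All three metrics are fractions of the whole dataset.
    """
    n_rows = len(label_matrix)
    n_lfs = len(label_matrix[0])
    stats = []
    for j in range(n_lfs):
        covered = overlapped = conflicted = 0
        for row in label_matrix:
            if row[j] == ABSTAIN:
                continue  # LF j did not label this row
            covered += 1
            # Labels given to the same row by the other, non-abstaining LFs.
            others = [row[k] for k in range(n_lfs) if k != j and row[k] != ABSTAIN]
            if others:
                overlapped += 1
            if any(label != row[j] for label in others):
                conflicted += 1
        stats.append({
            "coverage": covered / n_rows,
            "overlaps": overlapped / n_rows,
            "conflicts": conflicted / n_rows,
        })
    return stats


# Four rows labeled by three hypothetical labeling functions.
L = [
    [0, 0, -1],
    [1, 0, -1],
    [-1, -1, 1],
    [0, -1, -1],
]
stats = lf_analysis(L)
for j, s in enumerate(stats):
    print(f"LF {j}: {s}")
```

In this toy matrix the first labeling function labels three of the four rows, shares two of them with the second labeling function, and disagrees with it on one, so its conflicts value can never exceed its overlaps value, which in turn can never exceed its coverage.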
If you add a new labeling function or change an existing one, you need to predict the labels again to update the analysis values for that labeling function.
Outdated labeling function value
The ideal situation for a labeling function is high coverage, high overlaps, and low conflicts. Below is a use case in which the LFs label many data points, but most of the data points labeled by more than one LF were given different labels. One example of such performance metric values:
- Coverage = 50%
- Overlaps = 30%
- Conflicts = 27%
These numbers show that even though coverage and overlaps are large, the labeling functions disagree on roughly half of the covered data (27% out of 50%) and on most of the overlapping data (27% out of 30%). To improve this, we need to train the label model, which estimates the accuracies of the labeling functions and the correlations between them, since some labeling functions give a stronger or weaker signal about the label than others.
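The comparison between conflicts, coverage, and overlaps in the example can be checked with a short calculation. The variable names are only illustrative.

```python
# Example metric values from above, in percent of the dataset.
coverage = 50
overlaps = 30
conflicts = 27

# Share of the covered data on which labeling functions disagree.
conflicts_within_coverage = conflicts / coverage
# Share of the overlapping data on which labeling functions disagree.
conflicts_within_overlaps = conflicts / overlaps

print(f"Conflicts / coverage: {conflicts_within_coverage:.0%}")
print(f"Conflicts / overlaps: {conflicts_within_overlaps:.0%}")
```

The second ratio is the more telling one here: almost every data point that receives labels from more than one labeling function receives conflicting labels.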
We can also add several new labeling functions and identify which labeling function creates the most conflicts by testing the labeling functions one at a time. After identifying it, we can re-evaluate that labeling function.
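One possible way to do this one-by-one experiment is to leave out each labeling function in turn and see how much the share of conflicting rows drops. The sketch below is an assumption about how such an experiment could look, not a feature of the extension: `conflict_rate`, `rank_by_conflict_reduction`, and the `ABSTAIN = -1` marker are hypothetical, and the rate here is measured per row rather than per labeling function.

```python
ABSTAIN = -1  # assumed marker meaning "this labeling function did not label the row"


def conflict_rate(label_matrix, skip=None):
    """Fraction of rows where at least two non-abstaining LFs disagree.

    If `skip` is given, the labeling function at that column index is ignored.
    """
    conflicted = 0
    for row in label_matrix:
        labels = {
            label
            for j, label in enumerate(row)
            if j != skip and label != ABSTAIN
        }
        if len(labels) > 1:
            conflicted += 1
    return conflicted / len(label_matrix)


def rank_by_conflict_reduction(label_matrix):
    """Return (reduction, lf_index) pairs, largest reduction first."""
    baseline = conflict_rate(label_matrix)
    n_lfs = len(label_matrix[0])
    reductions = [
        (baseline - conflict_rate(label_matrix, skip=j), j)
        for j in range(n_lfs)
    ]
    return sorted(reductions, reverse=True)


# Four rows labeled by three hypothetical labeling functions.
L = [
    [0, 0, 1],
    [1, 1, 0],
    [0, -1, 1],
    [1, 1, -1],
]
ranking = rank_by_conflict_reduction(L)
for reduction, lf_index in ranking:
    print(f"Removing LF {lf_index} lowers the conflict rate by {reduction:.2f}")
```

In this toy matrix, leaving out the third labeling function removes every conflict, which makes it the first candidate to re-evaluate. A large reduction only shows that a labeling function is involved in many disagreements; it may still be the one that is correct.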