How to accelerate annotation with machine learning
Kili Technology lets you use machine learning to speed up your annotation project.
You can:
- Import labels, which can be:
  - predictions from your custom model
  - predictions from a weakly supervised learning framework
  - human-labeled data from a previous project or other sources
- Orchestrate online learning with an AutoML framework
- Implement active learning strategies with queue prioritization (a minimal sketch follows)
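As a rough illustration of queue prioritization, here is a minimal Python sketch using the Kili SDK. The project ID, API key, and confidence scores are placeholder assumptions, and exact method signatures may differ across SDK versions; the idea is simply to push the assets your model is least sure about to the front of the labeling queue.

```python
# A minimal sketch of active learning via queue prioritization.
# Project ID, API key, and confidence scores are placeholders;
# signatures may vary across Kili SDK versions.
from kili.client import Kili

kili = Kili(api_key="YOUR_API_KEY")

assets = kili.assets(project_id="YOUR_PROJECT_ID", fields=["id"])

# `model_confidences` is a hypothetical mapping you computed offline:
# asset id -> model confidence in [0, 1].
model_confidences = {a["id"]: 0.5 for a in assets}  # placeholder scores

asset_ids = list(model_confidences)
# Lower confidence -> higher priority (a larger number is labeled sooner).
priorities = [int((1 - model_confidences[i]) * 100) for i in asset_ids]

kili.update_properties_in_assets(asset_ids=asset_ids, priorities=priorities)
```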
Import labels
Kili Technology allows you to directly import existing labels, so that annotators can start working on pre-annotated assets. Their workload is reduced: they only need to validate the pre-annotations, possibly correct a few of them, and complete the annotation. It is always easier than starting from scratch!
You can also use this feature to run quality checks: for instance, upload ground-truth labels to review the annotators' work (see e.g. Honeypot).
Predictions from your custom model
Do you have a custom, in-house model that already detects or adds labels to your assets? Once the inference phase is done on your dataset, you can upload your predictions using this recipe on Kili-Playground.
If you have multiple models, you can "tag" each prediction with its source model: simply fill in the modelName field in the API. You'll then be able to filter by model when working with the assets and labels.
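As a sketch, a prediction upload could look like the following. The project ID, external IDs, job name, and categories are illustrative assumptions for a simple classification project, and the exact create_predictions signature may differ across Kili SDK versions.

```python
# A minimal sketch of uploading model predictions as pre-annotations.
# Project ID, external IDs, job name, and categories are placeholders;
# the create_predictions signature may vary across SDK versions.
from kili.client import Kili

kili = Kili(api_key="YOUR_API_KEY")

kili.create_predictions(
    project_id="YOUR_PROJECT_ID",
    external_id_array=["asset-1", "asset-2"],
    # Tag predictions with the source model so you can filter on it later.
    model_name_array=["my-model-v1", "my-model-v1"],
    json_response_array=[
        {"JOB_0": {"categories": [{"name": "POSITIVE", "confidence": 92}]}},
        {"JOB_0": {"categories": [{"name": "NEGATIVE", "confidence": 71}]}},
    ],
)
```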
Predictions from a weakly supervised learning framework
The maturity of weakly supervised learning depends on your task's complexity. Our experience shows that it can be extremely powerful on text annotation, classification, and Named Entity Recognition (NER) tasks.
Weak supervision is the ability to combine weak predictors in order to build a more robust one, for instance:
- Hard-coded heuristics: usually regular expressions (regexes)
- Syntactic heuristics: for instance, spaCy's dependency trees
- Distant supervision: external knowledge bases
- Noisy manual labels: crowdsourcing
- External models: other models with useful signals
To learn more about weak supervision, start here.
We often work with Snorkel, a framework created at Stanford. Once you have defined your own pre-annotation functions, you can upload your predictions to Kili. You can find help here on Kili-Playground.
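As a concrete illustration, here is a minimal Snorkel sketch that combines two weak labeling functions into denoised predictions. The labeling functions, label names, and toy data are assumptions for illustration, not part of Kili's or Snorkel's recommended setup.

```python
# A minimal Snorkel sketch: combine weak labeling functions into a single
# denoised prediction per example. Labels and data are illustrative.
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, OTHER, COMPLAINT = -1, 0, 1

@labeling_function()
def lf_keyword(x):
    # Hard-coded heuristic: a simple keyword rule.
    return COMPLAINT if "refund" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_exclamation(x):
    # Another weak signal: angry punctuation.
    return COMPLAINT if "!!" in x.text else ABSTAIN

df = pd.DataFrame({"text": ["I want a refund!!", "Thanks, all good."]})

# Apply all labeling functions, then learn their accuracies and combine them.
L = PandasLFApplier(lfs=[lf_keyword, lf_exclamation]).apply(df)
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L, n_epochs=100, seed=42)
preds = label_model.predict(L)  # denoised labels
```

The resulting `preds` can then be converted to Kili's JSON response format and uploaded as predictions, as in the sketch above.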
Human labeled data
For a variety of good reasons, you may need to review or re-annotate human-labeled data.
For instance: reviewing or re-annotating a dataset annotated externally, running a quality check on an already annotated dataset, labeling the human-generated logs from a chatbot framework... In such cases the import process does not change: you can upload your predictions, assets, and existing labels into Kili.
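For example, importing assets together with their existing labels could look like the following sketch. The asset contents, external IDs, job name, and categories are illustrative, and the append_labels signature may differ across Kili SDK versions.

```python
# A minimal sketch of importing assets together with pre-existing labels.
# Contents, IDs, and the append_labels signature are illustrative and
# may vary across Kili SDK versions.
from kili.client import Kili

kili = Kili(api_key="YOUR_API_KEY")

# 1. Upload the assets themselves (raw text, in this example).
kili.append_many_to_dataset(
    project_id="YOUR_PROJECT_ID",
    content_array=["I want a refund!!", "Thanks, all good."],
    external_id_array=["log-1", "log-2"],
)

# 2. Attach the pre-existing human labels to those assets.
kili.append_labels(
    project_id="YOUR_PROJECT_ID",
    asset_external_id_array=["log-1", "log-2"],
    json_response_array=[
        {"JOB_0": {"categories": [{"name": "COMPLAINT"}]}},
        {"JOB_0": {"categories": [{"name": "OTHER"}]}},
    ],
)
```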
Customer success stories
Using pre-annotations helped speed up the labeling process across a wide variety of use cases and tasks:
- Semantic segmentation: performance increased by 70% for a client doing medical imaging.
- Bounding box detection: performance increased by 45% for a client doing facilities inspection.
- NER and text classification: performance increased by 30% for a banking and insurance client.
- Video object tracking: performance increased by 50% using clever pre-annotations.