Kili Technology provides an interface for transcribing speech to text. It lets you listen to an audio track, transcribe its content, add metadata, and speed up or slow down playback.
The source media can be either video files or audio files. Currently, you can upload them as URLs listed in a CSV file. You will find information on the accepted format here.
The audio interface is divided into three parts.
1. Audio player
On the far left of the interface, the player lets you select a playback rate (from x0.2 to x3.5 of the original speed).
If the asset includes a video, the player plays it.
While playing, you can also replay (⟲) the last five seconds before the current time.
2. Transcript
The transcript works as a text editor. You can insert text in real time, divided into paragraphs. Each paragraph has a starting timestamp, a speaker and a text. To start typing, click the white area to the right of the speaker.
Create new paragraphs. When you press "Enter" while typing, a new paragraph starts at the current playback time. This allows you to transcribe in real time.
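The paragraph model described above can be sketched in Python. This is only an illustration, not Kili's implementation: the names Paragraph, Transcript, type_text and press_enter are hypothetical, and it assumes that a new paragraph starts empty with no speaker assigned.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Paragraph:
    start: float            # starting timestamp, in seconds
    speaker: Optional[str]  # None until a speaker is assigned
    text: str = ""


@dataclass
class Transcript:
    paragraphs: List[Paragraph] = field(default_factory=list)

    def type_text(self, text: str) -> None:
        """Append typed text to the paragraph being edited (the last one)."""
        if not self.paragraphs:
            # Assumption: the first paragraph starts at the beginning of the audio.
            self.paragraphs.append(Paragraph(start=0.0, speaker=None))
        self.paragraphs[-1].text += text

    def press_enter(self, playback_time: float) -> Paragraph:
        """Start a new, empty paragraph at the current playback time."""
        new_paragraph = Paragraph(start=playback_time, speaker=None)
        self.paragraphs.append(new_paragraph)
        return new_paragraph
```

For example, typing "Hello", pressing Enter at 3.2 seconds, then typing "World" would yield two paragraphs starting at 0.0 and 3.2 seconds.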
Adjust timestamps. To adjust a timestamp, click on it. Valid timestamp formats are:
- full timestamps (for instance, 01:55:15 for 1 hour, 55 minutes and 15 seconds)
- truncated timestamps with only minutes and seconds (for instance, 55:15 for 0 hour, 55 minutes and 15 seconds)
- seconds (for instance, 4 for 00:00:04 or
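To make the three timestamp formats listed above concrete, here is a minimal Python sketch that converts each of them to a number of seconds and back to a full timestamp. The function names parse_timestamp and format_timestamp are hypothetical, and the validation rule (every field after the first must be below 60) is an assumption; the interface may accept or reject inputs differently.

```python
def parse_timestamp(value: str) -> int:
    """Convert 'HH:MM:SS', 'MM:SS' or 'SS' into a number of seconds."""
    parts = value.strip().split(":")
    if not 1 <= len(parts) <= 3 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"invalid timestamp: {value!r}")
    numbers = [int(p) for p in parts]
    # Assumption: every field after the first one must be below 60.
    if any(n >= 60 for n in numbers[1:]):
        raise ValueError(f"invalid timestamp: {value!r}")
    seconds = 0
    for n in numbers:
        seconds = seconds * 60 + n
    return seconds


def format_timestamp(seconds: int) -> str:
    """Render a number of seconds as a full 'HH:MM:SS' timestamp."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(parse_timestamp("01:55:15"))  # 6915
print(parse_timestamp("55:15"))     # 3315
print(format_timestamp(4))          # 00:00:04
```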
Special tokens. Some speech-to-text algorithms require special tokens to identify music and/or noises.
When you click on the music icon (♪), a token indicating that music is playing is inserted in the text editor.
When you click on the inaudible icon, a token indicating the audio is inaudible is inserted in the text editor.
Both tokens can be customized in the JSON interface under the key
Two modes are available when transcribing: either the video keeps playing while you type (the default), or the video pauses while you type. Clicking the filled pause icon (⏸) toggles between these two modes.
3. Identify speakers
You can identify speakers from a predefined list.
If you add a classification job in the JSON interface, it will be rendered on the right panel of the interface.
Here we defined several possible speakers / classification categories for a paragraph (
Click a category, then click the speaker of the corresponding paragraph to set it. All of Kili's classification features are available (shortcuts, search, etc.).
Import transcriptions. A pre-transcription can be inserted to speed up the annotation. Insert prediction labels as described in this recipe.
Export transcriptions. The export will provide you with both paragraph-level and word-level timestamps. Word-level timestamps are derived from paragraph-level timestamps by interpolation.
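As an illustration of how word-level timestamps can be interpolated from paragraph-level ones, here is a minimal Python sketch. It is not the actual export code: it assumes linear interpolation in which words are spread evenly between the paragraph's start and a known end time (for instance, the start of the next paragraph), and the function name interpolate_word_timestamps is hypothetical.

```python
from typing import List, Tuple


def interpolate_word_timestamps(
    text: str, start: float, end: float
) -> List[Tuple[str, float]]:
    """Assign each word a start time spread evenly over [start, end)."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    words = text.split()
    if not words:
        return []
    # Assumption: every word takes the same share of the paragraph duration.
    step = (end - start) / len(words)
    return [(word, start + index * step) for index, word in enumerate(words)]


print(interpolate_word_timestamps("a b c d", 10.0, 12.0))
# [('a', 10.0), ('b', 10.5), ('c', 11.0), ('d', 11.5)]
```

Even spacing ignores word length and pauses, so the resulting word timestamps are approximations; weighting by character count would be one possible refinement.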