Data on premise or on private cloud

Kili Technology works with data hosted on premise or in a private cloud. The data source is accessed directly from the users' computers and is never shared with (or accessible to) Kili Technology.

This means that Kili Technology has no access to the data itself (images, text, etc.); it only stores labels and an ID that lets you link them back to your data.

Private Cloud Data with VPN/VPC

To allow teams of remote workers to access data hosted on premise or in a private cloud, use a VPN. Here are some tutorials on setting up a VPN with the main cloud providers:

https://cloud.google.com/vpn/docs/how-to/choosing-a-vpn

https://aws.amazon.com/blogs/aws/new-vpc-endpoint-for-amazon-s3/

https://azure.microsoft.com/en-us/services/vpn-gateway/

Create a CSV file with URLs

If your data is hosted in a cloud (for example Amazon S3), you can load it into Kili Technology by creating a CSV file with a URL for each file.

The file should have two columns, externalId and url:

externalId,url
image_1,https://images.pexels.com/photos/45209/purple-grapes-vineyard-napa-valley-napa-vineyard-45209.jpeg
image_2,https://images.pexels.com/photos/760281/pexels-photo-760281.jpeg

Then select the CSV file on your computer and drag and drop it on the "Connect Cloud Data" section.
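If your assets are in an S3 bucket, you can generate this CSV with a short script instead of writing it by hand. Below is a minimal sketch using boto3; the bucket name my-labeling-assets is a placeholder, and the presigned URLs it produces expire after a set time (see "How to generate non-expiring signed URLs on AWS" for longer-lived links).

# Sketch: build a CSV (externalId,url) from the contents of an S3 bucket.
# Assumes boto3 is installed and AWS credentials are configured.
# "my-labeling-assets" is a placeholder bucket name.
import csv
import boto3

BUCKET = "my-labeling-assets"   # replace with your bucket
EXPIRATION = 7 * 24 * 3600      # presigned URL lifetime in seconds (7 days)

s3 = boto3.client("s3")

with open("data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["externalId", "url"])
    # List every object in the bucket and generate a presigned GET URL for it.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            url = s3.generate_presigned_url(
                "get_object",
                Params={"Bucket": BUCKET, "Key": key},
                ExpiresIn=EXPIRATION,
            )
            writer.writerow([key, url])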

On Premise Data

Here is a guide to using Kili Technology with data on your hard disk, for macOS and Linux. First start a local HTTP server to serve the files, then generate a CSV of links to the files and upload it to Kili Technology.

Place all files in a single folder

Place all the files you want to label in a single folder on your hard drive.

Get the IP address of your computer

The command below should give your IP address, e.g. 192.168.1.112 (en0 is the usual interface name on macOS; on Linux it may be eth0 or similar):

ifconfig en0 | grep inet | grep -v inet6 | awk '{print $2}'

Start the HTTP server on your computer

You can start a local server via the command line.

With Python

python3 -m http.server 8000

(or, with Python 2: python -m SimpleHTTPServer)

With NodeJS

npm install -g http-server; http-server -p 8000

Note in this example that the port to serve is 8000.

Create a CSV with your data

If you go to http://<your IP address>:8000 in a browser, you should see the listing of your files.

cd into the directory containing all your files and run the commands below to generate data.csv:

IP_ADDRESS=$(ifconfig en0 | grep inet | grep -v inet6 | awk '{print $2}')

(echo "externalId,url"; for fileName in $(ls); do echo "$fileName,http://$IP_ADDRESS:8000/$fileName"; done) > data.csv

Load data.csv

In the Dataset tab, click on Add new.

See above for details on the CSV format Kili Technology expects.

Notes:

  • Only users of the same network can see your data if you host it yourself.

  • If the local server is down, you will lose access to your data while using Kili Technology.

  • To give external users access to the data, you need to expose your localhost on the Internet. Here is a tutorial to do it quickly: https://dev.to/levivm/exposing-localhost-server-to-the-internet-in-one-minute-2713

  • For security reasons, we avoid mixed content and require assets to be served over HTTPS (not HTTP), so all asset URLs should begin with https://.
