LightTag, a startup recently launched by a former NLP researcher at Citi, has built a “text annotation platform” designed to help data scientists who need to quickly create training data for their AI systems. It’s a classic picks-and-shovels play, as the Berlin-based company hopes to capitalize on the current boom in AI development.
Specifically, LightTag aims to solve one of the main bottlenecks of deep learning-based AI development: what you get is only as good as the tagged data you put into it. The problem, however, is that data labeling is laborious, and since it is work done by human teams, it is prone to imprecision and inconsistency. LightTag’s team-based workflow, smart user interface, and built-in quality checks are an attempt to mitigate this.
“What I learned from [my previous positions] is an understanding that tagged data is more important to the success of machine learning than intelligent algorithms,” says founder Tal Perry. “The difference in a successful machine learning project often came down to how well you executed and managed the collection and use of labeled data. There is a huge gap in the tooling to do it right. This is why I built LightTag.”
Perry says LightTag’s annotation interface is designed to keep taggers “efficient and engaged.” It also uses its own “AI” to learn from previous labeling and make annotation suggestions. The platform also automates the work of managing a project, in terms of assigning tasks to taggers and ensuring there is enough overlap and duplication to maintain high accuracy and consistency.
“We made it easy to annotate with a team (it sounds obvious, but nothing else makes it easy),” he says. “To ensure the data is right, LightTag automatically assigns work to team members so that there is overlap between them. This allows project managers to measure agreement and recognize problems in their project from the beginning, for example if a specific annotator is performing less well than the others.”
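To make the idea of measuring agreement on overlapping work concrete, here is a minimal sketch. The article does not say which agreement metric LightTag uses, so Cohen's kappa is an assumption chosen because it is a common way to compare two annotators. The annotator names, document IDs, labels, and both function names are hypothetical.

```python
from collections import Counter


def shared_labels(ann_a, ann_b):
    """Return aligned label lists for the items both annotators labeled."""
    common = sorted(ann_a.keys() & ann_b.keys())
    return [ann_a[i] for i in common], [ann_b[i] for i in common]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations: item ID -> label. Only doc1..doc4 overlap.
alice = {"doc1": "ORG", "doc2": "ORG", "doc3": "PER", "doc4": "PER", "doc5": "LOC"}
bob = {"doc1": "ORG", "doc2": "PER", "doc3": "PER", "doc4": "PER", "doc6": "LOC"}

kappa = cohen_kappa(*shared_labels(alice, bob))
print(f"{kappa:.2f}")  # prints 0.50
```

A score near 1 indicates strong agreement, while a score near 0 means the annotators agree no more often than chance would predict. An annotator whose scores against every teammate are low is the kind of performance issue Perry describes.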
Meanwhile, Perry says the acquisition of labeled data is one of the quiet growth sectors of the recent AI boom, but for many industries, such as medicine, law or finance, outsourcing the work is not an option. Indeed, the data is often too sensitive or too specialized to be processed by non-specialists. To solve this problem, LightTag offers an on-premise version in addition to SaaS.
“Every business has huge sets of unstructured textual data (CRM records, call transcripts, emails, etc.). Deep learning has made it algorithmically possible to mine this data, but to use deep learning, we need to train the model with labeled datasets. Most companies can’t outsource text labeling because the data is too complicated (biology, finance), regulated (CRM records) or both (medical records),” explains the founder of LightTag.
Running in various pilots and in private beta since December 2018, and launching publicly this month, LightTag has already been used by the data science team of a major Silicon Valley tech company that wants its AI to understand free-form text in profiles, as well as by an energy company analyzing oil rig logs to predict drilling problems at certain depths. The startup also completed a pilot project with a medical imaging company tagging reports associated with MRI scans.