A browser-based image annotation tool for computer vision datasets

Finnish researchers have developed a browser-based image annotation tool intended to make the tedious process of labeling computer vision datasets easier and faster. Installed as an OS-independent extension for popular browsers, the new tool lets users “annotate while browsing freely”, rather than having to run a tagging session inside a dedicated configuration or execute secondary client code under other special conditions.

Dubbed BRIMA (low-overhead BRowser-only IMage Annotation tool), the system was developed at the University of Jyväskylä. It removes the need to fetch and compile datasets from local or remote repositories, and can be configured to derive useful data from whatever parameters a given public platform exposes.

BRIMA in action. Source: https://arxiv.org/pdf/2107.06351.pdf

In this way, BRIMA (which will be presented at ICIP 2021, when the code will also be released) sidesteps the obstacles that arise when automated web-scraping systems are blocked by IP-range bans or other measures and prevented from collecting data. That scenario is expected to become more common as intellectual property protection draws greater scrutiny, as it recently did around Microsoft's code-generating AI, Copilot.

Since BRIMA is driven by human annotators rather than scripts, it is also less likely to trigger roadblocks such as CAPTCHA challenges or other automated defenses designed to block data-collection algorithms.

Adaptive data collection capabilities

BRIMA is implemented as a Firefox add-on or Chrome extension on Windows, macOS, or Linux, and can be configured to ingest whatever salient data points a particular platform chooses to expose. For example, when annotating images in Google Street View, the system can take into account the orientation and viewpoint of the lens and record the exact geolocation of the object the user is annotating.
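To illustrate the kind of record this enables, the sketch below bundles a labeled region with Street View camera metadata. The field names and values are hypothetical, not BRIMA's actual schema.

```python
# Hypothetical annotation record enriched with Google Street View
# camera metadata; field names and values are illustrative only,
# not BRIMA's actual schema.
from dataclasses import dataclass, asdict
import json

@dataclass
class StreetViewAnnotation:
    label: str       # object class chosen by the annotator
    bbox: tuple      # (x, y, width, height) in image pixels
    pano_id: str     # Street View panorama identifier
    heading: float   # camera yaw in degrees (0 = north)
    pitch: float     # camera tilt in degrees
    lat: float       # panorama latitude
    lng: float       # panorama longitude

annotation = StreetViewAnnotation(
    label="cctv_camera",
    bbox=(412, 96, 38, 52),
    pano_id="EXAMPLE_PANO_ID",
    heading=227.5,
    pitch=8.0,
    lat=62.2426,
    lng=25.7473,
)

# Serialize the record for transmission to the annotation server
print(json.dumps(asdict(annotation), indent=2))
```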

BRIMA was tested by its creators in September 2020, during a collaborative, participatory initiative to generate an object detection dataset of CCTV cameras mounted in, or visible from, public spaces.

The system is composed of a lightweight client-side JavaScript installation in the form of a browser extension and a server-side component that receives and compiles annotation data. Reference server-side implementations were written in Python (with Flask) and PHP, documented via Swagger/OpenAPI, but the researchers point out that the core processing architecture can easily be ported to other languages and configurations.
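A minimal sketch of what such a server-side receiver might look like in Flask is shown below. The route name, payload shape, and storage logic are assumptions for illustration, not the published reference implementation.

```python
# Minimal Flask sketch of an annotation-receiving endpoint.
# Route name, payload shape, and storage are illustrative
# assumptions, not BRIMA's reference implementation.
import json
from pathlib import Path

from flask import Flask, request, jsonify

app = Flask(__name__)
STORE = Path("annotations.jsonl")

@app.route("/annotations", methods=["POST"])
def receive_annotation():
    payload = request.get_json(force=True)
    # Append each incoming annotation as one JSON line
    with STORE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(payload) + "\n")
    return jsonify({"status": "ok"}), 201

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```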

The browser extension and server communicate via a RESTful API over HTTP/XHR requests, with client-side data returned in a JSON format compatible with MS COCO. This means the data is immediately usable with many of the most popular object detection frameworks, including TensorFlow-based pipelines as well as PyTorch frameworks such as Facebook's Detectron2 and CenterMask2.
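For reference, a minimal MS COCO-style payload for a single bounding-box annotation looks roughly like this; the IDs, file name, and values are made up for illustration.

```python
# Illustrative minimal MS COCO-style structure for one image and
# one bounding-box annotation; all IDs and values are made up.
coco_payload = {
    "images": [
        {"id": 1, "file_name": "streetview_0001.jpg",
         "width": 1024, "height": 768}
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         # COCO bbox convention: [x, y, width, height] in pixels
         "bbox": [412, 96, 38, 52],
         "area": 38 * 52,
         "iscrowd": 0}
    ],
    "categories": [
        {"id": 1, "name": "cctv_camera", "supercategory": "camera"}
    ],
}
```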

Project-specific tooling

Despite its generic nature, BRIMA can be tailored to highly specific data-collection setups, including the addition of drop-down menus and other contextual inputs relevant to a particular domain. In an example from the paper, a camera-information drop-down menu has been added to BRIMA so that a group of annotators can provide detailed, project-relevant information.
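One way a project might express such contextual inputs is as a simple declarative configuration. The structure below is a hypothetical illustration of the idea only; it is not BRIMA's configuration format, and all keys and options are invented.

```python
# Hypothetical declarative description of a project-specific
# drop-down menu; keys and options are invented for illustration
# and do not reflect BRIMA's actual configuration format.
camera_info_dropdown = {
    "field": "camera_type",
    "label": "Camera information",
    "required": True,
    "options": [
        "dome",
        "bullet",
        "directed",   # fixed camera aimed at a specific spot
        "unknown",
    ],
}
```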

This additional tooling can be configured locally. The extension also features easy installation and configurable hotkeys, as well as color-coded UI elements.

The work builds on a number of attempts over the years to improve image annotation functionality for data obtained from the web or intended for the public. The DARPA-supported PhotoStuff tool offers online annotation through a dedicated web portal and can run on the Semantic Web or as a standalone application. In 2004, UC Berkeley proposed annotating photos on a camera phone, leaning heavily on metadata due to the network coverage and viewport limitations of the time, and MIT's 2005 LabelMe project also addressed browser-based annotation, building on MATLAB tools.

Since its release in 2015, the FOSS Python/Qt LabelImg framework has gained popularity in crowdsourced annotation efforts, albeit requiring a dedicated local installation. However, the BRIMA researchers observe that LabelImg focuses on the PascalVOC and YOLO standards, does not support the MS COCO JSON format, and eschews polygonal outline tools in favor of simple rectangular capture regions (which require further segmentation).
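The practical difference between those box formats is small but easy to get wrong: PascalVOC stores corner coordinates, while MS COCO stores the top-left corner plus width and height. A generic conversion sketch (not part of either tool) makes the distinction concrete.

```python
# Convert a PascalVOC-style box (xmin, ymin, xmax, ymax) to the
# MS COCO convention [x, y, width, height]; generic sketch only,
# not code from LabelImg or BRIMA.
def voc_to_coco(xmin, ymin, xmax, ymax):
    return [xmin, ymin, xmax - xmin, ymax - ymin]

print(voc_to_coco(412, 96, 450, 148))  # -> [412, 96, 38, 52]
```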