Image annotation techniques with implementation in OpenCV

Image annotation is important in computer vision, the field that allows computers to gain a high-level understanding of digital images or video and to observe and interpret visual information in the same way as humans. Annotation, often known as image labeling or tagging, is a crucial step in the development of most computer vision models. This article focuses on creating these annotations using OpenCV. The following topics are covered.


  1. Image annotation
  2. Need for image annotation
  3. Types of image annotations
  4. Implementing image annotation with OpenCV

The better the quality of your annotations, the better your machine learning models. Let’s understand image annotations.

Image annotation

The process of labeling, tagging, or specifying images in a particular dataset to train machine learning models is called image annotation. When the manual annotation is complete, the labeled images are processed by a machine learning or deep learning model to repeat the annotations without human intervention.

Accordingly, image annotation is used to indicate the features that your system should recognize. Training an ML model from labeled data in this way is called supervised learning.

The image annotation establishes the criteria that the model attempts to duplicate, so any errors in labeling are also repeated. Therefore, proper annotation of images creates the foundation for training neural networks, making annotation one of the most critical tasks in computer vision.

Image annotation can be done manually or with an automatic annotation tool. Automatic annotation tools are often pre-trained algorithms that can label images accurately. They are especially useful for complex annotation tasks such as building segmentation masks, which take a long time to generate by hand.


Need for image annotation

Image labeling is necessary for functional datasets because it informs the training model about the relevant aspects of the image (classes), which it can then use to identify those classes in new, unseen images.

Image annotation generates training data from which supervised AI models can learn. How the images are annotated determines how the AI will behave after seeing and learning from them. Consequently, poor annotation is frequently reflected in training, resulting in models that make poor predictions.

Annotated data is especially important when tackling a unique challenge or applying AI in a new field. For typical tasks such as image classification and segmentation, pre-trained models are frequently available, and they can be customized for specific use cases using transfer learning with minimal effort.

In contrast, training a complete model from scratch often requires a massive amount of annotated data divided into training, validation, and test sets, which is difficult and time-consuming to generate. Unsupervised algorithms do not need annotated data and can be trained directly on raw data.
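The split into training, validation, and test sets mentioned above can be sketched as follows. This is only an illustration: the function name split_indices, the 70/15/15 proportions, and the seed are assumptions chosen for the example, not values taken from this article.

```python
import numpy as np

def split_indices(n_items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle item indices and split them into train, validation, and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)           # random order of 0..n_items-1
    n_test = int(round(n_items * test_fraction))
    n_val = int(round(n_items * val_fraction))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]         # everything that is left
    return train_idx, val_idx, test_idx

# Example: 100 annotated images (hypothetical dataset size)
train_idx, val_idx, test_idx = split_indices(100)
```

Each index refers to one annotated image, so the same annotated image never appears in more than one set.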

Types of image annotations

There are three common methods of annotating images, and the one you choose for your use case will be determined by the complexity of the project. For each type, the more high-quality image data is used, the more accurate the AI's predictions will be.


Classification

Classification is the easiest and fastest approach to annotating images because it simply assigns a tag to an image. For example, you can browse and categorize a collection of photographs of grocery store shelves to determine which contain soda and which do not.

This approach is ideal for capturing abstract information, such as the example above, the time of day, or whether there are automobiles in the image, or for filtering out photographs that do not meet the criteria from the start. Although classification is the fastest way to provide a single high-level label, it is also the most ambiguous of the three types highlighted here, because it does not identify where the item is within the image.

Object detection

With object detection, annotators are given particular elements to label in an image. So, if an image is tagged as containing ice cream, object detection goes one step further by showing where the ice cream is inside the image, or specifically where the cocoa ice cream is. Object detection can be achieved using a variety of approaches, including:

  • Bounding boxes: Annotators use rectangles and squares to define the position of target objects in 2D. It is one of the most widely used image annotation approaches. Cuboids, also known as 3D bounding boxes, are used by annotators to specify the location and depth of a target object.
  • Polygon segmentation: Annotators use complex polygons to specify the position of target elements that are asymmetrical and do not fit neatly in a box.
  • Lines: Annotators mark essential contour lines and curves in an image with lines and splines to distinguish sections. Annotators can, for example, label the lanes of a highway for a self-driving car image annotation project.

This approach is still not the most accurate since object detection allows overlap in the use of boxes or lines. What it offers is the general position of the element while still being a fairly quick annotation procedure.

Semantic segmentation

Semantic segmentation overcomes the problem of overlap in object detection by ensuring that each component of an image belongs to a single class. This approach, which is usually done at the pixel level, requires annotators to assign a category (such as pedestrian, automobile, or sign) to each pixel. This helps teach an AI model how to detect and categorize certain elements, even when they are partly hidden. For example, if a shopping cart obscures part of the image, semantic segmentation can be used to define what coconut ice cream looks like down to the pixel level, letting the model know that it is still, in fact, coconut ice cream.

Implementing image annotation with OpenCV

In this article, we will use two methods for image annotation: bounding boxes and color segmentation.

In the bounding box method, we will manually draw different bounding shapes around the object and add text to them.

In color segmentation, we will group the colors of the query image into K clusters. This article labels the step KNN, but the code (termination criteria, cluster centers and labels) corresponds to K-means clustering, so ‘K’ is the number of color clusters rather than a number of nearest neighbors. Each segmented color region of the image can be treated as an annotated part.

Method of bounding boxes

Import the necessary libraries

import cv2 
import numpy as np
import matplotlib.pyplot as plt

Read the query image

Query image

Because this article uses a color image, we pass the ‘cv2.IMREAD_COLOR’ flag when reading it. This flag loads the image in color and ignores any transparency. It is the default setting, and the integer value 1 can be passed in its place.

Draw a line on the object

image_line = input_img.copy()  # draw on a copy so the query image stays unchanged
cv2.line(image_line, (900,150), (1100,150), (0,255,255), thickness=5, lineType=cv2.LINE_AA)

The cv2.line function takes the coordinates of the start and end points of the line, along with its color, thickness and line type.

Analytics India Magazine

Draw a circle around the object

image_circle = input_img.copy()  # draw on a copy so the query image stays unchanged
cv2.circle(image_circle, (1030,340), 200, (0,255,255), thickness=5, lineType=cv2.LINE_AA)

The ‘cv2.circle’ function takes the center coordinates and the radius of the circle as input. The remaining arguments are identical to those of the line function described earlier.


Draw a rectangle around the object

image_rect = input_img.copy()  # draw on a copy so the query image stays unchanged
cv2.rectangle(image_rect, (900,150), (1100,530), (0,0,255), thickness=5, lineType=cv2.LINE_AA)

It takes the coordinates of the upper left corner and the coordinates of the lower right corner to draw the rectangle.


KNN method for segmentation

Import the necessary libraries

import cv2 
import numpy as np
import matplotlib.pyplot as plt

Reading and preprocessing

img = cv2.cvtColor(input_img,cv2.COLOR_BGR2RGB)
image_reshape = img.reshape((-1,3))
image_2d = np.float32(image_reshape)

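OpenCV reads the colors of an image in blue, green, red (BGR) order, so the image is first converted to red, green, blue (RGB), the order expected by libraries such as matplotlib. The image is then reshaped into a two-dimensional array with one row per pixel and three color columns, and converted to 32-bit floats for the clustering step.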

Apply the KNN

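# Stop after 100 iterations or when the centers move by less than epsilon = 1.0
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1.0)
K = 4
# Reconstructed call (assumption): 10 attempts as stated in the text; the initialization flag is assumed
ret, label, center = cv2.kmeans(image_2d, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
center = np.uint8(center)  # cluster centers back to 8-bit color values
res = center[label.flatten()]  # replace every pixel with its cluster's color
result_image = res.reshape((img.shape))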

Because the image is high resolution, there are many data points to process, and the clustering would take a long time with a high iteration count. The number of iterations is therefore limited to 100 and the epsilon value (the required accuracy) is set to 1.0. K, the number of color groups, is set to 4, with a retry count of 10.


The algorithm segmented the colors quite well. Blues, whites, grays and browns can be seen separated. One could mask the image and tune the algorithm further.


One of the most time-consuming aspects of data processing is data collection and annotation. Nevertheless, annotation serves as the basis for the training algorithms and must be carried out with the greatest possible precision. Proper annotation often saves a lot of time later in the pipeline when building the model. In this article, we covered the different types of annotation and how to implement them with OpenCV.