Run this notebook online:\ |Binder| or Colab: |Colab|

.. |Binder| image:: https://mybinder.org/badge_logo.svg
   :target: https://mybinder.org/v2/gh/deepjavalibrary/d2l-java/master?filepath=chapter_computer-vision/bounding-box.ipynb
.. |Colab| image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/github/deepjavalibrary/d2l-java/blob/colab/chapter_computer-vision/bounding-box.ipynb

.. _sec_bbox:

Object Detection and Bounding Boxes
===================================


In the previous section, we introduced many models for image
classification. In image classification tasks, we assume that there is
only one main target in the image and we only focus on how to identify
the target category. However, in many situations, there are multiple
targets in the image that we are interested in. We not only want to
classify them, but also want to obtain their specific positions in the
image. In computer vision, we refer to such tasks as object detection
(or object recognition).

Object detection is widely used in many fields. For example, in
self-driving technology, we need to plan routes by identifying the
locations of vehicles, pedestrians, roads, and obstacles in captured
video frames. Robots often perform this type of task to detect targets of
interest. Systems in the security field need to detect abnormal targets,
such as intruders or bombs.

In the next few sections, we will introduce multiple deep learning
models used for object detection. Before that, we should discuss the
concept of target location. First, import the packages and modules
required for the experiment.

.. code:: java

    %load ../utils/djl-imports

Next, we will load the sample image that will be used in this section.
We can see there is a dog on the left side of the image and a cat on the
right. They are the two main targets in this image.

.. code:: java

    // Load the original image
    Image imgArr = ImageFactory.getInstance()
        .fromUrl("https://github.com/d2l-ai/d2l-en/blob/master/img/catdog.jpg?raw=true");
    imgArr.getWrappedImage();


.. figure:: output_bounding-box_a9e907_4_0.png


Bounding Box
------------

In object detection, we usually use a bounding box to describe the
target location. The bounding box is a rectangle that is determined by
the :math:`x` and :math:`y` coordinates of its upper-left corner and the
:math:`x` and :math:`y` coordinates of its lower-right corner. We will
define the bounding boxes of the dog and the cat based on the coordinate
information in the above image. The origin of the coordinates is the
upper-left corner of the image, and the positive directions of the
:math:`x` axis and the :math:`y` axis point to the right and down,
respectively.

.. code:: java

    // bbox is the abbreviation for bounding box
    double[] dog_bbox = new double[]{60, 45, 378, 516};
    double[] cat_bbox = new double[]{400, 112, 655, 493};
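
Because the corner format stores two points, the width and height of a
box are simply the differences of the corresponding coordinates. The
following is a minimal sketch in plain Java. The class ``CornerBox`` and
its methods are hypothetical helpers introduced here for illustration;
they are not part of DJL.

```java
// Hypothetical helper (not part of DJL): derives size information from a
// box stored as {xMin, yMin, xMax, yMax} in pixel coordinates.
class CornerBox {

    // Horizontal extent: lower-right x minus upper-left x
    static double width(double[] bbox) {
        return bbox[2] - bbox[0];
    }

    // Vertical extent: lower-right y minus upper-left y
    static double height(double[] bbox) {
        return bbox[3] - bbox[1];
    }

    static double area(double[] bbox) {
        return width(bbox) * height(bbox);
    }

    // A box is well formed if it has four values and the lower-right
    // corner lies strictly to the right of and below the upper-left corner
    static boolean isValid(double[] bbox) {
        return bbox.length == 4 && bbox[2] > bbox[0] && bbox[3] > bbox[1];
    }

    public static void main(String[] args) {
        double[] dogBbox = {60, 45, 378, 516};
        double[] catBbox = {400, 112, 655, 493};
        System.out.println("dog: " + width(dogBbox) + " x " + height(dogBbox));
        System.out.println("cat: " + width(catBbox) + " x " + height(catBbox));
    }
}
```

Running ``main`` prints ``dog: 318.0 x 471.0`` and ``cat: 255.0 x 381.0``,
the pixel sizes of the two boxes defined above.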

We can draw the bounding boxes on the image to check whether they are
accurate. Before drawing, we define a helper function
``bboxToRectangle``. DJL's ``Image`` API expects each ``Rectangle`` to be
given by its upper-left corner, width, and height, with all values
expressed as fractions of the image size rather than in pixels. Hence, we
divide the :math:`x` values by the image width and the :math:`y` values
by the image height.

.. code:: java

    public Rectangle bboxToRectangle(double[] bbox, int width, int height){
        // Convert pixel corners (xMin, yMin, xMax, yMax) into DJL's
        // (x, y, width, height) format, scaled by the image size
        return new Rectangle(
            bbox[0] / width,
            bbox[1] / height,
            (bbox[2] - bbox[0]) / width,
            (bbox[3] - bbox[1]) / height);
    }
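
To make the arithmetic concrete, here is a minimal sketch of the same
conversion and its inverse using plain arrays instead of DJL types. The
class ``BoxNormalizer``, its method names, and the 800 x 600 image size
are assumptions made for illustration only; they are not part of DJL, and
the sample image has its own dimensions.

```java
// Hypothetical helper (not part of DJL): mirrors the arithmetic of
// bboxToRectangle with plain arrays so that it runs without DJL.
class BoxNormalizer {

    // {xMin, yMin, xMax, yMax} in pixels -> {x, y, width, height} as
    // fractions of the image size
    static double[] toFractions(double[] bbox, int imgWidth, int imgHeight) {
        return new double[] {
            bbox[0] / imgWidth,
            bbox[1] / imgHeight,
            (bbox[2] - bbox[0]) / imgWidth,
            (bbox[3] - bbox[1]) / imgHeight
        };
    }

    // Inverse: {x, y, width, height} as fractions -> pixel corners
    static double[] toPixels(double[] rect, int imgWidth, int imgHeight) {
        double xMin = rect[0] * imgWidth;
        double yMin = rect[1] * imgHeight;
        return new double[] {
            xMin,
            yMin,
            xMin + rect[2] * imgWidth,
            yMin + rect[3] * imgHeight
        };
    }

    public static void main(String[] args) {
        // Assumed image size and box, chosen only for this example
        int imgWidth = 800;
        int imgHeight = 600;
        double[] bbox = {200, 150, 600, 450};

        double[] rect = toFractions(bbox, imgWidth, imgHeight);
        System.out.println(rect[0] + ", " + rect[1] + ", " + rect[2] + ", " + rect[3]);

        double[] back = toPixels(rect, imgWidth, imgHeight);
        System.out.println(back[0] + ", " + back[1] + ", " + back[2] + ", " + back[3]);
    }
}
```

For these assumed values, the box starts a quarter of the way into the
image in both directions and covers half of its width and height, and
converting back recovers the original pixel corners.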

After drawing the bounding boxes on the image, we can see that the main
outline of each target is basically inside its box.

.. code:: java

    List<String> classNames = new ArrayList<>();
    classNames.add("dog");
    classNames.add("cat");
    
    List<Double> prob = new ArrayList<>();
    prob.add(1.0);
    prob.add(1.0);
    
    List<BoundingBox> boxes = new ArrayList<>();
    boxes.add(bboxToRectangle(dog_bbox, imgArr.getWidth(), imgArr.getHeight()));
    boxes.add(bboxToRectangle(cat_bbox, imgArr.getWidth(), imgArr.getHeight()));
            
    DetectedObjects detectedObjects = new DetectedObjects(classNames, prob, boxes);
    
    // Draw the bounding boxes on the original image
    imgArr.drawBoundingBoxes(detectedObjects);
    imgArr.getWrappedImage();


.. figure:: output_bounding-box_a9e907_10_0.png


Summary
-------

-  In object detection, we need not only to identify all the objects of
   interest in the image but also to determine their positions. A
   position is generally represented by a rectangular bounding box.

Exercises
---------

1. Find some images and try to label a bounding box that contains the
   target. Compare the time it takes to label a bounding box with the
   time it takes to label a category.