In the previous section, we developed an initial understanding of computer vision. With this understanding, there are several algorithms that have been developed and are used in industrial applications. Studying these not only improve our understanding of the system but can also seed newer ideas to improve overall systems.
In this section, we will extend our understanding of computer vision by looking at various applications and their problem formulations:
- Image classification: In the past few years, categorizing images based on the objects within has gained popularity. This is due to advances in algorithms as well as the availability of large datasets. Deep learning algorithms for image classification have significantly improved the accuracy while being trained on datasets like ImageNet. We will study this dataset further in the next chapter. The trained model is often further used to improve other recognition algorithms like object detection, as well as image categorization in online applications. In this book, we will see how to create a simple algorithm to classify images using deep learning models.
- Object detection: Not just self-driving cars, but robotics, automated retail stores, traffic detection, smartphone camera apps, image filters and many more applications use object detection. These also benefit from deep learning and vision techniques as well as the availability of large, annotated datasets. We saw an introduction to object detection in the previous section that produces bounding boxes around objects and also categorize what object is inside the box.
- Object tracking: Following robots, surveillance cameras and people interaction are few of the several applications of object tracking. This consists of defining the location and keeps track of corresponding objects across a sequence of images.
- Image geometry: This is often referred to as computing the depth of objects from the camera. There are several applications in this domain too. Smartphones apps are now capable of computing three-dimensional structures from the video created onboard. Using the three-dimensional reconstructed digital models, further extensions like AR or VR application are developed to interface the image world with the real world.
- Image segmentation: This is creating cluster regions in images, such that one cluster has similar properties. The usual approach is to cluster image pixels belonging to the same object. Recent applications have grown in self-driving cars and healthcare analysis using image regions.
- Image generation: These have a greater impact in the artistic domain, merging different image styles or generating completely new ones. Now, we can mix and merge Van Gogh's painting style with smartphone camera images to create images that appear as if they were painted in a similar style to Van Gogh's.
The field is quickly evolving, not only through making newer methods of image analysis but also finding newer applications where computer vision can be used. Therefore, applications are not just limited to those explained previously.
Developing vision applications requires significant knowledge of tools and techniques. In Chapter 2, Libraries, Development Platform, and Datasets, we will see a list of tools that helps in implementing vision techniques. One of the popular tools for this is OpenCV, which consists of most common algorithms of computer vision. For more recent techniques such as deep learning, Keras and TensorFlow can be used in creating applications.
Though we will see an introductory image operations in the next section, in Chapter 3, Image Filtering and Transformations in OpenCV, there are more elaborate image operations of filtering and transformations. These act as initial operations in many applications to remove unwanted information.
In Chapter 4, What is a Feature?, we will be introduced to the features of an image. There are several properties in an image such as corners, edges, and so on that can act as key points. These properties are used to find similarities between images. We will implement and understand common features and feature extractors.
The recent advances in vision techniques for image classification or object detection use advanced features that utilize deep-learning-based approaches. In Chapter 5, Convolutional Neural Networks, we will begin with understanding various components of a convolutional neural network and how it can be used to classify images.
Object detection, as explained before, is a more complex problem of both localizing the position of an object in an image as well as saying what type of object it is. This, therefore, requires more complex techniques, which we will see in Chapter 6, Feature-Based Object Detection, using TensorFlow.
If we would like to know the region of an object in an image, we need to perform image segmentation. In Chapter 7, Segmentation and Tracking, we will see some techniques for image segmentation using convolutional neural networks and also techniques for tracking multiple objects in a sequence of images or video.
Finally in Chapter 8, 3D Computer Vision, there is an introduction to image construction and an application of image geometry, such as visual odometry and visual slam.
Though we will introduce setting up OpenCV in the next chapter in detail, in the next section we will use OpenCV to perform basic image operations of reading and converting images. These operations will show how an image is represented in the digital world and what needs to be changed to improve image quality. More detailed image operations are covered in Chapter 3, Image Filtering and Transformations in OpenCV.