Computer Vision is a discipline whose aim is to allow computers to gain a high-level understanding of digital images. This is quite a broad definition, because understanding can mean many different things, including finding an object in a picture (object detection), understanding what is happening (event detection), describing a picture in text, or reconstructing a scene in 3D. There are also special tasks related to images of people: age and emotion estimation, face detection and identification, and 3D pose estimation, to name a few.
One of the simplest tasks of computer vision is image classification.
Computer vision is often considered to be a branch of AI. Nowadays, most computer vision tasks are solved using neural networks. We will learn more about the special type of neural network used for computer vision, the convolutional neural network, throughout this section.
However, before you pass the image to a neural network, in many cases it makes sense to use some algorithmic techniques to enhance the image.
There are several Python libraries available for image processing:
OpenCV is considered to be the de facto standard for image processing. It contains a lot of useful algorithms, implemented in C++. You can call OpenCV from Python as well.
A good place to learn OpenCV is this Learn OpenCV course. In our curriculum, the goal is not to teach OpenCV itself, but to show you some examples of when and how it can be used.
Images in Python can be conveniently represented by NumPy arrays. The array is indexed by row first and column second, so a grayscale image of 320x200 pixels (width x height) is stored in a 200x320 array, and a color image of the same size has the shape 200x320x3 (one layer for each of the 3 color channels). To load an image, you can use the following code:
import cv2
import matplotlib.pyplot as plt

# cv2.imread returns a NumPy array, or None if the file cannot be read
im = cv2.imread('image.jpeg')
plt.imshow(im)
For historical reasons, OpenCV uses BGR (Blue-Green-Red) encoding for color images, while most other Python tools use the more common RGB (Red-Green-Blue). For the image to look right, you need to convert it to the RGB color space, either by reversing the order of the channels in the NumPy array, or by calling an OpenCV function:
im = cv2.cvtColor(im,cv2.COLOR_BGR2RGB)
The same cvtColor function can be used to perform other color space transformations, such as converting an image to grayscale or to the HSV (Hue-Saturation-Value) color space.
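The channel reordering mentioned above can also be done with plain NumPy. The following is a minimal sketch on a synthetic image rather than a loaded file; the variable names are illustrative, and the grayscale weights are the standard BT.601 luma coefficients, used here as an approximation of what a BGR-to-grayscale conversion computes.

```python
import numpy as np

# Synthetic 320x200 (width x height) color image in OpenCV's BGR order.
# NumPy stores it as rows x columns x channels, i.e. shape (200, 320, 3).
bgr = np.zeros((200, 320, 3), dtype=np.uint8)
bgr[..., 0] = 255  # channel 0 is blue in BGR, so the image is pure blue

# Reversing the last axis turns BGR into RGB without copying the data
rgb = bgr[..., ::-1]

# A weighted sum of the R, G and B channels gives a grayscale image
weights = np.array([0.299, 0.587, 0.114])
gray = (rgb @ weights).round().astype(np.uint8)

print(bgr.shape, rgb[0, 0], gray.shape)
```

Slicing with `[..., ::-1]` returns a view, so call `.copy()` on the result if a later function requires a contiguous array.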
You can also use OpenCV to load video frame by frame; an example is given in the exercise OpenCV Notebook.
Before feeding an image to a neural network, you may want to apply several pre-processing steps. OpenCV can do many things, including:
Resizing the image: im = cv2.resize(im, (320,200), interpolation=cv2.INTER_LANCZOS4)
Blurring the image: im = cv2.medianBlur(im,3) or im = cv2.GaussianBlur(im, (3,3), 0)
Thresholding with the cv2.threshold/cv2.adaptiveThreshold functions, which is often preferable to adjusting brightness or contrast.
In our OpenCV Notebook, we give some examples of when computer vision can be used to perform specific tasks:
Image from OpenCV.ipynb
Image from OpenCV.ipynb
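Detecting motion using optical flow. Optical flow estimates how individual pixels move between video frames. There are two types of optical flow: dense optical flow, which computes a motion vector for every pixel, and sparse optical flow, which tracks a small set of distinctive features (such as corners or edges) from frame to frame.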
Image from OpenCV.ipynb
Let's do some experiments with OpenCV by exploring the OpenCV Notebook.
Sometimes, relatively complex tasks such as movement detection or fingertip detection can be solved purely by computer vision. Thus, it is very helpful to know the basic techniques of computer vision, and what libraries like OpenCV can do.
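To give a sense of how simple some of these basic techniques are, here is a minimal NumPy-only sketch of global binary thresholding, one of the pre-processing steps mentioned earlier. The function name `binary_threshold` and the sample values are hypothetical; the rule it applies (pixels strictly above the threshold become the maximum value, all others become 0) mirrors the binary mode of cv2.threshold.

```python
import numpy as np

def binary_threshold(gray, thresh=127, maxval=255):
    """Pixels strictly above `thresh` become `maxval`; all others become 0."""
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)

# Tiny hypothetical 2x3 grayscale "image"
gray = np.array([[10, 127, 128],
                 [200, 50, 255]], dtype=np.uint8)

mask = binary_threshold(gray)
print(mask)
```

In practice you would normally call the OpenCV function directly; the sketch only illustrates what the operation does to each pixel.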
Watch this video from the AI show to learn about the Cortic Tigers project and how they built a block-based solution to democratize computer vision tasks via a robot. Do some research on other projects like this that help onboard new learners into the field.
Read more on optical flow in this great tutorial.
In this lab, you will take a video with simple gestures, and your goal is to extract up/down/left/right movements using optical flow.
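One possible starting point for the lab is sketched below. It assumes you already have a dense flow field for a pair of frames, as an array of shape (height, width, 2) holding per-pixel (dx, dy) displacements, which is the layout returned by dense optical flow functions such as cv2.calcOpticalFlowFarneback. The function name `dominant_direction`, the averaging strategy, and the `min_magnitude` cut-off are assumptions made for illustration, not the lab's reference solution.

```python
import numpy as np

def dominant_direction(flow, min_magnitude=1.0):
    """Classify the average motion in a dense flow field of shape (H, W, 2).

    flow[..., 0] holds horizontal displacements (dx) and flow[..., 1]
    vertical displacements (dy), both in pixels.
    Returns 'left', 'right', 'up', 'down', or None if motion is too small.
    """
    dx, dy = flow.reshape(-1, 2).mean(axis=0)
    if np.hypot(dx, dy) < min_magnitude:
        return None
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    # Image rows grow downwards, so a positive dy means downward motion
    return "down" if dy > 0 else "up"

# Synthetic flow field: every pixel moves 3 px towards the top of the frame
flow = np.zeros((200, 320, 2), dtype=np.float32)
flow[..., 1] = -3.0
print(dominant_direction(flow))
```

Averaging over the whole frame is the simplest choice; on real video you may get steadier results by averaging only over pixels whose flow magnitude exceeds a noise threshold.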