Semantic Segmentation
The next deep learning capability we'll cover in this tutorial is semantic segmentation. Semantic segmentation is based on image recognition, except the classifications occur at the pixel level as opposed to the entire image. This is accomplished by convolutionalizing a pre-trained image recognition backbone, which transforms the model into a Fully Convolutional Network (FCN) capable of per-pixel labelling. Especially useful for environmental perception, segmentation yields dense per-pixel classifications of many different potential objects per scene, including scene foregrounds and backgrounds.
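To make the idea of per-pixel classification concrete, here is a minimal sketch (not part of jetson-inference; the 3-class palette and `colorize_mask` helper are hypothetical) of how a grid of per-pixel class IDs maps to a color segmentation mask:

```python
import numpy as np

# Hypothetical 3-class palette (IDs 0-2); real models define their own colors.
CLASS_COLORS = np.array([
    [0, 0, 0],        # 0: background
    [0, 255, 0],      # 1: vegetation
    [128, 128, 128],  # 2: road
], dtype=np.uint8)

def colorize_mask(class_ids):
    """Map an HxW array of per-pixel class IDs to an HxWx3 RGB mask."""
    return CLASS_COLORS[class_ids]

# A tiny 2x3 "segmentation" result: top row road/background, bottom row vegetation.
ids = np.array([[2, 2, 0],
                [1, 1, 1]])
mask = colorize_mask(ids)
print(mask.shape)  # (2, 3, 3)
```

An FCN produces an `ids`-like class map at (a fraction of) the input resolution; the overlay images shown later in this page are such masks blended onto the original photo.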
`segNet` accepts a 2D image as input, and outputs a second image with the per-pixel classification mask overlay. Each pixel of the mask corresponds to the class of object that was classified. `segNet` is available to use from Python and C++.

As examples of using `segNet`, we provide versions of a command-line interface for C++ and Python:

- `segnet-console.cpp` (C++)
- `segnet-console.py` (Python)

Later in the tutorial, we'll also cover segmentation on live camera streams from C++ and Python:

- `segnet-camera.cpp` (C++)
- `segnet-camera.py` (Python)

See below for the various pre-trained segmentation models available, which use the FCN-ResNet18 network with realtime performance on Jetson. Models are provided for a variety of environments and subject matter, including urban cities, off-road trails, and indoor office spaces and homes.
Below is a table of the pre-trained semantic segmentation models available for download, and the associated `--network` argument to `segnet-console` used for loading them. They're based on the 21-class FCN-ResNet18 network and have been trained on various datasets and resolutions using PyTorch, and were exported to ONNX format to be loaded with TensorRT.
| Dataset | Resolution | CLI Argument | Accuracy | Jetson Nano | Jetson Xavier |
|---|---|---|---|---|---|
| Cityscapes | 512x256 | `fcn-resnet18-cityscapes-512x256` | 83.3% | 48 FPS | 480 FPS |
| Cityscapes | 1024x512 | `fcn-resnet18-cityscapes-1024x512` | 87.3% | 12 FPS | 175 FPS |
| Cityscapes | 2048x1024 | `fcn-resnet18-cityscapes-2048x1024` | 89.6% | 3 FPS | 47 FPS |
| DeepScene | 576x320 | `fcn-resnet18-deepscene-576x320` | 96.4% | 26 FPS | 360 FPS |
| DeepScene | 864x480 | `fcn-resnet18-deepscene-864x480` | 96.9% | 14 FPS | 190 FPS |
| Multi-Human | 512x320 | `fcn-resnet18-mhp-512x320` | 86.5% | 34 FPS | 370 FPS |
| Multi-Human | 640x360 | `fcn-resnet18-mhp-640x360` | 87.1% | 23 FPS | 325 FPS |
| Pascal VOC | 320x320 | `fcn-resnet18-voc-320x320` | 85.9% | 45 FPS | 508 FPS |
| Pascal VOC | 512x320 | `fcn-resnet18-voc-512x320` | 88.5% | 34 FPS | 375 FPS |
| SUN RGB-D | 512x400 | `fcn-resnet18-sun-512x400` | 64.3% | 28 FPS | 340 FPS |
| SUN RGB-D | 640x512 | `fcn-resnet18-sun-640x512` | 65.1% | 17 FPS | 224 FPS |
(performance measured with `nvpmodel 0`, MAX-N)

note: to download additional networks, run the Model Downloader tool
$ cd jetson-inference/tools
$ ./download-models.sh
The `segnet-console` program can be used to segment static images. It accepts 3 command line parameters:

- the path to an input image (`jpg`, `png`, `tga`, `bmp`)
- optional path to output image (`jpg`, `png`, `tga`, `bmp`)
- optional `--network` flag changes the segmentation model being used (see above)
- optional `--visualize` flag accepts `mask` or `overlay` modes (default is `overlay`)
- optional `--alpha` flag sets the alpha blending value for `overlay` (default is `120`)
- optional `--filter-mode` flag accepts `point` or `linear` sampling (default is `linear`)

Note that there are additional command line parameters available for loading custom models. Launch the application with the `--help` flag to receive more info about using them, or see the Code Examples readme.
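The `--alpha` and `--filter-mode` options are easiest to understand by sketching what they do. Below is a simplified, pure-numpy illustration (the `overlay` and `upsample` helpers are hypothetical and are not jetson-inference's actual implementation): alpha blending weights the mask against the image on a 0-255 scale, and filter mode controls how the network's low-resolution class grid is resized — `point` snaps to the nearest cell while `linear` interpolates between cells.

```python
import numpy as np

def overlay(image, mask, alpha=120):
    """Alpha-blend a color mask onto an image; alpha is 0-255 like --alpha."""
    a = alpha / 255.0
    return (image * (1.0 - a) + mask * a).astype(np.uint8)

def upsample(mask, out_h, out_w, mode="linear"):
    """Resize a low-res grid: 'point' = nearest neighbor,
    'linear' = bilinear interpolation (simplified, no edge handling)."""
    h, w = mask.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    if mode == "point":
        return mask[np.round(ys).astype(int)[:, None], np.round(xs).astype(int)]
    y0 = np.floor(ys).astype(int); x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    fy = (ys - y0)[:, None]; fx = xs - x0
    top = mask[y0][:, x0] * (1 - fx) + mask[y0][:, x1] * fx
    bot = mask[y1][:, x0] * (1 - fx) + mask[y1][:, x1] * fx
    return top * (1 - fy) + bot * fy

image = np.full((4, 4, 3), 200, dtype=np.uint8)   # light gray image
mask  = np.zeros((4, 4, 3), dtype=np.uint8)       # black mask
blended = overlay(image, mask, alpha=120)
print(blended[0, 0])  # each pixel pulled partway toward the mask color
```

With the default `alpha=120` (~47% mask weight), the scene stays visible through the overlay; `point` filtering preserves hard class boundaries, while `linear` smooths them during upscaling.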
Here are some example usages of the program:
# C++
$ ./segnet-console --network=<model> input.jpg output.jpg                   # overlay segmentation on original
$ ./segnet-console --network=<model> --alpha=200 input.jpg output.jpg       # make the overlay more opaque
$ ./segnet-console --network=<model> --visualize=mask input.jpg output.jpg  # output the solid segmentation mask

# Python
$ ./segnet-console.py --network=<model> input.jpg output.jpg                   # overlay segmentation on original
$ ./segnet-console.py --network=<model> --alpha=200 input.jpg output.jpg       # make the overlay more opaque
$ ./segnet-console.py --network=<model> --visualize=mask input.jpg output.jpg  # output the solid segmentation mask
Let's look at some different scenarios. Here's an example of segmenting an urban street scene with the Cityscapes model:
# C++
$ ./segnet-console --network=fcn-resnet18-cityscapes images/city_0.jpg output.jpg
# Python
$ ./segnet-console.py --network=fcn-resnet18-cityscapes images/city_0.jpg output.jpg
There are more test images called `city_*.jpg` found under the `images/` subdirectory for trying out the Cityscapes model.
The DeepScene dataset consists of off-road forest trails and vegetation, aiding in path-following for outdoor robots.
Here's an example of generating the segmentation overlay and mask by specifying the --visualize
argument:
$ ./segnet-console --network=fcn-resnet18-deepscene images/trail_0.jpg output_overlay.jpg # overlay
$ ./segnet-console --network=fcn-resnet18-deepscene --visualize=mask images/trail_0.jpg output_mask.jpg # mask
$ ./segnet-console.py --network=fcn-resnet18-deepscene images/trail_0.jpg output_overlay.jpg # overlay
$ ./segnet-console.py --network=fcn-resnet18-deepscene --visualize=mask images/trail_0.jpg output_mask.jpg # mask
There are more sample images called `trail_*.jpg` located under the `images/` subdirectory.
Multi-Human Parsing provides dense labeling of body parts, like arms, legs, head, and different types of clothing.
See the handful of test images named `humans_*.jpg` found under `images/` for trying out the MHP model:
# C++
$ ./segnet-console --network=fcn-resnet18-mhp images/humans_0.jpg output.jpg
# Python
$ ./segnet-console.py --network=fcn-resnet18-mhp images/humans_0.jpg output.jpg
Pascal VOC is one of the original datasets used for semantic segmentation, containing various people, animals, vehicles, and household objects. There are some sample images included named `object_*.jpg` for testing out the Pascal VOC model:
# C++
$ ./segnet-console --network=fcn-resnet18-voc images/object_0.jpg output.jpg
# Python
$ ./segnet-console.py --network=fcn-resnet18-voc images/object_0.jpg output.jpg
The SUN RGB-D dataset provides segmentation ground-truth for many indoor objects and scenes commonly found in office spaces and homes. See the images named `room_*.jpg` found under the `images/` subdirectory for testing out the SUN models:
# C++
$ ./segnet-console --network=fcn-resnet18-sun images/room_0.jpg output.jpg
# Python
$ ./segnet-console.py --network=fcn-resnet18-sun images/room_0.jpg output.jpg
For convenience, there's also a Python script provided called `segnet-batch.py` for batch processing folders of images. It's launched by specifying the `--network` option like above, and providing paths to the input and output directories:
$ ./segnet-batch.py --network=<model> <input-dir> <output-dir>
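The batch-processing pattern itself is simple. Here is a minimal, self-contained sketch of the same idea (this is not `segnet-batch.py` itself; `process_folder` is hypothetical, and `shutil.copy` stands in for the per-image segmentation call):

```python
import pathlib
import shutil
import tempfile

def process_folder(input_dir, output_dir, process=shutil.copy):
    """Run process(src, dst) on every image in input_dir, mirroring the
    filenames into output_dir. shutil.copy stands in for segmentation."""
    in_path, out_path = pathlib.Path(input_dir), pathlib.Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    exts = {".jpg", ".png", ".tga", ".bmp"}
    count = 0
    for src in sorted(in_path.iterdir()):
        if src.suffix.lower() in exts:
            process(src, out_path / src.name)
            count += 1
    return count

# Demo on a temporary folder with two placeholder "images"
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "in").mkdir()
(tmp / "in" / "a.jpg").write_bytes(b"fake")
(tmp / "in" / "b.png").write_bytes(b"fake")
(tmp / "in" / "notes.txt").write_text("skipped")  # non-image, ignored
n = process_folder(tmp / "in", tmp / "out")
print(n)  # 2
```

Loading the network once and reusing it across all images (as a batch script can) avoids paying the TensorRT engine-load cost per image, which is the main advantage over invoking `segnet-console` in a shell loop.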
That wraps up the segmentation models and command-line utilities. Next, we'll run it on a live camera stream.
Next | Running the Live Camera Segmentation Demo
Back | Coding Your Own Object Detection Program
© 2016-2019 NVIDIA | Table of Contents