Deep Multispectral Semantic Scene Understanding of Forested Environments using Multimodal Fusion
International Symposium on Experimental Robotics (ISER 2016), 2016
Abstract: Semantic scene understanding of unstructured environments is a highly challenging task for robots operating in the real world. Deep Convolutional Neural Network (DCNN) architectures define the state of the art in various segmentation tasks. So far, research has largely focused on segmentation using RGB data. In this paper, we study the use of multispectral and multimodal images for semantic segmentation and develop fusion architectures that learn from RGB, Near-InfraRed (NIR) channels, and depth data. We introduce a first-of-its-kind multispectral segmentation benchmark that contains 15,000 images and 325 pixel-wise ground truth annotations of unstructured forest environments. We identify new data augmentation strategies that enable training of very deep models on relatively small datasets. We show that our UpNet architecture exceeds the state of the art both qualitatively and quantitatively on our benchmark. In addition, we present experimental results for segmentation under challenging real-world conditions.
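As an illustration of the multimodal fusion idea described in the abstract, the following is a minimal late-fusion sketch in PyTorch: two convolutional encoders process the RGB and NIR/depth streams separately, their feature maps are concatenated, and a small decoder upsamples to per-pixel class scores. This is not the authors' UpNet implementation; the layer sizes, the two-channel NIR+depth input, and the class count of 6 are assumptions made purely for illustration.

# Minimal late-fusion segmentation sketch in PyTorch.
# NOT the authors' UpNet: channel widths, network depth, and the
# number of classes (6 here) are illustrative assumptions only.
import torch
import torch.nn as nn

def encoder(in_channels):
    # Small convolutional encoder that downsamples by a factor of 4.
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class LateFusionSegNet(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.rgb_enc = encoder(3)   # RGB stream
        self.aux_enc = encoder(2)   # NIR + depth stream (assumed 2 channels)
        # Fuse by concatenating feature maps, then upsample back to input size.
        self.decoder = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, num_classes, 2, stride=2),
        )

    def forward(self, rgb, nir_depth):
        fused = torch.cat([self.rgb_enc(rgb), self.aux_enc(nir_depth)], dim=1)
        return self.decoder(fused)  # per-pixel class scores

# Usage: logits = LateFusionSegNet()(torch.rand(1, 3, 256, 256), torch.rand(1, 2, 256, 256))

Late fusion (combining features from modality-specific encoders) is only one of several ways such streams could be merged; early fusion of stacked input channels is an equally simple alternative.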
BibTeX reference
@InProceedings{OB16c,
  author    = "A. Valada and G. Oliveira and T. Brox and W. Burgard",
  title     = "Deep Multispectral Semantic Scene Understanding of Forested Environments using Multimodal Fusion",
  booktitle = "International Symposium on Experimental Robotics (ISER 2016)",
  year      = "2016",
  url       = "http://lmb.informatik.uni-freiburg.de/Publications/2016/OB16c"
}