Result: Pixel level image understanding with deep learning
Chinese
Further information
Ph.D. ; Pixel-level image understanding and synthesis are central problems in computer vision because they provide the most comprehensive and detailed understanding of the visual world. In this thesis, we study these problems in the RGB image domain, the semantic domain, and the geometric domain. ; The first problem we address is how to extract pixel-level semantic knowledge from RGB images. We address it by developing an object clique potential for semantic segmentation. Our object clique potential addresses the misclassified object-part issues arising in solutions based on fully convolutional networks. Our object clique set is significantly smaller than that produced by segment-proposal-based approaches, so our method requires notably less computation. In terms of system design and model formulation, our object clique potential can be regarded as a functional complement to local-appearance-based CRF models and works in synergy with these approaches to further improve performance. ; However, training neural networks for pixel-level semantic understanding is data-hungry. The second problem we tackle is therefore learning pixel-level semantic information with only image-level supervision. Our method unifies semantic segmentation and object localization through proposal aggregation and selection modules, which greatly reduce the notorious error accumulation problem that commonly arises in weakly supervised learning. Our proposed training algorithm progressively improves segmentation performance with augmented feedback over iterations. ; When humans infer the semantics of the visual world, we use not only RGB information but also geometry. Thus, the third problem we handle is incorporating geometric data into semantic segmentation. Specifically, we study RGBD semantic segmentation, which requires joint reasoning about 2D appearance and 3D geometric information.
We propose a 3D graph neural network (3DGNN) that builds a k-nearest ...
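As a rough illustration of the k-nearest-neighbor graph idea named above, the following sketch connects each 3D point to its k closest points by brute force. It is not the thesis's implementation: the function name `knn_graph`, the toy points, and the use of Euclidean distance are assumptions made purely for illustration.

```python
import numpy as np


def knn_graph(points, k):
    """Return an (N, k) array of neighbor indices for each 3D point.

    Hypothetical helper: brute-force search with Euclidean distance,
    which is an assumption and only practical for small point sets.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(dist2, np.inf)
    # Indices of the k smallest distances per row, nearest first.
    return np.argsort(dist2, axis=1)[:, :k]


# Toy 3D points standing in for pixels back-projected with depth.
points = np.array(
    [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [5.0, 5.0, 5.0]]
)
neighbors = knn_graph(points, k=2)
print(neighbors)
```

Each row of `neighbors` lists the graph edges leaving one point. For large point clouds a spatial index such as a k-d tree would normally replace the quadratic distance matrix.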