Result: Pixel level image understanding with deep learning

Title:
Pixel level image understanding with deep learning
Contributors:
Qi, Xiaojuan (author.), Jia, Jiaya (thesis advisor.), Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering. (degree granting institution.)
Publication Year:
2018
Collection:
The Chinese University of Hong Kong: CUHK Digital Repository / 香港中文大學數碼典藏
Document Type:
Academic journal text
File Description:
electronic resource; remote; 1 online resource (xv, 124 leaves) : illustrations (some color); computer; online resource
Language:
English
Chinese
Relation:
cuhk:2187981; local: ETD920200132; local: AAI13837880; local: 991039750259003407
Rights:
Use of this resource is governed by the terms and conditions of the Creative Commons "Attribution-NonCommercial-NoDerivatives 4.0 International" License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Accession Number:
edsbas.90C7699
Database:
BASE

Further information

Ph.D. ; Pixel-level image understanding and synthesis are important problems in computer vision because they provide the most comprehensive and detailed understanding of the visual world. In this thesis, we study these problems in the RGB image, semantic, and geometric domains. ; The first problem we address is how to extract pixel-level semantic knowledge from RGB images. We address it by developing an object clique potential for semantic segmentation. Our object clique potential addresses the problem of misclassified object parts that arises in solutions based on fully convolutional networks. Our object clique set is significantly smaller than that produced by segment-proposal-based approaches, so our method requires notably less computation. In terms of system design and model formation, our object clique potential can be regarded as a functional complement to local-appearance-based CRF models, and it works in synergy with these effective approaches to further improve performance. ; However, training neural networks for pixel-level semantic understanding is data-hungry. The second problem we tackle is therefore how to learn pixel-level semantic information with only image-level supervision. Our method unifies semantic segmentation and object localization through proposal aggregation and selection modules. These modules greatly reduce the notorious error accumulation problem that commonly arises in weakly supervised learning. Our proposed training algorithm progressively improves segmentation performance with augmented feedback over iterations. ; When humans infer the semantics of the visual world, they use not only RGB information but also geometry. Thus, the third problem we handle is how to incorporate geometric data into semantic segmentation. More specifically, we study RGBD semantic segmentation, which requires joint reasoning about 2D appearance and 3D geometric information. 
We propose a 3D graph neural network (3DGNN) that builds a k-nearest ...