Title:
Swin-AFF: an improved accuracy 6D pose estimation network for high reflection and texture-less workpieces based on Swin transformer.
Source:
Visual Computer; Sep2025, Vol. 41 Issue 11, p8795-8812, 18p
Database:
Complementary Index

6-DoF pose estimation of texture-less and highly reflective workpieces is an important application of computer vision, especially in robotic vision. In this study, we propose a Swin transformer-based network named Swin-AFF, a method that operates on a single RGB-D image and is suited to highly accurate 6D pose estimation of texture-less or highly reflective workpieces in occluded scenes. Geometric and texture features of the workpiece are effectively extracted by an image feature encoding and decoding network based on the Swin transformer, using RGB images combined with normal-vector-angle images generated from the depth channel, while geometric features of the target's point cloud are effectively extracted with RandLA-Net. Meanwhile, a bidirectional feature fusion module that suppresses abnormal image noise via an adaptive fusion strategy is constructed. Finally, to improve accuracy and efficiency, a 3D-3D algorithm computes the object's 6D pose from the predicted target mask and 3D key points. In addition, a new dataset (HW6D) containing highly reflective symmetrical metal workpieces and texture-less plastic workpieces of various shapes and structures is constructed to verify the proposed method. Under the ADD(S) evaluation metric, experiments on the LineMOD, MP6D, and HW6D datasets show that our method outperforms state-of-the-art 6D pose estimation methods. Ablation studies validate the design of Swin-AFF. The HW6D dataset and Swin-AFF code are available at: https://doi.org/10.6084/m9.figshare.26411209.v2. [ABSTRACT FROM AUTHOR]
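The abstract's final step, recovering a 6D pose from predicted 3D key points via a "3D-3D algorithm", is commonly realized as least-squares rigid alignment between corresponding point sets (the Kabsch/Umeyama method). The paper does not give its exact formulation, so the sketch below is a generic, minimal illustration of that standard technique; the function name and shapes are illustrative, not taken from the Swin-AFF code.

```python
import numpy as np

def rigid_transform_3d(src, dst):
    """Least-squares rigid alignment (Kabsch/Umeyama).

    Finds rotation R and translation t minimizing ||(src @ R.T + t) - dst||
    over corresponding 3D points, e.g. predicted key points in the camera
    frame (dst) vs. the same key points on the object model (src).

    src, dst: (N, 3) arrays of corresponding points. Returns (R, t).
    """
    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)
    # Cross-covariance of the centered point sets (3x3).
    H = (src - src_centroid).T @ (dst - dst_centroid)
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_centroid - R @ src_centroid
    return R, t
```

Because the correspondences between model key points and their predicted locations are known, this closed-form solve avoids iterative PnP-style optimization, which is consistent with the abstract's stated goal of improving both accuracy and efficiency.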

Copyright of Visual Computer is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)