Single-View Contrastive Learning for Laryngeal Leukoplakia Classification With NBI Laryngoscopy Images.
Original Publication: New York, NY : J. Wiley, c1989-
Further Information
Background: Laryngeal cancer is the second most common cancer of the upper respiratory tract, and early, accurate diagnosis can improve the cure rate. Laryngoscopy with narrow-band imaging (NBI) is commonly used to help endoscopists diagnose laryngeal diseases. However, fine-grained classification of laryngeal leukoplakia from NBI images remains challenging for computer-aided diagnosis.
Methods: We propose a single-view contrastive learning network that locates lesion regions, constructs sample pairs for contrastive learning, and assigns pseudo-labels to unlabeled data, in order to achieve fine-grained classification with few samples. First, we pretrain the backbone network on the original NBI images. Second, to increase the number of samples available for contrastive learning, we design several patch generation methods based on an attention-guided network: the original NBI images are cropped into small patches that yield lesion-related regions and complementary samples. The pseudo-labels of these patches are obtained by applying the pretrained backbone network. Finally, we combine the contrastive loss and the cross-entropy loss to jointly train the backbone network and the contrastive learning network. Our NBI dataset covers six categories: normal tissue, inflammatory keratosis, mild dysplasia, moderate dysplasia, severe dysplasia, and squamous cell carcinoma.
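The final training step, which combines a contrastive term with a cross-entropy term, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the label-aware form of the contrastive term (patches sharing a pseudo-label are treated as positives), the temperature of 0.5, and the weighting factor are all assumptions made for illustration, and the sketch uses only the Python standard library rather than a deep learning framework.

```python
import math


def pseudo_label(scores):
    """Assumed pseudo-labelling rule: take the class with the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample from raw class scores and an integer label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def contrastive_loss(embeddings, labels, temperature=0.5):
    """Label-aware contrastive loss over a batch of patch embeddings.

    Assumption: patches with the same (pseudo-)label form positive pairs,
    and every other patch in the batch acts as a negative.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a positive pair contributes nothing
        # Temperature-scaled cosine similarity to every other patch.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def joint_loss(logits_batch, embeddings, labels, weight=1.0):
    """Mean cross-entropy plus a weighted contrastive term (weight is assumed)."""
    ce = sum(
        softmax_cross_entropy(logits, y) for logits, y in zip(logits_batch, labels)
    ) / len(labels)
    return ce + weight * contrastive_loss(embeddings, labels)
```

In this sketch the contrastive term is small when patches with the same pseudo-label lie close together in embedding space and large when they are scattered, so minimizing the joint loss pushes the classifier and the embedding in a consistent direction.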
Results and Conclusion: Experimental results show that our model achieves an accuracy of 96.12%, higher than that of current mainstream models, along with high specificity and sensitivity. The code is available at https://github.com/hans-bbt/single-view-contrastive-learning.
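For a multi-class problem such as the six-category task, accuracy, sensitivity, and specificity are commonly derived from a confusion matrix with a one-vs-rest reading of each class. The sketch below shows that calculation under assumptions: the abstract does not state how the per-class values were computed or averaged, the function name is hypothetical, and any matrix passed in is the caller's own data, not results from the paper.

```python
def one_vs_rest_metrics(confusion):
    """Overall accuracy plus per-class (sensitivity, specificity).

    confusion[t][p] is the number of samples whose true class is t
    and whose predicted class is p.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    per_class = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp  # class k samples predicted as another class
        fp = sum(confusion[t][k] for t in range(n)) - tp  # others predicted as k
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        per_class.append((sensitivity, specificity))
    return correct / total, per_class
```

Averaging the per-class pairs (macro averaging) is one way to report a single sensitivity and specificity, but that choice is also an assumption here.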
(© 2025 Wiley Periodicals LLC.)