Result: Towards Automated Vocal Mode Classification in Healthy Singing Voice-An XGBoost Decision Tree-Based Machine Learning Classifier.

Title:
Towards Automated Vocal Mode Classification in Healthy Singing Voice-An XGBoost Decision Tree-Based Machine Learning Classifier.
Authors:
Sol J; Institute for Computing and Information Sciences, Radboud University, Nijmegen, the Netherlands., Aaen M; Research & Development, Complete Vocal Institute, Copenhagen K, Denmark; Nottingham University Hospitals, NHS Trust, Queen's Medical, ENT Department, Nottingham, United Kingdom. Electronic address: mathias@shout.dk., Sadolin C; Research & Development, Complete Vocal Institute, Copenhagen K, Denmark., Ten Bosch L; Department of Language and Communication, Centre for Language Studies, Radboud University, Nijmegen, the Netherlands.
Source:
Journal of voice : official journal of the Voice Foundation [J Voice] 2026 Jan; Vol. 40 (1), pp. 251.e1-251.e16. Date of Electronic Publication: 2023 Nov 11.
Publication Type:
Journal Article; Comparative Study
Language:
English
Journal Info:
Publisher: Mosby Country of Publication: United States NLM ID: 8712262 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1873-4588 (Electronic) Linking ISSN: 08921997 NLM ISO Abbreviation: J Voice Subsets: MEDLINE
Imprint Name(s):
Publication: 2003- : St. Louis, MO : Mosby
Original Publication: [New York, N.Y.] : Raven Press, 1987-
Contributed Indexing:
Keywords: Artificial intelligence; Complete vocal technique; Machine learning; Singing voice; Vocal modes
Entry Date(s):
Date Created: 20231112 Date Completed: 20260110 Latest Revision: 20260113
Update Code:
20260113
DOI:
10.1016/j.jvoice.2023.09.006
PMID:
37953088
Database:
MEDLINE

Further Information

Auditory-perceptual assessment is widely used in clinical and pedagogical practice for speech and singing voice, yet several studies have shown poor intra- and inter-rater reliability in both clinical and singing voice contexts. Recent advances in artificial intelligence and machine learning offer models for automated classification and have demonstrated discriminatory power in both pathological and healthy voice. This study develops and tests an XGBoost decision-tree-based machine learning classifier for automated vocal mode classification in healthy singing voice. Classification models trained on mel-frequency cepstral coefficients (MFCCs), MFCC-Zero-Time Windowing, glottal features, voice quality features, and α-ratios achieved a 92% average F1-score in distinguishing metallic from non-metallic singing for male singers and an 87% average F1-score for female singers. The model distinguished vocal modes with 70% and 69% average F1-scores for male and female samples, respectively. Model performance was compared with human auditory-perceptual assessments of 64 corresponding samples performed by 41 professional singers. On task-matched problems, the model performed close to or below the level of human assessors. The XGBoost gains observed across the tested features indicate that the most important attributes for the tested classification problems were MFCCs and α-ratios between high- and low-frequency energy; models trained on only these features achieved performance that did not differ statistically significantly from the best tested models. The best automated models in this study do not yet match human auditory-perceptual discrimination but improve on previously reported average F1-scores for automated classification in singing voice.
(Copyright © 2026 The Voice Foundation. Published by Elsevier Inc. All rights reserved.)
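
Two quantities named in the abstract, the α-ratio between high- and low-frequency energy and the average F1-score, can be illustrated with a minimal sketch. This is not the authors' implementation. The 1 kHz cutoff, the high-over-low orientation of the ratio, the dB scaling, and the use of macro-averaging (unweighted mean of per-class F1) are all assumptions made for illustration; the abstract does not specify them. The function names `alpha_ratio_db` and `macro_f1` are hypothetical.

```python
import numpy as np


def alpha_ratio_db(signal, sample_rate, cutoff_hz=1000.0):
    """Return the ratio, in dB, of spectral energy above vs. below cutoff_hz.

    ASSUMPTION: the 1 kHz cutoff and high/low orientation are illustrative
    defaults, not values taken from the study.
    """
    signal = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.rfft(signal)) ** 2            # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    low_energy = power[freqs < cutoff_hz].sum()
    high_energy = power[freqs >= cutoff_hz].sum()
    eps = 1e-12                                          # avoids log of zero
    return 10.0 * np.log10((high_energy + eps) / (low_energy + eps))


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1-scores.

    ASSUMPTION: "average F1-score" is read here as macro-averaging.
    """
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)
```

In a pipeline of the kind the abstract describes, a value such as `alpha_ratio_db(frame, sample_rate)` would be one entry in a feature vector alongside MFCCs, and `macro_f1` would be applied to the classifier's predicted labels (for example "metallic" vs. "non-metallic") on held-out samples.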

Declaration of Competing Interest: This research was conducted as part of one co-author’s (JS) master’s thesis project in Computing Science: Data Science at Radboud University, with co-author LB as supervisor; beyond this, co-authors JS and LB have no conflicts of interest to declare. During the study, co-author MA was employed on a postdoctoral grant from the Danish Innovation Foundation (ref. no. 8054-00039B), which was awarded in part to Nottingham University Hospitals NHS, with which MA holds an honorary research contract, and in part to Complete Vocal Institute, by which co-author CS is employed.