Title: Cross-device and test-retest reliability of speech acoustic measurements derived from consumer-grade mobile recording devices.
Original Publication: Austin, Tex. : Psychonomic Society, c2005-
Further Information
In recent years, there has been growing interest in remote speech assessment through automated speech acoustic analysis. While the reliability of widely used features has been validated in professional recording settings, it remains unclear how the heterogeneity of consumer-grade recording devices, commonly used in nonclinical settings, impacts the reliability of these measurements. To address this issue, we systematically investigated the cross-device and test-retest reliability of classical speech acoustic measurements in a sample of healthy Chinese adults using consumer-grade equipment across three popular speech tasks: sustained phonation (SP), diadochokinesis (DDK), and picture description (PicD). A total of 51 participants completed two recording sessions spaced at least 24 hours apart. Speech outputs were recorded simultaneously using four devices: a voice recorder, laptop, tablet, and smartphone. Our results demonstrated good reliability for fundamental frequency and cepstral peak prominence in the SP task across testing sessions and devices. Other features from the SP and PicD tasks exhibited acceptable test-retest reliability, except for the period perturbation quotient from the tablet and formant frequency from the smartphone. However, measures from the DDK task showed a significant decrease in reliability on consumer-grade recording devices compared to professional devices. These findings indicate that the lower recording quality of consumer-grade equipment may compromise the reproducibility of syllable rate estimation, which is critical for DDK analysis. This study underscores the need for standardization of remote speech monitoring methodologies to ensure that remote home assessment provides accurate and reliable results for early screening.
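One common way to quantify the test-retest and cross-device reliability described above is an intraclass correlation coefficient (ICC). The abstract does not state which statistic or model the authors used, so the sketch below is only an illustration under assumptions: it computes the two-way random-effects, absolute-agreement, single-measure ICC(2,1) on synthetic data. The function name `icc_2_1`, the simulated fundamental-frequency values, and the noise levels are all hypothetical; only the sample size (51 participants) and the two-session design come from the abstract.

```python
import numpy as np


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: array of shape (n_subjects, k_measurements), where the columns
    are repeated sessions (test-retest) or different devices (cross-device).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Sums of squares for the two-way ANOVA decomposition.
    ss_rows = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)  # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)  # between sessions/devices
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Synthetic illustration (not the study's data): 51 participants, 2 sessions.
rng = np.random.default_rng(0)
true_f0 = rng.normal(loc=170.0, scale=40.0, size=51)        # hypothetical F0 in Hz
sessions = np.column_stack(
    [true_f0 + rng.normal(scale=5.0, size=51) for _ in range(2)]
)
print(f"Test-retest ICC(2,1) on synthetic F0: {icc_2_1(sessions):.3f}")
```

The same function applies to the cross-device case by passing one column per device (for example, four columns for the recorder, laptop, tablet, and smartphone) measured in the same session. Other ICC forms, or the within-subject coefficient of variation, would be equally plausible choices.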
(© 2024. The Psychonomic Society, Inc.)
Declarations. Conflicts of interest: The authors declare they have no financial interests. Ethics approval: The study was approved by the ethics committee of the Hefei Institutes of Physical Science and was conducted in accordance with the Declaration of Helsinki. Consent to participate: Informed consent was obtained from all individual participants included in the study. Consent for publication: The present study does not include any images, videos, or textual data from participants. The analyzed data consist solely of abstract acoustic features, which cannot convey any personal information.