
Title:
Forensic deepfake audio detection using segmental speech features.
Authors:
Yang T; University at Buffalo, Department of Linguistics, Buffalo, NY 14260, United States. Electronic address: tianleya@buffalo.edu.
Sun C; University at Buffalo, Department of Computer Science and Engineering, Buffalo, NY 14260, United States.
Lyu S; University at Buffalo, Department of Computer Science and Engineering, Buffalo, NY 14260, United States.
Rose P; Australian National University, Emeritus Faculty, Canberra, ACT 0200, Australia.
Source:
Forensic science international [Forensic Sci Int] 2026 Jan; Vol. 379, pp. 112768. Date of Electronic Publication: 2025 Dec 09.
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Elsevier Science Ireland
Country of Publication: Ireland
NLM ID: 7902034
Publication Model: Print-Electronic
Cited Medium: Internet
ISSN: 1872-6283 (Electronic)
Linking ISSN: 0379-0738
NLM ISO Abbreviation: Forensic Sci Int
Subsets: MEDLINE
Imprint Name(s):
Publication: Limerick : Elsevier Science Ireland
Original Publication: Lausanne : Elsevier Sequoia
Contributed Indexing:
Keywords: Deepfake audio detection; Deepfake speech; Forensic voice comparison; Likelihood ratio
Entry Date(s):
Date Created: 2025-12-12; Date Completed: 2026-01-08; Latest Revision: 2026-01-08
Update Code:
20260109
DOI:
10.1016/j.forsciint.2025.112768
PMID:
41385900
Database:
MEDLINE

Further Information

This study explores the potential of using acoustic features of segmental speech sounds to detect deepfake audio. These features are highly interpretable because of their close relationship with human articulatory processes, and they are expected to be more difficult for deepfake models to replicate. The results demonstrate that certain segmental features commonly used in forensic voice comparison (FVC) are effective in identifying deepfakes, whereas some global features provide little value. These findings underscore the need to approach audio deepfake detection with methods distinct from those employed in traditional FVC, and they offer a new perspective on leveraging segmental features for this purpose. In addition, the study proposes a speaker-specific framework for deepfake detection, which differs fundamentally from the speaker-independent systems that dominate current benchmarks. While speaker-independent frameworks aim for broad generalization, the speaker-specific approach offers advantages in forensic contexts where case-by-case interpretability and sensitivity to individual phonetic realization are essential.
(Copyright © 2025 Elsevier B.V. All rights reserved.)
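
To make the likelihood-ratio idea in the abstract concrete, the following is a minimal, purely illustrative Python sketch of a speaker-specific check over pre-extracted segmental features (e.g., vowel formant measurements): questioned material is scored against a Gaussian model of the named speaker's bona fide speech relative to a background model. All names, data, and modelling choices here are hypothetical and do not reproduce the paper's implementation.

# Illustrative sketch only: a speaker-specific, likelihood-ratio-based
# deepfake check over pre-extracted segmental features (e.g., vowel
# formants F1-F3 in Hz). Data and names are hypothetical stand-ins,
# not the paper's method.
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian(features: np.ndarray):
    """Fit a full-covariance Gaussian to an (n_samples, n_dims) array."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    # A small ridge keeps the covariance invertible with few samples.
    cov += 1e-6 * np.eye(cov.shape[0])
    return multivariate_normal(mean=mean, cov=cov)

def mean_log_lr(test, speaker_model, background_model):
    """Mean per-segment log likelihood ratio: bona fide speaker vs.
    background. Positive values favour genuine speech from this speaker."""
    return float(np.mean(speaker_model.logpdf(test)
                         - background_model.logpdf(test)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for segmental measurements (3 formant dimensions).
    speaker_bona_fide = rng.normal([500, 1500, 2500], 50, size=(200, 3))
    background_pool = rng.normal([550, 1600, 2600], 120, size=(2000, 3))
    questioned = rng.normal([560, 1650, 2650], 60, size=(40, 3))

    spk = fit_gaussian(speaker_bona_fide)
    bkg = fit_gaussian(background_pool)
    print(f"mean log LR: {mean_log_lr(questioned, spk, bkg):.2f}")

In this toy setup, a strongly negative mean log LR would point away from the modelled speaker's genuine speech and could flag the recording for closer forensic examination; in practice, the paper's segmental features, calibration, and decision policy would replace these stand-ins.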

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Specifically, the authors have no funding sources, consultancies, stock ownership, honoraria, paid expert testimony, patent applications or registrations, or other financial or non-financial interests that might be perceived as affecting the objectivity of the research.