Treffer: Forensic deepfake audio detection using segmental speech features.
Original Publication: Lausanne, Elsevier Sequoia.
Weitere Informationen
This study explores the potential of using acoustic features of segmental speech sounds to detect deepfake audio. These features are highly interpretable because of their close relationship with human articulatory processes and are expected to be more difficult for deepfake models to replicate. The results demonstrate that certain segmental features commonly used in forensic voice comparison (FVC) are effective in identifying deep-fakes, whereas some global features provide little value. These findings underscore the need to approach audio deepfake detection using methods that are distinct from those employed in traditional FVC, and offer a new perspective on leveraging segmental features for this purpose. In addition, the present study proposes a speaker-specific framework for deepfake detection, which differs fundamentally from the speaker-independent systems that dominate current benchmarks. While speaker-independent frameworks aim at broad generalization, the speaker-specific approach offers advantages in forensic contexts where case-by-case interpretability and sensitivity to individual phonetic realization are essential.
(Copyright © 2025 Elsevier B.V. All rights reserved.)
Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Specifically, the authors have no funding sources, consultancies, stock ownership, honoraria, paid expert testimony, patent applications or registrations, or other financial or non-financial interests that might be perceived as affecting the objectivity of the research.