Cepstral‐Basis‐Decomposed Nonnegative Matrix Factorization for Speech Signal Modeling.
This study presents an enhanced nonnegative matrix factorization (NMF) algorithm designed for speech signal modeling. NMF has proven effective in various applications to musical instrument signals, including audio source separation and music transcription. Applied to speech signals, however, it often performs worse because the spectral continuity of speech leads to inadequate modeling. We therefore introduce cepstral‐basis‐decomposed NMF (CBD‐NMF), which incorporates cepstrum analysis to improve the modeling of speech signals. Because of the flooring process, CBD‐NMF is not guaranteed to converge; however, our experiments identified parameters that allow stable optimization, in which the cost function does not increase. By experimentally modeling Japanese vowel speech signals, we demonstrate that CBD‐NMF yields a better representation, in which one basis arises for each Japanese mora. Additionally, when modeling a Japanese word in a speech signal, CBD‐NMF tends to induce a sparse representation equivalent to that of a sparse NMF with an extremely large weight coefficient. The proposed framework can be applied to practical tasks such as audio source separation and is expected to improve performance when targeting speech signals. © 2025 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC. [ABSTRACT FROM AUTHOR]
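
For orientation, the following is a minimal sketch of the standard NMF baseline that the abstract takes as its starting point. It is not CBD‐NMF: the cepstral decomposition of the bases and the flooring step are not specified in the abstract and are not reproduced here. The squared Euclidean cost, the multiplicative update rules, and the names `nmf_multiplicative`, `n_bases`, and `eps` are assumptions chosen for illustration; the abstract does not state which cost function the authors use.

```python
import numpy as np


def nmf_multiplicative(V, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative matrix V (freq x frames) as V ~ W @ H.

    Baseline NMF with multiplicative updates for the squared Euclidean
    cost. Illustrative only: this is NOT the CBD-NMF algorithm.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape

    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_freq, n_bases)) + eps   # spectral bases
    H = rng.random((n_bases, n_frames)) + eps  # time-varying activations

    costs = []
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H
        # stay nonnegative without any explicit projection.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        costs.append(0.5 * np.sum((V - W @ H) ** 2))
    return W, H, costs
```

For this plain baseline, each update step is known not to increase the cost, which makes the recorded `costs` list a convenient check. According to the abstract, that guarantee does not carry over to CBD‐NMF in general because of the flooring process, so monitoring the cost per iteration in the same way would be one plausible way to verify that a given parameter setting optimizes stably.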