Result: THE USE OF WEAK ESTIMATORS TO ACHIEVE LANGUAGE DETECTION AND TRACKING IN MULTILINGUAL DOCUMENTS.

Title:
THE USE OF WEAK ESTIMATORS TO ACHIEVE LANGUAGE DETECTION AND TRACKING IN MULTILINGUAL DOCUMENTS.
Source:
International Journal of Pattern Recognition & Artificial Intelligence; Jun2013, Vol. 27 Issue 4, p1-33, 33p, 12 Charts, 5 Graphs
Database:
Complementary Index

This paper deals with the problems of language detection and tracking in short, multilingual, online word-of-mouth (WoM) discussions. The problem is unusual and difficult from a pattern recognition perspective because the participants in these discussions, and the opinions they express, come from all over the world. Because the discussions span multiple topics in different languages, training and classification strategies must be found for the case in which the class-conditional distributions are nonstationary. The difficulties are manifold. First, the analyst does not know when one language stops and the next starts. Further, the features used for any one language (for example, the n-grams) will not be valid for recognizing another. Finally, and most importantly, in most real-life applications, such as WoM, the fragments of text available before a switch are so small that meaningful classification using traditional estimation methods is almost futile. Earlier, the authors [B. J. Oommen and L. Rueda, Patt. Recogn. 39(1) (2006) 328-341] had argued that, for a variety of problems, the use of strong estimators (i.e., estimators that converge with probability 1) is sub-optimal. In this vein, we propose to solve the current problem using novel estimators that are pertinent for nonstationary environments. The classification results obtained for various data sets, which involve as many as eight languages, demonstrate that the proposed methodology is both powerful and efficient. [ABSTRACT FROM AUTHOR]
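
The abstract does not spell out the estimators it proposes, so the following is only an illustrative sketch of the general idea of a "weak" estimator, not the authors' method. It assumes a simple multiplicative update with a forgetting factor: every probability is shrunk by `lam`, and the freed mass goes to the symbol just observed. The names `weak_update`, `track`, `ml_estimate`, and the value `lam=0.9` are hypothetical choices made for this example. A counting (maximum-likelihood) estimate, which converges when the source is stationary, is included for contrast.

```python
from collections import Counter


def weak_update(probs, observed, lam=0.9):
    """One step of an exponentially forgetting multinomial estimate.

    Assumption for illustration: each probability is multiplied by `lam`
    and the remaining mass (1 - lam) is added to the observed symbol.
    The values keep summing to 1 but never settle permanently.
    """
    return {
        s: lam * p + ((1.0 - lam) if s == observed else 0.0)
        for s, p in probs.items()
    }


def track(stream, symbols, lam=0.9):
    """Return the estimate after each observation, starting from uniform."""
    probs = {s: 1.0 / len(symbols) for s in symbols}
    history = []
    for x in stream:
        probs = weak_update(probs, x, lam)
        history.append(dict(probs))
    return history


def ml_estimate(stream, symbols):
    """Counting estimate over the whole stream (converges if stationary)."""
    counts = Counter(stream)
    total = len(stream)
    return {s: counts[s] / total for s in symbols}


if __name__ == "__main__":
    # Toy nonstationary source: the dominant symbol switches halfway through.
    stream = ["a"] * 50 + ["b"] * 50
    history = track(stream, ["a", "b"], lam=0.9)
    print("weak estimate before the switch:", history[49])
    print("weak estimate at the end:       ", history[-1])
    print("counting estimate at the end:   ", ml_estimate(stream, ["a", "b"]))
```

In this toy setting the forgetting estimate follows the switch within a few dozen observations, while the counting estimate averages over both segments. A smaller `lam` adapts faster but gives noisier estimates, which is the trade-off that matters when the text fragments before a language switch are short.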

Copyright of International Journal of Pattern Recognition & Artificial Intelligence is the property of World Scientific Publishing Company and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)