Title:
Detecting Potentially Harmful and Protective Suicide-Related Content on Twitter: Machine Learning Approach.
Authors:
Metzler H; Section for the Science of Complex Systems, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria; Unit Suicide Research and Mental Health Promotion, Center for Public Health, Medical University of Vienna, Vienna, Austria; Complexity Science Hub Vienna, Vienna, Austria; Computational Social Science Lab, Institute of Interactive Systems and Data Science, Graz University of Technology, Graz, Austria; Institute for Globally Distributed Open Research and Education, Vienna, Austria.
Baginski H; Complexity Science Hub Vienna, Vienna, Austria; Institute of Information Systems Engineering, Vienna University of Technology, Vienna, Austria.
Niederkrotenthaler T; Unit Suicide Research and Mental Health Promotion, Center for Public Health, Medical University of Vienna, Vienna, Austria.
Garcia D; Section for the Science of Complex Systems, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria; Complexity Science Hub Vienna, Vienna, Austria; Computational Social Science Lab, Institute of Interactive Systems and Data Science, Graz University of Technology, Graz, Austria.
Source:
Journal of medical Internet research [J Med Internet Res] 2022 Aug 17; Vol. 24 (8), pp. e34705. Date of Electronic Publication: 2022 Aug 17.
Publication Type:
Journal Article; Research Support, Non-U.S. Gov't
Language:
English
Journal Info:
Publisher: JMIR Publications; Country of Publication: Canada; NLM ID: 100959882; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1438-8871 (Electronic); Linking ISSN: 14388871; NLM ISO Abbreviation: J Med Internet Res; Subsets: MEDLINE
Imprint Name(s):
Publication: <2011- > : Toronto : JMIR Publications
Original Publication: [Pittsburgh, PA? : s.n., 1999-
Contributed Indexing:
Keywords: Twitter; deep learning; machine learning; social media; suicide prevention
Entry Date(s):
Date Created: 20220817 Date Completed: 20220819 Latest Revision: 20221207
Update Code:
20250114
PubMed Central ID:
PMC9434391
DOI:
10.2196/34705
PMID:
35976193
Database:
MEDLINE

Further Information

Background: Research has repeatedly shown that exposure to suicide-related news media content is associated with suicide rates, with some content characteristics likely having harmful and others potentially protective effects. Although good evidence exists for a few selected characteristics, systematic and large-scale investigations are lacking. Moreover, the growing importance of social media, particularly among young adults, calls for studies on the effects of the content posted on these platforms.
Objective: This study applies natural language processing and machine learning methods to classify large quantities of social media data according to characteristics identified as potentially harmful or beneficial in media effects research on suicide and prevention.
Methods: We manually labeled 3202 English tweets using a novel annotation scheme that classifies suicide-related tweets into 12 categories. Based on these categories, we trained a benchmark of machine learning models for a multiclass and a binary classification task. As models, we included a majority classifier, an approach based on word frequency (term frequency-inverse document frequency with a linear support vector machine) and 2 state-of-the-art deep learning models (Bidirectional Encoder Representations from Transformers [BERT] and XLNet). The first task classified posts into 6 main content categories, which are particularly relevant for suicide prevention based on previous evidence. These included personal stories of either suicidal ideation and attempts or coping and recovery, calls for action intending to spread either problem awareness or prevention-related information, reporting of suicide cases, and other tweets irrelevant to these 5 categories. The second classification task was binary and separated posts in the 11 categories referring to actual suicide from posts in the off-topic category, which use suicide-related terms in another meaning or context.
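The two non-neural baselines named in the Methods can be illustrated with a minimal sketch. It assumes a plain tf * log(N / df) weighting and whitespace tokenization, which may differ from the variant used in the study, and it stops at the feature step (the linear support vector machine is omitted). The example posts and the label names `about_suicide` and `off_topic` are invented for illustration and are not taken from the study's data.

```python
import math
from collections import Counter

# Invented toy posts; label names are hypothetical stand-ins for the binary task.
train = [
    ("sharing my story of recovery after a suicide attempt", "about_suicide"),
    ("call the lifeline if you need support", "about_suicide"),
    ("new report on suicide prevention published today", "about_suicide"),
    ("that trade was career suicide for the team", "off_topic"),
]


def majority_label(labels):
    """Return the most frequent label; the majority baseline predicts it for every post."""
    return Counter(labels).most_common(1)[0][0]


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: relative term frequency and an unsmoothed idf; real toolkits
    often add smoothing and normalization.
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many posts each term occurs at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


texts = [text for text, _ in train]
labels = [label for _, label in train]

baseline = majority_label(labels)  # label predicted for every post by the baseline
vectors = tfidf(texts)             # sparse TF-IDF features, one dict per post

print("majority baseline predicts:", baseline)
print("weights in the off-topic post:", vectors[3])
```

In this toy set the word "suicide" occurs in three of four posts and therefore receives a lower weight than a rarer word such as "career". Frequency-based features of this kind carry no information about the context in which a word is used.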
Results: In both tasks, the performance of the 2 deep learning models was very similar and better than that of the majority or the word frequency classifier. BERT and XLNet reached accuracy scores above 73% on average across the 6 main categories in the test set and F1-scores between 0.69 and 0.85 for all but the suicidal ideation and attempts category (F1=0.55). In the binary classification task, they correctly labeled around 88% of the tweets as about suicide versus off-topic, with BERT achieving F1-scores of 0.93 and 0.74, respectively. These classification performances were similar to human performance in most cases and were comparable with state-of-the-art models on similar tasks.
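The relationship between overall accuracy and per-class F1-scores reported above can be made concrete with a short sketch. The labels and predictions below are invented toy values, not the study's results, and the function names are hypothetical. The sketch uses the standard definition of F1 as the harmonic mean of precision and recall.

```python
def f1_for_label(y_true, y_pred, label):
    """F1-score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Share of posts whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented toy data: 6 posts about suicide, 2 off-topic posts.
y_true = ["about"] * 6 + ["off"] * 2
y_pred = ["about"] * 5 + ["off", "about", "off"]

acc = accuracy(y_true, y_pred)
f1_about = f1_for_label(y_true, y_pred, "about")
f1_off = f1_for_label(y_true, y_pred, "off")

print(f"accuracy={acc:.2f}  F1(about)={f1_about:.2f}  F1(off)={f1_off:.2f}")
```

With imbalanced classes, a single misclassification costs the smaller class proportionally more, so its F1-score can fall well below that of the larger class at the same overall accuracy. This is one plausible reading of why per-class scores differ in the binary task.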
Conclusions: The achieved performance scores highlight machine learning as a useful tool for media effects research on suicide. The clear advantage of BERT and XLNet suggests that there is crucial information about meaning in the context of words beyond mere word frequencies in tweets about suicide. By making data labeling more efficient, this work has enabled large-scale investigations on harmful and protective associations of social media content with suicide rates and help-seeking behavior.
(©Hannah Metzler, Hubert Baginski, Thomas Niederkrotenthaler, David Garcia. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 17.08.2022.)