Result: Reassessing feature-based Android malware detection in a contemporary context.

Title:
Reassessing feature-based Android malware detection in a contemporary context.
Authors:
Muzaffar A; Heriot-Watt University, Dubai, United Arab Emirates., Ragab Hassen H; Heriot-Watt University, Dubai, United Arab Emirates., Zantout H; Heriot-Watt University, Dubai, United Arab Emirates., Lones MA; Heriot-Watt University, Edinburgh, United Kingdom.
Source:
PloS one [PLoS One] 2026 Jan 20; Vol. 21 (1), pp. e0341013. Date of Electronic Publication: 2026 Jan 20 (Print Publication: 2026).
Publication Type:
Journal Article
Language:
English
Journal Info:
Publisher: Public Library of Science Country of Publication: United States NLM ID: 101285081 Publication Model: eCollection Cited Medium: Internet ISSN: 1932-6203 (Electronic) Linking ISSN: 19326203 NLM ISO Abbreviation: PLoS One Subsets: MEDLINE
Imprint Name(s):
Original Publication: San Francisco, CA : Public Library of Science
Entry Date(s):
Date Created: 20260120 Date Completed: 20260120 Latest Revision: 20260123
Update Code:
20260123
PubMed Central ID:
PMC12818673
DOI:
10.1371/journal.pone.0341013
PMID:
41557740
Database:
MEDLINE

Further information

We report the findings of a reimplementation of 18 foundational studies in feature-based machine learning for Android malware detection, published during the period 2013-2023. These studies are reevaluated on a level playing field using a contemporary Android environment and a balanced dataset of 124,000 applications. Our findings show that feature-based approaches can still achieve detection accuracies beyond 98%, despite a considerable increase in the size of the underlying Android feature sets. We observe that features derived through dynamic analysis yield only a small benefit over those derived from static analysis, and that simpler models often outperform more complex models. We also find that API calls and opcodes are the most productive static features within our evaluation context, that network traffic is the most predictive dynamic feature, and that ensemble models provide an efficient means of combining models trained on static and dynamic features. Together, these findings suggest that simple, fast machine learning approaches can still be an effective basis for malware detection, despite the increasing focus on slower, more expensive machine learning models in the literature.
(Copyright: © 2026 Muzaffar et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)

The authors have declared that no competing interests exist.