Research
MASCOT-Android: A Curated Dataset and Automated Collection Pipeline for Android Malware Source Code Specimens
The article presents MASCOT-Android, a curated dataset of Android malware source code, together with an automated collection pipeline that uses repository-level documentation (README files) to discover malware source code on GitHub at scale. The system extracts character-level TF-IDF features from 8,772 malware and 25,747 benign README documents and trains a LinearSVC classifier on them, reaching 96.28% accuracy with a 1.06% false positive rate. For practitioners, this makes identifying malware source code more efficient, which supports better allocation of analysis resources and stronger security measures in Android development.
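The classification step described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The n-gram range (2 to 4 characters), the regularisation strength `lam`, the epoch count, the toy README snippets and every helper name (`char_ngrams`, `fit_idf`, `tfidf`, `train_linear_svm`, `predict`, `accuracy_and_fpr`, `DOCS`, `LABELS`) are hypothetical. A linear SVM trained with the Pegasos subgradient method (hinge loss, no intercept) stands in for LinearSVC. The sketch also shows how the two reported metrics are defined: accuracy over all READMEs, and the false positive rate as the share of benign READMEs wrongly flagged as malware, FP / (FP + TN).

```python
import math
from collections import Counter

# Hypothetical toy corpus: +1 = malware README, -1 = benign README.
DOCS = [
    "Android banking trojan that steals SMS codes and overlays fake login screens",
    "A simple to-do list app for Android built with Kotlin and Jetpack Compose",
    "Remote access trojan for Android: records audio, exfiltrates contacts to a C2 server",
    "Weather widget for Android showing hourly forecasts from a public API",
    "Ransomware sample for Android that encrypts SD card files and demands payment",
    "Open-source Android music player with playlists and equalizer support",
]
LABELS = [1, -1, 1, -1, 1, -1]


def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of length n_min..n_max (the range is an assumption)."""
    text = text.lower()
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def fit_idf(docs):
    """Smoothed IDF per n-gram: ln((1 + N) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc)))
    n = len(docs)
    return {g: math.log((1 + n) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; n-grams unseen during fitting are dropped."""
    vec = {g: tf * idf[g] for g, tf in char_ngrams(doc).items() if g in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def dot(w, x):
    """Dot product of a sparse weight dict with a sparse feature dict."""
    return sum(w.get(g, 0.0) * v for g, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos subgradient descent on the L2-regularised hinge loss.

    Labels are +1 (malware) / -1 (benign). Samples are visited in a fixed
    cyclic order, the final weight vector is returned, and there is no intercept.
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * dot(w, x)  # computed before this step's update
            for g in w:  # regularisation: shrink w by (1 - eta * lam)
                w[g] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss is active: step towards this sample
                for g, v in x.items():
                    w[g] = w.get(g, 0.0) + eta * label * v
    return w


def predict(w, x):
    """Return +1 (malware) or -1 (benign) from the sign of the decision value."""
    return 1 if dot(w, x) >= 0.0 else -1


def accuracy_and_fpr(y_true, y_pred):
    """Accuracy over all samples; FPR = FP / (FP + TN) over benign samples."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    fp = sum(t == -1 and p == 1 for t, p in pairs)
    tn = sum(t == -1 and p == -1 for t, p in pairs)
    return correct / len(pairs), (fp / (fp + tn) if fp + tn else 0.0)


if __name__ == "__main__":
    idf = fit_idf(DOCS)
    X = [tfidf(d, idf) for d in DOCS]
    w = train_linear_svm(X, LABELS)
    # Evaluated on the training data only, purely to exercise the code.
    preds = [predict(w, x) for x in X]
    acc, fpr = accuracy_and_fpr(LABELS, preds)
    print(f"training accuracy={acc:.2%}, FPR={fpr:.2%}")
```

A real evaluation would score a held-out split rather than the training READMEs. In practice, scikit-learn's `TfidfVectorizer(analyzer="char", ngram_range=...)` followed by `LinearSVC` covers the same two steps with an optimised solver; the hand-rolled version only makes the mechanics explicit.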
malware, dataset, android