Title: Malware identification on Portable Executable files using Opcodes Sequence
Authors: Alexandre R. de Mello, Vitor Gama Lemos, Flávio G. O. Barbosa, Emílio Simoni
Abstract: Malicious software (malware) is a relevant cybersecurity threat, as it can damage target systems, hijack data or credentials, and allow remote code execution. In recent years, researchers and companies have focused on developing different methods for malware detection to avoid system infection. This paper assesses a method that employs opcode sequence analysis, Graph Theory, and Machine Learning to identify malware in Portable Executable files without the need for execution. An approach used by many researchers is to find patterns in the opcode sequences of a file and use an Artificial Intelligence based strategy to classify the file as malware or benign. In this work, we introduce the OSG (Opcode Sequence Graph), a concept for malware detection based on Opcode Sequence, Graph Theory, and Artificial Intelligence, with two new methods: the OSGT (Opcode Sequence Graph Theory) detector and the OSGNN (Opcode Sequence Graph Neural Network). The OSGT extracts the opcode sequence linearly, creates a graph for each file section, calculates features from a combination of PageRank and node degree for each section, and uses ensemble learning to classify the files. The OSGNN logically extracts the opcode sequence to construct a control flow graph, uses the longest available path to create a graph, and applies a graph neural network to classify the files. We also propose a novel dataset of 28,000 Windows Portable Executable files, comprising 14,000 up-to-date malware samples and 14,000 trusted files. The experimental results show that both methods outperform the baseline methods and achieve up to 99% malware detection. The outcomes of this study show that the OSGT is suitable for real-world application considering its processing time and malware detection capacity, and that the OSGNN achieves state-of-the-art malware detection capacity at a higher computational cost.
Keywords: Malware detection, Opcode sequence, Graph theory, Opcode graph, Feature extraction.
Pages: 8
DOI: 10.21528/CBIC2023-006
Paper (PDF): CBIC_2023_paper006.pdf
BibTeX file: CBIC_2023_006.bib
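
The OSGT feature step described in the abstract (one opcode graph per file section, then PageRank and node degree) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function names (`build_opcode_graph`, `pagerank`, `node_degrees`, `section_features`), the edge-weighting scheme, the damping factor of 0.85, and the sample opcode list are all hypothetical, and the abstract does not say how the two measures are combined or which ensemble classifier is used, so both are left out.

```python
from collections import defaultdict


def build_opcode_graph(opcodes):
    """Directed graph: edge a -> b weighted by how often b directly follows a."""
    edges = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(opcodes, opcodes[1:]):
        edges[src][dst] += 1
    nodes = sorted(set(opcodes))
    return nodes, {n: dict(edges[n]) for n in nodes}


def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Plain power-iteration PageRank (damping value is an assumption)."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not edges[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            total = sum(edges[src].values())
            for dst, weight in edges[src].items():
                new[dst] += damping * rank[src] * weight / total
        rank = new
    return rank


def node_degrees(nodes, edges):
    """Unweighted in-degree and out-degree of every node."""
    out_deg = {v: len(edges[v]) for v in nodes}
    in_deg = {v: 0 for v in nodes}
    for src in nodes:
        for dst in edges[src]:
            in_deg[dst] += 1
    return in_deg, out_deg


def section_features(opcodes):
    """Map each opcode of one section to (pagerank, in_degree, out_degree)."""
    nodes, edges = build_opcode_graph(opcodes)
    if not nodes:
        return {}
    rank = pagerank(nodes, edges)
    in_deg, out_deg = node_degrees(nodes, edges)
    return {v: (rank[v], in_deg[v], out_deg[v]) for v in nodes}


# Hypothetical opcode sequence for a single section of a PE file.
sample = ["push", "mov", "call", "mov", "pop", "ret"]
features = section_features(sample)
```

In a real pipeline the opcode list would come from disassembling each section of the file, and the per-opcode values would be arranged into a fixed-length vector before being passed to a classifier.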