Intelligent Malware Detection Using Static Analysis

Exploring how graph neural networks and file structure analysis can transform Windows malware detection.

Author(s): Arnav Shah; Aarav Gupta

Designation: Member, AIGT; President of the Club

Date: August 13, 2025

AI@GT Projects: More Than Just Hackathons

AI@GT isn’t just a club that hosts a hackathon. We also run and manage student-led projects that aim to make a real-world impact in computer science. These projects span a wide variety of AI-related topics: one team is building a RAG chatbot for GT students, while another is prototyping a novel computer vision algorithm. It’s easier than ever to work in AI, but that doesn’t mean it’s easy for our teams to get the resources they need.

The Need for Compute Power

As the project teams advanced, they began to need resources that a student organization couldn’t offer. Most of our teams needed more compute power and GPU access to train their models faster. In particular, our Intelligent Malware Detection team needed fast training on massive datasets. Given the size of the models required to learn from that data, training locally simply wasn’t an option. Not knowing what to do, we reached out to a few companies that could help our teams gain access to this hardware. When we came across Voltage Park and read about the services they provide, we knew we had found a potential solution.

Why Malware Detection Matters

Of all the projects at AI@GT, the Static Malware Analysis project has the most potential to make an immediate real-world impact. Not only is it competitive with current state-of-the-art malware detectors, it also showcases a different approach to malicious file detection, exploring the hypothesis that a file’s structure can be at least as informative as its high-level features. In addition, accelerating these models shows how the tool could be used in a real-time environment: the model could be deployed to scan files before they are downloaded, or to scan files in the time between an email being sent and received.

A New Approach: Graphs Over Metadata

To accurately and reliably detect Windows malware, the Intelligent Malware Detection project uses “control flow graphs” (CFGs) to represent the paths a file’s code may take when it runs. Previous research in the field primarily fit NLP or tree-based models to textual representations of the code (the executable itself, or metadata extracted from the file, such as size and noisiness) to determine whether the file is safe. The team wanted to explore a different direction and decided to research the viability of a graph-based representation of the executable’s internals. Could this approach better inform a safety prediction? That question is the reasoning behind the team’s use of Graph Neural Networks (GNNs).
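To make the idea concrete, here is a minimal sketch of what a control flow graph and a single round of GNN-style message passing might look like. It is not the team’s pipeline: the toy instruction list, the helper names (`build_cfg`, `block_features`, `message_passing_step`, `graph_embedding`), and the two per-block features are all assumptions made for illustration, and the averaging step has no learned weights, which a real GNN layer would have.

```python
from collections import defaultdict

# Hypothetical toy program: (address, mnemonic, jump target or None).
# A real pipeline would get this from a disassembler; this list is made up.
PROGRAM = [
    (0, "cmp", None),
    (1, "jz", 4),     # conditional jump: falls through to 2 or jumps to 4
    (2, "mov", None),
    (3, "jmp", 5),    # unconditional jump to 5
    (4, "call", None),
    (5, "ret", None),
]

JUMPS = ("jz", "jmp")
BLOCK_ENDERS = ("jz", "jmp", "ret")


def build_cfg(program):
    """Split instructions into basic blocks and connect them with edges."""
    # A block starts at the first instruction, at any jump target,
    # and right after any jump or return.
    leaders = {program[0][0]}
    for i, (_, op, target) in enumerate(program):
        if op in BLOCK_ENDERS:
            if target is not None:
                leaders.add(target)
            if i + 1 < len(program):
                leaders.add(program[i + 1][0])

    # Group instructions into blocks keyed by their leader address.
    blocks, current = {}, None
    for addr, op, target in program:
        if addr in leaders:
            current = addr
            blocks[current] = []
        blocks[current].append((addr, op, target))

    # Add jump edges and fall-through edges between blocks.
    edges = defaultdict(set)
    starts = sorted(blocks)
    for idx, start in enumerate(starts):
        _, op, target = blocks[start][-1]
        if op in JUMPS:
            edges[start].add(target)
        if op not in ("jmp", "ret") and idx + 1 < len(starts):
            edges[start].add(starts[idx + 1])
    return blocks, {src: sorted(dsts) for src, dsts in edges.items()}


def block_features(blocks):
    """Hypothetical per-block features: [instruction count, jump count]."""
    return {
        start: [float(len(instrs)),
                float(sum(op in JUMPS for _, op, _ in instrs))]
        for start, instrs in blocks.items()
    }


def message_passing_step(features, edges):
    """Each block averages its own features with those of its predecessors."""
    incoming = {node: [node] for node in features}  # self-loop
    for src, dsts in edges.items():
        for dst in dsts:
            incoming[dst].append(src)
    return {
        node: [sum(features[s][d] for s in sources) / len(sources)
               for d in range(len(features[node]))]
        for node, sources in incoming.items()
    }


def graph_embedding(features):
    """Mean-pool block features into one fixed-size vector for the file."""
    nodes = list(features)
    dim = len(features[nodes[0]])
    return [sum(features[n][d] for n in nodes) / len(nodes) for d in range(dim)]


blocks, edges = build_cfg(PROGRAM)
hidden = message_passing_step(block_features(blocks), edges)
embedding = graph_embedding(hidden)
print(edges)
print(embedding)
```

In a setup like this, the pooled vector would be passed to a classifier that outputs a malicious-or-benign score. The point of the sketch is only that the prediction is driven by how blocks connect to one another, not by file-level metadata.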

Blending Disciplines Across Computer Science

Though this may seem focused on ML/DL, the technology behind this project spans a breadth of CS disciplines. Working on it requires experience with compilers and other foundational topics to determine an efficient way to represent these files. In addition, various deep learning techniques are used to explore the performance of GNN variants and the best ways to reduce inference time without sacrificing detection performance. By using AI models to understand the execution paths a file can take when run, the project aims to quickly and accurately determine the nature of the file and its likelihood of being malicious. However, training these models presented a roadblock for the project team, as it required compute power and hardware beyond what a student organization could obtain.

Thankfully, we had the help of Voltage Park, a next-generation GPU cloud infrastructure built specifically for AI. Thanks to Voltage Park and their GPUs, the Malware Analysis team is able to run disassembly scripts and train their models. According to project lead Aarav Gupta, this project would not have been possible without their help. AI@GT’s collaboration with Voltage Park has allowed Aarav and his team to speed up model training by an order of magnitude and to benchmark these models.

Join AI @ Georgia Tech, where ideas meet innovation.