Angad is a framework to automate classification of an unlabelled malware dataset using multi-dimensional modelling. The input dataset is analyzed to collect various attributes which are then arranged in a number of feature vectors. These vectors are then individually visualized, indexed and then queried for each new input file. Matching vectors are labelled as per their AV detection categories for now but this could be changed to a heuristics approach if needed. If dynamic behavior or network traffic details are available, vectors are also converted into activity graphs that depict evolution of activity with a predefined time scale. This results into an animation of malware/malware category’s behavior traits and is also useful in identifying activity overlaps across the input dataset.
Malware detection is a challenging task as the landscape is ever-evolving. Every other day, a new variant or a known malware family is reported and signature driven tools race against time to add detection. The process worsens when the rate of incoming samples is in thousands on a daily basis, making static/dynamic analysis alone of no use.
Angad tries to address this issue by leveraging well-known data classification techniques to the malware domain. It tries to provide a known interface to the multi-dimensional modelling approach within a standalone package.