[Figure 1 diagram omitted; recoverable block labels: Feature set, Malware detection, Results]
Figure 1. The architecture of the graph-based virus detection model.
A. Disassembly and CFG Construction
For further analysis, the binaries are transformed into an
intermediate representation. This transformation is performed
by reverse engineering the binaries. The
intermediate representation takes the form of
assembly language code produced by a disassembler. In our
model, the disassembly was obtained using DataRescue's
IDA Pro [8]. Assembly code carries more syntactic and semantic
information than raw binary code, so it can
better reflect the structure of executables.
The function is the main unit of the assembly code. Functions
are abstracted as nodes and labeled with serial
numbers (node identifiers) according to their
location in virtual memory. We employ a program written in IDC
(IDA Pro's built-in scripting language) to create the CFG
automatically from the disassembly, and the results are saved as
graph files. Some code does not belong to any of the
aforementioned functions; it is not considered in our
model.
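The node-labeling step described above can be sketched in Python. This is a minimal illustration, not the IDC script used in the model: the input format (a list of function start addresses plus caller/callee address pairs) and the name `build_function_graph` are assumptions made for the example.

```python
def build_function_graph(functions, calls):
    """Build a function-level graph.

    functions: iterable of function start addresses (hypothetical input).
    calls: iterable of (caller_address, callee_address) pairs (hypothetical input).
    Returns (serial, edges): serial maps address -> serial number,
    edges maps serial number -> set of successor serial numbers.
    """
    # Serial numbers follow the order of the functions in virtual memory.
    serial = {addr: i for i, addr in enumerate(sorted(set(functions)))}
    edges = {i: set() for i in serial.values()}
    for src, dst in calls:
        # Code that lies outside every known function is ignored.
        if src in serial and dst in serial:
            edges[serial[src]].add(serial[dst])
    return serial, edges


functions = [0x401200, 0x401000, 0x401080]
calls = [(0x401000, 0x401080), (0x401080, 0x401200), (0x500000, 0x401000)]
serial, edges = build_function_graph(functions, calls)
```

In this example the pair starting at 0x500000 is dropped because that address is not the start of a listed function, mirroring the rule that code outside the functions is not considered.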
B. Feature Extraction
To mine information from the CFG, features
must first be defined; they can then be extracted by
a fast algorithm and used in data mining. The
features defined in our model are divided into three main
categories, according to whether they describe nodes, edges, or
subgraphs. Table 1 shows the mean values of some of the extracted
features in our experimental dataset.
A node represents a function in the assembly code, so it
must record the type of the function and its relations with
other functions. Based on node degrees, two kinds of special nodes
are defined: isolated nodes, which have neither in-degree nor
out-degree, and terminal nodes, which have in-degree but no
out-degree.
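The two degree-based definitions can be made concrete with a short Python sketch. The edge-list representation and the name `classify_nodes` are assumptions chosen for illustration; the source does not specify how the graph files are encoded.

```python
def classify_nodes(nodes, edges):
    """Return (isolated, terminal) node lists.

    nodes: iterable of node identifiers.
    edges: iterable of directed (src, dst) pairs between those nodes.
    """
    nodes = list(nodes)
    in_deg = {n: 0 for n in nodes}
    out_deg = {n: 0 for n in nodes}
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    # Isolated: no incoming and no outgoing edges.
    isolated = sorted(n for n in nodes if in_deg[n] == 0 and out_deg[n] == 0)
    # Terminal: at least one incoming edge, no outgoing edges.
    terminal = sorted(n for n in nodes if in_deg[n] > 0 and out_deg[n] == 0)
    return isolated, terminal


isolated, terminal = classify_nodes(range(5), [(0, 1), (0, 2), (1, 2), (2, 3)])
```

Here node 4 has no edges at all and is isolated, while node 3 is reached from node 2 but has no successors and is terminal.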
It is interesting to note in Table 1 that entry nodes always
lie in the centre of benign executables, whereas the entry nodes of
viruses either cannot be disassembled or lie close to the front of
the program.
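One way to turn the position of the entry node into a numeric feature is sketched below. The source does not state how position is measured; normalizing the entry node's serial number by the number of nodes is an assumption, as is the name `entry_position`.

```python
def entry_position(entry_serial, node_count):
    """Relative position of the entry node in [0, 1].

    0.0 means the first function in memory order, 1.0 the last.
    Returns None when the entry point could not be disassembled
    (represented here, by assumption, as entry_serial=None).
    """
    if entry_serial is None:
        return None
    if node_count < 2:
        return 0.0
    return entry_serial / (node_count - 1)
```

Under this assumed measure, a value near 0.5 would correspond to an entry node in the centre of the executable and a value near 0 to one close to the front of the program.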
Statistical values of edges are important information that
reflects the complexity of the relations between nodes. Earlier
research usually focuses on the code of functions but ignores
the relations between them. Our research shows that these
relations are important security characteristics of