CATI++: empirical study and evaluation for adjacent instruction enhanced type inference.
Variable-type information is fundamental and greatly aids understanding of program semantics. Previous work applies rule-based and machine-learning-based methods to recover variable types from commercial off-the-shelf (COTS) binaries, relying heavily on data flow or control flow. However, our study finds that about half of the variables have limited or even no data flow, a problem that previous work has largely overlooked. We empirically examine how severely this problem affects the type inference task and analyze its root causes. Based on compilation properties, we find that the instructions adjacent to those that operate on a variable provide useful contextual information, which can be co-encoded to overcome this problem. In this paper, we present an effective machine-learning-based method that infers variable types and overcomes the challenge of limited data dependency through adjacent-instruction co-encoding. We implement this method in a system called CATI++, which locates variables in stripped binaries and infers 19 variable types. We evaluate CATI++ under different compilation options, and it outperforms state-of-the-art methods under all of them. Ablation experiments verify that our scheme is insensitive to compilation conditions and that our method effectively alleviates the problems caused by missing data dependencies. [ABSTRACT FROM AUTHOR]
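To make the idea of adjacent-instruction context concrete, the following Python sketch collects a fixed-size window of neighbouring instructions around each instruction that accesses a variable. It is an illustrative assumption, not CATI++'s actual implementation: the abstract does not specify the window size, the instruction representation, or how windows are encoded. The names `context_window`, `variable_contexts`, `PAD`, and the half-width parameter `k` are all hypothetical.

```python
from typing import List

# Hypothetical padding token used when a window runs past the start or end of a function.
PAD = "<PAD>"


def context_window(instructions: List[str], target: int, k: int = 2) -> List[str]:
    """Return the target instruction plus up to k neighbours on each side.

    Positions outside the instruction list are filled with PAD, so every
    window has length 2 * k + 1 (an assumed design choice for fixed-size input).
    """
    window = []
    for i in range(target - k, target + k + 1):
        if 0 <= i < len(instructions):
            window.append(instructions[i])
        else:
            window.append(PAD)
    return window


def variable_contexts(instructions: List[str], var_sites: List[int], k: int = 2) -> List[List[str]]:
    """Collect one context window per instruction that accesses the variable."""
    return [context_window(instructions, site, k) for site in var_sites]


# Usage: a toy x86-64 function in which the stack slot [rbp-4] plays the role of a variable.
func = [
    "push rbp",
    "mov rbp, rsp",
    "mov dword ptr [rbp-4], 0",
    "add dword ptr [rbp-4], 1",
    "mov eax, dword ptr [rbp-4]",
    "pop rbp",
    "ret",
]
sites = [2, 3, 4]  # indices of the instructions that touch [rbp-4]
contexts = variable_contexts(func, sites, k=1)
for ctx in contexts:
    print(ctx)
```

Even when a variable has little data flow (here it is only initialized, incremented, and read once), each window still carries surrounding instructions such as the prologue or the `eax` load that hint at its size and role. In a learning-based setting, windows like these would then be encoded and passed to a type classifier; that encoding step is not shown.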