Issue | Wuhan Univ. J. Nat. Sci., Volume 30, Number 3, June 2025
---|---
Page(s) | 222-230
DOI | https://doi.org/10.1051/wujns/2025303222
Published online | 16 July 2025
Computer Science
CLC number: TP311.5
AI Chain-Driven Control Flow Graph Generation for Multiple Programming Language
1 State International Science & Technology Cooperation Base of Networked Supporting Software, Jiangxi Normal University, Nanchang 330022, Jiangxi, China
2 Jiangxi Provincial Key Laboratory for High Performance Computing, Jiangxi Normal University, Nanchang 330022, Jiangxi, China
3 Language Intelligence Research Center, Jiangxi Normal University, Nanchang 330022, Jiangxi, China
† Corresponding author. E-mail: zuo803@jxnu.edu.cn
Received: 25 September 2024
Control Flow Graphs (CFGs) are essential for understanding execution and data flow within software and serve as foundational structures in program analysis. Traditional CFG construction methods, such as bytecode analysis and Abstract Syntax Tree (AST) traversal, struggle with the complex syntax of languages like Java and Python, which limits how well they generalize and raises their learning cost. This paper introduces a novel approach that leverages Large Language Models (LLMs) to generate CFGs through a methodical Chain-of-Thought (CoT) process. Using CoT, the approach reasons about code semantics directly in natural language, which improves adaptability across programming languages and simplifies CFG construction. A modular AI chain strategy that follows the single responsibility principle breaks CFG generation into distinct, manageable steps, each handled by a separate AI or non-AI unit; this decomposition can significantly improve the precision and coverage of the generated CFG nodes and edges. Experiments on 245 Java and 281 Python code snippets from Stack Overflow show that the method performs well on both languages and is highly robust.
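The decomposition described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the unit names (`node_unit`, `edge_unit`, `assemble_unit`), the prompts, the `i -> j` edge format, and the `fake_llm` stub standing in for a real model are all hypothetical. It shows the single-responsibility idea: each AI unit asks the model for one thing, and a deterministic non-AI unit validates and assembles the result.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CFG:
    """Minimal CFG: node labels plus directed edges between node indices."""
    nodes: list[str] = field(default_factory=list)
    edges: set[tuple[int, int]] = field(default_factory=set)


# Hypothetical LLM interface: (task name, prompt) -> reply text.
LLM = Callable[[str, str], str]


def node_unit(llm: LLM, code: str) -> list[str]:
    """AI unit (hypothetical): list the statements that become CFG nodes."""
    reply = llm("nodes", f"List each statement of this code on its own line:\n{code}")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def edge_unit(llm: LLM, code: str, nodes: list[str]) -> str:
    """AI unit (hypothetical): describe control-flow successors as 'i -> j' lines."""
    numbered = "\n".join(f"{i}: {n}" for i, n in enumerate(nodes))
    return llm("edges", f"Code:\n{code}\nNodes:\n{numbered}\nGive each edge as 'i -> j'.")


EDGE_RE = re.compile(r"^\s*(\d+)\s*->\s*(\d+)\s*$")


def assemble_unit(nodes: list[str], edge_text: str) -> CFG:
    """Non-AI unit: parse edge text deterministically and drop anything malformed."""
    cfg = CFG(nodes=list(nodes))
    for line in edge_text.splitlines():
        m = EDGE_RE.match(line)
        if not m:
            continue  # ignore free-form commentary the model may add
        src, dst = int(m.group(1)), int(m.group(2))
        if 0 <= src < len(nodes) and 0 <= dst < len(nodes):
            cfg.edges.add((src, dst))  # keep only edges between known nodes
    return cfg


def generate_cfg(llm: LLM, code: str) -> CFG:
    """Chain the units in order: AI (nodes) -> AI (edges) -> non-AI (assembly)."""
    nodes = node_unit(llm, code)
    edge_text = edge_unit(llm, code, nodes)
    return assemble_unit(nodes, edge_text)


# Canned replies stand in for a real model so the sketch runs offline.
CANNED = {
    "nodes": 'x = int(input())\nif x > 0\nprint("pos")\nprint("non-pos")\nprint("done")',
    "edges": "0 -> 1\n1 -> 2\n1 -> 3\n2 -> 4\n3 -> 4\n4 -> 9\nNote: done.",
}


def fake_llm(task: str, prompt: str) -> str:
    return CANNED[task]


SNIPPET = 'x = int(input())\nif x > 0:\n    print("pos")\nelse:\n    print("non-pos")\nprint("done")'
cfg = generate_cfg(fake_llm, SNIPPET)
print(len(cfg.nodes), sorted(cfg.edges))  # 5 [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)]
```

The out-of-range edge `4 -> 9` and the stray `Note: done.` line are discarded by the non-AI unit, which is the kind of check that is cheaper and more reliable to do in ordinary code than to ask the model for.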
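Abstract
Control flow graphs (CFGs) are a basic structure in program analysis and the foundation for understanding execution and data flow within software. Because of the complex syntax of programming languages such as Java and Python, traditional CFG construction methods, such as those based on bytecode analysis or on abstract syntax trees (ASTs), often suffer from low generalization ability and high learning cost. To address these problems, this paper proposes a CFG generation method based on large language models (LLMs) and a systematic chain of thought (CoT). The method interprets code semantics directly in natural language and, through a modular AI chain strategy that follows the single responsibility principle, decomposes CFG generation into multiple independent, manageable steps, each handled by a separate AI or non-AI unit. This simplifies CFG construction and improves the generality of the method. Experiments on 245 Java and 281 Python code snippets crawled from Stack Overflow show that the method performs efficiently across programming languages and is robust.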
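The abstract credits the AI chain with better precision and coverage of CFG nodes and edges, but this excerpt does not define those measures. The sketch below assumes a common set-based reading: precision is the fraction of generated items that appear in a reference CFG, and coverage is the fraction of reference items that were generated. The helper names (`normalize`, `precision_coverage`, `score`) and the whitespace-insensitive label matching are illustrative assumptions, not the paper's evaluation protocol.

```python
def normalize(label: str) -> str:
    """Collapse whitespace so cosmetic differences are not mismatches (assumed rule)."""
    return " ".join(label.split())


def precision_coverage(pred: set, ref: set) -> tuple[float, float]:
    """Assumed definitions: precision = correct / generated, coverage = correct / reference."""
    hit = len(pred & ref)
    precision = hit / len(pred) if pred else 0.0
    coverage = hit / len(ref) if ref else 0.0
    return precision, coverage


def score(pred_nodes: list[str], pred_edges: set[tuple[int, int]],
          ref_nodes: list[str], ref_edges: set[tuple[int, int]]) -> dict[str, tuple[float, float]]:
    """Score a generated CFG against a reference CFG at node and edge level."""
    pn = {normalize(n) for n in pred_nodes}
    rn = {normalize(n) for n in ref_nodes}
    # Edges are compared by endpoint labels, so the two CFGs need not number nodes alike.
    pe = {(normalize(pred_nodes[s]), normalize(pred_nodes[d])) for s, d in pred_edges}
    rf = {(normalize(ref_nodes[s]), normalize(ref_nodes[d])) for s, d in ref_edges}
    return {"node": precision_coverage(pn, rn), "edge": precision_coverage(pe, rf)}


ref_nodes = ["a = 1", "if a > 0", "b = 2", "return b"]
ref_edges = {(0, 1), (1, 2), (1, 3), (2, 3)}
pred_nodes = ["a = 1", "if  a > 0", "return b"]  # misses "b = 2"; extra space is tolerated
pred_edges = {(0, 1), (1, 2), (0, 2)}            # (0, 2) is a spurious edge a = 1 -> return b
result = score(pred_nodes, pred_edges, ref_nodes, ref_edges)
print(result)  # {'node': (1.0, 0.75), 'edge': (0.6666666666666666, 0.5)}
```

Because labels are collected into sets, repeated identical statements (for example two `i += 1` lines) collapse into one node here; a fuller evaluation would need an explicit alignment between generated and reference nodes.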
Key words: Control Flow Graph / Large Language Model / Chain of Thought / AI chain
Cite this article: ZOU Zhou, ZUO Zhengkang, HUANG Qing. AI Chain-Driven Control Flow Graph Generation for Multiple Programming Language[J]. Wuhan Univ J of Nat Sci, 2025, 30(3): 222-230.
Biography: ZOU Zhou, male, Master's candidate, research interest: software engineering. E-mail: zzouzhou@jxnu.edu.cn
Foundation item: Supported by the National Natural Science Foundation of China (62462036, 62262031), Jiangxi Provincial Natural Science Foundation (20242BAB26017, 20232BAB202010), Distinguished Youth Fund Project of the Natural Science Foundation of Jiangxi Province (20242BAB23011), and the Jiangxi Province Graduate Innovation Fund Project (YJS2023032).
© Wuhan University 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.