Issue | Wuhan Univ. J. Nat. Sci., Volume 28, Number 3, June 2023
---|---
Page(s) | 237 - 245
DOI | https://doi.org/10.1051/wujns/2023283237
Published online | 13 July 2023
Computer Science
CLC number: TP311
Fine-Tuning Pre-Trained CodeBERT for Code Search in Smart Contract
Information Engineering College, Jiangxi University of Technology, Nanchang 330000, Jiangxi, China
Received: 23 September 2022
Smart contracts, which execute automatically on decentralized platforms such as Ethereum, require high security and low gas consumption. As a result, developers have a strong demand for semantic code search tools that use natural language queries to find existing code snippets efficiently. However, existing code search models face a semantic gap between code and queries, and closing this gap typically requires a large amount of training data. In this paper, we propose a fine-tuning approach to bridge the semantic gap in code search and improve search accuracy. We collect 80 723 distinct <comment, code snippet> pairs from Etherscan.io and use them to fine-tune, validate, and test the pre-trained CodeBERT model. Using the fine-tuned model, we develop a code search engine specifically for smart contracts. We evaluate the Recall@k and Mean Reciprocal Rank (MRR) of the fine-tuned CodeBERT model using different proportions of the fine-tuning data. Encouragingly, even a small amount of fine-tuning data produces satisfactory results. In addition, we compare the fine-tuned CodeBERT model with two state-of-the-art models. The experimental results show that the fine-tuned CodeBERT model achieves superior Recall@k and MRR. These findings highlight the effectiveness of our fine-tuning approach and its potential to significantly improve code search accuracy.
Key words: code search / smart contract / pre-trained code models / program analysis / machine learning
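The two metrics named in the abstract, Recall@k and MRR, can be illustrated with a minimal sketch. It assumes the common definitions: each query has exactly one correct snippet, Recall@k is the fraction of queries whose correct snippet appears in the top k results, and MRR is the mean of 1/rank over all queries. The function names and the example ranks below are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct snippet is ranked within the top k.

    `ranks` holds the 1-based rank of the correct snippet for each query.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Mean of 1/rank over all queries (1-based ranks)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct snippet for five queries (illustration only).
example_ranks = [1, 3, 2, 10, 1]

print("Recall@1:", recall_at_k(example_ranks, 1))
print("Recall@5:", recall_at_k(example_ranks, 5))
print("MRR:", mean_reciprocal_rank(example_ranks))
```

Under these assumptions, a model that ranks the correct snippet first for every query scores 1.0 on both metrics; MRR penalizes low ranks more gradually than Recall@k, which only checks whether the rank falls inside the cutoff.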
Biography: JIN Huan, female, Associate professor, research direction: service-oriented software engineering. E-mail: 281965782@qq.com
Foundation item: Supported by Jiangxi Higher Education and Teaching Reform Project (JXJG-20-24-2), Science and Technology Project of Jiangxi Education Department (GJJ212023), and Jiangxi University of Technology Education and Teaching Reform Project (JY2104)
© Wuhan University 2023
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.