Open Access
Issue
Wuhan Univ. J. Nat. Sci.
Volume 28, Number 6, December 2023
Page(s) 493 - 507
DOI https://doi.org/10.1051/wujns/2023286493
Published online 15 January 2024

© Wuhan University 2023

Licence Creative CommonsThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

0 Introduction

It is widely recognized that software bugs are inevitable and pervasive. Bug fixing constitutes a significant aspect of computer programming and has become a routine activity for software developers. However, manual bug fixing is a laborious, error-prone, and costly task, accounting for up to 90% of the total cost of a software project [1]. Consequently, to assist developers in debugging programs more effectively and efficiently, Automatic Program Repair (APR) techniques have garnered considerable attention from industry practitioners and academic researchers, making noteworthy contributions to the field.

For example, the generate-and-validate and program synthesis-based approaches are two well-known design philosophies followed by many APR techniques (e.g., Refs. [2-5]). The basic idea behind the first approach is to generate a candidate patch space using predefined strategies and then validate all repair candidates(e.g., Refs. [2-4]), aiming to find one that can make the buggy programs behave as expected. On the other hand, the latter approach utilizes a program synthesis engine to synthesize patches that adhere to the given specification(e.g., Ref. [5]). Despite achieving promising results, existing APR techniques suffer from limitations such as the search space explosion problem and the generation of low-quality patches[6], which hinders their practical application.

To reduce the search space and enhance the quality of generated patches, some APR techniques utilize fix templates to generate patches[7-9]. Fix templates are common repair patterns, such as modifying a branch condition or wrapping code with an "if" statement, and can be either manually crafted or automatically inferred from human-written patches collected from open source repositories [8,9]. Some researchers also explored the mining and extraction of repair models from software repositories [10-13]. For example, in Ref. [10], the extracted repair models consist of probabilistic repair actions, which are modifications or mutators related to bug fixes. A subset of these probabilistic repair actions includes the probabilities of inserting assignments (4.1%) and "if" statements (6.6%). Well-designed fix templates empower APR techniques, such as Genesis [9], to generate repairs tailored to specific types of bugs, such as null pointer and class cast (CC) defects. Additionally, APR techniques can efficiently traverse the search space using repair models to prioritize potentially correct patches among all generated candidate patches[4].

While the literature has demonstrated the effectiveness of applying repair patterns in APR, there is limited work on automatically annotating a bug with the most relevant repair patterns. In this paper, we aim to investigate the feasibility of training a machine learning model to identify a bug's repair pattern automatically. This is achieved through the training of a single model, PatternNet, for bug repair pattern prediction.

Training PatternNet, however, is far from trivial. First, apart from the syntax and semantic information in buggy code, bug reports also contain valuable data that can aid in predicting a bug's repair patterns. For illustration, consider a real BIRT bug #293006, as listed in Listing 1 (Fig. 1), taken from the BIRT project. Evidently, the code will crash if the argument "parent" is NULL. Repairing the bug involves applying a single pattern, i.e., adding a null check to the variable "parent". Simultaneously, its bug report recorded in a bug tracking system is presented in Listing 2 (Fig. 1). The bug report's summary and textual description provide information regarding the code crash. Specifically, the exception occurs when the variable ̔parent' is NULL. This example demonstrates that, in addition to the faulty source code, bug report texts often offer valuable insights, which can assist in predicting repair patterns. Therefore, to achieve superior performance, PatternNet should possess the capability to process multimodal information. We further contend that such multimodal information could prove advantageous in the bug repair pattern prediction model.

thumbnail Fig. 1

Example of bug and its bug report taken from commit be373ea of BIRT project and issue tracking system

One challenge of training PatternNet is how to extract features from code and bug reports. Recently, several deep learning approaches have been proposed, aiming to learn source code embeddings. Among the existing techniques for source code representation learning, transformer-based pre-trained models [14,15] and Recurrent Neural Networks (RNN)-based approaches [16] represent state of the art. For example, UniXcoder [14] is based on a multi-layer transformer, utilizing multi-modal content including the abstract syntax tree of source code and comments, to learn the representation of code fragments through contrastive learning. While this approach is appealing, it is important to note that the model's maximum input token sequence length is 512, which necessitates truncation for longer input sequences, potentially leading to the loss of information from the source code. Therefore, the model's performance may degrade when handling datasets with source code token lengths exceeding 512. ASTNN [16], another method for source code representation, splits the abstract syntax tree of source code into a sequence of statement trees. A Recursive Neural Network (RvNN) encodes each statement tree into a vector. A bidirectional Gated Recurrent Unit (Bi-GRU) is applied to iterate through each statement tree vector to obtain the vector representation of an entire code fragment. Since RNNs are designed to handle input sequences of variable length, the ASTNN model can effectively manage long sequences of code statements. However, it exclusively considers the purely unimodal content, i.e., source code, for learning its representation. In order to further enhance the effectiveness of code representation learning, this paper proposes a hybrid code representation learning for repair pattern prediction, drawing inspiration from recent success in multi-view deep learning [17]. Specifically, PatternNet takes the buggy source code and its bug report as input, performing feature extraction from two different modalities: a transformer-based model and a RNN-based model. The pre-trained transformer-based models (i.e., UniXcoder) are used to extract features from the buggy code-bug report pair, while the RNN-based model (i.e., ASTNN) is employed to obtain features from the abstract syntax tree of the buggy code. Finally, the feature vectors from both views are combined before being utilized in the repair pattern prediction.

Another challenge for training PatternNet lies in the lack of a commonly used standard dataset for bug repair pattern prediction. Creating a comprehensive dataset of all types of bug repair patterns is difficult, since the numbers and types of repair patterns used during bug fixing vary widely across different bugs. Furthermore, the probability distribution of change actions occurring is very imbalanced[10]. Thus, we restrict our attention to those bugs that require the application of single frequent repair pattern to fix. For brevity, these bugs are called single repair pattern bugs, otherwise, that is, bugs that fixed by applying more than one repair pattern, are referred to as multi-repair pattern bugs. In fact, as our empirical analysis shows, most bugs in Defects4j[18] fixed successfully by state-of-art pattern-based APR belong to single repair pattern bugs (Section 4).

The contributions of this paper are as follows: first, we introduce PatternNet, a model for automatically predicting a reported bug's repair pattern. To the best of our knowledge, PatternNet is the first model that predicts a bug's repair pattern based on the semantic features learned from bug reports and source code, demonstrating that a reported bug's repair pattern can be predicted rather than mined[7] or inferred[19]. Second, a multi-view features learning and co-attention feature fusion model are developed in PatternNet to get the high-level representation of buggy code and bug report. Multi-view features learning enables PatternNet to both reap the benefit of different source code representation learning models described above and alleviate the drawback of single view feature learning models. Furthermore, the co-attention feature fusion model allows PatternNet to attend to different statements of buggy code and different tokens of reported bug. Third, we conduct experiments on our newly created dataset for bug repair pattern prediction, and the experimental results quantitatively demonstrate the effectiveness of PatternNet and feature fusion network for software bug repair pattern prediction.

1 Related Work

1.1 Automated Program Repair

A rich body of work has been done on developing automated methods of bug fixing since automated program repair promises to reduce the burden of software debugging on developers. According to a recent work[19], these techniques can be divided into three categories: generate-and-validate methodology[4,7,9], constraint-based repair [20,21] and deep learning-based repair[22-24]. In order to reduce the repair search space and improve the generated patches' quality, bug repair patterns have been widely adopted in these approaches. The adopted repair patterns can be a set of patterns manually pre-defined[25], automatically mined[7] or inferred[9] from the past bug fixes. Compared to previous work on APR techniques, our work uses a learned prediction model to provide general guidelines for selecting the most appropriate bug repair patterns required in bug fixing. Therefore, our work is complementary to previous studies on template-based APR techniques. The most closely related to our work is M3V[26]. M3V is an APR tool based on repair operator prediction. Our work differs from M3V in that M3V only can predict repair operators for bugs caused by null pointer exceptions and index out bounds, PatternNet, however, can predict 25 classes of repair patterns.

1.2 Bug Fixing Mining and Analysis

To obtain an in-depth understanding of how real world bugs are fixed, a large number of empirical studies have been conducted[10-13, 27-29]. Specifically, the mining and analyses are carried out based on the data collected either directly from bug datasets[27] or from bug fixing commits of project repositories[10-12]. The findings of these empirical studies mainly contain the common bug repair patterns identified among the human-written patches and the distribution of these patterns. Obviously, our work follows this line of research, but goes further and presents a bug repair pattern prediction model.

1.3 Source Code Representation Learning

As for source code representation, one kind of deep learning models consists of several pre-trained transformer-based models such as CodeBERT[15], GraphCodeBERT[30] and UniXcoder[14]. UniXcoder has been presented recently representing current state of the art techniques for code understanding and generation. Though these models have been shown the powerful source code representation, long code snippets are hard to handle[31]. ASTNN[16] is another kind of source code representation based on deep RNN. The key feature of ASTNN is that it can encode long code since RNN is used to model the sequence of statements of code. Our work combines these two types of source code representation approaches and demonstrates the benefits of the combination.

2 Background

2.1 Multi-View Deep Learning

Deep multi-view learning is a deep feature representation learning technique by exploiting complementary information of multiple features or modalities[17]. To combine multiple distinct features and utilize the complementary information of each view, several methods have been proposed, such as Canonical Correlation Analysis (CCA)[32], co-attention mechanism[33]. The former intends to discover the common information that is mostly correlated between views, however it may ignore the complementary information in the data, which yields sub-optimal quality of the representation resulting in poor model performance[33]. The latter tries to simultaneously attend multi-view information by weighing the importance of multi-view features for improving representation learning. In fact, co-attention mechanism can be considered a feature fusion method, since it is designed to obtain a better representation of paired views[34]. Moreover, co-attention mechanism has been successfully applied in many fields such as computer vision[34] and software code search[35], and has achieved remarkable performance in multi-view feature representation.

2.2 Tree-Based Neural Network for Source Code Encoder

Source code representation learning has emerged as a fundamental technique supporting various downstream software engineering tasks, including clone detection and code translation. Predicting software bug repair patterns is no exception. Notably, source code exhibits formal syntax and semantics, in contrast to natural language. Moreover, the syntactic usage of code elements is highly contextual[3]. Intuitively, capturing the source code's structural and contextual information as comprehensively as possible can result in superior source code representation, a notion supported by recent finding[36]. As a result, one approach to source code representation focuses on modeling code at the AST level. The AST is a recursive tree model representing an entire program or a specific program structure (e.g., statement or expression), encompassing lexical information and the syntactic structure of the source code. Given the inherently recursive nature of the AST, several recursive source code encoders have been introduced, including recursive Neural Networks [37], Tree-LSTM [38], and ASTNN [16]. These source code encoders recursively compute node embeddings within the AST in a bottom-up manner, and the vector associated with the root node of the AST is employed to represent the source code. ASTNN represents the state-of-the-art in this regard, as it divides each large AST into a sequence of smaller statement trees and encodes these statement trees into vectors. Building on this, ASTNN further captures the sequential information among statement trees using a bidirectional RNN model, ultimately producing a vector representation of a code fragment.

2.3 Transformer Model for Source Code Embedding

Inspired by the success of BERT [39] in natural language processing, the transformer's transfer learning strategy has recently been extended to multi-modal learning for source code understanding and generation. Among the most notable transformer-based models are CodeBERT [15], GraphCodeBERT [30], and UniXcoder [14]. These models have been developed using a pre-training and fine-tuning workflow. Specifically, CodeBERT, a bimodal pre-trained language model for both natural and programming languages, is trained with two objectives: Masked Language Modeling (MLM) and Replaced Token Detection (RTD).

In contrast to CodeBERT, GraphCodeBERT takes source code paired with comments and the corresponding data flow as input. It is pre-trained using standard masked language modeling [39] and two structure-aware tasks. One structure-aware task involves predicting where a variable is identified, while the other focuses on predicting data flow edges between variables.

To support various types of software engineering downstream tasks, UniXcoder, however, is pre-trained on large amounts of multi-modal content, including code, code comments, and AST, using self-supervised objectives such as masked language modeling, unidirectional language modeling [40], and denoising objectives [41]. In addition to these self-supervised objectives, UniXcoder also employs two supervised pre-training tasks: multi-modal contrastive learning and cross-modal generation to learn embeddings that represent the semantics of a code fragment.

3 PatternNet: Multi-View Feature Fusion Model for Repair Pattern Prediction

3.1 Overall Architecture

In this section, we introduce the deep neural network model PatternNet for repair pattern prediction, as shown in Fig. 2. This model comprises three components: the top branch captures features from the semantics of the buggy code's AST. The bottom branch is a transformer-based model that simultaneously represents the tokens of the buggy code and the bug report. The outputs of these two subnetworks are combined to form a single latent representation using a feature fusion network. Finally, the resulting features are sent to the softmax layer for classification. We provide detailed descriptions of the various subnetworks in the following sections.

thumbnail Fig. 2

Overview of PatternNet

3.2 Representation Learning for Buggy Code AST

Splitting ASTs and Creating Sequences of Statement Trees. In general, AST is a tree data structure in which each node represents an abstract syntactic structure of the source code. As mentioned earlier, ASTNN creates a sequence of statement trees from a code fragment at the statement level. An AST node is referred to as a statement node if it represents a statement in the source code. Essentially, a statement tree is a rooted tree in which one node is designated as the tree's root, while the remaining nodes are the root's descendants. Given a code fragment, it is first parsed to build an AST Z. Then, a statement tree sequence is created by using a pre-order traversal of Z, with the ordering determined by the sequence in which the statement nodes are visited. Let s be the statement node currently visited during the AST traversal. If s corresponds to a simple statement, a statement tree can be directly constructed from the subtree rooted at s in Z. In other words, the statement tree, whose root is s, consists of all children of s. On the other hand, if s corresponds to a nested statement, the node s is split into two parts: the header and the body of nested statements. For the header, constructing a statement tree rooted at s is straightforward. Additionally, a recursive process of splitting AST nodes and constructing statement trees is performed for each statement node inside the body of nested statements. Finally, all the constructed statement trees are sequentially added to the statement tree sequence. For more details, we refer the reader to Ref. [16].

Statement Tree Encoder. Since a statement tree is a recursive data structure consisting of AST nodes, ASTNN employs a recursive neural network-based statement tree encoder to learn a representation of statements. On the other hand, ASTs contain a variety of symbols representing either syntactic constructs or lexical information, such as identifiers. Hence, ASTNN utilizes word2vec [42] to learn vector representations of symbols. The trained symbol embeddings are then used to initialize the parameters in the encoder. Specifically, suppose we have a statement tree t, and it has C children. Let denote the pre-trained embedding parameters, where V is the vocabulary size and d is the embedding dimension of symbols. Let n be an internal node. The lexical vector representation of node n can be computed by:

(1)

where, is the one-hot representation of the symbol n and the embedding, respectively. Next, the node n's vector representation can be obtained by the following equation:

(2)

where is the weight matrix, and the dimension size is , is a biased term, is the hidden state for each child , denotes the updated hidden state, and is the identity activation function. In a similar fashion, the vectors of all nodes in Z can be recursively computed and optimized in a bottom-up way. Additionally, the encoder applies the max pooling operation on each node to extract the most significant features. Therefore, the final representation of the statement tree t with N nodes can be obtained by:

(3)

Note that since different nodes have a different number of child nodes, directly calculating equation (2) in a batch manner is impossible. Therefore, a dynamic batch processing algorithm is presented to facilitate batch computations of equation (2). For more details about the algorithm, please refer to Ref. [16].

Representing the Sequence of Statement Trees. Suppose we have a code fragment; a sequence of T statement trees is extracted from its AST. After the statement tree encoder described above is applied to all statement trees, the code fragment can be represented in a sequence of statement tree vectors . To gain some dependency between adjacent statement trees within the code fragment, ASTNN uses a Bi-GRU network to process the sequence of statement tree vectors . Specifically, the Bi-GRU contains a forward GRU which reads from to and a backward GRU which reads from to :

(4)

At time step τ, the forward hidden state and backward hidden state are concatenated to form the annotation of statement tree τ, i.e., , which summarizes the context information surrounding the statement tree τ. After iteratively calculating hidden states of all time steps, all hidden states are denoted as . Following ASTNN, the max pooling operation over is applied, producing a vector , which is the vector representation of the buggy code.

3.3 Representation Learning for BuggyCode-Bug Report Pair

To capture the semantics of both the buggy code and the bug report, PatternNet leverages UniXcoder to obtain the features representing buggy code and bug report. In particular, given a pair of buggy code and bug report, they are tokenized separately into m code tokens and n text tokens using RobertaTokenizer, which are denoted as [,,⋯,] and [,,⋯,], respectively. Since UniXcoder takes a sequence of tokens as input, these two sequences of tokens need to be concatenated into one sequence. Therefore, we add the unique token [SEP] in the middle of two sequences and the unique token [CLS] and [EOS] at the start and the end of the whole sequence, respectively. The embedding of each token is obtained by feeding the concatenated token sequence to UniXcoder, as shown in Fig. 3. These token embeddings are denoted as , where L=m+n+3. In Ref. [14], the mean pooling operation is performed over the embeddings of all tokens, resulting in a vector , which can be used to represent the bimodal feature of buggy code and bug report.

thumbnail Fig. 3

Example of UniXcoder bimodal embedding

3.4 Feature Fusion

After obtaining two sets of features, one derived from the AST of the buggy code and the other learned from the tokens of the buggy code and the bug report, it is evident that these two sets of features from different views need to be combined. In this regard, we employ two strategies discussed in Refs. [43,44] to achieve feature fusion: concatenation fusion and co-attention network-based feature fusion. In more detail, given a buggy code and bug report pair, we first suppose that the buggy code's AST has T statement trees, and the number of bimodal tokens of buggy code and bug report is N. Let and represent the vector representation of the buggy code and the sequence of the feature vector of T statement trees. Furthermore, let represent the bimodal features of buggy code and bug report and the embeddings of the bimodal tokens are denoted by (without generalization, superscripts are omitted). The two feature fusion methods are described as follows:

Concatenation Fusion. The concatenation fusion strategy is straightforward to concatenate the two vector representations into one vector directly, which can be defined as follows,

(5)

where [⋅] denotes the concatenation operation.

Feature Fusion based on Co-Attention Networks. Since a co-attention mechanism models the dependencies between multi-modal features and projects the multi-modal features to the common feature subspace[44,45], we propose a feature fusion based on co-attention networks. For ease of presentation, let and be the feature matrix formed by and Q, respectively, where ,, T, N denote the dimension of statement tree embeddings, the dimension of bimodal token embeddings, the number of statement trees, and the number of bimodal tokens, respectively. Following Ref.[44], we first compute the affinity matrix denoted as between and as follows:

(6)

where are learnable parameters. The element of represents the similarity between the-th bimodal token of and the -th statement tree of .

Then the attention weights of each statement tree and bimodal token denoted by and are calculated as follows:

(7)

where , , and are the learnable weight parameters.

In what follows, the statement tree and bimodal token attention vectors are calculated as the weighted sum of statement tree features and bimodal token features as follows,

(8)

Finally, inspired by the work in Ref. [44], we use a multi-layer perceptron to encode the multi-view features, that is,

(9)

where , and are the learnable weight parameters. [⋅] denotes the concatenation operation.

3.5 Bug Repair Pattern Prediction

As said before, this paper aims to develop a model that can automatically perform bug repair pattern prediction. We cast the prediction problem as a multi-class classification problem. Thus, given the fused feature vector of the -th instance, i.e., a pair of buggy code and its bug report, the probability that the -th instance belongs to the repair pattern can be obtained as follows:

(10)

where and denote the weight matrix and bias term.

In our model, we use cross entropy as the loss function, which is defined as:

(11)

where denotes a set of bug repair pattern types, is equal to 1 if the target bug repair pattern type for the-th instance is a; otherwise, it is equal to 0, and represents the number of training instances.

4 Dataset

4.1 Bug Repair Pattern Types

Developing a general-purpose prediction model capable of forecasting repair pattern types for bug fixing is exceedingly challenging. One of the reasons for this challenge is that previous research [27] has revealed the recurring nature of certain repair patterns in bug fixing, while others are seldom employed. The infrequent occurrence of some repair patterns makes data collection a challenging task. As a result, our model is tailored to focus on commonly recurring repair patterns, leaving the rarely occurring ones for future exploration. Based on the bug repair pattern taxonomies outlined in Ref. [27], we initially construct a real-world bug dataset, each comprising a triple (buggy file, bug report, repair patterns). Subsequently, we perform an analysis of the dataset and identify the top 25 repair pattern categories most frequently involved in bug fixes, as presented in Table 1. It is worth noting that certain repair pattern categories mentioned in Ref. [27] have been further divided into more precise subcategories. For instance, the repair pattern mcParValChange denotes method call parameter value modification, which is further subdivided into more refined categories like constChangeArgument and wrongVarRefArgument.

Table 1

Bug repair pattern types

4.2 Empirical Analysis of the Distribution of the Number of Repair Patterns Used by FixMiner

FixMiner[7] is considered one of the state-of-the-art tools for applying mined repair patterns to fix bugs, and its performance is evaluated using Defects4j[18]. Based on the experimental results, we compiled a bug dataset consisting of all bugs that FixMiner can successfully fix. For each bug in the dataset, a line-based edit script can be generated by applying GNU diff to the buggy version and the patched version of the program. Using these edit scripts, each bug is manually annotated with repair patterns. Figure 4 illustrates the distribution of the number of repair patterns applied in these successful bug fixes. The figure reveals that the current state-of-the-art APR approach, which relies on repair patterns, demonstrates strong efficacy in addressing simple bugs that affect a single statement, and their corresponding fixes require the application of a single repair pattern.

thumbnail Fig. 4

Distribution of number of repair patterns used in successful bug fixes of FixMiner

4.3 Bug Repair Pattern DataSet

Several publicly available datasets exist regarding real-world software bugs and bug reports, including Bugs.jar[46,47]. Bugs.jar encompasses 1 158 real bugs extracted from 8 prominent, open-source Java projects. Additionally, Bench4BL[47] comprises bug reports and their corresponding buggy source code files, featuring 10 017 bug reports gathered from 51 open-source projects. We constructed a new bug repair pattern dataset by leveraging these two existing datasets to expedite the data curation process.

Building our bug repair pattern dataset involves two key steps: bug data aggregation and bug annotation. For each bug report in Bench4BL, the corresponding Git commit ID of the bug-fixed version is provided. In contrast, Bugs.jar utilizes the distributed version control system Git to manage and store bug data. This means that within each branch of every project repository, one can find the buggy version of the source code, the associated bug report, the developer's patch to fix the bug, and the test suite. For Bugs.jar, we systematically iterate through all bugs. During each iteration, we apply the developer's patch to the original buggy source file, resulting in the fixed version of the source file. This process allows us to collect (buggy file, fixed file, bug report) triples for Bugs.jar. At the end of these iterations, we have gathered a comprehensive collection of (buggy file, fixed file, bug report) triples for Bugs.jar. As for Bench4BL, we first clone the projects' Git repositories locally. Similarly, we iterate through each bug-fixing commit ID, using the Git checkout command to extract both the pre- and post-fix versions of the buggy file. This process enables us to compile a set of (buggy file, fixed file, bug report) triples for Bench4BL. Finally, we combine the two sets of (buggy file, fixed file, bug report) triples into one unified dataset. Notably, bugs that affect more than one Java file are discarded, and any duplicate bugs are eliminated. Furthermore, we address missing values in bug reports through manual data completion.

Upon acquiring the set of (buggy file, fixed file, bug report)-triples, each bug requires annotation with repair patterns. We employed a semi-automatic method for bug annotation. We initially performed automated annotation for each pair of buggy and fixed files using the PPD tool[48]. PPD is a tool proficient in the automatic extraction of repair patterns between the faulty and patched code, utilizing GumTree[49] and Spoon[50]. It is worth noting that we made slight modifications to PPD in two ways. First, we extended PPD by incorporating additional repair pattern detections, enhancing its ability to identify more valuable repair patterns, for instance, the wrongVarRefArgument pattern. Secondly, we selectively disabled the tool from detecting certain repair patterns, as some of them offered minimal benefit to APR techniques. An example is the Single Line pattern. Subsequently, we verified the accuracy of the annotation results and rectified any errors based on the edit scripts generated by the GNU diff tool between the buggy file and the fixed file. In the end, we meticulously curated a dataset comprising 7 891 (buggy file, bug report, repair patterns)-triples drawn from Bugs.jar and Bench4BL. Given the emphasis of this paper on single repair pattern bugs and frequently occurring repair patterns, we refined the dataset by filtering out multi-repair pattern bugs and eliminating repair patterns with probabilities below 0.57%. This process yielded a final dataset of 4 242 (buggy file, bug report, repair pattern)-triples, which we refer to as sPATTERN.

As previously mentioned, some repair patterns occur frequently, while others are infrequent. Consequently, the sPATTERN dataset exhibits an inherent imbalance. The distribution of repair patterns in the sPATTERN dataset is presented in Table 2, affirming earlier findings[27]. Basic descriptive statistics pertaining to the sPATTERN dataset are summarized in Table 3.

Table 2

Bug repair pattern types and their percentages in sPATTERN

Table 3

Descriptive statistics of sPATTERN

5 Empirical Evaluation

5.1 Research Questions

PatternNet comprises several design components:1) the utilization of multi-view information, 2) adopting a hybrid model that integrates a transformer-based and an RNNs-based model, 3) incorporating feature fusion networks. Hence, our evaluation not only assesses the efficiency and effectiveness of each individual model to justify their hybrid integration but also examines how these design choices impact PatternNet's performance. Note that there are two strategies for feature fusion, i.e., concatenation fusion and co-attention fusion mentioned above. By default, PatternNet uses co-attention fusion strategy to combine different features, which is also called as PatternNet-CoAtt. Similarly, we name PatternNet with concatenation fusion strategy as PatternNet-Concat. Therefore, we examine the following research questions:

• RQ1: To what extent does the length of code affect the prediction performance of UniXcoder?

• RQ2: How accurately does PatternNet predict repair patterns for a reported bug with respect to other techniques?

• RQ3: How do feature fusion methods affect the model performance?

• RQ4: How much longer does it take to train PatternNet compared to the baseline models?

5.2 Experiment Implementation and Setup

We employed PyTorch[51] to implement PatternNet, and it was trained and validated using the sPATTERN dataset at the method level. The ASTs of Java buggy methods were extracted using javalang. During the training of PatternNet, we set the maximum sequence length to 512. In alignment with ASTNN, we utilized the Skip-gram algorithm with word2vec[42] to generate distributed representations of AST symbols, configuring the embedding size to be 128. The hidden dimensions of the ST-tree encoder and the Bi-GRU were both set to 128 and 384, respectively. Furthermore, the embedding dimensions of transformer-based models were all set to 768. As for the co-attention network, the size of the hidden layer was set to be 512 and 768×3, and we set k=1.

As the task of bug repair pattern prediction can be formulated as a multi-class classification problem, and given the imbalanced nature of our dataset, all the subsequent experiments employ accuracy and F1-macro as evaluation metrics. We utilize Adam as the optimizer and cross-entropy as the loss function for the model. To mitigate overfitting, we incorporate dropout in the hidden layer with a probability of p = 0.5, and during training, we employ early stopping based on the validation set. Our hybrid model is fine-tuned using various combinations of hyperparameters to ensure optimal performance. The hyperparameters are tuned through a grid search procedure with the following parameters and values: we explore learning rates from the set {2E-4, 3E-5, 5E-5, 2E-5}, and batch sizes from the set {8, 16, 24, 32, 64}. The best model identified through grid search is ultimately tested, and the resulting performance metric values are reported.

Data used. While we have introduced sPATTERN, a bug repair pattern prediction dataset, its size remains limited. To enhance the generalization ability of deep learning models, it is necessary to perform code data augmentation by employing semantic and syntax-preserving program transformations[52]. Since each example in sPATTERN consists of a (buggy method, bug report, repair pattern)-triple, we need to apply both code data augmentation and text data augmentation concurrently. Specifically, for each example in sPATTERN, we utilize a Java program transformer[53] and Eda[54] to generate two synthetic buggy methods and their corresponding bug reports. Following data augmentation, the dataset is randomly divided into three subsets for training, testing, and validation, with proportions of 60%, 20%, and 20%, respectively, in a stratified manner. This stratification accounts for the uneven distribution of bug repair patterns. Subsequently, oversampling is applied to balance the training sets concerning the various bug repair pattern classes. Ultimately, the training set, validation set, and test set comprise 25 425, 2 036, and 2 545 (buggy method, bug report, repair pattern)-triples, respectively.

On the other hand, we partition sPATTERN into two subsets: one subset where each buggy method contains 512 tokens or more, and another subset where the code token length of each buggy method is below 512. Similarly, we apply data augmentation, splitting, and oversampling to these two subsets, following the same procedures as those employed for sPATTERN. This results in two datasets, referred to as LongCode-sPATTERN and ShortCode-sPATTERN. For LongCode-sPATTERN, the sizes of the training set, validation set, and testing set are 8,425, 684, and 854, respectively. ShortCode-sPATTERN consists of a training dataset with 17 000 samples, a validation dataset with 1 353 samples, and a testing dataset with 1 691 samples.

Baseline models. We compare PatternNet with five types of baseline models utilizing different input modalities. These baseline models are Bi-GRU, ASTNN, UniXcoder, CodeBERT, and GraphCodeBERT. To elaborate, we use these baseline models to learn feature representations of reported bug for repair pattern prediction. Note that neither our proposed models nor baseline models use any hand-crafted features since previous work indicates that hand-crafted features usually fail to capture the semantic characteristics of source code[55].

Bi-GRU processes the entire bug report as a single sequence, and it employs the average of the hidden states of all words as a feature for bug repair pattern prediction.

ASTNN is an AST-based neural network used for source code representation and is a subnetwork of PatternNet. It is employed to extract the features of buggy code for bug repair pattern prediction.

UniXcoder is a cross-modal pre-trained model for source code representation, currently represents the state of the art and serves as a subnetwork of PatternNet. UniXcoder can take multi-modal data as inputs and output the embeddings of the inputs. As mentioned earlier, each example in the dataset contains buggy method code and its bug report. A bug report typically consists of the bug summary and textual description. In our evaluation of UniXcoder, we consider all possible combinations of input modalities. Specifically, we feed the following combinations into UniXcoder to obtain representations of reported bugs: buggy code and bug report summary, buggy code and bug report description, buggy code alone, and buggy code with complete bug report (i.e., summary and description). These feature representation learning methods are denoted as UniXcoder (code+summary), UniXcoder (code+description), UniXcoder (code), UniXcoder (code+summary+description), UniXcoder (summary), and UniXcoder (description), respectively.

CodeBERT is a bimodal pre-trained model based on transformer architecture for code understanding and generation. We use CodeBERT as a reported bug encoder to encode a pair of buggy code and bug report summary into its CodeBERT embedding.

GraphCodeBERT is a pre-trained model for programming languages that leverages semantic-level information of code, particularly data flow. We utilize GraphCodeBERT to encode a reported bug, which includes a buggy code paired with a bug report summary and the corresponding data flow. This information is fed into GraphCodeBERT to obtain the representation of the reported bug, which is subsequently used to predict bug repair pattern.

5.3 Experiment Results

To address the first research question regarding the impact of code length on UniXcoder's performance, we conducted evaluations on two datasets with varying lengths of buggy code to investigate differences in bug repair pattern prediction performance. Specifically, we tested UniXcoder on LongCode-sPATTERN and ShortCode-sPATTERN; the results are presented in Table 4. As expected, when the token length of buggy code increases, UniXcoder's performance deteriorates. This decline in performance for the long code dataset is due to UniXcoder's practice of simply truncating all code tokens that exceed the 512-token length limit, resulting in the loss of valuable code information.

For the second research question regarding PatternNet's accuracy in predicting repair pattern for reported bugs compared to other techniques, we evaluated PatternNet and baseline models trained on the sPATTERN dataset using different input modalities. The experimental results are summarized in Table 5. The table provides three key observations. First, models trained with bimodal data exhibit slightly superior performance compared to those trained with unimodal data. For instance, UniXcoder (code+summary) achieves an F1-macro score of 0.80, which is 0.02 higher than UniXcoder (code) alone. Second, due to the maximum input length constraint of 512 tokens, longer code or more detailed bug reports do not necessarily result in improved UniXcoder performance. Third, our multi-view and co-attentive representation learning model, PatternNet-CoAtt, outperforms the state-of-the-art UniXcoder by 0.05 in terms of F1-macro.

To address the third research question, which investigates the impact of feature fusion methods on model performance, we modified PatternNet-CoAtt by replacing its feature fusion module with the concatenation fusion strategy, resulting in what we refer to as PatternNet-Concat. Subsequently, to test the performance of PatternNet-Concat, we conducted the same experiment that evaluated PatternNet-CoAtt using identical parameter settings. Table 5 shows that PatternNet-CoAtt outperforms PatternNet-Concat in terms of F1-macro by 0.02. This suggests that the co-attention mechanism, as utilized in PatternNet-CoAtt, can enhance feature extraction performance in bug repair pattern prediction when compared with the concatenation fusion strategy.

To address the fourth research question, we assessed the training efficiency of PatternNet and compared the training times among PatternNet and the baseline models. Specifically, we reported the training times per epoch for PatternNet and all the baseline models, as illustrated in Fig. 5. It is evident from the figure that PatternNet takes approximately 1-2 times longer than UniXcoder to complete one epoch of training. In other words, the co-attention mechanism not only enhances PatternNet's performance, but also extends the training duration.

thumbnail Fig. 5

Training times per epoch of PatternNet and baseline models

Table 4

The bug repair prediction performance of different code token lengths based on UniXcoder

Table 5

Evalution results in terms of accuracy and F1-macro

6 Conclusion

In this paper, we introduce a novel deep neural network named PatternNet for predicting repair patterns of reported bugs. PatternNet utilizes a multi-view deep learning framework to extract high-level multi-view features from both buggy code and bug reports, employing different source code representation learning models. Furthermore, it captures co-attentive features between different statements within the buggy source code and the words in the bug report text. These multi-view features are then integrated into a unified representation using a feature fusion network. We provide quantitative evidence of the effectiveness of PatternNet and the feature fusion network for software bug repair pattern prediction.

In the future, we plan to expand our sPATTERN dataset by including more examples of single repair pattern bugs, which will necessitate further evaluations to assess the generalizability of PatternNet. Additionally, future research could explore how effectively PatternNet can enhance the performance of search-based APR techniques.

References

  1. Le Goues C, Nguyen T, Forrest S, et al. GenProg: A generic method for automatic software repair[J]. IEEE Transactions on Software Engineering, 2012, 38(1): 54-72. [CrossRef] [Google Scholar]
  2. Saha R K, Lyu Y J, Yoshida H, et al. ELIXIR: Effective object oriented program repair[C]//ASE '17: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. New York: IEEE, 2017: 648-659. [Google Scholar]
  3. Wen M, Chen J J, Wu R X, et al. Context-aware patch generation for better automated program repair[C]//Proceedings of the 40th International Conference on Software Engineering. New York: ACM, 2018: 1-11. [Google Scholar]
  4. Jiang J J, Xiong Y F, Zhang H Y, et al. Shaping program repair space with existing patches and similar code[C]//Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2018: 298-309. [Google Scholar]
  5. Mechtaev S, Yi J Y, Roychoudhury A. Angelix: Scalable multiline program patch synthesis via symbolic analysis[C]//Proceedings of the 38th International Conference on Software Engineering. New York: ACM, 2016: 691-701. [Google Scholar]
  6. Long F, Rinard M. An analysis of the search spaces for generate and validate patch generation systems[C]//Proceedings of the 38th International Conference on Software Engineering. New York: ACM, 2016: 702-713. [Google Scholar]
  7. Koyuncu A, Liu K, Bissyandé T F, et al . Fixminer: Mining relevant fix patterns for automated program repair[J]. Empirical Software Engineering, 2020, 25(3):1980-2024. [CrossRef] [Google Scholar]
  8. Kim D, Nam J, Song J, et al. Automatic patch generation learned from human-written patches[C]//2013 35th International Conference on Software Engineering (ICSE). New York: IEEE, 2013: 802-811. [Google Scholar]
  9. Long F, Amidon P, Rinard M. Automatic inference of code transforms for patch generation[C]//Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. New York: ACM, 2017: 727-739. [Google Scholar]
  10. Martinez M, Monperrus M. Mining software repair models for reasoning on the search space of automated program fixing[J]. Empirical Software Engineering, 2015, 20(1):176-205. [CrossRef] [MathSciNet] [Google Scholar]
  11. Zhong H, Su Z D. An empirical study on real bug fixes[C]//2015 IEEE/ACM 37th IEEE International Conference on Software Engineering. New York: IEEE, 2015: 913-923. [Google Scholar]
  12. Liu K, Kim D, Koyuncu A, et al. A closer look at real-world patches[C]//2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). New York: IEEE, 2018: 275-286. [Google Scholar]
  13. Wang Y, Meng N, Zhong H. An empirical study of multi-entity changes in real bug fixes[C]//2018 IEEE International Conference on Software Maintenance and Evolution (ICSME). New York: IEEE, 2018: 287-298. [Google Scholar]
  14. Guo D Y, Lu S A, Duan N, et al. UniXcoder: Unified cross-modal pre-training for code representation[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: Association for Computational Linguistics, 2022: 7212-7225. [Google Scholar]
  15. Feng Z Y, Guo D Y, Tang D Y, et al. Codebert: A pre-trained model for programming and natural languages[C]//Findings of the Association for Computational Linguistics: EMNLP 2020. Stroudsburg: Association for Computational Linguistics, 2022:1536-1547. [Google Scholar]
  16. Zhang J, Wang X, Zhang H Y, et al. A novel neural source code representation based on abstract syntax tree[C]//Proceedings of the 41st International Conference on Software Engineering. New York: IEEE Press, 2019:783-794. [Google Scholar]
  17. Yan X Q, Hu S Z, Mao Y Q, et al. Deep multi-view learning methods: A review[J]. Neurocomputing, 2021, 448: 106-129. [CrossRef] [Google Scholar]
  18. Just R, Jalali D, Ernst M D. Defects4J: A database of existing faults to enable controlled testing studies for Java programs[C]//Proceedings of the 2014 International Symposium on Software Testing and Analysis. New York: ACM, 2014: 437-440. [Google Scholar]
  19. Le Goues C, Pradel M, Roychoudhury A. Automated program repair[J]. Communications of the ACM, 2019, 62(12):56-65. [Google Scholar]
  20. Xuan J F, Martinez M, Demarco F,et al. Nopol: Automatic repair of conditional statement bugs in java programs[J]. IEEE Transactions on Software Engineering, 2017, 43(1):34-55. [CrossRef] [Google Scholar]
  21. Zheng G L, Nguyen T, Brida S G, et al. Atr: Template-based repair for alloy specifications[C]//Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. New York : ACM, 2022: 666-677. [Google Scholar]
  22. Chakraborty S, Ray B. On multi-modal learning of editing source code[C]//2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). New York: IEEE, 2021: 443-455. [Google Scholar]
  23. Mashhadi E, Hemmati H. Applying CodeBERT for automated program repair of Java simple bugs[C]//2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). New York: IEEE, 2021: 505-509. [Google Scholar]
  24. Meng X X, Wang X, Zhang H Y, et al. Template-based neural program repair[C]//2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). New York: IEEE, 2023: 1456-1468. [Google Scholar]
  25. Liu K, Koyuncu A, Kim D, et al. TBar: Revisiting template-based automated program repair[C]//Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2019: 31-42. [Google Scholar]
  26. Xu X Z, Wang X D, Xue J L. M3V: Multi-modal multi-view context embedding for repair operator prediction[C]//2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). New York: IEEE, 2022: 266-277. [Google Scholar]
  27. Sobreira V, Durieux T, Madeiral F, et al. Dissection of a bug dataset: Anatomy of 395 patches from Defects4J[C]//2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). New York: IEEE, 2018: 130-140. [Google Scholar]
  28. Islam M R, Zibran M F. How bugs are fixed: Exposing bug-fix patterns with edits and nesting levels[C]//Proceedings of the 35th Annual ACM Symposium on Applied Computing. New York: ACM, 2020: 1523-1531. [Google Scholar]
  29. Soto M, Thung F, Wong C P, et al. A deeper look into bug fixes: Patterns, replacements, deletions, and additions[C]//Proceedings of the 13th International Conference on Mining Software Repositories. New York: ACM, 2016: 512-515. [Google Scholar]
  30. Guo D Y, Ren S, Lu S, et al. Graphcodebert: Pre-training code representations with data flow[EB/OL]. [2020-12-18]. http://arXivpreprintarXiv:2009.08366. [Google Scholar]
  31. Hu F, Wang Y L, Du L, et al. Long code for code search[EB/OL]. [2022-08-25]. http://arXivpreprintarXiv:2208.11271. [Google Scholar]
  32. Zhuang X W, Yang Z S, Cordes D. A technical review of canonical correlation analysis for neuroscience applications[J]. Human Brain Mapping, 2020, 41(13): 3807-3833. [CrossRef] [PubMed] [Google Scholar]
  33. Zheng L C, Cheng Y, Yang H X, et al. Deep co-attention network for multi-view subspace learning[C]//Proceedings of the Web Conference 2021. New York: ACM, 2021: 1528-1539. [Google Scholar]
  34. Nguyen D K, Okatani T. Improved fusion of visual and language representations by dense symmetric co-attention for visual question answering[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 6087-6096. [Google Scholar]
  35. Shuai J H, Xu L, Liu C, et al. Improving code search with co-attentive representation learning[C]//Proceedings of the 28th International Conference on Program Comprehension. New York: ACM, 2020: 196-207. [Google Scholar]
  36. Ding Z S, Li H, Shang W Y, et al. Can pre-trained code embeddings improve model performance? Revisiting the use of code embeddings in software engineering tasks[J]. Empirical Software Engineering, 2022, 27(3): 1-38. [CrossRef] [PubMed] [Google Scholar]
  37. Socher R, Lin C C, Manning C, et al. Parsing natural scenes and natural language with recursive neural networks[C]//Proceedings of the 28th International Conference on Machine Learning (ICML-11). New York: ACM, 2011: 129-136. [Google Scholar]
  38. Tai K S, Socher R, Manning C D. Improved semantic representations from tree-structured long short-term memory networks[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Stroudsburg: Association for Computational Linguistics, 2015: 1556-1566. [Google Scholar]
  39. Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[EB/OL]. [2018-09-26]. https://arxiv.org/abs/1810.04805.pdf. [Google Scholar]
  40. Radford A, Narasimhan K, Salimans T, et al. Improving language understanding by generative pre-training[C]//North American Chapter of the Association for Computational Linguistics. Denver: NAACL, 2018. [Google Scholar]
  41. Raffel C, Shazeer N, Roberts A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[J]. The Journal of Machine Learning Research, 2020, 21(1):5485-5551. [MathSciNet] [Google Scholar]
  42. Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality[C]//Advances in Neural Information Processing Systems. New York: ACM, 2013, 2:3111-3119. [Google Scholar]
  43. Kuncheva L I. Combining Pattern Classifiers[M]. New York: Wiley, 2014. [CrossRef] [Google Scholar]
  44. Lu J S, Yang J W, Batra D, et al. Hierarchical question-image co-attention for visual question answering[C]//Advances in Neural Information Processing Systems. New York: ACM, 2016, 29: 289-297. [Google Scholar]
  45. Feng G A, Hu Z W, Zhang L H, et al. Encoder fusion network with co-attention embedding for referring image segmentation[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2021: 15506-15515. [Google Scholar]
  46. Saha R K, Lyu Y J, Lam W, et al. Bugs.jar: A large-scale, diverse dataset of real-world Java bugs[C]//Proceedings of the 15th International Conference on Mining Software Repositories. New York: ACM, 2018: 10-13. [Google Scholar]
  47. Lee J, Kim D, Bissyandé T F, et al. Bench4BL: Reproducibility study on the performance of IR-based bug localization[C]//Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2018: 61-72. [Google Scholar]
  48. Delfim F M, Durieux T, Sobreira V, et al. Towards an automated approach for bug fix pattern detection[EB/OL]. [2018-07-30]. http://arxiv.org/pdf/1807.11286.pdf. [Google Scholar]
  49. Falleri J R, Morandat F, Blanc X, et al. Fine-grained and accurate source code differencing[C]//Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering. New York: ACM, 2014: 313-324. [Google Scholar]
  50. Pawlak R, Monperrus M, Petitprez N, et al. SPOON: A library for implementing analyses and transformations of Java source code[J]. Software: Practice and Experience, 2016, 46(9): 1155-1179. [CrossRef] [Google Scholar]
  51. Paszke A, Gross S, Massa F, et al. Pytorch: An imperative style, high-performance deep learning library[C]//Advances in Neural Information Processing Systems. New York: ACM, 2019, 32:1-12, . [Google Scholar]
  52. Yu S W, Wang T, Wang J. Data augmentation by program transformation[J]. Journal of Systems and Software, 2022, 190: 111304. [CrossRef] [Google Scholar]
  53. Rabin M R I, Alipour M A. ProgramTransformer: A tool for generating semantically equivalent transformed programs[J]. Software Impacts, 2022, 14: 100429. [CrossRef] [Google Scholar]
  54. Wei J, Zou K. EDA: Easy data augmentation techniques for boosting performance on text classification tasks[EB/OL]. [2019-11-26]. https://arxiv.org/abs/1901.11196.pdf. [Google Scholar]
  55. Xu Y, Huang B, Zou X N, et al. Predicting effectiveness of generate-and-validate patch generation systems using random forest[J]. Wuhan University Journal of Natural Science, 2018, 23(5): 525-534. [CrossRef] [Google Scholar]

All Tables

Table 1

Bug repair pattern types

Table 2

Bug repair pattern types and their percentages in sPATTERN

Table 3

Descriptive statistics of sPATTERN

Table 4

The bug repair prediction performance of different code token lengths based on UniXcoder

Table 5

Evalution results in terms of accuracy and F1-macro

All Figures

thumbnail Fig. 1

Example of bug and its bug report taken from commit be373ea of BIRT project and issue tracking system

In the text
thumbnail Fig. 2

Overview of PatternNet

In the text
thumbnail Fig. 3

Example of UniXcoder bimodal embedding

In the text
thumbnail Fig. 4

Distribution of number of repair patterns used in successful bug fixes of FixMiner

In the text
thumbnail Fig. 5

Training times per epoch of PatternNet and baseline models

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.