Automatic Detection of Weld Defects in Pressure Vessel X-Ray Image Based on CNN



Introduction
In the manufacturing of pressure vessels, many components need to be welded. During welding, the physical environment or human error can cause defects to form at the weld, creating potential safety hazards. To ensure the quality and safety of welded parts and to prevent accidents, defect detection of welded parts is therefore very important. At present, the non-destructive testing (NDT) methods commonly used in industry include X-ray testing, ultrasonic testing, magnetic particle testing, eddy current testing, etc. According to their image characteristics, welding defects are generally divided into cracks, lack of penetration, lack of fusion, porosity and so on. Because detection standards differ, the classification of defects also differs. For example, the strip and circle defects in the Chinese National Standard (NBT 47013.2-2015) [1] cover the porosity and slag inclusions of the European Standard (EN ISO5817) [2], but the definitions are different. In current production practice, the main audit method is to analyze the weld image manually and to judge, based on experience, whether defects are present and what their type, location and size are, so as to evaluate the welding quality and assign the corresponding rating. This manual evaluation is affected by the testing staff's skill level, experience, fatigue and other human factors and external conditions, which makes it inefficient, unreliable and inconsistent.
In recent years, with the development of the internet and the continuous improvement of Graphics Processing Unit (GPU) hardware, computer vision based on deep learning has developed rapidly and has gradually become the dominant method in image processing and pattern recognition. Computer-vision-based defect detection for industrial automation has also grown explosively. Classification [3][4][5][6], target detection [7][8][9][10], segmentation [11,12] and other computer vision methods can be used for industrial defect detection. Ref. [13] used a classification network based on a convolutional neural network to detect cloth defects; Ref. [14] employed a target detection algorithm to detect whether the fasteners of overhead catenary system supports are missing; Ref. [15] used a semantic segmentation algorithm to detect metal surface defects. Because segmentation annotation is expensive and it is difficult to obtain enough annotated data, our work uses an object detection algorithm to detect weld defects.
Most object detection algorithms take the Microsoft Common Objects in Context (MS COCO) [16] benchmark as the target dataset and are optimized on it. However, the X-ray weld images of pressure vessels are quite different from the COCO benchmark in terms of image resolution and defect scale. As a result, many vanilla object detection algorithms perform worse on the weld defect detection dataset. In order to solve the low efficiency, unreliability and poor consistency of manual evaluation, we have studied automatic defect detection extensively and statistically analyzed the weld defect dataset. On the basis of this research and analysis, we propose an automatic defect detection algorithm based on a Convolutional Neural Network (CNN), which is mainly aimed at X-ray image detection. The reference standard is the European Standard (EN ISO5817). The proposed detector can identify defects including porosity, slag inclusion, cracks, lack of penetration, lack of fusion, intensive dispersed and intensive chain, while most other automatic weld defect detectors can detect no more than five kinds of defect.
The main contributions of our work are as follows:
1) We analyze the characteristics of the weld image and the weld defects thoroughly.
2) According to the characteristics of the weld defects to be detected, we propose DRepDet (Dilated Rep-Points Detector), which can locate and identify a variety of defects including porosity, slag inclusions, cracks, lack of penetration, lack of fusion, intensive dispersed and intensive chain.
3) Extensive experiments are carried out on our private weld image dataset to validate the effectiveness of the proposed algorithm. The results show that the proposed DRepDet effectively improves the precision and recall of large defects while the precision and recall of small defects are not affected.
1 Related Work

Object Detection
The object detection task in computer vision needs to locate the object to be detected in a picture and give the size and class of the object. Anchor-based algorithms need to preset the anchor box sizes and aspect ratios according to the statistics of the ground truth bounding boxes. When the scale range of the objects changes greatly, the preset anchors cannot adapt to the objects well. RetinaNet [17], Faster R-CNN [9] and Cascade R-CNN [10] are all such algorithms. Anchor-free algorithms directly regress, from the feature points, the center point, corner points or bounding points used to calculate the bounding box, and then obtain the bounding box by a clustering or transformation algorithm. CornerNet [18] first predicted the upper-left and lower-right corner points, and then used a special clustering method to obtain the bounding box of the target. ExtremeNet [19] used segmentation supervision to locate the extreme points of the target in the X-Y direction. RepPoints [20] predicted a set of point representations from each feature map point, generated a pseudo bounding box from this set of points, and then used the ground truth box for semi-supervised learning. Whether an algorithm is anchor-based or anchor-free, the scale distribution of the objects/defects to be detected is very important for the selection and design of the detection algorithm. Therefore, we conduct a detailed multi-dimensional analysis of the weld defect dataset.

Receptive Field
The receptive field is a basic concept of deep convolutional neural networks. It is the region of the input on which a given output of the convolutional network depends. Values outside this region do not affect the output value. Ref. [21] proposed that the effective receptive field is smaller than the theoretical receptive field and follows a Gaussian distribution. The receptive field must be large enough to cover the whole relevant image area. Considering both efficiency and accuracy, dilated convolution is often used to expand the receptive field in object detection and segmentation tasks. The DeepLab series [22,23] explored dilated convolution in different structures (including width, depth, etc.) to improve the accuracy of semantic segmentation. Based on the Inception [24] structure, RFBNet [25] used conventional convolution with different kernel sizes followed by dilated convolution on each branch to generate different receptive fields. Ref. [26] used Kronecker convolution to solve the problem of local information loss in dilated convolution. Inspired by these works and combined with the distribution characteristics of weld defects, we design the Dilated Rep-Points Detector (DRepDet). DRepDet uses dilated convolution to expand the receptive field, and uses different dilated rates according to the different aspect ratios of defects to better handle the huge differences in aspect ratio.

Defect Detection
With the rapid development of deep learning and computer vision technology, many advanced algorithms have been proposed for industrial defect detection. Ref. [27] proposed an automatic defect detection framework for aluminum conductor composite core (ACCC) based on an image classification network, which takes Inception-ResNet as the backbone network. An X-ray image casting defect detection system was constructed based on the Mask Region-based CNN architecture [28]. In order to improve the accuracy of porosity defect detection, Ref. [29] proposed a semantic segmentation network based on an encoder-decoder structure to recognize porosity defects at the pixel level. Ref. [30] used a three-stage system based on AdaBoost (defect extraction, defect detection and defect identification) to identify five weld defects (crack, lack of penetration, lack of fusion, circle and strip). Real defect detection at industrial sites needs to detect more than 10 kinds of defects. In order to make the automatic defect detection system more suitable for the field environment, we design and implement a detection system that can detect 7 kinds of defects: porosity, slag inclusion, cracks, lack of penetration, lack of fusion, intensive dispersed and intensive chain. As far as we know, this is the automatic Liquefied Natural Gas (LNG) weld defect detector that covers the most defect types at present.

Problem Description
Compared with the images in the COCO dataset, weld images are more complicated and differ mainly in the following aspects: 1) In order to detect defects in more complex weld images, most of the images are 300 dpi-570 dpi, which leads to high-resolution images of about 3 000×2 000 to 8 000×2 000; 2) The scale of weld defects varies greatly: the smallest defect is less than 10×10, and the largest defect is about 7 000×300; 3) The height/width ratio of weld defects ranges from 1:1 to 91:1, most of which are 1:1 and 2:1, as shown in Fig. 1; 4) The edge between the defect and the background is vague, as shown in Fig. 2 (d), which makes defects more difficult to identify. This paper mainly considers the scale difference and aspect ratio of weld defects, and designs a convolutional network, DRepDet, suited to weld defects to solve the difficulty of identifying large-scale defects and defects with abnormal aspect ratios. In Section 4.2, we conduct a
There are several ways to expand the receptive field: 1) Make the network deeper by stacking more layers, which increases the number of parameters and the amount of computation; 2) Apply more down-sampling, which reduces the resolution of the feature map, and low-resolution features reduce the accuracy of object positioning and recognition; 3) Use dilated convolution, which avoids the shortcomings of the first two methods but introduces the grid problem. Table 1 compares the down-sampling rate, receptive field, parameter count and computational complexity of two ResNet models of different depths. Table 1 shows that increasing the depth of the model greatly increases its parameters and computation, which places high demands on memory and computing power and is not applicable in industrial scenarios. Too much down-sampling leads to the loss of a large amount of detail, which is not friendly to the detection of small defects, and weld defects contain a large number of small porosity defects. Therefore, increasing the down-sampling rate is not suitable for the detection of weld defects. In this paper, dilated convolution is used to solve the problem of insufficient receptive field, and different dilated rates are used to address the grid problem and the defects with abnormal aspect ratios.

DRepDet Model and Algorithm
In this section, we first analyze the receptive field required by weld defects, then introduce RepPoints network architecture and the improved DRepDet architecture, and finally describe the DRepDet algorithm in detail.

Enlarge Receptive Field
In a typical neural network structure, the value of each output node of a fully connected (FC) layer depends on all inputs of the FC layer, while the value of each output node of a convolution layer depends on only one area of the convolution layer input. This area is the receptive field; input values outside it do not affect the output value. Because pixels outside the receptive field do not affect the feature vector on the feature map, a neural network that relies on a single feature vector is unlikely to find an object lying outside the corresponding input receptive field.
For general tasks, the larger the receptive field, the better. For example, the receptive field of the last convolution layer in image classification should be larger than the input image. The deeper the network, the larger the receptive field and the better the performance. Dense prediction tasks require the receptive field of the output feature map to be large enough that important information is not ignored when making decisions. Generally, the deeper the better. The anchor boxes preset in the object detection task should strictly match the receptive field. If an anchor box is too large or deviates from the receptive field, detection performance is seriously affected.
The receptive field is calculated as follows:

$$j_{out} = j_{in} \times s$$

$$r_{out} = r_{in} + (k - 1) \times dr \times j_{in}$$

where $s$ is the stride of the convolution layer, $j_{out}$ is the overall jump of the output feature map, which equals the jump of the input feature map $j_{in}$ times the stride $s$ of the current convolution layer, $k$ is the size of the convolution kernel, $dr$ is the dilated rate of the convolution kernel, $r_{in}$ is the receptive field of the input feature map, and $r_{out}$ is the receptive field of the output feature map.
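The recurrence for the receptive field can be traced layer by layer. The sketch below assumes the commonly used form r_out = r_in + (k − 1) · dr · j_in together with j_out = j_in · s, starting from r = 1 and j = 1 at the input. The three-layer stack `toy_layers` is purely illustrative and is not the DRepDet or RepPoints backbone.

```python
from typing import Iterable, List, Tuple

# Each layer is described as (kernel size k, stride s, dilated rate dr).
Layer = Tuple[int, int, int]


def receptive_field(layers: Iterable[Layer]) -> List[Tuple[int, int]]:
    """Return (r_out, j_out) after every layer, starting from r=1, j=1 at the input."""
    r, j = 1, 1
    trace = []
    for k, s, dr in layers:
        # The growth of the receptive field uses the jump of the layer's input.
        r = r + (k - 1) * dr * j
        # The jump of the output is the input jump times the stride.
        j = j * s
        trace.append((r, j))
    return trace


# Illustrative stack (an assumption, not taken from the paper):
# 3x3 conv, then a strided 3x3 conv, then a 3x3 conv with dilated rate 2.
toy_layers = [(3, 1, 1), (3, 2, 1), (3, 1, 2)]

for (k, s, dr), (r, j) in zip(toy_layers, receptive_field(toy_layers)):
    print(f"k={k} s={s} dr={dr} -> receptive field {r}, jump {j}")
```

The last layer shows why dilation is attractive: with the same kernel size and no extra stride, a dilated rate of 2 doubles the amount by which the receptive field grows.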
We compiled statistics on the area, width and height of the defects in the training set, grouped by receptive field size, as shown in Table 2. The receptive field is increased from 675 to 867. Table 2 shows that most defects require a receptive field below 291×291. The large defects are mostly harmful defects. In industrial defect detection, the detection of hazardous defects is more important, because detecting such a defect means that the product is unqualified. The detection of these defects is therefore particularly important. Ref. [21] proposed the effective receptive field (ERF) theory and found that not all pixels in the receptive field contribute equally to the output vector. In many cases, the contribution of pixels in the receptive field follows a Gaussian distribution that decays rapidly from the center to the edge, so the effective receptive field is only a part of the theoretical receptive field. In order to improve the detection performance on large-scale defects, the output feature map therefore needs a larger receptive field.

DRepDet Architecture
The overall architecture of the DRepDet network is illustrated in Fig. 3. It is a modification of the anchor-free detection network RepPoints.
The detection head (bounding box regression and classification) of the DRepDet network is the same as that of RepPoints, which is an anchor-free detection framework. It represents the object as a set of points for positioning and recognition, while most object detectors rely on rectangular bounding boxes. A bounding box considers only the rectangular spatial area of the target, without considering its shape, posture or the location of semantically important local areas. To overcome these disadvantages, RepPoints models a group of adaptive sampling points, which automatically adjust to the bounding position of the object. In the training process, this set of sampling points is used to generate a pseudo box, which is compared with the ground truth box to calculate the loss.
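To make the step from a point set to a pseudo box concrete, the sketch below uses a min-max conversion, which takes the tightest axis-aligned rectangle around the points. This is an assumption for illustration: the text does not state which conversion DRepDet uses, and the function name `pseudo_box` and the sample points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def pseudo_box(points: List[Point]) -> Box:
    """Tightest axis-aligned box around a set of (x, y) points (min-max conversion)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical sample points scattered along a long, thin defect.
sample_points = [(12.0, 40.0), (60.0, 43.0), (118.0, 41.5), (200.0, 44.0), (95.0, 38.0)]
print(pseudo_box(sample_points))
```

Because the box is derived from the points, a loss computed between this pseudo box and the ground truth box can drive the points toward the extent of the object.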

DRepDet Algorithm
The core of the DRepDet algorithm is the DResBlock module, which is similar to the residual module of the ResNeXt network. Each branch of the ResNeXt residual module uses convolution kernels with the same size, width and height, so the extracted features cannot represent objects with large differences in size and shape well. The proposed DResBlock module uses multiple groups of group convolution with different dilated rates to solve this problem. When designing the DResBlock module, we adopted a bottleneck design similar to ResNeXt and mainly considered the following four schemes (in these schemes and the corresponding figures, d denotes dimension):
1) Four groups of 32×4d group convolutions with different dilated rates. The results of the four groups of convolutions are added element-wise and then added to the residual as the output, as shown in Fig. 4 (a).
2) Four groups of 8×4d group convolutions with different dilated rates. The input channels of each group are reduced by a 1×1 convolution. The results of the four groups of convolutions are concatenated and the residual is added as the output, as shown in Fig. 4 (b).
3) Four groups of 8×4d group convolutions with different dilated rates. After the 1×1 convolution, the channel dimension is directly divided into 4 groups. The results of the 4 groups of convolutions are concatenated and the residual is added as the output, as shown in Fig. 4 (c).
4) Four groups of 32×4d group convolutions with different dilated rates. The 4 groups of results are concatenated, passed through a 1×1 convolution and combined with the residual connection as the output, as shown in Fig. 4 (d).
The purpose of the DResBlock module is to expand the receptive field to match the size of weld defects. The receptive fields obtained by the above four schemes are the same, but the model complexity and model performance are slightly different. We conduct a detailed experimental comparison in Section 4.2.
The backbone network of the DRepDet algorithm is built on DResBlock and follows the ResNeXt architecture. There is a stem module and four convolution stages. The stages are composed of 3, 4, 6 and 3 bottleneck convolution blocks, respectively. From the second convolution stage, the convolution layer of 3×3

Loss Function
DRepDet is a one-stage detector without a Region Proposal Network (RPN), so positive and negative samples are extremely imbalanced. To alleviate this imbalance, the classification loss used to train one-stage detectors is mostly Focal Loss, and we follow RetinaNet in supervising classification. SmoothL1Loss is a common regression loss for two-stage detectors, but it is sensitive to scale: for the same SmoothL1Loss value, the IoU may vary greatly, which has a great impact on one-stage detectors that have no candidate proposal mechanism. Therefore, during the training of DRepDet, we choose GIoU loss to supervise the bounding box regression. GIoU loss is used in both the initial stage and the refine stage.

$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$

where $\alpha_t$ is a balance variant, $\gamma$ is a tunable focusing parameter that adjusts the rate at which easy examples are down-weighted, and $p_t$ is the predicted classification score.

$$\mathrm{GIoU} = \frac{|A \cap B|}{|A \cup B|} - \frac{|C \setminus (A \cup B)|}{|C|}$$

where $A$ and $B$ are bounding boxes, $C$ is the smallest enclosing object of $A$ and $B$, and $C \setminus (A \cup B)$ is the area occupied by $C$ excluding $A$ and $B$.
The final loss $L$ is a weighted sum of the classification loss $L_{class}$ and the two-stage regression losses ($L_{regress\text{-}init}$, $L_{regress\text{-}refine}$), as in formula (5):

$$L = \lambda_1 L_{class} + \lambda_2 L_{regress\text{-}init} + \lambda_3 L_{regress\text{-}refine} \qquad (5)$$

where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the weights of the three terms.
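The loss terms can be sketched for a single prediction as follows. This assumes the standard focal loss form from RetinaNet, −α_t (1 − p_t)^γ log(p_t), and the usual GIoU definition with the loss taken as 1 − GIoU. The default values of `alpha_t` and `gamma` and the `weights` tuple are illustrative assumptions; the text does not state the values used for DRepDet.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed to have positive area


def focal_loss(p_t: float, alpha_t: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for one sample; p_t in (0, 1] is the score of the true class."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def giou(a: Box, b: Box) -> float:
    """Generalized IoU of two axis-aligned boxes, in the range (-1, 1]."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # C: smallest axis-aligned box enclosing both A and B.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose


def giou_loss(a: Box, b: Box) -> float:
    return 1.0 - giou(a, b)


def total_loss(l_class: float, l_init: float, l_refine: float,
               weights: Tuple[float, float, float] = (1.0, 0.5, 1.0)) -> float:
    """Weighted sum of the three terms; the weights here are hypothetical."""
    return weights[0] * l_class + weights[1] * l_init + weights[2] * l_refine


gt = (0.0, 0.0, 100.0, 10.0)
init_box, refined_box = (10.0, 0.0, 90.0, 12.0), (2.0, 0.0, 100.0, 10.0)
print(total_loss(focal_loss(0.8), giou_loss(init_box, gt), giou_loss(refined_box, gt)))
```

Unlike SmoothL1Loss on coordinates, the GIoU term is a ratio of areas, so the same relative misalignment gives the same loss for a 10×10 defect and a 7 000×300 one.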

Data Sets and Settings
We have conducted extensive experiments on our private weld defect dataset to validate the effectiveness of the algorithm proposed in this paper. The weld defect images are collected from on-field pressure vessels, with a resolution of about 8 000×2 000. The dataset includes 5 400 annotated images, 80% for training and 20% for validation. Because the images are so large, the training images are cut into 4 000×1 000 crops with an overlap rate of 50%. Image cutting is not performed during inference.
We use the stochastic gradient descent (SGD) optimizer for training. The batch size is 8, using 4 GPUs (two images per GPU). A model pretrained on the COCO dataset is used for fine-tuning, the initial learning rate is 0.000 5, and the learning rate schedule follows the "2×" setting [31]. Data augmentation during training uses only random horizontal flip with a probability of 0.5, and no data augmentation is used during inference.
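The cutting step can be sketched as a sliding window. The version below assumes that the 50% overlap applies along both axes and that a final crop is placed flush with the image border when the stride does not divide the image evenly; neither detail is stated in the text, and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def tile_origins(length: int, tile: int, overlap: float) -> List[int]:
    """Start offsets of windows of size `tile` along an axis of size `length`."""
    if length <= tile:
        return [0]
    stride = max(1, round(tile * (1.0 - overlap)))
    starts = list(range(0, length - tile + 1, stride))
    # Assumption: add one extra window flush with the border if a strip is left over.
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def tile_boxes(width: int, height: int, tile_w: int = 4000, tile_h: int = 1000,
               overlap: float = 0.5) -> List[Box]:
    """Crop boxes in row-major order, clipped to the image size."""
    return [
        (x, y, min(x + tile_w, width), min(y + tile_h, height))
        for y in tile_origins(height, tile_h, overlap)
        for x in tile_origins(width, tile_w, overlap)
    ]


crops = tile_boxes(8000, 2000)
print(len(crops), crops[0], crops[-1])
```

With overlapping crops, a defect that is split by one crop boundary is more likely to appear whole in a neighbouring crop, although defects longer than a crop are still truncated during training.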

Comparison of different dilated convolution experiments
In Table 3, we list the precision and recall of small defects and large defects under different dilated rates. The precision of oversized defects and defects with a large height/width ratio is increased from 60.4% to 63.5%. The AP50 and Recall50 of big defects are improved by 3.1 and 3.3 percentage points, respectively. Table 3 shows that, after using dilated convolution, the performance on small defects is hardly affected, while the performance on large defects is greatly improved. Since most of the large defects are hazardous and the number of these samples is small, their overall performance is not as good as that of the small defects. There are many long strip defects among the hazardous defects, so we design dilated rates that differ in width and height. Because there are both horizontal and vertical strip defects, and a single type of dilated convolution only improves performance on defects of the corresponding shape, we fuse the features obtained with different dilated rates. The fused features significantly improve the overall detection performance.
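The effect of a dilated rate that differs in height and width can be seen from the footprint of a single kernel. A k×k kernel with dilated rate dr spans dr · (k − 1) + 1 input positions along that axis. The rates in `hypothetical_rates` below are illustrative only and are not the rates used in DRepDet.

```python
from typing import Tuple


def dilated_extent(k: int, dr: int) -> int:
    """Number of input positions spanned along one axis by a kernel of size k with dilated rate dr."""
    return dr * (k - 1) + 1


def footprint(k: int, dilation: Tuple[int, int]) -> Tuple[int, int]:
    """(height, width) spanned by a k x k kernel with dilation (dr_h, dr_w)."""
    dr_h, dr_w = dilation
    return (dilated_extent(k, dr_h), dilated_extent(k, dr_w))


# Hypothetical (height, width) dilated rates for four parallel branches.
hypothetical_rates = [(1, 1), (3, 3), (1, 5), (5, 1)]

for rate in hypothetical_rates:
    h, w = footprint(3, rate)
    print(f"dilation {rate}: 3x3 kernel covers {h} x {w} input positions")
```

A branch that is wide but short suits horizontal strip defects and a tall, narrow branch suits vertical ones, which is why features from several branches are fused rather than relying on one dilated rate.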

Comparison of four schemes of DResBlock
We select mAP50, mRecall50, FLOPs and params as the evaluation metrics for the four schemes of DResBlock. mAP50 is a commonly used metric for evaluating overall performance across all classes; it is the average of the AP50 values of all classes. Similarly, mRecall50 is the average of the Recall50 values of all classes. In addition, we use FLOPs and the number of parameters to evaluate the computational complexity of the different models. Table 4 compares the evaluation metrics of the four schemes of DResBlock. It shows that scheme (d) has relatively high computation and memory complexity, but its precision and recall are better than those of the other schemes. Since the industrial scene of weld defects requires higher accuracy, we choose scheme (d) for subsequent experiments unless otherwise specified.
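As an illustration of the class-averaged metrics, the sketch below computes Recall50 for one class and then averages per-class values. It assumes boxes in (x1, y1, x2, y2) form and a greedy, score-ordered matching in which each ground truth box is matched at most once at IoU ≥ 0.5; the exact matching protocol is not given in the text, and the sample boxes and per-class values are made up.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall50(detections: List[Tuple[float, Box]], ground_truths: List[Box],
             threshold: float = 0.5) -> float:
    """Fraction of ground truth boxes matched by a detection with IoU >= threshold."""
    matched = set()
    # Visit detections from highest to lowest score.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            value = iou(box, gt)
            if value > best_iou:
                best_iou, best_idx = value, idx
        if best_idx is not None and best_iou >= threshold:
            matched.add(best_idx)
    return len(matched) / len(ground_truths) if ground_truths else 0.0


def macro_average(per_class: Dict[str, float]) -> float:
    """Unweighted mean over classes, as used for mAP50 and mRecall50."""
    return sum(per_class.values()) / len(per_class)


gts = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
dets = [(0.9, (0.0, 0.0, 10.0, 10.0)), (0.8, (100.0, 100.0, 110.0, 110.0))]
print(recall50(dets, gts))
print(macro_average({"porosity": 0.5, "crack": 1.0}))
```

Because the average is unweighted, a rare class such as a hazardous large defect counts as much as a frequent class such as porosity.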

Comparison of various network architectures
The results of experiments on anchor-free networks, anchor-based networks and our proposed network are illustrated in Fig. 5. Figure 5 shows that the AP50 and Recall50 of the anchor-free networks are higher than those of the anchor-based networks. The reason is that the scale and aspect ratio of weld defects vary greatly, and the preset anchors cannot cover such a large range. The performance of our proposed network is further improved because the receptive field is adjusted to the large-scale and large-aspect-ratio defects in the dataset. The final performance of the whole network is improved by a large margin: AP50 and Recall50 are 6% and 4.2% higher than those of Cascade R-CNN, and 1.4% and 2.9% higher than those of RepPoints.

Conclusion
In this paper, we first carry out a detailed multi-dimensional analysis of weld defects and find that weld defects differ greatly in size and aspect ratio. To address this problem, we analyze the receptive field of the network and study methods of expanding it. Based on these studies, we propose the DResBlock module, which uses multi-branch convolution layers with different dilated rates to solve the problem of insufficient receptive field. Based on the DResBlock module, we design the DRepDet model to detect weld defects. DRepDet can detect seven kinds of weld defects, and the receptive field is increased from 675 to 867. The precision of oversized defects and defects with a large height/width ratio is increased from 60.4% to 63.5%, and the precision of defects with normal size is also improved.

Fig. 1 Statistics of height/width ratio of weld defects

Defect types: a: porosity; b: slag inclusion; c: intensive dispersed; d: intensive chain; e: lack of fusion; f: lack of penetration; g: crack; * is the receptive field of the RepPoints backbone.

field of RepPoints and the scale of defects. The receptive field of the original backbone network of RepPoints cannot meet the distribution requirements of weld defects. Therefore, we design a multi-branch convolution module, DResBlock, with different dilated rates to solve the problem that the scale of defects changes greatly while the receptive field is not large enough. The backbone network of RepPoints is based on RetinaNet, generating five pyramid levels from stage 3 (down-sampling rate of 8) to stage 7 (down-sampling rate of 128). In order to expand the receptive field, we replace stage 4 with DResBlock. As shown in the dashed box at top center in Fig. 3, this module uses four groups of convolutions with dilated rates (dr) of (1,1),(2,