Wuhan Univ. J. Nat. Sci.
Volume 28, Number 2, April 2023



Pages: 150-162
DOI: https://doi.org/10.1051/wujns/2023282150
Published online: 23 May 2023
Computer Science
CLC number: U 495
Complex Traffic Scene Image Classification Based on Sparse Optimization Boundary Semantics Deep Learning
^{1} School of Electronic & Control Engineering, Chang'an University, Xi'an 710064, Shaanxi, China
^{†} To whom correspondence should be addressed. Email: conquest8888@126.com
Received: 18 July 2022
With the rapid development of intelligent traffic information monitoring technology, accurate identification of vehicles, pedestrians and other objects on the road has become particularly important. Therefore, to improve the recognition and classification accuracy of image objects in complex traffic scenes, this paper proposes a segmentation method that redefines semantics using the image boundary region. First, the SegNet semantic segmentation model is used to obtain rough classification features of the vehicle-road objects; then the simple linear iterative clustering (SLIC) algorithm is used to obtain the over-segmented regions of the image, which determine the classification of each pixel within each superpixel region, and the target segmentation of boundaries and small areas in the vehicle-road image is optimized. Finally, the edge recovery ability of the conditional random field (CRF) is used to refine the image boundary. The experimental results show that, compared with FCN8s and SegNet, the pixel accuracy of the proposed algorithm improves by 2.33% and 0.57%, respectively. Compared with Unet, the proposed algorithm also performs better in multi-target segmentation.
Key words: traffic scene / SegNet / image classification / simple linear iterative clustering (SLIC) / conditional random field / boundary number
Biography: ZHOU Xiwei, male, Ph.D. candidate, research direction: deep learning and embedded system. Email: zhouxiwei@chd.edu.cn
Foundation item: Supported in part by the Shaanxi Natural Science Basic Research Program (2022JM298), the National Natural Science Foundation of China (52172324), the Shaanxi Provincial Key Research and Development Program (2021SF483) and the Science and Technology Project of Shaanxi Provincial Transportation Department (21202K, 2038T)
© Wuhan University 2023
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
0 Introduction
With the continuous growth of private car ownership, serious traffic accidents occur frequently on expressways, causing economic losses. To address traffic safety problems and protect passengers' lives and property, the concept of the Advanced Driver Assistance System (ADAS) has been proposed. ADAS mainly uses a variety of on-board sensors to obtain environmental information inside and outside the vehicle; through appropriate information processing, analysis, fusion and decision-making control, it perceives the driver's state, the vehicle's status and the external environment in real time, and judges the potential dangers the vehicle may face. Warnings can be issued when necessary, or the vehicle control systems can intervene directly to perform part of the operation and improve driving safety^{[1,2]}.
According to statistics in the literature, more than 90% of environmental information is acquired by visual means^{[3]}. As one of the most effective perception methods in an ADAS, visual perception can provide the most intuitive, reliable and abundant environmental information for the system. Current research on visual sensing of the traffic environment is divided into object detection algorithms and image segmentation algorithms. The segmentation of road images is one of the most basic and important research fields, and target segmentation algorithms are mainly deep-learning algorithms based on convolutional neural networks and traditional machine learning algorithms.
Traditional image segmentation algorithms include methods based on image thresholding^{[4-6]}, edge detection^{[7-10]} and region-based segmentation^{[11-14]}, etc. Ref.[4] proposed an adaptive gray enhancement and linear-region threshold segmentation algorithm, which enhances the gray contrast between target and background, avoids the drawbacks of single-threshold segmentation, and improves the accuracy of target recognition and measurement. Ref.[5] proposed a multilevel threshold optimization algorithm incorporating Kapur entropy, which uses the whale optimization algorithm (WOA) to improve segmentation accuracy. Ref.[12] proposed an image segmentation method based on sector ring regions, by which the object and background in the image can be accurately separated. Ref.[15] divided the image into areas and used an improved Hough transform together with the tangent relationship of the lane-line model to recognize and reconstruct lane lines. However, due to the complexity of road scenes and the richness of target categories, these traditional image segmentation methods cannot accurately distinguish target categories, and the probability of missed and false detections is high in practical applications. Moreover, these algorithms are computationally slow, offer poor real-time performance, and have large limitations in real scenes. Since 2012, deep-learning algorithms have been rapidly applied to target recognition^{[16-19]}, target detection^{[20-22]} and other tasks, achieving remarkable results. Deep learning is widely used in image semantic segmentation and intelligent vehicle assisted driving, providing reliable guidance and decision-making for assisted or active driving. The Fully Convolutional Network (FCN) proposed by Long et al^{[23]} is used for image semantic segmentation and pixel-level classification.
By replacing the fully connected layers of a traditional convolutional neural network (CNN) with convolutional layers, the fine-tuned model can classify images of arbitrary size. However, the disadvantage of FCN is that its results are not precise enough and lack spatial consistency. The SegNet algorithm proposed by Badrinarayanan et al^{[24]} is used for semantic image segmentation. Its central idea, based on FCN, is a symmetrical encoder-decoder structure. The encoder uses the first 13 convolutional layers of VGG16, and each encoder layer corresponds to a decoder layer. Finally, the decoder output is sent to a softmax classifier to generate class probabilities for each pixel independently, which improves segmentation accuracy. Gan et al^{[25]} proposed an improved Unet model with a self-attention mechanism, which brings some improvement in segmenting similar regions. Qian et al^{[26]} improved the R-CNN network and introduced a feature pyramid network into the backbone, achieving higher accuracy in traffic sign segmentation. Ref.[27] fused polarization features and intensity images under low-visibility conditions and used a deep neural network to segment and identify the vehicle-road environment, ultimately improving perception under adverse weather. Although deep learning has developed rapidly in image segmentation, multi-target recognition and segmentation in complex traffic environments under changing conditions remains difficult: targets are varied, occlude one another, and contain many similar regions. Existing networks therefore need higher accuracy and robustness.
Based on the above ideas, this paper proposes a target boundary optimization algorithm for complex vehicle-road images. The SegNet^{[28]} network model is used to classify the different target categories in road images, but its modeling ability is insufficient, resulting in poor network stability and low target classification accuracy. Therefore, the classification must be further optimized to improve the overall performance of image classification. In this paper, two prior constraints are introduced using superpixel features. The first is that adjacent pixels are correlated and likely to belong to the same category. The second is that the boundary information of the label map and of the original image is basically the same. With these two constraints, the boundaries of the road images segmented by the SegNet network can be optimized.
1 Overall Design
The proposed image boundary optimization algorithm feeds the boundary and contour information extracted from the original road image by superpixels back into the image classified by the convolutional neural network, further refining the preliminary model and achieving accurate classification of complex road targets. Firstly, the SegNet algorithm is used to extract pixel-level target features. Then, the simple linear iterative clustering (SLIC)^{[19-24,29]} algorithm is used to extract superpixel blocks and edge information, and the image boundary is optimized by combining the pixel-level and superpixel-block features. Finally, the precise edge recovery capability of the Conditional Random Field (CRF) is used to optimize the segmentation results. This paper optimizes the boundaries of road images and the false segmentation of small-area targets. The specific scheme framework is shown in Fig.1.
Fig.1 The flow chart of the proposed method 
From Fig.1, we can see that the road image is semantically classified by the SegNet algorithm and over-segmented into superpixels by the SLIC algorithm. The boundary optimization algorithm proposed in this paper is then used to optimize the image boundary and to handle local false segmentation. However, this method is not effective for slender objects, so the boundary recovery ability of the conditional random field is finally used to restore the edge information of such objects and further improve the effect of target classification.
2 Semantic Segmentation Algorithm Based on SuperPixel Boundary Optimization
The boundary optimization algorithm based on superpixel is mainly divided into three parts:
1) Pixel classification: a convolutional neural network is used to obtain image features and to classify and recognize each pixel.
2) Region segmentation: pixels with similar features in the street-scene image are combined into several representative regions to obtain the over-segmented regions of the image, and the classification of the pixels in each region is determined in combination with the neural network.
3) Recognition of pixel categories: according to the classification within each region, the pixels of the whole image are reclassified. The flow chart of the boundary optimization algorithm is shown in Fig.2.
Fig.2 Image semantic segmentation based on superpixel 
2.1 Semantic Segmentation Algorithm Based on Boundary Optimization
In this paper, the SegNet^{[28]} network is used to classify images at the pixel level. SegNet is mainly composed of an encoder and a decoder; the network model is shown in Fig.3. The encoding process is based on the pre-trained VGG16 network, retaining its 13 convolutional layers to extract image features and discarding only the last three fully connected layers, which greatly reduces the number of parameters to learn. The SegNet network has 13 convolutional layers, 5 pooling layers, 13 deconvolution layers and 5 upsampling layers. Each pooling layer uses a 2×2 window with a stride of 2, so each pooling halves the resolution of the image. During each max pooling, the location of the maximum value in each pooling window of the feature maps is recorded. The decoding process uses these recorded max-pooling indices to upsample the feature maps of the convolved and pooled input image. The upsampling processes of SegNet and FCN are shown in Fig.4(a) and (b), respectively.
Fig. 3 The model of SegNet network structure 
Fig. 4 Upsampling differences between SegNet and FCN 
In Fig.4(a), SegNet maps the feature-map values 1, 2, 3, 4 to a new feature map at the previously saved max-pooling coordinates; in Fig.4(b), FCN deconvolves the feature values 1, 2, 3 and 4 and adds the resulting map to the corresponding convolutional feature map.
The test accuracy of the trained SegNet model is higher than that of the FCN algorithm, and SegNet is also used for image segmentation of the vehicle-road environment. By restoring the original image details hierarchically with multiple decoders, better boundary accuracy can be obtained to a certain extent.
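The index-based upsampling that distinguishes SegNet from FCN can be sketched in NumPy (a toy illustration of the mechanism with our own function names, not the actual network layers):

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """2x2 max pooling that also records the argmax locations,
    as SegNet's encoder does (NumPy sketch, not a real layer)."""
    h, w = x.shape
    ph, pw = h // k, w // k
    pooled = np.zeros((ph, pw), dtype=x.dtype)
    indices = np.zeros((ph, pw), dtype=np.int64)  # flat index into x
    for i in range(ph):
        for j in range(pw):
            window = x[i*k:(i+1)*k, j*k:(j+1)*k]
            di, dj = divmod(np.argmax(window), k)
            pooled[i, j] = window[di, dj]
            indices[i, j] = (i*k + di) * w + (j*k + dj)
    return pooled, indices

def max_unpool(pooled, indices, out_shape):
    """SegNet-style upsampling: place each value back at its recorded
    argmax position; every other position stays zero."""
    out = np.zeros(out_shape, dtype=pooled.dtype).ravel()
    out[indices.ravel()] = pooled.ravel()
    return out.reshape(out_shape)

x = np.array([[1., 2., 5., 3.],
              [4., 0., 1., 2.],
              [7., 1., 0., 1.],
              [2., 3., 4., 6.]])
p, idx = max_pool_with_indices(x)
u = max_unpool(p, idx, x.shape)   # sparse map, maxima restored in place
```

Unlike FCN's deconvolution, no weights are learned for this step: the sparse map is densified by the subsequent (trainable) convolutions of the decoder.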
The classification of each target is obtained through the convolution, pooling and deconvolution layers, and each pixel is classified by the Softmax function:
${S}_{i}^{p}=\frac{{e}^{{c}_{i}^{p}}}{{\sum}_{j=\mathrm{1}}^{N}{e}^{{c}_{j}^{p}}}$(1)
In image classification, ${S}_{i}^{p}$ denotes the probability that pixel p belongs to class i, N is the total number of categories, and ${c}_{i}^{p}$ is the score of pixel p for class i in the score map. According to the error between the predicted value and the true value, the cross-entropy loss function is constructed:
${l}_{p}=-{\sum}_{i}{y}_{i}^{p}\mathrm{l}\mathrm{n}\,{S}_{i}^{p}$(2)
where ${y}_{i}^{p}$ denotes the true probability that the pixel p belongs to class i, and ${y}_{i}^{p}$ is defined as follows:
$y(\mathrm{1})=\left[\begin{array}{c}\mathrm{1}\\ \mathrm{0}\\ \mathrm{0}\\ \vdots \\ \mathrm{0}\end{array}\right],y(\mathrm{2})=\left[\begin{array}{c}\mathrm{0}\\ \mathrm{1}\\ \mathrm{0}\\ \vdots \\ \mathrm{0}\end{array}\right],y(\mathrm{3})=\left[\begin{array}{c}\mathrm{0}\\ \mathrm{0}\\ \mathrm{1}\\ \vdots \\ \mathrm{0}\end{array}\right],\dots ,y(k-\mathrm{1})=\left[\begin{array}{c}\mathrm{0}\\ \mathrm{0}\\ \vdots \\ \mathrm{1}\\ \mathrm{0}\end{array}\right],y(k)=\left[\begin{array}{c}\mathrm{0}\\ \mathrm{0}\\ \mathrm{0}\\ \vdots \\ \mathrm{1}\end{array}\right]$(3)
For example, if a pixel belongs to the third class, the cross-entropy loss measures the error between the predicted probabilities and the true vector y(3); the smaller ${l}_{p}$ is, the more accurate the prediction. However, due to the complexity of road images and the ambiguity of some target boundaries, it is impossible to classify all targets correctly using the SegNet network alone.
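Equations (1) and (2) can be checked with a small NumPy sketch (shapes, names and example values are our own illustration):

```python
import numpy as np

def pixel_softmax(scores):
    """Per-pixel softmax over class scores, Eq. (1).
    scores: (N_classes, H, W) score map; returns class probabilities."""
    e = np.exp(scores - scores.max(axis=0, keepdims=True))  # stabilized
    return e / e.sum(axis=0, keepdims=True)

def cross_entropy(probs, labels):
    """Mean per-pixel cross-entropy loss, Eq. (2), with the one-hot
    target given as an integer label map."""
    h, w = labels.shape
    p = probs[labels, np.arange(h)[:, None], np.arange(w)[None, :]]
    return -np.log(p).mean()

scores = np.zeros((3, 2, 2))
scores[1] = 10.0                       # class 1 dominates everywhere
probs = pixel_softmax(scores)
loss_good = cross_entropy(probs, np.ones((2, 2), dtype=int))   # correct labels
loss_bad = cross_entropy(probs, np.zeros((2, 2), dtype=int))   # wrong labels
```

As expected, the loss is near zero when the predicted distribution matches the labels and large when it does not.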
2.2 SLIC Algorithm
In this paper, the SLIC algorithm is used for image region segmentation. SLIC converts the image from the RGB color space to the CIELAB color space. The color values (l, a, b) and the (x, y) coordinates of each pixel form a five-dimensional vector V = [l, a, b, x, y], and the similarity of two pixels is measured by their vector distance: the larger the distance, the smaller the similarity. The detailed flow of the SLIC algorithm is as follows:
Step 1: Initialization. Set the number of superpixels to K for a color image with N pixels in the CIELAB color space. The cluster centers are initialized as ${\mathit{C}}_{i}=({l}_{i},{a}_{i},{b}_{i},{x}_{i},{y}_{i}{)}^{\mathrm{T}}$, and the grid step between cluster centers is
$S=\sqrt[]{N/K}$(4)
Step 2: Clustering. Class labels are assigned to each pixel by searching the neighborhood of each seed point; the distance of every pixel to its cluster center is initialized to infinity. The distance between a pixel j and a seed point i combines the CIELAB color distance ${d}_{\mathrm{c}}$ and the spatial coordinate distance ${d}_{\mathrm{s}}$ into a comprehensive distance D:
${d}_{\mathrm{c}}=\sqrt[]{{\left({l}_{j}-{l}_{i}\right)}^{\mathrm{2}}+{\left({a}_{j}-{a}_{i}\right)}^{\mathrm{2}}+{\left({b}_{j}-{b}_{i}\right)}^{\mathrm{2}}}$(5)
${d}_{\mathrm{s}}=\sqrt[]{{\left({x}_{j}-{x}_{i}\right)}^{\mathrm{2}}+{\left({y}_{j}-{y}_{i}\right)}^{\mathrm{2}}}$(6)
$D=\sqrt[]{{\left(\frac{{d}_{\mathrm{c}}}{{N}_{\mathrm{c}}}\right)}^{\mathrm{2}}+{\left(\frac{{d}_{\mathrm{s}}}{{N}_{\mathrm{s}}}\right)}^{\mathrm{2}}}$(7)
${N}_{\mathrm{s}}$ is the maximum spatial distance within a class, defined as ${N}_{\mathrm{s}}$ = S, which suits each cluster. The maximum color distance ${N}_{\mathrm{c}}$ is replaced in practice by a fixed compactness constant m, giving the final distance metric $D\text{'}$:
$D\text{'}=\sqrt[]{{\left(\frac{{d}_{\mathrm{c}}}{m}\right)}^{\mathrm{2}}+{\left(\frac{{d}_{\mathrm{s}}}{S}\right)}^{\mathrm{2}}}=\frac{\mathrm{1}}{m}\sqrt[]{{{d}_{\mathrm{c}}}^{\mathrm{2}}+{\left(\frac{{d}_{\mathrm{s}}}{S}\right)}^{\mathrm{2}}{m}^{\mathrm{2}}}$(8)
Step 3: Iterative optimization. Repeat Step 2 until all clustering centers do not change.
Step 4: Remove outliers and enhance connectivity. Different values of the number of superpixels K and the maximum color distance m produce different segmentation effects, as shown in Fig.5. Figure 5 shows that by representing massive pixel data with only hundreds or thousands of superpixels, we not only obtain clear edge regions of the image and accurate image boundaries, but also greatly reduce the computational complexity and improve computational efficiency, serving efficient and flexible perception of road environment information.
Fig.5 Results of superpixel segmentation with different parameters 
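The distance metric of Eq. (8) can be illustrated with a small NumPy sketch (the step S follows the standard SLIC grid step; the compactness m and the toy pixel values below are illustrative, not the paper's settings):

```python
import numpy as np

def slic_distance(px, center, S, m):
    """Combined SLIC distance D' of Eq. (8) between a pixel and a
    cluster center, both given as (l, a, b, x, y) vectors.
    S is the grid step; m weights color against spatial proximity."""
    dc = np.linalg.norm(px[:3] - center[:3])   # CIELAB color distance, Eq. (5)
    ds = np.linalg.norm(px[3:] - center[3:])   # spatial distance
    return np.sqrt((dc / m) ** 2 + (ds / S) ** 2)

# toy check: with equal color distance, the spatially closer center wins
N, K = 10000, 100
S = np.sqrt(N / K)                             # grid step between seeds
px = np.array([50., 0., 0., 12., 12.])
near = np.array([50., 5., 0., 10., 10.])
far = np.array([50., 5., 0., 40., 40.])
d_near = slic_distance(px, near, S, m=10.0)
d_far = slic_distance(px, far, S, m=10.0)
```

Increasing m makes superpixels more compact (spatial term dominates); decreasing it makes them hug color edges more tightly, which matches the parameter effects shown in Fig.5.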
2.3 Reclassification
For the reclassification of boundary and mis-segmented pixels in each region, the specific steps are as follows. All the pixels in a superpixel of size S form $C=\{{C}_{\mathrm{1}},{C}_{\mathrm{2}},\cdots ,{C}_{S}\}$, and the number of pixel labels of each class in this superpixel is counted:
${n}_{p}=\{{n}_{\mathrm{1}},{n}_{\mathrm{2}},{n}_{\mathrm{3}},\cdots ,{n}_{K}\}$(9)
${n}_{i}$ indicates the number of pixels labeled as class $i$ in region $P$. The maximum value ${n}_{i}$ in ${n}_{p}$ is found, and the pixels in region $P$ are classified as class $i$. However, when the maximum count ${n}_{i}$ is close to the second-largest count ${n}_{j}$ in ${n}_{p}$, it is impossible to determine whether region $P$ belongs to class $i$ or class $j$; since the neural network simply outputs the class with the highest probability, false segmentation can occur. We therefore define a threshold $T$:
$T=\frac{{n}_{i}-{n}_{j}}{S}$(10)
In this paper, the threshold is 0.2. If $T$ is larger than 0.2, the pixels in region $P$ are classified as class $i$; otherwise, they keep the semantic segmentation result of the convolutional neural network.
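The threshold rule of Eq. (10) can be sketched as follows (a minimal illustration; the function name and example labels are our own):

```python
import numpy as np

def reclassify_superpixel(labels_in_sp, T=0.2):
    """Relabel one superpixel from its pixels' network labels, Eq. (10).
    If the gap between the most and second-most frequent labels,
    normalized by the superpixel size S, exceeds T, the whole superpixel
    takes the majority label; otherwise the network labels are kept."""
    counts = np.bincount(labels_in_sp)
    order = np.argsort(counts)[::-1]            # labels by descending count
    n_i = counts[order[0]]
    n_j = counts[order[1]] if len(order) > 1 else 0
    S = labels_in_sp.size
    if (n_i - n_j) / S > T:
        return np.full_like(labels_in_sp, order[0])
    return labels_in_sp

clear = np.array([2, 2, 2, 2, 2, 2, 2, 1, 1, 0])   # class 2 dominates
ambiguous = np.array([1, 1, 1, 0, 0, 0, 2])        # top two classes tie
out_clear = reclassify_superpixel(clear)
out_amb = reclassify_superpixel(ambiguous)
```

In the first case the gap is (7 - 2)/10 = 0.5 > 0.2, so the whole region is relabeled; in the second the gap is 0, so the network's per-pixel result is kept.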
2.4 Specific Flow of Boundary Optimization Algorithms
There are many ways to optimize an image boundary; the method used in this paper combines superpixels with a convolutional neural network. Firstly, the SegNet semantic segmentation algorithm based on the VGG16 network model is used to obtain a rough segmentation of the image and extract rough features. The SLIC algorithm then over-segments the image into superpixels, and the rough features are optimized along the boundaries of these superpixel objects. This method improves the accuracy of object boundary segmentation to a certain extent. The boundary optimization procedure is given in Algorithm 1.
The flow of Algorithm 1 is as follows:
Step 1: Input original image I and rough feature graph L extracted by SegNet algorithm.
Step 2: After segmenting the original image with the SLIC algorithm, K superpixels ${S}_{p}=\{{S}_{\mathrm{1}},{S}_{\mathrm{2}},\cdots ,{S}_{K}\}$ are obtained, and each superpixel region is marked with a label i.
Step 3: for $i=\mathrm{1}:K$
1) All the pixels in the superpixel ${S}_{i}$ are ${S}_{i}=\{{C}_{\mathrm{1}},{C}_{\mathrm{2}},\cdots ,{C}_{N}\}$, where each pixel ${C}_{j}$ carries a class label from the feature map;
2) Initialize the number of occurrences of the same label to n = 0;
3) for $j=\mathrm{1}:N$
Save the feature label of ${C}_{j}$ as ${L}_{{C}_{j}}$ and, while traversing the whole superpixel, count the number of pixels with the same label:
${n}_{j}=\sum {C}_{j}$(11)
${W}_{{C}_{j}}$ is the proportion of pixels with the same label ${L}_{{C}_{j}}$:
${W}_{{C}_{j}}=\frac{{n}_{j}}{N}$(12)
4) Redistribution of labels.
If ${W}_{{C}_{j}}$ > 0.8, mark the superpixel with the corresponding label ${L}_{{C}_{\mathrm{m}\mathrm{a}\mathrm{x}}}$ and jump to Step 4.
Else search for the maximum ${W}_{\mathrm{m}\mathrm{a}\mathrm{x}}$ and the submaximum ${W}_{\mathrm{s}\mathrm{u}\mathrm{b}}$; if ${W}_{\mathrm{m}\mathrm{a}\mathrm{x}}-{W}_{\mathrm{s}\mathrm{u}\mathrm{b}}$ > 0.2, mark the superpixel with the label ${L}_{{C}_{\mathrm{m}\mathrm{a}\mathrm{x}}}$ corresponding to ${W}_{\mathrm{m}\mathrm{a}\mathrm{x}}$, and jump to Step 4.
Else mark the superpixel with the classes from the original semantic segmentation result.
Step 4: Use ${L}_{{C}_{\mathrm{m}\mathrm{a}\mathrm{x}}}$ to reassign the current superpixel classification and output the image ${I}^{\text{'}}$.
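The per-superpixel relabeling loop can be sketched in NumPy (an illustrative reading of Algorithm 1, not the authors' code; the thresholds 0.8 and 0.2 follow the text):

```python
import numpy as np

def boundary_optimize(label_map, sp_map, n_classes, w_hi=0.8, w_gap=0.2):
    """Algorithm 1 sketch: for each SLIC superpixel, compute the label
    proportions W from the rough CNN label map; relabel the whole
    superpixel when one label clearly dominates (W > 0.8) or the gap
    between the top two labels exceeds 0.2, otherwise keep the
    network's per-pixel labels."""
    out = label_map.copy()
    for sp in np.unique(sp_map):
        mask = sp_map == sp
        counts = np.bincount(label_map[mask], minlength=n_classes)
        W = counts / counts.sum()
        top = np.argsort(W)[::-1]
        w_max, w_sub = W[top[0]], W[top[1]]
        if w_max > w_hi or (w_max - w_sub) > w_gap:
            out[mask] = top[0]          # reassign the whole superpixel
    return out

labels = np.array([[0, 0, 1, 2],        # rough CNN labels (toy)
                   [0, 1, 1, 2]])
sps = np.array([[0, 0, 1, 1],           # two superpixels (toy)
                [0, 0, 1, 1]])
refined = boundary_optimize(labels, sps, n_classes=3)
```

In the toy example the left superpixel is cleaned up (gap 0.75 - 0.25 > 0.2), while the ambiguous right superpixel keeps its network labels, which is exactly the "spot"-removal behavior described above.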
Through this algorithm, we can optimize the boundary of the road image and handle small mis-segmented regions, which appear as many small "spots" in the image. Through SLIC processing, these "spots" can be eliminated to a certain extent by their similarity with the labels of the surrounding superpixel blocks. The specific effect is shown in Fig.6.
Fig. 6 The effect diagram of local optimization 
The two images on the left side of Fig.6 superimpose the SegNet segmentation of the vehicle-road image with the superpixel segmentation, before boundary optimization. As the enlarged parts 1, 2, 3 and 4 of the image show, the image boundary and some local areas are not correctly classified. After the optimization in this section, the image boundary is refined and the erroneous classification of small local areas is eliminated: parts 1 and 2 of the enlarged views show the optimized boundary, and parts 3 and 4 show small "spots" that are now correctly classified. The specific optimization diagram is shown in Fig.7.
Fig. 7 Specific optimization diagram 
2.5 Boundary Restoration
Although the superpixel boundary optimization algorithm can optimize the road image boundary and the wrongly segmented areas, the optimized image boundary is no longer smooth, and slender objects such as poles and fences are easily broken into several sub-blocks. Therefore, the CRF model is used in this section to refine the image boundary and restore it more accurately.
CRF^{[30]} is a typical discriminative model, similar to a probabilistic undirected graph model. If each pixel label in the image is regarded as a node and the weight relationship between two adjacent pixels as an edge, a graph $G=(V,E)$ is obtained by the method of graph theory, where $V=\{{V}_{\mathrm{1}},{V}_{\mathrm{2}},\cdots ,{V}_{N}\}$ is the vertex set and $E=\{{E}_{\mathrm{1}},{E}_{\mathrm{2}},\cdots ,{E}_{N}\}$ is the edge set; each edge ${E}_{i}$ in $E$ connects two vertices in $V$. Assume the label assignment of the whole image is $L=\{{l}_{\mathrm{1}},{l}_{\mathrm{2}},\cdots ,{l}_{n}\}$, where ${l}_{i}$ is the label of pixel $i$ and $N$ is the number of pixels of the input image I. With normalization, the final conditional probability of (I, L) is
$P(L=l\mid I)=\frac{\mathrm{1}}{Z(I)}\cdot \mathrm{e}\mathrm{x}\mathrm{p}(-E(l\mid I))$(13)
where $E(l\mid I)$ is the Gibbs energy of the labeling $l\in {L}^{N}$ and $Z(I)$ is the partition function. The energy function of the fully connected CRF model is
$E(l)=\sum _{i}{\psi}_{i}({l}_{i})+\sum _{i,j}{\psi}_{i,j}({l}_{i},{l}_{j})$(14)
Among them, ${\psi}_{i}({l}_{i})$ is the unary potential, the probability of pixel i taking label ${l}_{i}$, which comes from the superpixel-optimized output of the front end; ${\psi}_{i,j}({l}_{i},{l}_{j})$ is the pairwise potential for assigning labels ${l}_{i}$, ${l}_{j}$ to pixels i, j simultaneously: similar pixels are assigned the same label, while pixels with large differences are assigned different labels. The unary potential can be regarded as the boundary-optimized feature map, which helps improve the performance of the CRF model. The pairwise potential models the relationship between adjacent pixels, weighted by color similarity. Its expression is as follows:
$\begin{array}{l}{\psi}_{i,j}({l}_{i},{l}_{j})=\mu ({l}_{i},{l}_{j})[{\omega}_{\mathrm{1}}\mathrm{e}\mathrm{x}\mathrm{p}(-\frac{{\Vert {\mathit{P}}_{i}-{\mathit{P}}_{j}\Vert}^{\mathrm{2}}}{\mathrm{2}{\sigma}_{\alpha}^{\mathrm{2}}}-\frac{{\Vert {\mathit{I}}_{i}-{\mathit{I}}_{j}\Vert}^{\mathrm{2}}}{\mathrm{2}{\sigma}_{\beta}^{\mathrm{2}}})\\ +{\omega}_{\mathrm{2}}\mathrm{e}\mathrm{x}\mathrm{p}(-\frac{{\Vert {\mathit{P}}_{i}-{\mathit{P}}_{j}\Vert}^{\mathrm{2}}}{\mathrm{2}{\sigma}_{\gamma}^{\mathrm{2}}})]\end{array}$(15)
Among them, ${\mathit{I}}_{i}$ and ${\mathit{I}}_{j}$ are color vectors and ${\mathit{P}}_{i}$ and ${\mathit{P}}_{j}$ are pixel positions, so the value of the pairwise potential depends on position and color information. ${\sigma}_{\alpha}$ and ${\sigma}_{\beta}$ control the proximity and similarity between two pixels. As in the Potts model^{[31]}, $\mu ({l}_{i},{l}_{j})$ is 1 if ${l}_{i}\ne {l}_{j}$ and 0 otherwise, so a penalty is incurred only when nearby similar pixels are assigned different labels; similar pixels are thereby encouraged to take the same label.
Here "distance" refers to both the color-space distance and the actual spatial distance. Therefore, the accuracy of image boundary optimization can be improved by the CRF algorithm. Figure 8 shows an enlarged image of local details optimized by the CRF algorithm.
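The pairwise potential of Eq. (15) can be sketched as follows (the kernel widths and weights are illustrative defaults, not the paper's settings):

```python
import numpy as np

def pairwise_potential(li, lj, Pi, Pj, Ii, Ij,
                       w1=1.0, w2=1.0, sa=10.0, sb=10.0, sg=3.0):
    """Fully connected CRF pairwise term of Eq. (15): an appearance
    kernel (position + color) plus a smoothness kernel (position only),
    gated by the Potts label compatibility mu."""
    if li == lj:                 # Potts model: no penalty for equal labels
        return 0.0
    dp2 = np.sum((Pi - Pj) ** 2)
    dc2 = np.sum((Ii - Ij) ** 2)
    appearance = w1 * np.exp(-dp2 / (2 * sa**2) - dc2 / (2 * sb**2))
    smoothness = w2 * np.exp(-dp2 / (2 * sg**2))
    return appearance + smoothness

Pi, Pj = np.array([0., 0.]), np.array([1., 0.])
Ii = Ij = np.array([100., 100., 100.])          # same color
near_cost = pairwise_potential(0, 1, Pi, Pj, Ii, Ij)
far_cost = pairwise_potential(0, 1, Pi, np.array([50., 50.]), Ii, Ij)
same = pairwise_potential(1, 1, Pi, Pj, Ii, Ij)
```

Adjacent same-colored pixels with different labels pay a large penalty, distant pixels pay almost none, and equal labels cost nothing, which is what pushes the CRF to align label boundaries with color edges.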
Fig. 8 CRF for edge detail recovery 
From Fig.8, we can see that although the superpixel boundary optimization algorithm eliminates local mis-segmentation and optimizes the image edges, thin edges and complex overlapping areas still need stronger edge optimization. After adding the CRF algorithm, the image boundary details are optimized and more effective information is recovered.
3 Experimental Results and Analysis
In this section, we evaluate and analyze the experimental results in terms of subjective and objective performance to verify the effectiveness of the proposed algorithm.
3.1 Evaluation and Analysis of Subjective Performance
In this paper, the KITTI data set is used to train and evaluate the network. After many iterations of training, the trained network is obtained, and the classification results are optimized by fusing superpixels with the convolutional neural network. The segmentation results of road images tested with the trained model are shown in Fig.9.
Fig. 9 Contrast chart of subjective evaluation 
From Fig.9, we can see that the segmentation results of our algorithm and of SegNet differ little from a visual point of view, because boundary pixels account for a relatively small share of the image. However, the misjudgment of small areas has been significantly improved: the grassland on the right of the first row, the lane in the second row, the traffic signs in the third row, and the grassland on the right of the fifth row all contain misjudged pixels, and with our algorithm these areas can be classified correctly.
3.2 Evaluation and Analysis of Objective Performance
In image segmentation, many criteria are used to measure the accuracy of an algorithm. These criteria are usually variations of pixel accuracy and intersection-over-union (IoU). In the evaluation formulas, an image has $k+\mathrm{1}$ categories, and ${p}_{ij}$ denotes the number of pixels that belong to class i but are predicted as class j.
1) Pixel Accuracy (PA): it represents the proportion of correctly marked pixels to total pixels.
$\mathrm{P}\mathrm{A}=\frac{{\displaystyle \sum _{i=\mathrm{0}}^{k}}{p}_{ii}}{{\displaystyle \sum _{i=\mathrm{0}}^{k}}{\displaystyle \sum _{j=\mathrm{0}}^{k}}{p}_{ij}}$(16)
2) Mean Pixel Accuracy (MPA): the proportion of correctly classified pixels is computed for each class, then averaged over all classes.
$\mathrm{M}\mathrm{P}\mathrm{A}=\frac{\mathrm{1}}{k+\mathrm{1}}{\displaystyle \sum _{i=\mathrm{0}}^{k}}\frac{{p}_{ii}}{{\displaystyle \sum _{j=\mathrm{0}}^{k}}{p}_{ij}}$(17)
3) Mean Intersection over Union (MIoU): the ratio of the intersection to the union of two sets; in semantic segmentation, the two sets are the ground truth and the predicted segmentation. This ratio equals the number of true positives divided by the sum of true positives, false negatives and false positives (the union). IoU is calculated for each class and then averaged.
$\mathrm{M}\mathrm{I}\mathrm{o}\mathrm{U}=\frac{\mathrm{1}}{k+\mathrm{1}}{\displaystyle \sum _{i=\mathrm{0}}^{k}}\frac{{p}_{ii}}{{\displaystyle \sum _{j=\mathrm{0}}^{k}}{p}_{ij}+{\displaystyle \sum _{j=\mathrm{0}}^{k}}{p}_{ji}-{p}_{ii}}$(18)
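Equations (16)-(18) can all be computed from one confusion matrix, e.g. (a minimal sketch with our own function name):

```python
import numpy as np

def segmentation_metrics(pred, gt, n_classes):
    """PA, MPA and MIoU of Eqs. (16)-(18) from a confusion matrix,
    where p[i, j] counts pixels of true class i predicted as class j."""
    p = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, q in zip(gt.ravel(), pred.ravel()):
        p[t, q] += 1
    diag = np.diag(p)
    pa = diag.sum() / p.sum()                                   # Eq. (16)
    mpa = np.mean(diag / p.sum(axis=1))                         # Eq. (17)
    miou = np.mean(diag / (p.sum(axis=1) + p.sum(axis=0) - diag))  # Eq. (18)
    return pa, mpa, miou

gt = np.array([0, 0, 1, 1])       # toy ground truth
pred = np.array([0, 1, 1, 1])     # one class-0 pixel mispredicted
pa, mpa, miou = segmentation_metrics(pred, gt, n_classes=2)
```

For the toy case PA = 3/4, MPA = (1/2 + 1)/2 = 3/4, and MIoU = (1/2 + 2/3)/2, showing how MIoU penalizes the false positive and false negative that PA ignores.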
The overall performance indicators of the three models studied in this paper are shown in Table 1:
The model proposed in this paper is mainly composed of three parts: the SegNet model for rough segmentation, the SLIC algorithm for fine superpixel-level refinement of the segmentation result, and finally the CRF algorithm for restoring the image boundary. To prove the effectiveness of the proposed model, the SegNet model and the SegNet+SLIC algorithm (without CRF) are compared with it; the results are shown in Table 1. Table 1 shows that before adding CRF, the pixel accuracy of our algorithm is very similar to that of the SegNet model, but after adding CRF, the pixel accuracy improves by 0.71% and the average pixel accuracy increases by 2.25%. Compared with SegNet, the MIoU increases by 0.10% without CRF and by 0.57% with CRF. Compared with Unet, the MPA and MIoU of this algorithm are 18.15% and 10.08% higher, respectively. The experiments show that the proposed algorithm improves the overall performance of road image segmentation, though the segmentation accuracy for slender objects such as poles and fences still needs improvement.
Using the KITTI data set to test the value of IoU for each class, the contrastive values are shown in Table 2.
From Table 2, it can be seen that, compared with FCN8s and SegNet, the IoU of road, sidewalk, building, fence, pole, traffic light, traffic sign, vegetation, terrain, sky, rider, car, truck, bus, motorcycle and bicycle has improved, while the performance on wall remains unchanged. The performance on person and train has declined compared with FCN8s, because there are few human images in the data set, most of them are far away in the scene, and occlusion makes it easy for the superpixel-based boundary optimization algorithm to classify them into other categories. After adding the CRF algorithm for image boundary restoration, the IoU of essentially all target categories improves, which shows that the CRF algorithm has a strong ability to restore image boundary details. In short, the proposed algorithm is helpful for image boundary optimization.
Comparing the proposed algorithm with the Unet model shows that, although the total pixel accuracy of Unet is one percentage point higher, its MPA and MIoU are significantly lower than those of the proposed model. As Table 3 indicates, Unet recognizes well the categories that occupy a large proportion of pixels, such as road and terrain, but its performance on multi-target segmentation tasks is inferior to the algorithm proposed in this paper. The comparison experiment shows that our algorithm performs better on multi-target segmentation tasks.
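The gap between total pixel accuracy and MPA here is a class-imbalance effect: a model can score high overall pixel accuracy by fitting dominant classes such as road while largely missing rare ones. A toy illustration with invented counts, showing how PA stays high while MPA collapses:

```python
import numpy as np

def pa_and_mpa(conf):
    """Pixel accuracy (PA) and mean per-class pixel accuracy (MPA)
    from a confusion matrix (rows = ground truth)."""
    tp = np.diag(conf).astype(float)
    pa = tp.sum() / conf.sum()
    mpa = (tp / conf.sum(axis=1)).mean()
    return pa, mpa

# Imbalanced toy case: a dominant 'road' class is predicted almost
# perfectly while a rare 'pole' class is mostly missed.
conf = np.array([[980,  20],   # road: 1000 ground-truth pixels
                 [ 15,   5]])  # pole:   20 ground-truth pixels
pa, mpa = pa_and_mpa(conf)
print(round(pa, 3), round(mpa, 3))
```

Because MPA averages accuracy over classes with equal weight, it (like MIoU) is the more informative metric for the multi-target road scenes discussed above.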
Overall performance index for different algorithms
IoU indicators for various targets under different models (unit: %)
IoU differences of different models in segmenting key vehicle road environmental objects (unit: %)
4 Conclusion
This paper proposes a semantic segmentation method that redefines segmentation using the image boundary region. First, the rough features of the target image are extracted with the SegNet model. Then, the SLIC algorithm extracts the contour information of image edges at the superpixel level, and the superpixel features are applied to the edge information to improve the segmentation accuracy of the target image. Finally, the CRF algorithm restores the image boundary, further refining the segmentation. An ablation experiment confirms that the CRF algorithm improves the performance of the proposed method.
With the proposed algorithm, object segmentation in the image is more accurate, and the recognition accuracy of each category is higher in multi-object segmentation scenes. The experiments summarized in Table 1 show that the MIoU of the algorithm on the KITTI dataset reaches 63.39%, which is 2.33% higher than FCN8s and 0.57% higher than SegNet. Compared with Unet, the MPA and MIoU of the algorithm are 18.15% and 10.08% higher, respectively.
All Figures
Fig. 1 The flow chart of the proposed method
Fig. 2 Image semantic segmentation based on superpixel
Fig. 3 The model of SegNet network structure
Fig. 4 Upsampling differences between SegNet and FCN
Fig. 5 Results of superpixel segmentation with different parameters
Fig. 6 The effect diagram of local optimization
Fig. 7 Specific optimization diagram
Fig. 8 CRF for edge detail recovery
Fig. 9 Contrast chart of subjective evaluation