Open Access
Wuhan Univ. J. Nat. Sci.
Volume 30, Number 4, August 2025
Page(s) 334 - 342
DOI https://doi.org/10.1051/wujns/2025304334
Published online 12 September 2025

© Wuhan University 2025

Licence: Creative Commons Attribution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

0 Introduction

With the continuous increase in global energy demand, natural gas pipeline networks play an increasingly important role in energy delivery. As critical infrastructure, the safety of gas pipelines directly impacts the stability of energy supply. Although underground gas pipelines, being buried beneath the surface, avoid many direct threats from surface activity, they still face a complex array of external intrusion risks, such as illegal excavation, mechanical vibration, and natural disasters. These threats can lead to pipeline ruptures and leaks, potentially causing environmental pollution and significant economic losses. Comprehensive real-time monitoring and early warning have therefore become necessary measures to ensure the safe operation of gas pipelines.

Distributed Acoustic Sensing (DAS) has emerged as a rapidly developing monitoring technology in recent years. Its ability to cover large areas and to perceive subtle vibrations around pipelines in real time has led to its widespread application in intrusion detection scenarios such as perimeter security[1-2], pipeline monitoring[3-6], ground monitoring[7-8], cable fault detection[9], gas leakage detection[10-11], landslide early warning[12], and fire early warning[13]. However, accurately identifying the vibration signals caused by different types of intrusion events remains a challenge.

Wiesmeyr et al.[14] used a Support Vector Machine (SVM) classifier with FFT (Fast Fourier Transform) features to locate train activity along a 40 km fiber, achieving an accuracy of over 98%. Bublin[15] reported an MLP-PLUS feature extraction method that reached 99.88% accuracy by converting one-dimensional time series into two-dimensional images, though it required 80% of the dataset for training. Ge et al.[16] likewise employed a 2D-CNN for feature extraction, achieving 98.3% accuracy while training on 90% of the dataset. Aktas et al.[17] collected data from events such as walking, digging, shoveling, plowing, and facility noise along 40 km of fiber buried at a depth of 1 m, and achieved 93% accuracy using deep CNNs. Chen and Xu[18] argued that global features overlook the local details of disturbance signals; they enhanced the features with spectral subtraction, combined them with an LSTM, and, training on 75% of the dataset, achieved 94.3% accuracy in classifying five event types: walking, digging, vehicle passage, climbing, and heavy rain.

While existing deep learning-based event recognition methods can achieve high accuracy in data-rich scenarios, they are heavily reliant on large-scale datasets, making data collection time-consuming and labor-intensive. Additionally, these methods often perform poorly when faced with low-frequency intrusion events. The identification of rare events has become one of the key challenges that need to be addressed in pipeline monitoring.

To solve the issues of data dependency and insufficient identification capability for rare events, this paper proposes a small-sample learning model based on triplet learning, aimed at enhancing the performance of the distributed optical fiber sensing system in identifying pipeline intrusion events. Triplet learning is a deep learning method focused on distance measurement, capable of achieving classification tasks by learning the similarities between samples even with limited sample sizes. Compared with traditional methods, triplet learning not only reduces reliance on large-scale labeled datasets but also better handles the classification of low-frequency event samples, thereby improving the overall recognition performance of the system.

This paper employs a 6-way 20-shot support set configuration and evaluates the model using a K-nearest-neighbor (KNN) classifier over the learned features. Experimental results indicate that the model achieves an average accuracy of 91.6%, validating its performance in identifying rare events. This research provides new insights for event detection in distributed optical fiber sensing systems and opens new possibilities for classifying intrusion events in low-sample environments.

1 Triplet-Based Small Sample Learning Framework for Vibration Intrusion Identification

1.1 Distributed Optical Fiber Sensing System

The Distributed Vibration Sensing (DVS) system based on φ-OTDR technology is used to monitor ground vibrations around gas pipelines, as shown in Fig. 1. The system employs a narrow-linewidth continuous laser, which is first pulse-modulated by an acousto-optic modulator (AOM), with the modulation controlled by a signal from a laser driver. The generated laser pulses are then amplified by an Erbium-Doped Fiber Amplifier (EDFA) and filtered to remove noise before being injected into the sensing fiber. When external vibrations act on the sensing fiber, part of the light is Rayleigh-backscattered towards the system. This scattered light is amplified again by the EDFA and finally received by a photodetector, which converts it into an electrical signal for event detection and analysis.

Fig. 1 Structure of distributed optical fiber vibration sensing system (DVS)

The intensity of the Rayleigh backscattered light at distance Li along the sensing fiber can be expressed as the vector sum of the amplitudes and phases of the scatterers within that region, as follows:

$E_b(L_i) = E_0\, e^{-\alpha(i-1)\Delta L} \sum_{k=1}^{M} \alpha_{ki}\, e^{j\phi_{ki}}$,  (1)

where E0 represents the intensity of the incident light, α is the attenuation coefficient of the sensing fiber, and ΔL is the length of each scattering region. Additionally, αki and ϕki represent the amplitude and phase of the k-th scatterer in the i-th scattering region, respectively.
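Equation (1) can be illustrated numerically as a coherent phasor sum over scatterers with fiber attenuation. All parameter values in this sketch are assumptions for illustration, not values from the paper:

```python
import numpy as np

# Sketch of Eq. (1): the backscattered field at section i is the attenuated
# coherent sum of M scatterers' amplitude/phase contributions.
# E0, alpha, dL, M and the random amplitudes/phases are illustrative.
def backscatter_field(E0, alpha, dL, i, amps, phases):
    """Complex backscattered field E_b(L_i) for the i-th scattering region."""
    attenuation = np.exp(-alpha * (i - 1) * dL)      # fiber loss up to section i
    phasor_sum = np.sum(amps * np.exp(1j * phases))  # coherent sum over M scatterers
    return E0 * attenuation * phasor_sum

rng = np.random.default_rng(0)
M = 50
amps = rng.uniform(0.5, 1.0, M)        # alpha_ki: scatterer amplitudes
phases = rng.uniform(0, 2 * np.pi, M)  # phi_ki: scatterer phases
Eb = backscatter_field(E0=1.0, alpha=0.2e-3, dL=1.0, i=100, amps=amps, phases=phases)
```

The random phases model the interference between scatterers that makes the φ-OTDR trace sensitive to local disturbances.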

1.2 Triplet Input-Based Small Sample Learning Method

For a closed-set multi-class problem, given a one-dimensional vibration signal dataset $D=\{(x_1,y_1),\ldots,(x_n,y_n)\}$ containing samples from $K$ known classes, each sample $x_i \in \mathbb{R}^{D}$ is a vibration signal whose size depends on the sampling rate and duration of the analog-to-digital conversion, and $y_i \in \{1,2,\ldots,K\}$ is the sample label. The proposed small-sample learning model can be defined as a mapping function $f(x,p,n,\theta)$, where $\theta$ represents the learnable parameters of the deep model, $x$ is the sampled vibration signal (the anchor), $p$ is a positive sample (same class as the anchor), and $n$ is a negative sample (different class from the anchor). The three signals constitute a triplet that is input into the feature extraction network, and random sampling forms multiple different triplets, as shown in Fig. 2. During training, the distances within each triplet are computed, and the distance loss pulls the anchor closer to the positive sample and pushes it away from the negative sample.
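The random triplet sampling described above can be sketched as follows; the data layout (a dict mapping class labels to samples) and function names are illustrative assumptions, not the paper's code:

```python
import random

# Hypothetical sketch of triplet construction: for each anchor, draw a
# positive from the same class and a negative from a different class.
def sample_triplet(dataset, rng=random):
    """Return (anchor, positive, negative) sample IDs from a class->samples dict."""
    anchor_cls = rng.choice(sorted(dataset))
    neg_cls = rng.choice([c for c in sorted(dataset) if c != anchor_cls])
    anchor, positive = rng.sample(dataset[anchor_cls], 2)  # two distinct same-class samples
    negative = rng.choice(dataset[neg_cls])
    return anchor, positive, negative

# Example: a toy sample library with three classes
library = {0: ["a0", "a1", "a2"], 1: ["b0", "b1"], 2: ["c0", "c1"]}
a, p, n = sample_triplet(library)
```

Repeating this draw yields the many different triplets used per training batch.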

Fig. 2 Optimization of the distribution of triplets in space

1.2.1 Small sample model learning framework

Few-Shot Learning (FSL) aims to achieve effective learning and generalization even with very few training samples, allowing for high classification or recognition accuracy despite insufficient training data. The model learning framework is shown in Fig. 3. Our sample library contains 20 samples per class for model learning. The model training consists mainly of two parts: preprocessing and feature extraction network. The preprocessing step converts the signal into a two-dimensional matrix similar to a grayscale image to aid subsequent feature representation. The feature extraction network is composed of convolutional blocks, residual blocks, average pooling layers, and flattening layers, ultimately yielding a feature vector of length 512 for distinguishing different sample features.

Fig. 3 Triplet-based small sample model learning framework

For each training batch of triplets, the model learns without explicitly predicting sample labels. Instead, it continuously optimizes the distance loss between the anchor, positive, and negative samples, making the features of same-class samples more similar while increasing the separation between different-class samples.

We utilized a composite ResNet structure with 2D convolutions, repeating the convolutional and residual blocks five times; this substantially increases the network's depth and gives it strong feature extraction and classification capability. The specific structure of the two blocks is shown in Fig. 4. The convolutional block contains a 2D convolution layer, a batch normalization layer, an average pooling layer, and a ReLU activation function, and is primarily responsible for increasing the number of channels to extract richer feature information. The residual block consists of two sets of 2D convolution, batch normalization, and ReLU layers with a residual connection. Because the residual block keeps the channel count unchanged, its input and output can be added directly, avoiding additional projection layers, reducing computational and storage costs, and helping gradients propagate smoothly, which improves training stability. The combined use of convolutional and residual blocks balances the model's expressive power, computational cost, and training stability, improving generalization and reducing the likelihood of overfitting.
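The channel-preserving skip connection is the key property of the residual block. A toy numpy sketch of that idea, with 1×1 channel mixing standing in for the paper's 2D convolutions and batch normalization omitted; all shapes and weights are illustrative assumptions:

```python
import numpy as np

# Minimal sketch of the residual idea in Fig. 4: the block keeps the channel
# count, so the input can be added directly to the conv branch's output.
def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """x: (C, H, W) feature map; W1, W2: (C, C) channel-mixing weights."""
    h = relu(np.einsum('oc,chw->ohw', W1, x))  # first conv + ReLU (BN omitted)
    h = np.einsum('oc,chw->ohw', W2, h)        # second conv
    return relu(h + x)                         # skip connection, then ReLU

rng = np.random.default_rng(0)
C, H, W = 8, 16, 12
x = rng.standard_normal((C, H, W))
W1 = rng.standard_normal((C, C)) * 0.1
W2 = rng.standard_normal((C, C)) * 0.1
y = residual_block(x, W1, W2)
```

With zero conv weights the block reduces to ReLU of its input, which is why gradients can flow through the skip path even when the conv branch contributes little.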

Fig. 4 Residual block and convolution block structure

We used a public dataset[19] in which each event signal consists of 12 adjacent spatial points, each of length 10 000, forming a two-dimensional signal. The input shape is batch×1×10 000×12. During feature extraction, operations such as average pooling and flattening yield a feature vector of length 512, and the triplet loss measures the distances between the anchor, positive, and negative features. The specific parameters are shown in Table 1. A final fully connected layer reduces the length-512 feature vector to two dimensions so that the sample distribution can be visualized to observe the model's training performance.

Given a set of samples [x,p,n] as input, the model's output can be expressed as:

$(f_x,\ f_p,\ f_n) = f(x, p, n, \theta)$  (2)

where x, p, and n are the anchor, positive, and negative sample signals from the sample library, respectively. The mapping function f(·) converts the inputs into a set of feature vectors representing their differences in the feature space.

Table 1 Network architectures

1.2.2 Triplet loss

In small sample learning, we utilize the triplet loss function to learn an embedding that effectively separates different classes in the feature space. By comparing distances between samples, the loss enhances inter-class separation and intra-class compactness, allowing the model to learn effectively from a small amount of data without training an explicit label-prediction head.

The objective of the triplet loss is to learn a distance metric, as shown in Fig. 2, such that in the feature space, the distance between the anchor sample and the positive sample is minimized while maximizing the distance between the anchor sample and the negative sample. The triplet loss is computed as follows:

$\mathrm{Loss} = \max\left(0,\ d(x,p) - d(x,n) + \alpha\right)$  (3)

where d(x,p) is the distance between the anchor sample and the positive sample, d(x,n) is the distance between the anchor sample and the negative sample, and α is a predefined margin to ensure that the distance between the negative sample and the anchor sample is greater than that of the positive sample by a certain margin. In subsequent experiments, we set this value to 3. The distance calculations employ the Euclidean distance metric, expressed as:

$d(x,p) = \sqrt{\sum_{i=1}^{512} \left(f_x[i] - f_p[i]\right)^2}$,  (4)

$d(x,n) = \sqrt{\sum_{i=1}^{512} \left(f_x[i] - f_n[i]\right)^2}$,  (5)

where fx, fp, and fn are the feature vectors of the three samples produced by the feature extraction network, as in equation (2).
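A minimal numpy sketch of Eqs. (3)-(5), using the margin α = 3 set in the experiments; the feature vectors here are random stand-ins for network outputs:

```python
import numpy as np

# Euclidean distance between 512-d feature vectors, Eqs. (4)/(5)
def euclidean(u, v):
    return float(np.sqrt(np.sum((u - v) ** 2)))

# Margin-based triplet loss, Eq. (3): penalize triplets where the negative
# is not at least `margin` farther from the anchor than the positive.
def triplet_loss(fx, fp, fn, margin=3.0):
    return max(0.0, euclidean(fx, fp) - euclidean(fx, fn) + margin)

rng = np.random.default_rng(0)
fx = rng.standard_normal(512)
fp = fx + 0.1 * rng.standard_normal(512)  # positive: close to the anchor
fn = rng.standard_normal(512)             # negative: unrelated to the anchor
loss = triplet_loss(fx, fp, fn)
```

Once d(x,n) exceeds d(x,p) by more than the margin, the loss is exactly zero and the triplet no longer contributes a gradient.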

1.2.3 KNN classification

During the model training process, the distribution of the support set (training set) in the feature space is continually updated, as shown in Fig. 2. Each update minimizes the distances between same-class samples while maximizing distances between different-class samples.

In small sample learning scenarios with limited data, KNN does not rely on a large-scale dataset to build a model; it classifies using only a small number of samples. Since the feature extraction network trained with triplet loss has already refined the feature representation of the data, we apply KNN directly to these high-quality features. The samples in the feature space exhibit small intra-class distances and large inter-class distances, enabling KNN to differentiate classes efficiently.

As illustrated in Fig. 5, with 120 samples in the support set in total, we set K to 10 in subsequent experiments. For each test sample, we calculate the Euclidean distances between the test sample and all samples in the support set and identify the 10 closest samples. The category occurring most frequently among these neighbors, with closer neighbors weighted more heavily, is assigned to the test sample.
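The KNN step can be sketched as follows over the learned features (K = 10, as above). A plain majority vote is used here for brevity; the support features in the example are synthetic clusters, not DAS data:

```python
import numpy as np
from collections import Counter

# Hypothetical KNN classifier over learned feature vectors: find the k
# support samples nearest to the query and return the majority class.
def knn_classify(query, support_feats, support_labels, k=10):
    dists = np.sqrt(np.sum((support_feats - query) ** 2, axis=1))  # Euclidean
    nearest = np.argsort(dists)[:k]                                # k closest supports
    votes = Counter(support_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]                              # majority class
```

With well-separated clusters (small intra-class, large inter-class distances), the 10 nearest neighbors of a test point almost always share one label, so the vote is decisive.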

Fig. 5 Principle of classification

The feature space optimized through triplet loss is typically high-dimensional. In such a space, KNN performs stably, as the distance information enables finer neighbor selection and improves classification effectiveness.

2 Experimental Results and Analysis

2.1 Dataset Construction and Signal Processing

In this experiment, we used a public dataset[19], which was collected using a Distributed Acoustic Sensing (DAS) system and includes six typical events: background noise, digging, tapping, dripping, vibration, and walking, labeled as 0 through 5. The experiments were conducted using 5 km and 10 km sensing fibers across different times, locations, and operators to enhance data robustness. Each event sample consists of a spatiotemporal signal matrix with 10 000 points in the time domain and 12 points in the spatial domain, totaling 15 612 samples. The spatiotemporal distribution patterns of different events are distinct: digging and tapping exhibit sharp peaks, dripping and vibration are continuous signals with vibration showing periodicity, and walking presents multiple peaks reflecting cadence. This diverse and comprehensive dataset is well-suited for event classification and pattern recognition research. For the initial experiment, we randomly selected 500 samples from each category, totaling 3 000 samples, as shown in Table 2.

We constructed a 2D residual network to extract features suited to the characteristics of the dataset. Before the data were input into the training network, preprocessing was conducted. As shown in Fig. 6, Max-Min normalization was applied, and the values were then scaled by a factor of 255 to create a representation similar to a grayscale image. For display, each sample is rendered 12 pixels wide. This scale transformation further enhances the expression of vibration signal features and improves the model's performance.
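A minimal sketch of this preprocessing step, assuming a 10 000 × 12 spatiotemporal sample as described; the random input stands in for a real DAS recording:

```python
import numpy as np

# Max-Min normalization to [0, 1], then scaling by 255 to resemble a
# grayscale image, as in Fig. 6.
def to_grayscale(sample):
    lo, hi = sample.min(), sample.max()
    norm = (sample - lo) / (hi - lo)      # Max-Min normalization
    return (norm * 255.0).astype(np.float32)

sample = np.random.default_rng(0).standard_normal((10_000, 12))  # stand-in DAS sample
img = to_grayscale(sample)
```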

Fig. 6 Grayscale representation

Table 2 Quantity distribution of vibration datasets

2.2 Classification Performance Testing

The goal of this experiment was to determine whether the proposed triplet-based model met expectations for classifying input signals under closed-set conditions, with no anomalous inputs, and to compare it with classic convolutional neural network-based deep learning models.

We randomly selected 20 samples from each category as the support set, with the remaining data used as the test set. During the model training phase, the batch size was set to 16, with an initial learning rate of 0.002 5. Instead of a fixed learning rate, we applied a cosine annealing algorithm to gradually reduce the learning rate, which decreased to zero after 300 training epochs, ensuring stable convergence of the model.
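The schedule described above (initial rate 0.002 5 decaying to zero over 300 epochs) can be written in closed form. This standard cosine-annealing formula is a sketch of the setup, not the authors' code:

```python
import math

# Cosine annealing: the learning rate starts at base_lr and decays smoothly
# to zero at total_epochs, avoiding abrupt steps late in training.
def cosine_lr(epoch, base_lr=0.0025, total_epochs=300):
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))
```

The rate halves exactly at the midpoint (epoch 150) and flattens near the end, which is what gives the stable final convergence reported below.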

To accurately reflect the model's performance, we used accuracy, precision, recall, and F1 score as evaluation metrics, which can be calculated as follows:

$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + FP + FN + TN}$  (6)

$\mathrm{Precision} = \dfrac{TP}{TP + FP}$  (7)

$\mathrm{Recall} = \dfrac{TP}{TP + FN}$  (8)

$\mathrm{F1\ Score} = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$  (9)

where TP, FP, TN, and FN denote the numbers of true positive, false positive, true negative, and false negative predictions, respectively.
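Equations (6)-(9) follow directly from the four counts; the counts in this sketch are illustrative, not the paper's results:

```python
# Compute accuracy, precision, recall, and F1 score (Eqs. (6)-(9))
# from raw confusion-matrix counts.
def metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts for one class in a binary view of the problem
acc, prec, rec, f1 = metrics(tp=90, fp=10, tn=880, fn=20)
```

F1 is the harmonic mean of precision and recall, so it stays low unless both are high, which is why it is reported per class in Table 3.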

Our experimental results are shown in Fig. 7, which depicts the accuracy and loss curves. After approximately 50 training epochs, the accuracy reached 0.86, stabilizing at around 0.92 after 250 epochs, as shown in Fig. 7(a). In Fig. 7(b), the loss decreases quickly at first, then flattens and stabilizes at around 10 after 250 epochs. Figure 8 shows the confusion matrix for the six categories, with tapping achieving a recall as high as 0.99. Figure 9 presents the spatial distribution of each category of test samples after training, and Table 3 lists the precision, recall, and F1 score for each category; the average accuracy reaches 91.6%.

Fig. 7 (a) The trend of accuracy with epochs; (b) The trend of loss with epochs

Fig. 8 Confusion matrix

Fig. 9 The distribution of samples after model training in feature space

Table 3 Closed set performance evaluation of multiple indicators for different categories

2.3 Discussion

Despite its promising performance, the proposed triplet-based small-sample learning model has several limitations. First, the computational complexity of the triplet loss function increases significantly with the dataset size, as forming effective triplets requires careful sampling and may involve a substantial computational overhead during training. Second, the model's scalability to more complex datasets with higher class diversity or noisy data remains a challenge, as the limited support set may not capture the full variability of such datasets. Additionally, the model's performance is highly sensitive to parameter tuning, particularly the choice of the margin in the triplet loss and the learning rate schedule, which can heavily influence convergence and generalization. Addressing these limitations would require more robust optimization techniques, advanced sampling strategies, or adaptive parameter adjustment mechanisms.

The practical implementation of the proposed model in a real pipeline monitoring system faces several challenges. First, the variability and complexity of real-world vibration signals, including noise from environmental factors like weather or nearby industrial activities, may affect model accuracy. Second, the availability of high-quality labeled data for rare events remains a limitation, as the model's performance heavily relies on effective support sets. Additionally, integrating the model into existing Distributed Vibration Sensing (DVS) systems requires addressing computational efficiency, ensuring that real-time data processing and event classification can meet operational demands. Furthermore, optimizing the model for long-term stability under diverse conditions, such as sensor aging and infrastructure changes, is critical. Lastly, deploying such systems at scale involves significant logistical and financial considerations, including the cost of hardware, maintenance, and the development of tailored calibration protocols for different pipeline environments.

3 Conclusion

This paper proposes a 2-D ResNet-based feature extraction network and designs a small-sample learning model for DVS signal classification. Using a support set of 20 samples per class on a public dataset, experiments demonstrate that the model performs well even with a greatly reduced training set. The findings indicate that the model not only achieves high accuracy under small-sample conditions but also fares well in training time and computational efficiency, and it captures spatiotemporal features effectively when dealing with highly dynamic and sparse DVS signals. This establishes a foundation for future applications and research based on DVS signals, particularly in optimizing models for small-sample learning. The proposed model enhances the reliability and robustness of detection in practical applications, which is significant for preventing destructive construction events around gas pipelines.

References

  1. Li S C, Liu K, Jiang J F, et al. A denoising and positioning method of long-distance fiber optic perimeter security system based on φ-OTDR[C]//2021 International Conference on Optical Instruments and Technology: Optical Sensors and Applications. Bellingham: SPIE, 2022: 122790S-7. [Google Scholar]
  2. Pan C, Zhu H, Yu B, et al. Distributed optical-fiber vibration sensing system based on differential detection of differential coherent-OTDR[C]//SENSORS, 2012 IEEE. New York: IEEE, 2012: 1-3. [Google Scholar]
  3. Wu H, Sun Z, Qian Y, et al. A hydrostatic leak test for water pipeline by using distributed optical fiber vibration sensing system[C]//Fifth Asia-Pacific Optical Sensors Conference. Bellingham: SPIE, 2015: 568-571. [Google Scholar]
  4. Zuo J C, Zhang Y, Xu H X, et al. Pipeline leak detection technology based on distributed optical fiber acoustic sensing system[J]. IEEE Access, 2020, 8: 30789-30796. [Google Scholar]
  5. Luo L J, Wang W D, Yu H W, et al. Abnormal event monitoring of underground pipelines using a distributed fiber-optic vibration sensing system[J]. Measurement, 2023, 221: 113488. [Google Scholar]
  6. Zhou Z X, Jiao W Y, Hu X, et al. Open-set event recognition model using 1-D RL-CNN with OpenMax algorithm for distributed optical fiber vibration sensing system[J]. IEEE Sensors Journal, 2023, 23(12): 12817-12827. [Google Scholar]
  7. Merlo S, Malcovati P, Norgia M, et al. Runways ground monitoring system by phase-sensitive optical-fiber OTDR[C]//2017 IEEE International Workshop on Metrology for AeroSpace (MetroAeroSpace). New York: IEEE, 2017: 523-529. [Google Scholar]
  8. He M, Feng L, Fan J M. A method for real-time monitoring of running trains using φ-OTDR and the improved Canny[J]. Optik, 2019, 184: 356-363. [Google Scholar]
  9. Pan W X, Zhao K, Xie C, et al. Distributed online monitoring method and application of cable partial discharge based on φ-OTDR[J]. IEEE Access, 2019, 7: 144444-144450. [Google Scholar]
  10. Zhang J, Lian Z H, Zhou Z M, et al. Numerical and experimental study on leakage detection for buried gas pipelines based on distributed optical fiber acoustic wave[J]. Measurement Science and Technology, 2021, 32: 125209. [Google Scholar]
  11. Zhan Y G, Liu L R, Wang Z Y, et al. Pipeline leakage identification method based on DPR-net and distributed optical fiber acoustic sensing technology[J]. Optics Communications, 2025, 574: 131096. [Google Scholar]
  12. Johnson M A M, Phang S K, Wong W, et al. Distributed fiber optic sensing landslide monitoring - A comparative review[J]. Journal of Engineering Science and Technology, 2023, 18(1): 406-423. [Google Scholar]
  13. Jiang H, Wang C Y, Zhao Y H, et al. A fast wavelength detection method based on OTDR and 1-DDCNN in series overlapping spectra[J]. Optical Fiber Technology, 2023, 80: 103458. [Google Scholar]
  14. Wiesmeyr C, Litzenberger M, Waser M, et al. Real-time train tracking from distributed acoustic sensing data[J]. Applied Sciences, 2020, 10(2): 448. [Google Scholar]
  15. Bublin M. Event detection for distributed acoustic sensing: Combining knowledge-based, classical machine learning, and deep learning approaches[J]. Sensors, 2021, 21(22): 7527. [Google Scholar]
  16. Ge Z, Wu H, Zhao C, et al. High-accuracy event classification of distributed optical fiber vibration sensing based on time-space analysis[J]. Sensors, 2022, 22(5): 2053. [Google Scholar]
  17. Aktas M, Akgun T, Demircin M U, et al. Deep learning based multi-threat classification for phase-OTDR fiber optic distributed acoustic sensing applications[C]//Fiber Optic Sensors and Applications XIV. Bellingham: SPIE, 2017: 102080G. [Google Scholar]
  18. Chen X, Xu C J. Disturbance pattern recognition based on an ALSTM in a long-distance φ-OTDR sensing system[J]. Microwave and Optical Technology Letters, 2020, 62(1): 168-175. [Google Scholar]
  19. Cao X M, Su Y S, Jin Z Y, et al. An open dataset of φ-OTDR events with two classification models as baselines[J]. Results in Optics, 2023, 10: 100372. [Google Scholar]
