| Issue |
Wuhan Univ. J. Nat. Sci.
Volume 30, Number 6, December 2025
|
|
|---|---|---|
| Page(s) | 576 - 588 | |
| DOI | https://doi.org/10.1051/wujns/2025306576 | |
| Published online | 09 January 2026 | |
Computer Science
CLC number: TP311
Program Derivation and Mechanized Verification of Edit Distance Algorithm
编辑距离算法的程序推导与机械化验证
1 School of Artificial Intelligence, Jiangxi Normal University, Nanchang 330022, Jiangxi, China
2 National-Level International S&T Cooperation Base of Networked Supporting Software, Jiangxi Normal University, Nanchang 330022, Jiangxi, China
3 School of Information Engineering, Jiangxi University of Technology, Nanchang 330098, Jiangxi, China
† Corresponding author. E-mail: kerrykaren@126.com
Received:
28
May
2025
Edit distance is an algorithm to measure the difference between two strings, usually represented as the minimum number of editing operations required to transform one string into another. The edit distance algorithm involves complex dependencies and constraints, making state management and verification work tedious. This paper proposes a derivation and verification method that avoids directly handling dependencies and constraints by proving the equivalence between the edit distance algorithm and existing functional modeling. First, the derivation process of edit distance algorithm mainly includes 1) describing problem specifications, 2) inductively deducing recursive relations, 3) formally constructing loop invariants using the optimization theory (memorization technology and optimal decision table) and properties (optimal substructure property and subproblems overlapping property) of the edit distance algorithm, 4) generating the Minimalistic Imperative Programming Language (IMP) code based on the recursive relations. Second, the problem specification, loop invariants, and generated IMP code are input into Verification Condition Generator (VCG), which automatically generate five verification conditions, and then the correctness of edit distance algorithm is verified in the Isabelle/HOL theorem prover. The method utilizes formal technologies and theorem prover to complete the derivation and verification of the edit distance algorithm, and it can be applied to linear and nonlinear dynamic programming problems.
摘要
编辑距离是一种用于衡量两个字符串差异的算法,通常表示为将一个字符串变换为另一个字符串所需的最少编辑操作数。该算法涉及复杂的依赖关系与约束,导致状态管理和正确性验证过程繁琐。本文提出了一种程序推导与验证方法,通过与已有函数建模方法的等价性证明,避免了直接处理复杂依赖和约束的问题。首先,编辑距离算法的推导过程主要包括:(1)描述问题规约;(2)递归关系的归纳推导;(3)基于优化理论(记忆化技术与最优决策表)及编辑距离算法性质(最优子结构性质与子问题重叠性质)形式化构造循环不变量;(4)根据递归关系生成IMP(极简命令式编程语言)代码。其次,将问题规约、循环不变量以及生成的IMP代码输入到VCG(验证条件生成器)中,自动生成5个验证条件,并在Isabelle/HOL定理证明器中完成编辑距离算法的正确性验证。该方法综合运用了形式化技术与定理证明工具,实现了编辑距离算法的程序推导与验证工作,具备良好的可迁移性,可推广应用于线性和非线性的动态规划类问题。
Key words: Isabelle/HOL / mechanized verification / edit distance / Verification Condition Generator (VCG)
关键字 : Isabelle/HOL / 机械化验证 / 编辑距离 / 验证条件生成器(VCG)
Cite this article: YOU Zhen, ZHANG Chen, SUN Huan, et al. Program Derivation and Mechanized Verification of Edit Distance Algorithm[J]. Wuhan Univ J of Nat Sci, 2025, 30(6): 576-588.
Biography: YOU Zhen, female, Ph. D., Associate professor, Senior member of CCF, research direction: software formal method, concurrent computing and distributed virtual reality. E-mail: youzhen@jxnu.edu.cn
Foundation item: Supported by the National Natural Science Foundation of China (62462036, 62462037), Key Project of Jiangxi Provincial Natural Science Foundation (20242BAB26017), Academic and Major Disciplines in Jiangxi Province Technical Leader Training Project (20232BCJ22013)
© Wuhan University 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
0 Introduction
The edit distance algorithm, utilized for measuring differences between two strings, has extensive application in fields like natural language processing, spelling correction, and gene sequence comparison. The algorithm involves complex dependencies and constraints, making state management and verification challenging.
The mathematical foundation of formal methods is based on a triad of formal logic systems encompassing formal languages, semantics, and proof reasoning[1]. It is a technology for describing, developing, and verifying computer hardware and software. By employing formal methods, the correctness of the edit distance algorithm can be effectively ensured. There has been much related research work emerging. Refs. [2-3] provided formal derivations of implementation methods and recursive relationships for edit distance algorithms but lacked mechanical verification of the derivation results. Refs. [4-6] designed functional edit distance algorithms and verify the correctness of the algorithms in Isabelle/HOL.
However, the imperative edit distance algorithm's reliance on program state leads to significant side effects, making the correctness proof particularly challenging. Imperative programs are more widely used than functional ones, making the verification of imperative edit distance algorithms increasingly urgent and important. To verify the imperative edit distance algorithm, based on Wimmer et al's study of dynamic programming-like algorithm functional modeling and verification in Isabelle/HOL[5], this paper proves the equivalence between imperative and functional programs, addressing the complex dependencies and constraints in proving the correctness of imperative algorithms and further providing derivation and verification of the imperative edit distance algorithm program.
This paper presents the derivation and mechanized verification of the edit distance algorithm program. The derivation employs the memorization property of dy-namic programming to describe problem specifications, inductively inferring recursive relations from the opti-mal substructure and overlapping subproblems proper-ties, constructing loop invariants for the edit distance algorithm, and generating corresponding Minimalistic Imperative Programming Language (IMP) code. These specifications, IMP code[7], and loop invariants[8] are input into Verification Condition Generator (VCG)[7] to automatically generate verification conditions, which are then verified using Isabelle/HOL, ensuring the reliability of the proofs.
1 Related Work
The edit distance algorithm is used to measure the similarity between two strings, namely, the number of steps required to transform one string into another through the least number of insertions, deletions, and substitutions. In recent years, research on the edit distance algorithm has mainly included two aspects: derivation and verification.
Non-formal derivation of the edit distance algorithm: Ref. [9] provided a framework capable of approximating the edit distance between two strings within a constant factor. Ref. [10] proposed a stochastic model, where a seed string s is first chosen, and then string A is constructed by randomly permuting s uniformly. Ref. [11] introduced Quantum Random Access Memory (QRAM) into the computational model, proposing a word QRAM model. Ref. [12] introduced a series of efficient parallel output-sensitive edit distance algorithms, covering three breadth-first search (BFS)-based methods and a Divide-and-Conquer (DaC) strategy. These literatures focus on optimizing the edit distance algorithm to improve its time efficiency and mainly rely on testing to ensure accuracy. However, testing can only uncover a portion of errors and cannot completely cover all potential cases. To ensure the algorithm's correctness, more rigorous and comprehensive formal methods are needed.
Formal derivation of the edit distance algorithm: Ref. [2] utilized mathematical symbols to formally derive the edit distance algorithm and its recursive relations, demonstrating how to establish state transition equations based on the characteristics and requirements of the problem, ultimately obtaining the optimal solution to the problem. Ref. [3] conducted formal derivations for both top-down and bottom-up dynamic programming algorithms, which are applicable to the edit distance algorithm. Specifically, the top-down approach combines the ideas of divide and conquer and memorization, optimizing algorithm performance by storing solutions to subproblems. On the other hand, in the bottom-up approach, the time and space overhead is reduced because subproblems are no longer repeatedly computed and stored. These literatures discuss the formal derivation of the edit distance algorithm's recursive relations and implementation methods but do not involve mechanized verification of algorithm correctness.
Verification of edit distance algorithm: Ref. [6] made an improvement to a minor error in the iteration of boundary variables within the file comparison algorithm fcomp, which essentially fills and searches an abstract edit distance matrix, and verifies the termination and correctness of the corrected fcomp algorithm in Isabelle/HOL. Wimmer et al[4-5] introduced the verification of the correctness of dynamic programming-like algorithm results, including the correctness of the edit distance algorithm, as optimal solutions using the Isabelle/HOL theorem prover. They designed recursive functions with automatic memorization, providing a framework for automatic verification of correctness, with specific implementations. These literatures primarily focus on using formal methods to verify functional edit distance algorithms. However, imperative programs have a more complex logical structure compared to functional programs and depend on program state, leading to severe side effects, making the verification of imperative edit distance algorithm programs more challenging.
This paper proposes a method for the derivation and mechanized verification of imperative edit distance algorithm programs based on the functional modeling and verification of the edit distance algorithm implemented by Wimmer et al[4-5]. By proving the equivalence of the imperative edit distance algorithm program to its functional model, it avoids dealing with complex dependencies and constraints. The verification method employs a VCG to automatically generate verification conditions, and the verification process uses the Isabelle/HOL theorem prover for mechanized verification of these conditions. A concise comparison of edit distance algorithms is shown in Table 1.
Analysis and comparison of related works
2 Derivation of the Edit Distance Algorithm
2.1 Edit Distance Algorithm
Edit distance refers to the minimum number of replacement, insertion, and deletion operations required to transform one string, A, into another, B, given the two strings. For example, consider string A="kitten" and string B="sitting"; their edit distance is 3, because 3 operations are needed to transform A into B: replace "k" with "s", replace "e" with "i", and insert "g" at the end, as shown in Fig. 1.
![]() |
Fig. 1 The process of transforming string A into B |
Dynamic programming is a powerful algorithmic paradigm that breaks down problems into a series of interconnected sub-problems, solves each sub-problem just once, and stores their solutions—using a memory-based data structure (array, map, etc.) for avoiding the repetitive work of solving the same sub-problem multiple times, thus improving the efficiency of algorithms. To derive the edit distance algorithm using the dynamic programming approach, a two-dimensional optimal decision table m is used as the memorization to store the results of sub-problems, eliminating the need for recalculating them and thereby enhancing the algorithm's efficiency. As shown in Fig. 2, when calculating the minimum edit distance between string A="kitten" and string B="sitting", a recursive algorithm is used from the top down to solve for the optimal decision table m, filling in the table from left to right, top to bottom following specific algorithm steps. Here, u represents the length of string A corresponding to the rows of m, v represents the length of string B corresponding to the columns of m, i indicates the stage starting from 0 up to u, and j indicates the state starting from 0 up to v, with i and j eventually reaching the final target problem m[u,v] at stage u and state v, where u = length m
1, v = length (m[i
1])
1[13-14].
![]() |
Fig. 2 The process of calculating the optimal decision table m |
2.2 Problem Specification Description
For the edit distance algorithm, formal methods can describe its program specification <P,Q>, where P denotes the precondition and Q represents the postcondition[15].
The precondition P is a prerequisite or assumption that must be satisfied before the program's execution, ensuring the correct environment or state before operations. Specifically, it includes the requirement that the row count of the decision table m, serving as the algorithm's memorization, must be at least 1.
The postcondition Q for the edit distance algorithm specifies conditions or properties that must be satisfied after program execution, detailing the ensured results or states. It consists of two parts: 1) All subproblems at each stage have been solved, with the outer loop variable i set to the loop's exit value, length m; 2) The final result achieves the optimal solution. The paper focuses on demonstrating the correctness of the imperative edit distance algorithm by proving its equivalence to the functional edit distance algorithm, specifically the min_ed algorithm (Algorithm 1) as mentioned in Ref.[4], which directly proves that the defined function achieves the optimal solution. By showing that the final problem m[u,v] in the optimal decision table m solved by the imperative algorithm is equivalent to its counterpart in the functional definition, it indirectly ensures the correctness of the imperative edit distance algorithm by confirming that the final results meet the optimal solution criteria.
As shown in Algorithm 1, xs and ys are recursive decomposition variables for the two input lists, each representing the "remaining part" of a list. They can be regarded as two lists for which the edit distance is to be calculated and are used to implement the recursive logic of the edit distance algorithm.
In summary, using a two-dimensional optimal decision table med as a memorization technique for the edit distance algorithm, the program specification for the edit distance algorithm in Isabelle/HOL can be described as follows:
Precondition P: length med≥1
Postcondition Q: i = length med ∧med!u!v
=min_ed_dist0 u v A B
| Algorithm 1: Definition of the functional edit distance algorithm |
|---|
| fun min_ed :: "'a list ⇒ 'a list ⇒ nat" where "min_ed [] [] = 0" "min_ed [] (y#ys) = 1 + min_ed [] ys" | "min_ed (x#xs) [] = 1 + min_ed xs []" | "min_ed (x#xs) (y#ys) = Min {1 + min_ed (x#xs) ys, 1 + min_ed xs(y#ys), (if x=y then 0 else 1) + min_ed xs ys}" fun min_ed_dist0 :: "nat ⇒ nat ⇒ char list ⇒ char list ⇒ nat" where"min_ed_dist0 m n xs ys =min_ed (take (m) xs) (take (n) ys)" |
2.3 Inductive Deduction of Recursive Relations
Step 1 Definition of stages and states: The characters of sequence A determine the stage division, where the i-th character corresponds to the stage variable i, and the j-th character of sequence B corresponds to the state variable j.
Step 2 Decision of Determination: This step involves deciding whether the i-th character ai of sequence A is equal to the j-th character bj of sequence B, represented by the stage i' and state j'.
Step 3 Definition of the Allowed Decision Set: The step determines whether the i-th character ai of sequence A matches the j-th character bj of sequence B. When ai and bj coincide, the edit distance between Ai-1 and Bj-1 is regarded as the minimal edit distance for Ai and Bj, setting stage i' to i
1 and state j' to j
1, with the allowed decision set D(i, j) being {(i
1, j
1)}. When ai and bj are not equal, operations such as insertion, deletion, or replacement can be chosen, corresponding to (i, j
1), (i
1, j), and (i
1, j
1), respectively, making the total allowed decision set D(i, j) include {(i
1, j
1), (i, j
1), (i
1, j)}.
Step 4 Definition of Functions:
An auxiliary function X(i,j) calculates the sum of med[i
1, j
1] and 1, med[i, j
1] and 1, and med[i
1, j] and 1 when ai and bj are not equal.
When considering whether ai equals bj, is divided into equal or not equal cases. If ai≠bj, the dependent function t needs to compare the minimum edit distance between Ai and Bj-1 (med[i, j
1]), Ai-1 and Bj (med[i
1, j]), and Ai-1 and Bj-1 (med[i
1, j
1]); if ai=bj, function t directly takes the value of med[i
1, j
1] without judgment.
The edit distance problem seeks the minimum edit distance between sequences A and B, so the optimal function opt is defined as min.
F[i, j] calculates the optimal decision table med[i, j], where med[i, j] represents the optimal solution for the edit distance between sequences Ai and Bj.
The recursive relation for the edit distance problem is discussed based on whether the i-th character of string A equals the j-th character of string B, thus the function Ji for i takes the i-th character of string A (A[i]), the j-th character of string B (B[j]) is taken by the function Jⱼ,and whether A[i] and B[j] are equal is determined by the judgment function Judge.
We have provided the specific example of definition of functions in Table 2 based on the instance of Fig. 1 in Section 2.1.
Auxiliary function X (i, j) is used to calculate the operational cost when characters are unequal. For example, when processing the first character "k" (i=1) of A and the first character "s" (j=1) of B, the two are not equal. At this time, the corresponding calculation for X (1,1) is med[0,0]+1=0+1=1, med[1,0]+1=1+1=2, med [0,0]+1=0+1=1, that is, the result of X(1,1) is 1, 2, 1.
Dependent function t is used to determine the edit distance value for the current position. When dealing with the 4th character "e" (i=4) of A and the 5th character "i" (j=5) of B, they are not equal. At this time, the function of t is to compare the sizes of med[4,4]+1, med [3,5]+1, and med[3,4]+1, and finally take the minimum value as the value of med[4,5]; When dealing with the 6th character "n" (i=6) of A and the 7th character "g" (j=7) of B, if the two subsequently match (assuming the scene), then t directly takes the value of med[5,6] as med[6,7].
Optimal function opt is the operation of taking the minimum value. For example, when calculating med[2, 2], it is necessary to compare the results of med[2,1]+1, med[1,2]+1, and med[1,1]+1. Opt will select the smallest value among them as the value of med[2,2].
Optimal indicator function F[i, j] is med[i, j], for example, F[6,7] represents the minimum editing distance between the first 6 characters "kitten" of A and the first 7 characters "sitting" of B, with a value of 3, corresponding to the 3-step conversion required in Fig.1.
Judge function is used to compare whether characters are equal. For example, when i=3 (the character "t" of A) and j=3 (the character "t" of B), Judge's result is A[3]=B[3]; When i=1 (the character "k" of A) and j=1 (the character "s" of B), Judge's result is A[1] ≠ B[1].
Step 5: The recursive formula for the induced edit distance problem: Based on the induced recursive formula
and instantiating the functions according to Table 2, we can derive the recursive formula for the edit distance problem as follows:
Definition of function instantiation
2.4 Construction of Loop Invariants
In Section 2.2, the preconditions P and postconditions Q for the edit distance algorithm were given. The precondition requires that the length med≥1, and the postcondition is that i=length med ∧ med!u!v= min_ed_dist0 u v A B, focusing next on the construction of the loop invariant for the edit distance. This construction is based on the property of optimal substructure and the implicit postcondition of the loop invariant. Thus, the condition satisfied by the loop invariant is that an optimal solution is obtained for each sub-problem when iterating through the two-dimensional optimal decision table med, i.e., the solution for each sub-problem solved is equal to the result corresponding to the functional solution. Hence, it can be shown that the imperative edit distance algorithm and the functional one are equivalent during the solution process. Eventually, it can be demonstrated that the post-assertion satisfies the final goal of solving the result of the optimal decision table med. The process of constructing the loop invariant for the edit distance algorithm (with i as the outer loop variable and j as the inner loop variable) is described as follows:
Step INV1: Specify the range of loop variables: The update of the two-dimensional optimal decision table med requires an update function, whose correctness depends on the bounds of the loop variables, including both upper and lower limits.
In the edit distance algorithm, i iterates over the rows of med, i.e., i<length med; j iterates over the columns of m, i.e., j<length (med!(i
1)). However, upon completion of the loop, the loop variables are evaluated within the loop invariant for judging. Hence, the loop invariant must meet the exit conditions of the loop. In particular, the outer loop invariant i should cover i=length med, while the inner loop invariant j should cover j=length (med!(i
1)). Furthermore, when j completes the inner loop, it will be judged within the outer loop invariant, so the outer loop invariant must also encompass j=length (med!(i
1)). The lower limits of the loop variables are set according to the recursive relation, as the stages and states required on both sides of the equation must be non-negative, thus satisfying i≥1, j ≥1.
Therefore, the scope of the loop variables within the outer loop invariant of the edit distance algorithm is: i≥1∧i≤length med∧ j≤length (med!(i
1))∧ j≥1; the range of loop variables in the inner loop invariant of the edit distance algorithm is: i≥1∧i<length med∧ j≤length (med!(i
1)) ∧ j≥1.
Step INV2: Determining the Variation Rule for Problem Solution: Based on the optimal substructure property of dynamic programming, when iteratively solving each sub-problem of the two-dimensional optimal decision table med, the obtained results should satisfy equivalence with the results obtained from functional solving. When executing an outer loop iteration up to the i-th row of the two-dimensional optimal decision table m in the edit distance algorithm, the entries of all solved rows up to k (where k<i) are expected to equal min_ed_dist0 k j A B; at this point, when performing an inner loop iteration up to the j-th column of the two-dimensional optimal decision table med, the values of already traversed columns up to k in the i-th row (for all k, k<j) should be equal min_ed_dist0 i k A B. Additionally, entering the inner loop, it is assumed that for each k less than i in the outer loop, the corresponding row values represent the optimal solutions already determined. Upon completing an inner loop iteration for every element in the i-th row, when exiting the inner loop, the values of already traversed k rows (∀k. k
i) can be obtained as equal. For solving the special case j=1 in step INV1, which means that, for the original value of the 0th column(∀k. k < length med
m!k!0=0), it is not possible to enter the inner loop for verification(∀k. k< j
med!i!k =min_ed_dist0 i k A B). Therefore, the loop invariant needs to include the data of the 0th column.
In summary, the variation rule for the external loop invariant of the edit distance algorithm for solving the problem is (∀k. k < i
(∀j. j<length (med!(k
1))
med!k!j =min_ed_dist0 k j A B) )∧ (∀k. k<length med
med!k!0=0), and the variation rule for the internal loop invariant is (∀k. k < i
(∀j. j<length (med!(k
1))
med!k!j =min_ed_dist0 k j A B) )∧ (∀ k. k < j
med!i!k =min_ed_dist0 i k A B)∧ (∀k. k<length m
med!k!0=0).
Step INV3: Satisfying the properties of the two-dimensional optimal decision Table: The edit distance algorithm's outcomes are recorded in the two-dimensional table med. Therefore, the loop invariant should consider the attributes of med, which mainly comprise its height and width, with the height satisfying length med ≥ 1. The table's width must guarantee that all columns in m have the same length, and the width of m must be at least 1. Formally, this can be expressed as (∀k.k< length med
length(med!k) =length(med!(k
1)) ∧length (med!k)≥1).
Thus, the invariants for the inner and outer loops of the edit distance algorithm conform to the following properties of the two-dimensional optimal decision table:length med≥1∧(∀k .k<length m
length (med!k) = length (med!(k
1)) ∧ length (med!k)≥1).
Step INV4: Constructing the loop invariant for the edit distance algorithm, INV {INV1 ∧ INV2 ∧ INV3}, as shown in Table 3.
Loop invariant of the edit distance algorithm
2.5 Code Generation
Recursive relations only describe the logical relationships between subproblems, and the implementation of IMP programs relies on stable state transfer during the iteration process, it is necessary to transform recursive relations into constraints that can guide iteration through loop invariants. The derived recursive relationship (in Section 2.3) consists of two parts, which are respectively converted into two statements in the IMP code, as shown in Algorithm 2.
(1) When the i-th character of string A matches the j-th character of string B (A [i]=B [j]), the first recursive relation (med[i,j] = med[i-1,j
1]) could be transformed into the sixth line of the IMP code:(IF (A[i]=B[j]) THEN L[i, j]::=L[i
1, j
1].
(2) When the characters are not equal (A [i] ≠ B [j]),the second recursive relation (med[i, j] = min (med[i, j
1])+1, med[i
1, j])+1,med[i
1, j
1])+1)) could be transformed into the seventh line of the IMP code:ELSE L[i,j]::= min(L[i, j
1]+1, L[i
1, j]+1, L[i
1, j
1]+1);;).
Based on the basic structure and syntax of the IMP language provided in Ref. [7], this section designs the IMP code for the edit distance algorithm. The writing of its most important loop part relies on the edit distance recursive relation formula induced in Section 2.3. The IMP code is as follows:
Algorithm 2 remains an abstract program and cannot be directly compiled and run on a computer. It requires a C++ program auto-generation system to automatically convert the IMP abstract program into the corresponding C++ code. Figure 3 illustrates the translation of the edit distance IMP code into C++ code. The generated C++ code executes successfully, as shown in Fig. 4.
![]() |
Fig. 3 Automatically generated C++ code for the edit distance |
![]() |
Fig. 4 C++ code for the edit distance problem |
| Algorithm 2: IMP code |
|---|
| 1. WHILE i≤length A //outer loop 2. DO 3. (j::=1;; 4. WHILE j≤length B //inner loop 5. DO 6. ((IF (A[i]=B[j]) THEN L[i,j]::=L[i 1,j 1];;7. ELSE L[i,j]::=min(L[i,j 1]+1,L[i 1,j]+1,L[i 1,j 1]+1);;);;8. j::=j+1; );; 9. i::=i+1;; );; |
2.6 Analysis of Equivalence
The equivalence between the imperative edit distance algorithm program and its corresponding functional program, as proposed in the article, refers to ensuring that the solutions obtained at the termination (via postconditions) and during the execution process (via loop invariants) of the imperative program are equal to the solutions derived from the corresponding functional program. This equivalence serves as an indirect verifi-cation of the correctness of the imperative edit distance algorithm written in IMP code.
The equivalence of solutions during the execution process is established through the design of loop invariants: Section 2.4 introduces the construction of loop invariants. Due to space limitations, this section briefly outlines the role of the outer loop invariant INV2 in proving the equivalence between the imperative edit distance algorithm and the functional edit distance algorithm. To ensure that the results obtained during the iterative solving of the two-dimensional optimal decision table med are consistent with those derived from functional computation, the outer loop invariant INV2 specifies that when the outer loop iterates to the i-th row of the two-dimensional optimal decision table med, the values of the previously solved k rows must be equal to min_ed_dist0 k j A B.
The equivalence of solutions obtained at the end of execution is established through the design of the postcondition: Section 2.2 introduces the postcondition Q:i=length med∧med!u!v=min_ed_dist0 u v A B. The first part ensures that all subproblems at each stage have been solved, confirming that the imperative edit distance algorithm has reached the correct state. The second part ensures that the solution at the completion of the imperative edit distance algorithm, med!u!v is equal to the solution of the corresponding functional edit distance algorithm min_ed_dist0 u v A B.
3 Mechanized Verification of the Edit Distance Algorithm
Building on the derivation of the edit distance algorithm presented in Section 2, we have determined the associated preconditions, postconditions, IMP code, and loop invariants, and fed them into the VCG, which then automatically produces the conditions required to verify the program's correctness. Then, using the Isabelle/HOL theorem prover, we mechanized the verification of these conditions to ensure the correctness of the edit distance algorithm.
3.1 VCG Code Generation
VCG operates according to Hoare Logic principles, by emulating the backward application of Hoare Logic rules and recording the logical connections between assertions during this procedure, thereby generating the required verification conditions. This section requires preconditions, postconditions, loop invariants, and IMP code known from Section 2. By inputting these three elements into the VCG, it produces the conditions required to validate the correctness of the IMP code automatically. The integration of these three inputs is known as the VCG code[16]. Based on the aforementioned findings, we present the VCG code for the edit distance algorithm, corresponding to Algorithm 3.
3.2 Isabelle/HOL Verification
Using VCG to validate the correctness of the IMP program implementing the double-loop edit distance algorithm, the "apply vcg" command automatically derives the five corresponding verification conditions. Then, the generated verification conditions are passed to the Isabelle/HOL theorem prover for verification [17-18]. For the verification of the imperative algorithm for the edit distance problem, it is necessary to complete the verification of 5 verification conditions to complete the verification of the min_ed theorem described in Section 3.1. The definitions of verification conditions 1-5 are as follows:
VC1: lemma ed_DL1 :"length m ≥1
wp('i:=1;j:=1;m:=[[0,1,2,3,4,5,6],[1,0,0,0,0,0,0], [2,0,0,0,0,0,0], [3,0,0,0,0,0,0], [4,0,0,0,0,0,0], [5,0,0,0,0,0,0], [6,0,0,0,0,0,0], [7,0,0,0,0,0,0]];
A:=[CHR ''#'',CHR ''a'',CHR ''b'',CHR ''c'',CHR ''b'', CHR ''d'',CHR ''a'',CHR ''b''];B:=[CHR ''#'',CHR ''b'', CHR ''d'',CHR ''c'',CHR ''a'',CHR ''b'',CHR''a'']',
INV_DP_E(length m,1,length (m!(i
1)),0, min_ed_dist0, A, B)) "
Verification condition 1 signifies that the precondition P holds at the beginning of the program. Upon executing the initialization program, the resulting state is expected to satisfy the outer loop invariant INV_DP_E(length m,1, length(m!(i
1)), 1, min_ed_dist0, A, B).
VC2:lemma ed_DL2 :" INV_DP_E(length m,1,length (m!(i
1)), 1, min_ed_dist0, A, B)∧i ≤ length m
1
wp('j:=1',INV_DP_I(length m, 1, length (m!(i
1)), 1, min_ed_dist0, A, B))"
Verification condition 2 signifies that, when the outer loop invariant INV_DP_E(length m,1,length (m!(i
1)),1, min_ed_dist0, A, B) is satisfied and the condition for entering the outer loop, i ≤ length m
1, is met, executing the statement j:=1 (the statement before entering the inner loop from the outer loop) satisfies the inner loop invariant INV_DP_I(length m,1,length (m!(i
1)),1,min_ed_dist0, A, B).
VC3:lemma ed_DL3 :"INV_DP_I(length m,1,length (m!(i
1)),1,min_ed_dist0,A,B) ∧ j ≤ length (m!i)
1
wp('IF (A!(i))=(B!j) THEN m:=list_list_update m i j (m!(i
1)!(j
1))
ELSE m:=list_list_update m (i) j Min {(m!(i)!(j
1))+1,(m!(i
1)!j)+1,(m!(i
1)!(j
1))+1} FI;j:=j+1',INV_DP_I(length m, 1, length (m!(i
1)), 1,min_ed_dist0, A, B) )"
Verification condition 3 indicates that meeting the condition to enter the inner loop, j ≤ length (m!i)
1, and executing the body of the inner loop still satisfies the inner loop invariant.
VC4: lemma ed_DL4:"INV_DP_I(length m,1,length (m! (i
1)),1,min_ed_dist0,A,B)∧ ¬ j ≤ length (m!i)
1
wp('i:=i+1',INV_DP_E(length m, 1, length (m! (i
1)), 1, min_ed_dist0, A, B))"
Verification condition 4 signifies that once the inner loop has completed,¬ j ≤ length (m ! i)
1, and the inner loop invariant INV_DP_I(length m,1,length (m!(i
1)),1,min_ed_dist0,A,B) being true, executing the statement i:=i+1 (the statement between the end of the inner loop and returning to the outer loop) still maintains the outer loop invariant as true.
VC5:lemma ed_DL5 :"INV_DP_E(length m,1, length (m!(i
1)), 1, min_ed_dist0,A,B)∧¬ i ≤ length m
1
(m!(i
1)!((length (m!(i
1)))
1)) =min_ed_dist0 (i
1) ((length (m!(i
1)))
1) A B ∧i=length m "
Verification condition 5 means that once the outer loop has concluded, (¬ i ≤ length m
1) and the outer loop invariant INV_DP_E(length m, 1, length (m!(i
1)), 1, min_ed_dist0, A, B) being true at this point, it can be proven that the postcondition Q is true.
| Algorithm 3: VCG code for the edit distance algorithm |
|---|
|
1. lemma min_ed: "VARS m A B i j //Precondition: 2. {length m ≥1} //Initialize the DP table 3. j:=1; i:=1; m:=[[0,1,2,3,4,5,6], [1,0,0,0,0,0,0], [2,0,0,0,0,0,0],[3,0,0,0,0,0,0], [4,0,0,0,0,0,0],[5,0,0,0,0,0,0], [6,0,0,0,0,0,0], [7,0,0,0,0,0,0]]; 4. A:=[CHR ''#'',CHR ''a'',CHR ''b'',CHR ''c'',CHR ''b'',CHR ''d'',CHR ''a'',CHR ''b'']; //Initialize string A 5. B:=[CHR ''#'',CHR ''b'',CHR ''d'',CHR ''c'',CHR ''a'',CHR ''b'',CHR ''a'']; //Initialize string B //Outer loop:Traverse string A 6. WHILE i ≤ length m 17. INV{INV_DP_E(length m,1,length (m!(i 1)),1,min _ed_dist0,A,B)} // External loop invariant8. DO 9. j:=1; //Inner loop:Traverse string B 10. WHILE j ≤ length (m!i) 111. INV{INV_DP_I(length m,1,length(m!(i 1)),1, min_ed_dist0,A,B)} //Inner loop invariant12. DO 13. IF (A!(i))=(B!j) 14. THEN m:=list_list_update m i j (m!(i 1)!(j 1))15. ELSE m:=list_list_update m i j Min{(m!(i)!(j 1))+1,(m!(i 1)!j)+1 ,(m!(i 1)!(j 1))+1}16. FI; 17. j:=j+1 18. OD; 19. i:=i+1 20. OD // Postcondition: 21. {(m!(i 1)!((length (m!(i 1))) 1)) =min_ed_dist0 (i 1) ((length (m!(i 1))) 1) A B ∧ i=length m }"// The result is equal to the value at the bottom-right corner of the DP table |
3.3 Practical Feasibility
To evaluate the practicality of the formal derivation and verification framework, we analyze the implementation metrics across three components: the VCG code lines, the Isabelle/HOL verification code lines, and the value instance, as shown in Fig. 5.
![]() |
Fig. 5 Isabelle/HOL verification code and value instance |
1) VCG code lines: The core VCG code for generating verification conditions from the edit distance algorithm's IMP program have 21 lines as shown in Algorithm 3.
2) Isabelle/HOL verification code lines: The Isabelle/HOL script for mechanized verification (including definitions, lemmas, and proofs) totals 209 lines. This includes:
Three function definitions (e.g., list_list_update for 2D table manipulation, functional specifications of edit_distance).
Five theorems corresponding to five verification conditions generated by VCG.
Ten core lemmas linking recursive functional models.
3) Value instance: To confirm the correctness of our derived algorithm of edit distance, we use Isabelle/HOL's value command to execute the test inputs, as shown in Fig. 5 (Value instance part).
4 Analysis and Application
4.1 Process Framework Diagram
The formal derivation and verification of the Edit Distance algorithm encompasses two core phases: algorithmic derivation and mechanized verification, as shown in Fig. 6.
![]() |
Fig. 6 Process framework diagram |
(1) Algorithmic Derivation
There are four processes from D1 to D4 in the algorithmic derivation phase, as shown in Fig. 6(1).
D1: We initiate with a formal specification of the problem, defining the precondition (P) as the input decision table length being at least 1, and the postcondition (Q) as the equivalence between the algorithm's output upon termination and the result computed by the functional model.
D2: Through inductive derivation of the recursive relationships, we define the problem's stages, states, and allowable decision sets, and derive the recursive formula based on the principle of optimality.
D3: Constructing loop invariants serves as the pivotal step bridging the recursive model and the imperative implementation; we establish both outer loop invariants (INV1-INV4) and inner loop invariants (INV1-INV4) to ensure that the solutions of subproblems remain equivalent to their recursive definitions throughout the loops.
D4: Leveraging these theoretical foundations, we design an IMP code implementation featuring a double-loop structure and conditional logic, and prove through equivalence analysis that its execution process and termination results align with those of the functional model.
(2) Mechanized Verification
There are two processes V1 and V2 in the mechanized verification phase, as shown in Fig. 6(2).
V1: We use the Verification Condition Generator (VCG) to generate verification conditions (VCs). Using the precondition (P), postcondition (Q), loop invariants, and IMP code as inputs, we generate five verification conditions: initialization validity (VC1), loop invariant transitivity (VC2, VC4), loop preservation (VC3), and termination result validity (VC5).
V2: Each of these conditions is rigorously verified using the Isabelle/HOL theorem prover, culminating in the certification of the algorithm's correctness.
4.2 Applications
The formal derivation and mechanized verification method proposed in this paper, initially validated for the edit distance algorithm, exhibits strong generalizability to other dynamic programming problems. This section demonstrates its applicability through case studies on linear and nonlinear data structures, confirming that the method can be systematically extended to diverse optimization problems characterized by optimal substructure and overlapping subproblems.
4.2.1 Application to linear data structures
Linear data structures, where subproblems are organized in sequences or arrays, represent a primary domain for dynamic programming. We have successfully applied the method proposed in this paper to problems related to linear data structures[19], such as 0-1 knapsack problem, longest common subsequence problem and Fibonacci sequence.
The 0/1 knapsack problem is taken as an example for a brief explanation. This problem involves selecting items to maximize total value without exceeding a weight capacity.
1) Problem Specification. Precondition P: length mks ≥ 1 (where mks is the optimal decision table)
Postcondition Q: i = length mks ∧ mks!u!v = knapsack u v w v (equivalence to the functional model knapsack).
2) Recursive Relations. Induced by classifying decisions (including or excluding an item) and defining allowed decision set (D(i, j)={(i
1, j), (i
1, j
w_i)}), with the optimal function opt = max.
3) Loop Invariants. Constructed to ensure subproblem solutions align with the functional model throughout iteration.
Outerloop invariant: INV_DP_E (length m, 1, length(m!(i
1)), 0, knapsack, w, v)
Inner loop invariant: INV_DP_I (length m, 1, length (m!(i
1)), 0, knapsack, w, v).
4) Formal Verification. VCG generated 5 conditions (initialization, loop preservation, etc.), verified in Isabelle/HOL to confirm consistency with the functional model.
4.2.2 Application to nonlinear data structures
In addition to handling linear data structure problems, this method also exhibits broad applicability in dealing with nonlinear data structure issues, such as shortest path, post office and optimal binary search tree problem.
We take the demonstration of the correctness of the Optimal Binary Search Tree (OBST) algorithm, which minimizes average search cost for a set of keys:
1) Problem specification. Precondition P: length mbst ≥ 1 (where mbst stores minimum search costs)
Postcondition Q: r=length (mbst[1]) ∧ mbst!u!v = min_wpl u v W (equivalence to min_wpl function).
2) Recursive relations. Induced by selecting root nodes (k in [i, j]) and splitting into left or right subtrees, with decision set (D (i, j)={(i, k), (k+1, j)}) and opt = min.
3) Loop invariants
Outer loop invariant: INV_DP_E(length m!1,1, length m
r+1, 0, min_wpl, W)
Inner loop invariant: INV_DP_I(length m!1,1, length m
r+1, 0, min_wpl, W)
4) Formal verification. Conditions are generated by VCG, and the corresponding theorems were verified in Isabelle/HOL.
Hence, two kinds of dynamic programming algorithms (linear data structures and nonlinear data structures) follow the same workflow: define specifications, induce recursions, construct invariants, generate code, and verify via VCG/Isabelle/HOL. This confirms our method's potential as a standardized framework for imperative dynamic programming algorithm design and verification.
5 Conclusion
This paper proposes a derivation and mechanized verification method for the edit distance algorithm. By proving the equivalence between the algorithm and a functional model, it avoids the need to handle complex dependencies and constraints. Specifically, the method derives recursive relations based on the properties of dynamic programming, constructs loop invariants, generates IMP code, then uses VCG to produce verification conditions and completes verification with Isabelle/HOL, ensuring the algorithm's correctness. This method can be extended to linear and nonlinear dynamic programming problems.
The study has two limitations: it does not deeply explore efficiency optimization of the algorithm for large-scale data scenarios, and the adaptability of the verification process to complex nested loops needs improvement. Future research will focus on two aspects: first, optimizing the algorithm implementation to adapt to massive data processing and reduce time and space overhead; second, improving the verification framework to enhance support for complex dynamic programming problems (such as multi-dimensional sub-problem dependencies) and further expand the method's scope of application.
References
- Wang J, Zhan N J, Feng X Y, et al. Overview of formal methods[J]. Journal of Software, 2019, 30(1): 33-61(Ch). [Google Scholar]
- Dasgupta S, Papadimitriou C H, Vazirani U V. Algorithms[M]. New York: McGraw-Hill Education, 2006. [Google Scholar]
- Mofarah-Fathi L, Norvell T S. Formal derivation of dynamic programming algorithms[EB/OL].[2007-11-26]. https://www.engr.mun.ca/~theo/Publications/submittedNECEC2007.pdf. [Google Scholar]
- Wimmer S, Hu S W, Nipkow T. Verified memoization and dynamic programming[M]//Interactive Theorem Proving. Cham: Springer-Verlag, 2018. [Google Scholar]
- Wimmer S, Hu S W, Nipkow T. Monadification, memoization and dynamic programming[EB/OL]. [2018-10-25]. https://www.isa-afp.org/browser_info/current/AFP/Monad_Memo_DP/document.pdf. [Google Scholar]
- Song L H, Wang H T, Ji X J, et al. Verification of file comparison algorithm fcomp in Isabelle/HOL[J]. Journal of Software, 2017, 28(2): 203-215(Ch). [Google Scholar]
- Nipkow T, Klein G. Concrete Semantics: With Isabelle/HOL[M]. Cham: Springer-Verlag, 2023. [Google Scholar]
- Si X J, Dai H J, Raghothaman M, et al. Learning loop invariants for program verification[C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal: Curran Associates Inc, 2018: 7762-7773. [Google Scholar]
- Boroujeni M, Ehsani S, Ghodsi M, et al. Approximating edit distance in truly subquadratic time: Quantum and MapReduce[J]. Journal of the ACM , 2021, 68(3): 1-41. [Google Scholar]
- Boroujeni M, Seddighin M, Seddighin S. Improved algorithms for edit distance and LCS: Beyond worst case[C]//Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms. Philadelphia: Society for Industrial and Applied Mathematics, 2020: 1601-1620. [Google Scholar]
- Equi M, Griend A M, Mäkinen V. Quantum linear algorithm for edit distance using the word QRAM model[EB/OL]. [2021-03-30]. https://arXivpreprintarXiv:2112.13005. [Google Scholar]
- Ding X, Dong X, Gu Y, et al. Efficient parallel output-sensitive edit distance[EB/OL]. [2023-04-21]. https://www.semanticscholar.org/reader/6b2439fa153e2e6feace3d69f1167f8d41172328. [Google Scholar]
- Eddy S R. What is dynamic programming[J]. Nature Biotechnology, 2004, 22(7): 909-910. [CrossRef] [Google Scholar]
- Luus R. Iterative Dynamic Programming[M]. New York: Chapman and Hall/CRC, 2000. [Google Scholar]
- Hoare C A R. An axiomatic basis for computer programming[J]. Communications of the ACM, 1969, 12(10): 576-580. [Google Scholar]
- Zuo Z K, Hu Y, Huang Q, et al. Automatic algorithm programming model based on the improved Morgan's refinement calculus[J]. Wuhan University Journal of Natural Sciences, 2022, 27(5): 405-414. [CrossRef] [EDP Sciences] [Google Scholar]
- Paulson L C, Nipkow T, Wenzel M. From LCF to Isabelle/HOL[J]. Formal Aspects of Computing, 2019, 31(6): 675-698. [Google Scholar]
- Jiang N, Li Q G, Wang L, et al. Overview on mechanized theorem proving[J]. Journal of Software, 2020, 31(1): 82-112(Ch). [Google Scholar]
- Zuo Z K, Sun H, Wang C J, et al. Program derivation and mechanized verification of imperative dynamic programming algorithms[J]. International Journal of Software and Informatics, 2024, 14(4): 419-448. [Google Scholar]
All Tables
All Figures
![]() |
Fig. 1 The process of transforming string A into B |
| In the text | |
![]() |
Fig. 2 The process of calculating the optimal decision table m |
| In the text | |
![]() |
Fig. 3 Automatically generated C++ code for the edit distance |
| In the text | |
![]() |
Fig. 4 C++ code for the edit distance problem |
| In the text | |
![]() |
Fig. 5 Isabelle/HOL verification code and value instance |
| In the text | |
![]() |
Fig. 6 Process framework diagram |
| In the text | |
Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.






























