Open Access
Issue
Wuhan Univ. J. Nat. Sci.
Volume 27, Number 2, April 2022
Page(s) 128 - 134
DOI https://doi.org/10.1051/wujns/2022272128
Published online 20 May 2022

© Wuhan University 2022

Licence Creative CommonsThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

0 Introduction

High dimensional data can be efficiently collected at a low cost in many scientific areas due to the advance of information technology. In analysis of high dimensional data, estimating the positive-definite covariance matrix, denoted , is perhaps one of the most fundamental problems. It is however not straightforward to estimate precisely in high dimensions. To estimate the positive-definite covariance matrix accurately, an important concept, sparsity, is introduced which assumes many off-diagonal components of are identically zero. The sparsity assumption plays an important role to construct , an intuitively feasible estimate of , and to establish the consistency property of . If is sparse, many interesting approaches, such as banding[1], tapering[2] and thresholding[13] are proposed in the literature to produce sparse estimates of . However, these sparse estimates are not always positive-definite, though the positive-definite property is highly desirable.

The definition of positive-definite was described in Xue et al [4]. When the positive-definite property is concerned in estimating the high dimensional covariance matrix, Friedman et al [5] and Rothman[6] considered the following minimization problem:(1)where τ and λ are two distinctive regularization parameters, τ≥0 is introduced here to ensure the positive-definite of and controls the trade-off between the penalty and the loss function. The following notations will be used:

is the sample covariance matrix, is a random sample of size stands for the determinant of stands for the Frobenius norm of stands for the norm of all off-diagonal elements of . Specifically,

An iterative shooting algorithm is proposed by Friedman et al [5] and Rothman[6] to solve (1). However, whether this shooting algorithm converges or not remains unknown in the literature. To yield a positive-definite estimate of , Xue et al [4] considered a different optimization problem:(2)for a sufficiency small positive constant ε, where I stands for an identity matrix. Solving (2) with the constraint must yield a positive-definite . Xue et al [4] proposed an Alternating Direction Method of Multiplies (ADMM ) to convert the problem of solving (2) into a sequence of simpler optimization subproblems. This method has been cited and studied in many subsequent researches[4].

Although solving a sequence of subproblems may end up taking more iterations than directly solving (2), it runs in less total time because the subproblems are relatively easy to solve in general. In the present context, all the subproblems have explicit analytic forms. These two optimization problems, described in (1) and (2) can be efficiently solved for a single regularization parameter, referring to Kang and Deng[7], Choi et al [8], Zhang et al [9], etc. However, when these optimization problems are solved over the full range of regularization parameters, the computational complexity will be dramatically increased.

Efficient estimation of positive-definite covariance matrix can be implemented by these two optimization problems, described in (1) and (2). But Log structure of (1) is not stable in estimation process. In this paper we suggest to compute the regularization paths of through a slight modification of Xue et al’s ADMM algorithm[4]. To choose an optimal λ which yields the “best” estimate in a certain sense, Xue et al [4] suggested to inspect a sequence of sparse solutions over the full range of regularization parameters:, where M is usually set to 100 heuristically, and λmax is the maximal value of λ such that , the sparsest solution. Conceptually, estimating at a fixed λ consists of two ingredients. The first is to seek for the support of , which is defined as the (i,j) pairs such that . The second step is to solve (2) over the support of . In general, the first step is much more important than the second step for large scale problems because one can always refit to obtain an optimal estimate of the parameter values with a sufficiently correct estimate of the support of . To handle large scale problems, we need algorithms not only to recover the support of but also to estimate the entire sequence of quickly. Following Friedman et al [10], Wu and Lange[11], Hu et al [12], we first introduce a procedure to estimate the regularization paths through an ADMM algorithm with warm starts over a range of regularization parameters to yield a path-like sequence of solutions, then perform a one-step approximation along each point on this path to yield an ADMM algorithmic regularization path. This procedure quickly outlines a sequence of at a fine resolution and, at the same time, reduces computational time dramatically.

The rest of this paper is organized as follows. In Section 1, we first review the ADMM algorithm with warm-starts over a grid of regularization parameters to yield a path-like sequence of solutions, then use the one-step approximation of this path-like sequence of solutions to obtain an algorithmic regularization path. In Section 2, we demonstrate the effectiveness and computational savings of our proposals through simulated examples. We conclude this paper with some brief remarks in Section 3.

1 The Algorithmic Regularization Path

In this section we use Xue et al ’s[4] ADMM splitting method as the foundation to develop a new approximation to the sequence of sparse solutions , which allows us to quickly explore the space of sparse solutions at a fine resolution. And, sacrifice of estimation precision can be ignored as long as the step size, defined as , is sufficiently small. The objective function formulated in (2) by Xue et al [4] consists of a differentiable loss function, , a non-differentiable penalty, , and a constraint which ensures that is positive-definite. In the constraint, is a user-specified parametric which is sufficiently small, say, . The loss function is also convex in . The regularization parameter controls the trade-off between of the loss function and the penalty. Following Xue et al [4], we split the smooth loss function from the nonsmooth penalty through a copy para matrix , and add an equality constraint forcing , which converts (2) into(3)

Similar operator splitting strategies are also used in many other learning problems such as total variation[13], jointly graphical model selection[14,15], a convex formulation of clustering[16], etc. With scaled dual variable of the same dimension as and and an algorithm tuning parameter the associated augmented Lagrangian of (3) is

Then the ADMM algorithm for (2) is partitioned into three subproblems:

1) -subproblem:(4)

2) -subproblem:(5)

3) Dual update

We solve these subproblems, together with the dual update, iteratively until convergence. The benefit of solving these subproblems is that there are closed-form solutions to both (4) and (5). The -subproblem solves a linear regression with an additional quadratic ridge penalty. Specifically,

The -subproblem also has an analytical solution. For notational clarity, we define as the projection of onto the convex cone . Specifically, suppose λj is the j-th eigenvalue of B and μj is the associated eigenvector, namely,

We further define an entry wise soft thresholding rule, denoted , for all the off-diagonal elements of B .The analytical solution of the -subproblem is as follows:Solving the -subproblem introduces sparsity. Finally, the dual update guarantees that is squeezed towards . As the iterations proceed, becomes increasingly sparse. We shall illustrate this phenomenon with simulations in Section 2. One can refer to Fig. 1 (a) and (c) for details. Because the -subproblem controls the sparsity level of , a natural question arises: How to use or modify the iterates of the ADMM algorithm to quickly generate regularization paths for estimating a sparse positive-definite covariance matrix in high or even ultrahigh dimensions?

Following the idea of Hu et al [12], we provide two options to quickly generate a regularization path. The first is to use warm starts in the ADMM algorithm along a grid of regularization parameters, say, . To be precise, the warm starts use the solution from the previous value of , , as the initial value for the ADMM algorithm to solve the optimization problem at . If is sufficiently close to , it is reasonable to expect that is also very close to . Using warm starts usually reduces the number of iterations dramatically.

The ADMM algorithm with warm starts, which is referred to as Algorithm 1 in our context, is described below:

In the above algorithm, the convergence is measured by the primal and dual residuals[17]. The resulting solution goes from dense to sparse, or equivalently, the regularization parameter λ goes from small to large. The ADMM algorithm with warm starts can certainly go in the reverse direction from sparse to dense. However, going from sparse to dense may introduce additional yet unnecessary discontinuities in the -subproblem, consequently would require more iterations for convergence. Therefore, we advocate using the ADMM algorithm with warm starts going from dense to sparse, as described in Algorithm 1 at present.

The second option to speed the computation of regularization paths is to seek for a single path approximating algorithm instead of solving several optimization problems separately over a grid of regularization parameter values. In general, the sparsity levels of typically stabilize to that of the solution within a small number of iterations as λ is increased from λ1 to λM; and the remaining iterations and a large proportion of the computation time are spent on squeezing towards to satisfy the primal feasibility. This motivates us to surmise that, if λj+1 is sufficiently close to λj, then the sparsity patterns given by the -subproblem may correctly approximate the regularization paths within a few or even one iteration when implementing the ADMM algorithm with warm starts. This further motivates us to suggest the following ADMM Algorithmic Regularization Paths to quickly approximate the regularization paths for estimating large scale positive-defi- nite covariance matrices.

This algorithm is described as follows:

In Algorithm 2, the sequence of λj can be linearly spaced. As long as the step size, defined as , is sufficiently small, the sparsity patterns of Algorithm 2 can well approximate the regularization paths of . This algorithm makes full use of three key ingredients to quickly generate a regularization path: the sparsity patterns of the solution to the -subproblem, the ADMM with warm-starts generating a dense to sparse solution and the one-step approximation at each regularization level. We begin with the fully dense ridge solution, employing these key ingredients, to implement the ADMM algorithm which gradually increases the amount of regularization level until we obtain a fully sparse solution. If we iterate the three subproblems until convergence, Algorithm 2 would be approximatively equivalent to Algorithm 1. Therefore, the one-step approximation is the key difference between these two algorithms.

In both of the above algorithms, there is an ADMM tuning parameter μ. We fix μ=1 in our algorithms. Updating μ dynamically may speed up convergence (Boyd et al [17], He et al [18]), however, it is not conducive to achieve a path-like algorithm using warm-starts. In particular, the sparsity levels of dramatically change when μ is changed because the -subproblem was solved by soft-thresholding at the level , which eliminates the benefit of using warm-starts to achieve relatively smooth transitions in sparsity levels.

There are at least three advantages of working with Algorithm 2: it is easy to implement with a few lines of code; it fleetly gives a sequence of sparse solutions as it requires M iterations to fully explore the regularization paths ; and it also allows us to explore the space of sparse models at a very fine resolution.

Algorithm 1: ADMM with warm starts to estimate sparse and large scale positive-definite covariance
1. Initialize and M log-spaced values for and
2. for do
  Initialize and ;
  while do
     
     
     
      and
     
    end while
  end for
3. Output as the regularization path.
Algorithm 2: Algorithmic regularization paths to estimate sparse and large scale positive-definite covariance matrices
1. Initialize set λ to be for and
2. while do
j=j+1
end while
3. Output as the regularization path.

2 Numerical Studies

We illustrate the performance of our proposals through simulations. Write . We set the sample size n=50 and vary the dimension p=100, 200, 500 and 1 000. We consider the following two models which were once used in existing literature:

1) Model (I): .

2) Model (II): We partition the index set, , into non-overlapping subsets of equal size, and denote the index set, . LetSpecifically, Model (I) was used by Cai et al [2] and Model (II) was used by Rothman et al [5].

Figure 1 shows the sparsity levels. In Fig. 1 (a) and (c) we report the sparsity levels, measured by the number of nonzero components of of the ADMM algorithm with warm starts (Algorithm 1) for a fixed when p=1 000. The ADMM algorithm with warm starts spends a large proportion of the computational time on squeezing towards to satisfy the primal feasibility, which does not converge until 500 iterations in Model (I) and 280 iterations in Model (II). In Fig. 1 (b) and (d), we present the ADMM algorithmic regularity paths obtained by Algorithm 2. In both models, the sparsity levels of the ADMM algorithmic regularity paths increase sharply within only a few iterations as the regularity parameter λ increases, and achieve a relatively smooth transition that explores the range of possible sparse estimates at a fine resolution, and requires substantially less computer time for Algorithm 2 to converge. We also remark here that there are some fluctuations in all four plots of Fig. 1, main because we apply the operator to ensure the solution to the -subpr- oblem, denoted , to be positive-definite.

thumbnail Fig.1 Sparsity levels measured by the number of nonzero entries of

Sparsity levels measured by the number of nonzero entries of

(a) and (c) describe the sparsity levels of the -subproblem iterates of the ADMM algorithm, fit for a fixed ; (b) and (d) describe the sparsity levels of the -subproblem over the iterates of our proposed ADMM algorithmic regularization paths. (a) and (b) are for Model (I), and (c) and (d) are for Model (II)

In Figs. 2 and 3, we plot the ADMM regularization paths with cool starts in (a), the ADMM regularization paths with warm starts in (b), the ADMM algorithmic regularization paths with a relatively large step size in (c) and a tiny step size in (d), for Models (I) and (II), respectively. In both figures, the ADMM algorithm with warm starts is almost equivalent to the ADMM algorithm with cool starts because the regularization paths exhibit almost identical patterns. The ADMM algorithmic regularization paths with a tiny step size also well approximate the ADMM regularization paths, however, those with a relatively large step-size yields a sequence of sparse models that distinguishes obviously from the sparsity patterns of ADMM algorithm with warm starts. This is mainly because a large change in regularization levels of each step makes that the sparsity levels of the -subproblem after the one-step approximation are not equivalent to that of the solution to (2).

thumbnail Fig.2 Simulation results for Model (I)

Simulation results for Model (I)

(a) The ADMM regularization paths with cool starts; (b) The ADMM regularization paths with warm starts; (c) The ADMM algorithmic regularization paths with a relatively large step size; and (d) The ADMM algorithmic regularization paths with a tiny step size. The blue lines denote the false positive variables and the red lines denote true positive variables

thumbnail Fig.3 Simulation results for Model (II)

Simulation results for Model (II)

(a) The ADMM regularization paths with cool starts; (b) The ADMM regularization paths with warm starts; (c) The ADMM algorithmic regularization paths with a relatively large step size; and (d) The ADMM algorithmic regularization paths with a tiny step size. The blue lines denote the false positive variables and the red lines denote true positive variables

We further compare our algorithms for computing the regularization paths in terms of computational time based on 100 repetitions and each of size n=50. The simulations are entirely coded in Matlab and carried out on an AMD 3.30 GHz processor. The results are summarized in Table 1. Notice that, our proposed ADMM algorithmic regularization paths are at least two times faster than the ADMM with warm starts, while the latter is far superior to the state-of-art ADMM with cool starts.

Table 1

Timing comparison (averaged over 100 replications) of Algorithm 1 and Algorithm 2 in computing the regularization paths for both models

3 Conclusion

In this paper, we propose two efficient algorithms to quickly derive the regularization paths for estimating the sparse and large scale positive-definite covariance matrices. Instead of solving the optimization problems over a grid of regularization parameters, which is required by the conventional regularization methods, we propose a one-step approximation to the ADMM algorithm with warm starts to quickly approximate the regularization paths. These new algorithms are easy to implement because no iteration is required. These algorithms both estimate the covariance matrices over a lot of regularization parameters, but ADMM algorithmic regularization paths (Algorithm 2) can quickly outline a sequence of sparse solutions at a fine resolution, and ADMM algorithmic regularization paths are at least two times fast than the ADMM with warm starts.

References

  1. Bickel P J, Levina E. Covariance regularization by thresholding [J]. The Annals of Statistics, 2008, 36(6): 2577-2604. [MathSciNet] [Google Scholar]
  2. Cai T T, Zhang C H, Zhou H H, et al. Optimal rates of convergence for covariance matrix estimation [J]. The Annals of Statistics, 2010, 38(4): 2118-2144. [MathSciNet] [Google Scholar]
  3. Rothman A J, Levina E, Zhu J. Generalized thresholding of large covariance matrices [J]. Journal of the American Statistical Association, 2009, 104(485): 177-186. [CrossRef] [MathSciNet] [Google Scholar]
  4. Xue L Z, Ma S Q, Zou H. Positive-definite 1-penalized estimation of large covariance matrices [J]. Journal of the American Statistical Association, 2012, 107(500): 1480-1491. [CrossRef] [MathSciNet] [Google Scholar]
  5. Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso [J]. Biostatistics, 2007, 9(3): 432-441. [Google Scholar]
  6. Rothman A J . Positive definite estimators of large covariance matrices [J]. Biometrika, 2012, 99(3): 733-740. [CrossRef] [MathSciNet] [Google Scholar]
  7. Kang X N, Deng X W. On variable ordination of cholesky-based estimation for a sparse covariance matrix [J]. The Canadian Journal of Statistics, 2021, 49(2): 283-310. [CrossRef] [MathSciNet] [Google Scholar]
  8. Choi Y G, Lim J, Roy A J P. Fixed support positive-definite modification of covariance matrix estimators via linear shrinkage [J]. Journal of Multivariate Analysis, 2019, 171: 234-249. [CrossRef] [MathSciNet] [Google Scholar]
  9. Zhang B, Zhou J, Li J. Improved covariance matrix estimators by multi-penalty regularization [C]// 2019 22th International Conference on Information Fusion (FUSION) . Washington D C: IEEE, 2019: 1-7. [Google Scholar]
  10. Friedman J, Hastie T, Hofling H, et al. Pathwise coordinate optimization [J]. The Annals of Applied Statistics, 2007, 1(2): 302-332. [CrossRef] [MathSciNet] [Google Scholar]
  11. Wu T T, Lange K. Coordinate descent algorithms for lasso penalized regression [J]. The Annals of Applied Statistics, 2008, 2(1): 224-244. [MathSciNet] [Google Scholar]
  12. Hu Y, Chi E C, Allen G I. ADMM algorithmic regularization paths for sparse statistical machine learning [C]// Splitting Methods in Communication, Imaging, Science, and Engineering. Berlin: Springer-Verlag, 2016: 433-459. [Google Scholar]
  13. Wahlberg B, Boyd S, Annergren M, et al. An ADMM algorithm for a class of total variation regularized estimation problems [J]. IFAC Proceedings Volumes, 2012, 45(16): 83-88. [CrossRef] [Google Scholar]
  14. Mohan K, Chung M, Han S, et al. Structured learning of gaussian graphical models [J]. Advances in Neural Information Processing Systems, 2012, 2012:629-637. [PubMed] [Google Scholar]
  15. Mohan K, London P, Fazel M, et al. Node-based learning of multiple Gaussian graphical models [J]. Journal of Machine Learning Research, 2014, 15(1): 445-488. [MathSciNet] [PubMed] [Google Scholar]
  16. Chi E C, Lange K. Splitting methods for convex clustering [J]. Journal of Computational and Graphical Statistics, 2015, 24(4): 994-1013. [CrossRef] [MathSciNet] [PubMed] [Google Scholar]
  17. Boyd S, Parikh N, Chu E, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers [J]. Foundations & Trends in Machine Learning, 2010, 3(1): 1-122. [Google Scholar]
  18. He B S, Yang H, Wang S L. Alternating direction method with self adaptive penalty parameters for monotone variational inequalities [J]. Journal of Optimization Theory & Applications, 2000, 106(2): 337-356. [CrossRef] [MathSciNet] [Google Scholar]

All Tables

Table 1

Timing comparison (averaged over 100 replications) of Algorithm 1 and Algorithm 2 in computing the regularization paths for both models

All Figures

thumbnail Fig.1 Sparsity levels measured by the number of nonzero entries of

Sparsity levels measured by the number of nonzero entries of

(a) and (c) describe the sparsity levels of the -subproblem iterates of the ADMM algorithm, fit for a fixed ; (b) and (d) describe the sparsity levels of the -subproblem over the iterates of our proposed ADMM algorithmic regularization paths. (a) and (b) are for Model (I), and (c) and (d) are for Model (II)

In the text
thumbnail Fig.2 Simulation results for Model (I)

Simulation results for Model (I)

(a) The ADMM regularization paths with cool starts; (b) The ADMM regularization paths with warm starts; (c) The ADMM algorithmic regularization paths with a relatively large step size; and (d) The ADMM algorithmic regularization paths with a tiny step size. The blue lines denote the false positive variables and the red lines denote true positive variables

In the text
thumbnail Fig.3 Simulation results for Model (II)

Simulation results for Model (II)

(a) The ADMM regularization paths with cool starts; (b) The ADMM regularization paths with warm starts; (c) The ADMM algorithmic regularization paths with a relatively large step size; and (d) The ADMM algorithmic regularization paths with a tiny step size. The blue lines denote the false positive variables and the red lines denote true positive variables

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.