Appl. Math. Mech. -Engl. Ed., 2014, 35(2): 259-268
http://dx.doi.org/10.1007/s10483-014-1788-6
Received 2013-04-07; Revised 2013-05-27
Proximity point algorithm for low-rank matrix recovery from sparse noise corrupted data
Wei ZHU1, Shi SHU1,2, Li-zhi CHENG3        
1 School of Mathematics and Computational Science, Xiangtan University, Xiangtan 411105, Hunan Province, P. R. China;
2 Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan 411105, Hunan Province, P. R. China;
3 Department of Mathematics and Computational Science, College of Science, National University of Defense Technology, Changsha 410073, P. R. China
ABSTRACT: The method of recovering a low-rank matrix when an unknown fraction of its entries is arbitrarily corrupted is known as the robust principal component analysis (RPCA). Under some conditions, the RPCA problem can be exactly solved via convex optimization by minimizing a combination of the nuclear norm and the l1 norm. In this paper, an algorithm based on the Douglas-Rachford splitting method is proposed for solving the RPCA problem. First, the constrained convex optimization problem is converted to an unconstrained one by canceling one variable with the constraint, and then the proximity operators of the objective function are computed alternately. The new algorithm can exactly recover the low-rank and sparse components simultaneously, and it is proved to be convergent. Numerical simulations demonstrate the practical utility of the proposed algorithm.
Keywords: low-rank matrix recovery; sparse noise; Douglas-Rachford splitting method; proximity operator

1 Introduction

There has been growing interest in the robust principal component analysis (RPCA)[1, 2] recently, namely, recovering a low-rank matrix when an unknown fraction of its entries is arbitrarily corrupted. This problem has wide applications in many areas such as web data ranking and collaborative filtering[3], computer vision[1, 4], and machine learning[5].

Mathematically, suppose that we are given a large data matrix D, which is the sum of a low-rank matrix L0 and a sparse matrix S0. The aim of the RPCA is to recover the low-rank matrix L0 from the highly corrupted measurement D = L0 + S0 exactly and efficiently. Candès et al.[1] and Chandrasekaran et al.[2] have shown that, under some conditions, the RPCA problem can be exactly solved via convex optimization by minimizing a linear combination of the nuclear norm and the l1 norm, that is, to solve

    min_{L,S} ||L||_* + λ||S||_1   subject to   L + S = D,                            (1)

where ||·||_* denotes the nuclear norm of a matrix, i.e., the sum of its singular values, ||·||_1 denotes the sum of the absolute values of the matrix entries, and λ is a positive weighting parameter.

The convex optimization (1) has received much attention, and many numerical algorithms have been proposed. Liu and Vandenberghe[6] suggested the interior point method, with which the problem (1) could be easily recast as a semidefinite program and be solved by some interior point solvers[7]. However, since the interior point method depends on second-order information which is difficult to compute, it is limited to small problems, e.g., n < 10^2, and cannot handle matrices whose dimension is larger than about 10^2. To overcome the limited scalability of the interior point method, many kinds of first-order methods have been developed. Nesterov[8] presented an accelerated first-order method for optimization, which has been successfully employed to improve the convergence of the iterative thresholding method in l1 norm minimization[9, 10]. Toh and Yun[11] designed an accelerated proximal gradient (APG) algorithm for matrix completion inspired by an accelerated version of the iterative thresholding. Ganesh et al.[12] applied the APG algorithm to the low-rank and sparse decomposition, i.e., the problem (1). Experimental results show that the APG is at least 50 times faster than the iterative thresholding method. Goldfarb et al.[13] developed the accelerated iterative thresholding and proposed an alternating linearization method and its fast version. They also applied their algorithms to the problem (1) and obtained higher accuracy than the APG with the same number of iterations. More recently, Lin et al.[14] applied the augmented Lagrange multiplier (ALM) method to the problem (1), and achieved much higher efficiency, at least five times faster than the APG. Yuan and Yang[15] proposed the alternating direction method based on the ALM.
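For small instances, the formulation (1) can indeed be handed directly to a generic convex solver. The following CVXPY sketch (our illustration on a toy random matrix, not one of the algorithms discussed above) makes the formulation concrete; as noted above, such generic solvers are practical only for small matrices.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    D = rng.standard_normal((30, 30))                 # toy data matrix (illustration only)
    lam = 1.0 / np.sqrt(max(D.shape))                 # weighting parameter

    L = cp.Variable(D.shape)                          # low-rank component
    S = cp.Variable(D.shape)                          # sparse component
    problem = cp.Problem(cp.Minimize(cp.normNuc(L) + lam * cp.norm1(S)),
                         [L + S == D])
    problem.solve()                                   # generic conic solver, small problems only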

However, the above mentioned methods depend strongly on a good parameter-choice rule, since they all convert the constraint into the objective function as a penalty term and have to introduce some penalty parameters. A common remedy is the so-called continuation scheme[16, 17], which needs to be carefully designed. In this paper, we do not follow the traditional way but transform the problem (1) into an unconstrained version by canceling one variable with the constraint equation and minimizing a new objective function of the other variable, that is, to minimize

    min_X ||X||_* + λ||D − X||_1                                                      (2)

or

    min_Y λ||Y||_1 + ||D − Y||_*.                                                     (3)

Obviously, the objective function of (2) or (3) does not involve any introduced parameter, and thus needs no continuation scheme. Based on the Douglas-Rachford splitting method, we design a proximity point algorithm to solve the problem (2) and exactly recover the low-rank matrix from the sparse noise corrupted data matrix D. Although the aim of our algorithm is to recover the low-rank matrix, we shall show that the sparse component S0 can also be recovered simultaneously by a proximal point sequence.

The remainder of the paper is organized as follows. In Section 2, we introduce the proximity operator and the Douglas-Rachford splitting method. In Section 3, we first discuss the uniqueness of the solution to (2) or (3), and then apply the Douglas-Rachford splitting method to the unconstrained problem (2) and design a novel algorithm. Meanwhile, the convergence is proved. In Section 4, we present some implementation details and simulations to illustrate the practical utility of our algorithm. Finally, some discussion is made in Section 5.

2 Method

2.1 Proximity operator

Let H be a real Hilbert space with the inner product <·|·> and the corresponding norm ||·||. Let Γ0(H) be the class of lower semicontinuous, convex, and proper functions from H to (−∞, +∞]. The proximity operator of a function f ∈ Γ0(H) is the operator prox_f : H → H, which maps every x ∈ H to the unique minimizer of the function f + (1/2)||x − ·||^2, i.e., for all x ∈ H,

    prox_f x = argmin_{y∈H} ( f(y) + (1/2)||x − y||^2 ).

The foundational theory of proximity operators was introduced by Moreau[18] in 1962. Recently, Combettes and Wajs[19] developed the theory and applied it to signal recovery problems. Here, we provide two useful properties of proximity operators[20]. One is a subdifferential characterization, and the other is the nonexpansive property. Both of them are essential to establish the convergence of various algorithms based on proximity operators.

Lemma 1 Let f ∈ Γ0(H), and let x ∈ H and p ∈ H. Then, p = prox_f x if and only if x − p ∈ ∂f(p).
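For instance, if H = R and f = μ|·| with μ > 0, Lemma 1 states that p = prox_f x if and only if x − p ∈ μ∂|p|, i.e., x − p = μ for p > 0, x − p ∈ [−μ, μ] for p = 0, and x − p = −μ for p < 0. Solving this inclusion gives prox_f x = x − μ if x > μ, 0 if |x| ≤ μ, and x + μ if x < −μ, which is exactly the soft-thresholding operator introduced below.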

Lemma 2 Let f ∈ Γ0(H). Then, prox_f is nonexpansive, i.e., for all x, y ∈ H, it holds that

    ||prox_f x − prox_f y|| ≤ ||x − y||.

For convenience, we introduce the following soft-thresholding operator:

    S_μ[x] = sgn(x) max(|x| − μ, 0) = { x − μ,  x > μ;   0,  |x| ≤ μ;   x + μ,  x < −μ },

where x ∈ R and μ > 0. This operator can be extended to vectors and matrices by applying it element-wise. Denote

    g1(X) = ||X||_1,    g2(X) = ||X||_*.

With the soft-thresholding operator, we can provide closed-form formulas for the proximity operators of g1 and g2 as follows[12]:

    prox_{μg1}(X) = S_μ[X],    prox_{μg2}(X) = U S_μ[Σ] V^T,

where X = U Σ V^T is the singular value decomposition of X.
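As a concrete illustration, the two closed-form proximity operators can be implemented in a few lines of NumPy (a sketch; the function names are ours):

    import numpy as np

    def soft_threshold(X, mu):
        """Element-wise soft-thresholding S_mu[X] = sgn(X) * max(|X| - mu, 0)."""
        return np.sign(X) * np.maximum(np.abs(X) - mu, 0.0)

    def prox_l1(X, mu):
        """Proximity operator of mu*g1 = mu*||.||_1: element-wise soft-thresholding."""
        return soft_threshold(X, mu)

    def prox_nuclear(X, mu):
        """Proximity operator of mu*g2 = mu*||.||_*: soft-threshold the singular values."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U @ np.diag(soft_threshold(s, mu)) @ Vt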

2.2 Douglas-Rachford splitting method

To solve the following unconstrained convex problem:

    min_{X∈H} F(X) = f1(X) + f2(X),

where both f1 and f2 belong to Γ0(H), the Douglas-Rachford splitting method[20] for minimizing F(X) is governed by the following updating rule:

    x_n = prox_{rf2} y_n + B_n,
    y_{n+1} = y_n + λn ( prox_{rf1}(2x_n − y_n) + A_n − x_n ),                        (8)

where λn ∈ (0, 2), r ∈ (0, +∞), and the sequences (A_n)n∈N and (B_n)n∈N model tolerances in the implementation of the proximity operators.
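In the exact case (A_n = B_n = 0), the updating rule (8) can be sketched as the following generic routine, where prox1 and prox2 stand for prox_{rf1} and prox_{rf2} (an illustration; the names are ours):

    def douglas_rachford(prox1, prox2, y0, lam_n=1.5, n_iter=500):
        """Exact Douglas-Rachford iteration (8) with the tolerances A_n, B_n set to zero."""
        y = y0
        for _ in range(n_iter):
            p2 = prox2(y)                  # prox_{r f2}(y_n)
            p1 = prox1(2.0 * p2 - y)       # prox_{r f1}(2 prox_{r f2}(y_n) - y_n)
            y = y + lam_n * (p1 - p2)      # y_{n+1} = y_n + lam_n (p1 - p2)
        return prox2(y)                    # prox_{r f2}(y) approaches a minimizer of f1 + f2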

Here, we provide a brief summary of the notation for describing a convergence result of the above updating rule. The domain of a function f : H → (−∞, +∞] is

    dom f = {x ∈ H | f(x) < +∞},

and sri C denotes the strong relative interior of a convex set C ⊂ H.

Theorem 1[20] Let fi ∈ Γ0(H) (i ∈ {1, 2}), let (A_n)n∈N and (B_n)n∈N be sequences in H, and let (y_n)n∈N be a sequence generated by the updating rule (8). Suppose that the following conditions hold:
(i) f1 + f2 is coercive;
(ii) 0 ∈ sri(dom f1 − dom f2);
(iii) Σ_{n∈N} λn(||A_n|| + ||B_n||) < +∞;
(iv) Σ_{n∈N} λn(2 − λn) = +∞.
Then, (y_n)n∈N converges weakly to a point y ∈ H, and prox_{rf2} y is a minimizer of f1 + f2.

3 Algorithm

3.1 Proximity point algorithm

The RPCA theory demonstrates that solving the problem (1) can exactly recover the low-rank and sparse components L0 and S0. In other words, the solution to the problem (1) is unique. Thereby, we conclude that the solution to (2) or (3), which is equivalent to the problem (1), is also unique. Hence, one can deduce that the first-order optimality condition of (2) or (3) is also sufficient for a point to be a solution, which is stated in the following lemma.

Lemma 3 X is a solution to the problem (2) if and only if it obeys

    0 ∈ ∂||X||_* − λ∂||·||_1(D − X),

and Y is a solution to the problem (3) if and only if it obeys

    0 ∈ λ∂||Y||_1 − ∂||·||_*(D − Y).

Now, we focus on the Hilbert space R^{n1×n2} with n1 and n2 being the dimensions of the data matrix D. Let

    f1(X) = λ||D − X||_1,    f2(X) = ||X||_*.

Before applying the Douglas-Rachford splitting method to the problem (2), we need to compute the proximity operator of f1(X), i.e.,

    prox_{rf1}(X) = argmin_{W∈R^{n1×n2}} ( rλ||D − W||_1 + (1/2)||W − X||_F^2 ).

Let Z = D − W. Then, we have W = D − Z and

    prox_{rf1}(X) = D − argmin_{Z∈R^{n1×n2}} ( rλ||Z||_1 + (1/2)||Z − (D − X)||_F^2 ).

By the closed-form formula for the proximity operator of g1 = ||·||_1, we deduce that

    prox_{rf1}(X) = D − S_{rλ}[D − X].
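In code, this proximity operator is again a one-liner built on the soft-thresholding operator (a sketch under the notation above):

    import numpy as np

    def prox_f1(X, D, r, lam):
        """prox_{r f1}(X) = D - S_{r*lam}[D - X] for f1(X) = lam * ||D - X||_1."""
        Z = D - X
        return D - np.sign(Z) * np.maximum(np.abs(Z) - r * lam, 0.0)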

Applying the Douglas-Rachford splitting method to the problem (2) yields the following updating rule:

    P2,n = prox_{rf2}(Yn) + Bn,
    P1,n = D − prox_{rf1}(2P2,n − Yn) − An,                                           (12)
    Yn+1 = Yn + λn (D − P1,n − P2,n),

where λn ∈ (0, 2), r ∈ (0, +∞), and the sequences (An)n∈N and (Bn)n∈N model tolerances in the implementation of the proximity operators. Replacing the proximity operators by their closed-form formulas in the above updating rule leads to the main algorithm of this study, which is described as follows.

Algorithm 1 Proximity point algorithm for RPCA
Input: D ∈ R^{n1×n2} and λ.
(I) Let λn ∈ (0, 2), r ∈ (0, +∞), Y0 = D, and n = 0.
(II) (U, Σ, V) := svd(Yn), P2,n := U S_r[Σ] V^T.
(III) Xn := Yn − 2(Yn − P2,n).
(IV) Zn := D − Xn, P1,n := S_{rλ}[Zn].
(V) Yn+1 := Yn + λn (D − P1,n − P2,n).
(VI) n := n + 1.
(VII) If the convergence criterion is satisfied, end the computation; otherwise, go to Step (II).
Output: (P1,n, P2,n).
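A compact NumPy sketch of Algorithm 1 is given below for illustration. It uses a full SVD instead of the partial SVD discussed in Section 4 and adopts the parameter values and stopping criterion described later in Subsection 4.1.3; the function names are ours.

    import numpy as np

    def soft_threshold(X, mu):
        return np.sign(X) * np.maximum(np.abs(X) - mu, 0.0)

    def ppa_rpca(D, lam=None, r=20.0, lam_n=1.5, tol=1e-7, max_iter=1000):
        """Proximity point algorithm (Algorithm 1): D ~ P2 (low rank) + P1 (sparse)."""
        n1, n2 = D.shape
        if lam is None:
            lam = 1.0 / np.sqrt(max(n1, n2))      # weighting parameter of Subsection 4.1.3
        Y = D.copy()
        for _ in range(max_iter):
            # Step (II): P2 = prox_{r f2}(Y) by singular value thresholding
            U, s, Vt = np.linalg.svd(Y, full_matrices=False)
            P2 = U @ np.diag(soft_threshold(s, r)) @ Vt
            # Step (III): reflection X = 2 P2 - Y
            X = 2.0 * P2 - Y
            # Step (IV): sparse estimate P1 = S_{r*lam}[D - X]
            P1 = soft_threshold(D - X, r * lam)
            # Step (V): Y_{n+1} = Y_n + lam_n (D - P1 - P2)
            Y = Y + lam_n * (D - P1 - P2)
            # Step (VII): stopping criterion of Subsection 4.1.3
            if np.linalg.norm(D - P1 - P2, 'fro') < tol * np.linalg.norm(D, 'fro'):
                break
        return P1, P2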

3.2 Convergence analysis

From Theorem 1, we can see that the convergence of the sequence generated by the Douglas-Rachford splitting method is only weak. This is because the discussed space is a general Hilbert space whose dimension may be infinite. Here, however, our space R^{n1×n2} is a finite dimensional linear space, and we can deduce a strong convergence result, since weak convergence is equivalent to strong convergence in any finite dimensional linear space. Furthermore, we shall show that the sparse matrix S0 can be recovered simultaneously although we do not solve its optimization problem (3).

Theorem 2 Let (An)n∈N and (Bn)n∈N be sequences in R^{n1×n2}, and let (Yn)n∈N, (P1,n)n∈N, and (P2,n)n∈N be sequences generated by the updating rule (12). We suppose that the sequence (λn)n∈N has a positive lower bound and the following conditions hold:
(i) Σ_{n∈N} λn(2 − λn) = +∞;
(ii) Σ_{n∈N} λn(||An||_F + ||Bn||_F) < +∞.
Then, (Yn)n∈N converges to a matrix Y, and (P2,n)n∈N and (P1,n)n∈N converge to the solutions P2 of the problem (2) and P1 of the problem (3), respectively, with D = P1 + P2.

Proof It is obvious that the sum function f1 + f2 is coercive and 0 ∈ sri(dom f1 − dom f2). Together with the above conditions, the four conditions of Theorem 1 are satisfied. Therefore, we conclude that (Yn)n∈N converges weakly to a matrix Y and prox_{rf2} Y is a minimizer of the problem (2). Since weak convergence turns into strong convergence in any finite dimensional linear space, (Yn)n∈N converges to the matrix Y. By the second condition and the condition that (λn)n∈N has a positive lower bound, we deduce that both (An)n∈N and (Bn)n∈N tend to zero as n tends to ∞. Denote

    P2 = prox_{rf2}(Y),    P1 = D − prox_{rf1}(2P2 − Y).

By Lemma 2, it holds that

    ||prox_{rf2}(Yn) − prox_{rf2}(Y)|| ≤ ||Yn − Y||,

which, together with Bn → 0, shows that (P2,n)n∈N converges to P2. Then,

    ||prox_{rf1}(2P2,n − Yn) − prox_{rf1}(2P2 − Y)|| ≤ 2||P2,n − P2|| + ||Yn − Y||,

and together with An → 0, it follows that (P1,n)n∈N converges to the matrix P1. Since

    Yn+1 − Yn = λn (D − P1,n − P2,n),

we deduce that D = P1 + P2 from the fact that the sequence (λn)n∈N has a positive lower bound and the sequences (Yn)n∈N, (P1,n)n∈N, and (P2,n)n∈N converge to Y, P1, and P2, respectively. By Lemma 1 and P2 = prox_{rf2}(Y), we have

    Y − P2 ∈ r∂||P2||_*.

Similarly, by Lemma 1 and D − P1 = prox_{rf1}(2P2 − Y), it follows that Y − P2 ∈ λr∂||P1||_1 since D = P1 + P2. Canceling one variable by D = P1 + P2 yields the optimality conditions of Lemma 3, so that P2 solves the problem (2) and P1 solves the problem (3).

4 Numerical results

4.1 Implementation details

4.1.1 Computing the SVD

Since we only need to compute the few singular values that are larger than a given threshold and their corresponding singular vectors, we use the PROPACK package[21], which has been employed and recommended in several papers[11, 14, 22]. Before using the PROPACK package, we have to predict the dimension of the principal singular space whose singular values are larger than the threshold. Since the sequences (An)n∈N and (Bn)n∈N model tolerances in the implementation of the proximity operators and only need to obey the second condition of Theorem 2, the convergence of our algorithm can still be guaranteed by Theorem 2 even if the prediction is less than the practical dimension. This is not the case for other algorithms. We divide our prediction rule into two cases: one case is that we can predict the rank of the low-rank matrix L0, and the other case is that we have no idea of the rank of the low-rank matrix L0. Here, we use the same parameter notation as in Refs. [11, 14]. In the first case, we use svp_n, which equals or is somewhat larger than the rank of the low-rank matrix L0, as the predicted dimension. In the second case, we use the following prediction rule:

where d = min(n1, n2), s is a constant that prevents the prediction from being too conservative, sv_n is the predicted dimension, and svp_n is the number of the sv_n computed singular values that are larger than a given threshold. This rule is similar to the prediction rule in [11, 14], which is more complicated and depends on the observation that the ranks of the Lagrange multiplier matrices increase monotonically and become stable at the true rank. Cai et al.[22] observed the same phenomenon in matrix completion by the singular value thresholding algorithm. However, this observation has not been proved rigorously so far. Therefore, if the computation of the soft-thresholding operator by their prediction rules is not exact, the convergence of their algorithms needs to be proved by taking the error into account.
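PROPACK itself is a Fortran/MATLAB package. Purely as an illustration of the partial SVD step with a predicted dimension sv_n, one could use scipy.sparse.linalg.svds as a stand-in (this is not the authors' implementation):

    import numpy as np
    from scipy.sparse.linalg import svds

    def partial_svt(Y, r, sv):
        """Singular value thresholding using only the sv leading singular triplets.

        Y: current iterate, r: threshold, sv: predicted dimension (1 <= sv < min(Y.shape)).
        Returns the thresholded matrix and svp, the number of singular values above r.
        """
        U, s, Vt = svds(Y, k=sv)                  # the sv largest singular triplets
        svp = int(np.sum(s > r))                  # observed dimension above the threshold
        return U @ np.diag(np.maximum(s - r, 0.0)) @ Vt, svp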

4.1.2 Order of minimizing f1 and f2

As suggested by Lin et al.[14], we first minimize the function f1 and then minimize the function f2 in our numerical simulations, considering the huge complexity of the SVD for large matrices. One can easily check that the order of minimizing does not affect the convergence result.

4.1.3 Setting parameters

Following the theory of the RPCA in [1], we choose the fixed weighting parameter

    λ = 1/√(max(n1, n2))

for a given data matrix D ∈ R^{n1×n2} to be decomposed. We set λn ≡ 1.5 and r = 20. The stopping criterion is ||D − P1,n − P2,n||_F / ||D||_F < 10^{-7}.

4.2 Simulations

In this section, we compare our algorithm, i.e., the proximity point algorithm (PPA), with the exact augmented Lagrange multiplier (EALM) algorithm and the inexact augmented Lagrange multiplier (IALM) algorithm proposed in [14]. Both EALM and IALM are faster than the previous state-of-the-art algorithms[12]. We use the same simulation conditions as in [14] for an impartial comparison, i.e., we use randomly generated square matrices for our simulations. We create the random low-rank matrices L0 ∈ R^{m×m} with the rank r by the following procedure: generate the random matrices U ∈ R^{m×r} and V ∈ R^{m×r} with independent and identically distributed Gaussian entries with zero mean and unit variance, and then set L0 = U V^T. We generate S0 as a sparse matrix, whose support is chosen uniformly at random and whose non-zero entries are independent and identically distributed uniformly in the interval [−500, 500]. Thus, the ordered pair (L0, S0) is the true solution. The matrix D = L0 + S0 is the input for decomposition with different algorithms, and (L, S) denotes the output. We adopt the first prediction rule discussed in the implementation details, and use rel_L and rel_S to denote the relative errors of the output solution (L, S), i.e.,

    rel_L = ||L − L0||_F / ||L0||_F,    rel_S = ||S − S0||_F / ||S0||_F.
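The test data described above can be generated as follows (a sketch; ppa_rpca refers to the Algorithm 1 sketch given in Section 3, and the sizes in the usage comment are illustrative only):

    import numpy as np

    def generate_rpca_test(m, r, n_nonzero, seed=0):
        """Random test problem D = L0 + S0 with a rank-r L0 and a sparse S0."""
        rng = np.random.default_rng(seed)
        U = rng.standard_normal((m, r))                           # i.i.d. N(0, 1) factors
        V = rng.standard_normal((m, r))
        L0 = U @ V.T                                              # rank-r component
        S0 = np.zeros((m, m))
        idx = rng.choice(m * m, size=n_nonzero, replace=False)    # support chosen uniformly
        S0.flat[idx] = rng.uniform(-500.0, 500.0, size=n_nonzero)
        return L0, S0, L0 + S0

    def relative_errors(L, S, L0, S0):
        rel_L = np.linalg.norm(L - L0, 'fro') / np.linalg.norm(L0, 'fro')
        rel_S = np.linalg.norm(S - S0, 'fro') / np.linalg.norm(S0, 'fro')
        return rel_L, rel_S

    # Example usage (sizes are illustrative only):
    # L0, S0, D = generate_rpca_test(m=400, r=20, n_nonzero=8000)
    # S, L = ppa_rpca(D)
    # print(relative_errors(L, S, L0, S0))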

All simulations are performed with MATLAB (version 7.01) on a personal computer equipped with an Intel Pentium-IV 3.06 GHz processor and 512 MB of RAM under Windows XP. Brief comparisons of the three algorithms are presented in Tables 1 and 2. For each triplet {m, rank(L0), ||S0||_0}, the RPCA problem is solved for the same data matrix D by the three different algorithms. As can be seen, the three algorithms all recover the low-rank matrix L0 and the sparse matrix S0 with high accuracy. IALM is the fastest one among the three algorithms, and EALM is the slowest one. Since IALM is at least five times faster than the APG[11, 12] and IALM is about two times faster than PPA, we conclude that PPA is at least two times faster than the APG. Moreover, when the rank of L0 or ||S0||_0 increases, the estimate of the number of non-zeros in S0 (i.e., ||S0||_0) becomes worse and worse, while the errors between the ||S0||_0 estimated by PPA or IALM and the true number of non-zeros in S0 are not beyond two. Although IALM performs better than the others, one has to choose the penalty parameters[14] properly in each iteration step to ensure both optimality and fast convergence.

Table 1. Comparison of EALM, IALM, and PPA on RPCA problems for Case 1

Table 2. Comparison of EALM, IALM, and PPA on RPCA problems for Case 2
5 Conclusions

In this paper, we design an algorithm, i.e., the PPA, based on the Douglas-Rachford splitting method for solving the RPCA problem. Compared with the previous state-of-the-art algorithms, our algorithm has at least three advantages. First, the objective function of the problem (2) or (3) is simpler than those of other algorithms, and involves only one of the low-rank matrix variable L and the sparse matrix variable S. Secondly, it recovers the low-rank and sparse components exactly and simultaneously by solving the problem (2) or (3), and its convergence can be proved strictly. Thirdly, it does not introduce any penalty parameter. In contrast, other algorithms, for example, the APG and the ALM, introduce such penalty parameters to handle the constraint, and hence have to carefully design continuation schemes to ensure both optimality and fast convergence. Numerical simulations demonstrate the practical utility of our algorithm.

References
[1] Candès, E. J., Li, X., Ma, Y., and Wright, J. Robust principal component analysis? Journal of the ACM, 58, 11:1–37 (2011)
[2] Chandrasekaran, V., Sanghavi, S., Parrilo, P. A., and Willsky, A. S. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21, 572–596 (2011)
[3] Netflix, Inc. The Netflix Prize, at (2009)
[4] Peng, Y., Ganesh, A., Wright, J., Xu, W., and Ma, Y. RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 763–770 (2010)
[5] Liu, G., Lin, Z., and Yu, Y. Robust subspace segmentation by low-rank representation. Twenty-Seventh International Conference on Machine Learning (ICML), Madison, 663–670 (2010)
[6] Liu, Z. and Vandenberghe, L. Interior-point method for nuclear norm approximation with application to system identification. SIAM Journal on Matrix Analysis and Applications, 31, 1235–1256 (2009)
[7] Grant, M. and Boyd, S. CVX: Matlab Software for Disciplined Convex Programming, at (2010)
[8] Nesterov, Y. Smooth minimization of non-smooth functions. Mathematical Programming, 103, 127–152 (2005)
[9] Beck, A. and Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202 (2009)
[10] Becker, S., Bobin, J., and Candès, E. J. NESTA: a fast and accurate first-order method for sparse recovery. SIAM Journal on Imaging Sciences, 4, 1–39 (2011)
[11] Toh, K. C. and Yun, S. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific Journal of Optimization, 6, 615–640 (2010)
[12] Ganesh, A., Lin, Z., Wright, J., Wu, L., Chen, M., and Ma, Y. Fast algorithms for recovering a corrupted low-rank matrix. 2009 3rd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), IEEE-Xplore, 213–216 (2009)
[13] Goldfarb, D., Ma, S., and Scheinberg, K. Fast alternating linearization methods for minimizing the sum of two convex functions. Mathematical Programming, 1–34 (2010)
[14] Lin, Z., Chen, M., and Ma, Y. The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices, at (2010)
[15] Yuan, X. and Yang, J. Sparse and low rank matrix decomposition via alternating direction method. Pacific Journal of Optimization, 9, 167–180 (2013)
[16] Hale, E. T., Yin, W., and Zhang, Y. Fixed-point continuation applied to compressed sensing: implementation and numerical experiments. Journal of Computational Mathematics, 28, 170–194 (2010)
[17] Hale, E. T., Yin, W., and Zhang, Y. Fixed-point continuation for l1-minimization: methodology and convergence. SIAM Journal on Optimization, 19, 1107–1130 (2008)
[18] Moreau, J. Proximité et dualité dans un espace hilbertien. Bulletin de la Société Mathématique de France, 93, 273–299 (1965)
[19] Combettes, P. L. and Wajs, V. R. Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation, 4, 1168–1200 (2005)
[20] Combettes, P. L. and Pesquet, J. C. A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery. IEEE Journal of Selected Topics in Signal Processing, 1, 564–574 (2007)
[21] Larsen, R. M. Lanczos Bidiagonalization with Partial Reorthogonalization, Department of Com- puter Science, Aarhus University, Technical Report, DAIMI PB-357, Aarhus (1998)
[22] Cai, J. F., Candès, E. J., and Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20, 1956–1982 (2010)