[1] YUAN M, EKICI A, LU Z, et al. Dimension reduction and coefficient estimation in multivariate linear regression[J]. Journal of the Royal Statistical Society: Series B, 2007, 69(3): 329-346.
[2] NEGAHBAN S N, WAINWRIGHT M J. Estimation of (near) low-rank matrices with noise and high-dimensional scaling[C]. International Conference on Machine Learning, 2010: 823-830.
[3] BUNEA F, SHE Y, WEGKAMP M H. Optimal selection of reduced rank estimators of high-dimensional matrices[J]. Annals of Statistics, 2011, 39(2): 1282-1309.
[4] CHEN K, DONG H, CHAN K S. Reduced rank regression via adaptive nuclear norm penalization[J]. Biometrika, 2013, 100(4): 901-920.
[5] FAN J, LI R. Variable selection via nonconcave penalized likelihood and its oracle properties[J]. Journal of the American Statistical Association, 2001, 96(456): 1348-1360.
[6] ZHENG Z, FAN Y, LV J. High dimensional thresholded regression and shrinkage effect[J]. Journal of the Royal Statistical Society: Series B, 2014, 76(3): 627-649.
[7] HOERL A E, KENNARD R W. Ridge regression: Biased estimation for nonorthogonal problems[J]. Technometrics, 2000, 42(1): 80-86.
[8] ROHDE A, TSYBAKOV A B. Estimation of high-dimensional low-rank matrices[J]. Annals of Statistics, 2011, 39(2): 887-930.
[9] DONOHO D L, ELAD M. Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell_1$ minimization[J]. Proceedings of the National Academy of Sciences of the United States of America, 2003, 100(5): 2197-2202.
[10] BICKEL P J, RITOV Y, TSYBAKOV A B. Simultaneous analysis of lasso and Dantzig selector[J]. Annals of Statistics, 2009, 37(4): 1705-1732.
[11] FAN J, LV J. Nonconcave penalized likelihood with NP-dimensionality[J]. IEEE Transactions on Information Theory, 2011, 57(8): 5467-5484.
[12] REINSEL G C, VELU R P. Multivariate Reduced-Rank Regression[M]. New York: Springer, 1998: 369-370.
[13] ZOU H, LI R. One-step sparse estimates in nonconcave penalized likelihood models[J]. Annals of Statistics, 2008, 36(4): 1509-1533.
[14] LANGE K, HUNTER D R, YANG I. Optimization transfer using surrogate objective functions[J]. Journal of Computational and Graphical Statistics, 2000, 9(1): 1-20.
[15] ZOU H, HASTIE T. Regularization and variable selection via the elastic net[J]. Journal of the Royal Statistical Society: Series B, 2005, 67(2): 301-320.
[16] TIBSHIRANI R. Regression shrinkage and selection via the lasso[J]. Journal of the Royal Statistical Society: Series B, 2011, 73(3): 273-282.
[17] HUANG J, HOROWITZ J L, MA S. Asymptotic properties of bridge estimators in sparse high-dimensional regression models[J]. Annals of Statistics, 2008, 36(2): 587-613.
[18] KLOPP O. Rank penalized estimators for high-dimensional matrices[J]. Electronic Journal of Statistics, 2011, 5: 1161-1183.
[19] KOLTCHINSKII V, LOUNICI K, TSYBAKOV A B. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion[J]. Annals of Statistics, 2011, 39(5): 2302-2329.
[20] WITTEN D M, TIBSHIRANI R, HASTIE T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis[J]. Biostatistics, 2009, 10(3): 515-534.
[21] CHIN K, DEVRIES S, FRIDLYAND J, et al. Genomic and transcriptional aberrations linked to breast cancer pathophysiologies[J]. Cancer Cell, 2006, 10(6): 529-541.
[22] VON NEUMANN J. Some matrix-inequalities and metrization of matrix-space[J]. Tomsk University Review, 1937, 1: 286-300.
Appendix  Proof of Theorem 2.1
A.1 Lemmas needed for the proof
Lemma A.1 (von Neumann trace inequality)  Consider two matrices $A, B \in \mathbb{R}^{n_1 \times n_2}$, and let $\sigma(A) = (a_1, a_2, \ldots)$ and $\sigma(B) = (b_1, b_2, \ldots)$ denote the vectors of their singular values. Then
$$\operatorname{tr}(A'B) = \langle A, B \rangle \le \langle \sigma(A), \sigma(B) \rangle = a_1 b_1 + a_2 b_2 + \cdots.$$
For a proof of Lemma A.1, see [22].
A.2 Variable-selection consistency of the model
Step 1. Write $\hat\beta = (\sigma_1(\hat B), \ldots, \sigma_r(\hat B))$ and $\beta_0 = (\sigma_1(B_0), \ldots, \sigma_r(B_0))$ with $r = \min(p, q)$. Every nonzero component of the true singular-value vector $\beta_0$, and of the global minimizer $\hat\beta$, exceeds $\lambda$, which implies $\|p_\lambda(\hat\beta)\|_1 = \lambda^2\|\hat\beta\|_0/2$ and $\|p_\lambda(\beta_0)\|_1 = s\lambda^2/2$. Hence $\|p_\lambda(\hat\beta)\|_1 - \|p_\lambda(\beta_0)\|_1 = (\|\hat\beta\|_0 - s)\lambda^2/2$. Write $\delta = \hat\beta - \beta_0$. A direct calculation gives
$$Q(\hat B) - Q(B_0) = \frac{1}{2n}\Big(\operatorname{tr}\{(Y - X\hat B)'(Y - X\hat B)\} - \operatorname{tr}\{(Y - XB_0)'(Y - XB_0)\}\Big) + \|p_\lambda(\sigma(\hat B))\|_1 - \|p_\lambda(\sigma(B_0))\|_1,$$
where, substituting $Y = XB_0 + E$,
$$\operatorname{tr}\{(Y - X\hat B)'(Y - X\hat B)\} - \operatorname{tr}\{(Y - XB_0)'(Y - XB_0)\}$$
$$= \operatorname{tr}\{(E - X(\hat B - B_0))'(E - X(\hat B - B_0))\} - \operatorname{tr}\{E'E\}$$
$$= \operatorname{tr}\{(\hat B - B_0)'X'X(\hat B - B_0)\} - 2\operatorname{tr}\{E'X(\hat B - B_0)\}.$$
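The quadratic expansion above can be checked numerically. The sketch below is not part of the original proof; the dimensions and matrix names are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 20, 5, 3
X = rng.standard_normal((n, p))
B0 = rng.standard_normal((p, q))   # true coefficient matrix
Bh = rng.standard_normal((p, q))   # stand-in for the estimator B-hat
E = rng.standard_normal((n, q))
Y = X @ B0 + E                     # model: Y = X B0 + E

# left side: difference of residual sums of squares
lhs = (np.trace((Y - X @ Bh).T @ (Y - X @ Bh))
       - np.trace((Y - X @ B0).T @ (Y - X @ B0)))
# right side: quadratic form minus twice the cross term
D = Bh - B0
rhs = np.trace(D.T @ X.T @ X @ D) - 2 * np.trace(E.T @ X @ D)
assert np.isclose(lhs, rhs)
```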
Writing the columns of $B$ as $\beta_1, \ldots, \beta_q$, we have $\operatorname{tr}\{B'X'XB\} = \sum_i \|X\beta_i\|_2^2$. Recall that the robust spark $M = \operatorname{rspark}_c(X)$ is the largest $\tau$ for which
$$\min_{\|\delta\|_0 < \tau,\ \|\delta\|_2 = 1} n^{-1/2}\|X\delta\|_2 \ge c.$$
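The robust spark is defined through a combinatorial minimum over sparse supports; for a small number of predictors it can be evaluated by brute force. The following sketch is our own illustration (the function name `rspark` and all dimensions are assumptions, not from the paper):

```python
import itertools
import numpy as np

def rspark(X, c):
    """Largest tau such that min over ||delta||_0 < tau, ||delta||_2 = 1
    of n^{-1/2} ||X delta||_2 is at least c (brute force over supports)."""
    n, p = X.shape
    tau = 1  # the constraint ||delta||_0 < 1 is vacuous
    for k in range(1, p + 1):
        # smallest singular value of n^{-1/2} X_S over all supports of size k
        worst = min(
            np.linalg.svd(X[:, list(S)] / np.sqrt(n), compute_uv=False).min()
            for S in itertools.combinations(range(p), k)
        )
        if worst < c:
            break
        tau = k + 1  # every support of size <= k passes, so tau = k + 1 works
    return tau

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
print(rspark(X, 0.5))
```

Since the minimum can only shrink as more supports are allowed, $\operatorname{rspark}_c(X)$ is nonincreasing in $c$, which the brute-force version reproduces.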
Hence, whenever $\max_i \|\beta_i\|_0 < M$,
$$n^{-1}\operatorname{tr}\{B'X'XB\} \ge c^2\|B\|_F^2.$$
Moreover, by Lemma A.1,
$$\operatorname{tr}(A'B) \le \sum_i \sigma_i(A)\sigma_i(B).$$
Since the first singular value of a matrix is its largest, this yields the weaker trace inequality $\operatorname{tr}(A'B) \le d_1(A)\sum_i \sigma_i(B)$, where $d_1(\cdot)$ denotes the largest singular value. Therefore
$$n^{-1}|\operatorname{tr}\{E'X(\hat B - B_0)\}| = n^{-1}|\operatorname{tr}\{E'X\hat B\} - \operatorname{tr}\{E'XB_0\}| \le n^{-1}d_1(X'E)\Big|\sum_i\big(\sigma_i(\hat B) - \sigma_i(B_0)\big)\Big| \le (\sigma_n + \theta)\sqrt{p+q}\,\|\delta\|_1 \le (\sigma_n + \theta)\sqrt{p+q}\,\|\delta\|_0^{1/2}\|\delta\|_2.$$
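Each link in this chain can be tested numerically. The snippet below is a sanity check only (random sizes and the example vector are arbitrary): it verifies the von Neumann bound of Lemma A.1, its relaxation through $d_1$, and the Cauchy–Schwarz step $\|\delta\|_1 \le \|\delta\|_0^{1/2}\|\delta\|_2$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
B = rng.standard_normal((8, 5))
sA = np.linalg.svd(A, compute_uv=False)  # singular values, descending
sB = np.linalg.svd(B, compute_uv=False)

# von Neumann trace inequality (Lemma A.1)
assert np.trace(A.T @ B) <= sA @ sB + 1e-10
# relaxed version: d_1(A) times the sum of singular values of B
assert sA @ sB <= sA[0] * sB.sum() + 1e-10

# Cauchy-Schwarz on a sparse vector: ||d||_1 <= sqrt(||d||_0) * ||d||_2
d = np.array([2.0, 0.0, -1.5, 0.0, 0.3])
assert np.abs(d).sum() <= np.sqrt(np.count_nonzero(d)) * np.linalg.norm(d) + 1e-12
```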
Combining these displays gives
$$Q(\hat B) - Q(B_0) \ge 2^{-1}c^2\|\delta\|_2^2 - (\sigma_n + \theta)\sqrt{p+q}\,\|\delta\|_0^{1/2}\|\delta\|_2 + (\|\hat\beta\|_0 - s)\lambda^2/2.$$
Since $\hat B$ is a global minimizer, $Q(\hat B) - Q(B_0) \le 0$, and therefore
$$2^{-1}c^2\|\delta\|_2^2 - (\sigma_n + \theta)\sqrt{p+q}\,\|\delta\|_0^{1/2}\|\delta\|_2 + (\|\hat\beta\|_0 - s)\lambda^2/2 \le 0.$$
Now define $t = (\sigma_n + \theta)\sqrt{p+q}$. Multiplying by 2 and completing the square, we obtain
$$\left\{c\|\delta\|_2 - \frac{t}{c}\|\delta\|_0^{1/2}\right\}^2 - \frac{t^2}{c^2}\|\delta\|_0 + (\|\hat\beta\|_0 - s)\lambda^2 \le 0.$$
It follows that
$$(\|\hat\beta\|_0 - s)\lambda^2 \le \frac{t^2}{c^2}\|\delta\|_0.$$
Let $k = \|\hat\beta\|_0 = \operatorname{rank}(\hat B)$, so that $\|\delta\|_0 = \|\hat\beta - \beta_0\|_0 \le k + s$. Therefore
$$(k - s)\lambda^2 \le \frac{t^2}{c^2}(k + s).$$
Rearranging the relation between $k$ and $s$, we get
$$k\left(\lambda^2 - \frac{t^2}{c^2}\right) \le s\left(\lambda^2 + \frac{t^2}{c^2}\right),$$
so that
$$k \le s\,\frac{\lambda^2 + t^2/c^2}{\lambda^2 - t^2/c^2} = s\left\{1 + \frac{2t^2}{\lambda^2 c^2 - t^2}\right\}.$$
Since $\lambda$ is chosen so large that $2st^2/(\lambda^2 c^2 - t^2) < 1$ and $k$ is an integer, we conclude that $\|\hat\beta\|_0 \le s$.
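The algebra behind Step 1's conclusion (completing the square and the resulting bound on $k$) can be verified directly. The constants below are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# completing the square:
#   c^2 x^2 - 2 t sqrt(d0) x = (c x - (t/c) sqrt(d0))^2 - (t^2/c^2) d0
for _ in range(100):
    c, t, x, d0 = rng.uniform(0.1, 2.0, size=4)
    lhs = c**2 * x**2 - 2 * t * np.sqrt(d0) * x
    rhs = (c * x - (t / c) * np.sqrt(d0))**2 - (t**2 / c**2) * d0
    assert np.isclose(lhs, rhs)

# every integer k with (k - s) lam^2 <= (t^2/c^2)(k + s)
# also obeys the closed-form bound k <= s (lam^2 + t^2/c^2)/(lam^2 - t^2/c^2)
lam, s, t, c = 1.0, 5, 0.3, 1.0
bound = s * (lam**2 + t**2 / c**2) / (lam**2 - t**2 / c**2)
ks = [k for k in range(200) if (k - s) * lam**2 <= (t**2 / c**2) * (k + s)]
assert max(ks) <= bound + 1e-9
```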
Step 2 builds on Step 1. Suppose that $\operatorname{supp}(\beta_0) \not\subseteq \operatorname{supp}(\hat\beta)$; then the number of missed true coefficients $k = |\operatorname{supp}(\beta_0)\setminus\operatorname{supp}(\hat\beta)| \ge 1$. Consequently $\|\hat\beta\|_0 \ge s - k$ and $\|\delta\|_0 \le \|\hat\beta\|_0 + \|\beta_0\|_0 \le 2s$. Combining these with the bounds above, we have
$$Q(\hat B) - Q(B_0) \ge 2^{-1}c^2\|\delta\|_2^2 - \sqrt{2s}\,t\|\delta\|_2 - k\lambda^2/2.$$
For every $j \in \operatorname{supp}(\beta_0)\setminus\operatorname{supp}(\hat\beta)$ we have $|\delta_j| = |\beta_{0,j}| \ge b_0$, so $\|\delta\|_2 \ge b_0\sqrt{k}$. By the assumed signal-strength condition,
$$4^{-1}c^2\|\delta\|_2 \ge 4^{-1}c^2 b_0\sqrt{k} \ge 4^{-1}c^2 b_0 > \sqrt{2s}\,t.$$
Therefore
$$Q(\hat B) - Q(B_0) \ge 4^{-1}c^2\|\delta\|_2^2 - k\lambda^2/2 \ge 4^{-1}c^2 k b_0^2 - k\lambda^2/2 > 0,$$
because $\lambda < 2^{-1/2}cb_0$. This contradicts the global optimality of $\hat B$ and completes the proof of variable-selection consistency.
A.3 Prediction and estimation loss
The Frobenius norm of $X(\hat B - B_0)$ satisfies
$$\|X(\hat B - B_0)\|_F^2 = \sum_i \sigma_i^2\big(X(\hat B - B_0)\big) = \operatorname{tr}\{(\hat B - B_0)'X'X(\hat B - B_0)\}.$$
We work on the event $\mathcal{E} = \mathcal{E}_1 \cap \mathcal{E}_2$ ($\mathcal{E}_1$ and $\mathcal{E}_2$ are given in (11) and (12)), on which $\|\delta\|_0 \le s$. By the result just established in A.2 and the Cauchy–Schwarz inequality,
$$|n^{-1}\operatorname{tr}\{E_0'X_0\delta\}| \le d_1(n^{-1}E_0'X_0)\Big|\sum_i\big(\sigma_i(\hat B) - \sigma_i(B_0)\big)\Big| \le (\sigma_n + \theta_0)\sqrt{2r^*}\,\|\delta\|_1 \le \sqrt{s}\,(\sigma_n + \theta_0)\sqrt{2r^*}\,\|\delta\|_2.$$
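The Frobenius-norm identity and the sparse Cauchy–Schwarz step used here admit a quick numerical check; the dimensions and the example vector below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 6))
D = rng.standard_normal((6, 4))   # plays the role of B-hat minus B0
M = X @ D

# ||X D||_F^2 = sum of squared singular values = tr{D' X'X D}
sv = np.linalg.svd(M, compute_uv=False)
assert np.isclose(np.linalg.norm(M, "fro")**2, (sv**2).sum())
assert np.isclose(np.linalg.norm(M, "fro")**2, np.trace(D.T @ X.T @ X @ D))

# Cauchy-Schwarz: ||delta||_1 <= sqrt(s) ||delta||_2 when delta has s nonzeros
delta = np.array([1.2, 0.0, -0.7, 0.0, 0.1, 0.0])
s = np.count_nonzero(delta)
assert np.abs(delta).sum() <= np.sqrt(s) * np.linalg.norm(delta) + 1e-12
```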
Since A.2 gives $\|\hat\beta\|_0 = s$ on this event,
$$Q(\hat B) - Q(B_0) = 2^{-1}n^{-1}\|X(\hat B - B_0)\|_F^2 - \operatorname{tr}\{n^{-1}E'X(\hat B - B_0)\} + (\|\hat\beta\|_0 - s)\lambda^2/2$$
$$\ge 2^{-1}c^2\|\delta\|_2^2 - d_1(n^{-1}E'X)\|\delta\|_1 \ge 2^{-1}c^2\|\delta\|_2^2 - \sqrt{s}\,(\sigma_n + \theta_0)\sqrt{2r^*}\,\|\delta\|_2.$$
By the global optimality of $\hat B$, we have $2^{-1}c^2\|\delta\|_2 - \sqrt{s}\,(\sigma_n + \theta_0)\sqrt{2r^*} \le 0$, which yields the $L_2$ and $L_\infty$ estimation bounds
$$\|\hat\beta - \beta_0\|_\infty \le \|\hat\beta - \beta_0\|_2 = \|\delta\|_2 \le \frac{2}{c^2}\sqrt{s}\,(\sigma_n + \theta_0)\sqrt{2r^*} \le \frac{4\sqrt{2s}\,\sigma}{c^2\sqrt{n}} + \frac{2c'\sqrt{2s\ln n}}{c^2\sqrt{n}}.$$
For the $L_m$ estimation loss with $1 \le m \le 2$, applying Hölder's inequality gives
$$\|\hat\beta - \beta_0\|_m = \Big(\sum_j|\delta_j|^m\Big)^{1/m} \le \Big(\sum_j|\delta_j|^2\Big)^{1/2}\Big(\sum_{\delta_j \ne 0}1\Big)^{1/m-1/2} = \|\delta\|_2\,\|\delta\|_0^{1/m-1/2} \le \frac{2s^{1/m}}{c^2}(\sigma_n + \theta_0)\sqrt{2r^*} \le \frac{4\sqrt{2}\,s^{1/m}\sigma}{c^2\sqrt{n}} + \frac{2c's^{1/m}\sqrt{2\ln n}}{c^2\sqrt{n}}.$$
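The Hölder step $\|\delta\|_m \le \|\delta\|_2\,\|\delta\|_0^{1/m-1/2}$ can be confirmed numerically for several values of $m$ between 1 and 2; the sparse vector below is an arbitrary example of our own.

```python
import numpy as np

delta = np.array([1.5, -0.4, 0.0, 0.0, 2.2, 0.0, -0.1, 0.0])  # s = 4 nonzeros
s = np.count_nonzero(delta)
l2 = np.linalg.norm(delta)

for m in (1.0, 1.25, 1.5, 1.75, 2.0):
    lm = (np.abs(delta) ** m).sum() ** (1.0 / m)
    # Holder interpolation: ||delta||_m <= ||delta||_2 * s^{1/m - 1/2}
    assert lm <= l2 * s ** (1.0 / m - 0.5) + 1e-10
```

At $m = 2$ the bound holds with equality, matching the $L_2$ case above.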
Finally, we prove the bound on the oracle prediction loss. Since $\hat B$ is a global minimizer, combining the argument of A.2 on the event $\mathcal{E}$ gives
$$2^{-1/2}n^{-1/2}\big[\operatorname{tr}\{(\hat B - B_0)'X'X(\hat B - B_0)\}\big]^{1/2} \le \big\{n^{-1}\operatorname{tr}\{E'X(\hat B - B_0)\} - (\|\hat\beta\|_0 - s)\lambda^2/2\big\}^{1/2}$$
$$\le \big[d_1(n^{-1}X_0'E_0)\,\|\delta\|_1\big]^{1/2} \le \frac{\sqrt{2s}}{c}(\sigma_n + \theta_0)\sqrt{2r^*} \le \frac{2\sqrt{2s}\,\sigma}{c\sqrt{n}} + \frac{2c'\sqrt{2s\ln n}}{c\sqrt{n}}.$$
This completes the proof in the case of an $n^{-1/2}$-standardized design matrix.