[1] YUAN M, EKICI A, LU Z, et al. Dimension reduction and coefficient estimation in multivariate linear regression[J]. Journal of the Royal Statistical Society: Series B, 2007, 69(3): 329-346.
[2] NEGAHBAN S N, WAINWRIGHT M J. Estimation of (near) low-rank matrices with noise and high-dimensional scaling[C]. International Conference on Machine Learning, 2010: 823-830.
[3] BUNEA F, SHE Y, WEGKAMP M H. Optimal selection of reduced rank estimators of high-dimensional matrices[J]. Annals of Statistics, 2011, 39(2): 1282-1309.
[4] CHEN K, DONG H, CHAN K S. Reduced rank regression via adaptive nuclear norm penalization[J]. Biometrika, 2013, 100(4): 901-920.
[5] FAN J, LI R. Variable selection via nonconcave penalized likelihood and its oracle properties[J]. Journal of the American Statistical Association, 2001, 96(456): 1348-1360.
[6] ZHENG Z, FAN Y, LV J. High dimensional thresholded regression and shrinkage effect[J]. Journal of the Royal Statistical Society: Series B, 2014, 76(3): 627-649.
[7] HOERL A E, KENNARD R W. Ridge regression: Biased estimation for nonorthogonal problems[J]. Technometrics, 2000, 42(1): 80-86.
[8] ROHDE A, TSYBAKOV A B. Estimation of high-dimensional low-rank matrices[J]. Annals of Statistics, 2011, 39(2): 887-930.
[9] DONOHO D L, ELAD M. Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell_1$ minimization[J]. Proceedings of the National Academy of Sciences of the United States of America, 2003, 100(5): 2197-2202.
[10] BICKEL P J, RITOV Y, TSYBAKOV A B. Simultaneous analysis of lasso and Dantzig selector[J]. Annals of Statistics, 2009, 37(4): 1705-1732.
[11] FAN J, LV J. Nonconcave penalized likelihood with NP-dimensionality[J]. IEEE Transactions on Information Theory, 2011, 57(8): 5467-5484.
[12] REINSEL G C, VELU R P. Multivariate Reduced-Rank Regression[M]. New York: Springer, 1998: 369-370.
[13] ZOU H, LI R. One-step sparse estimates in nonconcave penalized likelihood models[J]. Annals of Statistics, 2008, 36(4): 1509-1533.
[14] LANGE K, HUNTER D R, YANG I. Optimization transfer using surrogate objective functions[J]. Journal of Computational and Graphical Statistics, 2000, 9(1): 1-20.
[15] ZOU H, HASTIE T. Regularization and variable selection via the elastic net[J]. Journal of the Royal Statistical Society: Series B, 2005, 67(2): 301-320.
[16] TIBSHIRANI R. Regression shrinkage and selection via the lasso[J]. Journal of the Royal Statistical Society: Series B, 2011, 73(3): 273-282.
[17] HUANG J, HOROWITZ J L, MA S. Asymptotic properties of bridge estimators in sparse high-dimensional regression models[J]. Annals of Statistics, 2008, 36(2): 587-613.
[18] KLOPP O. Rank penalized estimators for high-dimensional matrices[J]. Electronic Journal of Statistics, 2011, 5: 1161-1183.
[19] KOLTCHINSKII V, LOUNICI K, TSYBAKOV A B. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion[J]. Annals of Statistics, 2011, 39(5): 2302-2329.
[20] WITTEN D M, TIBSHIRANI R, HASTIE T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis[J]. Biostatistics, 2009, 10(3): 515-534.
[21] CHIN K, DEVRIES S, FRIDLYAND J, et al. Genomic and transcriptional aberrations linked to breast cancer pathophysiologies[J]. Cancer Cell, 2006, 10(6): 529-541.
[22] VON NEUMANN J. Some matrix-inequalities and metrization of matrix-space[J]. Tomsk University Review, 1937, 1: 286-300.
Appendix  Proof of Theorem 2.1
A.1 Lemmas needed for the proof
Lemma A.1 (von Neumann trace inequality)  Consider two matrices $A, B \in \mathbb{R}^{n_1 \times n_2}$, and let $\sigma(A) = (a_1, a_2, \ldots)$ and $\sigma(B) = (b_1, b_2, \ldots)$ denote the vectors of their singular values. Then
$$\operatorname{tr}(A'B) = \langle A, B \rangle \le \langle \sigma(A), \sigma(B) \rangle = a_1 b_1 + a_2 b_2 + \cdots.$$
For a proof of Lemma A.1, see [22].
A.2 Variable-selection consistency of the model
Step 1. Write $\hat\beta = (\sigma_1(\hat B), \ldots, \sigma_r(\hat B))$ and $\beta_0 = (\sigma_1(B_0), \ldots, \sigma_r(B_0))$ with $r = \min(p, q)$. Every nonzero component of the true singular-value vector $\beta_0$, and of the global minimizer $\hat\beta$, exceeds $\lambda$, which implies $\|p_\lambda(\hat\beta)\|_1 = \lambda^2\|\hat\beta\|_0/2$ and $\|p_\lambda(\beta_0)\|_1 = s\lambda^2/2$. Hence $\|p_\lambda(\hat\beta)\|_1 - \|p_\lambda(\beta_0)\|_1 = (\|\hat\beta\|_0 - s)\lambda^2/2$. Write $\delta = \hat\beta - \beta_0$. A direct calculation gives
$$Q(\hat B) - Q(B_0) = \frac{1}{2n}\Big(\operatorname{tr}\{(Y - X\hat B)'(Y - X\hat B)\} - \operatorname{tr}\{(Y - XB_0)'(Y - XB_0)\}\Big) + \|p_\lambda(\sigma(\hat B))\|_1 - \|p_\lambda(\sigma(B_0))\|_1,$$
where, substituting $Y = XB_0 + E$,
$$\operatorname{tr}\{(Y - X\hat B)'(Y - X\hat B)\} - \operatorname{tr}\{(Y - XB_0)'(Y - XB_0)\}$$
$$= \operatorname{tr}\{(E - X(\hat B - B_0))'(E - X(\hat B - B_0))\} - \operatorname{tr}\{E'E\}$$
$$= \operatorname{tr}\{(\hat B - B_0)'X'X(\hat B - B_0)\} - 2\operatorname{tr}\{E'X(\hat B - B_0)\}.$$
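The quadratic expansion above can be checked numerically. The sketch below is not part of the original proof; the dimensions and matrix names are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 20, 5, 3
X = rng.standard_normal((n, p))
B0 = rng.standard_normal((p, q))   # true coefficient matrix
Bh = rng.standard_normal((p, q))   # stand-in for the estimator B-hat
E = rng.standard_normal((n, q))
Y = X @ B0 + E                     # model: Y = X B0 + E

# left side: difference of residual sums of squares
lhs = (np.trace((Y - X @ Bh).T @ (Y - X @ Bh))
       - np.trace((Y - X @ B0).T @ (Y - X @ B0)))
# right side: quadratic form minus twice the cross term
D = Bh - B0
rhs = np.trace(D.T @ X.T @ X @ D) - 2 * np.trace(E.T @ X @ D)
assert np.isclose(lhs, rhs)
```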
Writing the columns of $B$ as $\beta_1, \ldots, \beta_q$, we have $\operatorname{tr}\{B'X'XB\} = \sum_i \|X\beta_i\|_2^2$. Recall that the robust spark $M = \operatorname{rspark}_c(X)$ is the largest $\tau$ for which
$$\min_{\|\delta\|_0 < \tau,\ \|\delta\|_2 = 1} n^{-1/2}\|X\delta\|_2 \ge c.$$
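The robust spark is defined through a combinatorial minimum over sparse supports; for a small number of predictors it can be evaluated by brute force. The following sketch is our own illustration (the function name `rspark` and all dimensions are assumptions, not from the paper):

```python
import itertools
import numpy as np

def rspark(X, c):
    """Largest tau such that min over ||delta||_0 < tau, ||delta||_2 = 1
    of n^{-1/2} ||X delta||_2 is at least c (brute force over supports)."""
    n, p = X.shape
    tau = 1  # the constraint ||delta||_0 < 1 is vacuous
    for k in range(1, p + 1):
        # smallest singular value of n^{-1/2} X_S over all supports of size k
        worst = min(
            np.linalg.svd(X[:, list(S)] / np.sqrt(n), compute_uv=False).min()
            for S in itertools.combinations(range(p), k)
        )
        if worst < c:
            break
        tau = k + 1  # every support of size <= k passes, so tau = k + 1 works
    return tau

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
print(rspark(X, 0.5))
```

Since the minimum can only shrink as more supports are allowed, $\operatorname{rspark}_c(X)$ is nonincreasing in $c$, which the brute-force version reproduces.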
Hence, whenever $\max_i \|\beta_i\|_0 < M$,
$$n^{-1}\operatorname{tr}\{B'X'XB\} \ge c^2\|B\|_F^2.$$
Moreover, by Lemma A.1,
$$\operatorname{tr}(A'B) \le \sum_i \sigma_i(A)\sigma_i(B).$$
Since the first singular value of a matrix is its largest, this yields the weaker trace inequality $\operatorname{tr}(A'B) \le d_1(A)\sum_i \sigma_i(B)$, where $d_1(\cdot)$ denotes the largest singular value. Therefore
$$n^{-1}|\operatorname{tr}\{E'X(\hat B - B_0)\}| = n^{-1}|\operatorname{tr}\{E'X\hat B\} - \operatorname{tr}\{E'XB_0\}| \le n^{-1}d_1(X'E)\Big|\sum_i\big(\sigma_i(\hat B) - \sigma_i(B_0)\big)\Big| \le (\sigma_n + \theta)\sqrt{p+q}\,\|\delta\|_1 \le (\sigma_n + \theta)\sqrt{p+q}\,\|\delta\|_0^{1/2}\|\delta\|_2.$$
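Each link in this chain can be tested numerically. The snippet below is a sanity check only (random sizes and the example vector are arbitrary): it verifies the von Neumann bound of Lemma A.1, its relaxation through $d_1$, and the Cauchy–Schwarz step $\|\delta\|_1 \le \|\delta\|_0^{1/2}\|\delta\|_2$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
B = rng.standard_normal((8, 5))
sA = np.linalg.svd(A, compute_uv=False)  # singular values, descending
sB = np.linalg.svd(B, compute_uv=False)

# von Neumann trace inequality (Lemma A.1)
assert np.trace(A.T @ B) <= sA @ sB + 1e-10
# relaxed version: d_1(A) times the sum of singular values of B
assert sA @ sB <= sA[0] * sB.sum() + 1e-10

# Cauchy-Schwarz on a sparse vector: ||d||_1 <= sqrt(||d||_0) * ||d||_2
d = np.array([2.0, 0.0, -1.5, 0.0, 0.3])
assert np.abs(d).sum() <= np.sqrt(np.count_nonzero(d)) * np.linalg.norm(d) + 1e-12
```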
Combining these displays gives
$$Q(\hat B) - Q(B_0) \ge 2^{-1}c^2\|\delta\|_2^2 - (\sigma_n + \theta)\sqrt{p+q}\,\|\delta\|_0^{1/2}\|\delta\|_2 + (\|\hat\beta\|_0 - s)\lambda^2/2.$$
Since $\hat B$ is a global minimizer, $Q(\hat B) - Q(B_0) \le 0$, and therefore
$$2^{-1}c^2\|\delta\|_2^2 - (\sigma_n + \theta)\sqrt{p+q}\,\|\delta\|_0^{1/2}\|\delta\|_2 + (\|\hat\beta\|_0 - s)\lambda^2/2 \le 0.$$
Now define $t = (\sigma_n + \theta)\sqrt{p+q}$. Multiplying by 2 and completing the square, we obtain
$$\left\{c\|\delta\|_2 - \frac{t}{c}\|\delta\|_0^{1/2}\right\}^2 - \frac{t^2}{c^2}\|\delta\|_0 + (\|\hat\beta\|_0 - s)\lambda^2 \le 0.$$
It follows that
$$(\|\hat\beta\|_0 - s)\lambda^2 \le \frac{t^2}{c^2}\|\delta\|_0.$$
Let $k = \|\hat\beta\|_0 = \operatorname{rank}(\hat B)$, so that $\|\delta\|_0 = \|\hat\beta - \beta_0\|_0 \le k + s$. Therefore
$$(k - s)\lambda^2 \le \frac{t^2}{c^2}(k + s).$$
Rearranging the relation between $k$ and $s$, we get
$$k\left(\lambda^2 - \frac{t^2}{c^2}\right) \le s\left(\lambda^2 + \frac{t^2}{c^2}\right),$$
so that
$$k \le s\,\frac{\lambda^2 + t^2/c^2}{\lambda^2 - t^2/c^2} = s\left\{1 + \frac{2t^2}{\lambda^2 c^2 - t^2}\right\}.$$
Since $\lambda$ is chosen so large that $2st^2/(\lambda^2 c^2 - t^2) < 1$ and $k$ is an integer, we conclude that $\|\hat\beta\|_0 \le s$.
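The algebra behind Step 1's conclusion (completing the square and the resulting bound on $k$) can be verified directly. The constants below are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# completing the square:
#   c^2 x^2 - 2 t sqrt(d0) x = (c x - (t/c) sqrt(d0))^2 - (t^2/c^2) d0
for _ in range(100):
    c, t, x, d0 = rng.uniform(0.1, 2.0, size=4)
    lhs = c**2 * x**2 - 2 * t * np.sqrt(d0) * x
    rhs = (c * x - (t / c) * np.sqrt(d0))**2 - (t**2 / c**2) * d0
    assert np.isclose(lhs, rhs)

# every integer k with (k - s) lam^2 <= (t^2/c^2)(k + s)
# also obeys the closed-form bound k <= s (lam^2 + t^2/c^2)/(lam^2 - t^2/c^2)
lam, s, t, c = 1.0, 5, 0.3, 1.0
bound = s * (lam**2 + t**2 / c**2) / (lam**2 - t**2 / c**2)
ks = [k for k in range(200) if (k - s) * lam**2 <= (t**2 / c**2) * (k + s)]
assert max(ks) <= bound + 1e-9
```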
Step 2 builds on Step 1. Suppose that $\operatorname{supp}(\beta_0) \not\subseteq \operatorname{supp}(\hat\beta)$; then the number of missed true coefficients $k = |\operatorname{supp}(\beta_0)\setminus\operatorname{supp}(\hat\beta)| \ge 1$. Consequently $\|\hat\beta\|_0 \ge s - k$ and $\|\delta\|_0 \le \|\hat\beta\|_0 + \|\beta_0\|_0 \le 2s$. Combining these with the bounds above, we have
$$Q(\hat B) - Q(B_0) \ge 2^{-1}c^2\|\delta\|_2^2 - \sqrt{2s}\,t\|\delta\|_2 - k\lambda^2/2.$$
For every $j \in \operatorname{supp}(\beta_0)\setminus\operatorname{supp}(\hat\beta)$ we have $|\delta_j| = |\beta_{0,j}| \ge b_0$, so $\|\delta\|_2 \ge b_0\sqrt{k}$. By the assumed signal-strength condition,
$$4^{-1}c^2\|\delta\|_2 \ge 4^{-1}c^2 b_0\sqrt{k} \ge 4^{-1}c^2 b_0 > \sqrt{2s}\,t.$$
Therefore
$$Q(\hat B) - Q(B_0) \ge 4^{-1}c^2\|\delta\|_2^2 - k\lambda^2/2 \ge 4^{-1}c^2 k b_0^2 - k\lambda^2/2 > 0,$$
because $\lambda < 2^{-1/2}cb_0$. This contradicts the global optimality of $\hat B$ and completes the proof of variable-selection consistency.
A.3 Prediction and estimation loss
The Frobenius norm of $X(\hat B - B_0)$ satisfies
$$\|X(\hat B - B_0)\|_F^2 = \sum_i \sigma_i^2\big(X(\hat B - B_0)\big) = \operatorname{tr}\{(\hat B - B_0)'X'X(\hat B - B_0)\}.$$
We work on the event $\mathcal{E} = \mathcal{E}_1 \cap \mathcal{E}_2$ ($\mathcal{E}_1$ and $\mathcal{E}_2$ are given in (11) and (12)), on which $\|\delta\|_0 \le s$. By the result just established in A.2 and the Cauchy–Schwarz inequality,
$$|n^{-1}\operatorname{tr}\{E_0'X_0\delta\}| \le d_1(n^{-1}E_0'X_0)\Big|\sum_i\big(\sigma_i(\hat B) - \sigma_i(B_0)\big)\Big| \le (\sigma_n + \theta_0)\sqrt{2r^*}\,\|\delta\|_1 \le \sqrt{s}\,(\sigma_n + \theta_0)\sqrt{2r^*}\,\|\delta\|_2.$$
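The Frobenius-norm identity and the sparse Cauchy–Schwarz step used here admit a quick numerical check; the dimensions and the example vector below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 6))
D = rng.standard_normal((6, 4))   # plays the role of B-hat minus B0
M = X @ D

# ||X D||_F^2 = sum of squared singular values = tr{D' X'X D}
sv = np.linalg.svd(M, compute_uv=False)
assert np.isclose(np.linalg.norm(M, "fro")**2, (sv**2).sum())
assert np.isclose(np.linalg.norm(M, "fro")**2, np.trace(D.T @ X.T @ X @ D))

# Cauchy-Schwarz: ||delta||_1 <= sqrt(s) ||delta||_2 when delta has s nonzeros
delta = np.array([1.2, 0.0, -0.7, 0.0, 0.1, 0.0])
s = np.count_nonzero(delta)
assert np.abs(delta).sum() <= np.sqrt(s) * np.linalg.norm(delta) + 1e-12
```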
Since A.2 gives $\|\hat\beta\|_0 = s$ on this event,
$$Q(\hat B) - Q(B_0) = 2^{-1}n^{-1}\|X(\hat B - B_0)\|_F^2 - \operatorname{tr}\{n^{-1}E'X(\hat B - B_0)\} + (\|\hat\beta\|_0 - s)\lambda^2/2$$
$$\ge 2^{-1}c^2\|\delta\|_2^2 - d_1(n^{-1}E'X)\|\delta\|_1 \ge 2^{-1}c^2\|\delta\|_2^2 - \sqrt{s}\,(\sigma_n + \theta_0)\sqrt{2r^*}\,\|\delta\|_2.$$
By the global optimality of $\hat B$, we have $2^{-1}c^2\|\delta\|_2 - \sqrt{s}\,(\sigma_n + \theta_0)\sqrt{2r^*} \le 0$, which yields the $L_2$ and $L_\infty$ estimation bounds
$$\|\hat\beta - \beta_0\|_\infty \le \|\hat\beta - \beta_0\|_2 = \|\delta\|_2 \le \frac{2}{c^2}\sqrt{s}\,(\sigma_n + \theta_0)\sqrt{2r^*} \le \frac{4\sqrt{2s}\,\sigma}{c^2\sqrt{n}} + \frac{2c'\sqrt{2s\ln n}}{c^2\sqrt{n}}.$$
For the $L_m$ estimation loss with $1 \le m \le 2$, applying Hölder's inequality gives
$$\|\hat\beta - \beta_0\|_m = \Big(\sum_j|\delta_j|^m\Big)^{1/m} \le \Big(\sum_j|\delta_j|^2\Big)^{1/2}\Big(\sum_{\delta_j \ne 0}1\Big)^{1/m-1/2} = \|\delta\|_2\,\|\delta\|_0^{1/m-1/2} \le \frac{2s^{1/m}}{c^2}(\sigma_n + \theta_0)\sqrt{2r^*} \le \frac{4\sqrt{2}\,s^{1/m}\sigma}{c^2\sqrt{n}} + \frac{2c's^{1/m}\sqrt{2\ln n}}{c^2\sqrt{n}}.$$
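The Hölder step $\|\delta\|_m \le \|\delta\|_2\,\|\delta\|_0^{1/m-1/2}$ can be confirmed numerically for several values of $m$ between 1 and 2; the sparse vector below is an arbitrary example of our own.

```python
import numpy as np

delta = np.array([1.5, -0.4, 0.0, 0.0, 2.2, 0.0, -0.1, 0.0])  # s = 4 nonzeros
s = np.count_nonzero(delta)
l2 = np.linalg.norm(delta)

for m in (1.0, 1.25, 1.5, 1.75, 2.0):
    lm = (np.abs(delta) ** m).sum() ** (1.0 / m)
    # Holder interpolation: ||delta||_m <= ||delta||_2 * s^{1/m - 1/2}
    assert lm <= l2 * s ** (1.0 / m - 0.5) + 1e-10
```

At $m = 2$ the bound holds with equality, matching the $L_2$ case above.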
Finally, we prove the bound on the oracle prediction loss. Since $\hat B$ is a global minimizer, combining the argument of A.2 on the event $\mathcal{E}$ gives
$$2^{-1/2}n^{-1/2}\big[\operatorname{tr}\{(\hat B - B_0)'X'X(\hat B - B_0)\}\big]^{1/2} \le \big\{n^{-1}\operatorname{tr}\{E'X(\hat B - B_0)\} - (\|\hat\beta\|_0 - s)\lambda^2/2\big\}^{1/2}$$
$$\le \big[d_1(n^{-1}X_0'E_0)\,\|\delta\|_1\big]^{1/2} \le \frac{\sqrt{2s}}{c}(\sigma_n + \theta_0)\sqrt{2r^*} \le \frac{2\sqrt{2s}\,\sigma}{c\sqrt{n}} + \frac{2c'\sqrt{2s\ln n}}{c\sqrt{n}}.$$
This completes the proof in the case of an $n^{-1/2}$-standardized design matrix.