5. Talk title: How to make model-free feature screening approaches for full data applicable to missing response case?
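Speaker bio: Qihua Wang (王启华) is a research professor and doctoral supervisor at the Academy of Mathematics and Systems Science, Chinese Academy of Sciences. Professor Wang is a recipient of the National Science Fund for Distinguished Young Scholars, a Distinguished Professor under the Ministry of Education's Changjiang Scholars Program, a member of the Chinese Academy of Sciences "Hundred Talents Program", and an Elected Member of the International Statistical Institute (ISI). Since 1997, Professor Wang has visited Carleton University (Canada), the University of California, Davis, the University of California, Los Angeles, Yale University, the University of Washington, Northwestern University, Humboldt University (Germany), the Australian National University, and the University of Sydney, among others. Research areas include survival analysis, missing data analysis, high-dimensional statistical analysis, and nonparametric and semiparametric statistical inference. Publications include 2 monographs, 2 chapters in books published by Springer, and more than 100 papers, of which more than 80 appeared in major international journals such as JASA, The Annals of Statistics, and Biometrika, and more than 80 are indexed by SCI. In 2014 and 2015, Professor Wang was named to Elsevier's list of Most Cited Chinese Researchers.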
Abstract: It is quite challenging to develop model-free feature screening approaches for missing response problems, since the existing standard missing data analysis methods cannot be applied directly to the high-dimensional case. This paper develops a novel technique that borrows information from the missingness indicators so that any feature screening procedure for ultrahigh-dimensional covariates with full data can be applied to the missing response case. The technique rests on proving that the set of active predictors for the response is a subset of the set of active predictors for the product of the response and the missingness indicator. Any standard model-free feature screening procedure that has the screening property for full data can then be applied to estimate the latter set. Hence, the probability that the estimated set contains the latter set, and therefore the former, tends to one. It is also shown that the complete case (CC) approach preserves the screening property of any feature screening approach that has this property for full data. As an alternative, a two-step approach is developed for obtaining a feature screening estimator of the active predictor set of interest. A simulation study was conducted to compare the proposed methods with the "complete case" (CC) approach, and a real data analysis was used to illustrate the proposed method. Both the simulation study and the real data analysis indicate that the proposed zero imputation feature screening method outperforms the CC method and the two-step method.
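The zero imputation idea in the abstract can be sketched as follows: screen on the product of the response and the missingness indicator, which amounts to replacing each missing response by zero, and compare with the complete case approach. This is a minimal illustration only, not the authors' implementation. The function names, the simulated model, the logistic missingness mechanism, the cutoff d = n / log(n), and the use of absolute Pearson correlation as the marginal utility are all assumptions made here; a model-free statistic would normally take its place.

```python
import numpy as np


def marginal_utility(X, z):
    """Absolute Pearson correlation between each column of X and z.

    Stand-in utility (an assumption): any full-data screening statistic
    could be plugged in here instead.
    """
    Xc = X - X.mean(axis=0)
    zc = z - z.mean()
    num = Xc.T @ zc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (zc ** 2).sum())
    return np.abs(num / den)


def zero_imputation_screen(X, y, delta, d):
    """Keep the d covariates with the largest utility for delta * y."""
    # np.where never uses y where delta == 0, so NaN placeholders are safe.
    z = np.where(delta == 1, y, 0.0)
    return np.argsort(-marginal_utility(X, z))[:d]


def complete_case_screen(X, y, delta, d):
    """Keep the d covariates with the largest utility on observed rows only."""
    obs = delta == 1
    return np.argsort(-marginal_utility(X[obs], y[obs]))[:d]


# Hypothetical simulation: covariates 0 and 1 drive the response,
# covariate 2 drives the probability that the response is observed.
rng = np.random.default_rng(0)
n, p = 400, 1000
X = rng.standard_normal((n, p))
y_full = 3 * X[:, 0] + 3 * X[:, 1] + rng.standard_normal(n)
prob_observed = 1.0 / (1.0 + np.exp(-(0.5 + X[:, 2])))
delta = rng.binomial(1, prob_observed)
y = np.where(delta == 1, y_full, np.nan)   # missing responses stored as NaN

d = int(n / np.log(n))                     # assumed cutoff
kept_zero = zero_imputation_screen(X, y, delta, d)
kept_cc = complete_case_screen(X, y, delta, d)
print("zero imputation keeps 0 and 1:", {0, 1} <= set(kept_zero.tolist()))
print("complete case keeps 0 and 1:", {0, 1} <= set(kept_cc.tolist()))
```

Because the screened set targets the active predictors of the product, it may also retain covariates that only affect missingness (covariate 2 in this simulation), which is consistent with the subset relation described in the abstract.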
6. Talk title: Statistical Analysis Strategies and Data Mining for Medical Big Data
7. Talk title: Projection correlation between two random vectors
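Speaker bio: Dr. Liping Zhu (朱立平) is a professor at the Institute of Statistics and Big Data, Renmin University of China. He received his PhD from East China Normal University in 2006 and became an assistant professor there the same year. He was selected for the Ministry of Education's Program for New Century Excellent Talents in 2013 and received funding from the National Natural Science Foundation of China's Excellent Young Scientists Fund, among others, in 2015. He has published more than 15 articles in top statistics journals, including the Annals of Statistics, the Journal of the Royal Statistical Society Series B, the Journal of the American Statistical Association, and Biometrika. His main research interests include semiparametric modeling, high-dimensional data analysis, sufficient dimension reduction, and variable selection.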
Abstract: We propose projection correlation to characterize dependence between two random vectors. Projection correlation has several appealing properties: it equals zero if and only if the two random vectors are independent; it is not sensitive to the dimensions of the two random vectors; it is invariant with respect to the group of orthogonal transformations; and its estimation is free of tuning parameters and does not require moment conditions on the random vectors. We show that the sample estimate of the projection correlation is n-consistent if the two random vectors are independent and root-n-consistent otherwise. Monte Carlo simulation studies indicate that the projection correlation has higher power than both the distance correlation and the ranks of distances in tests of independence, especially when the dimensions are relatively large or the moment conditions required by the distance correlation are violated.
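The orthogonal invariance and tuning-free estimation mentioned in the abstract can be illustrated with a small numerical sketch. The abstract does not give the estimator, so the form used below is an assumption: for each anchor observation r, take the angles between the difference vectors X_k - X_r and X_l - X_r, doubly center them over k and l, and average the products of the centered angles for X and Y over all n^3 triples. The function names and the simulated data are hypothetical as well.

```python
import numpy as np


def _centered_angles(X, r):
    """Doubly centered angles at X[r] between X[k] - X[r] and X[l] - X[r]."""
    D = X - X[r]
    norms = np.linalg.norm(D, axis=1)
    U = D / np.where(norms > 0, norms, 1.0)[:, None]   # zero rows stay zero
    a = np.arccos(np.clip(U @ U.T, -1.0, 1.0))
    np.fill_diagonal(a, 0.0)          # a vector makes angle 0 with itself
    degenerate = norms == 0           # k == r or l == r: angle set to 0
    a[degenerate, :] = 0.0
    a[:, degenerate] = 0.0
    return (a - a.mean(axis=1, keepdims=True)
              - a.mean(axis=0, keepdims=True) + a.mean())


def projection_covariance_sq(X, Y):
    """Assumed sample form: n^-3 * sum over (k, l, r) of A_klr * B_klr."""
    n = X.shape[0]
    total = 0.0
    for r in range(n):
        total += np.sum(_centered_angles(X, r) * _centered_angles(Y, r))
    return total / n ** 3


def projection_correlation_sq(X, Y):
    """Squared projection correlation under the assumed estimator."""
    sxy = projection_covariance_sq(X, Y)
    sxx = projection_covariance_sq(X, X)
    syy = projection_covariance_sq(Y, Y)
    return sxy / np.sqrt(sxx * syy)


# Hypothetical data: Y depends on X nonlinearly; Q is a random orthogonal matrix.
rng = np.random.default_rng(1)
n = 40
X = rng.standard_normal((n, 3))
Y = X[:, :1] ** 2 + 0.1 * rng.standard_normal((n, 1))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))

pc_xy = projection_correlation_sq(X, Y)
pc_rotated = projection_correlation_sq(X @ Q, Y)
print("original:", pc_xy)
print("after orthogonal transformation of X:", pc_rotated)
```

Only angles between difference vectors enter the computation, so an orthogonal transformation of either vector leaves the value unchanged up to floating-point error, and no bandwidth or other tuning parameter appears anywhere. The triple loop costs O(n^3), so this sketch is meant for small samples.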
8. Talk title: Homogeneity Pursuit in Spectroscopic Data
Speaker: Prof. Qingsong Xu (许青松), Central South University
Abstract: In high-dimensional data modeling, variable selection methods have been a popular choice for improving prediction accuracy by effectively selecting the subset of informative variables, and such methods can enhance model interpretability through sparse representation. In this study, we propose a novel group variable selection method named ordered homogeneity pursuit lasso (OHPL) that takes the homogeneity structure in high-dimensional data into account. OHPL is particularly useful in high-dimensional datasets with strongly correlated variables. We illustrate the approach using three real-world spectroscopic datasets and compare it with four state-of-the-art variable selection methods. The benchmark results on real-world data show that the proposed method is capable of identifying a small number of influential groups and has better prediction performance than its competitors.