Central South University Academic Forum on Probability, Statistics, and Related Fields

Published: July 13, 2017   Author: Tang Ying   Source: Business Office

Academic Talks in Statistics

5. Talk title: How to make model-free feature screening approaches for full data applicable to missing response case?

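Speaker bio: Wang Qihua is a Research Professor and doctoral supervisor at the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, a recipient of the National Science Fund for Distinguished Young Scholars, a Distinguished Professor under the Ministry of Education's Changjiang Scholars Program, a member of the Chinese Academy of Sciences "Hundred Talents Program", and an Elected Member of the International Statistical Institute (ISI). Since 1997, Prof. Wang has visited Carleton University (Canada), the University of California, Davis, the University of California, Los Angeles, Yale University (USA), the University of Washington (USA), Northwestern University (USA), Humboldt University (Germany), the Australian National University, and the University of Sydney (Australia), among others. Research interests center on survival analysis, missing data analysis, statistical analysis of high-dimensional data, and nonparametric and semiparametric statistical inference. Publications include 2 monographs, 2 chapters in books published by Springer, and more than 100 papers, of which more than 80 appeared in leading international journals such as JASA, The Annals of Statistics, and Biometrika, and more than 80 are indexed by SCI. In 2014 and 2015, Prof. Wang was included in Elsevier's list of Most Cited Chinese Researchers.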

Abstract: It is quite challenging to develop model-free feature screening approaches for missing response problems, since the existing standard methods for missing data analysis cannot be applied directly to the high-dimensional case. This paper develops a novel technique that borrows information from the missingness indicators so that any feature screening procedure for ultrahigh-dimensional covariates with full data can be applied to the missing response case. The technique rests on proving that the set of active predictors for the response is a subset of the set of active predictors for the product of the response and the missingness indicator. Any standard model-free feature screening procedure that has the screening property for full data can then be used to estimate the latter set. Hence, the probability that the estimated set contains the latter set, and therefore the former, tends to one. It is also shown that the complete case (CC) approach preserves the screening property of any feature screening approach that has the screening property for full data. As an alternative, a two-step approach is developed for obtaining a feature screening estimator of the active predictor set of interest. A simulation study was conducted to compare the proposed methods with the "complete case" (CC) approach, and a real data analysis was used to illustrate the proposed method. Both the simulation study and the real data analysis indicate that the proposed zero-imputation feature screening method outperforms the CC method and the two-step method.
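The zero-imputation idea in the abstract can be sketched as follows. This is a minimal illustration, not the speaker's implementation: it assumes missing responses are encoded as `None`, and it uses the sample distance correlation as the model-free screening utility, which is a choice made here for illustration (the abstract allows any screening procedure with the screening property for full data). The function names `zero_impute` and `screen` are hypothetical.

```python
import math


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute differences of a 1-D sample."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d]   # row means (equal to column means by symmetry)
    grand = sum(row) / n            # overall mean
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) of two 1-D samples."""
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)
    idx = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in idx) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in idx) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in idx) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    # Return 0 when either sample is constant (the ratio is undefined there).
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


def zero_impute(y):
    """Product of response and missingness indicator: missing values become 0."""
    return [0.0 if v is None else float(v) for v in y]


def screen(X, y, d):
    """Rank the columns of X by distance correlation with the zero-imputed
    response and keep the top d. Assumption: X is a list of rows, y uses None
    for missing responses. Returns (kept column indices, all scores)."""
    z = zero_impute(y)
    p = len(X[0])
    scores = [distance_correlation([row[j] for row in X], z) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return ranked[:d], scores
```

Because every subject contributes to the utility (with a zero in place of a missing response), the screening step runs on the full sample rather than on the complete cases only. Any other model-free utility could be substituted for `distance_correlation` without changing `screen`.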

6. Talk title: Statistical Analysis Strategies and Data Mining for Medical Big Data

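Speaker bio: Guo Xiuhua is a professor and doctoral supervisor, currently Vice Dean of the School of Public Health at Capital Medical University and Deputy Director of the Beijing Municipal Key Laboratory of Clinical Epidemiology. Honors include "Beijing Distinguished Teacher", "Beijing Outstanding Teacher", "Outstanding Teacher of the General Logistics Department", and the "General Logistics Department Silver Award for Talent Cultivation". Prof. Guo has published more than 250 research papers as first or corresponding author, 65 of them SCI-indexed; has led 15 projects in the past 5 years, including a Key Program of the National Natural Science Foundation of China, a Ministry of Science and Technology "13th Five-Year Plan" project, a General Program of the National Natural Science Foundation of China, and a Key Program of the Beijing Natural Science Foundation; has served as chief editor of 13 monographs or textbooks (2 of them Beijing Municipal Premium Textbooks); leads the Beijing Municipal Premium Course "Medical Statistics"; has received 8 provincial or ministerial third-class Science and Technology Progress Awards; and holds 4 national patents. Prof. Guo holds positions in 9 national or Beijing societies, chiefly Vice President of the Biostatistics Society of the Chinese Association for Applied Statistics, Vice Chair of the Beijing Biostatistics and Data Management Research Society, and Executive Council Member of IBS-CHINA.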


7. Talk title: Projection correlation between two random vectors

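Speaker bio: Dr. Zhu Liping is a professor at the Institute of Statistics and Big Data, Renmin University of China. He received his PhD from East China Normal University in 2006 and became an assistant professor there in the same year. In 2013 he was selected for the Ministry of Education's Program for New Century Excellent Talents, and in 2015 he received funding including the Excellent Young Scientists Fund of the National Natural Science Foundation of China. He has published more than 15 articles in top statistics journals such as Annals of Statistics, Journal of the Royal Statistical Society Series B, Journal of the American Statistical Association, and Biometrika. His main research interests include semiparametric modeling, high-dimensional data analysis, sufficient dimension reduction, and variable selection.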

Abstract: We propose projection correlation to characterize dependence between two random vectors. Projection correlation has several appealing properties: it equals zero if and only if the two random vectors are independent; it is not sensitive to the dimensions of the two random vectors; it is invariant with respect to the group of orthogonal transformations; and its estimation is free of tuning parameters and does not require moment conditions on the random vectors. We show that the sample estimate of the projection correlation is n-consistent if the two random vectors are independent and root-n-consistent otherwise. Monte Carlo simulation studies indicate that the projection correlation has higher power than both the distance correlation and the ranks of distances in tests of independence, especially when the dimensions are relatively large or the moment conditions required by the distance correlation are violated.
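To make the properties listed in the abstract concrete, here is a rough sketch of one way a sample projection correlation can be computed. The abstract does not give the estimator, so the formula is an assumption that should be checked against the published paper: for each anchor point r, take the angle between the difference vectors from observation r to observations k and l, double-center those angles over (k, l), and combine the centered arrays of the two samples as in a correlation. The function name `projection_correlation_sq` and the helper names are hypothetical. The cost is O(n^3), so the sketch only suits small samples.

```python
import math


def _angle_slices(V):
    """slices[r][k][l]: angle at V[r] between V[k]-V[r] and V[l]-V[r].
    The angle is set to 0 when either difference vector is zero."""
    n = len(V)
    slices = []
    for r in range(n):
        diffs = [[vk - vr for vk, vr in zip(V[k], V[r])] for k in range(n)]
        norms = [math.sqrt(sum(c * c for c in d)) for d in diffs]
        m = [[0.0] * n for _ in range(n)]
        for k in range(n):
            for l in range(n):
                if norms[k] > 0 and norms[l] > 0:
                    dot = sum(p * q for p, q in zip(diffs[k], diffs[l]))
                    cos = dot / (norms[k] * norms[l])
                    # Clamp to [-1, 1] to guard against rounding error.
                    m[k][l] = math.acos(max(-1.0, min(1.0, cos)))
        slices.append(m)
    return slices


def _double_center(m):
    """Subtract row and column means and add back the grand mean."""
    n = len(m)
    row = [sum(m[k]) / n for k in range(n)]
    col = [sum(m[k][l] for k in range(n)) / n for l in range(n)]
    grand = sum(row) / n
    return [[m[k][l] - row[k] - col[l] + grand for l in range(n)]
            for k in range(n)]


def _inner(P, Q):
    """Sum of elementwise products of two n x n x n arrays."""
    return sum(p * q
               for Pm, Qm in zip(P, Q)
               for pr, qr in zip(Pm, Qm)
               for p, q in zip(pr, qr))


def projection_correlation_sq(X, Y):
    """Squared sample projection correlation (assumed form, see lead-in).
    X and Y are lists of equal length whose entries are coordinate lists."""
    A = [_double_center(m) for m in _angle_slices(X)]
    B = [_double_center(m) for m in _angle_slices(Y)]
    sxy, sxx, syy = _inner(A, B), _inner(A, A), _inner(B, B)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0
```

Two of the properties in the abstract are visible directly in this form: only angles between difference vectors enter, so rotating or reflecting either sample leaves the value unchanged, and no bandwidth or other tuning parameter appears anywhere.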


8. Talk title: Homogeneity Pursuit in Spectroscopic Data

Speaker: Prof. Xu Qingsong (Central South University)

Abstract: In high-dimensional data modeling, variable selection methods are a popular way to improve prediction accuracy by selecting a subset of informative variables, and the resulting sparse representation also makes the model easier to interpret. In this study, we propose a novel group variable selection method named ordered homogeneity pursuit lasso (OHPL) that takes the homogeneity structure in high-dimensional data into account. OHPL is particularly useful for high-dimensional datasets with strongly correlated variables. We illustrate the approach using three real-world spectroscopic datasets and compare it with four state-of-the-art variable selection methods. The benchmark results on real-world data show that the proposed method is capable of identifying a small number of influential groups and has better prediction performance than its competitors.


