An integrative analysis system of gene expression using self-paced learning and SCAD-Net

2019
Background: Few proposed gene biomarkers have proved satisfactory in clinical applications, mainly because individual studies have small sample sizes. Because of the batch effect, different gene-expression studies cannot be merged directly. Many integrative methods have attempted to combine datasets in a way that eliminates the batch effect while keeping biological information intact. However, the batch effect is complex and cannot be eliminated, and these methods may even introduce new systematic errors, further complicating the integrated data. Direct analysis of the merged data may therefore be problematic. In this paper, we propose a novel integrative analysis framework for merged gene-expression data. The framework adopts self-paced learning, which automatically adds samples to training in order from simple to complex, in a purely self-paced way. The framework also includes a new feature selection method, SCAD-Net regularization, which combines the SCAD penalty with a network-based penalty to integrate biological network knowledge. Simulations show that the proposed method outperforms the benchmark, identifying markers more accurately. An analysis of seven large NSCLC gene expression datasets shows that the proposed method not only achieves higher accuracy but also identifies potential therapeutic markers and pathways in NSCLC. In conclusion, we provide a new and efficient integrative analysis system for gene expression data to support the search for reliable new diagnostic or targeted-therapy biomarkers. (C) 2019 Elsevier Ltd. All rights reserved.
EXPERT SYSTEMS WITH APPLICATIONS
Pages: 102-112 | Volume: 135
ISSN:0957-4174
Indexed in
SSCI
Publication date
2019
Subject area
Evidence-based management
Country
China
Language
English
DOI
10.1016/j.eswa.2019.06.016
Other keywords
VARIABLE SELECTION; REGULARIZATION; ALGORITHM; IDENTIFICATION; VALIDATION; PATTERNS; DISEASE; TARGETS
EISSN
1873-6793
Funding agencies
MOE (Ministry of Education in China) Project of Humanities and Social Sciences [18YJCZH054]; National Natural Science Foundation of Guangdong [2018A030307033]; Special Innovation Projects of Universities in Guangdong Province [12018ICTSCX205]; High-level Colleges Talent Project of Guangdong [2013-178]; Macau Science and Technology Development Funds of Macau SAR of China [0055/2018/A2]
Funding information
This research was supported by the MOE (Ministry of Education in China) Project of Humanities and Social Sciences [18YJCZH054], the National Natural Science Foundation of Guangdong [2018A030307033], the Special Innovation Projects of Universities in Guangdong Province [12018ICTSCX205], the High-level Colleges Talent Project of Guangdong [2013-178], and the Macau Science and Technology Development Funds of Macau SAR of China [0055/2018/A2].
Times cited (WOS)
14
Citation count updated
2022-01
Source institutions
Shaoguan University; Macau University of Science & Technology
Keywords
Integrative analysis system; Meta-analysis; Regularization; Variable selection; Gene expression
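The abstract names two building blocks, self-paced learning and the SCAD penalty, without giving formulas. As a rough illustration only, the sketch below uses the standard textbook forms: the SCAD penalty with the conventional default a = 3.7, and the hard-threshold self-paced weighting in which a sample enters training once its loss falls below an "age" parameter. The function names, the parameter `age`, and the example loss values are assumptions made for this illustration and are not taken from the paper. The network-based part of SCAD-Net is omitted because the abstract does not specify its form.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty for a single coefficient (requires a > 2, lam > 0).

    Linear like the lasso near zero, quadratic in the middle region,
    and constant for large coefficients so they are not over-shrunk.
    """
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def self_paced_weights(losses, age):
    """Hard self-paced weighting: include a sample (weight 1) only if its
    current loss is below the age parameter; otherwise exclude it (weight 0).
    """
    return [1 if loss < age else 0 for loss in losses]


# Hypothetical per-sample losses; raising `age` admits harder samples.
example_losses = [0.1, 0.9, 2.5]
early_round = self_paced_weights(example_losses, age=1.0)
later_round = self_paced_weights(example_losses, age=3.0)
```

In a training loop of this kind, the model would be refit on the currently selected samples and `age` increased between rounds, so that easy samples are learned first and harder ones are added gradually. How the paper schedules this parameter is not stated in the abstract.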