Proper Inference for Value Function in High-Dimensional Q-Learning for Dynamic Treatment Regimes

2019
Dynamic treatment regimes are a set of decision rules and each treatment decision is tailored over time according to patients' responses to previous treatments as well as covariate history. There is a growing interest in development of correct statistical inference for optimal dynamic treatment regimes to handle the challenges of nonregularity problems in the presence of nonrespondents who have zero-treatment effects, especially when the dimension of the tailoring variables is high. In this article, we propose a high-dimensional Q-learning (HQ-learning) to facilitate the inference of optimal values and parameters. The proposed method allows us to simultaneously estimate the optimal dynamic treatment regimes and select the important variables that truly contribute to the individual reward. At the same time, hard thresholding is introduced in the method to eliminate the effects of the nonrespondents. The asymptotic properties for the parameter estimators as well as the estimated optimal value function are then established by adjusting the bias due to thresholding. Both simulation studies and real data analysis demonstrate satisfactory performance for obtaining the proper inference for the value function for the optimal dynamic treatment regimes.
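As an illustration only, and not the authors' estimator, the role of hard thresholding can be sketched for a single two-arm decision. The sketch assumes a treatment coded as a in {-1, +1} and a second-stage Q-function of the form main_effect + a * treatment_effect; the function names and the threshold value are hypothetical. Maximizing over a gives main_effect + |treatment_effect|, which is non-smooth at zero, so estimated effects below the threshold are set to zero and the patient is treated as a nonrespondent.

```python
def thresholded_pseudo_outcome(main_effect, treatment_effect, threshold):
    """Pseudo-outcome for a two-arm decision coded as a in {-1, +1}.

    Assumes a Q-function of the form main_effect + a * treatment_effect,
    whose maximum over a is main_effect + |treatment_effect|. Effects no
    larger than `threshold` in magnitude are treated as exactly zero
    (hypothetical illustration, not the paper's estimator).
    """
    if abs(treatment_effect) <= threshold:
        return main_effect
    return main_effect + abs(treatment_effect)


def recommended_treatment(treatment_effect, threshold):
    """Return +1 or -1, or 0 when the patient is treated as a nonrespondent."""
    if abs(treatment_effect) <= threshold:
        return 0
    return 1 if treatment_effect > 0 else -1


# Hypothetical (main_effect, treatment_effect) pairs for three patients.
patients = [(2.0, 0.05), (2.0, -0.5), (1.0, 0.75)]
for main_effect, effect in patients:
    value = thresholded_pseudo_outcome(main_effect, effect, threshold=0.1)
    arm = recommended_treatment(effect, threshold=0.1)
    print(main_effect, effect, value, arm)
```

In this sketch the first patient's small estimated effect is zeroed out, so no arm is preferred and the pseudo-outcome equals the main effect alone.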
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Pages: 1404-1417 | Volume: 114 | Issue: 527
ISSN:0162-1459
Indexed in
SSCI
Publication date
2019
Subject area
Evidence-based social science: methods
Country
China
Language
English
DOI
10.1080/01621459.2018.1506341
Other keywords
SEQUENCED TREATMENT ALTERNATIVES; NONCONCAVE PENALIZED LIKELIHOOD; VARIABLE SELECTION; RATIONALE
EISSN
1537-274X
Funding agencies
National Natural Science Foundation of China (NSFC) [11771072, 11371083]; Foundation from China Scholarship Council [201406625026]; National Institutes of Health (NIH), United States Department of Health & Human Services; National Science Foundation (NSF) [2R01NS073671-05A1, 1R01GM124104-01A1, NCI-P01CA142538, NSF-DMS-1555244]; NIH National Cancer Institute (NCI) [P01CA142538], Funding Source: NIH RePORTER; NIH National Institute of General Medical Sciences (NIGMS) [R01GM124104], Funding Source: NIH RePORTER; NIH National Institute of Neurological Disorders & Stroke (NINDS) [R01NS073671], Funding Source: NIH RePORTER
Funding information
This work is supported in part by the National Natural Science Foundation of China (11771072 and 11371083), Foundation from China Scholarship Council (No. 201406625026), National Institutes of Health and National Science Foundation (2R01NS073671-05A1, 1R01GM124104-01A1, NCI-P01CA142538, NSF-DMS-1555244).
Times cited (WOS)
4
Citation count updated
2022-01
Source institutions
Northeast Normal University - China; University of North Carolina; University of North Carolina Chapel Hill; University of North Carolina School of Medicine; North Carolina State University
Keywords
Hard threshold; Q-learning; Value function inference; Variable selection