New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes

2015
Dynamic treatment regimes (DTRs) are sequential decision rules for individual patients that can adapt over time to an evolving illness. The goal is to accommodate heterogeneity among patients and find the DTR which will produce the best long-term outcome if implemented. We introduce two new statistical learning methods for estimating the optimal DTR, termed backward outcome weighted learning (BOWL) and simultaneous outcome weighted learning (SOWL). These approaches convert individualized treatment selection into either a sequential or a simultaneous classification problem, and can thus be applied by modifying existing machine learning techniques. The proposed methods are based on directly maximizing over all DTRs a nonparametric estimator of the expected long-term outcome; this is fundamentally different from regression-based methods, for example, Q-learning, which indirectly attempt such maximization and rely heavily on the correctness of postulated regression models. We prove that the resulting rules are consistent, and provide finite sample bounds for the errors using the estimated rules. Simulation results suggest the proposed methods produce superior DTRs compared with Q-learning, especially in small samples. We illustrate the methods using data from a clinical trial for smoking cessation. Supplementary materials for this article are available online.
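The outcome weighted learning idea underlying BOWL and SOWL can be illustrated in a single-stage setting: treatment selection becomes a classification problem in which each patient's observed treatment is a label weighted by the outcome divided by the treatment probability. The sketch below is an illustration only, not the paper's implementation; it assumes a randomized trial with known propensity 0.5, uses a weighted logistic surrogate in place of the paper's hinge (SVM) loss, and all variable names are made up for the example.

```python
import numpy as np

# Single-stage outcome weighted learning (OWL) sketch.
# Simulated trial: treatment A is randomized with P(A = 1) = 0.5, and the
# outcome R is larger when A agrees with the sign of the first covariate,
# so the optimal rule is d(x) = sign(x[0]).
rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p))
A = rng.choice([-1.0, 1.0], size=n)
R = 1.0 + A * np.sign(X[:, 0]) + 0.1 * rng.normal(size=n)

# OWL reduces rule estimation to weighted classification of A given X,
# with nonnegative weights R / pi(A | X); shifting R keeps weights >= 0.
w = (R - R.min()) / 0.5

# Weighted logistic surrogate (the paper itself uses an SVM/hinge loss):
# minimize (1/n) * sum_i w_i * log(1 + exp(-A_i * (x_i' beta)))
Xb = np.hstack([X, np.ones((n, 1))])     # append an intercept column
beta = np.zeros(p + 1)
for _ in range(1000):
    margin = A * (Xb @ beta)
    sig = 1.0 / (1.0 + np.exp(margin))   # loss-derivative factor per sample
    grad = -(Xb * (w * A * sig)[:, None]).mean(axis=0)
    beta -= 0.1 * grad                   # plain gradient descent step

rule = np.sign(Xb @ beta)                # estimated treatment rule d(x)
agreement = np.mean(rule == np.sign(X[:, 0]))
```

Because the weights favor treatments that produced good outcomes, the fitted classifier recovers a rule close to the optimal `sign(x[0])`; BOWL applies this construction backward through the stages of a multistage trial, while SOWL optimizes all stages simultaneously.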
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Pages: 583-598 | Volume: 110 | Issue: 510
ISSN: 0162-1459
Indexed in
SSCI
Publication date
2015
Subject area
Evidence-based social science: methods
Country
United States
Language
English
DOI
10.1080/01621459.2014.937488
Other keywords
SUPPORT VECTOR MACHINES; INDIVIDUALIZED TREATMENT RULES; ADAPTIVE TREATMENT STRATEGIES; CLINICAL-TRIALS; INFERENCE; DESIGN; RANDOMIZATION; CLASSIFIERS; PERFORMANCE; DECISIONS
EISSN
1537-274X
Funding agency
United States Department of Health & Human Services; National Institutes of Health (NIH), USA [P01 CA142538]
Funding information
The authors thank the coeditor, associate editor, and two anonymous reviewers for their constructive and helpful comments which led to an improved article. This work was supported in part by NIH grant P01 CA142538. We thank Dr. Victor Strecher for providing the Smoking Cessation data.
Times cited (WoS)
93
Citation count updated
2022-01
Author institutions
University of Wisconsin-Madison; University of North Carolina at Chapel Hill; North Carolina State University
Keywords
Classification; Personalized medicine; Q-learning; Reinforcement learning; Risk bound; Support vector machine