Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health

Due to recent advancements in wearables and sensing technology, health scientists are increasingly developing mobile health (mHealth) interventions, in which mobile devices deliver treatment to individuals as they go about their daily lives. These treatments are generally designed to affect a near-time, proximal outcome such as stress or physical activity. The mHealth intervention policies, often called just-in-time adaptive interventions, are decision rules that map an individual's current state (e.g., past behaviors as well as current observations of time, location, social activity, stress, and urges to smoke) to a particular treatment at each of many time points. The vast majority of current mHealth interventions deploy expert-derived policies. In this article, we provide an approach for conducting inference about the performance of one or more such policies using historical data collected under a possibly different policy. Our measure of performance is the average of proximal outcomes over a long time period should the particular mHealth policy be followed. We provide an estimator as well as confidence intervals. This work is motivated by HeartSteps, an mHealth physical activity intervention. Supplementary materials for this article are available online.
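The estimand in the abstract — the long-run average of proximal outcomes under a target policy, estimated from data collected under a possibly different (behavior) policy — can be illustrated with a toy sketch. The code below is not the paper's estimator (which also provides confidence intervals); it is a minimal stationary-density-ratio calculation on a hypothetical two-state, two-action MDP, with all quantities (transition matrices, reward, both policies) invented for illustration. In practice the density ratio must be estimated from the data; here it is computed exactly from the known transition matrices so the reweighting identity is easy to verify.

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# P[a, s, s'] = transition probability under action a.
P = np.array([[[0.9, 0.1],   # action 0: tend to stay
               [0.1, 0.9]],
              [[0.2, 0.8],   # action 1: tend to switch
               [0.8, 0.2]]])
reward = lambda s, a: s - 0.1 * a   # illustrative proximal outcome

# Target policy pi (deterministic: a = 1 - s); behavior policy b is uniform.
pi = lambda s: 1 - s

def stationary(M):
    """Stationary distribution of a row-stochastic matrix M."""
    evals, evecs = np.linalg.eig(M.T)
    d = np.real(evecs[:, np.argmax(np.real(evals))])  # eigenvalue 1
    return d / d.sum()

# Long-run average outcome J(pi), computed exactly for reference.
P_pi = np.array([P[pi(s), s] for s in (0, 1)])
d_pi = stationary(P_pi)
J_true = sum(d_pi[s] * reward(s, pi(s)) for s in (0, 1))

# Stationary distribution of the behavior policy (uniform over actions).
d_b = stationary(P.mean(axis=0))

# Off-policy estimate: draw (s, a) from the behavior stationary
# distribution and reweight each outcome by the stationary density
# ratio times the policy probability ratio.
rng = np.random.default_rng(0)
N = 200_000
s = rng.choice(2, size=N, p=d_b)
a = rng.integers(0, 2, size=N)               # b(a|s) = 0.5
w = d_pi[s] * (a == 1 - s) / (d_b[s] * 0.5)  # zero when a != pi(s)
J_hat = np.mean(w * (s - 0.1 * a))
```

With 200,000 draws, `J_hat` lands close to `J_true`; the key point is that every sample comes from the behavior policy, yet the weighted average targets the performance of `pi` — the same reweighting idea that underlies off-policy evaluation of long-term average outcomes.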
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Pages: 382-391 | Volume: 116 | Issue: 533
ISSN:0162-1459
Indexed In
SSCI
Publication Date
2021
Subject Area
Evidence-Based Social Science - Methods
Country
United States
Language
English
DOI
10.1080/01621459.2020.1807993
Other Keywords
ALGORITHMS; TRIALS
EISSN
1537-274X
Funding Agencies
National Institute on Alcohol Abuse and Alcoholism (NIAAA) of the National Institutes of Health [R01AA23187]; National Institute on Drug Abuse (NIDA) of the National Institutes of Health [P50DA039838, R01DA039901]; National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health [U54EB020404]; National Cancer Institute (NCI) of the National Institutes of Health [U01CA229437]; National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health [R01HL125440]
Funding Information
This work was supported by National Institute on Alcohol Abuse and Alcoholism (NIAAA) of the National Institutes of Health under award number R01AA23187, National Institute on Drug Abuse (NIDA) of the National Institutes of Health under award numbers P50DA039838 and R01DA039901, National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health under award number U54EB020404, National Cancer Institute (NCI) of the National Institutes of Health under award number U01CA229437, and National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health under award number R01HL125440. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Times Cited (WOS)
1
Citation Count Updated
2022-01
Affiliations
University of Michigan; Harvard University
Keywords
Markov decision process; Policy evaluation; Reinforcement learning; Sequential decision making