INCORPORATING CONDITIONAL DEPENDENCE IN LATENT CLASS MODELS FOR PROBABILISTIC RECORD LINKAGE: DOES IT MATTER?

2019
The conditional independence assumption of the Fellegi and Sunter (FS) model in probabilistic record linkage is often violated when matching real-world data. Ignoring conditional dependence has been shown to seriously bias parameter estimates. However, in record linkage the ultimate goal is to infer the match status of record pairs, and record linkage algorithms should therefore be evaluated in terms of matching accuracy. More flexible models have been proposed in the literature to relax the conditional independence assumption, but few studies have assessed whether such accommodations improve matching accuracy. In this paper, using three real-world data linkage examples, we show that incorporating conditional dependence appropriately yields matching accuracy comparable to or better than that of the FS model. Through a simulation study, we further investigate when conditional dependence models provide improved matching accuracy. Our study shows that the FS model is generally robust to violations of the conditional independence assumption and provides matching accuracy comparable to that of the more complex conditional dependence models. However, when the match prevalence approaches 0% or 100% and conditional dependence exists in the dominating class, conditional dependence must be addressed, because the FS model then produces suboptimal matching accuracy. Addressing conditional dependence becomes less important when highly discriminating fields are used. Our simulation study also shows that conditional dependence models with a misspecified dependence structure can produce less accurate record matching than the FS model; we therefore caution against the blind use of conditional dependence models.
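For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of the two-class FS latent class model fitted by EM under conditional independence. It is not the authors' code and does not implement their conditional dependence models. The function names `fit_fs_em` and `match_weight`, the starting values, and the simulated field probabilities are all hypothetical choices for illustration. Each record pair is summarized by a vector of binary agreement indicators. Given the latent match status, these indicators are assumed independent, so the class likelihood is a product over fields and the match weight is a sum over fields.

```python
import math
import random


def fit_fs_em(gammas, p=0.1, m=None, u=None, n_iter=100, eps=1e-6):
    """EM for a two-class Fellegi-Sunter latent class model (hypothetical helper).

    Assumes conditional independence: given match status, the binary agreement
    indicators are independent, so P(gamma | M) = prod_j m_j^g_j (1 - m_j)^(1 - g_j),
    and likewise for non-matches with u_j.
    """
    k, n = len(gammas[0]), len(gammas)
    m = list(m) if m is not None else [0.9] * k  # start: matches usually agree
    u = list(u) if u is not None else [0.1] * k  # start: non-matches rarely agree
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match
        post = []
        for g in gammas:
            lm, lu = p, 1.0 - p
            for j, gj in enumerate(g):
                lm *= m[j] if gj else 1.0 - m[j]
                lu *= u[j] if gj else 1.0 - u[j]
            post.append(lm / (lm + lu))
        # M-step: posterior-weighted agreement rates, clamped away from 0 and 1
        s = sum(post)
        p = s / n
        m = [min(max(sum(w * g[j] for w, g in zip(post, gammas)) / s, eps), 1 - eps)
             for j in range(k)]
        u = [min(max(sum((1 - w) * g[j] for w, g in zip(post, gammas)) / (n - s), eps), 1 - eps)
             for j in range(k)]
    return p, m, u, post


def match_weight(g, m, u):
    """FS match weight log2[P(gamma | M) / P(gamma | U)]; a sum over fields under CI."""
    return sum(
        math.log2(m[j] / u[j]) if gj else math.log2((1 - m[j]) / (1 - u[j]))
        for j, gj in enumerate(g)
    )


# Simulated data generated under conditional independence (hypothetical values)
rng = random.Random(0)
true_p = 0.2
true_m = [0.95, 0.9, 0.85, 0.9]
true_u = [0.05, 0.1, 0.2, 0.15]
pairs = []
for _ in range(2000):
    probs = true_m if rng.random() < true_p else true_u
    pairs.append(tuple(int(rng.random() < q) for q in probs))

p_hat, m_hat, u_hat, post = fit_fs_em(pairs)
print(round(p_hat, 3), [round(x, 2) for x in m_hat], [round(x, 2) for x in u_hat])
print(match_weight((1, 1, 1, 1), m_hat, u_hat), match_weight((0, 0, 0, 0), m_hat, u_hat))
```

Because the simulated data satisfy conditional independence, the estimates should land near the generating values. When agreement indicators are correlated within a class, the product in the E-step no longer holds. The conditional dependence models named in the keywords (log-linear models with interaction terms, Gaussian random effects models) replace that product with a model of the joint agreement pattern; that extension is not shown here.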
ANNALS OF APPLIED STATISTICS
Pages: 1753-1790 | Volume: 13 | Issue: 3
ISSN:1932-6157
Indexed in
SSCI
Publication date
2019
Subject area
Evidence-Based Social Science - Methods
Country
United States
Language
English
DOI
10.1214/19-AOAS1256
Other keywords
ESTIMATING DIAGNOSTIC-ACCURACY; EVALUATING ACCURACY; DISCRIMINATING POWER; LOCAL DEPENDENCE; PATIENT RECORDS; GOLD STANDARD; ERROR; IDENTIFIERS; PERFORMANCE; TESTS
EISSN
1941-7330
Funding agencies
Agency for Healthcare Research and Quality (United States Department of Health & Human Services) [R01HS018553, R01HS023808]; Patient-Centered Outcomes Research Institute (PCORI) [ME-2017C1-6425]
Funding information
Supported by grants R01HS018553 and R01HS023808 from the Agency for Healthcare Research and Quality and ME-2017C1-6425 from the Patient-Centered Outcomes Research Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the
Times cited (WoS)
3
Citation count last updated
2022-01
Affiliations
Indiana University System; Indiana University-Purdue University Indianapolis; Harvard University; Beth Israel Deaconess Medical Center; Harvard Medical School; Regenstrief Institute Inc
Keywords
Conditional dependence; finite mixture; Gaussian random effects model; latent class analysis; log-linear model; record linkage