cutpointr: Improved Estimation and Validation of Optimal Cutpoints in R

2021
Optimal cutpoints for binary classification tasks are often established by testing which cutpoint yields the best discrimination, as measured for example by the Youden index, in a specific sample. This results in optimal cutpoints that are highly variable and in performance estimates that systematically overestimate the out-of-sample performance. To address these concerns, the cutpointr package offers robust methods for estimating optimal cutpoints and the out-of-sample performance. The robust methods include bootstrapping and smoothing based on kernel estimation, generalized additive models, smoothing splines, and local regression. These methods can be applied to a wide range of binary-classification and cost-based metrics. cutpointr also provides mechanisms to utilize user-defined metrics and estimation methods. The package has capabilities for parallelization of the bootstrapping, including reproducible random number generation. Furthermore, it is pipe-friendly, for example for compatibility with functions from the tidyverse. Various functions for plotting receiver operating characteristic curves, precision-recall graphs, bootstrap results and other representations of the data are included. The package contains example data from a study on psychological characteristics and suicide attempts suitable for applying binary classification algorithms.
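The idea in the abstract, choosing the cutpoint that maximizes the Youden index in a sample and then checking it on data not used for the choice, can be illustrated with a minimal Python sketch. This is not cutpointr's implementation or API: the function names (`youden`, `best_cutpoint`, `bootstrap_validate`), the synthetic data, and the out-of-bag validation scheme are assumptions made for illustration only.

```python
import random

def youden(scores, labels, cutpoint):
    """Youden index J = sensitivity + specificity - 1.
    Assumption: an observation is predicted positive when score >= cutpoint."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutpoint)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutpoint)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutpoint)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutpoint)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens + spec - 1.0

def best_cutpoint(scores, labels):
    """Observed score value with the highest in-sample Youden index
    (ties resolve to the lowest such value)."""
    return max(sorted(set(scores)), key=lambda c: youden(scores, labels, c))

def bootstrap_validate(scores, labels, runs=200, seed=1):
    """Hypothetical helper: re-estimate the cutpoint on each bootstrap sample
    and score it on the observations left out of that sample (out-of-bag)."""
    rng = random.Random(seed)
    n = len(scores)
    cutpoints, oob_youden = [], []
    for _ in range(runs):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        out = [i for i in range(n) if i not in in_bag]
        bag_labels = [labels[i] for i in idx]
        out_labels = [labels[i] for i in out]
        # Skip resamples in which either part lacks one of the two classes.
        if len(set(bag_labels)) < 2 or len(set(out_labels)) < 2:
            continue
        c = best_cutpoint([scores[i] for i in idx], bag_labels)
        cutpoints.append(c)
        oob_youden.append(youden([scores[i] for i in out], out_labels, c))
    return cutpoints, oob_youden

# Synthetic example data (an assumption, not the package's example data).
data_rng = random.Random(0)
labels = [0] * 150 + [1] * 50
scores = ([data_rng.gauss(0.0, 1.0) for _ in range(150)]
          + [data_rng.gauss(1.5, 1.0) for _ in range(50)])

cut = best_cutpoint(scores, labels)
in_sample = youden(scores, labels, cut)
cutpoints, oob_youden = bootstrap_validate(scores, labels)

print("in-sample cutpoint:", cut)
print("in-sample Youden index:", in_sample)
print("bootstrap cutpoint range:", min(cutpoints), max(cutpoints))
print("mean out-of-bag Youden index:", sum(oob_youden) / len(oob_youden))
```

Comparing the spread of the bootstrap cutpoints and the mean out-of-bag Youden index with the in-sample values gives a rough picture of the variability and optimism that the abstract describes.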
JOURNAL OF STATISTICAL SOFTWARE
Volume: 98 | Issue: 11
ISSN:1548-7660
Source institution
Bielefeld University of Applied Sciences
Indexed in
SSCI
Publication date
2021
Subject area
Evidence-based social science: methods
Country
Germany
Language
English
DOI
10.18637/jss.v098.i11
Other keywords
CUT POINTS; CROSS-VALIDATION; PERFORMANCE; SELECTION; PACKAGE; BIAS
Funding agency
German Federal Ministry for Education and Research; Federal Ministry of Education & Research (BMBF) [01EK1501]
Funding information
The study was supported by the German Federal Ministry for Education and Research to GH (BMBF #01EK1501).
Times cited (WOS)
1
Citation count updated
2022-01
Keywords
optimal cutpoint; ROC curve; bootstrap; R