Search results

Assessing the Risk of Bias in Randomized Clinical Trials With Large Language Models

Importance: Large language models (LLMs) may facilitate the labor-intensive process of systematic reviews. However, the exact methods and their reliability remain uncertain. Objective: To explore the feasibility and reliability of using LLMs to assess risk of bias (ROB) in randomized clinical trials (RCTs). Design, Setting, and Participants: A survey study was conducted between August 10, 2023, and October 30, 2023. Thirty RCTs were selected from published systematic reviews. Main Outcomes and Measures: A structured prompt was developed to guide ChatGPT (LLM 1) and Claude (LLM 2) in assessing the ROB of these RCTs using a modified version of the Cochrane ROB tool developed by the CLARITY group at McMaster University. Each RCT was assessed twice by each model, and the results were documented. The results were compared with an assessment by 3 experts, which served as the criterion standard. Correct assessment rates, sensitivity, specificity, and F1 scores were calculated to reflect accuracy, both overall and for each domain of the Cochrane ROB tool; consistent assessment rates and Cohen kappa were calculated to gauge consistency; and assessment time was recorded to measure efficiency. Performance of the 2 models was compared using risk differences. Results: Both models demonstrated high correct assessment rates. LLM 1 reached a mean correct assessment rate of 84.5% (95% CI, 81.5%-87.3%), and LLM 2 reached a significantly higher rate of 89.5% (95% CI, 87.0%-91.8%). The risk difference between the 2 models was 0.05 (95% CI, 0.01-0.09). In most domains, domain-specific correct rates were around 80% to 90%; however, sensitivity was below 0.80 in domains 1 (random sequence generation), 2 (allocation concealment), and 6 (other concerns). Domains 4 (missing outcome data), 5 (selective outcome reporting), and 6 had F1 scores below 0.50. The consistent assessment rates between the 2 assessments were 84.0% for LLM 1 and 87.3% for LLM 2. Cohen kappa exceeded 0.80 in 7 domains for LLM 1 and in 8 domains for LLM 2. The mean (SD) time needed for assessment was 77 (16) seconds for LLM 1 and 53 (12) seconds for LLM 2. Conclusions: In this survey study applying LLMs to ROB assessment, LLM 1 and LLM 2 demonstrated substantial accuracy and consistency in evaluating RCTs, which suggests their potential as supportive tools in systematic review processes.
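The accuracy and consistency measures named in this abstract can be illustrated with a minimal sketch. It assumes that each ROB judgement has already been collapsed to two categories (1 = high risk, treated as the positive class; 0 = low risk). The abstract does not say how the study dichotomized the tool's response options, and the example labels are invented.

```python
import math


def _ratio(num: int, den: int) -> float:
    # Metrics that are undefined (e.g. no positives in the reference standard) become NaN.
    return num / den if den else math.nan


def accuracy_metrics(pred: list[int], truth: list[int]) -> dict[str, float]:
    """Compare model judgements with the expert reference standard (binary labels)."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    pairs = list(zip(pred, truth))
    tp = pairs.count((1, 1))  # model says high risk, experts agree
    fp = pairs.count((1, 0))
    fn = pairs.count((0, 1))
    tn = pairs.count((0, 0))
    return {
        "correct_rate": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and sensitivity.
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def cohen_kappa(first: list[int], second: list[int]) -> float:
    """Chance-corrected agreement between two assessment runs of the same model."""
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("runs must be non-empty and the same length")
    observed = sum(a == b for a, b in zip(first, second)) / n
    categories = set(first) | set(second)
    expected = sum((first.count(c) / n) * (second.count(c) / n) for c in categories)
    return math.nan if expected == 1 else (observed - expected) / (1 - expected)


# Invented labels for ten trials in one domain.
truth = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
run_1 = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
run_2 = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
print(accuracy_metrics(run_1, truth))
print(cohen_kappa(run_1, run_2))
```

With multi-level judgements, the same `cohen_kappa` function works unchanged. The confusion-matrix metrics would then need a one-vs-rest definition of the positive class.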

Journal article

Reporting and risk of bias of prediction models based on machine learning methods in preterm birth: A systematic review

Introduction: There was limited evidence on the reporting quality and methodological quality of prediction models built with machine learning methods in preterm birth. This systematic review aimed to assess the reporting quality and risk of bias of machine learning-based prediction models in preterm birth. Material and methods: We conducted a systematic review, searching PubMed, Embase, the Cochrane Library, China National Knowledge Infrastructure, China Biology Medicine disk, VIP Database, and WanFang Data from inception to September 27, 2021. Studies that developed (or validated) a prediction model using machine learning methods in preterm birth were included. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement and the Prediction model Risk of Bias Assessment Tool (PROBAST) to evaluate the reporting quality and the risk of bias of included studies, respectively. Findings were summarized using descriptive statistics and visual plots. The protocol was registered in PROSPERO (no. CRD 42022301623). Results: Twenty-nine studies met the inclusion criteria: 24 development-only studies and 5 development-with-validation studies. Overall, TRIPOD adherence per study ranged from 17% to 79%, with a median adherence of 49%. Reporting of the title, abstract, blinding of predictors, sample size justification, explanation of the model, and model performance was mostly poor, with TRIPOD adherence for these items ranging from 4% to 17%. Of all included studies, 79% had a high overall risk of bias and 21% had an unclear overall risk of bias. The analysis domain was most commonly rated as high risk of bias, mainly because of small effective sample sizes, selection of predictors based on univariable analysis, and lack of calibration evaluation. Conclusions: The reporting and methodological quality of machine learning-based prediction models in preterm birth was poor. There is an urgent need to improve the design, conduct, and reporting of such studies to support the clinical application of these models in preterm birth.
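The abstract reports TRIPOD adherence in two ways: per study, and per item across studies. A minimal sketch of both calculations follows. It assumes each item is scored "yes", "no" or "na" (not applicable) and that non-applicable items are excluded from the denominator. The review's exact scoring rules are not stated here, and the item IDs and ratings below are hypothetical.

```python
from statistics import median

# Hypothetical scoring scheme: item ID -> "yes" | "no" | "na".
Ratings = dict[str, str]


def tripod_adherence(ratings: Ratings) -> float:
    """Share of applicable TRIPOD items that a single study reports."""
    applicable = [r for r in ratings.values() if r != "na"]
    if not applicable:
        raise ValueError("study has no applicable items")
    return sum(r == "yes" for r in applicable) / len(applicable)


def item_adherence(studies: list[Ratings], item: str) -> float:
    """Share of studies (among those where the item applies) that report one item."""
    applicable = [s[item] for s in studies if s.get(item, "na") != "na"]
    if not applicable:
        raise ValueError(f"item {item!r} applies to no study")
    return sum(r == "yes" for r in applicable) / len(applicable)


# Three made-up studies scored on four illustrative items.
studies = [
    {"1": "yes", "2": "no", "3": "yes", "4": "na"},
    {"1": "yes", "2": "yes", "3": "yes", "4": "no"},
    {"1": "no", "2": "no", "3": "yes", "4": "no"},
]
per_study = [tripod_adherence(s) for s in studies]
print(f"range {min(per_study):.0%}-{max(per_study):.0%}, median {median(per_study):.0%}")
print(f"item 2 adherence: {item_adherence(studies, '2'):.0%}")
```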

Journal article

A critical appraisal of clinical practice guidelines on insomnia using the RIGHT statement and AGREE II instrument

Objective: Clinical practice guidelines (CPGs) play an indispensable role in guiding the selection of treatments for insomnia; however, little is known about the quality of published insomnia CPGs. This study aimed to critically appraise the quality of existing insomnia CPGs and identify their limitations. Methods: PubMed, Web of Science, Embase, China National Knowledge Infrastructure, Wanfang, China Biology Medicine disc, and 6 databases of international guideline development institutions were systematically searched. CPGs on the diagnosis or treatment of insomnia were included. Reviewers independently extracted basic information and development methods. They assessed methodological quality with the Appraisal of Guidelines for Research and Evaluation (AGREE) II instrument and reporting quality with the Reporting Items for Practice Guidelines in Healthcare (RIGHT) checklist. Intraclass correlation coefficients (ICCs) were used to assess inter-rater reliability. Results: Twenty-six CPGs were identified. They focused on adults, children, children with autism spectrum disorder, patients in the intensive care unit, patients with cancer, and pregnant, lactating, or menopausal women. Twenty-two CPGs used nine grading systems to rate the level of evidence and strength of recommendation. According to the AGREE II scores (ICC, 0.64 to 0.90), 53.85% of the CPGs were classified as "recommended with modification" and 2 CPGs as "recommended". The "clarity of presentation" domain achieved the highest mean score (67.9% ± 11.04%), and the "applicability" domain achieved the lowest (37.1% ± 12.67%). The average reporting rate of RIGHT items across all guidelines was 67.87%. Conclusions: The quality of the guidelines varied substantially. Guideline developers should pay more attention to guideline applicability and to patients' preferences and values. © 2022 Elsevier B.V. All rights reserved.
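Percentages such as the 67.9% for "clarity of presentation" are AGREE II scaled domain scores. The sketch below applies the standard AGREE II scaling: each appraiser rates each item from 1 to 7, and the summed score is rescaled between the minimum and maximum possible totals. The ratings in the example are hypothetical.

```python
def agree_scaled_domain_score(ratings: list[list[int]]) -> float:
    """Scaled AGREE II domain score in percent.

    ratings[a][i] is the 1-7 score that appraiser a gave to item i of one domain.
    """
    n_appraisers = len(ratings)
    if n_appraisers == 0:
        raise ValueError("need at least one appraiser")
    n_items = len(ratings[0])
    if n_items == 0 or any(len(row) != n_items for row in ratings):
        raise ValueError("every appraiser must score every item in the domain")
    if any(not 1 <= score <= 7 for row in ratings for score in row):
        raise ValueError("AGREE II items are scored on a 1-7 scale")
    obtained = sum(sum(row) for row in ratings)
    minimum = 1 * n_items * n_appraisers  # every item rated 1 by every appraiser
    maximum = 7 * n_items * n_appraisers  # every item rated 7 by every appraiser
    return 100 * (obtained - minimum) / (maximum - minimum)


# Hypothetical domain with 3 items scored by 4 appraisers.
example = [[5, 6, 6], [6, 6, 7], [2, 4, 3], [3, 3, 2]]
print(f"{agree_scaled_domain_score(example):.1f}%")
```

Scaling by the possible range, rather than by the maximum alone, means a domain where every item scores 1 comes out as 0% and not as about 14%.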

Journal article

An umbrella review of meta-analyses on diagnostic accuracy of C-reactive protein

Background: Multiple studies and meta-analyses have reported the diagnostic value of C-reactive protein (CRP) in several diseases. However, the precision of this evidence, and the potential influence of bias on the reported diagnostic values, may have implications for clinical practice. Methods: We performed an umbrella review of meta-analyses of diagnostic test accuracy studies of CRP across diseases by searching PubMed, Embase, China National Knowledge Infrastructure, and WanFang databases up to March 7, 2021. Five independent reviewers evaluated eligibility, extracted data, and assessed methodological quality. We descriptively analyzed the diagnostic accuracy of CRP for multiple diseases, heterogeneity between studies, and publication bias. Results: Seventy-four meta-analyses were included, covering 13 diseases classified according to the International Classification of Diseases-11 (ICD-11). The methodological quality of the included meta-analyses was mostly low: only 16 meta-analyses, covering seven ICD-11 diseases, were rated as moderate or high quality. CRP had relatively high diagnostic accuracy for two of these diseases. For postoperative infectious complications after bariatric surgery, sensitivity and specificity were 0.81 (0.34-1) and 0.91 (0.73-1), respectively. For anastomotic leakage after colorectal surgery, sensitivity and specificity were 0.95 (0.75-0.99) and 0.95 (0.75-0.99), respectively. Conclusions: The diagnostic accuracy of CRP for multiple diseases has been extensively studied; however, most of the included meta-analyses have low methodological quality. The evidence indicates that CRP has relatively high diagnostic accuracy for inflammatory and infectious diseases, especially for postoperative infectious complications after bariatric surgery and anastomotic leakage after colorectal surgery.
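A pair of sensitivity and specificity values can be translated into likelihood ratios and post-test probabilities using Bayes' theorem on the odds scale. The sketch below does this. It is illustrative only: the umbrella review did not report likelihood ratios, the sketch uses the quoted point estimates and ignores their confidence intervals, and the 10% pre-test probability is a hypothetical value.

```python
def positive_lr(sensitivity: float, specificity: float) -> float:
    """Factor by which a positive result multiplies the odds of disease (needs specificity < 1)."""
    return sensitivity / (1 - specificity)


def negative_lr(sensitivity: float, specificity: float) -> float:
    """Factor by which a negative result multiplies the odds of disease (needs specificity > 0)."""
    return (1 - sensitivity) / specificity


def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Bayes' theorem on the odds scale: post-test odds = pre-test odds x LR."""
    pre_odds = pretest_probability / (1 - pretest_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Point estimates quoted in the abstract; CIs are ignored in this sketch.
for label, sens, spec in [
    ("infection after bariatric surgery", 0.81, 0.91),
    ("anastomotic leak after colorectal surgery", 0.95, 0.95),
]:
    lr_pos = positive_lr(sens, spec)
    # Hypothetical 10% pre-test probability, chosen only for illustration.
    print(f"{label}: LR+ {lr_pos:.1f}, LR- {negative_lr(sens, spec):.2f}, "
          f"post-test probability if positive {post_test_probability(0.10, lr_pos):.0%}")
```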

Journal article

PFMT relevant strategies to prevent perineal trauma: a systematic review and network meta-analysis

Background: Most women experience perineal trauma during childbirth, whether from spontaneous tears or episiotomy. Objectives: To perform a systematic review and network meta-analysis of the effectiveness of different pelvic floor muscle training (PFMT)-related strategies in preventing perineal trauma. Search strategy: PubMed, Embase, the Cochrane Library, CINAHL, CNKI, CBM, Wanfang Database, and ClinicalTrials.gov were searched for citations published in any language from inception to 1 July 2021. Selection criteria: Randomized controlled trials (RCTs) of PFMT-related strategies for preventing perineal trauma during childbirth. Data collection and analysis: Data were independently extracted by two reviewers. Relative treatment effects were estimated using network meta-analysis (NMA). Main results: Of 12 632 citations retrieved, 21 RCTs were included. Compared with usual care, "PFMT combined with perineal massage" and PFMT alone were superior for the following outcomes:

- Intact perineum: RR = 5.37, 95% CI 3.79-7.60, moderate certainty; and RR = 2.58, 95% CI 1.34-4.97, moderate certainty, respectively.
- Episiotomy: RR = 0.26, 95% CI 0.14-0.49, very low certainty; and RR = 0.63, 95% CI 0.45-0.90, very low certainty, respectively.
- Obstetric anal sphincter injuries (OASIS): RR = 0.35, 95% CI 0.16-0.78, moderate certainty; and RR = 0.49, 95% CI 0.28-0.85, high certainty, respectively.

"PFMT combined with perineal massage" also reduced perineal tears (RR = 0.41, 95% CI 0.20-0.85, moderate certainty). Conclusions: These results suggest that antenatal "PFMT combined with perineal massage" and PFMT alone are effective strategies for preventing perineal trauma.
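Meta-analyses, including network meta-analyses, pool relative effects on the log scale with their standard errors. When only a ratio and its 95% CI are reported, the standard error can be recovered from the CI. The sketch below shows this generic step. It is not presented as the authors' NMA procedure. The example uses the intact-perineum estimate quoted above.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def log_rr_from_ci(rr: float, lower: float, upper: float) -> tuple[float, float]:
    """Return (ln RR, SE of ln RR) recovered from a ratio and its 95% CI.

    Ratio CIs are symmetric on the log scale, so SE = log-width / (2 * 1.96).
    """
    if not 0 < lower <= rr <= upper:
        raise ValueError("expected 0 < lower <= rr <= upper")
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    return math.log(rr), se


def two_sided_p(log_rr: float, se: float) -> float:
    """Wald test of H0: RR = 1; erfc keeps precision for very small p-values."""
    z = abs(log_rr) / se
    return math.erfc(z / math.sqrt(2))


# "PFMT combined with perineal massage" vs usual care, intact perineum.
y, se = log_rr_from_ci(5.37, 3.79, 7.60)
print(f"ln RR = {y:.3f}, SE = {se:.3f}, p = {two_sided_p(y, se):.2g}")
```

The midpoint of the log-scale CI should reproduce the point estimate up to rounding. This is a quick check that a reported interval is internally consistent.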

Journal article

Risk of abnormal pregnancy outcomes after using ondansetron during pregnancy: A systematic review and meta-analysis

Background: Hyperemesis gravidarum is a serious pregnancy complication that affects approximately 1% of pregnancies worldwide. Objective: To determine whether the use of ondansetron during pregnancy is associated with abnormal pregnancy outcomes. Search strategy: PubMed, the Cochrane Library, CINAHL, Embase, CNKI, CBM, WANFANG, and … were searched for citations published in any language from inception to 15 December 2021. Selection criteria: Any observational study was eligible. Data collection and analysis: Odds ratios (ORs) with 95% confidence intervals (CIs) were used to examine the association between ondansetron and abnormal pregnancy outcomes. Main results: Twenty articles from 1,558 citations were included.

Our preliminary analysis showed that, compared with the unexposed group, the use of ondansetron during pregnancy may be associated with an increased incidence of:

- cardiac defects (OR = 1.06, 95% CI: 1.01-1.10);
- neural tube defects (OR = 1.12, 95% CI: 1.05-1.18);
- chest cleft (OR = 1.21, 95% CI: 1.07-1.37).

Further sensitivity analysis showed no significant association between ondansetron and cardiac defects (OR = 1.15, 95% CI: 0.94-1.40) or neural tube defects (OR = 0.87, 95% CI: 0.46-1.66). When controversial studies were excluded, the association with chest defects disappeared. In addition, the use of ondansetron was associated with a reduced incidence of miscarriage (OR = 0.53, 95% CI: 0.31-0.89).

In our primary analysis, ondansetron was not associated with the following outcomes:

- orofacial clefts (OR = 1.09, 95% CI: 0.95-1.25);
- spinal limb defects (OR = 1.14, 95% CI: 0.89-1.46);
- urinary tract deformities (OR = 1.06, 95% CI: 0.97-1.15);
- any congenital malformation (OR = 1.03, 95% CI: 0.98-1.09);
- stillbirth (OR = 0.97, 95% CI: 0.83-1.15);
- preterm birth (OR = 1.22, 95% CI: 0.80-1.85);
- neonatal asphyxia (OR = 1.05, 95% CI: 0.72-1.54);
- neonatal development (OR = 1.18, 95% CI: 0.96-1.44).

Conclusion: In our analysis, the use of ondansetron during pregnancy was not associated with abnormal pregnancy outcomes. Our study did not find sufficient evidence of an association between ondansetron and adverse pregnancy outcomes. Future studies that account for the exposure period and dose of ondansetron, and that control for disease status, may help clarify its potential risks and benefits.
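The sensitivity analyses described above check whether a pooled OR depends on particular studies. A common way to do this is to re-pool after dropping each study in turn (leave-one-out). The sketch below uses simple inverse-variance fixed-effect pooling of log ORs as a stand-in. The abstract does not state which pooling model the review used, so the review may have used random effects. The study-level ORs are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)


def log_or_with_se(odds_ratio: float, lower: float, upper: float) -> tuple[float, float]:
    """Convert a reported OR and 95% CI to (ln OR, SE)."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * Z)


def pool_fixed(studies: list[tuple[float, float]]) -> tuple[float, float, float]:
    """Inverse-variance fixed-effect pooling of (ln OR, SE) pairs; returns OR and 95% CI."""
    weights = [1 / se**2 for _, se in studies]
    pooled = sum(w * y for w, (y, _) in zip(weights, studies)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (math.exp(pooled),
            math.exp(pooled - Z * pooled_se),
            math.exp(pooled + Z * pooled_se))


def leave_one_out(studies: list[tuple[float, float]]) -> list[tuple[float, float, float]]:
    """Re-pool after dropping each study in turn."""
    if len(studies) < 2:
        raise ValueError("need at least two studies")
    return [pool_fixed(studies[:i] + studies[i + 1:]) for i in range(len(studies))]


# Hypothetical study-level ORs with 95% CIs (not data from the review).
reported = [(1.02, 0.95, 1.10), (1.05, 0.97, 1.14), (1.60, 1.20, 2.13)]
studies = [log_or_with_se(*r) for r in reported]
print("all studies: OR %.2f (%.2f-%.2f)" % pool_fixed(studies))
for i, result in enumerate(leave_one_out(studies)):
    print("without study %d: OR %.2f (%.2f-%.2f)" % (i + 1, *result))
```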

Journal article

White rice consumption and risk of cardiometabolic and cancer outcomes: A systematic review and dose-response meta-analysis of prospective cohort studies

White rice is a staple food for more than half of the world's population. White rice intake can substantially increase dietary glycemic load and can have adverse health effects. However, the quality of the evidence implicating white rice in adverse health outcomes remains unclear. To evaluate the association between white rice consumption and the risk of cardiometabolic and cancer outcomes, we performed a systematic review and dose-response meta-analysis of the relevant publications. After a comprehensive search of four databases, twenty-three articles including 28 unique prospective cohorts with 1,527,198 participants proved eligible. For type 2 diabetes mellitus (T2DM), the pooled RR for the highest versus the lowest category of white rice intake was 1.18 (16 more per 1000 persons), with moderate-certainty evidence. In subgroup analysis, women had a higher risk (23 more per 1000 persons). Each additional 150 g of white rice per day was associated with a 6% greater risk of T2DM (5 more per 1000 persons), following a linear positive trend. We found no significant associations between white rice intake and the risk of cardiovascular disease (CVD), CVD mortality, cancer, or metabolic syndrome. In conclusion, moderate-certainty evidence showed that white rice intake was associated with T2DM risk, following a linear positive trend. However, low- to very-low-certainty evidence suggested no substantial associations between white rice intake and other cardiometabolic and cancer outcomes. More cohort studies are needed to strengthen the body of evidence.
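Two conversions underlie figures like "RR 1.18 (16 more per 1000 persons)" and "6% greater risk per 150 g/day":

- turning a relative risk into an absolute effect at some baseline risk;
- scaling a per-150 g relative risk to other intakes.

The sketch below assumes the log-linear dose-response model commonly used in such meta-analyses. The abstract does not report the baseline risk behind its "per 1000" figures, so the 9% baseline here is purely hypothetical.

```python
def extra_cases_per_1000(relative_risk: float, baseline_risk: float) -> float:
    """Absolute effect implied by a relative risk at a given baseline risk."""
    if not 0 <= baseline_risk <= 1:
        raise ValueError("baseline_risk must be a proportion")
    return 1000 * baseline_risk * (relative_risk - 1)


def rr_for_extra_intake(grams_per_day: float, rr_per_150g: float = 1.06) -> float:
    """Log-linear dose-response: the RR is multiplied by rr_per_150g for every 150 g/day."""
    return rr_per_150g ** (grams_per_day / 150)


BASELINE = 0.09  # hypothetical baseline risk of T2DM, for illustration only

print(f"highest vs lowest (RR 1.18): {extra_cases_per_1000(1.18, BASELINE):.0f} more per 1000")
for grams in (75, 150, 300):
    rr = rr_for_extra_intake(grams)
    print(f"+{grams} g/day: RR {rr:.3f}, {extra_cases_per_1000(rr, BASELINE):.0f} more per 1000")
```

With this assumed 9% baseline, the sketch gives about 16 and 5 more cases per 1000, close to the abstract's figures. However, the baseline is an assumption and not a reported value.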

Journal article

Quality Assessment of Cancer Pain Clinical Practice Guidelines

Introduction: Several clinical practice guidelines (CPGs) for cancer pain have been published; however, their quality has not yet been evaluated. The purpose of this study was to evaluate the quality of CPGs for cancer pain and to identify knowledge gaps. Methods: We systematically searched seven databases and 12 websites from their inception to July 20, 2021, to identify CPGs related to cancer pain. We used the validated Appraisal of Guidelines for Research and Evaluation Instrument II (AGREE II) and the Reporting Items for Practice Guidelines in Healthcare (RIGHT) checklist to assess the methodological and reporting quality of eligible CPGs. Overall agreement among reviewers was calculated with the intraclass correlation coefficient (ICC). The development methods of the CPGs, the strength of recommendations, and the levels of evidence were also determined. Results: Eighteen CPGs published from 1996 to 2021 were included. Overall reviewer consistency in each domain was acceptable (ICC, 0.76 to 0.95). According to the AGREE II assessment, only four CPGs were rated as recommended without modifications. For reporting quality, the average reporting rate across all seven RIGHT domains was 57.46%. It was highest in domain 3 (evidence, 68.89%) and lowest in domain 5 (review and quality assurance, 33.3%). Conclusion: The methodological quality of cancer pain CPGs varied widely, and the complete reporting rate in some areas was very low. Researchers need to make greater efforts to provide high-quality guidelines in this field to support clinical decision-making.
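The abstract reports ICCs for reviewer agreement but does not say which ICC form was used. The sketch below assumes ICC(2,1): two-way random effects, absolute agreement, single rater. It computes the statistic from the ANOVA mean squares. The example table is the classic 6-target by 4-rater dataset from Shrout and Fleiss (1979), for which ICC(2,1) is about 0.29.

```python
def icc_2_1(scores: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the rating that rater j gave to target i (e.g. one guideline).
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    return float("nan") if denominator == 0 else (ms_rows - ms_error) / denominator


example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(example), 2))  # prints 0.29
```

A consistency form such as ICC(3,1) would ignore systematic differences between raters. The absolute-agreement form used here penalizes them.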

Journal article

The use of GRADE approach in Cochrane reviews of TCM was insufficient: a cross-sectional survey

Objective: To conduct a cross-sectional survey of the application of the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach in Cochrane systematic reviews (CSRs) of traditional Chinese medicine (TCM). Study Design and Setting: We searched the Cochrane Library database for CSRs of TCM from inception to December 2020. General characteristics and details of GRADE use were extracted. Results: Among 226 CSRs of TCM, 86 (38.05%), involving 711 outcomes, used GRADE to rate the certainty of evidence. Topics mainly focused on genitourinary diseases (17.44%), diseases of the musculoskeletal system or connective tissue (11.63%), and diseases of the nervous system (10.47%). Only 15.89% of the outcomes had high or moderate certainty of evidence. Acupuncture was the most common intervention. Evidence certainty did not differ significantly (P > 0.05) between acupuncture and non-acupuncture interventions, between TCM alone and integrated Chinese and Western medicine, or between Chinese patent medicines and non-Chinese patent medicines. Of 1,273 instances of downgrading, 44.62% were due to risk of bias and 40.14% to imprecision. Conclusion: Overall, the GRADE approach is not widely used in CSRs of TCM. The certainty of evidence is generally low to very low, mainly because of serious risk of bias and imprecision. © 2021 Elsevier Inc. All rights reserved.
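The sketch below gives a rough picture of how GRADE levels and downgrading tallies like these arise. It follows the usual GRADE convention: randomized evidence starts at "high" and observational evidence at "low". A serious concern in any of the five domains lowers the rating by one level, and a very serious concern lowers it by two. In practice GRADE ratings are structured judgements and not a mechanical sum, so this is only a simplification. The function and domain names are hypothetical.

```python
from collections import Counter

LEVELS = {4: "high", 3: "moderate", 2: "low", 1: "very low"}
DOWNGRADE_DOMAINS = {"risk_of_bias", "inconsistency", "indirectness",
                     "imprecision", "publication_bias"}


def grade_certainty(randomized: bool, downgrades: dict[str, int], upgrades: int = 0) -> str:
    """Simplified GRADE-style rating for one outcome.

    downgrades maps a domain to 1 (serious) or 2 (very serious) levels.
    upgrades (large effect, dose-response, plausible confounding) are normally
    considered only for observational evidence.
    """
    for domain, levels in downgrades.items():
        if domain not in DOWNGRADE_DOMAINS or levels not in (0, 1, 2):
            raise ValueError(f"invalid downgrade {domain}={levels}")
    if upgrades < 0:
        raise ValueError("upgrades must be non-negative")
    score = (4 if randomized else 2) - sum(downgrades.values()) + upgrades
    return LEVELS[min(4, max(1, score))]


def downgrade_shares(reasons: list[str]) -> dict[str, float]:
    """Percentage of all downgrading instances attributed to each domain."""
    counts = Counter(reasons)
    total = sum(counts.values())
    return {domain: 100 * n / total for domain, n in counts.items()}


print(grade_certainty(True, {"risk_of_bias": 1, "imprecision": 1}))  # -> low
print(grade_certainty(False, {"risk_of_bias": 1}))                   # -> very low
print(downgrade_shares(["risk_of_bias", "risk_of_bias", "imprecision", "inconsistency"]))
```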

Journal article

Efficacy and Safety of Qingfei Paidu Decoction for Treating COVID-19: A Systematic Review and Meta-Analysis

Background: Qingfei Paidu decoction (QFPD) has been widely used to treat COVID-19 in China. However, comprehensive and systematic evidence on the effectiveness and safety of QFPD is still lacking. This study aimed to evaluate the efficacy and safety of QFPD in patients with COVID-19. Methods: We searched seven databases up to 5 March 2021. Two reviewers independently screened studies, extracted data of interest, and assessed risk of bias. The risk of bias tools were matched to study design:

- randomized controlled trials: the Cochrane risk of bias tool;
- cohort studies and non-randomized trials: the Newcastle-Ottawa scale;
- pre-post studies without a control group: the "Quality Assessment Tool for Before-After (Pre-Post) Studies With No Control Group".

We used the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) approach to assess the certainty of evidence. We carried out random-effects meta-analyses using RevMan 5.3. For outcomes that could not be meta-analyzed, we performed a descriptive analysis. Results: We identified 16 studies with 11,237 patients: one RCT, six non-randomized trials, two cohort studies, and seven pre-post studies. The certainty of evidence was low to very low because of the observational study designs. QFPD combined with conventional treatment might:

- shorten the time to nucleic acid conversion (MD = -4.78 days, 95% CI: -5.79 to -3.77);
- shorten the length of hospital stay (MD = -7.95 days, 95% CI: -14.66 to -1.24);
- shorten the time to recovery from fever (MD = -1.51 days, 95% CI: -1.92 to -1.09), cough (MD = -1.64 days, 95% CI: -1.91 to -1.36), and chest CT abnormalities (MD = -2.23 days, 95% CI: -2.46 to -2.00);
- improve overall traditional Chinese medicine symptom scores (MD = 41.58 points, 95% CI: 32.67 to 50.49);
- change laboratory indices such as white blood cell count (WBC), aspartate aminotransferase (AST), and C-reactive protein (CRP).

Conclusion: QFPD combined with conventional treatment might be effective for patients with COVID-19. No serious adverse reactions related to QFPD were observed. Further high-quality studies are needed.
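The random-effects pooling of mean differences mentioned in the Methods can be sketched with the DerSimonian-Laird estimator, which is the random-effects method implemented in RevMan 5. This is a minimal re-implementation for illustration, not RevMan's code. The three studies in the example are hypothetical.

```python
import math
from typing import NamedTuple


class RandomEffectsResult(NamedTuple):
    estimate: float  # pooled mean difference
    se: float        # standard error of the pooled estimate
    tau2: float      # between-study variance (DerSimonian-Laird)
    i2: float        # I^2 heterogeneity, as a proportion


def dersimonian_laird(effects: list[float], ses: list[float]) -> RandomEffectsResult:
    """Random-effects pooling of study-level mean differences (days, points, ...)."""
    if len(effects) != len(ses) or len(effects) < 2:
        raise ValueError("need at least two studies with matching SEs")
    w = [1 / s**2 for s in ses]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_star = [1 / (s**2 + tau2) for s in ses]  # random-effects weights
    estimate = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return RandomEffectsResult(estimate, se, tau2, i2)


# Hypothetical length-of-stay differences (days) from three studies.
result = dersimonian_laird([-9.0, -4.0, -11.0], [2.0, 1.5, 3.0])
lo = result.estimate - 1.96 * result.se
hi = result.estimate + 1.96 * result.se
print(f"MD {result.estimate:.2f} days (95% CI {lo:.2f} to {hi:.2f}), "
      f"tau^2 = {result.tau2:.2f}, I^2 = {result.i2:.0%}")
```

When tau² is 0, the random-effects weights reduce to the fixed-effect weights, so the two models agree for homogeneous studies.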

Journal article

Barriers and facilitators to implementation of evidence-based task-sharing mental health interventions in low- and middle-income countries: A systematic review using implementation science frameworks

BACKGROUND: Task-sharing is a promising strategy to expand mental healthcare in low-resource settings, especially in low- and middle-income countries (LMICs). Research on how best to implement task-sharing mental health interventions, however, is hampered by an incomplete understanding of the barriers and facilitators to their implementation. This review aims to use an implementation science lens to systematically identify barriers and facilitators to implementing evidence-based task-sharing mental health interventions. It organizes these factors within a novel, integrated implementation science framework. METHODS: PubMed, PsycINFO, CINAHL, and Embase were searched for English-language, peer-reviewed studies using search terms in three categories: "mental health," "task-sharing," and "LMIC." Articles were included if they:

- focused on mental disorders as the main outcome(s);
- included a task-sharing intervention using or based on an evidence-based practice;
- were implemented in an LMIC setting;
- included an assessment or data-supported analysis of barriers and facilitators.

An initial conceptual model and coding framework was derived from the Consolidated Framework for Implementation Research and the Theoretical Domains Framework. It was iteratively refined into an integrated conceptual framework, the Barriers and Facilitators in Implementation of Task-Sharing Mental Health Interventions (BeFITS-MH), which specifies 37 constructs across eight domains:

- (I) client characteristics;
- (II) provider characteristics;
- (III) family and community factors;
- (IV) organizational characteristics;
- (V) societal factors;
- (VI) mental health system factors;
- (VII) intervention characteristics;
- (VIII) stigma.

RESULTS: Of the 26,935 articles screened by title and abstract, 192 underwent full-text review, yielding 37 articles representing 28 unique intervention studies that met the inclusion criteria. The most prevalent facilitators occurred in domains that are more amenable to adaptation (i.e., the intervention and provider characteristics domains). Salient barriers, by contrast, occurred in domains that are more difficult to modulate or intervene on. These include constructs in the client characteristics domain and at the broader societal and structural levels of influence (i.e., the organizational and mental health system domains). Other notable trends include constructs in the family and community domain acting as barriers and as facilitators roughly equally often, and stigma constructs acting exclusively as barriers. CONCLUSIONS: Using the BeFITS-MH model, which we developed from implementation science frameworks, this systematic review provides a comprehensive identification and organization of barriers and facilitators to evidence-based task-sharing mental health interventions in LMICs. These findings have important implications for ongoing and future implementation of this critically needed intervention strategy, including:

- the promise of leveraging task-sharing intervention characteristics as sites of continued innovation;
- the importance of, yet relative lack of engagement with, constructs in macro-level domains (e.g., organizational characteristics, stigma);
- the need to further delineate strategies that researchers and implementers can employ to enhance the implementation of task-sharing mental health interventions within and across levels.
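The review codes each reported factor to a BeFITS-MH domain and classifies it as a barrier or a facilitator. A minimal data-structure sketch of that tally follows. Its eight domains are the ones named in the abstract. The abstract does not list the 37 constructs, so this sketch codes only at the domain level, and the coded excerpts are hypothetical.

```python
from collections import Counter
from enum import Enum


class Domain(Enum):
    """The eight BeFITS-MH domains named in the abstract."""
    CLIENT = "I. Client characteristics"
    PROVIDER = "II. Provider characteristics"
    FAMILY_COMMUNITY = "III. Family and community factors"
    ORGANIZATIONAL = "IV. Organizational characteristics"
    SOCIETAL = "V. Societal factors"
    MH_SYSTEM = "VI. Mental health system factors"
    INTERVENTION = "VII. Intervention characteristics"
    STIGMA = "VIII. Stigma"


ROLES = ("barrier", "facilitator")


def tally(coded_factors: list[tuple[Domain, str]]) -> dict[Domain, Counter]:
    """Count how often factors in each domain were coded as barriers or facilitators."""
    counts = {domain: Counter() for domain in Domain}
    for domain, role in coded_factors:
        if role not in ROLES:
            raise ValueError(f"unknown role {role!r}")
        counts[domain][role] += 1
    return counts


# Hypothetical coded excerpts, not data from the review.
coded = [
    (Domain.INTERVENTION, "facilitator"),
    (Domain.PROVIDER, "facilitator"),
    (Domain.STIGMA, "barrier"),
    (Domain.FAMILY_COMMUNITY, "barrier"),
    (Domain.FAMILY_COMMUNITY, "facilitator"),
]
for domain, c in tally(coded).items():
    if c:  # skip domains with no coded factors
        print(f"{domain.value}: {c['barrier']} barrier(s), {c['facilitator']} facilitator(s)")
```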
