Latent Code Identification (LACOID): A Machine Learning-Based Integrative Framework [and Open-Source Software] to Classify Big Textual Data, Rebuild Contextualized/Unaltered Meanings, and Avoid Aggregation Bias

Canche, MSG (corresponding author), Univ Penn, 208 South 37th St, Room 207, Philadelphia, PA 19104 USA.
2023-12
Labeling or classifying textual data and qualitative evidence is an expensive and consequential challenge. The rigor and consistency with which these labels are constructed ultimately shape research findings and conclusions. Addressing this challenge poses a multifaceted methodological conundrum. On the one hand, classification requires human reasoning, which leads to deeper and more nuanced understandings; on the other hand, manual human classification comes with a well-documented increase in classification inconsistencies and errors, particularly when dealing with vast numbers of documents and teams of coders. An alternative to human coding consists of machine learning-assisted techniques. These data science and visualization techniques offer cost-effective and consistent tools for data classification, but they are prone to losing participants' meanings or voices for two main reasons: (a) these classifications typically aggregate all the text in each input file (e.g., each interview transcript) into a single topic or code, and (b) the words that make up these texts are analyzed outside of their original contexts. To address this challenge and analytic conundrum, we present an analytic framework and software tool that address the following question: How can we classify vast amounts of qualitative evidence effectively and efficiently, without losing context or the original voices of our research participants, while leveraging the nuances that human reasoning brings to the qualitative and mixed methods analytic tables? This framework mirrors the line-by-line coding employed in human/manual code identification but relies on machine learning to classify texts in minutes rather than months. The resulting outputs provide complete transparency of the classification process and help recreate the contextualized, original, and unaltered meanings embedded in the input documents, as provided by our participants.
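To make the aggregation problem in point (a) concrete, here is a minimal Python sketch under loudly stated assumptions. It is not LACOID's machine learning method: it swaps in a toy keyword-overlap classifier, and the codebook, the example transcript, and the names `CODEBOOK`, `tokenize`, `classify`, `classify_document`, and `classify_lines` are all hypothetical. It only contrasts assigning one code to a whole transcript with coding each line separately while keeping that line's original wording next to its label.

```python
import re

# Hypothetical toy codebook for illustration only; NOT LACOID's codes or model.
CODEBOOK = {
    "finances": {"tuition", "loan", "debt", "money", "afford"},
    "belonging": {"friends", "welcome", "community", "alone", "belong"},
}


def tokenize(text):
    # Lowercase and keep alphabetic tokens so punctuation does not block matches
    return re.findall(r"[a-z']+", text.lower())


def classify(text):
    """Return the code whose keywords occur most often in text.

    Ties go to the code listed first in CODEBOOK; text with no keyword hits
    is labeled "uncoded".
    """
    words = tokenize(text)
    scores = {code: sum(w in kws for w in words) for code, kws in CODEBOOK.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncoded"


def classify_document(transcript):
    # Aggregated approach: the entire input file receives a single code
    return classify(transcript)


def classify_lines(transcript):
    # Line-by-line approach: each non-empty line keeps its own code and original wording
    return [(line, classify(line)) for line in transcript.splitlines() if line.strip()]


# Hypothetical interview excerpt (invented for illustration, not study data)
TRANSCRIPT = """I worry about tuition and the loan debt every month.
Paying rent takes all my money.
But my friends made me feel welcome here."""

if __name__ == "__main__":
    print("Document-level code:", classify_document(TRANSCRIPT))
    for line, code in classify_lines(TRANSCRIPT):
        print(f"{code:>10} | {line}")
```

In this toy case the whole-document call returns only "finances", while line-level coding also surfaces a "belonging" passage and keeps each passage's wording beside its label, which is the kind of loss the abstract attributes to aggregating an entire file into a single code.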
We offer access to the database (Gonzalez Canche, 2022e) and the software required (Gonzalez Canche, 2022a; available for Mac and Windows) to replicate the analyses. We hope that this opportunity to become familiar with the analytic framework and software may expand access to data science tools for analyzing qualitative evidence (see also Gonzalez Canche, 2022b, 2022c, 2022d, for related no-code data science applications to classify and analyze qualitative and textual data dynamically).
INTERNATIONAL JOURNAL OF QUALITATIVE METHODS
Volume: 22
ISSN: 1609-4069 | Indexed in: SSCI
Language
English
Source institution
University of Pennsylvania
Funding information
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research received financial support from the Spencer Foundation, the National Academy of Education, and SAGE OCEAN; Concept Grant.
Times cited (WOS)
0
Times cited (other)
0
Usage count (last 180 days)
3
Usage count (since 2013)
3
Publication year
2023-12
DOI
10.1177/16094069221144940
WOS subject category
Social Sciences, Interdisciplinary
Research area
Evidence-Based Social Science (General)
Keywords
methods in qualitative inquiry; mixed methods; secondary data analysis; dimensional analysis; discourse analysis; narrative analysis; performance based methods; philosophy of science; qualitative evaluation; qualitative meta-analysis/synthesis
Funding agencies
Spencer Foundation; National Academy of Education; SAGE OCEAN