Criteria2Query: a natural language interface to clinical databases for cohort definition

Cited by: 69

Authors:

Yuan, Chi [1,2]

Ryan, Patrick B. [1,3]

Ta, Casey [1]

Guo, Yixuan [1]

Li, Ziran [1]

Hardin, Jill [3]

Makadia, Rupa [3]

Jin, Peng [1]

Shang, Ning [1]

Kang, Tian [1]

Weng, Chunhua [1]

Affiliations:

[1] Columbia Univ, Dept Biomed Informat, 622 West 168th St, PH-20, Room 407, New York, NY 10032 USA

[2] Nanjing Univ Sci & Technol, Dept Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China

[3] Janssen Res & Dev, Epidemiol Analyt, Titusville, NJ USA

Source:

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION | 2019 / Volume 26 / Issue 04

Keywords:

cohort definition; natural language processing; natural language interfaces to database; common data model; ELIGIBILITY CRITERIA; REPRESENTATION; EXTRACTION; SYSTEM;

DOI:

10.1093/jamia/ocy178

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Objective: Cohort definition is a bottleneck for conducting clinical research and depends on subjective decisions by domain experts. Data-driven cohort definition is appealing but requires substantial knowledge of terminologies and clinical data models. Criteria2Query is a natural language interface that facilitates human-computer collaboration for cohort definition and execution using clinical databases.

Materials and Methods: Criteria2Query uses a hybrid information extraction pipeline combining machine learning and rule-based methods to systematically parse eligibility criteria text, transforms it first into a structured criteria representation and next into sharable and executable clinical data queries represented as SQL queries conforming to the OMOP Common Data Model. Users can interactively review, refine, and execute queries in the ATLAS web application. To test effectiveness, we evaluated 125 criteria across different disease domains from ClinicalTrials.gov and 52 user-entered criteria. We evaluated F1 score and accuracy against 2 domain experts and calculated the average computation time for fully automated query formulation. We conducted an anonymous survey evaluating usability.

Results: Criteria2Query achieved 0.795 and 0.805 F1 score for entity recognition and relation extraction, respectively. Accuracies for negation detection, logic detection, entity normalization, and attribute normalization were 0.984, 0.864, 0.514 and 0.793, respectively. Fully automatic query formulation took 1.22 seconds/criterion. More than 80% (11+ of 13) of users would use Criteria2Query in their future cohort definition tasks.

Conclusions: We contribute a novel natural language interface to clinical databases. It is open source and supports fully automated and interactive modes for autonomous data-driven cohort definition by researchers with minimal human effort. We demonstrate its promising user friendliness and usability.
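As a rough illustration of the last pipeline stage described in the abstract (structured criteria turned into SQL over the OMOP Common Data Model), the following Python sketch may help. It is not the authors' implementation: the `Criterion` class, the `criterion_to_sql` and `cohort_sql` helpers, and the concept IDs are hypothetical. Only the table and column names (`person`, `condition_occurrence`, `drug_exposure`, `measurement`) are taken from the OMOP CDM, and negation is assumed to map to a simple exclusion.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Criterion:
    """Hypothetical structured form of one parsed eligibility criterion."""
    domain: str       # entity domain, e.g. "Condition" or "Drug"
    concept_id: int   # normalized concept ID (placeholder values in the demo)
    negated: bool     # True when the criterion text was negated


# OMOP CDM event table and concept column for each supported domain.
DOMAIN_TABLES = {
    "Condition": ("condition_occurrence", "condition_concept_id"),
    "Drug": ("drug_exposure", "drug_concept_id"),
    "Measurement": ("measurement", "measurement_concept_id"),
}


def criterion_to_sql(c: Criterion) -> str:
    """Render one criterion as a predicate on person_id."""
    table, column = DOMAIN_TABLES[c.domain]
    op = "NOT IN" if c.negated else "IN"
    # int() guards against non-numeric input reaching the SQL string.
    return (
        f"person_id {op} "
        f"(SELECT person_id FROM {table} WHERE {column} = {int(c.concept_id)})"
    )


def cohort_sql(criteria: List[Criterion]) -> str:
    """Combine all criteria with AND into a single cohort query."""
    where = " AND ".join(criterion_to_sql(c) for c in criteria) or "1 = 1"
    return f"SELECT person_id FROM person WHERE {where}"


if __name__ == "__main__":
    # Placeholder concept IDs, not real vocabulary lookups.
    demo = [
        Criterion("Condition", 12345, negated=False),
        Criterion("Drug", 67890, negated=True),
    ]
    print(cohort_sql(demo))
```

In this sketch every criterion is joined with AND; the logic detection step mentioned in the abstract would be what decides between AND and OR in a fuller version.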


Pages: 294-305

Page count: 12
