Toward better public health reporting using existing off the shelf approaches: The value of medical dictionaries in automated cancer detection using plaintext medical data

Date
2017-05
Language
English
Found At
Elsevier
Abstract

Objectives

Existing approaches to derive decision models from plaintext clinical data frequently depend on medical dictionaries as the sources of potential features. Prior research suggests that decision models developed using non-dictionary based feature sourcing approaches and “off the shelf” tools could predict cancer with performance metrics between 80% and 90%. We sought to compare non-dictionary based models to models built using features derived from medical dictionaries.
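The contrast between the two feature sourcing approaches can be illustrated with a minimal sketch. This is not the authors' pipeline: the term list `CANCER_TERMS`, the helper names, and the frequency-based selection are all hypothetical stand-ins chosen for illustration. A real dictionary-based approach would draw its terms from an actual medical vocabulary.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; a real study would
# source candidate terms from an established medical vocabulary.
CANCER_TERMS = {"adenocarcinoma", "carcinoma", "malignant", "metastatic"}


def tokenize(report):
    """Lowercase a report and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", report.lower())


def dictionary_features(report, terms=CANCER_TERMS):
    """Dictionary-based sourcing: count only tokens found in the term list."""
    counts = Counter(tokenize(report))
    return {term: counts[term] for term in sorted(terms) if counts[term] > 0}


def corpus_features(reports, top_k=5):
    """Non-dictionary sourcing: take the top_k most frequent corpus tokens.

    Ties are broken alphabetically so the result is deterministic.
    """
    totals = Counter()
    for report in reports:
        totals.update(tokenize(report))
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [token for token, _ in ranked[:top_k]]


reports = [
    "Invasive ductal carcinoma, malignant cells present.",
    "Benign tissue, no malignant cells seen.",
]
dict_feats = dictionary_features(reports[0])
corpus_feats = corpus_features(reports, top_k=2)
```

In the dictionary-based case the candidate features are fixed in advance; in the non-dictionary case they come from the reports themselves and are then narrowed by a selection step (raw frequency here, purely as a placeholder).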

Materials and methods

We evaluated the detection of cancer cases from free-text pathology reports using decision models built with combinations of dictionary-based or non-dictionary-based feature sourcing approaches, 4 feature subset sizes, and 5 classification algorithms. Each decision model was evaluated using the following performance metrics: sensitivity, specificity, accuracy, positive predictive value, and area under the receiver operating characteristic (ROC) curve.
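For readers unfamiliar with these metrics, the sketch below shows one way they could be computed for a binary cancer/non-cancer classifier. It is an illustrative example using standard definitions, not the evaluation code used in the study; the function names and the toy labels and scores are assumptions.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, accuracy, and PPV from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "accuracy": _ratio(tp + tn, len(pairs)),
        "ppv": _ratio(tp, tp + fp),  # positive predictive value
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return _ratio(wins, len(pos) * len(neg))


# Toy data (hypothetical): 1 = cancer case, 0 = non-case.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a small illustration; a rank-based computation would be the usual choice for larger datasets.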

Results

Decision models parameterized using dictionary and non-dictionary feature sourcing approaches produced performance metrics between 70% and 90%. The source of features and feature subset size had no impact on the performance of a decision model.

Conclusion

Our study suggests there is little value in leveraging medical dictionaries for extracting features for decision model building. Decision models built using features extracted from the plaintext reports themselves achieve comparable results to those built using medical dictionaries. Overall, this suggests that existing “off the shelf” approaches can be leveraged to perform accurate cancer detection using less complex Named Entity Recognition (NER) based feature extraction, automated feature selection and modeling approaches.

Cite As
Kasthurirathne, S. N., Dixon, B. E., Gichoya, J., Xu, H., Xia, Y., Mamlin, B., & Grannis, S. J. (2017). Toward better public health reporting using existing off the shelf approaches: The value of medical dictionaries in automated cancer detection using plaintext medical data. Journal of Biomedical Informatics. https://doi.org/10.1016/j.jbi.2017.04.008
Journal
Journal of Biomedical Informatics
Rights
Publisher Policy
Type
Article
Version
Author's manuscript