Pancreatic Cysts Identification Using Unstructured Information Management Architecture

Date
2013-04-05
Language
American English
Found At
Office of the Vice Chancellor for Research
Abstract

Pancreatic cancer is one of the deadliest cancers and is mostly diagnosed at late stages. Patients with pancreatic cysts are at higher risk of developing cancer, and surveillance of these patients can help with early diagnosis. Much of the information about pancreatic cysts is recorded as free text in various medical narratives. In this retrospective study, a corpus of 1064 records from 44 patients at Indiana University Hospital from 1990 to 2012 was collected. A natural language processing system was developed and used to identify patients with pancreatic cysts. The input goes through a series of tasks within the Unstructured Information Management Architecture (UIMA) framework: report separation, metadata detection, sentence detection, concept annotation, and writing to the database. Metadata such as the medical record number (MRN), report ID, report name, report date, and report body were extracted from each report. Sentences were detected, and concepts within each sentence were extracted using regular expressions. A regular expression is a pattern of characters that matches specific strings of text. Our medical team assembled the concepts used to identify pancreatic cysts in medical reports, and additional keywords were added by searching the literature and the Unified Medical Language System (UMLS) knowledge base. The NegEx algorithm was used to determine the negation status of concepts. The 1064 reports were divided into training and test sets. Two pancreatic-cyst surgeons created the gold standard data (inter-annotator agreement K = 88%). The training set was analyzed to refine the regular expressions. Concept identification using the NegEx algorithm resulted in precision and recall of 98.9% and 89%, respectively. To improve the performance of negation detection, the Stanford Dependency Parser (SDP) was used. SDP determines how the words in a sentence are related to each other. The SDP-based negation algorithm improved the recall to 95.7%.
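
The sentence detection, regular-expression concept annotation, and negation steps described above can be illustrated with a minimal Python sketch. This is not the authors' UIMA implementation: the concept patterns, the short list of negation triggers, the token window, and the function names are all hypothetical, chosen only to show the general shape of a NegEx-style check in which a trigger phrase appearing shortly before a concept marks it as negated.

```python
import re

# Hypothetical concept patterns; the study's actual keyword list is not given.
CONCEPT_RE = re.compile(
    r"\bpancreatic\s+(?:cysts?|pseudocysts?)\b|\bcystic\s+lesions?\b",
    re.IGNORECASE,
)

# Illustrative subset of pre-concept negation triggers, in the spirit of NegEx.
NEGATION_RE = re.compile(
    r"\b(?:no evidence of|negative for|without|denies|no)\b",
    re.IGNORECASE,
)

WINDOW = 5  # assumed maximum number of tokens between trigger and concept


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def annotate(text):
    """Return (sentence, concept, negated) for every concept mention."""
    results = []
    for sentence in split_sentences(text):
        for match in CONCEPT_RE.finditer(sentence):
            before = sentence[:match.start()]
            negated = False
            for trigger in NEGATION_RE.finditer(before):
                # Count the tokens separating the trigger from the concept.
                tokens_between = before[trigger.end():].split()
                if len(tokens_between) <= WINDOW:
                    negated = True
                    break
            results.append((sentence, match.group(0), negated))
    return results


if __name__ == "__main__":
    report = (
        "CT shows a pancreatic cyst in the tail. "
        "No evidence of cystic lesion in the liver."
    )
    for sentence, concept, negated in annotate(report):
        print(f"{concept!r}: negated={negated}")
```

A fixed token window like this is a crude way to scope negation, which may be one reason a dependency-based approach such as the SDP step in the abstract can recover concepts that a window-based rule wrongly marks as negated.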

Description
poster abstract
Cite As
Mehrabi, Saeed, C. Max Schmidt, Joshua A. Waters, Chris Beesley, Anand Krishnan, Joe Kesterson, Paul Dexter, Mohammed A. A. Al-Haddad, and Mathew Palakal. (2013, April 5). Pancreatic Cysts Identification Using Unstructured Information Management Architecture. Poster session presented at IUPUI Research Day 2013, Indianapolis, Indiana.
Type
Poster