Prioritizing Hospital Incident Reports Through Text Classification
General Material Designation
[Thesis]
First Statement of Responsibility
Flood, Max
Subsequent Statement of Responsibility
Madathil, Sreenath C.
PUBLICATION, DISTRIBUTION, ETC
Name of Publisher, Distributor, etc.
State University of New York at Binghamton
Date of Publication, Distribution, etc.
2019
GENERAL NOTES
Text of Note
99 p.
DISSERTATION (THESIS) NOTE
Dissertation or thesis details and type of degree
M.S.
Body granting the degree
State University of New York at Binghamton
Text preceding or following the note
2019
SUMMARY OR ABSTRACT
Text of Note
This research explores the potential of natural language processing and binary text classification to identify and prioritize medical incident reports. The study addresses how to process incident reports efficiently while detecting cases that offer potential for patient safety initiatives. Over 30,000 incident reports were used to build a prediction model that classifies each incident as either requiring a formal investigation or not. Because the data were heavily imbalanced, the model combined 10-fold cross-validation, a TF-IDF bag-of-words representation, chi-squared feature selection, random undersampling, and multinomial naïve Bayes to avoid overfitting. The model was optimized for recall rather than precision or accuracy. Classification results were: accuracy = 69%, precision = 65%, recall = 84%, ROC AUC = 78%. The results show that text classification can be applied to medical incident reports using basic techniques.
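The pipeline named in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the thesis's own code. The function names `undersample` and `cross_validate` are hypothetical. The values `k_features=1000` and `seed=0`, and the ordering of the steps (undersample the training fold, then fit TF-IDF, then chi-squared selection, then naïve Bayes), are also assumptions. The abstract does not state these details. Label 1 is assumed to mean "requires a formal investigation".

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import MultinomialNB


def undersample(idx, y, rng):
    """Randomly keep n_min rows of every class (n_min = minority-class count).

    Hypothetical helper: plain random undersampling on row indices.
    """
    labels = y[idx]
    classes, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()
    kept = [rng.choice(idx[labels == c], size=n_min, replace=False) for c in classes]
    return np.concatenate(kept)


def cross_validate(texts, labels, k_features=1000, n_splits=10, seed=0):
    """Return mean accuracy, precision, recall and ROC AUC over stratified folds.

    Assumption: label 1 = "requires a formal investigation" (the positive class).
    """
    texts = np.asarray(texts, dtype=object)
    y = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {"accuracy": [], "precision": [], "recall": [], "roc_auc": []}

    for train_idx, test_idx in folds.split(texts, y):
        # Balance only the training fold; the test fold keeps the real class ratio.
        train_idx = undersample(train_idx, y, rng)

        # TF-IDF bag of words, fit on the (balanced) training fold only.
        vectorizer = TfidfVectorizer()
        X_train = vectorizer.fit_transform(texts[train_idx])
        X_test = vectorizer.transform(texts[test_idx])

        # Keep the k terms with the highest chi-squared score against the label.
        selector = SelectKBest(chi2, k=min(k_features, X_train.shape[1]))
        X_train = selector.fit_transform(X_train, y[train_idx])
        X_test = selector.transform(X_test)

        clf = MultinomialNB().fit(X_train, y[train_idx])
        pred = clf.predict(X_test)
        prob = clf.predict_proba(X_test)[:, 1]  # probability of class 1

        scores["accuracy"].append(accuracy_score(y[test_idx], pred))
        scores["precision"].append(precision_score(y[test_idx], pred, zero_division=0))
        scores["recall"].append(recall_score(y[test_idx], pred, zero_division=0))
        scores["roc_auc"].append(roc_auc_score(y[test_idx], prob))

    return {name: float(np.mean(vals)) for name, vals in scores.items()}
```

The sketch would be called as `cross_validate(report_texts, labels)`. Undersampling, TF-IDF and feature selection are all fit inside each training fold so that no information from the test fold leaks into the model. Training on balanced folds generally pushes a classifier toward predicting the minority class more often. This is one way to trade precision for recall, in line with the recall-oriented goal described above.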