Patients Harm Classification and Prediction Through Natural Language Processing in a Cancer Medical Center
General Material Designation
[Thesis]
First Statement of Responsibility
Salazar Reyna, Roberto Jesus
Subsequent Statement of Responsibility
Khasawneh, Mohammad T.
PUBLICATION, DISTRIBUTION, ETC.
Name of Publisher, Distributor, etc.
State University of New York at Binghamton
Date of Publication, Distribution, etc.
2020
GENERAL NOTES
Text of Note
106 p.
DISSERTATION (THESIS) NOTE
Dissertation or thesis details and type of degree
M.Eng.
Body granting the degree
State University of New York at Binghamton
Text preceding or following the note
2020
SUMMARY OR ABSTRACT
Text of Note
This research addresses the problem of predicting adverse events and patient harm in cancer medical centers. Unintentional harm to patients has negative consequences for multiple parties, including patients and their families, healthcare providers, hospitals, and healthcare organizations. Identifying in advance the patients who are at higher risk of suffering an adverse event is therefore necessary to prevent harm, avoid unnecessary costs, and provide safer care services. This can be achieved by employing natural language processing and machine learning algorithms to predict patient harm from structured and unstructured medical data. Most previous research on this problem has built and evaluated a single machine learning model using a single source of medical text. To address this gap, this research builds, evaluates, and compares the performance of multiple machine learning models using three different sources of medical text (admission notes, final nursing assessments, and discharge notes) to determine the most reliable source for predicting harm. Over 27,000 records were used to build the classification models. The experimental results suggested that the final nursing assessment is the best source of medical text for predicting harm. The best prediction model combined 10-fold cross-validation, a TF-IDF bag-of-words representation, chi-square feature selection, the ADASYN oversampling technique, and a random forest classifier. This model outperformed the others, achieving 85% accuracy, 95% precision, 75% recall, and an 85% F1-score. The results suggest that predicting adverse events and harm is possible through the use of natural language processing and machine learning algorithms.
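As an illustration of the TF-IDF bag-of-words step named in the abstract, the following is a minimal sketch in plain Python. It is not the thesis's implementation: the function name `tfidf`, the whitespace tokenizer, the smoothed IDF formula, the L2 normalisation, and the three example notes are all assumptions made here for illustration, and the notes are invented, not drawn from the study's records.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows); rows[i][j] is the L2-normalised TF-IDF
    weight of vocab[j] in docs[i]. Hypothetical helper for illustration."""
    # Assumption: lowercase whitespace tokenisation, no stop-word removal.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    # Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        rows.append([w / norm for w in raw])
    return vocab, rows


# Invented example notes (not from the thesis data).
notes = [
    "patient alert and oriented no fall risk",
    "patient confused high fall risk overnight",
    "patient stable discharged home",
]
vocab, rows = tfidf(notes)
```

In this sketch a term that appears in every note, such as "patient", receives a lower weight than a term confined to one note, such as "alert", which is the property that makes TF-IDF vectors a useful input to the feature selection and classification steps the abstract lists.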