Title | Biomedical Data Mining for Information Retrieval |
---|---|
Author | Various authors |
Genre | Databases |
Publisher | Databases |
ISBN | 9781119711261 |
Over recent decades, several severity scoring systems and machine learning mortality prediction models have been developed [4]. Traditional scoring techniques such as the Acute Physiology and Chronic Health Evaluation (APACHE) [4], Simplified Acute Physiology Score (SAPS) [4], Sequential Organ Failure Assessment (SOFA) [4] and Mortality Probability Model (MPM) [4], as well as data mining techniques such as Artificial Neural Network (ANN) [5], Support Vector Machine (SVM) [5], Decision Tree (DT) [5] and Logistic Regression (LR) [5], have been used in previous research. Nevertheless, mortality prediction in the Intensive Care Unit (ICU) remains an open challenge.
The objective of this chapter is to develop models that predict whether an ICU patient will survive the hospital stay, using Discriminant Analysis (DA), Decision Tree (DT), K-Nearest Neighbor (KNN), Naive Bayes, Support Vector Machine (SVM) and Functional Link Artificial Neural Network (FLANN), a low-complexity neural network, and to compare their performance. The dataset was obtained from the PhysioNet Challenge 2012 [6] and consists of 4,000 records of patients admitted to the ICU. It contains 41 variables recorded during the first 48 h after admission: 5 general descriptors (age, gender, height, ICU type and initial weight) and 36 time-series variables, of which 15 (Temp, HR, Urine, pH, RespRate, GCS, FiO2, PaCO2, MAP, SysABP, DiasABP, NIMAP, NIDiasABP, MechVent, NISysABP) are taken as inputs, alongside 5 outcome descriptors (SAPS-I score, SOFA score, length of stay in days (LOS), length of survival and in-hospital death, coded 0 for survival and 1 for death in hospital), to predict patient survival.
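The record layout can be illustrated with a short sketch. It assumes the plain-text format of the PhysioNet Challenge 2012 files (one file per patient, a `Time,Parameter,Value` header, times given as `HH:MM` elapsed since admission, general descriptors recorded at `00:00`, and `-1` marking an unknown descriptor) and reduces each of the 15 input variables to its mean over the first 48 h. The helper `summarize_record`, the made-up sample record and the use of the mean as the summary are hypothetical choices for illustration, not the chapter's own pre-processing.

```python
import csv
import io
from statistics import mean

# The 15 time-series inputs named in the text (PhysioNet parameter names, assumed spelling).
INPUT_VARIABLES = [
    "Temp", "HR", "Urine", "pH", "RespRate", "GCS", "FiO2", "PaCO2",
    "MAP", "SysABP", "DiasABP", "NIMAP", "NIDiasABP", "MechVent", "NISysABP",
]
# General descriptors, assumed to be recorded once at time 00:00.
DESCRIPTORS = {"RecordID", "Age", "Gender", "Height", "ICUType", "Weight"}


def summarize_record(text, window_minutes=48 * 60):
    """Hypothetical helper: return (descriptors, per-variable means within the window).

    Variables never measured inside the window map to None (missing)."""
    values = {name: [] for name in INPUT_VARIABLES}
    descriptors = {}
    for row in csv.DictReader(io.StringIO(text)):
        hours, minutes = row["Time"].split(":")
        elapsed = int(hours) * 60 + int(minutes)
        name, value = row["Parameter"], float(row["Value"])
        if name in DESCRIPTORS and elapsed == 0:
            # Assumed convention: -1 marks an unknown descriptor.
            descriptors[name] = None if value == -1 else value
        elif name in values and elapsed <= window_minutes:
            values[name].append(value)
    means = {name: (mean(v) if v else None) for name, v in values.items()}
    return descriptors, means


# Made-up example record, not real patient data.
sample = """Time,Parameter,Value
00:00,RecordID,132539
00:00,Age,54
00:00,Gender,0
00:00,Height,-1
00:07,HR,73
01:07,HR,77
01:07,Temp,36.6
"""
descriptors, means = summarize_record(sample)
print(descriptors["Age"], descriptors["Height"], means["HR"], means["pH"])
```

Unmeasured variables come back as `None`, so a separate imputation step can decide how to fill them.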
The rest of the chapter is organized as follows: Section 1.2 reviews previous studies of mortality prediction. Section 1.3 presents the materials and methods, including data collection, data pre-processing and model descriptions. Section 1.4 presents the obtained results. Section 1.5 discusses the work and concludes, and Section 1.6 outlines future work.
1.2 Review of Literature
Many researchers have applied different models to the PhysioNet Challenge 2012 dataset and obtained varying results.
Silva et al. [7] developed a method for predicting in-hospital mortality (0 denotes a survivor and 1 a patient who died in hospital). The data were collected from the PhysioNet website and used for the challenge. The dataset consists of three sets, A, B and C, each with 4,000 records. The challenge comprised two events: event 1 measured the performance of a binary classifier and event 2 the performance of a risk estimator. Event 1 was scored as the minimum of sensitivity and positive predictive value, and event 2 with the Hosmer–Lemeshow statistic [8]. A baseline algorithm (SAPS-I) obtained scores of 0.3125 and 68.58 for events 1 and 2 respectively, while the final scores obtained for events 1 and 2 were 0.5353 and 17.58. In Ref. [9], Johnson et al. described a novel Bayesian ensemble algorithm for mortality prediction. Artifacts and erroneous recordings were removed during data pre-processing. The model was trained on the 4,000 records of training set A and applied to the two other datasets, B and C. Jackknifing was used to estimate the performance of the model. On the hidden datasets, the model obtained event 1 scores (score 1) of 0.5310 and 0.5353 and Hosmer–Lemeshow statistics (score 2) of 26.44 and 29.86. After the model was redeveloped, it obtained 0.5374 and 18.20 for scores 1 and 2 on dataset C. Overall, the proposed model performed better than the traditional SAPS model and offered advantages such as the handling of missing data. Ref. [10] presents an improved model for estimating in-hospital mortality in the ICU using 37 time-series variables. The performance of various models was estimated using 10-fold cross-validation. Missing values are common in clinical data; here they were imputed using the mean value for the patient's age and gender. A logistic regression model was then trained on the dataset.
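The two challenge scores recur throughout this review, so a sketch of how they can be computed may help. It assumes 0/1 labels (1 = in-hospital death) and predicted risks in [0, 1]. The function names are hypothetical, and the Hosmer–Lemeshow function uses the textbook H form, which groups patients by fixed cut-points of predicted risk; the official challenge scorer may differ in detail (for example in how groups are formed or empty groups are handled).

```python
def event1_score(y_true, y_pred):
    """Event 1: the lower of sensitivity (Se) and positive predictive value (+P).

    y_true and y_pred are 0/1 sequences (1 = in-hospital death)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return min(sensitivity, ppv)


def hosmer_lemeshow_h(y_true, risk, groups=10):
    """Hosmer-Lemeshow H statistic with fixed risk cut-points k/groups.

    H = sum_g (O_g - E_g)^2 / (E_g * (1 - E_g / n_g)), where O_g is the number of
    observed deaths, E_g the sum of predicted risks and n_g the size of group g."""
    bins = [[] for _ in range(groups)]
    for t, r in zip(y_true, risk):
        g = min(int(r * groups), groups - 1)  # a risk of exactly 1.0 joins the top group
        bins[g].append((r, t))
    h = 0.0
    for chunk in bins:
        if not chunk:
            continue
        observed = sum(t for _, t in chunk)
        expected = sum(r for r, _ in chunk)
        denominator = expected * (1 - expected / len(chunk))
        if denominator > 0:  # skip degenerate groups (all risks 0 or 1)
            h += (observed - expected) ** 2 / denominator
    return h


print(event1_score([1, 1, 0, 0], [1, 1, 1, 0]))  # Se = 1.0, +P = 2/3
print(hosmer_lemeshow_h([1, 0, 0, 0], [0.5] * 4, groups=1))
```

A higher event 1 score is better, whereas a lower H statistic indicates better calibration; this is why a final event 2 score of 17.58 is an improvement on the SAPS-I baseline's 68.58.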
Model performance was evaluated on the two events: event 1 measured accuracy as the lower of sensitivity and positive predictive value, and event 2 measured calibration with the Hosmer–Lemeshow H statistic. The model scored 0.516 and 14.4 for events 1 and 2 on test set B, and 0.482 and 51.7 on test set C, outperforming the existing SAPS model. In Ref. [11], an algorithm was developed to predict the in-hospital death of ICU patients (event 1) and to estimate the probability of death (event 2). Missing values were imputed with zero and the data were normalized. Six support vector machine (SVM) classifiers were trained; the training set of each SVM contained the positive examples and one sixth of the negative examples. The scores obtained for events 1 and 2 were 0.5345 and 17.88 respectively. In Ref. [12], an artificial neural network model was developed to predict the in-hospital death of ICU patients from observations made in the first 48 h after admission. Missing values were replaced with artificial values based on assumptions. From the full feature set, 26 features were selected for further processing. For classification, a two-layer neural network with 15 neurons in the hidden layer was used. The model combined 100 voting classifiers, and its output was the average of their 100 outputs. The model was trained and tested using 5-fold cross-validation, and a fuzzy threshold was used to determine the output of the neural network. It scored 0.5088 for event 1 and 82.211 for event 2 on the test dataset. Ref. [13] presented an approach that identifies time-series motifs to predict in-hospital mortality of ICU patients, segmenting the variables into low, medium and high measurements. The method outperformed the existing scoring systems SAPS-II, APACHE-II and SOFA, obtaining scores of 0.46 for event 1 and 56.45 for event 2.
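The data preparation described for Ref. [11] can be sketched as an undersampling ensemble. The sketch assumes min-max scaling (the text only says the data were normalized) and a random, disjoint split of the negatives into six shares, so that each SVM sees the positive examples plus one sixth of the negatives. The SVM training and the way the six outputs are combined are not shown, and the function names are hypothetical.

```python
import random


def impute_and_normalize(rows):
    """Replace missing values (None) with 0, then min-max scale each column to [0, 1].

    Min-max scaling is an assumption; the text only states the data were normalized."""
    filled = [[0.0 if v is None else float(v) for v in row] for row in rows]
    columns = list(zip(*filled))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in filled
    ]


def balanced_subsets(labels, n_models=6, seed=0):
    """Training-index sets, one per ensemble member: all positives plus a
    disjoint 1/n_models share of the shuffled negatives."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    random.Random(seed).shuffle(negatives)
    return [positives + negatives[k::n_models] for k in range(n_models)]


X = impute_and_normalize([[None, 2.0], [4.0, 2.0], [2.0, None]])
labels = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # 2 deaths, 12 survivors
subsets = balanced_subsets(labels)
print(X)
print([len(s) for s in subsets])
```

Because the shares are disjoint, every survivor record is used by exactly one SVM, while the minority class of deaths is seen by all six.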
Ref. [14] developed an improved in-hospital mortality prediction using logistic regression and a hidden Markov model. The model was trained on the 4,000 patient records of set A and validated on the other sets of 4,000 unseen records. Event 1 was scored as the minimum of sensitivity and positive predictive value, and event 2 with the Hosmer–Lemeshow H statistic. The model scored 0.50 and 0.50 for event 1 and 15.18 and 78.9 for event 2, compared with SAPS-I scores of 0.3170 and 0.312 for event 1 and 66.03 and 68.58 for event 2. Ref. [15] suggested an effective framework for predicting in-hospital mortality during the ICU stay. Features were extracted by data interpolation and histogram analysis. To reduce the complexity of feature extraction, the feature vector was shortened by evaluating the measurement values of each variable. Finally, a cascaded AdaBoost learning model was applied as the mortality classifier, obtaining scores of 0.806 for event 1 and 24.00 for event 2 on dataset A. On dataset B, the model obtained 0.379 and 5331.15 for events 1 and 2.
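The interpolation and binning details of Ref. [15] are not given in this summary, so the sketch below only illustrates the general idea under stated assumptions: each variable is linearly interpolated onto an hourly grid over the 48 h window, and the grid values are summarized as the fraction falling in each of a few fixed bins. The function names, the hourly grid and the bin edges are hypothetical choices, not the cited framework.

```python
from bisect import bisect_right


def interpolate_hourly(times, values, hours=48):
    """Linearly interpolate irregular samples (times in hours since admission)
    onto the grid 0, 1, ..., hours. Before the first and after the last sample the
    nearest observed value is carried. Requires at least one sample."""
    points = sorted(zip(times, values))
    grid = []
    j = 0
    for h in range(hours + 1):
        # Advance j to the last sample taken at or before hour h.
        while j + 1 < len(points) and points[j + 1][0] <= h:
            j += 1
        t0, v0 = points[j]
        if h <= t0 or j + 1 == len(points):
            grid.append(v0)
        else:
            t1, v1 = points[j + 1]
            grid.append(v0 + (v1 - v0) * (h - t0) / (t1 - t0))
    return grid


def histogram_features(series, edges):
    """Fraction of grid values in each bin [edges[k], edges[k+1]); values beyond
    the outer edges are counted in the first or last bin."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in series:
        k = bisect_right(edges, v) - 1  # index of the bin whose left edge is <= v
        counts[min(max(k, 0), n_bins - 1)] += 1
    return [c / len(series) for c in counts]


hr_grid = interpolate_hourly([0.0, 2.0], [60.0, 80.0], hours=4)
print(hr_grid)
print(histogram_features(hr_grid, [0, 65, 75, 200]))
```

Interpolating first gives every patient the same number of grid points, so the histogram fractions are comparable across patients regardless of how often a variable was measured.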