The Project Conclusion

We also made some modifications to the model by undersampling and oversampling the data (with imbalanced-learn). The results are shown here:

Random Undersampling

Classifiers          F1 Score  Accuracy Score  Precision Score  Recall Score  ROC AUC Score
Random Guessing      0.501245  0.501367        0.501347        0.501149      0.501358
Logistic Regression  0.944553  0.945775        0.966213        0.923844      0.945772
Neural Network       0.957754  …

Continue reading "The Project Conclusion"
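The post names imbalanced-learn (its `RandomUnderSampler` does this in one call); a minimal pandas-only sketch of the same idea, assuming a toy data frame with a hypothetical `diagnosis` label column, looks like this:

```python
import pandas as pd

# Toy imbalanced data set: 7 sepsis rows (1) vs. 3 non-sepsis rows (0).
df = pd.DataFrame({"feature": range(10),
                   "diagnosis": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]})

# Random undersampling: shrink every class to the minority-class size.
n_min = df["diagnosis"].value_counts().min()
balanced = pd.concat(
    [g.sample(n=n_min, random_state=0) for _, g in df.groupby("diagnosis")]
).reset_index(drop=True)

print(balanced["diagnosis"].value_counts().to_dict())  # {0: 3, 1: 3}
```

Oversampling is the mirror image: sample every class up to the majority-class size with `replace=True`.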

The Network Optimization

Hooray!! We finally got reasonable results from evaluating the PyTorch Feedforward Neural Network.

Classifiers          F1 Score  Accuracy Score  Precision Score  Recall Score  ROC AUC Score
Random Guessing      0.437825  0.500549        0.389329        0.500129      0.500473
Logistic Regression  0.931898  0.948762        0.964399        0.901518      0.940170
Neural Network       0.941330  0.955052        0.955112        0.928378      0.950201
Random Forest        0.995107  0.996177        0.990894        0.999355      0.996755
Gaussian Naive …

Continue reading "The Network Optimization"
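The five score columns in the table can all be produced with scikit-learn's metric functions; a minimal sketch with toy labels (the blog's actual predictions are not shown here, and ROC AUC is computed on hard 0/1 predictions for simplicity rather than probabilities):

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Toy ground truth and classifier predictions.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

scores = {
    "F1 Score":        f1_score(y_true, y_pred),
    "Accuracy Score":  accuracy_score(y_true, y_pred),
    "Precision Score": precision_score(y_true, y_pred),
    "Recall Score":    recall_score(y_true, y_pred),
    "ROC AUC Score":   roc_auc_score(y_true, y_pred),
}
print(scores)  # every metric is 0.75 for this toy example
```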

The Feature Re-engineering

After building the LSTM in PyTorch, we continued to get poor accuracy and F1 scores. This made us rethink our method of predicting sepsis. We contacted our supervisor and decided to switch from the LSTM approach to a Feedforward Neural Network instead. This meant that we needed to re-create our features. We took the …

Continue reading "The Feature Re-engineering"

The Interpolation Experiment

After successfully merging the data files together, the data had to be interpolated. This meant filling each patient's missing values with the values last obtained for that patient, as can be seen in the diagram below. This brought the data into a time-series format. Model development was also done …

Continue reading "The Interpolation Experiment"
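Filling missing values with the last value obtained for a patient is a per-patient forward fill; a minimal sketch, assuming hypothetical `patientid`/`offset`/`heartrate` column names rather than the project's actual schema:

```python
import numpy as np
import pandas as pd

vitals = pd.DataFrame({
    "patientid": [1, 1, 1, 2, 2],
    "offset":    [0, 60, 120, 0, 60],
    "heartrate": [80.0, np.nan, 90.0, 70.0, np.nan],
})

# Sort each patient's rows by time, then carry the last observed value forward.
# Grouping by patient keeps one patient's values from leaking into another's.
vitals = vitals.sort_values(["patientid", "offset"])
vitals["heartrate"] = vitals.groupby("patientid")["heartrate"].ffill()

print(vitals["heartrate"].tolist())  # [80.0, 80.0, 90.0, 70.0, 70.0]
```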

The Merging Configuration

This week we generated Jupyter notebooks in order to efficiently merge and produce a combined data set of patient data. This involved studying the best methods of joining and merging tables in pandas and examining the data's offset values. These offset values were all synced together in order to present time …

Continue reading "The Merging Configuration"
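Merging tables whose rows are keyed by patient and offset can be sketched with `pandas.merge`; the column names here (`patientid`, `offset`, `lactate`, `heartrate`) are illustrative assumptions, not the project's actual schema:

```python
import pandas as pd

labs = pd.DataFrame({"patientid": [1, 1, 2], "offset": [0, 60, 0],
                     "lactate": [1.1, 1.4, 2.0]})
vitals = pd.DataFrame({"patientid": [1, 1, 2], "offset": [0, 60, 60],
                       "heartrate": [80, 85, 90]})

# An outer join on (patient, offset) keeps every timestamp from both tables;
# measurements absent at a given offset come through as NaN.
merged = (pd.merge(labs, vitals, on=["patientid", "offset"], how="outer")
            .sort_values(["patientid", "offset"])
            .reset_index(drop=True))
```

An inner join (`how="inner"`) would instead keep only offsets present in both tables.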

The Lab Data Transmogrification

Data cleaning on the lab and vital periodic tables was done this week. From the 'lab.csv' file, the wbcx1000, lactate and creatinine values were all extracted from the 'labname' column per patient by pivoting rows of data into columns. This was paired with the lab's offset value to synchronize overlapping patient data. Within the 'vitalPeriodic.csv' file, temperature, …

Continue reading "The Lab Data Transmogrification"
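Turning one-row-per-measurement lab data into one-column-per-lab can be sketched with a pandas pivot; the column names (`patientid`, `laboffset`, `labname`, `labresult`) and lab labels below are illustrative assumptions about the 'lab.csv' layout:

```python
import pandas as pd

lab = pd.DataFrame({
    "patientid": [1, 1, 1, 2],
    "laboffset": [0, 0, 60, 0],
    "labname":   ["lactate", "creatinine", "lactate", "wbc x 1000"],
    "labresult": [1.2, 0.9, 1.5, 7.8],
})

# Pivot so each distinct labname becomes its own column, keyed by
# (patient, offset) so rows taken at the same time line up.
wide = (lab.pivot_table(index=["patientid", "laboffset"],
                        columns="labname", values="labresult")
           .reset_index())
wide.columns.name = None
```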

The Intake-Output Extraction

Continuing the battle to acquire urine output values, we loaded in the 'intakeOutput.csv' file. This served as a way of gathering the urine output values for only the patient IDs that we specified. More tests were done on the validity of all the data cleaning processes. We are currently in the process of …

Continue reading "The Intake-Output Extraction"
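Restricting a table to a specified set of patient IDs is a boolean-mask filter; a minimal sketch, where the `celllabel`/`cellvalue` columns and the "Urine" label are assumptions about the 'intakeOutput.csv' layout:

```python
import pandas as pd

io = pd.DataFrame({
    "patientid": [1, 2, 3, 1],
    "celllabel": ["Urine", "Urine", "Urine", "Saline"],
    "cellvalue": [200.0, 150.0, 300.0, 500.0],
})

# Keep only the patient IDs we care about, and only urine-output rows.
wanted_ids = {1, 2}
urine = io[io["patientid"].isin(wanted_ids) & (io["celllabel"] == "Urine")]
```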

The Patient Data Insufficiency

In the hunt for training data, we first began by grabbing only the patients diagnosed with sepsis in order to shrink the data set; for this we utilized the 'diagnosis.csv' file. This represented patients with sepsis as 'diagnosis: 1' and patients without sepsis as 'diagnosis: 0'. To begin our …

Continue reading "The Patient Data Insufficiency"
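One way to derive such a 0/1 label is a substring match on the diagnosis text; a minimal sketch, where the `diagnosisstring` column name and the string-matching approach are assumptions rather than the project's confirmed method:

```python
import pandas as pd

diagnosis = pd.DataFrame({
    "patientid": [1, 2, 3, 4],
    "diagnosisstring": ["severe sepsis", "fracture", "sepsis", "asthma"],
})

# Label 1 if the diagnosis text mentions sepsis anywhere, else 0.
diagnosis["diagnosis"] = (diagnosis["diagnosisstring"]
                          .str.contains("sepsis", case=False)
                          .astype(int))

print(diagnosis["diagnosis"].tolist())  # [1, 0, 1, 0]
```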
