Only Just Getting Started!

Finally finishing up the project. The extra week was sorely needed to iron out all the kinks and make sure the documentation was completed properly. We also needed the time to make the presentation video; it's a shame we couldn't do an actual in-person presentation.

But even though this is the end of the project course, it is only the start. I have learned so much from this project. Some of the things I learned late on could be applied to this very project to improve it. It would have been great to implement it all now, but alas there are time limitations, and prepping the data takes far too long to be experimental when we were on a deadline.

We plan to continue working on the project once the semester is done, to get the time-series bias sorted out and also get a Neural ODE model running. I can't wait to return to this project later in the summer, when I will no longer have other courses or time restrictions. Then I can relax and enjoy the research like I had hoped to during the project course.

Look forward to the next update! But until then, stay safe and stay indoors!

Kampai!!! The End?

After the long and tiring journey, our documentation and presentations are finally complete. Comparing the final results across different datasets gave us even more insight into the data, along with some ideas on how to improve the model or even test a few hypotheses on sepsis prediction. But that's a story for another day.

It was a great journey, and for the moment it has come to an end. When our sanity returns, we shall return to the project to improve it and test some theories!

~~~Thank You for following our story~~~

The Project Conclusion

We also made some modifications to the pipeline by undersampling and oversampling the data (with the imbalanced-learn library).

The results are shown here:

-> Random Undersampling

| Classifier | F1 Score | Accuracy Score | Precision Score | Recall Score | ROC AUC Score |
| --- | --- | --- | --- | --- | --- |
| Random Guessing | 0.501245 | 0.501367 | 0.501347 | 0.501149 | 0.501358 |
| Logistic Regression | 0.944553 | 0.945775 | 0.966213 | 0.923844 | 0.945772 |
| Neural Network | 0.957754 | 0.957991 | 0.963063 | 0.952695 | 0.958023 |
| Random Forest | 0.995382 | 0.995362 | 0.99106 | 0.999742 | 0.995364 |
| Gaussian Naive Bayes | 0.87051 | 0.877277 | 0.920874 | 0.825597 | 0.877288 |

-> Oversampling with imbalanced-learn

| Classifier | F1 Score | Accuracy Score | Precision Score | Recall Score | ROC AUC Score |
| --- | --- | --- | --- | --- | --- |
| Random Guessing | 0.498242 | 0.500363 | 0.497933 | 0.498552 | 0.500346 |
| Logistic Regression | 0.907205 | 0.910514 | 0.93717 | 0.879102 | 0.910364 |
| Neural Network | 0.928982 | 0.929131 | 0.926331 | 0.932271 | 0.929096 |
| Random Forest | 0.994949 | 0.994948 | 0.990014 | 0.999934 | 0.994972 |
| Gaussian Naive Bayes | 0.800169 | 0.820146 | 0.893598 | 0.72521 | 0.819687 |

Thank you for reading about our journey from the start to the end of this project.

Finally Good News!

The new model works! It still needs some tweaking, but it is producing promising results. The scores are also better than the competition's. We decided to compare it against different classifiers, especially the ones the competition used in their research. Hopefully they also perform well when tested.

We have also been working on some of the documentation, and that has been going smoothly. From the looks of it, the data cleaning process we used seems to have made it easier for the models to classify whether a patient has sepsis or not. However, we still need to test this by balancing the data, since there are still more patients without sepsis than with it. Almost double, actually.

We have decided to randomly under-sample the data and also look into methods from the imbalanced-learn library to potentially over-sample patients with sepsis.

The Network Optimization

Hooray!! We finally got reasonable results from evaluating the PyTorch Feedforward Neural Network.

| Classifier | F1 Score | Accuracy Score | Precision Score | Recall Score | ROC AUC Score |
| --- | --- | --- | --- | --- | --- |
| Random Guessing | 0.437825 | 0.500549 | 0.389329 | 0.500129 | 0.500473 |
| Logistic Regression | 0.931898 | 0.948762 | 0.964399 | 0.901518 | 0.940170 |
| Neural Network | 0.941330 | 0.955052 | 0.955112 | 0.928378 | 0.950201 |
| Random Forest | 0.995107 | 0.996177 | 0.990894 | 0.999355 | 0.996755 |
| Gaussian Naive Bayes | 0.862626 | 0.893530 | 0.865489 | 0.860305 | 0.887457 |
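The five scores in the table come straight from scikit-learn's metrics module. A minimal sketch with hypothetical labels and predictions, just to show the calls:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical ground truth, hard predictions, and probability scores.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.1, 0.2, 0.9, 0.8, 0.4, 0.3, 0.7, 0.2]  # used for ROC AUC

print(f1_score(y_true, y_pred))         # ~0.857
print(accuracy_score(y_true, y_pred))   # 0.875
print(precision_score(y_true, y_pred))  # 1.0
print(recall_score(y_true, y_pred))     # 0.75
print(roc_auc_score(y_true, y_prob))    # 1.0
```

Note that ROC AUC is computed from the model's probability scores rather than the thresholded predictions, which is why it can be 1.0 even when the hard predictions miss a positive.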

This is really good for us.

We finished up by adjusting hyperparameters.

Documentation and the final presentation were also touched up and results were added in.

Abandon Ship

We have decided to course-correct with the guidance of our supervisor, since we wouldn't be able to properly tune an LSTM model in the time we have left. Especially considering we still need to do a full write-up and a video presentation for the project.

We have decided to put the LSTM on hold as future work along with the Neural ODE. Definitely returning to conquer that hurdle once the semester is done.

We are instead using a feedforward model. This requires us to compress our time-series data into a single row for each patient. That wasn't too hard to do; we have started writing the code and it is going smoothly.

For each feature, we will expand it into the following summary statistics:

  • mean
  • minimum
  • maximum
  • median
  • kurtosis
  • skewness
  • standard deviation
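The flattening step above can be sketched with a pandas groupby-aggregate. The column names and values here are hypothetical placeholders; the real data has many more features:

```python
import pandas as pd

# Toy time-series frame: several hourly readings per patient.
df = pd.DataFrame({
    "patient_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "heart_rate": [80.0, 85.0, 90.0, 95.0, 70.0, 72.0, 74.0, 76.0],
})

# Collapse each feature into the seven summary statistics per patient,
# turning many rows per patient into one row per patient.
flat = df.groupby("patient_id")["heart_rate"].agg(
    ["mean", "min", "max", "median", pd.Series.kurt, "skew", "std"]
)
print(flat)
```

With several features this produces a wide single-row-per-patient table (features × 7 statistics columns), which is exactly the shape a feedforward network expects.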

Hopefully this model will perform well and produce better results than our competition when classifying sepsis.

We have also started to draft some of the write-up so that things aren't too rushed later on, since it's clear this is gonna be a close one.

The Feature Re-engineering

After building the LSTM in PyTorch, we continued to get terrible accuracy and F1 scores. This made us rethink our method of predicting sepsis.

We contacted our supervisor and decided to switch from an LSTM approach to using a Feedforward Neural Network instead.

Feedforward Neural Network

This would mean that we needed to re-create features.

We took the time-series data and collapsed the multiple rows of data per patient into one row per patient.
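A minimal sketch of the kind of feedforward classifier we switched to. The layer sizes are placeholders, not our actual architecture; 42 inputs would correspond to, say, 6 features × 7 summary statistics:

```python
import torch
from torch import nn

# Small feedforward binary classifier over flattened patient rows.
model = nn.Sequential(
    nn.Linear(42, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 1),  # single logit; pair with nn.BCEWithLogitsLoss
)

x = torch.randn(8, 42)  # a batch of 8 flattened patient rows
logits = model(x)
print(tuple(logits.shape))  # (8, 1)
```

Because each patient is now a single fixed-length vector, the model sees the whole stay at once instead of stepping through time like the LSTM did.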

Bad News :(

We processed the full dataset into proper time series on the machine we got access to. It was great seeing the code run, though it took a good while; the interpolation code alone took nine hours! I thought it would never end. But there is bad news….

The model didn't produce good results. The F1 score is 0, which probably means the model is just predicting the majority "no sepsis" class, since the accuracy score is close to 100%. It's a bit depressing, to be honest.
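A quick toy illustration of why that combination of scores points to majority-class prediction (the 98:2 split here is made up for the example, not our actual ratio):

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0] * 98 + [1] * 2   # 98 non-sepsis rows, 2 sepsis rows
y_pred = [0] * 100            # model that never predicts sepsis

print(accuracy_score(y_true, y_pred))             # 0.98
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0
```

Accuracy rewards always guessing the majority class on imbalanced data, while F1 drops to zero because the model never finds a single positive, which is why we track both.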

We also developed a model in PyTorch that produced similar results, and it takes very long to run, so it's not easy to tweak. Especially when there isn't much time left.

I also have a ton of other assignments replacing my coursework exams, since COVID-19 had to go and ruin my final semester. This is taking even more time away from what I thought I would be able to spend on the project.

Hopefully things get better.
