The Patient Data Insufficiency

To shrink the data set, we began our hunt for training data by selecting the patients diagnosed with sepsis, using the ‘diagnosis.csv’ file.

This file labels patients with sepsis as ‘diagnosis : 1’ and patients without sepsis as ‘diagnosis : 0’.
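The selection step can be sketched in Python with only the standard library. This is a minimal illustration, not our exact code: the column names ‘patient_id’ and ‘diagnosis’ are assumptions about the layout of ‘diagnosis.csv’, and a small in-memory sample stands in for the real file.

```python
import csv
import io

def sepsis_patient_ids(diagnosis_file):
    """Return IDs of patients labelled 'diagnosis : 1' (sepsis).

    Assumption: the file has hypothetical columns 'patient_id' and 'diagnosis'.
    """
    reader = csv.DictReader(diagnosis_file)
    return {row["patient_id"] for row in reader if row["diagnosis"].strip() == "1"}

# In-memory stand-in for diagnosis.csv (illustrative values only).
sample = io.StringIO("patient_id,diagnosis\n101,1\n102,0\n103,1\n")
print(sorted(sepsis_patient_ids(sample)))  # ['101', '103']
```

With the real file, the same function can be called on the handle returned by open("diagnosis.csv", newline="").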

To begin our data cleaning process, we handled the patient data set first. Because a patient’s urine output is recorded in millilitres (ml), it had to be converted to millilitres per kilogram (ml/kg) to work with the sepsis definitions. This meant we needed each patient’s admission weight.

This was found in the ‘patient.csv’ file.
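A rough sketch of the ml to ml/kg conversion, assuming ‘patient.csv’ has hypothetical columns ‘patient_id’ and ‘admission_weight’ (in kilograms). The conversion itself is the urine volume divided by the admission weight; a missing or non-positive weight raises an error instead of silently producing a wrong value.

```python
import csv
import io

def load_admission_weights(patient_file):
    """Map patient_id -> admission weight in kg (None when the value is blank).

    Assumption: hypothetical columns 'patient_id' and 'admission_weight'.
    """
    weights = {}
    for row in csv.DictReader(patient_file):
        value = row["admission_weight"].strip()
        weights[row["patient_id"]] = float(value) if value else None
    return weights

def urine_ml_per_kg(urine_ml, weight_kg):
    """Convert a urine output in ml to ml/kg of admission weight."""
    if weight_kg is None or weight_kg <= 0:
        raise ValueError("a positive admission weight in kg is required")
    return urine_ml / weight_kg

# In-memory stand-in for patient.csv (illustrative values only).
patients = io.StringIO("patient_id,admission_weight\n101,80\n103,\n")
weights = load_admission_weights(patients)
print(urine_ml_per_kg(400, weights["101"]))  # 5.0
```

Patient 103 in the sample has no recorded weight, which is exactly the gap the imputation steps need to close.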

Before imputing missing admission weights, we filled null ages and standardised the gender attribute so that all of its values were consistent.
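The text does not say how missing ages were filled, so the sketch below assumes median imputation; the raw spellings in the hypothetical ‘GENDER_MAP’ are illustrative examples of inconsistent labels. Rows are plain dictionaries with assumed keys ‘age’ and ‘gender’.

```python
import statistics

# Hypothetical examples of inconsistent raw gender values mapped to one label.
GENDER_MAP = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}

def clean_age_and_gender(rows):
    """Fill missing ages with the median known age and standardise gender.

    Assumptions: each row is a dict with keys 'age' (number or None) and
    'gender' (string or None); median imputation is illustrative only.
    Unrecognised gender values become 'Unknown'.
    """
    known_ages = [r["age"] for r in rows if r["age"] is not None]
    median_age = statistics.median(known_ages)
    cleaned = []
    for r in rows:
        raw_gender = (r["gender"] or "").strip().lower()
        cleaned.append({
            **r,
            "age": r["age"] if r["age"] is not None else median_age,
            "gender": GENDER_MAP.get(raw_gender, "Unknown"),
        })
    return cleaned

rows = [
    {"age": 60, "gender": "M"},
    {"age": None, "gender": "female"},
    {"age": 70, "gender": " Male "},
]
print(clean_age_and_gender(rows))
```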

Missing admission weights were then filled with the average weight for each patient’s gender.
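Filling admission weight by gender can be sketched as mean imputation within each gender group. This is an assumption-laden illustration rather than our exact code: it expects gender values that are already consistent and uses the hypothetical keys ‘gender’ and ‘admission_weight’. A patient whose gender group has no recorded weights keeps a missing value.

```python
from collections import defaultdict
from statistics import mean

def fill_weight_by_gender(rows):
    """Replace missing 'admission_weight' values with the mean weight of the
    patient's gender group.

    Assumptions: rows are dicts with hypothetical keys 'gender' (already
    standardised) and 'admission_weight' (kg or None).
    """
    weights_by_gender = defaultdict(list)
    for r in rows:
        if r["admission_weight"] is not None:
            weights_by_gender[r["gender"]].append(r["admission_weight"])
    gender_mean = {g: mean(ws) for g, ws in weights_by_gender.items()}

    filled = []
    for r in rows:
        weight = r["admission_weight"]
        if weight is None:
            # Stays None if no weights were recorded for this gender.
            weight = gender_mean.get(r["gender"])
        filled.append({**r, "admission_weight": weight})
    return filled

rows = [
    {"gender": "Male", "admission_weight": 80.0},
    {"gender": "Male", "admission_weight": 90.0},
    {"gender": "Male", "admission_weight": None},
    {"gender": "Female", "admission_weight": 60.0},
    {"gender": "Female", "admission_weight": None},
]
print([r["admission_weight"] for r in fill_weight_by_gender(rows)])
# [80.0, 90.0, 85.0, 60.0, 60.0]
```

Using the group mean keeps imputed weights in a plausible range for each gender; a median would be less sensitive to extreme weights.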
