Given a year of EHR data for patients without Diabetes, we predict which patients will be diagnosed with Diabetes in the next year.
Data Scientist A
Takes the evaluation results and determines a strategy for deployment.
Once leadership has approved the models that the data science department has built, we will work with IT to begin integrating the models into the ACME healthcare processing pipeline. We will ensure that the preprocessing steps we have developed are put into place.
All modeling files are stored on the share drive.
Monitoring and Maintenance Plan
Since we cannot assume that the correct target feature value for a query instance will be made available shortly after the query has been presented to a deployed model, we will use the stability index. The stability index is an alternative to monitoring changes in model performance: it monitors changes in the distribution of model outputs as a signal of concept drift.
To compare distributions, we measure the distribution of model outputs on the test set that was originally used to evaluate the model, and then repeat this measurement on new sets of query instances collected during periods after the model has been deployed.
One of the most commonly used measures for this is the stability index. The stability index is calculated as:
$$\text{stability index} = \sum_{l \in levels(t)} \left( \left( \frac{\left | A_{t=l} \right |}{\left | A \right |} - \frac{\left | B_{t=l} \right |}{\left | B \right |} \right) \times \log_{e} \left( \frac{\left | A_{t=l} \right |}{\left | A \right |} \Big/ \frac{\left | B_{t=l} \right |}{\left | B \right |} \right) \right)$$
where $$\left | A \right |$$ refers to the size of the test set on which performance measures were originally calculated, $$\left | A_{t=l} \right |$$ refers to the number of instances in the original test set for which the model made a prediction of level $$l$$ for target $$t$$, $$\left | B \right |$$ and $$\left | B_{t=l} \right |$$ refer to the same measurements on the newly collected dataset, and $$\log_{e}$$ is the natural logarithm.
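The calculation can be sketched in a few lines of Python. This is a minimal illustration, not the deployed implementation: the function name `stability_index`, its parameters, and the small floor `eps` (used so that a level predicted in only one of the two datasets does not produce a division by zero or `log(0)`) are assumptions introduced here.

```python
import math
from collections import Counter


def stability_index(original_preds, new_preds, eps=1e-6):
    """Stability index between two sets of predicted target levels.

    original_preds: predictions on the original test set (A).
    new_preds: predictions on newly collected query instances (B).
    eps: assumed floor on each proportion, so that a level missing
         from one dataset does not cause log(0) or division by zero.
    """
    if len(original_preds) == 0 or len(new_preds) == 0:
        raise ValueError("both prediction sets must be non-empty")

    counts_a = Counter(original_preds)
    counts_b = Counter(new_preds)
    levels = set(counts_a) | set(counts_b)

    total = 0.0
    for level in levels:
        # |A_{t=l}| / |A| and |B_{t=l}| / |B|, floored at eps
        p_a = max(counts_a[level] / len(original_preds), eps)
        p_b = max(counts_b[level] / len(new_preds), eps)
        total += (p_a - p_b) * math.log(p_a / p_b)
    return total
```

For example, if the model predicted "yes" for 20 of 100 instances in the original test set and for 40 of 100 newly collected instances, `stability_index(["no"] * 80 + ["yes"] * 20, ["no"] * 60 + ["yes"] * 40)` evaluates to roughly 0.196. Identical output distributions give 0.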
We recommend that an alert system be put in place that applies a threshold to the stability index, so that a member of the data science department is prompted to consider retraining the model when the threshold is exceeded.
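One way such an alert check might look is sketched below. The function name `drift_alert` and the default cut-offs of 0.1 and 0.25 are assumptions, not values taken from this plan: they follow a commonly quoted rule of thumb for the stability index (below 0.1 no significant change, 0.1 to 0.25 worth investigating, above 0.25 significant change) and should be replaced with whatever thresholds the department agrees on.

```python
def drift_alert(index_value, investigate_at=0.1, retrain_at=0.25):
    """Map a stability index value to a suggested action.

    The default thresholds are an assumed rule of thumb, not values
    specified in this plan; adjust them to the agreed policy.
    """
    if index_value < 0:
        # Each term of the stability index is non-negative, so a
        # negative value indicates a calculation error upstream.
        raise ValueError("stability index cannot be negative")
    if index_value > retrain_at:
        return "retrain"      # significant change: consider retraining
    if index_value >= investigate_at:
        return "investigate"  # some change: look more closely
    return "stable"           # no significant change detected
```

A scheduled job could compute the stability index for each monitoring period and notify the data science department whenever the result is anything other than `"stable"`.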
Final report & Presentation
We have archived a final comprehensive presentation of the data science results for this project, located at *S:\Data Science Projects\Type II Diabetis*. This report includes all previous deliverables and summarizes the results.
===================================================================================================
Data Scientist B
Takes the evaluation results and determines a strategy for deployment.
Develop a careful monitoring and maintenance strategy.
This could be a final comprehensive presentation of the data science results or a summary of the project and its experiences.
Final report
Final presentation
Assess what went right and what went wrong, what was done well and what needs to be improved.