Data Scientist A

PHASE VI: Deployment¶

Plan Deployment¶

Takes the evaluation results and determines a strategy for deployment.

Once leadership has approved the models that the data science department has built, we will work with IT to begin the process of integrating the models in the ACME healthcare processing pipeline. We will insure that the preprocessing steps we have developed is put into place.

All modeling files are stored on the share drive.

Plan Monitoring and Maintenance¶

Monitor and Maintenance Plan

Since we cannot assume that the correct target feature value for a query instance will be made available shortly after the query has been presented to a deployed model, we will use the stability index. The stability index is an alternative to using the changing model performance by monitoring changes in the distribution of model output as a signal of concept drift.

In order to compare distribution, we measure the distribution of model outputs on the test set that was used to orginally evaluate a model and hte repeat this measurment on new sets of query instances collected during periods after the model has been deployed.

One of the most commonly used measures for this is the stability index. The stability index is calculate as:

$$stability\space index = \sum_{l\in levels(t)}^{}((\frac{\left |A_{t=l}\right |}{\left | A \right |}-\frac{\left | B_{t=l} \right |}{\left | B \right |})\times log_{e}(\frac{\left | A_{t=l} \right |}{\left | A \right |} / \frac{\left | B_{t=l} \right |}{\left | B \right |}))$$

where $$\left | A \right |$$ refers to the size of the test set on which performance measures were orginally calculated,

$$\left | A_{t=l} \right |$$

refers to the number of instances in the original test set for which the model made a prediction of level $$l$$ for target $$t$$ $$\left | B \right |$$ and $$\left | B_{t=l} \right |$$ refer to the same measurements on the newly collected dataset $$log_{e}$$ is the natural logarithm.

Other important points:
- If the value of the stability index is less than 0.1, then the distribution of the newly collected test set is broadly similar to the distribution oin the original test set.
- If the value of hte stability index is between 0.1 and 0.25, then some change has occured and further investigation may be useful.
- A stability index value greather than 0.25 suggests that a significant change has occured and corrective action is required.

We recommend an alert system be put in place and work with the above threshold so that an individual from the data science department could consider retraining the model.

Produce Final Report¶

Final report & Presentation We have archived a final comprehensive presentation of the data science results for this projct located on *S:\Data Science Projects\Type II Diabetis*. This report includes all previous deliverables and summarizes the results.

Review Project¶

Experience documentation
- We gained more understanding of what it is like to coordinate with the IT department when they have ongoing engagements. We assumed they would be able to assist us in the data integration of all the various tables. However, they had prior engagements that pushed our request back 30-days. In the future, we will know what is required of them and confirm their availability.
- We only spent 5% of our time on feature engineering and feature selection combined - this was not sufficient. We would like to make modifications to our workflow to be able to spending 10% of our time on feature engineering and another 10% of our time on feature selection. Beyond the preprocessing of data we felt that this data science project really requires more attention to these areas.

Data Scientist B

PHASE VI: Deployment¶

Plan Deployment¶

Takes the evaluation results and determines a strategy for deployment.

Deployment plan
- Summarize the deployment strategy, including the necessary steps and how to perform them:

Plan Monitoring and Maintenance¶

Develop a careful monitoring and maintenance strategy.

Monitoring and maintenance plan
- Summarize the monitoring and maintenance strategy, including the necessary steps and how to perform them.

Produce Final Report¶

This could be a final comprehensive presentatoin of the data science results or a summary of the project and its experiences.

Final report
- A written report of the data science engagement. It includes all of the previous deliverables, summarizing and organizing the results:
Final presentation
- Results presented to the client:

Review Project¶

Assess what went right and what went wrong, what was done well and what needs to be improved.

Experience documentation
- Summarize important experience gained during the project (pitfalls, misleading approaches, or hints for selecting the data science techniques in similar situations could be part of this documentation):

Finding Type II Diabetes In A Patient Population¶

Excercise 6¶

PHASE VI: Deployment¶

Plan Deployment¶

Plan Monitoring and Maintenance¶

Produce Final Report¶

Review Project¶

PHASE VI: Deployment¶

Plan Deployment¶

Plan Monitoring and Maintenance¶

Produce Final Report¶

Review Project¶