#!/usr/bin/env python
# coding: utf-8

# # What is Machine Learning, and how does it work? ([video #1](https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1))
#
# Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).

# ![Machine Learning](images/01_robot.png)

# ## Agenda
#
# - What is Machine Learning?
# - What are the two main categories of Machine Learning?
# - What are some examples of Machine Learning?
# - How does Machine Learning "work"?

# ## What is Machine Learning?
#
# One definition: "Machine Learning is the semi-automated extraction of knowledge from data"
#
# - **Knowledge from data**: Starts with a question that might be answerable using data
# - **Automated extraction**: A computer provides the insight
# - **Semi-automated**: Requires many smart decisions by a human

# ## What are the two main categories of Machine Learning?
#
# **Supervised learning**: Making predictions using data
#
# - Example: Is a given email "spam" or "ham"?
# - There is an outcome we are trying to predict

# ![Spam filter](images/01_spam_filter.png)

# **Unsupervised learning**: Extracting structure from data
#
# - Example: Segment grocery store shoppers into clusters that exhibit similar behaviors
# - There is no "right answer"

# ![Clustering](images/01_clustering.png)

# ## How does Machine Learning "work"?
#
# High-level steps of supervised learning:
#
# 1. First, train a **Machine Learning model** using **labeled data**
#
#     - "Labeled data" has been labeled with the outcome
#     - "Machine Learning model" learns the relationship between the attributes of the data and its outcome
#
# 2. Then, make **predictions** on **new data** for which the label is unknown

# ![Supervised learning](images/01_supervised_learning.png)

# The primary goal of supervised learning is to build a model that "generalizes": It accurately predicts the **future** rather than the **past**!

# ## Questions about Machine Learning
#
# - How do I choose **which attributes** of my data to include in the model?
# - How do I choose **which model** to use?
# - How do I **optimize** this model for best performance?
# - How do I ensure that I'm building a model that will **generalize** to unseen data?
# - Can I **estimate** how well my model is likely to perform on unseen data?

# ## Resources
#
# - Book: [An Introduction to Statistical Learning](https://www.statlearning.com/) (section 2.1, 14 pages)
# - Video: [Learning Paradigms](https://www.youtube.com/watch?v=mbyG85GZ0PI&t=2162s) (13 minutes, starting at 36:02)

# ## Comments or Questions?
#
# - Email:
# - Website: https://www.dataschool.io
# - Twitter: [@justmarkham](https://twitter.com/justmarkham)
#
# © 2021 [Data School](https://www.dataschool.io). All rights reserved.
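
# The two high-level steps of supervised learning described under 'How does Machine Learning "work"?' (train on labeled data, then predict on new data) can be illustrated with a minimal sketch. This is not code from the video series: it assumes a very simple technique (1-nearest-neighbor) written in plain Python, and the email attributes, data values, and the function names `train` and `predict` are all hypothetical, chosen only to mirror the spam/ham example.

```python
# A toy sketch of supervised learning using a 1-nearest-neighbor rule.
# All data, attribute choices, and function names are made up for illustration.
import math


def train(labeled_data):
    """'Training' a 1-nearest-neighbor model simply stores the labeled examples."""
    return list(labeled_data)


def predict(model, features):
    """Return the label of the stored example closest to `features`."""
    nearest_features, nearest_label = min(
        model, key=lambda example: math.dist(example[0], features)
    )
    return nearest_label


# Step 1: train a model using labeled data, given as (attributes, outcome) pairs.
# Hypothetical attributes: (number of links, number of exclamation marks).
labeled_emails = [
    ((0, 0), "ham"),
    ((1, 0), "ham"),
    ((7, 5), "spam"),
    ((9, 8), "spam"),
]
model = train(labeled_emails)

# Step 2: make predictions on new data for which the label is unknown.
new_emails = [(8, 6), (0, 1)]
predictions = [predict(model, email) for email in new_emails]
print(predictions)  # prints ['spam', 'ham']
```

# A real project would more likely use a library such as scikit-learn and far more data. The point here is only the shape of the workflow: the model sees outcomes during training and is then asked about examples it has never seen.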