Speaker 1: Jeremy Howard
Speaker 2: John
0:00 | Introduction to Lesson 5
0:39 | Building Tabular Models from Scratch
1:16 | Titanic Problem Revisited
1:58 | Linear Model and Neural Net from Scratch Notebook
2:32 | Jupyter Notebook Environment
3:26 | Paperspace Gradient
4:30 | Clean Version of Notebook
5:27 | Kaggle Environment Check
6:19 | Installing Kaggle Module
7:25 | Reading Data with Pandas
8:10 | Handling Missing Values
9:30 | Imputing Missing Values with Mode
10:45 | Pandas Methods and Google Search
12:35 | Filling Missing Values with fillna
13:14 | Simple Imputation Methods
13:57 | Question on Imputation Assumptions
14:16 | Importance of Data and Feature Engineering
15:10 | Data Exploration with describe
16:01 | Histograms and Long Tail Distributions
16:51 | Log Transformation for Long Tail Distributions
18:12 | Categorical Variables and Dummy Variables
19:26 | Creating Dummy Variables with get_dummies
20:33 | drop_first Argument in get_dummies
21:15 | Ignoring Name Column for Now
21:50 | Feature Engineering with Name Column
23:13 | Matrix Multiplication and Element-wise Multiplication
23:33 | PyTorch for Linear Models
24:07 | Tensors and Broadcasting
24:54 | Independent and Dependent Variables
25:58 | Tensor Shape and Rank
26:56 | Multiplying Coefficients by Data
27:14 | Random Coefficients Initialization
27:59 | Setting Random Seed
28:58 | Reproducibility and Understanding Data Variation
29:45 | Matrix-Vector Product and Broadcasting
31:44 | Benefits of Broadcasting
32:36 | NumPy Broadcasting Rules
33:41 | Virtual Copying in Broadcasting
33:57 | Adding Coefficients Together
34:34 | Normalizing Data for Optimization
35:42 | PyTorch Dimensions and Axes
36:48 | Normalizing Data with Maximum or Standard Deviation
37:15 | Adding Coefficients and Predictions
37:42 | Gradient Descent and Loss Function
38:00 | Mean Absolute Value Loss Function
38:37 | Creating Functions for Calculations
39:52 | Calculating Derivatives with requires_grad
40:42 | backward Gradient Function and .grad Attribute
41:09 | Updating Coefficients with Gradient Descent
41:43 | Learning Rate and Loss Reduction
42:05 | Training a Linear Model
42:18 | Splitting Data into Training and Validation Sets
43:04 | Fast.ai for Data Splitting
43:49 | Creating Functions for Training Steps
44:37 | Training the Model with train_model Function
45:05 | Real World Data Sets and Kaggle
45:29 | Examining Coefficients
46:10 | Accuracy Metric for Kaggle Competition
46:47 | Calculating Accuracy
47:11 | Creating an Accuracy Function
47:37 | Code Comments and Notebooks
48:17 | Sigmoid Function for Binary Dependent Variables
49:04 | Sigmoid Function for Squishing Values
50:13 | SymPy for Symbolic Calculations
51:00 | Dynamic Language and Redefining Functions
51:58 | Improved Optimization with Sigmoid
52:32 | Importance of Sigmoid for Binary Dependent Variables
53:14 | Neural Net Architecture Details
53:51 | Sigmoid Function and Fast.ai
54:18 | Question on get_dummies and Test Data
54:48 | Handling New Categories in Test Data
55:41 | Categorical Variables with Many Levels
56:03 | Submitting to Kaggle
56:23 | test.csv and Data Consistency
57:06 | Preprocessing Test Data
57:42 | Kaggle Submission and Results
58:08 | Break
58:25 | Welcome Back
58:47 | Matrix Multiplication in PyTorch
59:10 | Matrix Multiply Operator in Python
1:00:01 | Matrix Multiplication for Neural Networks
1:00:24 | Changing init_coefs to Create a Matrix
1:01:10 | Dependent Variable as a Matrix
1:01:26 | Adding a Trailing Dimension with None
1:02:28 | Unit Axis and Matrix Shape
1:03:16 | Expanding to Neural Networks
1:03:32 | Neural Network with Multiple Coefficients
1:04:20 | Initializing Coefficients for Hidden Layers
1:04:59 | Fiddling with Constants and Learning Rate
1:05:41 | Activations and Matrix Multiplication
1:06:02 | Coefficients for Hidden to Output Layer
1:06:41 | Constant Term for Second Layer
1:07:04 | Initializing Coefficients for Neural Network
1:07:11 | Calculating Predictions with calc_preds
1:08:02 | Neural Network Implementation
1:08:17 | Updating Coefficients for Multiple Layers
1:08:28 | Training the Neural Network
1:09:02 | Comparing Linear Model and Neural Network
1:09:23 | Deep Learning with Multiple Hidden Layers
1:09:32 | Initializing Coefficients for Multiple Layers
1:10:10 | Matrix Multiplication for Multiple Layers
1:10:56 | Activation Functions and Final Layer
1:11:42 | Importance of Final Activation Function
1:11:46 | Deep Learning calc_preds Function
1:11:55 | Updating Coefficients for Multiple Layers
1:12:16 | Experimenting with Code in Notebooks
1:13:00 | Understanding Code through Experimentation
1:13:32 | Deep Learning and Small Data Sets
1:14:19 | Deep Learning for Images and Text
1:14:45 | Feature Engineering for Tabular Data
1:15:17 | Importance of Simple Baselines
1:15:30 | Why You Shouldn’t Build from Scratch
1:16:17 | Framework Notebook for Tabular Data
1:16:44 | Feature Engineering with Fast.ai
1:17:05 | Advanced Feature Engineering Tutorial
1:17:44 | Feature Engineering Function
1:18:00 | Pandas Functions and Tutorials
1:18:31 | Fast.ai Tabular Model Data Set
1:19:05 | Preprocessing with Fast.ai
1:19:39 | Creating a Learner
1:19:57 | Learning Rate Finder
1:20:53 | Choosing a Learning Rate
1:21:16 | Training the Model with Fast.ai
1:21:35 | Submitting to Kaggle with Fast.ai
1:21:53 | test_dl Function for Inference-Time Preprocessing
1:22:23 | Getting Predictions with Fast.ai
1:23:18 | Experimenting with Ensembling
1:23:27 | Ensembling for Improved Predictions
1:24:04 | Ensemble Function for Multiple Models
1:24:28 | Combining Predictions with Mean
1:24:47 | Kaggle Submission with Ensemble
1:25:22 | Question on Ensemble Mode vs. Mean
1:25:31 | Different Averaging Methods for Ensembles
1:26:40 | Random Forests Notebook
1:27:00 | Introduction to Random Forests
1:27:36 | Elegance and Resilience of Random Forests
1:27:51 | Logistic Regression vs. Random Forests
1:28:27 | Difficulty of Implementing Logistic Regression
1:28:52 | Importing from fastai.imports
1:29:03 | Preprocessing Data for Random Forests
1:29:33 | Converting Categorical Variables to Codes
1:30:30 | Why Categorical Codes are Helpful
1:31:00 | Random Forests as Ensembles of Trees
1:31:14 | Binary Splits in Decision Trees
1:31:26 | Example of Binary Split with Sex
1:32:33 | Evaluating a Binary Split Model
1:32:46 | Splitting into Training and Test Sets
1:33:19 | Creating Independent and Dependent Variables
1:33:26 | Making Predictions with Binary Split
1:33:45 | Using scikit-learn for Mean Absolute Error
1:34:07 | Example of Binary Split with Fare
1:34:57 | Kernel Density Plot for Continuous Variables
1:35:43 | Interactive Tool for Binary Split Scoring
1:36:05 | Scoring Binary Splits
1:37:03 | Standard Deviation and Group Size
1:37:46 | Total Score for Binary Split
1:38:01 | Scoring Binary Splits with interact
1:39:02 | Finding the Best Binary Split Automatically
1:39:23 | Finding the Best Split Point for Age
1:39:50 | argmin Function for Finding Minimum Score
1:40:35 | Function for Calculating Best Split Point
1:40:58 | Finding the Best Split Point for All Columns
1:41:14 | Best Binary Split for Titanic Data
1:41:28 | Decision Trees and Random Forests
1:41:50 | 1R Model and its Effectiveness
1:42:16 | Importance of Simple Baselines
1:42:29 | Kaggle Sample Submission with 1R
1:42:40 | Conclusion and Next Lesson
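The core of the first half of the lesson (random coefficient initialization, sigmoid predictions, mean-absolute-error loss, and a gradient-descent update via `requires_grad` and `.backward()`) can be sketched roughly as below. This is a minimal illustration, not the lesson notebook's exact code: the toy data, the learning rate, and the epoch count here are all made up, standing in for the normalized Titanic columns used in the lecture.

```python
import torch

torch.manual_seed(42)  # fixed seed for reproducibility, as discussed in the lesson

# Hypothetical stand-in for the normalized Titanic independent variables:
# 200 rows, 5 columns, already scaled into [0, 1].
t_indep = torch.rand(200, 5)
# Synthetic binary dependent variable correlated with the features.
t_dep = (t_indep.sum(dim=1) > 2.5).float()

def init_coeffs(n):
    # Random coefficients centered on zero, tracking gradients
    return (torch.rand(n) - 0.5).requires_grad_()

def calc_preds(coeffs, indeps):
    # Row-wise sum of coefficient * column, squished into (0, 1) with sigmoid
    return torch.sigmoid((indeps * coeffs).sum(dim=1))

def calc_loss(coeffs, indeps, deps):
    # Mean absolute error, the loss used in the lecture
    return torch.abs(calc_preds(coeffs, indeps) - deps).mean()

def one_epoch(coeffs, lr=2.0):
    loss = calc_loss(coeffs, t_indep, t_dep)
    loss.backward()  # populates coeffs.grad
    with torch.no_grad():
        coeffs.sub_(coeffs.grad * lr)  # gradient descent step
        coeffs.grad.zero_()
    return loss.item()

coeffs = init_coeffs(t_indep.shape[1])
losses = [one_epoch(coeffs) for _ in range(50)]
```

With reasonable settings the recorded losses fall over the epochs; replacing the single coefficient vector with two matrices (and an intermediate ReLU) turns this same loop into the lesson's from-scratch neural net.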
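The binary-split scoring idea from the random forests section — score a split by each side's standard deviation of the dependent variable weighted by group size, lower being better — can be sketched as follows. The data here is synthetic (a fake "Sex"/"Age"/"Survived" frame standing in for the Titanic set), so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for the Titanic training data: survival depends
# strongly on a binary Sex column and not on Age.
df = pd.DataFrame({
    "Sex": rng.integers(0, 2, 500),
    "Age": rng.uniform(0, 80, 500),
})
df["Survived"] = ((df["Sex"] == 1) & (rng.random(500) < 0.8)).astype(float)

def side_score(side, y):
    # One side's score: std dev of the dependent variable, weighted by size
    tot = side.sum()
    if tot <= 1:
        return 0.0
    return y[side].std() * tot

def score(col, y, split):
    # Total score of a binary split: both sides' scores, averaged over all rows
    lhs = col <= split
    return (side_score(lhs, y) + side_score(~lhs, y)) / len(y)

y = df["Survived"]
sex_score = score(df["Sex"], y, 0)   # split on Sex (0 vs. 1)
age_score = score(df["Age"], y, 40)  # split on Age at 40
```

Here `sex_score` comes out lower than `age_score`, because splitting on Sex yields two internally similar groups; trying every candidate split point of every column and taking the argmin of this score gives the automatic best-split search the lesson builds next.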