Speaker 1: Jeremy Howard

Concise Summary

0:03 | Welcome to the last lesson of Part 1
1:09 | Collaborative filtering notebook
1:24 | Creating your own embedding module
1:38 | Importance of understanding the "Linear model and neural net from scratch" notebook
2:16 | PyTorch's handling of parameters
3:06 | PyTorch's parameter tracking mechanism
4:33 | Creating parameters using nn.Parameter
6:02 | PyTorch's automatic gradient calculation
6:23 | PyTorch's nn.Linear layer
7:13 | Linear layer attributes
8:02 | Creating an embedding module from scratch
8:32 | Understanding PyTorch's embedding layer
9:39 | Initializing parameters with random numbers
10:08 | Indexing into parameters
10:33 | Replicating PyTorch's embedding layer from scratch
11:35 | torch.zeros and Tensor.normal_
12:21 | Interpreting the trained movie bias parameter
13:18 | Identifying movies with low and high movie bias
14:19 | Understanding the meaning of movie bias
15:42 | Analyzing user bias
16:08 | Visualizing movie factors using PCA
17:34 | The power of SGD in learning latent factors
18:12 | Using fastai's collab_learner
18:55 | Comparing manual and fastai models
19:20 | Analyzing item bias in the fastai model
20:15 | Examining the source code of collab_learner
21:17 | The usefulness of PCA in other areas
22:16 | Finding similar movies based on embeddings
23:56 | The bootstrapping problem in collaborative filtering
24:42 | Collaborative filtering using deep learning
25:03 | Creating a sequential model for collaborative filtering
25:31 | A simple neural network for collaborative filtering
26:16 | Using fastai's get_emb_sz
27:06 | Training a deep learning model for collaborative filtering
27:24 | Using collab_learner with use_nn=True
28:06 | Combining dot product and neural network components
28:19 | Incorporating metadata into collaborative filtering
28:54 | The issue of bias in collaborative filtering
29:33 | The anime example in collaborative filtering
30:33 | Embeddings beyond collaborative filtering
30:51 | Embeddings in natural language processing
31:05 | Turning words into integers
31:48 | Creating an embedding matrix for NLP
32:21 | Using an embedding matrix to represent a poem
33:21 | Interpreting embeddings in NLP models
34:01 | Common principles in neural network inputs
34:32 | The simplicity of neural network operations
34:49 | Embeddings in tabular analysis
35:02 | Using neural networks for tabular data
35:34 | Separating continuous and categorical columns
36:52 | Creating a tabular learner
37:06 | The tabular model's structure
37:30 | Examining the tabular model's code
38:01 | Automatic embedding size calculation in the tabular learner
38:55 | The forward pass in the tabular model
39:37 | Training a tabular learner
39:47 | Kaggle competition and the tabular model
40:19 | The use of embeddings in a Kaggle competition
41:04 | Embedding layers as linear layers on one-hot encoded inputs
41:24 | Combining embeddings with other models
41:57 | Visualizing embeddings for German regions
42:38 | Reconstructing geography through embeddings
43:38 | Visualizing embeddings for days of the week and months of the year
44:06 | Understanding the inner workings of neural networks
44:25 | Break
44:49 | Reviewing the components of a neural network
45:41 | Introducing convolutions
45:59 | Convolutional neural networks
46:26 | Convolutions for computer vision
46:35 | Using MNIST for a convolution example
47:22 | Recognizing horizontal and vertical edges in an image
48:14 | Convolution as a sliding window operation
48:47 | Representing an image as numbers
49:28 | Creating an edge detector using convolution
50:00 | Convolution as a dot product
51:06 | Convolution as a sliding window of dot products
51:28 | Kernel size in convolution
54:12 | Varying kernel sizes
54:44 | Multiple convolutional layers
55:04 | Convolution with multiple channels
55:33 | Filters in convolution
56:07 | Combining features from multiple channels
57:03 | Multiple convolutional channels
57:27 | Final activations in a convolutional network
58:03 | Max pooling in convolutional networks
58:41 | Max pooling as a sliding window operation
59:09 | Two-by-two max pooling
59:26 | The purpose of max pooling
59:54 | Dense layer in convolutional networks
1:00:02 | Dot product of max-pooled activations
1:00:49 | Modern convolutional network architecture
1:00:53 | Stride-2 convolutions
1:01:28 | Skipping activations in stride-2 convolutions
1:01:42 | Reducing feature size with stride-2 convolutions
1:02:06 | Stride-2 convolutions as an alternative to max pooling
1:02:17 | Multiple stride-2 convolutions
1:02:31 | Average pooling in convolutional networks
1:02:37 | Average pooling as a global prediction
1:03:01 | ImageNet-style image detection
1:03:20 | Average pooling for large objects
1:03:44 | Average pooling for small objects
1:04:01 | Choosing between max pooling and average pooling
1:04:24 | The importance of model details
1:04:46 | Concat pooling in fastai
1:05:15 | Convolution as matrix multiplication
1:05:39 | Matthew Kleinsmith's visualization of convolution
1:06:05 | Convolution as a sliding window operation
1:06:59 | Convolution as a matrix multiplication
1:07:51 | Convolution as a special case of matrix multiplication
1:08:23 | Dropout in convolutional networks
1:08:42 | Dropout in the conv-example spreadsheet
1:09:02 | Randomly deleting activations with dropout
1:09:12 | The dropout mask
1:10:14 | Applying the dropout mask
1:10:41 | Corrupting activations with dropout
1:11:09 | The purpose of dropout
1:11:33 | Dropout as data augmentation for activations
1:12:07 | Dropout for avoiding overfitting
1:12:43 | The dropout paper by Geoffrey Hinton's group
1:13:29 | Dropout's origin in a master's thesis
1:14:01 | The importance of preprint servers
1:14:32 | Reviewing the components of a neural network
1:14:56 | Different activation functions
1:15:12 | The importance of non-linearity in activation functions
1:15:30 | Summarizing the components of a neural network
1:16:03 | The simplicity of neural network operations
1:16:20 | Understanding the inner workings of neural networks
1:16:27 | What to do after completing Part 1
1:16:41 | AMA session
1:16:50 | Radek's book on meta-learning
1:17:25 | The importance of practice and writing
1:18:07 | Rewatching videos and coding along
1:18:20 | Writing blog posts and participating in forums
1:18:41 | The importance of community and study groups
1:19:04 | Building projects
1:19:57 | The Mish activation function
1:20:24 | Google Scholar citations for Mish
1:20:57 | AMA questions
1:21:20 | Staying motivated in the field
1:22:06 | Focusing on specific sub-areas
1:22:45 | The pace of change in the field
1:23:04 | The enduring nature of fundamental concepts
1:23:50 | The trend towards larger models and data
1:24:17 | The history of this question in machine learning
1:24:31 | Engineers' desire to push the envelope
1:24:55 | Smarter solutions over bigger solutions
1:25:02 | fast.ai's DAWNBench success
1:25:43 | Choosing the right problems to solve
1:26:17 | Answering Lucas's question
1:26:27 | Daniel's question about homeschooling
1:26:49 | Using computers and tablets in homeschooling
1:27:16 | The benefits of educational apps
1:27:36 | The importance of fun in learning
1:27:49 | The DragonBox Algebra 5+ app
1:28:08 | The accessibility of algebra for young children
1:28:20 | Discussing homeschooling further
1:28:28 | Farah's question about walkthroughs
1:28:50 | Continuing live-coding sessions
1:29:07 | The focus of live-coding sessions
1:29:22 | Foundational techniques for coders and data scientists
1:29:40 | Planning a software engineering course
1:30:00 | Wade's question about turning a model into a business
1:30:16 | Turning a Gradio prototype into a business venture
1:30:22 | Planning a course on business ventures
1:30:28 | The importance of solving a real problem
1:30:56 | Choosing a problem you understand well
1:31:42 | The start of a business
1:31:47 | The Lean Startup by Eric Ries
1:31:58 | Minimum viable product (MVP)
1:32:03 | Creating a solution that solves the problem
1:32:19 | Launching without a neural network
1:32:48 | Gradually improving the product
1:32:51 | miwojc's question about productivity hacks
1:33:02 | Working 24 hours a day
1:33:06 | Not working too hard
1:33:19 | Spending less time working than most people
1:33:42 | Spending time learning and practicing
1:34:09 | The benefits of slow learning
1:34:21 | Building a base of expertise
1:34:43 | The importance of sleep, diet, and exercise
1:35:09 | Tenacity and finishing things nicely
1:35:27 | Coding-related productivity hacks
1:35:33 | Creating tools to make finishing things easier
1:36:05 | Thanking the audience
1:36:11 | Appreciating the audience's participation
1:36:16 | Giving a like on YouTube
1:36:28 | Helping beginners on forums.fast.ai
1:36:38 | Joining Part 2
1:36:43 | Farewell
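The from-scratch embedding module discussed around 8:02-11:35 boils down to two ideas: wrap a randomly initialized tensor in `nn.Parameter` so PyTorch tracks it for gradient updates, and treat embedding lookup as plain indexing into that tensor. A minimal sketch of a dot-product collaborative-filtering model along those lines (class and variable names here are illustrative, not necessarily those used in the lesson notebook):

```python
import torch
from torch import nn

def create_params(size):
    # nn.Parameter registers the tensor with the module, so it shows
    # up in model.parameters() and gets gradients; initialize with
    # small random values via torch.zeros(...).normal_().
    return nn.Parameter(torch.zeros(*size).normal_(0, 0.01))

class DotProductBias(nn.Module):
    def __init__(self, n_users, n_movies, n_factors):
        super().__init__()
        self.user_factors = create_params((n_users, n_factors))
        self.user_bias = create_params((n_users,))
        self.movie_factors = create_params((n_movies, n_factors))
        self.movie_bias = create_params((n_movies,))

    def forward(self, x):
        # x[:, 0] holds user ids, x[:, 1] movie ids; an "embedding"
        # is just indexing rows out of a parameter matrix.
        users = self.user_factors[x[:, 0]]
        movies = self.movie_factors[x[:, 1]]
        res = (users * movies).sum(dim=1)
        return res + self.user_bias[x[:, 0]] + self.movie_bias[x[:, 1]]
```

Training such a model with SGD is what produces the interpretable movie-bias values and latent factors examined from 12:21 onward.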
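The point made at 41:04, that an embedding layer is just a linear layer applied to one-hot encoded inputs (computed without materializing the one-hot vectors), can be checked in a few lines; the tensor names below are illustrative:

```python
import torch
import torch.nn.functional as F

emb = torch.randn(5, 3)           # 5 categories, 3 latent factors
idx = torch.tensor([0, 2, 2, 4])  # integer category codes

# Indexing rows out of the matrix...
by_index = emb[idx]

# ...gives the same result as multiplying one-hot vectors by it,
# so an embedding is a computational shortcut for that matmul.
one_hot = F.one_hot(idx, num_classes=5).float()
by_matmul = one_hot @ emb
assert torch.allclose(by_index, by_matmul)
```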
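The "sliding window of dot products" view of convolution from 48:14-51:28 can be sketched with explicit loops; this naive version (no padding, stride 1, function name mine) matches what PyTorch's built-in `conv2d` computes:

```python
import torch

def conv2d_manual(image, kernel):
    # Slide the kernel across the image; each output activation is
    # the dot product of the kernel with the patch beneath it.
    h, w = image.shape
    kh, kw = kernel.shape
    out = torch.zeros(h - kh + 1, w - kw + 1)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = (patch * kernel).sum()
    return out

# A top-edge kernel like the one in the lesson: it responds where
# bright pixels sit below dark ones within the 3x3 window.
top_edge = torch.tensor([[-1., -1., -1.],
                         [ 0.,  0.,  0.],
                         [ 1.,  1.,  1.]])
```

A stride-2 variant (1:00:53) would simply step `i` and `j` by 2, halving each output dimension, which is why it serves as an alternative to max pooling.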
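The dropout mask discussed at 1:09:02-1:10:41 is just a random binary tensor multiplied into the activations. A minimal sketch using the common "inverted dropout" scaling (the function name is mine; PyTorch provides this as `nn.Dropout`):

```python
import torch

def dropout(x, p=0.5, training=True):
    # During training, zero each activation with probability p and
    # scale survivors by 1/(1-p) so the expected activation matches
    # inference time, when the mask is skipped entirely.
    if not training or p == 0:
        return x
    mask = (torch.rand_like(x) > p).float()
    return x * mask / (1 - p)
```

Deliberately corrupting activations this way acts like data augmentation applied to activations rather than inputs, which is why it helps avoid overfitting.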