Speaker 1: Jeremy Howard
Speaker 2: John Williams
0:00 | Introduction to NLP and Transformers
1:06 | Why use Hugging Face Transformers?
1:37 | Hugging Face Transformers: State of the art in NLP
2:15 | Hugging Face Transformers: Lower-level library
3:19 | Fine-tuning a pre-trained model
5:01 | ULMFiT: Pioneering fine-tuning
7:31 | ULMFiT: Three steps
9:18 | Transformers: Advantage of modern accelerators
10:03 | Masked language models
10:34 | Question: How to go from a model trained to predict the next word to a model for classification?
11:04 | Visualizing layers of an ImageNet classification model
14:04 | Fine-tuning process: Adding a new random matrix
14:47 | Kaggle competition: US Patent Phrase-to-Phrase Matching
15:08 | Kaggle competitions: Real-world data and problems
16:04 | NLP classification: Sentiment analysis, author identification, legal discovery, etc.
18:05 | Kaggle competition data: Anchor, target, context, and score
19:12 | Turning a similarity problem into a classification problem
20:42 | Deep learning: Turning novel problems into familiar ones
21:05 | Kaggle: Using a GPU
21:19 | Paperspace: Alternative to Kaggle
22:05 | Downloading data and installing packages
23:09 | Notebook tricks: Bash commands and variables
23:47 | Understanding a dataset: CSV files and pandas
24:48 | Key libraries for data science: NumPy, Matplotlib, pandas, and PyTorch
26:04 | Importance of fundamental libraries
26:51 | Pandas DataFrame: Describing data
28:07 | Key features of the dataset: Short documents and repetition
28:42 | Creating a single string with field separators
29:30 | Neural networks work with numbers
29:55 | Tokenization and numericalization
30:29 | Tokenization: Splitting into words or subwords
31:33 | Hugging Face Transformers and Datasets
31:55 | Turning a pandas DataFrame into a Hugging Face Datasets dataset
32:31 | Tokenization and numericalization: Splitting into tokens and turning them into numbers
33:10 | Hugging Face Model Hub: Hundreds of models
33:49 | Pre-trained models: Patent models
34:29 | DeBERTa v3: A good starting point for NLP
35:35 | Model sizes: Small, medium, large
36:07 | AutoTokenizer: Tokenizing the same way as the pre-trained model
36:42 | Tokenizing a string
37:10 | Tokenization: Underscores represent the start of a word
38:16 | Vocabulary: List of unique tokens
38:51 | Tokenization and numericalization: Turning tokens into numbers
39:15 | Parallelizing tokenization with dataset.map
40:06 | Tokenized dataset: Input IDs, numbers representing each token's position in the vocabulary
41:13 | Question: How to choose keywords and the order of fields?
41:41 | Choosing keywords: Arbitrary and flexible
42:44 | Question: Special handling for long fields?
43:07 | Long documents and ULMFiT: No special consideration
43:45 | Transformers and large documents: Challenging
44:33 | Hugging Face Transformers: Expectations about data
44:54 | Target column: Labels
45:33 | Test set: Separate data for evaluating model generalization
46:03 | Overfitting: Identifying and controlling it
46:43 | Plotting polynomials to illustrate overfitting
47:47 | Scikit-learn: Library for classic machine learning methods
48:33 | Underfitting: Model too simple to match the data
49:31 | Overfitting: Model fits the training data too well but generalizes poorly
50:47 | Validation set: Data not used for training, but used for measuring accuracy
51:47 | fast.ai: Always uses a validation set
52:39 | Creating a good validation set: Not just random removal
53:48 | Kaggle competitions: Testing the ability to create a good validation set
54:02 | Kaggle: Overfitting and the importance of a test set
55:04 | Kaggle competitions: Test sets with unseen data
56:04 | Cross-validation: Not about building a good validation set
56:57 | Test set: Data not used for training or validation, used for final evaluation
57:07 | Validation set: Measuring metrics
59:08 | Metrics: Accuracy, Pearson correlation coefficient, etc.
59:53 | Loss function vs. metric
1:00:28 | Metrics in real life: Not a single number
1:01:41 | Metrics and AI: Importance of considering the entire process
1:02:11 | Metrics and AI: The problem with metrics
1:03:04 | Goodhart's Law: When a measure becomes a target, it ceases to be a good measure
1:03:55 | Kaggle: Using the Pearson correlation coefficient
1:04:12 | Pearson correlation coefficient: Measuring similarity between variables
1:05:04 | Understanding the behavior of the Pearson correlation coefficient
1:05:34 | California Housing dataset: Visualizing correlations
1:06:09 | Plotting large datasets: Random sampling
1:06:37 | NumPy corrcoef: Calculating correlation coefficients
1:07:46 | Visualizing correlations: Scatter plots and alpha transparency
1:08:52 | Correlation coefficient: 0.68
1:09:50 | Correlation coefficient: 0.43
1:10:43 | Outliers: Sensitivity of the correlation coefficient
1:11:33 | Correlation coefficient: 0.34
1:12:08 | Correlation coefficient: -0.2
1:12:14 | Understanding the behavior of a new metric: Visualizing different levels
1:12:33 | Reporting correlation after each epoch
1:13:02 | Splitting data into training and validation sets
1:13:42 | Training a model with Hugging Face Transformers
1:14:04 | Trainer: Equivalent of a Learner in fast.ai
1:14:14 | Mini-batches and batch sizes
1:15:04 | Learning rate
1:15:58 | TrainingArguments: Configuration for Hugging Face Transformers
1:16:34 | Creating a model: AutoModelForSequenceClassification
1:17:24 | Creating a Trainer: Equivalent of creating a Learner
1:17:48 | Metrics: Printing out the correlation coefficient
1:18:06 | Training the model
1:19:24 | Question: How to decide when to remove outliers?
1:20:06 | Outliers: Never just remove them
1:21:13 | Outliers: Valuable insights and understanding their source
1:22:08 | Trained model: Similar to a fast.ai Learner
1:22:48 | Predicting with the trained model
1:23:10 | Always look at your inputs and outputs
1:23:28 | Negative predictions and predictions over one: Fixing the problem
1:24:08 | Clipping predictions to [0, 1]
1:24:25 | Creating a submission file
1:25:01 | Deep learning: Progress and opportunities
1:25:33 | NLP: A huge opportunity area
1:26:17 | Subreddit: Automatically generated conversations between GPT-2 models
1:27:37 | NLP: Potential for misuse
1:28:04 | NLP: Controllable social media conversations
1:28:53 | The Guardian article written by GPT-3
1:29:47 | Net neutrality: Auto-generated comments
1:31:33 | Bot classifiers: Difficulty in detecting bot-generated content
1:32:09 | Beating bot classifiers: Including beating the classifier in the loss function
1:32:51 | NLP: Opportunities and concerns
1:33:01 | Question: Should num_labels be 5 instead of 1?
1:33:27 | num_labels: One label for one column
1:33:42 | Regression problem: Treating the target as a continuous variable
1:34:23 | Conclusion: Enjoying NLP and looking forward to the next lesson
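The "creating a single string with field separators" step (28:42) can be sketched like this. The column names (anchor, target, context, score) match the competition data described at 18:05; the toy row values and the exact marker words (TEXT1/TEXT2/ANC1) are illustrative — the model simply learns what the markers mean.

```python
import pandas as pd

# Toy rows shaped like the US Patent Phrase-to-Phrase Matching data;
# the values here are made up for illustration.
df = pd.DataFrame({
    "anchor":  ["abatement", "abatement"],
    "target":  ["eliminating process", "forest region"],
    "context": ["A47", "A47"],
    "score":   [0.5, 0.0],
})

# Concatenate the fields into one document per row, with marker words
# separating the fields. The marker words themselves are arbitrary.
df["input"] = ("TEXT1: " + df.context
               + "; TEXT2: " + df.target
               + "; ANC1: " + df.anchor)
print(df["input"][0])
# TEXT1: A47; TEXT2: eliminating process; ANC1: abatement
```

As the 41:41 chapter notes, the choice of keywords and field order is flexible; what matters is that the format is consistent across training and inference.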
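A minimal sketch of tokenization and numericalization (29:55–38:51), assuming a tiny hand-built subword vocabulary. Real pipelines use a trained tokenizer (e.g. Hugging Face's AutoTokenizer, 36:07), whose vocabulary is learned from a large corpus; the "▁" prefix mirrors the convention mentioned at 37:10, where an underscore marks the start of a word.

```python
# Hypothetical mini-vocabulary for illustration only.
vocab = {"▁abate": 0, "ment": 1, "▁of": 2, "▁pollution": 3, "<unk>": 4}

def numericalize(pieces):
    """Map subword tokens to their vocabulary indices.
    Unknown tokens fall back to the <unk> index."""
    return [vocab.get(t, vocab["<unk>"]) for t in pieces]

# Text already split into subword pieces: "abatement of pollution"
tokens = ["▁abate", "ment", "▁of", "▁pollution"]
print(numericalize(tokens))  # [0, 1, 2, 3]
```

Neural networks work with numbers (29:30), so this token-to-index step is what turns text into model input.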
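The under/overfitting discussion (46:43–49:31) can be reproduced with polynomial fits. This sketch uses NumPy's polyfit rather than the scikit-learn approach shown in the lesson; the degrees and noise level are chosen for illustration. A degree-1 fit is too simple (underfits), degree 2 matches the underlying quadratic, and a high-degree fit chases the noise and generalizes worse.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 20)
y = x**2 + rng.normal(0, 0.3, x.size)          # noisy samples of a quadratic

# Held-out points for measuring generalization (a simple validation set)
x_val = np.linspace(-1.9, 1.9, 20)
y_val = x_val**2 + rng.normal(0, 0.3, x_val.size)

val_mse = {}
for deg in (1, 2, 9):                           # under-fit, good fit, over-fit
    coeffs = np.polyfit(x, y, deg)
    val_mse[deg] = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {deg}: validation MSE {val_mse[deg]:.3f}")
```

The training error always falls as the degree rises; only the validation error (50:47) reveals that the high-degree model fits the training data too well but generalizes poorly.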
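The competition metric (1:03:55–1:06:37) is the Pearson correlation coefficient, computed with NumPy's corrcoef. A small sketch of its behavior, with made-up data:

```python
import numpy as np

def corr(x, y):
    """Pearson correlation coefficient between two 1-D arrays.
    np.corrcoef returns the full 2x2 correlation matrix;
    the off-diagonal entry is the coefficient we want."""
    return np.corrcoef(x, y)[0, 1]

x = np.arange(10, dtype=float)
noise = np.array([0.3, -0.2, 0.1, 0.4, -0.5, 0.2, -0.1, 0.3, -0.4, 0.1])
print(corr(x, x))          # perfectly correlated -> 1.0
print(corr(x, -x))         # perfectly anti-correlated -> -1.0
print(corr(x, x + noise))  # close to 1, degraded slightly by the noise
```

As the 1:10:43 chapter stresses, the coefficient is sensitive to outliers, which is why the lesson spends time visualizing it at different levels before trusting it as a metric.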
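The fix for negative predictions and predictions over one (1:23:28–1:24:08) is a one-line clip. The example values here are made up; the point is simply that raw regression outputs can fall outside the valid score range [0, 1]:

```python
import numpy as np

# Hypothetical raw model outputs for four validation rows
preds = np.array([-0.04, 0.12, 0.55, 1.03])

# Clamp everything into the valid score range before building a submission
clipped = np.clip(preds, 0, 1)
print(clipped)  # clipped == [0.0, 0.12, 0.55, 1.0]
```

This is a case of the "always look at your inputs and outputs" advice at 1:23:10: the out-of-range values only become obvious once you inspect the predictions.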