100 Days of Machine Learning

I received many messages inquiring about my study plan for the 100 Days of Machine Learning challenge. A big thanks to CampusX's YouTube playlists, which were my primary resource throughout the journey. They've put together a dedicated playlist for the 100 days of machine learning and much more—do check it out!

You can explore other resources here.

But first, the prerequisites

I was extremely comfortable with Python and the core data science libraries (Pandas, NumPy, and Matplotlib) before I started out with machine learning. You cannot escape Pandas as a data scientist, since most of the data you'll encounter is in tabular format. If you want to explore Kaggle as a platform as well, you'll need Pandas.

Since I am pursuing a degree in data science, my college syllabus had me practice basic statistics, linear algebra, calculus, and probability as well.

I did my 100 Days of Machine Learning during my summer vacation, covering feature engineering, exploratory data analysis, supervised learning, unsupervised learning, deep learning (ANNs and an intro to CNNs), and natural language processing concepts. The libraries I got to explore were: BeautifulSoup, TensorFlow, pandas-profiling, Streamlit, NLTK, scikit-learn (including make_pipeline), KerasTuner, and spaCy.

Day 0: Let's make your roadmap...

Yes, I could share my roadmap directly, but blindly following it might not be the best approach. Machine Learning is vast, and I'm certain there are hundreds of concepts within this field that I haven't even heard of.

I also covered NLP, so you could try that too. Personally, I believe I should have allocated more time to Deployment, especially considering that 2024 is leaning towards MLOps trends.

Some little tips and things to keep in mind:

  1. Don't stress out about the syntax. I didn't, at least. You kind of get used to it while practicing, and Python is not that wordy.

  2. Focus on why you're learning a concept first. For example, a popular interview question is: 'Why do we need logistic regression if we already have linear regression?' (a small sketch after this list illustrates the answer). Keep in mind that with the rise of AI tools, syntax and rote memorization are not at all necessary. The model is built on your knowledge of data science, your understanding of the problem and domain, and the intuition you possess about the dataset.

  3. Data preprocessing has more weight than modeling in real life. Let me explain. As you keep exploring this field, you'll realize that ML is pretty cool and amazing, but most models fail because data scientists are not able to get the right kind of datasets. You'll never get the perfect dataset. In most cases, you will have to make the best out of the datasets you have. Even a good model performs poorly on bad data, while a weaker model can (and will) surprise you if you focus on the cleaning and preprocessing steps. So, learn them with patience instead of jumping directly into ML.

  4. I included revision days when I could not muster up the motivation to study or code. You can use ChatGPT to generate interview questions or quizzes for yourself. Revise instead of breaking your streak. You can even work on a few Kaggle notebooks or write a few blogs as a way of revising.
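
To make tip 2 concrete, here's a tiny sketch (on made-up numbers) of why linear regression alone doesn't cut it for classification: its outputs aren't probabilities, while logistic regression's are.

```python
# Why logistic regression if we already have linear regression?
# Tiny made-up dataset, purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1], [2], [3], [4], [20]])  # one feature; 20 is an extreme value
y = np.array([0, 0, 1, 1, 1])             # binary target

lin = LinearRegression().fit(X, y)
log = LogisticRegression().fit(X, y)

print(lin.predict([[25]]))           # ~1.28: exceeds 1, so not a probability
print(log.predict_proba([[25]])[0])  # always stays within [0, 1]
```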

If your maths is weak, dedicate the first 20 days to statistics, probability, calculus, and linear algebra. Then you can start following my roadmap.

Day 1 to 10: Warming Up and Handling Data

  • Day 1: Git and GitHub. Please also check out DagsHub.

  • Day 2: Statistics (give it a minimum of 4 hours; you'll learn the rest along the way)

  • Day 3: Probability and Distributions

  • Day 4: NumPy

  • Day 5: Working with CSV files (Pandas) and JSON data format

  • Day 6: Fetching Data from an API (see the snippet after this list)

  • Day 7: Fetching Data from a Database

  • Day 8: Web Scraping

  • Day 9: Exploratory Data Analysis

  • Day 10: MS Excel
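
For Day 6, here's a minimal sketch of pulling JSON from a web API into Pandas. The endpoint URL and the `limit` parameter are placeholders, not a real API; swap in whichever public API you're practicing with.

```python
# Fetch JSON from a web API and load it into a DataFrame (Day 6).
import pandas as pd
import requests

url = "https://api.example.com/v1/records"  # placeholder endpoint
response = requests.get(url, params={"limit": 100}, timeout=10)
response.raise_for_status()                 # fail loudly on HTTP errors

records = response.json()                   # parse the JSON body
df = pd.json_normalize(records)             # flatten nested JSON into columns
df.to_csv("records.csv", index=False)       # save for later analysis (ties into Day 5)
print(df.head())
```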

Day 11 to 20: Feature Engineering

  • Day 11: What is ML, and what are tensors?

  • Day 12: Introduction to Feature Engineering, Scaling - Standardization and Normalization

  • Day 13: Encoding Categorical Data - Ordinal Encoding and One-Hot Encoding (a combined scaling-and-encoding sketch follows this list)

  • Day 14-15: Column Transformations (Power, Function, Log, Reciprocal, Square Root, Box-Cox, and Yeo-Johnson)

  • Day 16: Binning, Binarization, Discretization, Quantile Binning, KMeans Binning

  • Day 17-18: Principal Component Analysis (PCA)

  • Day 19-20: Imputation Techniques

  • If you finish early, practice with more datasets on Kaggle.
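
As a taste of Days 12-13, here's a sketch that standardizes the numeric columns and one-hot encodes the categorical one in a single scikit-learn ColumnTransformer. The column names and values are made up.

```python
# Scaling (Day 12) + encoding (Day 13) in one ColumnTransformer.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age":    [22, 35, 41, 29],
    "salary": [20000, 55000, 72000, 31000],
    "city":   ["Delhi", "Mumbai", "Delhi", "Pune"],
})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "salary"]),               # standardization
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),  # one-hot encoding
])

X = preprocess.fit_transform(df)
print(X)  # 4 rows: 2 scaled numeric columns + 3 one-hot city columns
```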

Day 21 to 30: Supervised ML

The best method to understand the topics in this segment is to first grasp the statistical and geometric intuition, study the visualizations, and then work through the implementation using scikit-learn.

  • Day 21: Linear Regression and Regression Metrics

  • Day 22: Multiple Linear Regression

  • Day 23: Gradient Descent - Introduction (this is overwhelming as a topic; a NumPy sketch follows this list)

  • Day 24: Gradient Descent - Batch and Mini-Batch

  • Day 25: Gradient Descent - Stochastic

  • Day 26: Buffer Period.

  • Day 27: Polynomial Regression

  • Day 28: Bias-Variance Tradeoff and all three regularization techniques

  • Day 29: Perceptron Trick

  • Day 30: Logistic Regression
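
Gradient descent (Days 23-25) feels much less overwhelming once you've written the update rule yourself. Here's a bare-bones batch version for simple linear regression in NumPy, on synthetic data generated as y ≈ 3x + 4 plus noise.

```python
# Batch gradient descent for y = w*x + b, minimizing mean squared error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
y = 3 * X + 4 + rng.normal(0, 1, size=100)  # true w=3, b=4, plus noise

w, b = 0.0, 0.0   # parameters to learn
lr = 0.01         # learning rate
for epoch in range(1000):
    error = (w * X + b) - y                # predictions minus targets
    dw = (2 / len(X)) * np.sum(error * X)  # dMSE/dw
    db = (2 / len(X)) * np.sum(error)      # dMSE/db
    w -= lr * dw                           # one full-batch update per epoch
    b -= lr * db

print(w, b)  # should land close to 3 and 4
```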

Day 31 to 40: Supervised ML

  • Day 31: Classification Metrics

  • Day 32: Softmax Regression

  • Day 33: Multinomial Regression

  • Day 34: Decision Trees and Regression Trees

  • Day 35: Naive Bayes

  • Day 36: Ensemble Learning Overview

  • Day 37: Bagging (Bootstrap Aggregating)

  • Day 38: Random Forests (a short example pairing this with the Day 31 metrics follows this list)

  • Day 39: AdaBoost

  • Day 40: Gradient Boosting for Regression
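
To tie Day 31's metrics to Day 38's model, here's a quick sketch that trains a random forest on scikit-learn's built-in breast cancer dataset and prints precision, recall, and F1. The hyperparameters are arbitrary.

```python
# Random forest (Day 38) evaluated with classification metrics (Day 31).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)

# precision, recall, and F1 per class -- the Day 31 vocabulary in action
print(classification_report(y_test, clf.predict(X_test)))
```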

Day 41 to 50: Supervised ML

  • Day 41: Gradient Boosting for Classification

  • Day 42: XGBoost for Regression

  • Day 43: XGBoost for Classification

  • Day 44: Support Vector Machines

  • Day 45: Support Vector Machines

  • Day 46: Work on a Kaggle Notebook

  • Day 47: Revise a topic you found necessary while working through the Kaggle notebook.

  • Day 48: Hyperparameter Tuning: Grid Search (see the sketch after this list)

  • Day 49: Random Search

  • Day 50: Cross Validation
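
A minimal sketch combining Days 44-50: grid search over a small, arbitrary SVM parameter grid, with 5-fold cross-validation handled by GridSearchCV.

```python
# Grid search (Day 48) + cross-validation (Day 50) for an SVM (Days 44-45).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "C": [0.1, 1, 10],            # regularization strength
    "kernel": ["linear", "rbf"],  # which decision-boundary shape to try
}
search = GridSearchCV(SVC(), param_grid, cv=5)  # 5-fold CV per combination
search.fit(X, y)

print(search.best_params_, search.best_score_)
```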

Day 51 to 60: Unsupervised Learning

  • Day 51: Buffer Period.

  • Day 52: K-Means Clustering (a short example follows this list)

  • Day 53: Agglomerative Clustering (Hierarchical)

  • Day 54: DBSCAN Clustering

  • Day 55: Streamlit

  • Day 56: Streamlit

  • Day 57: Build some projects

  • Day 58: Buffer Period

  • Day 59: Flask

  • Day 60: Flask
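
For Day 52, a short K-Means sketch on synthetic blobs. Here k=3 matches how the data was generated; on real data you'd have to estimate it (for example, with the elbow method).

```python
# K-Means clustering (Day 52) on synthetic, clearly separated blobs.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)

print(km.cluster_centers_)  # one learned centroid per cluster
print(labels[:10])          # cluster assignment for the first 10 points
```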

Day 61 to 70: Domain Knowledge

At some point, I found myself a bit confused about what to dive into next. I ended up checking out videos on business analytics to grasp how machine learning is applied. I also checked out tutorials on marketing analytics and projects related to customer analytics. If you're like me, you can pick areas that connect with your own field. Since I'm studying business, that's where I focused. But hey, you could explore healthcare, marketing, geospatial data analytics, and more. I also took some time to brush up on Tableau, Statistics, and SQL. I didn't want to break my learning streak, so I stuck to data science and covered all these areas.

Day 71 to 80: Natural Language Processing

  • Day 71: Introduction to NLP

  • Day 72: NLP pipelines

  • Day 73: Text Preprocessing (Stop-Word Removal, Tokenization, Stemming, and Lemmatization; an NLTK example follows this list)

  • Day 74: Part-of-Speech Tagging using spaCy

  • Day 75: Text Representation

  • Day 76: Word Embeddings (Word2Vec)

  • Day 77: Text Classification

  • Day 78: Project

  • Day 79: Buffer Period.

  • Day 80: Buffer Period.
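
Here's a minimal Day 73 preprocessing pass with NLTK: tokenize, drop stop words and punctuation, then stem. The sentence is made up, and the `nltk.download` calls fetch the required resources on first run.

```python
# Basic text preprocessing (Day 73): tokenization, stop-word removal, stemming.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)      # tokenizer models
nltk.download("punkt_tab", quiet=True)  # needed by newer NLTK versions
nltk.download("stopwords", quiet=True)

text = "The movies were surprisingly good, and the actors performed well."
tokens = word_tokenize(text.lower())          # tokenization
words = [t for t in tokens if t.isalpha()]    # drop punctuation
stops = set(stopwords.words("english"))
words = [w for w in words if w not in stops]  # stop-word removal
stemmer = PorterStemmer()
print([stemmer.stem(w) for w in words])       # stemming, e.g. 'movies' -> 'movi'
```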

Day 81 to 90 : Deep Learning

  • Day 81: What are neural networks and their types

  • Day 82: MLP and MLP notation

  • Day 83: Forward Propagation

  • Day 84: Backpropagation Algorithm (and how to improve it using memoization)

  • Day 85: Buffer Day

  • Day 86: Buffer Day

  • Day 87: Vanishing Gradient Problem

  • Day 88: Hyperparameter tuning for improving neural networks

  • Day 89: Regularization Techniques: Early Stopping

  • Day 90: Regularization Techniques: Dropout Layers (a Keras sketch covering Days 89-90 follows this list)
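
A small Keras sketch pulling together Days 82, 89, and 90: an MLP with a dropout layer, trained with an early-stopping callback. The data is random and the layer sizes arbitrary; it's only meant to run end to end.

```python
# MLP (Day 82) with dropout (Day 90) and early stopping (Day 89) in Keras.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(500, 20)
y = (X.sum(axis=1) > 10).astype(int)  # synthetic binary target

model = keras.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.3),              # randomly drops 30% of units during training
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = keras.callbacks.EarlyStopping(  # stop once validation loss stalls
    monitor="val_loss", patience=5, restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=100,
          callbacks=[early_stop], verbose=0)
print(model.evaluate(X, y, verbose=0))       # [loss, accuracy]
```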

Day 91 to 100: Deep Learning

  • Day 91: Activation Functions: tanh and sigmoid

  • Day 92: Rectified Linear Unit Function and its variants

  • Day 93: Weight Initialization Techniques: zero and random initialization

  • Day 94: Weight Initialization Techniques: Xavier (Glorot) and He initialization

  • Day 95: Batch Normalization

  • Day 96: Optimizers: SGD with Momentum

  • Day 97: Optimizers: Nesterov Momentum

  • Day 98: Optimizers: AdaGrad

  • Day 99: Optimizers: RMSprop

  • Day 100: Optimizers: Adam (a Keras wiring example follows this list)
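
To see the Day 91-100 topics wired together, here's a Keras sketch using He and Xavier (Glorot) initializers, batch normalization, and the Adam optimizer, with the optimizers from Days 96-99 left as commented alternatives. All hyperparameters are illustrative.

```python
# Weight initialization (Days 93-94), batch norm (Day 95), optimizers (Days 96-100).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation="relu",
                 kernel_initializer="he_normal"),       # He init pairs well with ReLU
    layers.BatchNormalization(),                        # Day 95
    layers.Dense(32, activation="tanh",
                 kernel_initializer="glorot_uniform"),  # Xavier (Glorot) init for tanh
    layers.Dense(1, activation="sigmoid"),
])

optimizer = keras.optimizers.Adam(learning_rate=1e-3)   # Day 100
# The earlier optimizers in the plan, for comparison:
# keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9)                 # Day 96
# keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9, nesterov=True)  # Day 97
# keras.optimizers.Adagrad(learning_rate=1e-2)                           # Day 98
# keras.optimizers.RMSprop(learning_rate=1e-3)                           # Day 99
model.compile(optimizer=optimizer, loss="binary_crossentropy")
model.summary()
```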

Conclusion

I hope this roadmap serves as a helpful guide on your machine learning journey. Remember, learning is a unique and personal adventure, and there's no one-size-fits-all approach. Feel free to explore, experiment, and find what resonates best with you. If you found this blog insightful, consider giving it a thumbs up and sharing it with fellow learners. Let's connect on social media for more updates, discussions, and shared experiences. Your feedback and questions are always welcome – together, let's make this learning experience even more rewarding! Happy coding! 🚀✨