I received many messages inquiring about my study plan for the 100 Days of Machine Learning challenge. A big thanks to CampusX's YouTube playlists, which were my primary resource throughout the journey. They've put together a dedicated playlist for the 100 days of machine learning and much more—do check it out!
You can explore other resources here.
But first, the prerequisites
I was extremely comfortable in Python and data science libraries (Pandas, NumPy, and Matplotlib) before I started out with machine learning. You cannot escape Pandas at all as a data scientist since most data you get introduced to is in tabular format. If you want to explore Kaggle as a platform as well, you would need Pandas.
Since I am pursuing a degree in data science, my college syllabus had me practice basic statistics, linear algebra, calculus, and probability as well.
I did my 100 Days of Machine Learning during my summer vacation, covering feature engineering, exploratory data analysis, supervised learning, unsupervised learning, deep learning (ANNs and an intro to CNNs), and natural language processing concepts. The libraries that I got to explore were: BeautifulSoup, TensorFlow, pandas-profiling, Streamlit, NLTK, scikit-learn (including its make_pipeline helper), KerasTuner, and spaCy.
Day 0: Let's make your roadmap...
Yes, I could share my roadmap directly, but blindly following it might not be the best approach. Machine Learning is vast, and I'm certain there are hundreds of concepts within this field that I haven't even heard of.
I also covered NLP, so you could try that too. Personally, I believe I should have allocated more time to Deployment, especially considering that 2024 is leaning towards MLOps trends.
Some little tips and things to keep in mind:
Don't stress out about the syntax. I didn't, at least. You kind of get used to it while practicing, and Python is not that wordy.
Focus on why you're learning the concept first. For example, a popular interview question is: 'Why do we need logistic regression if we already have linear regression?' Keep in mind that with the rise of AI tools, memorizing syntax by rote is no longer necessary. A model is built on your knowledge of data science, your understanding of the problem and domain, and the intuition you have about the dataset.
Data preprocessing has more weight than modeling in real life. Let me explain. As you keep exploring this field, you'll realize that ML is pretty cool and amazing, but most models fail because data scientists are not able to get the right kind of datasets. You'll never get the perfect dataset. In most cases, you will have to make the best out of the datasets you have. A good model performs poorly with bad data. A weaker model can surprise you (and will) if you focus on the cleaning and preprocessing steps. So, learn them with patience instead of jumping directly into ML.
I included revision days when I could not muster up the motivation to study or code. You can use ChatGPT to generate interview questions or quizzes for yourself. Revise instead of breaking your streak. You can even work on a few Kaggle notebooks or write a few blogs as a way of revising.
If your maths is weak, then dedicate the first 20 days to statistics, probability, calculus, and linear algebra. Then you can start following my roadmap.
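To make the interview question from the tips above a little more concrete, here is a minimal sketch (toy, made-up numbers; NumPy only) of one common answer: a straight line fitted to 0/1 labels can predict values outside the 0 to 1 range, while a sigmoid keeps the output in that range. Reusing the least-squares line as the input to the sigmoid is an assumption made purely for illustration; an actual logistic regression would learn its own weights.

```python
import numpy as np

# Toy, made-up data: hours studied (x) and pass/fail labels (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Ordinary least-squares line fitted directly to the 0/1 labels.
slope, intercept = np.polyfit(x, y, deg=1)


def linear_predict(value):
    return slope * value + intercept


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# For an extreme input the straight line leaves the [0, 1] range,
# so its output cannot be read as a probability.
linear_out = linear_predict(20.0)

# Passing a linear score through the sigmoid keeps it inside (0, 1).
# Reusing the least-squares line as the score is for illustration only.
squashed_out = sigmoid(linear_predict(20.0))

print(linear_out, squashed_out)
```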
Day 1 to 10: Warming Up and Handling Data
Day 1: Git and GitHub. Please also check out DagsHub.
Day 2: Statistics (give it a minimum of 4 hours; the rest you'll learn along the way)
Day 3: Probability and Distributions
Day 4: NumPy
Day 5: Working with CSV files (Pandas) and JSON data format
Day 6: Fetching Data from API
Day 7: Fetching Data from a Database
Day 8: Web Scraping
Day 9: Exploratory Data Analysis
Day 10: MS Excel
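As a rough sketch of what Days 5 to 7 look like in practice, the snippet below loads the same kind of records from CSV text, from JSON text (the shape many APIs return), and from a database, then joins them with Pandas. Everything here is made up for illustration: the names, the columns, and the in-memory SQLite database that stands in for a real file or server.

```python
import io
import sqlite3

import pandas as pd

# CSV: an in-memory string stands in for a real file path.
csv_text = "name,score\nasha,81\nravi,74\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# JSON: a list of records, similar to what many APIs return.
json_text = '[{"name": "asha", "city": "Pune"}, {"name": "ravi", "city": "Delhi"}]'
json_df = pd.read_json(io.StringIO(json_text))

# Database: an in-memory SQLite database stands in for a real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [("asha", 20), ("ravi", 21)])
conn.commit()
sql_df = pd.read_sql_query("SELECT name, age FROM students", conn)
conn.close()

# Combine the three sources on the shared "name" column.
merged = csv_df.merge(json_df, on="name").merge(sql_df, on="name")
print(merged)
```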
Day 11 to 20: Feature Engineering
Day 11: What is ML and what are tensors.
Day 12: Introduction to Feature Engineering, Scaling - Standardization and Normalization
Day 13: Encoding Categorical Data - Ordinal Encoding and One-Hot Encoding
Day 14-15: Column Transformations (Power, Function, Log, Reciprocal, Square Root, Box-Cox, and Yeo-Johnson)
Day 16: Binning, Binarization, Discretization, Quantile Binning, KMeans Binning
Day 17-18: Principal Component Analysis (PCA)
Day 19-20: Imputation Techniques
If you finish early, practice with more datasets on Kaggle.
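Several of the feature engineering topics above (imputation, scaling, one-hot encoding) are usually chained together rather than applied one by one. Here is a minimal sketch using scikit-learn's ColumnTransformer and make_pipeline on a made-up four-row table; the column names and the choice of mean imputation are assumptions for illustration, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up toy data with one missing numeric value.
df = pd.DataFrame({
    "age": [22.0, 35.0, np.nan, 41.0],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})

# Numeric column: fill missing values with the mean, then standardize.
numeric_steps = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())

# Apply different transformations to different columns in one object.
preprocess = ColumnTransformer([
    ("num", numeric_steps, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

features = preprocess.fit_transform(df)
# ColumnTransformer may return a sparse matrix; convert it if so.
if hasattr(features, "toarray"):
    features = features.toarray()

print(features.shape)
```

The output has one standardized column for age plus one column per distinct city.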
Day 21 to 30: Supervised ML
The best way to understand the topics in this segment is to first build the statistical and geometric intuition, look at visualizations, and only then implement them using the scikit-learn library.
Day 21: Linear Regression and Regression Metrics
Day 22: Multiple Linear Regression
Day 23: Gradient Descent - Introduction (this is overwhelming as a topic)
Day 24: Gradient Descent - Batch, Mini Batch
Day 25: Gradient Descent - Stochastic
Day 26: Buffer Period.
Day 27: Polynomial Regression
Day 28: Bias-Variance Tradeoff and all three regularisation techniques (Ridge, Lasso, and Elastic Net)
Day 29: Perceptron Trick
Day 30: Logistic Regression
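The three gradient descent variants from Days 24 and 25 differ only in how many samples feed each update. A minimal NumPy sketch on synthetic data (generated from y = 3x + 2 plus noise; the learning rates and epoch counts are arbitrary choices that happen to work here) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from y = 3x + 2 plus a little noise (made-up values).
X = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * X + 2.0 + rng.normal(0.0, 0.1, size=200)


def gradient_descent(X, y, batch_size, lr=0.1, epochs=200, seed=0):
    """Fit y ~ w*x + b by minimising mean squared error.

    batch_size == len(X) gives batch gradient descent,
    batch_size == 1 gives stochastic gradient descent,
    anything in between gives mini-batch gradient descent.
    """
    shuffle_rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        order = shuffle_rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = (w * X[idx] + b) - y[idx]
            # Gradients of the mean squared error on this batch.
            grad_w = 2.0 * np.mean(error * X[idx])
            grad_b = 2.0 * np.mean(error)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


batch_w, batch_b = gradient_descent(X, y, batch_size=len(X))
mini_w, mini_b = gradient_descent(X, y, batch_size=32)
sgd_w, sgd_b = gradient_descent(X, y, batch_size=1, lr=0.01, epochs=50)

print(batch_w, batch_b)
print(mini_w, mini_b)
print(sgd_w, sgd_b)
```

All three should land close to w = 3 and b = 2; the stochastic version uses a smaller learning rate because each update sees only one noisy sample.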
Day 31 to 40: Supervised ML
Day 31: Classification Metrics
Day 32: Softmax Regression
Day 33: Multinomial Regression
Day 34: Decision Trees and Regression Trees
Day 35: Naive Bayes
Day 36: Ensemble Learning Overview
Day 37: Bagging (Bootstrap Aggregating)
Day 38: Random Forests
Day 39: AdaBoost
Day 40: Gradient Boosting for Regression
Day 41 to 50: Supervised ML
Day 41: Gradient Boosting for Classification
Day 42: XGBoost for Regression
Day 43: XGBoost for Regression
Day 44: Support Vector Machines
Day 45: Support Vector Machines
Day 46: Work on a Kaggle Notebook
Day 47: Revise a topic that the Kaggle notebook showed you need to strengthen.
Day 48: Fine-tuning hyperparameters: Grid Search
Day 49: Random Search
Day 50: Cross Validation
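Days 48 to 50 fit together naturally: grid search and random search both use cross-validation to score each hyperparameter combination. Below is a small sketch with scikit-learn on the built-in Iris dataset; the SVC model and the grid values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation of a single model with default settings.
baseline_scores = cross_val_score(SVC(), X, y, cv=5)

# Grid search tries every combination in the grid (values chosen arbitrarily).
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X, y)

# Random search samples a fixed number of combinations instead of all of them.
random_search = RandomizedSearchCV(SVC(), param_grid, n_iter=3, cv=5, random_state=0)
random_search.fit(X, y)

print(baseline_scores.mean())
print(grid.best_params_, grid.best_score_)
print(random_search.best_params_, random_search.best_score_)
```

With a grid this small, trying everything is cheap; random search starts to pay off when the grid has too many combinations to evaluate exhaustively.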
Day 51 to 60: Unsupervised Learning
Day 51: Buffer Period.
Day 52: K-Means Clustering
Day 53: Agglomerative Clustering (Hierarchical)
Day 54: DBSCAN Clustering
Day 55: Streamlit
Day 56: Streamlit
Day 57: Build some projects
Day 58: Buffer Period
Day 59: Flask
Day 60: Flask
Day 61 to 70: Domain Knowledge
At some point, I found myself a bit confused about what to dive into next. I ended up checking out videos on business analytics to grasp how machine learning is applied. I also checked tutorials about marketing analytics and projects related to customer analytics. If you're like me, you can pick areas that connect with your own field. Since I'm studying business, that's where I focused. But hey, you could explore healthcare, marketing, geospatial data analytics, and more. I also took some time to brush up on Tableau, Statistics, and SQL. I didn't want to break my learning streak, so I stuck to data science and covered all these areas.
Day 71 to 80: Natural Language Processing
Day 71: Introduction to NLP
Day 72: NLP pipelines
Day 73: Text Preprocessing (Removing, Tokenisation, Stemming, and Lemmatisation)
Day 74: Parts of Speech Tagging using spaCy
Day 75: Text Representation
Day 76: Word Embeddings (Word2Vec)
Day 77: Text Classification
Day 78: Project
Day 79: Buffer Period.
Day 80: Buffer Period.
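To give a feel for how text preprocessing (Day 73) feeds into text representation (Day 75), here is a plain-Python sketch of tokenisation, stop-word removal, and a bag-of-words vector. In practice you would likely reach for NLTK, spaCy, or scikit-learn for these steps; the tiny corpus and stop-word list below are made up for illustration.

```python
import re
from collections import Counter

# Tiny made-up corpus and stop-word list, for illustration only.
documents = [
    "The movie was great and the acting was great",
    "The plot was boring",
]
stop_words = {"the", "was", "and"}


def preprocess(text):
    """Lowercase, tokenise on runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in stop_words]


tokenised = [preprocess(doc) for doc in documents]

# The vocabulary fixes the column order of the bag-of-words vectors.
vocabulary = sorted({token for tokens in tokenised for token in tokens})


def bag_of_words(tokens):
    """Count how often each vocabulary word appears in one document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


vectors = [bag_of_words(tokens) for tokens in tokenised]

print(vocabulary)
print(vectors)
```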
Day 81 to 90: Deep Learning
Day 81: What are Neural Networks, and their types
Day 82: MLP and MLP notation
Day 83: Forward Propagation
Day 84: Backpropagation Algorithm (and how to improve it using memoization)
Day 85: Buffer Day
Day 86: Buffer Day
Day 87: Vanishing Gradient Problem
Day 88: Fine-tuning hyperparameters for improving neural networks
Day 89: Regularization Techniques: Early Stopping
Day 90: Regularization Techniques: Dropout Layers
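Forward propagation and backpropagation (Days 83 and 84) are easier to trust once you have written them by hand at least once. The sketch below uses NumPy for a network with one hidden layer; the layer sizes, the sigmoid hidden activation, and the mean squared error loss are assumptions chosen to keep it short. It ends with a finite-difference check, a common way to confirm that hand-written gradients are right.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 4 samples, 3 features, 1 regression target each.
X = rng.normal(size=(4, 3))
y = rng.normal(size=(4, 1))

# One hidden layer with 5 sigmoid units and a linear output unit.
W1 = rng.normal(scale=0.5, size=(3, 5))
b1 = np.zeros((1, 5))
W2 = rng.normal(scale=0.5, size=(5, 1))
b2 = np.zeros((1, 1))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W1, b1, W2, b2):
    """Forward propagation; returns hidden activations and the output."""
    hidden = sigmoid(X @ W1 + b1)
    output = hidden @ W2 + b2
    return hidden, output


def loss_fn(output, y):
    return np.mean((output - y) ** 2)


def backward(X, y, hidden, output, W2):
    """Backpropagation: apply the chain rule layer by layer, output first."""
    n = X.shape[0]
    d_output = 2.0 * (output - y) / n          # dLoss/dOutput
    grad_W2 = hidden.T @ d_output
    grad_b2 = d_output.sum(axis=0, keepdims=True)
    # d_output is computed once and reused here for the earlier layer.
    d_hidden = d_output @ W2.T                 # dLoss/dHidden
    d_z1 = d_hidden * hidden * (1.0 - hidden)  # sigmoid derivative
    grad_W1 = X.T @ d_z1
    grad_b1 = d_z1.sum(axis=0, keepdims=True)
    return grad_W1, grad_b1, grad_W2, grad_b2


hidden, output = forward(X, W1, b1, W2, b2)
grad_W1, grad_b1, grad_W2, grad_b2 = backward(X, y, hidden, output, W2)

# Sanity check: compare one analytic gradient entry with a finite difference.
eps = 1e-6
W1_shifted = W1.copy()
W1_shifted[0, 0] += eps
shifted_loss = loss_fn(forward(X, W1_shifted, b1, W2, b2)[1], y)
numeric = (shifted_loss - loss_fn(output, y)) / eps

print(grad_W1[0, 0], numeric)
```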
Day 91 to 100: Deep Learning
Day 91: Activation Functions: Tanh(x) function, sigmoid function
Day 92: Rectified Linear Unit Function and its variants
Day 93: Weight Initialization Techniques: with zero, with random
Day 94: Weight Initialization Techniques: Xavier/Glorot, He method
Day 95: Batch Normalization
Day 96: Optimizers: SGD with momentum
Day 97: Optimizers : Nesterov Momentum
Day 98: Optimizers: AdaGrad
Day 99: Optimizers : RMSprop
Day 100: Optimizers : Adam
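The optimizers in the last few days build on each other: momentum smooths the direction of the update, RMSprop rescales each parameter's step by a running average of squared gradients, and Adam combines both ideas with bias correction. Here is a minimal NumPy sketch of the three update rules on a toy quadratic loss; the loss function, starting point, and hyperparameter values are assumptions for illustration, and deep learning libraries may differ in small details of their implementations.

```python
import numpy as np


def loss(theta):
    """Toy loss with very different curvature along each axis."""
    return theta[0] ** 2 + 10.0 * theta[1] ** 2


def grad(theta):
    return np.array([2.0 * theta[0], 20.0 * theta[1]])


def sgd_momentum(theta, steps=200, lr=0.01, beta=0.9):
    velocity = np.zeros_like(theta)
    for _ in range(steps):
        # Accumulate a decaying sum of past gradients.
        velocity = beta * velocity + grad(theta)
        theta = theta - lr * velocity
    return theta


def rmsprop(theta, steps=200, lr=0.01, beta=0.9, eps=1e-8):
    avg_sq = np.zeros_like(theta)
    for _ in range(steps):
        g = grad(theta)
        # Running average of squared gradients, one value per parameter.
        avg_sq = beta * avg_sq + (1.0 - beta) * g ** 2
        theta = theta - lr * g / (np.sqrt(avg_sq) + eps)
    return theta


def adam(theta, steps=200, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g        # momentum-style average
        v = beta2 * v + (1.0 - beta2) * g ** 2   # RMSprop-style average
        m_hat = m / (1.0 - beta1 ** t)           # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta


start = np.array([1.0, 1.0])
initial_loss = loss(start)
momentum_loss = loss(sgd_momentum(start))
rmsprop_loss = loss(rmsprop(start))
adam_loss = loss(adam(start))

print(initial_loss, momentum_loss, rmsprop_loss, adam_loss)
```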
Conclusion
I hope this roadmap serves as a helpful guide on your machine learning journey. Remember, learning is a unique and personal adventure, and there's no one-size-fits-all approach. Feel free to explore, experiment, and find what resonates best with you. If you found this blog insightful, consider giving it a thumbs up and sharing it with fellow learners. Let's connect on social media for more updates, discussions, and shared experiences. Your feedback and questions are always welcome – together, let's make this learning experience even more rewarding! Happy coding! 🚀✨