Duration: 6–10 Weeks
Prerequisites: Basics of Python, Pandas, NumPy, Matplotlib
Goal: Start building real ML models and practice SQL, EDA, and feature engineering
Handling missing values (advanced)
Outlier detection (IQR, Z-score)
Data transformations:
  Log transformation
  Scaling (StandardScaler, MinMaxScaler)
  Encoding (Label encoding, One-hot encoding)
Combining & merging multiple datasets
Feature selection basics
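The two outlier rules named above (IQR and Z-score) can be sketched in a few lines of pandas. This is a minimal illustration, not a prescribed method: the helper names `iqr_outliers` and `zscore_outliers`, the 1.5 and 3.0 cut-offs, and the sample data are all assumptions chosen for the example.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def zscore_outliers(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask: True where the absolute z-score exceeds the threshold."""
    z = (s - s.mean()) / s.std(ddof=0)
    return z.abs() > threshold


# Hypothetical data: twenty ordinary values plus one extreme value.
values = pd.Series(list(range(10, 30)) + [500], name="amount")
iqr_mask = iqr_outliers(values)
z_mask = zscore_outliers(values)
print(values[iqr_mask])
print(values[z_mask])
```

Both rules flag only the value 500 here. On small samples the Z-score rule can miss outliers because the extreme value inflates the standard deviation, which is one reason to try both.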
Clean a large CSV dataset
Fix missing values
Handle outliers
Encode categorical data
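The cleaning tasks above could look roughly like this on a small frame. The column names, the values, and the choice of median/mode imputation and IQR capping are assumptions for illustration; a real dataset may call for different strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with gaps in a numeric and a categorical column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Pune", "Delhi", None, "Pune"],
    "income": [30000.0, 52000.0, 1000000.0, 41000.0],
})

clean = df.copy()

# Fix missing values: median for the numeric column, mode for the categorical one.
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["city"] = clean["city"].fillna(clean["city"].mode()[0])

# Handle outliers: cap income at the 1.5 * IQR fences instead of dropping rows.
q1 = clean["income"].quantile(0.25)
q3 = clean["income"].quantile(0.75)
iqr = q3 - q1
clean["income"] = clean["income"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# Encode categorical data: one-hot encode the city column.
encoded = pd.get_dummies(clean, columns=["city"], dtype=int)
print(encoded)
```

Capping keeps every row, which matters when the dataset is small; dropping outlier rows is the alternative when the extreme values are clearly data-entry errors.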
Summary statistics
Distribution analysis
Correlation heatmaps
Pairplot analysis
Identifying patterns & trends
Removing skewness
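As a rough sketch of the numeric side of EDA listed above: summary statistics, a correlation matrix (the table a heatmap would draw), and a skewness check before and after a log transformation. The data is synthetic and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical house-price style data; price is right-skewed by construction.
area = rng.normal(1500, 300, size=500)
price = np.exp(rng.normal(0, 0.8, size=500)) * area * 100
df = pd.DataFrame({"area": area, "price": price})

summary = df.describe()           # summary statistics
corr = df.corr()                  # correlation matrix (input to a heatmap)
skew_before = df["price"].skew()  # sample skewness of the raw column

# Removing skewness: log1p compresses the long right tail.
df["log_price"] = np.log1p(df["price"])
skew_after = df["log_price"].skew()

print(summary)
print(corr)
print(skew_before, skew_after)
```

A skewness close to 0 after the transformation suggests the column is now roughly symmetric, which tends to suit linear models better.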
Matplotlib
Seaborn
EDA on a Real Dataset (Students / Sales / House Prices)
Create 10 charts
Create a full EDA report
GROUP BY and HAVING
JOINs (Inner, Left, Right, Full)
Window Functions:
  ROW_NUMBER
  RANK
  SUM() OVER()
CTEs (Common Table Expressions)
Subqueries
Build analytics queries
Create a mini SQL project on sales dataset
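To try the SQL topics above without installing a database server, one option is Python's built-in `sqlite3` module. The sketch below assumes a made-up `sales` table and a SQLite build recent enough to support window functions (3.25 or later, which current Python releases bundle).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO sales (region, amount) VALUES
  ('North', 100), ('North', 300), ('South', 50), ('South', 70), ('East', 500);
""")

# GROUP BY + HAVING: regions whose total sales exceed 150.
big_regions = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 150
    ORDER BY region
""").fetchall()

# CTE + window functions: number each sale within its region
# and keep a running total per region.
ranked = conn.execute("""
    WITH regional AS (
        SELECT region, amount FROM sales
    )
    SELECT region,
           amount,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn,
           SUM(amount)  OVER (PARTITION BY region ORDER BY amount DESC) AS running_total
    FROM regional
    ORDER BY region, rn
""").fetchall()
conn.close()

print(big_regions)
for row in ranked:
    print(row)
```

HAVING filters groups after aggregation, while window functions keep every row and add a computed column next to it. That difference is the main thing to notice when comparing the two result sets.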
Linear Regression (review)
Logistic Regression
Decision Trees
Random Forest
KNN (K-Nearest Neighbors)
Naive Bayes
K-Means Clustering (unsupervised)
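To show what the unsupervised entry in this list does, here is a simplified K-Means loop in NumPy: assign each point to its nearest center, then move each center to the mean of its points. It is a teaching sketch with a hypothetical `kmeans` helper and synthetic data, not a replacement for a library implementation.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Simplified K-Means: returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct points picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every point to every center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Two well-separated synthetic blobs.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[10, 10], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

labels, centers = kmeans(X, k=2)
print(centers)
```

No target column is used anywhere, which is what separates clustering from the supervised models earlier in the list.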
Train-test split
Cross-validation
Bias vs Variance
Confusion Matrix
Precision, Recall, F1 Score
Overfitting & Underfitting
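The confusion matrix and the metrics derived from it can be computed by hand, which makes the definitions concrete. The function name `binary_metrics` and the label vectors below are made up for the example.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics for 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical labels: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy is 0.8 while precision, recall, and F1 are each 0.75. On imbalanced data the gap between accuracy and F1 is usually much wider, which is why accuracy alone can mislead.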
Build classification & regression models
Compare model accuracies
Tune hyperparameters
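One possible shape for the practice tasks above, using scikit-learn: score several classifiers with 5-fold cross-validation, then tune one of them with a small grid search. The synthetic dataset, the model choices, and the parameter grid are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic classification data standing in for a real dataset.
X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=4, random_state=0)

# Scale inside a pipeline so each CV fold is scaled using only its training part.
models = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Compare mean cross-validated accuracy across models.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Tune hyperparameters of one model with a small, hypothetical grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [3, None]},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)
best_params = search.best_params_

print(scores)
print(best_params, search.best_score_)
```

Comparing cross-validated means rather than a single train-test split gives a steadier ranking, because one lucky split cannot decide the winner.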
Feature scaling
Normalization
Binning / Bucketing
Polynomial features
Feature importance
Removing multicollinearity (VIF)
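Several of the transformations listed above fit in a few lines of pandas. The sketch computes by hand what StandardScaler and MinMaxScaler compute with their default settings; the column names, bin edges, and labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [18, 25, 40, 63],
    "income": [20.0, 35.0, 80.0, 50.0],
})

# Feature scaling (standardization): zero mean, unit variance.
df["income_std"] = (df["income"] - df["income"].mean()) / df["income"].std(ddof=0)

# Normalization (min-max): rescale to the range [0, 1].
income_range = df["income"].max() - df["income"].min()
df["income_minmax"] = (df["income"] - df["income"].min()) / income_range

# Binning / bucketing: turn a numeric column into labelled groups.
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 50, 120],
                         labels=["young", "middle", "senior"])

# Polynomial feature: add a squared term so a linear model can fit a curve.
df["age_sq"] = df["age"] ** 2

print(df)
```

When a train-test split is involved, the mean, standard deviation, minimum, and maximum should come from the training portion only and then be reused on the test portion.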
Improve model accuracy with feature engineering
Feature selection experiments
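A feature selection experiment could start with the variance inflation factor, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing feature j on all the other features. The NumPy sketch below uses a hypothetical `vif` helper and synthetic columns in which `c` is deliberately almost a copy of `a`.

```python
import numpy as np
import pandas as pd


def vif(df: pd.DataFrame) -> pd.Series:
    """VIF for each column, from an ordinary least squares fit on the others."""
    out = {}
    for col in df.columns:
        y = df[col].to_numpy(dtype=float)
        others = df.drop(columns=col).to_numpy(dtype=float)
        X = np.column_stack([np.ones(len(others)), others])  # add intercept
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[col] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(out, name="VIF")


rng = np.random.default_rng(42)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)  # nearly a duplicate of a
features = pd.DataFrame({"a": a, "b": b, "c": c})

scores = vif(features)
print(scores)
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of multicollinearity. Dropping either `a` or `c` and recomputing should bring the remaining values close to 1.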
Advanced visualizations:
Boxplots
Heatmaps
Violin plots
Pairplots
Plot styling
Insights narration
Create an insights dashboard
Present visual findings to a client
Choose any one project:
📌 Project 1: Student Performance Predictor
📌 Project 2: Sales Forecasting Model
📌 Project 3: Customer Segmentation (K-Means)
📌 Project 4: Loan Eligibility Classification
📌 Project 5: House Price Prediction (Regression)
Data cleaning
EDA
Model building
Model comparison
Final accuracy report
Visual presentation
Git & GitHub for version control
Virtual environments
API basics
How to read research papers
You will be able to:
✔ Clean and prepare real-world datasets
✔ Perform full EDA
✔ Write advanced SQL queries
✔ Build multiple ML models
✔ Choose the best model using evaluation metrics
✔ Create intermediate-level data science projects