golfbag
status – finished linear regression (2025-09-17)
12-Week ML + ETL Curriculum (Golf Bag Training)
This curriculum covers 12 canonical ML categories (the "golf bag") with ETL practice.
Each week introduces a dataset, ETL tasks, and ML goals. Use this as a checklist in Obsidian.
Week 1: Linear Regression
- Dataset: California Housing (scikit-learn)
- ETL Tasks:
  - Load CSV, validate schema
  - Handle missing values
  - Normalize numeric fields
- ML Tasks:
  - Train/test split
  - Fit linear regression
  - Interpret coefficients
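
The Week 1 checklist could be sketched roughly as follows. This is a minimal sketch, not a reference solution: the function name `fit_linear_regression`, the median imputation strategy, and the 80/20 split are all assumptions, and it expects the data to be in a pandas DataFrame already.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_linear_regression(df, target, seed=0):
    """Validate the frame, then impute, scale, and fit on a train split.

    Returns (fitted pipeline, test R^2, coefficient per standardized feature).
    The name and defaults are illustrative assumptions.
    """
    # Schema check: target must exist and every feature must be numeric.
    if target not in df.columns:
        raise ValueError(f"missing target column: {target}")
    X = df.drop(columns=[target])
    non_numeric = X.select_dtypes(exclude="number").columns.tolist()
    if non_numeric:
        raise ValueError(f"non-numeric feature columns: {non_numeric}")
    y = df[target]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = make_pipeline(
        SimpleImputer(strategy="median"),  # handle missing values
        StandardScaler(),                  # normalize numeric fields
        LinearRegression(),
    )
    model.fit(X_train, y_train)
    coefs = pd.Series(model.named_steps["linearregression"].coef_, index=X.columns)
    return model, model.score(X_test, y_test), coefs
```

With scikit-learn's copy of the dataset, `fetch_california_housing(as_frame=True).frame` has the target in the `MedHouseVal` column, so the call would be `fit_linear_regression(frame, "MedHouseVal")`. Because the features are standardized inside the pipeline, each coefficient reads as the change in the target per one standard deviation of that feature.
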
Week 2: Logistic Regression
- Dataset: Breast Cancer Wisconsin
- ETL Tasks:
  - Encode the diagnosis label (the features are all numeric)
  - Scale inputs
- ML Tasks:
  - Train binary classifier
  - Evaluate with ROC, precision, recall
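
One possible shape for the Week 2 tasks, using the copy of the dataset bundled with scikit-learn. The 75/25 stratified split and `max_iter=1000` are assumptions chosen for the sketch, not tuned values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled copy: 30 numeric features; target 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling sits inside the pipeline so it is fit on the training split only.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

scores = clf.predict_proba(X_test)[:, 1]  # probability of class 1 (benign)
preds = clf.predict(X_test)
metrics = {
    "roc_auc": roc_auc_score(y_test, scores),
    "precision": precision_score(y_test, preds),
    "recall": recall_score(y_test, preds),
}
print(metrics)
```

Note that precision and recall here are reported for class 1, which is benign in this copy. If malignant is the class of interest, pass `pos_label=0` to the metric functions.
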
Week 3: Decision Trees
- Dataset: Iris Dataset
- ETL Tasks:
  - Basic EDA (histograms, scatter plots)
  - Verify balanced classes
- ML Tasks:
  - Build decision tree
  - Visualize splits
  - Observe overfitting
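
A small sketch of the Week 3 tasks, assuming a text rendering of the splits is enough (plots are left out). The depth values and the 70/30 split are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
class_counts = np.bincount(iris.target)  # one count per class, to check balance

X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0
)

# (train accuracy, test accuracy) per depth limit; None means no limit.
accuracy = {}
for depth in (1, 2, 3, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    accuracy[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))

# Text view of the learned splits for a shallow tree.
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
print(class_counts)
print(export_text(shallow, feature_names=iris.feature_names))
print(accuracy)
```

The overfitting signal is a gap between train and test accuracy that grows with depth. On a dataset as small and clean as Iris, that gap may be slight.
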
Week 4: Random Forests
- Dataset: UCI Adult / Census Income
- ETL Tasks:
  - Encode categorical features
  - Impute missing values
  - Balance classes
- ML Tasks:
  - Train random forest
  - Measure feature importance
  - Compare to logistic regression
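
For Week 4, one way to keep encoding, imputation, and the model together is a single scikit-learn pipeline. This is a sketch under assumptions: `build_forest` and `column_importance` are hypothetical helper names, class balancing is done with `class_weight="balanced"` rather than resampling, and importance is measured by permutation on the original columns rather than by the forest's built-in impurity scores.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.inspection import permutation_importance
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder


def build_forest(numeric_cols, categorical_cols, seed=0):
    """Impute and one-hot encode, then fit a class-weighted random forest."""
    preprocess = ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), numeric_cols),
        ("cat", make_pipeline(
            SimpleImputer(strategy="constant", fill_value="missing"),
            OneHotEncoder(handle_unknown="ignore"),
        ), categorical_cols),
    ])
    forest = RandomForestClassifier(
        n_estimators=200,
        class_weight="balanced",  # reweights classes instead of resampling
        random_state=seed,
    )
    return Pipeline([("prep", preprocess), ("forest", forest)])


def column_importance(model, X, y, seed=0):
    """Permutation importance per original column, largest first."""
    result = permutation_importance(model, X, y, n_repeats=5, random_state=seed)
    return pd.Series(result.importances_mean, index=X.columns).sort_values(ascending=False)
```

Swapping the `forest` step for `LogisticRegression` while keeping the same `prep` step gives a like-for-like baseline for the comparison task.
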
Week 5: K-Means Clustering
- Dataset: MNIST Digits (scikit-learn or full MNIST)
- ETL Tasks:
  - Flatten images into tabular form
  - Apply PCA for preprocessing
- ML Tasks:
  - Run K-Means (k=10)
  - Visualize clusters
  - Evaluate with silhouette score
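
A sketch of the Week 5 flow using scikit-learn's bundled 8x8 digits, which are a smaller stand-in for full MNIST. The choice of 20 PCA components is an assumption, and the visualization step is omitted.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

digits = load_digits()
images = digits.images                      # shape (1797, 8, 8)
X = images.reshape(len(images), -1) / 16.0  # flatten to (1797, 64); pixels range 0..16

# Reduce dimensionality before clustering.
X_reduced = PCA(n_components=20, random_state=0).fit_transform(X)

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

# Silhouette lies in [-1, 1]; higher means tighter, better-separated clusters.
score = silhouette_score(X_reduced, labels)
print(X.shape, X_reduced.shape, round(score, 3))
```

Cluster ids are arbitrary, so cluster 3 need not correspond to the digit 3. Comparing `labels` against `digits.target` requires a mapping step first.
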
Week 6: PCA (Dimensionality Reduction)
- Dataset: UCI Wine Dataset
- ETL Tasks:
  - Normalize features
  - Check correlations
- ML Tasks:
  - Run PCA
  - Plot explained variance
  - Create 2D scatter plot
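
The numbers behind the Week 6 plots can be computed as below, using the Wine data bundled with scikit-learn. The 90% variance threshold is an arbitrary assumption, and the plotting calls are left out.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X = StandardScaler().fit_transform(wine.data)  # 178 samples, 13 features

# Pairwise feature correlations (13 x 13); correlated features are what PCA compresses.
corr = np.corrcoef(X, rowvar=False)

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)  # y-values for the variance plot
n_for_90 = int(np.searchsorted(cumulative, 0.90) + 1)  # components needed to reach 90%

coords_2d = PCA(n_components=2).fit_transform(X)  # x/y for the scatter, colored by wine.target
print(n_for_90, coords_2d.shape)
```
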
Week 7: Neural Networks (MLP)
- Dataset: MNIST Digits
- ETL Tasks:
  - Normalize pixel intensities
  - Train/val/test split
  - Manage batch sizes
- ML Tasks:
  - Build 2-layer MLP
  - Compare accuracy to logistic regression
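
As a lightweight stand-in for a deep learning framework, the Week 7 steps can be tried with scikit-learn's `MLPClassifier` on the bundled 8x8 digits. Everything specific here is an assumption: the single 64-unit hidden layer (hidden plus output making two layers), the batch size of 32, and the 60/20/20 split.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel intensities 0..16 -> 0..1

# 60/20/20 train/val/test split, stratified by digit.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0
)

mlp = MLPClassifier(hidden_layer_sizes=(64,), batch_size=32, max_iter=300, random_state=0)
mlp.fit(X_train, y_train)

baseline = LogisticRegression(max_iter=2000).fit(X_train, y_train)

# Compare on the validation split; the test split stays untouched until the end.
val_acc = {"mlp": mlp.score(X_val, y_val), "logreg": baseline.score(X_val, y_val)}
print(val_acc)
```
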
Week 8: CNN (Convolutions)
- Dataset: CIFAR-10
- ETL Tasks:
  - Resize and augment images
  - Split train/val/test
- ML Tasks:
  - Build CNN with conv + pooling
  - Apply dropout
  - Compare to transfer learning
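
The ETL half of Week 8 can be prototyped in plain NumPy before choosing a framework. The sketch below assumes images arrive as `(H, W, C)` arrays. The helper names `augment` and `split_indices`, the 4-pixel padding, and the 80/10/10 split are illustrative choices, and the CNN itself is not shown.

```python
import numpy as np


def augment(image, rng, pad=4):
    """Random crop (after zero-padding) plus a coin-flip horizontal mirror.

    `image` is an (H, W, C) array; the output has the same shape and dtype.
    """
    h, w, _ = image.shape
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    out = padded[top:top + h, left:left + w]
    if rng.random() < 0.5:
        out = out[:, ::-1]  # flip along the width axis
    return out


def split_indices(n, rng, val_frac=0.1, test_frac=0.1):
    """Shuffle 0..n-1 and cut into disjoint train/val/test index arrays."""
    order = rng.permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return order[n_test + n_val:], order[n_test:n_test + n_val], order[:n_test]
```

CIFAR-10 images are already 32x32, so no resize step is included. Augmentation would normally be applied to training images only, with validation and test images left unchanged.
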
Week 9: RNN / LSTM (Sequences)
- Dataset: Shakespeare Corpus (tiny) or IMDB Reviews
- ETL Tasks:
  - Tokenize sequences
  - Pad to fixed length
  - Manage vocabulary size
- ML Tasks:
  - Train LSTM
  - Predict next char / classify sentiment
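
The three Week 9 ETL tasks fit in a few lines of standard-library Python. This sketch assumes word-level, whitespace tokenization (a character-level variant would iterate over characters instead). The `<pad>`/`<unk>` tokens and the helper names are conventions chosen for illustration.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # reserved tokens, given ids 0 and 1


def build_vocab(texts, max_size):
    """Map the most frequent whitespace tokens to ids, capped at max_size entries."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    vocab = {PAD: 0, UNK: 1}
    for tok, _ in counts.most_common(max_size - len(vocab)):
        vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab, length):
    """Token ids for text, truncated or right-padded to exactly `length`."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in text.lower().split()][:length]
    return ids + [vocab[PAD]] * (length - len(ids))


reviews = ["a great great film", "a dull film"]
vocab = build_vocab(reviews, max_size=5)
print(encode("a great film indeed", vocab, length=6))
```

With `max_size=5`, only the three most frequent words get their own ids. Anything else, such as "indeed" here, maps to the unknown id, and short sequences are filled out with the padding id.
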
Week 10: Transformers (BERT)
- Dataset: SST-2 Sentiment Treebank
- ETL Tasks:
  - Tokenize with HuggingFace
  - Create train/dev/test splits
- ML Tasks:
  - Fine-tune BERT
  - Evaluate F1 score
  - Compare to LSTM baseline
Week 11: Reinforcement Learning (Q-Learning)
- Dataset/Env: OpenAI Gym CartPole
- ETL Tasks:
  - Log state/action/reward data
  - Define schema for episodes
- ML Tasks:
  - Train Q-learning agent
  - Plot reward curve
  - Test convergence
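
To show the update rule and the episode schema without a Gym dependency, the sketch below runs tabular Q-learning on a hypothetical six-cell corridor. It does not use CartPole: CartPole observations are four continuous values, so a tabular agent there would first need them binned into discrete states. The `Transition` fields, the toy `step` function, and all hyperparameters are assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    """One logged step; a list of these is one episode."""
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


N_STATES, GOAL = 6, 5  # hypothetical corridor: states 0..5, goal at the right end
ACTIONS = (0, 1)       # 0 = left, 1 = right


def step(state, action):
    """Toy environment: move one cell; reward 1.0 only on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    log, returns = [], []
    for _ in range(episodes):
        state, episode = 0, []
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:  # exploit, breaking ties at random
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next action (zero if terminal).
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            episode.append(Transition(state, action, reward, nxt, done))
            state = nxt
            if done:
                break
        log.append(episode)
        returns.append(sum(t.reward for t in episode))
    return q, log, returns
```

In this toy setup the per-episode return is only 0 or 1, so episode length (`len(episode)`) is the more informative convergence curve. In CartPole the return itself grows as the agent improves.
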
Week 12: Anomaly Detection
- Dataset: Kaggle Credit Card Fraud Detection
- ETL Tasks:
  - Handle extreme class imbalance
  - Scale features
  - Stratified train/test split
- ML Tasks:
  - Train Isolation Forest
  - Compare to Autoencoder
  - Evaluate precision@k
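
A minimal sketch of the Isolation Forest and precision@k pieces of Week 12. The helper names `precision_at_k` and `isolation_scores` are hypothetical, and the autoencoder comparison is not shown. Any model that yields a per-row anomaly score could be passed to the same metric.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def precision_at_k(y_true, scores, k):
    """Fraction of true positives among the k highest-scoring rows."""
    top_k = np.argsort(scores)[::-1][:k]
    return float(np.mean(np.asarray(y_true)[top_k]))


def isolation_scores(X_train, X_test, seed=0):
    """Fit an Isolation Forest and return anomaly scores (higher = more anomalous)."""
    forest = IsolationForest(n_estimators=200, random_state=seed).fit(X_train)
    # score_samples is lower for more abnormal rows, so negate it.
    return -forest.score_samples(X_test)
```

The model is fit without labels. The fraud labels are only needed at evaluation time, which is also where a stratified split matters, so that the rare positive class appears in the test set at all.
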
Notes
- Each week: spend equal time on ETL and ML; good pipelines make models shine.
- Keep notebooks + scripts versioned (Git).
- Optional stretch goal: orchestrate pipelines (Airflow, Dagster, Prefect).
- Track experiments: MLflow, Weights & Biases, or simple CSV logs.