golfbag

status – finished linear regression (2025-09-17)

๐ŸŒ๏ธ 12-Week ML + ETL Curriculum (Golf Bag Training) Link to heading

This curriculum covers 12 canonical ML categories (the “golf bag”) with ETL practice.
Each week introduces a dataset, ETL tasks, and ML goals. Use this as a checklist in Obsidian.


📅 Week 1: Linear Regression

  • Dataset: California Housing (scikit-learn)
  • ETL Tasks:
    • Load CSV, validate schema
    • Handle missing values
    • Normalize numeric fields
  • ML Tasks:
    • Train/test split
    • Fit linear regression
    • Interpret coefficients
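
The Week 1 steps can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes scikit-learn and NumPy are installed and uses synthetic stand-in data with a known linear relationship, so the recovered coefficients can be checked by eye. For the real dataset, `sklearn.datasets.fetch_california_housing` would replace the synthetic arrays (it downloads the data on first use).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not the California Housing set):
# two features on different scales with a known linear relationship.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(5.0, 2.0, size=500),    # income-like feature
    rng.normal(30.0, 10.0, size=500),  # age-like feature
])
y = 3.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0.0, 0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on training rows only so test statistics do not leak in.
scaler = StandardScaler().fit(X_train)
model = LinearRegression().fit(scaler.transform(X_train), y_train)

r2 = model.score(scaler.transform(X_test), y_test)

# On standardized inputs, each coefficient is the change in y per one
# standard deviation of that feature; dividing by the scaler's scale
# converts it back to change in y per original unit.
coef_per_std = model.coef_
coef_per_unit = model.coef_ / scaler.scale_

print("R^2 on test set:", round(r2, 3))
print("per-std coefficients:", coef_per_std)
print("per-unit coefficients:", coef_per_unit)
```

Comparing `coef_per_std` across features shows which feature moves the prediction most, while `coef_per_unit` should land close to the 3.0 and -0.5 used to generate the data.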

📅 Week 2: Logistic Regression

  • Dataset: Breast Cancer Wisconsin
  • ETL Tasks:
    • Encode the categorical diagnosis label (malignant/benign)
    • Scale inputs
  • ML Tasks:
    • Train binary classifier
    • Evaluate with ROC, precision, recall
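
A possible shape for the Week 2 workflow, assuming scikit-learn and its bundled copy of the dataset (`load_breast_cancer`, which is already numeric with a 0/1 target). The split ratio and `max_iter` value are illustrative choices, not part of the curriculum.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# In scikit-learn's copy, target 0 = malignant and 1 = benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling lives inside the pipeline, so it is fit on training data only.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]  # probability of class 1 (benign)
pred = clf.predict(X_test)

auc = roc_auc_score(y_test, proba)
precision = precision_score(y_test, pred)  # positive class is 1 (benign)
recall = recall_score(y_test, pred)

print(f"ROC AUC={auc:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Because the positive class here is "benign", passing `pos_label=0` to `precision_score` and `recall_score` would report the metrics for malignant cases instead, which is often the more useful view.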

📅 Week 3: Decision Trees

  • Dataset: Iris Dataset
  • ETL Tasks:
    • Basic EDA (histograms, scatter plots)
    • Verify balanced classes
  • ML Tasks:
    • Build decision tree
    • Visualize splits
    • Observe overfitting
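
One way to work through the Week 3 tasks, sketched with scikit-learn's bundled Iris data. The text dump from `export_text` stands in for a graphical tree plot, and the depth limit of 2 is an arbitrary choice for comparison.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

# Class balance check: count samples per class label.
class_counts = Counter(y.tolist())
print("class counts:", dict(class_counts))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# An unconstrained tree can keep splitting until it memorizes the
# training set; a depth-limited tree cannot.
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

# Text view of the split rules of the shallow tree.
print(export_text(shallow, feature_names=iris.feature_names))

for name, tree in (("deep", deep), ("shallow", shallow)):
    print(
        name,
        "depth:", tree.get_depth(),
        "train acc:", round(tree.score(X_train, y_train), 3),
        "test acc:", round(tree.score(X_test, y_test), 3),
    )
```

A gap between train and test accuracy for the deep tree is the overfitting signal to look for. On a dataset as small and clean as Iris the gap may be slight, so repeating the comparison on noisier data makes the effect easier to see.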

📅 Week 4: Random Forests

  • Dataset: UCI Adult / Census Income
  • ETL Tasks:
    • Encode categorical features
    • Impute missing values
    • Balance classes
  • ML Tasks:
    • Train random forest
    • Measure feature importance
    • Compare to logistic regression

📅 Week 5: K-Means Clustering


📅 Week 6: PCA (Dimensionality Reduction)

  • Dataset: UCI Wine Dataset
  • ETL Tasks:
    • Normalize features
    • Check correlations
  • ML Tasks:
    • Run PCA
    • Plot explained variance
    • Create 2D scatter plot
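
The Week 6 steps might look like the sketch below, assuming scikit-learn's bundled copy of the Wine data (`load_wine`). Plotting is left out so the block has no matplotlib dependency; the arrays it produces are the ones you would plot. The 90% variance threshold is an illustrative choice.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# PCA is sensitive to scale, so standardize every feature first.
X_std = StandardScaler().fit_transform(X)

# Feature-by-feature correlation matrix; strongly correlated features
# are the ones PCA can compress.
corr = np.corrcoef(X_std, rowvar=False)

# Full PCA to inspect how variance accumulates across components.
pca = PCA().fit(X_std)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_for_90 = int(np.searchsorted(cumulative, 0.90) + 1)
print("components needed for 90% variance:", n_for_90)

# Two-component projection: the x/y coordinates for a 2D scatter plot,
# typically colored by the class label y.
X_2d = PCA(n_components=2).fit_transform(X_std)
print("projected shape:", X_2d.shape)
```

Plotting `cumulative` against the component index gives the explained-variance curve, and `X_2d[:, 0]` against `X_2d[:, 1]` gives the scatter plot.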

📅 Week 7: Neural Networks (MLP)

  • Dataset: MNIST Digits
  • ETL Tasks:
    • Normalize pixel intensities
    • Train/val/test split
    • Manage batch sizes
  • ML Tasks:
    • Build 2-layer MLP
    • Compare accuracy to logistic regression

📅 Week 8: CNN (Convolutions)

  • Dataset: CIFAR-10
  • ETL Tasks:
    • Resize and augment images
    • Split train/val/test
  • ML Tasks:
    • Build CNN with conv + pooling
    • Apply dropout
    • Compare to transfer learning

📅 Week 9: RNN / LSTM (Sequences)


📅 Week 10: Transformers (BERT)

  • Dataset: SST-2 Sentiment Treebank
  • ETL Tasks:
    • Tokenize with HuggingFace
    • Create train/dev/test splits
  • ML Tasks:
    • Fine-tune BERT
    • Evaluate F1 score
    • Compare to LSTM baseline

📅 Week 11: Reinforcement Learning (Q-Learning)

  • Dataset/Env: OpenAI Gym CartPole
  • ETL Tasks:
    • Log state/action/reward data
    • Define schema for episodes
  • ML Tasks:
    • Train Q-learning agent
    • Plot reward curve
    • Test convergence
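
The logging schema and the Q-learning update can be sketched without any RL library. The block below uses a hypothetical six-cell corridor environment (the `step` function is invented for illustration, not a Gym API), because CartPole observations are continuous and would first need to be discretized into bins before a Q-table applies. The hyperparameters are arbitrary.

```python
import random

N_STATES = 6      # positions 0..5; reaching position 5 ends the episode
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Hypothetical toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    log = []      # one record per transition (the episode schema)
    returns = []  # total reward per episode, for the reward curve
    for episode in range(episodes):
        state, total = 0, 0.0
        for t in range(max_steps):
            # Epsilon-greedy action choice, with random tie-breaking.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            log.append({
                "episode": episode, "t": t, "state": state, "action": action,
                "reward": reward, "next_state": next_state, "done": done,
            })
            state, total = next_state, total + reward
            if done:
                break
        returns.append(total)
    return q, log, returns


q, log, returns = train()
# Greedy policy for each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print("greedy policy:", policy)
print("transitions logged:", len(log))
```

Plotting `returns` (or a moving average of it) gives the reward curve. A simple convergence check is that the greedy policy stops changing between training checkpoints.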

📅 Week 12: Anomaly Detection

  • Dataset: Kaggle Credit Card Fraud Detection
  • ETL Tasks:
    • Handle extreme class imbalance
    • Scale features
    • Stratified train/test split
  • ML Tasks:
    • Train Isolation Forest
    • Compare to Autoencoder
    • Evaluate precision@k
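
Since precision@k is less standard than ROC or F1, here is a small dependency-free sketch of the metric. The function name `precision_at_k` and the toy labels and scores are illustrative assumptions. It expects higher scores to mean "more anomalous".

```python
def precision_at_k(y_true, scores, k):
    """Fraction of true anomalies among the k highest-scoring records."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Indices sorted from most to least anomalous.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(y_true[i] for i in ranked[:k]) / k


# Toy example: 1 marks a fraud record, 0 a normal one.
y_true = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
scores = [0.10, 0.20, 0.90, 0.80, 0.70, 0.05, 0.30, 0.15, 0.25, 0.40]

print(precision_at_k(y_true, scores, 2))  # 0.5
print(precision_at_k(y_true, scores, 4))  # 0.75
```

With scikit-learn's `IsolationForest`, `score_samples` returns lower values for more abnormal points, so the scores would need to be negated before being passed to a function like this one.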

๐Ÿ“ Notes Link to heading

  • Each week: spend equal time on ETL and ML, because good pipelines make models shine.
  • Keep notebooks + scripts versioned (Git).
  • Optional stretch goal: orchestrate pipelines (Airflow, Dagster, Prefect).
  • Track experiments: MLflow, Weights & Biases, or simple CSV logs.