March 15, 2025

W6 | Advanced Regression Techniques & Model Evaluation

Understanding Data & Common Concepts

To build a strong foundation in machine learning, we must first understand the core elements of data representation:

  • Notation: The mathematical symbols used to represent features, labels, and model parameters.
  • Labeled Dataset: Data where each input has a corresponding output, essential for supervised learning.
  • Data-matrix: A structured table where rows represent samples and columns represent features.
  • Label Vector: A column of output values corresponding to each data point.
  • Data-point: A single example from the dataset.

Example: House Price Prediction

Imagine you're trying to predict house prices. The dataset contains information about house size, number of rooms, location, and price.

  • Data-matrix: Each row represents a house, each column represents a feature (size, rooms, location, etc.).
  • Label Vector: The final column represents the actual price of each house.
  • Data-point: A single house with its features and price (see the sketch below).
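
A minimal sketch of this layout in Python, using NumPy and invented numbers purely for illustration:

```python
import numpy as np

# Data matrix X: each row is one house (a data point),
# columns are features: size (m^2), number of rooms, distance to city centre (km).
X = np.array([
    [120.0, 4, 5.2],
    [ 75.0, 2, 1.8],
    [200.0, 6, 12.0],
])

# Label vector y: the price of each house (in thousands), one entry per row of X.
y = np.array([450.0, 320.0, 610.0])

# A single data point is one row of X together with its label.
x_first, y_first = X[0], y[0]
print(x_first, y_first)
```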

Mean Squared Error (MSE) – Measuring Model Accuracy

MSE is a widely used loss function to measure the difference between actual and predicted values:

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

A lower MSE means better model performance.

Example: Predicting Student Exam Scores

If a model predicts that a student will score 85, but the actual score is 90, the squared error is (90-85)^2 = 25. MSE averages these errors across multiple predictions.
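
A short sketch of computing MSE, both directly from the definition and with scikit-learn's mean_squared_error (the exam scores below are invented):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([90, 70, 82])   # actual exam scores
y_pred = np.array([85, 72, 80])   # model predictions

# MSE by its definition: the average of the squared errors.
mse_manual = np.mean((y_true - y_pred) ** 2)

# The same quantity via scikit-learn.
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual, mse_sklearn)  # both ~11.0
```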

Overfitting vs. Underfitting – The Bias-Variance Tradeoff

Understanding how models generalize is critical to machine learning success.

Overfitting – Learning Too Much

When a model memorizes the training data rather than learning general patterns, it performs poorly on unseen data.

  • Example: A student memorizing answers instead of understanding concepts.
  • Solution: Data augmentation, regularization, and pruning.

Toy Dataset – A Small-Scale Example

A toy dataset is a small, simplified dataset used for quick experiments. It helps in understanding model behavior before scaling to large datasets.

Data Augmentation – Expanding Training Data

To combat overfitting, we can artificially expand the training data by:

  • Rotating or flipping images in image classification.
  • Adding noise to numerical datasets.
  • Back-translating or paraphrasing text data for NLP models.

Example: Handwriting Recognition

If you only train a model on perfectly written letters, it may struggle with different handwriting styles. Data augmentation (adding slight distortions) improves generalization.
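
As a minimal sketch of the "adding noise" idea from the list above (synthetic NumPy data, with an invented noise scale):

```python
import numpy as np

rng = np.random.default_rng(0)

# Original training features: 100 samples, 3 numerical features.
X = rng.normal(size=(100, 3))

# Augmented copies: each original sample plus small Gaussian noise.
# The labels for the noisy copies would be duplicated from the originals.
noise_scale = 0.05
X_noisy = X + rng.normal(scale=noise_scale, size=X.shape)

# Train on the original and augmented samples together.
X_augmented = np.vstack([X, X_noisy])
print(X_augmented.shape)  # (200, 3)
```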

Underfitting – Learning Too Little

A model that is too simple fails to capture the underlying patterns in data.

  • Example: A student only learning addition when trying to solve algebra problems.
  • Solution: Increasing model complexity, adding more features, or reducing regularization.

Model Complexity – Finding the Right Balance

A model should be complex enough to capture patterns but simple enough to generalize well.

Regularization – Controlling Model Complexity

Regularization techniques help prevent overfitting by penalizing overly complex models.

Ridge Regression – L2 Regularization

Ridge regression adds a penalty to large coefficient values:

J(\theta) = MSE + \lambda \sum_{j=1}^{n} \theta_j^2

This prevents overfitting by shrinking parameter values.
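
A minimal sketch of ridge regression with scikit-learn's Ridge, where the alpha argument plays the role of λ (the data is synthetic):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.1, size=50)

# alpha is the regularization strength (lambda in the formula above):
# a larger alpha shrinks the coefficients more strongly toward zero.
model = Ridge(alpha=1.0)
model.fit(X, y)
print(model.coef_)
```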

LASSO Regression – L1 Regularization

LASSO (Least Absolute Shrinkage and Selection Operator) forces some coefficients to become zero, effectively selecting features:

J(\theta) = MSE + \lambda \sum_{j=1}^{n} |\theta_j|

This helps with feature selection in high-dimensional data.

Example: Movie Recommendation System

LASSO regression can eliminate unimportant features (like a user’s browser history) while keeping relevant ones (like movie genre preference) to improve recommendations.
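
A minimal sketch of this effect with scikit-learn's Lasso on synthetic data (not a real recommendation dataset), where the irrelevant features end up with zero coefficients:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
# Only the first two features actually influence the target;
# the remaining four are irrelevant "noise" features.
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = Lasso(alpha=0.1)
model.fit(X, y)
print(model.coef_)  # coefficients of the irrelevant features are (near) zero
```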

Cross-Validation – Evaluating Model Performance

To ensure our model generalizes well, we use cross-validation techniques.

k-Fold Cross-Validation

  • Splits the data into k subsets (folds)
  • Trains the model on k-1 folds and tests on the remaining fold
  • Repeats this k times, so every fold serves once as the test set (see the sketch below)
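
A minimal sketch of 5-fold cross-validation with scikit-learn (synthetic data; the ridge model is chosen only for illustration):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=60)

# 5-fold CV: train on 4 folds, test on the held-out fold, repeated 5 times.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(Ridge(alpha=1.0), X, y,
                         cv=cv, scoring="neg_mean_squared_error")
print(-scores)         # per-fold MSE
print(-scores.mean())  # average MSE across folds
```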

Leave-One-Out Cross-Validation (LOOCV)

  • Uses all data points except one for training
  • Tests on the excluded data point
  • Repeats for every data point (see the sketch below)
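
A minimal LOOCV sketch with scikit-learn's LeaveOneOut, again on synthetic data:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(scale=0.1, size=30)

# Leave-one-out: each of the 30 points is held out once as the test set.
scores = cross_val_score(Ridge(alpha=1.0), X, y,
                         cv=LeaveOneOut(), scoring="neg_mean_squared_error")
print(len(scores))      # 30 folds, one per data point
print(-scores.mean())   # average squared error over the held-out points
```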

Example: Diagnosing Disease with Medical Data

Cross-validation ensures that a model predicting disease outcomes generalizes well across different patients, avoiding bias from a specific subset of data.

Probabilistic View of Regression

Regression can also be viewed through a probabilistic lens by modeling the likelihood of output values given input features. This viewpoint enables uncertainty estimation and underlies Bayesian regression techniques.

Example: Weather Prediction

Instead of predicting a single temperature, a probabilistic regression model can output a temperature range with probabilities, helping meteorologists communicate uncertainty.
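
One way to sketch this idea, assuming scikit-learn's BayesianRidge and synthetic "temperature" data, so that each prediction comes with a standard deviation rather than a single number:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
# Synthetic feature (e.g. yesterday's temperature) and noisy target (today's).
X = rng.uniform(0, 30, size=(200, 1))
y = 0.9 * X[:, 0] + 2.0 + rng.normal(scale=1.5, size=200)

model = BayesianRidge()
model.fit(X, y)

# predict(..., return_std=True) returns a mean prediction and a standard
# deviation, i.e. a distribution over temperatures rather than a point estimate.
mean, std = model.predict(np.array([[25.0]]), return_std=True)
print(f"predicted {mean[0]:.1f} ± {2 * std[0]:.1f} degrees (≈95% interval)")
```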

By mastering these advanced concepts in W6, students will gain a deeper understanding of model evaluation, regularization techniques, and strategies for handling overfitting and underfitting.