March 15, 2025

W8 | Generative Models & Naïve Bayes Classifier

Understanding Data & Common Concepts

Before diving into generative models, let's revisit key foundational concepts:

  • Notation: The mathematical symbols used to represent data, probabilities, and classifiers.
  • Labeled Dataset: A dataset where each input is associated with an output label, crucial for supervised learning.
  • Data-matrix: A structured representation where each row is a data sample, and each column is a feature.
  • Label Vector: A column of target values corresponding to each data point.
  • Data-point: An individual sample from the dataset.
  • Label Set: The unique categories that a classification model predicts.

Example: Handwritten Digit Recognition

In a dataset like MNIST (used for recognizing handwritten digits 0-9):

  • Data-matrix: Each row is a flattened image of a digit; each column holds one pixel's intensity value.
  • Label Vector: The digit associated with each image.
  • Data-point: A single handwritten digit image.
  • Label Set: {0, 1, 2, ..., 9} (the possible digits).
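
To make these terms concrete, here is a minimal sketch (using NumPy, with random placeholder values standing in for the real MNIST pixels) of how the data-matrix, label vector, data-point, and label set appear in code:

```python
import numpy as np

# Placeholder for MNIST: 100 images, each 28x28 = 784 pixels,
# flattened so every row is one data-point and every column one pixel.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(100, 784))  # data-matrix: (n_samples, n_features)
y = rng.integers(0, 10, size=100)          # label vector: one digit per row of X

print(X.shape)         # (100, 784) -> 100 data-points with 784 features each
print(X[0].shape)      # (784,)     -> a single data-point
print(sorted(set(y)))  # label set: drawn from {0, 1, ..., 9}
```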

Discriminative vs. Generative Modeling

Machine learning models can be classified into discriminative and generative approaches:

  • Discriminative Models learn a direct mapping between input features and labels.
    • Example: Logistic Regression, Support Vector Machines (SVMs)
    • Focus: Finding decision boundaries between different classes.
  • Generative Models learn how data is generated and use that to classify new inputs.
    • Example: Naïve Bayes, Gaussian Mixture Models (GMMs)
    • Focus: Estimating probability distributions for each class.
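
The contrast also shows up directly in code. A minimal sketch (scikit-learn, with synthetic toy data) fitting one model of each kind to the same dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression  # discriminative
from sklearn.naive_bayes import GaussianNB           # generative

# Synthetic two-class toy data.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

disc = LogisticRegression().fit(X, y)  # learns P(Y|X) directly
gen = GaussianNB().fit(X, y)           # models P(X|Y) per class, then applies Bayes' theorem

print(disc.predict(X[:3]))
print(gen.predict(X[:3]))
```

Both classifiers expose the same predict interface; the difference is what they estimate internally.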

Real-Life Analogy

  • Discriminative Approach: A detective directly looking for evidence linking a suspect to a crime.
  • Generative Approach: A detective first understanding how crimes are generally committed and then determining if the suspect fits a known pattern.

Generative Models

Generative models estimate the probability distribution of the data within each class (together with each class's prior), then use Bayes' theorem to classify new data points.

Example: Speech Generation

Generative models can be used to generate realistic speech samples by learning distributions of phonemes in human speech.

Naïve Bayes – A Simple Yet Powerful Generative Model

Naïve Bayes is based on Bayes' Theorem:

P(Y|X) = P(X|Y) · P(Y) / P(X)

Where:

  • P(Y|X) is the posterior probability of class Y given input X.
  • P(X|Y) is the likelihood of observing X given class Y.
  • P(Y) is the prior probability of class Y.
  • P(X) is the evidence, the overall probability of input X.
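
Since P(X) is the same for every candidate class, classification only needs the numerator: the predicted class is the Y that maximizes P(X|Y) · P(Y).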

Naïve Assumption: Features are assumed to be conditionally independent given the class, so the likelihood P(X|Y) factors into a product of per-feature probabilities, which makes the calculations tractable.

Example: Spam Email Detection

  • Features: Presence of words like "free," "win," "prize."
  • P(Spam | Email Content) is computed based on word probabilities in spam vs. non-spam emails.
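
A minimal from-scratch sketch of this computation (all probabilities below are made-up illustrative values; each feature is the presence or absence of a keyword):

```python
# Hypothetical per-word probabilities estimated from training emails.
p_word_given_spam = {"free": 0.60, "win": 0.40, "prize": 0.30}
p_word_given_ham  = {"free": 0.05, "win": 0.02, "prize": 0.01}
p_spam, p_ham = 0.4, 0.6  # assumed class priors

def posterior_spam(words_present):
    """P(Spam | words), multiplying per-word terms under the naive assumption."""
    score_spam, score_ham = p_spam, p_ham
    for word in p_word_given_spam:
        if word in words_present:
            score_spam *= p_word_given_spam[word]
            score_ham *= p_word_given_ham[word]
        else:
            score_spam *= 1 - p_word_given_spam[word]
            score_ham *= 1 - p_word_given_ham[word]
    # Normalize by P(X) = sum of the scores over both classes.
    return score_spam / (score_spam + score_ham)

print(posterior_spam({"free", "prize"}))  # ~0.99 -> classified as spam
```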

Challenges and Solutions in Naïve Bayes

1. Zero Probability Problem

  • Problem: If a word never appears in the training spam emails, then P(X|Spam) = 0 for any email containing it, and the entire posterior product collapses to zero.
  • Solution: Laplace Smoothing (adding a small value to all counts), as in the sketch below.
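
A minimal sketch of Laplace (add-one) smoothing with hypothetical counts:

```python
def smoothed_prob(word_count, total_words, vocab_size, alpha=1.0):
    # Add alpha to every word's count so no estimate is exactly zero.
    return (word_count + alpha) / (total_words + alpha * vocab_size)

# Hypothetical counts: "lottery" never appeared among 10,000 observed
# spam words drawn from a 5,000-word vocabulary.
print(smoothed_prob(0, 10_000, 5_000))    # ~6.7e-05 instead of 0.0
print(smoothed_prob(300, 10_000, 5_000))  # ~0.02, shrunk slightly from the raw 0.03
```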

2. Feature Independence Assumption

  • Problem: Features are often correlated (e.g., "discount" and "offer" frequently appear together).
  • Solution: Use models like Bayesian Networks or Hidden Markov Models.

3. Handling Continuous Data

  • Problem: The standard Naïve Bayes formulation assumes discrete (categorical) features, so raw continuous values cannot be plugged in directly.
  • Solution: Use Gaussian Naïve Bayes, which models each continuous feature with a per-class normal distribution.
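
A sketch of the class-conditional likelihood Gaussian Naïve Bayes uses for one continuous feature (the per-class mean and variance would normally be estimated from training data; the numbers here are hypothetical):

```python
import math

def gaussian_likelihood(x, mean, var):
    # Normal density: P(x | class) for a feature fitted per class.
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical per-class statistics for one feature (e.g., message length):
print(gaussian_likelihood(120.0, mean=100.0, var=400.0))  # higher under class A
print(gaussian_likelihood(120.0, mean=300.0, var=900.0))  # near zero under class B
```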

Example: Sentiment Analysis

Naïve Bayes is commonly used for classifying product reviews as positive or negative based on word frequencies.
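
In practice this takes only a few lines with scikit-learn. A minimal sketch, using a tiny made-up review set (MultinomialNB applies Laplace smoothing by default via alpha=1.0):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

reviews = ["great product, loved it", "terrible quality, broke fast",
           "works great", "awful, do not buy"]
labels = ["positive", "negative", "positive", "negative"]

vectorizer = CountVectorizer()   # word-frequency features
X = vectorizer.fit_transform(reviews)
model = MultinomialNB().fit(X, labels)

print(model.predict(vectorizer.transform(["loved it, great quality"])))  # expected: ['positive']
```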

By mastering W8 concepts, students will be able to understand probabilistic models, apply generative classification, and solve real-world problems using Naïve Bayes.