March 15, 2025

W7 | Classification & Decision Trees

Understanding Data & Common Concepts

To build a strong foundation in machine learning classification, we must first understand the core elements of data representation:

  • Notation: Mathematical symbols used to represent features, labels, and classifiers.
  • Labeled Dataset: Data where each input has a corresponding label, crucial for supervised learning.
  • Data-matrix: A structured table where rows represent samples and columns represent features.
  • Label Vector: A column of output values corresponding to each data point.
  • Data-point: A single example from the dataset.
  • Label Set: The unique categories in classification problems (e.g., "spam" or "not spam").

Example: Email Spam Detection

Imagine a dataset containing emails with features like word frequency, sender address, and subject length.

  • Data-matrix: Each row represents an email, each column represents a feature.
  • Label Vector: The final column indicates whether the email is spam or not.
  • Data-point: A single email with its features and label.
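
A minimal sketch in NumPy of how such a dataset could be laid out; the three features and their values below are invented for illustration:

```python
import numpy as np

# Data-matrix X: one row per email (data-point), one column per feature.
# Hypothetical columns: frequency of the word "free", sender reputation
# score, subject length in characters.
X = np.array([
    [0.30, 0.1, 42.0],   # email 1
    [0.00, 0.9, 18.0],   # email 2
    [0.25, 0.2, 55.0],   # email 3
])

# Label vector y: one entry per row of X (1 = spam, 0 = not spam).
y = np.array([1, 0, 1])

# A single data-point is one (feature row, label) pair.
x0, y0 = X[0], y[0]
print(X.shape, y.shape)  # (3, 3) (3,)
```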

Zero-One Error – Measuring Classification Accuracy

Zero-One Error calculates the fraction of incorrect classifications:

$$\text{Error} = \frac{1}{n} \sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$$

where $I(\cdot)$ is the indicator function: 1 when the prediction $\hat{y}_i$ differs from the true label $y_i$, and 0 otherwise. A lower error means better model performance.

Example: Identifying Cat vs. Dog Images

If a classifier labels 10 images as "cat" or "dog" and misclassifies 2 of them, the Zero-One Error is 2/10 = 0.2 (or 20%).
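
A quick sketch of the computation; the ten predictions below are hypothetical:

```python
import numpy as np

y_true = np.array(["cat"] * 10)               # all ten images are cats
y_pred = np.array(["cat"] * 8 + ["dog"] * 2)  # two hypothetical mistakes

# Zero-one error: fraction of predictions that disagree with the truth.
error = np.mean(y_true != y_pred)
print(error)  # 0.2
```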

Linear Classifier – Simple Classification Approach

A linear classifier separates data using a straight line (or hyperplane in higher dimensions).

Example: Pass or Fail Prediction

A model predicting student success based on hours studied and past performance might use a line to separate "pass" and "fail" students.
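
A minimal sketch of such a linear decision rule; the weights and bias below are assumed rather than learned from data:

```python
import numpy as np

def linear_classify(x, w, b):
    """Predict "pass" if the point lies on or above the line w.x + b = 0."""
    return "pass" if np.dot(w, x) + b >= 0 else "fail"

# Hypothetical weights: hours studied and past grade both push toward "pass".
w = np.array([0.5, 0.8])
b = -4.0

print(linear_classify(np.array([10.0, 3.0]), w, b))  # pass
print(linear_classify(np.array([1.0, 1.0]), w, b))   # fail
```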

K-Nearest Neighbors (KNN) – Instance-Based Learning

KNN classifies a point based on the majority class among its "k" nearest neighbors.

Example: Movie Genre Classification

If the 5 nearest neighbors of a new movie are 3 action movies and 2 dramas, KNN assigns it to "action" by majority vote.
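
A small sketch of KNN with Euclidean distance and majority voting; the movie features are made up for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features: (action scenes per hour, share of quiet dialogue).
X_train = np.array([[5.0, 0.2], [6.0, 0.1], [4.0, 0.3], [1.0, 0.8], [0.0, 0.9]])
y_train = np.array(["action", "action", "action", "drama", "drama"])

print(knn_predict(X_train, y_train, np.array([5.0, 0.25]), k=5))  # action
```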

Decision Trees – Interpretable Classification Models

Decision Trees split the data at each node on the feature and threshold that best separate the classes (e.g., by maximizing information gain), forming a tree structure.

Binary Tree Structure

Each decision node splits into two branches based on a threshold.
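
A sketch of the single threshold test performed at a node; the feature index and threshold here are assumptions:

```python
import numpy as np

def split_node(X, y, feature, threshold):
    """Send points with X[:, feature] <= threshold left, the rest right."""
    mask = X[:, feature] <= threshold
    return (X[mask], y[mask]), (X[~mask], y[~mask])

X = np.array([[25.0], [40.0], [60.0], [80.0]])  # a single numeric feature
y = np.array([0, 0, 1, 1])
(left_X, left_y), (right_X, right_y) = split_node(X, y, feature=0, threshold=50.0)
print(left_y, right_y)  # [0 0] [1 1]
```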

Entropy – Measuring Node Impurity

Entropy measures uncertainty in a node:

$$H = -\sum_i p_i \log_2 p_i$$

where $p_i$ is the proportion of class $i$ in the node. A lower entropy means purer nodes.
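
A direct sketch of the formula over the class proportions in a node; a pure node has entropy 0:

```python
import numpy as np

def entropy(labels):
    """H = -sum(p_i * log2(p_i)) over the class proportions p_i in a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

print(entropy([1, 1, 0, 0]))  # 1.0    -- maximally mixed two-class node
print(entropy([1, 1, 1, 0]))  # ~0.811 -- less mixed, so lower entropy
```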

Example: Loan Approval

A decision tree for loan approval may split data based on salary, credit score, and debt-to-income ratio.

Decision Stump – A Simple Tree

A decision stump is a decision tree with only one split.

Example: Filtering Spam Emails

A decision stump might classify an email as spam based only on whether the subject line contains "free money".
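
A sketch of the stump as a single rule; the trigger phrase is the hypothetical feature from the example:

```python
def stump_classify(subject):
    """One split on one feature: does the subject contain "free money"?"""
    return "spam" if "free money" in subject.lower() else "not spam"

print(stump_classify("FREE MONEY waiting for you"))  # spam
print(stump_classify("Meeting agenda for Monday"))   # not spam
```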

Growing a Tree – Building a Powerful Classifier

Decision trees grow by recursively splitting nodes until stopping criteria (e.g., max depth) are met.

Example: Diagnosing a Disease

A decision tree might first check fever, then cough, and finally blood test results to diagnose flu vs. COVID-19.
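
A minimal sketch using scikit-learn, which grows a tree by recursive splitting; a built-in medical dataset stands in for the flu/COVID example, and max_depth=3 is an arbitrary stopping criterion:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Grow the tree by recursive splitting, stopping at depth 3;
# criterion="entropy" picks each split by information gain.
clf = DecisionTreeClassifier(max_depth=3, criterion="entropy", random_state=0)
clf.fit(X, y)

print(clf.get_depth())  # 3
print(clf.score(X, y))  # training accuracy
```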
