Understanding Data & Common Concepts
To build a strong foundation in machine learning classification, we must first understand the core elements of data representation:
- Notation: Mathematical symbols used to represent features, labels, and classifiers.
- Labeled Dataset: Data where each input has a corresponding label, crucial for supervised learning.
- Data-matrix: A structured table where rows represent samples and columns represent features.
- Label Vector: A column of output values corresponding to each data point.
- Data-point: A single example from the dataset.
- Label Set: The unique categories in classification problems (e.g., "spam" or "not spam").
Example: Email Spam Detection
Imagine a dataset containing emails with features like word frequency, sender address, and subject length.
- Data-matrix: Each row represents an email, each column represents a feature.
- Label Vector: The final column indicates whether the email is spam or not.
- Data-point: A single email with its features and label.
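To make these terms concrete, here is a minimal sketch in Python (NumPy is assumed; the feature values and labels below are made up purely for illustration):

```python
import numpy as np

# Data-matrix X: each row is one email (a data-point),
# each column is one feature: [frequency of the word "free", unknown sender (0/1), subject length]
X = np.array([
    [0.12, 1, 45],   # email 1
    [0.00, 0, 12],   # email 2
    [0.30, 1, 60],   # email 3
])

# Label vector y: 1 = spam, 0 = not spam (the label set is {0, 1})
y = np.array([1, 0, 1])

print(X.shape)  # (3, 3): 3 data-points, 3 features
print(y.shape)  # (3,): one label per row of X
```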
Zero-One Error – Measuring Classification Accuracy
Zero-One Error is the fraction of incorrect classifications: the number of misclassified points divided by the total number of points. A lower error means better model performance.
Example: Identifying Cat vs. Dog Images
If a classifier labels 10 images as "cat" but 2 of them are actually dogs, the Zero-One Error is 2/10 = 0.2 (or 20%).
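A small sketch of how the Zero-One Error could be computed, assuming the true and predicted labels are held in NumPy arrays:

```python
import numpy as np

def zero_one_error(y_true, y_pred):
    """Fraction of predictions that disagree with the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

# 10 images: the classifier gets 2 wrong, so the error is 0.2
y_true = np.array(["cat"] * 8 + ["dog"] * 2)
y_pred = np.array(["cat"] * 10)
print(zero_one_error(y_true, y_pred))  # 0.2
```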
Linear Classifier – Simple Classification Approach
A linear classifier separates data using a straight line (or hyperplane in higher dimensions).
Example: Pass or Fail Prediction
A model predicting student success based on hours studied and past performance might use a line to separate "pass" and "fail" students.
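A minimal sketch of such a linear decision rule; the weights and bias below are hypothetical rather than learned from real data (in practice they would be fit by, e.g., a perceptron or logistic regression):

```python
import numpy as np

# Hypothetical weights for [hours_studied, past_performance] and a bias term.
w = np.array([0.8, 0.5])
b = -6.0

def predict_pass(features):
    """Linear classifier: 'pass' if w . x + b >= 0, else 'fail'."""
    score = np.dot(w, features) + b
    return "pass" if score >= 0 else "fail"

print(predict_pass([10, 3]))  # many study hours, strong record -> "pass"
print(predict_pass([2, 1]))   # few hours, weak record -> "fail"
```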
K-Nearest Neighbors (KNN) – Instance-Based Learning
KNN classifies a point based on the majority class among its "k" nearest neighbors.
Example: Movie Genre Classification
If, with k = 5, a new movie's five nearest neighbors are 3 action movies and 2 dramas, KNN assigns it to "action" by majority vote.
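A short sketch of k-nearest-neighbor voting on toy movie features; the features, the Euclidean distance metric, and k = 5 are assumptions for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy features: [explosions per minute, dialogue share]
X_train = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]])
y_train = ["action", "action", "action", "drama", "drama"]

print(knn_predict(X_train, y_train, np.array([0.6, 0.4])))  # "action" (3 of 5 neighbors)
```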
Decision Trees – Interpretable Classification Models
Decision Trees split the data at each node on the feature (and threshold) that best separates the classes, for example the split that most reduces impurity, forming a tree structure.
Binary Tree Structure
Each decision node splits into two branches based on a threshold.
Entropy – Measuring Node Impurity
Entropy measures the uncertainty (impurity) of the class labels in a node, defined as H = -Σ_k p_k log2(p_k), where p_k is the proportion of class k in the node. Lower entropy means a purer node.
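A minimal sketch of computing entropy from the labels in a node, assuming NumPy:

```python
import numpy as np

def entropy(labels):
    """Entropy H = -sum(p_k * log2(p_k)) over the class proportions in a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

print(entropy(["spam"] * 5 + ["ham"] * 5))   # 1.0   (maximally impure 50/50 split)
print(entropy(["spam"] * 9 + ["ham"] * 1))   # ~0.47 (mostly pure)
print(entropy(["spam"] * 10))                # 0.0   (completely pure)
```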
Example: Loan Approval
A decision tree for loan approval may split data based on salary, credit score, and debt-to-income ratio.
Decision Stump – A Simple Tree
A decision stump is a decision tree with only one split.
Example: Filtering Spam Emails
A decision stump might classify spam based only on whether the subject contains "free money" or not.
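A one-split stump for this spam example might look like the following sketch (the keyword and labels are illustrative assumptions, not a fitted model):

```python
def stump_classify(subject: str) -> str:
    """Decision stump: a single split on one feature (the subject line)."""
    if "free money" in subject.lower():
        return "spam"
    return "not spam"

print(stump_classify("FREE MONEY inside!!!"))       # spam
print(stump_classify("Meeting agenda for Monday"))  # not spam
```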
Growing a Tree – Building a Powerful Classifier
Decision trees grow by recursively splitting nodes until stopping criteria (e.g., max depth) are met.
Example: Diagnosing a Disease
A decision tree might first check fever, then cough, and finally blood test results to diagnose flu vs. COVID-19.
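One way to sketch this is with scikit-learn's DecisionTreeClassifier; the patient data below is hypothetical, entropy is used as the split criterion, and max_depth serves as the stopping criterion:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical patients: [fever (deg C), cough (0/1), blood marker]
X = np.array([
    [39.0, 1, 7.2],
    [37.1, 0, 2.1],
    [38.5, 1, 6.8],
    [36.8, 0, 1.9],
    [38.9, 0, 7.5],
    [37.0, 1, 2.4],
])
y = ["covid", "flu", "covid", "flu", "covid", "flu"]

# The tree grows by recursively splitting nodes, stopping at max_depth or pure nodes.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=["fever", "cough", "blood_marker"]))
```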