Cross-Entropy in AI
Cross-entropy is a concept from information theory that is widely used in machine learning to compare two probability distributions: it measures how well a predicted distribution Q describes outcomes that actually follow a true distribution P. In AI, it is most commonly used as a loss function that scores how well a model's predicted probabilities match the actual (true) labels.
🧠 Intuition
Think of cross-entropy as a way to answer:
“How surprised is the model when it sees the true answer?”
- If the model assigns high probability to the correct answer → low surprise → low loss
- If the model assigns low probability to the correct answer → high surprise → high loss
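To see the “surprise” idea numerically, here is a minimal Python sketch that computes -log p for a few probabilities a model might assign to the correct answer. It assumes the natural logarithm (the usual choice in machine learning); the helper name `surprise` is hypothetical.

```python
import math


def surprise(p_true: float) -> float:
    """Loss for one prediction: -log of the probability given to the true answer."""
    return -math.log(p_true)


# More probability on the correct answer -> less surprise -> lower loss.
for p in [0.99, 0.8, 0.5, 0.2, 0.01]:
    print(f"p(true answer) = {p:<4}  loss = {surprise(p):.3f}")
```

The loss falls to 0 only when the true answer gets probability 1, and it rises steeply as that probability approaches 0.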
Formal Definition
For a true distribution P and predicted distribution Q, cross-entropy is:
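H(P, Q) = -Σ_x P(x) · log Q(x)
Here log is usually the natural logarithm, so the loss is measured in nats; the example numbers below use it.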
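The definition translates almost line for line into code. This is a minimal sketch, assuming the natural log and plain Python lists for P and Q; `cross_entropy` is an illustrative name, not a library function. Terms with P(x) = 0 are skipped, following the convention that 0 · log Q(x) = 0.

```python
import math
from typing import Sequence


def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """H(P, Q) = -sum_x P(x) * log Q(x), using the natural log (result in nats)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x > 0:  # terms with P(x) = 0 contribute nothing
            total -= p_x * math.log(q_x)
    return total


# A soft (non-one-hot) target distribution compared with a prediction.
print(cross_entropy([0.5, 0.5, 0.0], [0.4, 0.4, 0.2]))
```

If Q(x) = 0 for an outcome with P(x) > 0, `math.log` raises `ValueError` in this sketch; the true cross-entropy is infinite in that case, which is why practical implementations often clip probabilities or compute the loss from logits instead.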
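In classification (the simplified case), the true label is usually one-hot encoded: P(x) = 1 for the true class and 0 for every other class. Every term of the sum except the true class's then vanishes, leaving:
Loss = -log(predicted_probability_of_true_class)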
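A quick Python check that the general sum and the one-hot shortcut give the same number. Both helper names are hypothetical, and class 2 corresponds to index 1 because Python indexes from zero.

```python
import math


def full_cross_entropy(p, q):
    """General form: -sum P(x) * log Q(x), skipping terms where P(x) = 0."""
    return -sum(p_x * math.log(q_x) for p_x, q_x in zip(p, q) if p_x > 0)


def one_hot_cross_entropy(q, true_index):
    """One-hot shortcut: only the true class's term survives."""
    return -math.log(q[true_index])


q = [0.1, 0.8, 0.1]   # predicted probabilities
p = [0, 1, 0]         # one-hot label: class 2, i.e. index 1
print(full_cross_entropy(p, q), one_hot_cross_entropy(q, 1))
```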
Example
Suppose you're doing a classification task with 3 classes:
- True label: [0, 1, 0] (class 2 is correct)
Case 1: Good prediction
Predicted: [0.1, 0.8, 0.1]
Loss = -log(0.8) ≈ 0.22 (low)
Case 2: Bad prediction
Predicted: [0.7, 0.2, 0.1]
Loss = -log(0.2) ≈ 1.61 (high)
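The lower the probability assigned to the true class, the higher the loss. With a one-hot label, only that probability matters: how the rest is split among the wrong classes does not change the loss.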
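The two cases can be reproduced in a few lines of Python (natural log assumed, matching the numbers above):

```python
import math

true_label = [0, 1, 0]              # one-hot: class 2 is correct
true_index = true_label.index(1)    # zero-based index 1 in Python

predictions = {
    "Case 1 (good)": [0.1, 0.8, 0.1],
    "Case 2 (bad)": [0.7, 0.2, 0.1],
}

losses = {}
for name, probs in predictions.items():
    losses[name] = -math.log(probs[true_index])  # one-hot cross-entropy
    print(f"{name}: loss = {losses[name]:.2f}")
# Case 1 (good): loss = 0.22
# Case 2 (bad): loss = 1.61
```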
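⚙️ Why Cross-Entropy is Used
- Works directly with the model's predicted probabilities
- Strongly penalizes confident wrong predictions: as the probability of the true class approaches 0, -log of it grows without bound
- Differentiable, which makes it well suited to gradient-based optimization
- Pairs well with softmax: for softmax followed by cross-entropy, the gradient with respect to the logits is simply the predicted probabilities minus the one-hot label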
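The softmax point can be checked numerically. The sketch below, in Python with hypothetical helper names and arbitrary example logits, computes the loss directly from logits using the log-sum-exp trick, then compares the closed-form gradient softmax(z) − y against a finite-difference estimate.

```python
import math


def softmax(z):
    """Turn logits into probabilities; subtracting the max logit avoids overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(z, true_index):
    """-log softmax(z)[true_index], computed stably as logsumexp(z) - z[true_index]."""
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
    return log_sum_exp - z[true_index]


def analytic_grad(z, true_index):
    """Closed-form gradient w.r.t. the logits: softmax(z) - one_hot(true_index)."""
    probs = softmax(z)
    return [p - (1.0 if i == true_index else 0.0) for i, p in enumerate(probs)]


def numeric_grad(z, true_index, eps=1e-6):
    """Central finite differences, used only to check the closed form."""
    grad = []
    for i in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        diff = softmax_cross_entropy(z_plus, true_index) - softmax_cross_entropy(z_minus, true_index)
        grad.append(diff / (2 * eps))
    return grad


logits = [2.0, 0.5, -1.0]  # arbitrary example logits; the true class is index 1
print(analytic_grad(logits, 1))
print(numeric_grad(logits, 1))
```

This simple gradient is one reason frameworks fuse the two steps; for example, PyTorch's `torch.nn.CrossEntropyLoss` expects raw logits rather than probabilities.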
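Relationship to Other Concepts
- Entropy: Measures the uncertainty in a single distribution, H(P) = -Σ_x P(x) · log P(x)
- Cross-Entropy: Measures the mismatch between two distributions
- KL divergence: The extra surprise incurred by predicting with Q when the data actually follow P:
KL(P || Q) = CrossEntropy(P, Q) - Entropy(P)
Because Entropy(P) depends only on the true distribution, minimizing cross-entropy with respect to the model's Q is equivalent to minimizing the KL divergence.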
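A small numerical check of the identity, using made-up distributions P and Q and the natural log. The function names are illustrative.

```python
import math


def entropy(p):
    """H(P) = -sum P(x) log P(x)."""
    return -sum(p_x * math.log(p_x) for p_x in p if p_x > 0)


def cross_entropy(p, q):
    """H(P, Q) = -sum P(x) log Q(x)."""
    return -sum(p_x * math.log(q_x) for p_x, q_x in zip(p, q) if p_x > 0)


def kl_divergence(p, q):
    """KL(P || Q) = sum P(x) log(P(x) / Q(x))."""
    return sum(p_x * math.log(p_x / q_x) for p_x, q_x in zip(p, q) if p_x > 0)


p = [0.6, 0.3, 0.1]    # "true" distribution (example values)
q = [0.5, 0.25, 0.25]  # predicted distribution (example values)

h_p, h_pq, kl = entropy(p), cross_entropy(p, q), kl_divergence(p, q)
print(f"H(P) = {h_p:.4f}, H(P, Q) = {h_pq:.4f}, KL(P || Q) = {kl:.4f}")
print(abs(kl - (h_pq - h_p)) < 1e-12)  # checks KL = cross-entropy - entropy
```

With a one-hot P, Entropy(P) = 0, so in standard classification the cross-entropy loss and KL(P || Q) are the same number.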
🧩 Where You'll See It
- Classification models (e.g., logistic regression, neural networks)
- Language models (predicting next-word probabilities)
- Image classification tasks
- Any probabilistic prediction system
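In all of these settings the loss is usually averaged over many predictions: over the examples in a batch for a classifier, or over token positions for a language model. A minimal Python sketch of the batch average, with illustrative values and a hypothetical function name:

```python
import math


def mean_cross_entropy(batch_probs, labels):
    """Average one-hot cross-entropy over a batch.

    batch_probs: one predicted distribution per example (each row sums to 1).
    labels: the true class index for each example.
    """
    losses = [-math.log(probs[y]) for probs, y in zip(batch_probs, labels)]
    return sum(losses) / len(losses)


# Three examples from a 3-class classifier (illustrative values).
batch = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.3, 0.3, 0.4]]
labels = [1, 1, 2]
result = mean_cross_entropy(batch, labels)
print(result)
```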
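🧭 Quick Summary
- Cross-entropy measures how poorly a predicted probability distribution matches the true one
- Lower is better; with a one-hot label the minimum, 0, is reached when the true class gets probability 1
- It's the default loss function for most classification problems in AI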