March 20, 2025

Softmax vs LogSoftmax: What Is the Difference?

Both Softmax and LogSoftmax are activation functions used in machine learning models, especially for classification tasks, but they serve different purposes. Let's compare them in terms of functionality, use cases, and performance.


1๏ธโƒฃ Softmax

  • Purpose: Softmax is used to convert raw scores (logits) into probabilities for multi-class classification. It ensures that the sum of the output probabilities equals 1.
  • Output: A vector of probabilities between 0 and 1 for each class.
  • Use Case: Most commonly used in the output layer of a classification model to assign probabilities to different classes.

Formula:

S_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}

Where:

  • x_i is the raw score (logit) for class i,
  • The denominator sums the exponentials of all logits so that the probabilities add up to 1.

Example:

import numpy as np

def softmax(x):
    exp_x = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return exp_x / np.sum(exp_x)

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))  # Output: [0.659, 0.242, 0.099]
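
As a quick sanity check (reusing the softmax function and logits defined just above), the output is a valid probability distribution:

probs = softmax(logits)
print(np.sum(probs))  # 1.0 — the probabilities sum to one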

2๏ธโƒฃ LogSoftmax

  • Purpose: LogSoftmax is the logarithmic version of Softmax. Instead of returning probabilities, it returns logarithms of the probabilities.
  • Output: A vector of log-probabilities. This is more numerically stable, especially when calculating loss functions like cross-entropy.
  • Use Case: Commonly used when log-probabilities are needed, especially for more stable loss function computations (e.g., in combination with negative log-likelihood loss).

Formula:

\text{LogSoftmax}(x_i) = \log\left(\frac{e^{x_i}}{\sum_{j} e^{x_j}}\right) = x_i - \log\left(\sum_{j} e^{x_j}\right)

Where:

  • x_i is the raw score (logit) for class i,
  • The log-sum-exp term (the log of the Softmax denominator) is subtracted as the normalization term.

Example:

import numpy as np

def logsoftmax(x):
    max_x = np.max(x)  # subtract the max for numerical stability
    log_sum_exp = np.log(np.sum(np.exp(x - max_x)))
    return x - max_x - log_sum_exp

logits = np.array([2.0, 1.0, 0.1])
print(logsoftmax(logits))  # Output: [-0.417, -1.417, -2.317]
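
Since LogSoftmax is just the logarithm of Softmax, taking np.log of the Softmax output should give the same values (reusing the softmax function from the first example):

print(np.log(softmax(logits)))  # [-0.417, -1.417, -2.317]
print(logsoftmax(logits))       # [-0.417, -1.417, -2.317]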

🔑 Key Differences

Feature             | Softmax                                   | LogSoftmax
Output              | Probabilities (values between 0 and 1)   | Log-probabilities (logarithms of the probabilities)
Purpose             | Converts logits to probabilities         | Converts logits to log-probabilities
Formula             | e^{x_i} / \sum_j e^{x_j}                 | x_i - \log(\sum_j e^{x_j})
Numerical stability | Less stable, especially for large logits | More stable; avoids overflow/underflow
Use case            | Multi-class classification output        | Paired with cross-entropy / negative log-likelihood loss
Efficiency          | Needs a separate log step if log-probabilities are required | Computes log-probabilities directly, which is cheaper when feeding a loss such as cross-entropy
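
The numerical-stability row is easiest to see with very large logits, where the naive exponential overflows. A minimal sketch (the large logit values below are made up purely for illustration):

import numpy as np

logits = np.array([1000.0, 1001.0, 1002.0])

# Naive softmax without the max-subtraction trick overflows (inf / inf = nan).
naive = np.exp(logits) / np.sum(np.exp(logits))

# The stabilized log-softmax still produces finite log-probabilities.
max_x = np.max(logits)
log_probs = logits - max_x - np.log(np.sum(np.exp(logits - max_x)))

print(naive)      # [nan nan nan] (with an overflow warning)
print(log_probs)  # approx. [-2.408, -1.408, -0.408]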

๐Ÿ› ๏ธ When to Use Each?

  • Use Softmax:
    • When you need actual probabilities (values between 0 and 1) for interpretation or decision-making in classification tasks.
  • Use LogSoftmax:
    • When you need logarithmic probabilities and want numerical stability for tasks like cross-entropy loss, which requires log-probabilities for the loss calculation.
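
As a minimal sketch of that pairing, assuming PyTorch is available (the article's own examples use only NumPy, and the class index below is just an illustrative label): LogSoftmax followed by negative log-likelihood gives the same loss as cross-entropy applied directly to the raw logits.

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])  # a batch of one example with three classes
target = torch.tensor([0])                # hypothetical true-class index

# Route 1: LogSoftmax, then negative log-likelihood loss.
log_probs = F.log_softmax(logits, dim=1)
loss_nll = F.nll_loss(log_probs, target)

# Route 2: cross-entropy on the raw logits (it applies log-softmax internally).
loss_ce = F.cross_entropy(logits, target)

print(loss_nll.item(), loss_ce.item())  # both approx. 0.417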

Which One to Use?

  • Softmax is more suitable when you are directly interpreting the output as probabilities.
  • LogSoftmax is preferred when you are using log-probabilities in loss calculations, especially when you need better numerical stability (e.g., using cross-entropy loss with logits).

Let me know if you need further clarification!
