March 20, 2025

Softmax vs LogSoftmax: What Is the Difference?

Both Softmax and LogSoftmax are activation functions used in machine learning models, especially for classification tasks, but they serve different purposes. Let's compare them in terms of functionality, use cases, and performance.


1๏ธโƒฃ Softmax

  • Purpose: Softmax is used to convert raw scores (logits) into probabilities for multi-class classification. It ensures that the sum of the output probabilities equals 1.
  • Output: A vector of probabilities between 0 and 1 for each class.
  • Use Case: Most commonly used in the output layer of a classification model to assign probabilities to different classes.

Formula:

S_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}

Where:

  • x_i is the raw score (logit) for class i,
  • The denominator sums the exponentials of all logits so that the probabilities add up to 1.

Example:

import numpy as np

def softmax(x):
    exp_x = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return exp_x / np.sum(exp_x)

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))  # Output: [0.659, 0.242, 0.099]
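
As a quick sanity check (reusing the softmax function and logits defined just above), the output is a valid probability distribution:

probs = softmax(logits)
print(np.sum(probs))  # 1.0 — the probabilities sum to one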

2๏ธโƒฃ LogSoftmax

  • Purpose: LogSoftmax is the logarithmic version of Softmax. Instead of returning probabilities, it returns logarithms of the probabilities.
  • Output: A vector of log-probabilities. This is more numerically stable, especially when calculating loss functions like cross-entropy.
  • Use Case: Commonly used when log-probabilities are needed, especially for more stable loss function computations (e.g., in combination with negative log-likelihood loss).

Formula:

\text{LogSoftmax}(x_i) = \log\left(\frac{e^{x_i}}{\sum_{j} e^{x_j}}\right) = x_i - \log\left(\sum_{j} e^{x_j}\right)

Where:

  • x_i is the raw score (logit) for class i,
  • The log-sum-exp term (the log of the Softmax denominator) is subtracted as the normalization term.

Example:

import numpy as np

def logsoftmax(x):
    max_x = np.max(x)  # subtract the max for numerical stability
    log_sum_exp = np.log(np.sum(np.exp(x - max_x)))
    return x - max_x - log_sum_exp

logits = np.array([2.0, 1.0, 0.1])
print(logsoftmax(logits))  # Output: [-0.417, -1.417, -2.317]
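
Since LogSoftmax is just the logarithm of Softmax, taking np.log of the Softmax output should give the same values (reusing the softmax function from the first example):

print(np.log(softmax(logits)))  # [-0.417, -1.417, -2.317]
print(logsoftmax(logits))       # [-0.417, -1.417, -2.317]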

🔑 Key Differences

Feature             | Softmax                                   | LogSoftmax
Output              | Probabilities (values between 0 and 1)   | Log-probabilities (logarithms of the probabilities)
Purpose             | Converts logits to probabilities         | Converts logits to log-probabilities
Formula             | e^{x_i} / \sum_j e^{x_j}                 | x_i - \log(\sum_j e^{x_j})
Numerical stability | Less stable, especially for large logits | More stable; avoids overflow/underflow
Use case            | Multi-class classification output        | Paired with cross-entropy / negative log-likelihood loss
Efficiency          | Needs a separate log step if log-probabilities are required | Computes log-probabilities directly, which is cheaper when feeding a loss such as cross-entropy
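
The numerical-stability row is easiest to see with very large logits, where the naive exponential overflows. A minimal sketch (the large logit values below are made up purely for illustration):

import numpy as np

logits = np.array([1000.0, 1001.0, 1002.0])

# Naive softmax without the max-subtraction trick overflows (inf / inf = nan).
naive = np.exp(logits) / np.sum(np.exp(logits))

# The stabilized log-softmax still produces finite log-probabilities.
max_x = np.max(logits)
log_probs = logits - max_x - np.log(np.sum(np.exp(logits - max_x)))

print(naive)      # [nan nan nan] (with an overflow warning)
print(log_probs)  # approx. [-2.408, -1.408, -0.408]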

๐Ÿ› ๏ธ When to Use Each?

  • Use Softmax:
    • When you need actual probabilities (values between 0 and 1) for interpretation or decision-making in classification tasks.
  • Use LogSoftmax:
    • When you need logarithmic probabilities and want numerical stability for tasks like cross-entropy loss, which requires log-probabilities for the loss calculation.
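
As a minimal sketch of that pairing, assuming PyTorch is available (the article's own examples use only NumPy, and the class index below is just an illustrative label): LogSoftmax followed by negative log-likelihood gives the same loss as cross-entropy applied directly to the raw logits.

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])  # a batch of one example with three classes
target = torch.tensor([0])                # hypothetical true-class index

# Route 1: LogSoftmax, then negative log-likelihood loss.
log_probs = F.log_softmax(logits, dim=1)
loss_nll = F.nll_loss(log_probs, target)

# Route 2: cross-entropy on the raw logits (it applies log-softmax internally).
loss_ce = F.cross_entropy(logits, target)

print(loss_nll.item(), loss_ce.item())  # both approx. 0.417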

Which One to Use?

  • Softmax is more suitable when you are directly interpreting the output as probabilities.
  • LogSoftmax is preferred when you are using log-probabilities in loss calculations, especially when you need better numerical stability (e.g., using cross-entropy loss with logits).

Let me know if you need further clarification!
