March 20, 2025

Tanh vs Softmax: Which is Better?

Both Tanh (Hyperbolic Tangent) and Softmax are activation functions, but they serve different purposes in machine learning.


1️⃣ Tanh (Hyperbolic Tangent)

  • Formula: \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
  • Range: (-1, 1)
  • Behavior:
    • Outputs values between -1 and 1 (zero-centered).
    • Works well for hidden layers in deep networks.
  • Derivative: \frac{d}{dx} \tanh(x) = 1 - \tanh^2(x)
  • Advantages:
    ✅ Zero-centered output → gradients are less biased in one direction, which speeds up training in deep networks.
    ✅ Maps negative inputs to negative outputs, unlike sigmoid, which squashes everything into (0, 1).
  • Disadvantages:
    ❌ Vanishing gradient: for inputs of large magnitude, tanh saturates and its derivative approaches zero (see the sketch after the example below).

Example in PyTorch:

import torch

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
tanh_output = torch.tanh(x)  # applied element-wise; outputs lie in (-1, 1)
print(tanh_output)  # tensor([-0.9640, -0.7616,  0.0000,  0.7616,  0.9640])
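
To see the vanishing gradient concretely, here is a minimal sketch that uses autograd to evaluate the derivative 1 − tanh²(x) at points of increasing magnitude; by x = 5 the gradient has all but vanished:

import torch

x = torch.tensor([0.0, 1.0, 2.0, 5.0], requires_grad=True)
torch.tanh(x).sum().backward()  # gradient of each element is 1 - tanh^2(x)
print(x.grad)  # tensor([1.0000, 0.4200, 0.0707, 0.0002])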

2️⃣ Softmax

  • Formula: \text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}
  • Range: (0, 1)
  • Behavior:
    • Outputs a probability distribution (sum = 1).
    • Used in multi-class classification.
  • Derivative: \frac{\partial S_i}{\partial x_j} = S_i (\delta_{ij} - S_j)
  • Advantages:
    ✅ Converts raw scores into probabilities.
    ✅ Ensures sum of outputs is 1.
  • Disadvantages:
    ❌ Numerically unstable for large inputs: e^{x_i} overflows, so implementations subtract the maximum logit before exponentiating.
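
A quick sketch of that instability and the standard max-subtraction fix (note that F.softmax already applies this trick internally):

import torch
import torch.nn.functional as F

x = torch.tensor([1000.0, 1001.0, 1002.0])
naive = torch.exp(x) / torch.exp(x).sum()  # exp(1000) overflows to inf
stable = torch.exp(x - x.max()) / torch.exp(x - x.max()).sum()
print(naive)                # tensor([nan, nan, nan])
print(stable)               # tensor([0.0900, 0.2447, 0.6652])
print(F.softmax(x, dim=0))  # matches stable: built-in softmax subtracts the max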

Example in PyTorch:

import torch
import torch.nn.functional as F

x = torch.tensor([2.0, 1.0, 0.1])
softmax_output = F.softmax(x, dim=0)  # exponentiate and normalize along dim 0
print(softmax_output)  # tensor([0.6590, 0.2424, 0.0986])
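
The Jacobian formula above can also be sanity-checked against autograd. A minimal sketch using torch.autograd.functional.jacobian:

import torch
import torch.nn.functional as F

x = torch.tensor([2.0, 1.0, 0.1])
s = F.softmax(x, dim=0)

# Analytic Jacobian: dS_i/dx_j = S_i (delta_ij - S_j) = diag(S) - S Sᵀ
analytic = torch.diag(s) - torch.outer(s, s)
numeric = torch.autograd.functional.jacobian(lambda t: F.softmax(t, dim=0), x)
print(torch.allclose(analytic, numeric))  # True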

🔑 Key Differences

| Feature | Tanh | Softmax |
| --- | --- | --- |
| Formula | \frac{e^x - e^{-x}}{e^x + e^{-x}} | \frac{e^{x_i}}{\sum_j e^{x_j}} |
| Range | (-1, 1) | (0, 1) |
| Use case | Hidden layers | Output layer (multi-class classification) |
| Zero-centered? | ✅ Yes | ❌ No |
| Main numerical issue | Vanishing gradient at large magnitudes | Overflow for large inputs |
| Probability interpretation? | ❌ No | ✅ Yes |

🛠️ When to Use Each?

  • Use Tanh in hidden layers when you want zero-centered values.
  • Use Softmax in the output layer for multi-class classification to get probability scores.
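
Putting the two together, here is a minimal sketch of a classifier with a Tanh hidden layer and a Softmax output (TinyClassifier and its dimensions are illustrative; for training you would pass the raw logits to nn.CrossEntropyLoss, which applies softmax internally):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyClassifier(nn.Module):
    def __init__(self, in_dim=4, hidden=8, num_classes=3):
        super().__init__()
        self.hidden = nn.Linear(in_dim, hidden)
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, x):
        h = torch.tanh(self.hidden(x))  # zero-centered hidden activations
        return self.out(h)              # raw logits

model = TinyClassifier()
logits = model(torch.randn(2, 4))  # batch of 2 samples
probs = F.softmax(logits, dim=1)   # one probability distribution per sample
print(probs.sum(dim=1))            # tensor([1.0000, 1.0000])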

🚀 Which is Better?

  • For hidden layers → Tanh is better.
  • For multi-class classification output → Softmax is better.

