Tanh vs Softmax: Which is Better?
Both Tanh (Hyperbolic Tangent) and Softmax are activation functions, but they serve different purposes in machine learning.
1️⃣ Tanh (Hyperbolic Tangent)
- Formula: $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
- Range: (-1, 1)
- Behavior:
  - Outputs values between -1 and 1 (zero-centered).
  - Works well for hidden layers in deep networks.
- Derivative: $\frac{d}{dx} \tanh(x) = 1 - \tanh^2(x)$
- Advantages:
✅ Zero-centered output → faster training in deep networks.
✅ Handles negative inputs better than sigmoid.
- Disadvantages:
❌ Vanishing gradient problem: the derivative approaches 0 for large-magnitude inputs (positive or negative).
Example in PyTorch:
```python
import torch

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
tanh_output = torch.tanh(x)
print(tanh_output)  # tensor([-0.9640, -0.7616, 0.0000, 0.7616, 0.9640])
```
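To see the vanishing-gradient problem concretely, here is a minimal sketch (variable names are illustrative) that uses autograd to evaluate the derivative $1 - \tanh^2(x)$ at a few points:

```python
import torch

# d/dx tanh(x) = 1 - tanh(x)^2, computed via autograd.
x = torch.tensor([-5.0, -2.0, 0.0, 2.0, 5.0], requires_grad=True)
torch.tanh(x).sum().backward()

# Gradients shrink toward 0 as |x| grows: the vanishing-gradient problem.
print(x.grad)  # approx. tensor([1.8e-04, 7.1e-02, 1.0e+00, 7.1e-02, 1.8e-04])
```

At $x = 0$ the gradient is 1, but by $|x| = 5$ it has already collapsed to roughly $10^{-4}$, which is why stacking many tanh layers can stall training.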
2️⃣ Softmax
- Formula: $\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$
- Range: (0, 1)
- Behavior:
  - Outputs a probability distribution (sum = 1).
  - Used in multi-class classification.
- Derivative: $\frac{\partial S_i}{\partial x_j} = S_i (\delta_{ij} - S_j)$
- Advantages:
✅ Converts raw scores into probabilities.
✅ Ensures the outputs sum to 1.
- Disadvantages:
❌ Numerically sensitive to large input values: exp() can overflow, so inputs are typically shifted by their maximum before exponentiating.
Example in PyTorch:
```python
import torch
import torch.nn.functional as F

x = torch.tensor([2.0, 1.0, 0.1])
softmax_output = F.softmax(x, dim=0)
print(softmax_output)  # tensor([0.6590, 0.2424, 0.0986])
```
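Two quick follow-ups, sketched as illustrations rather than anything from the snippet above: the first verifies the Jacobian formula $S_i(\delta_{ij} - S_j)$ against autograd, and the second shows the standard max-subtraction trick behind the "sensitive to large values" caveat (F.softmax already does this internally; the naive version is only for contrast):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([2.0, 1.0, 0.1])
s = F.softmax(x, dim=0)

# 1) Check dS_i/dx_j = S_i * (delta_ij - S_j) against autograd's Jacobian.
analytic = torch.diag(s) - torch.outer(s, s)
numeric = torch.autograd.functional.jacobian(lambda t: F.softmax(t, dim=0), x)
print(torch.allclose(analytic, numeric))  # True

# 2) Softmax is shift-invariant, so subtracting max(x) before exp()
#    prevents overflow without changing the result.
big = torch.tensor([1000.0, 1001.0, 1002.0])
naive = torch.exp(big) / torch.exp(big).sum()  # exp(1000) overflows to inf
stable = torch.exp(big - big.max()) / torch.exp(big - big.max()).sum()
print(naive)   # tensor([nan, nan, nan])
print(stable)  # tensor([0.0900, 0.2447, 0.6652])
```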
🔑 Key Differences
| Feature | Tanh | Softmax |
|---|---|---|
| Formula | $\frac{e^x - e^{-x}}{e^x + e^{-x}}$ | $\frac{e^{x_i}}{\sum_j e^{x_j}}$ |
| Range | (-1, 1) | (0, 1) |
| Use Case | Hidden layers | Output layer (multi-class classification) |
| Zero-centered? | ✅ Yes | ❌ No |
| Gradient Issues? | Yes: vanishing gradient | Yes: overflow with large inputs |
| Probability Interpretation? | ❌ No | ✅ Yes |
🛠️ When to Use Each?
- Use Tanh in hidden layers when you want zero-centered values.
- Use Softmax in the output layer for multi-class classification to get probability scores.
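Putting both rules together, here is a minimal classifier sketch (the layer sizes are made up for illustration). Note that for training, nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, so the explicit softmax below is only for reading off probabilities at inference time:

```python
import torch
import torch.nn as nn

# Tanh in the hidden layer, softmax applied to the output logits.
model = nn.Sequential(
    nn.Linear(4, 8),   # 4 input features -> 8 hidden units
    nn.Tanh(),         # zero-centered hidden activation
    nn.Linear(8, 3),   # 8 hidden units -> 3 class logits
)

x = torch.randn(2, 4)                    # batch of 2 samples
probs = torch.softmax(model(x), dim=1)   # rows are probability distributions
print(probs.sum(dim=1))                  # tensor([1.0000, 1.0000])
```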
🚀 Which is Better?
Neither is universally better; they solve different problems:
- For hidden layers → Tanh.
- For multi-class classification output → Softmax.
Let me know if you need further clarification! 🚀