Tanh vs Sigmoid: Which is Better?
Both Tanh (Hyperbolic Tangent) and Sigmoid are activation functions commonly used in neural networks. However, they have key differences in their range, gradient behavior, and suitability for deep learning.
1️⃣ Tanh (Hyperbolic Tangent)
- Formula: $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
- Range: (-1, 1)
- Behavior:
- Outputs values between -1 and 1 (zero-centered).
- Smooth and differentiable everywhere.
- Advantages:
✅ Zero-centered output, making optimization easier.
✅ Works well when inputs contain both positive and negative values.
- Disadvantages:
❌ Vanishing Gradient Problem: for large $|x|$, the gradient approaches zero, which slows learning (see the gradient check after the example below).
Example in PyTorch:
import torch
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
tanh_output = torch.tanh(x)
print(tanh_output) # tensor([-0.9640, -0.7616, 0.0000, 0.7616, 0.9640])
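To see that vanishing gradient numerically, here is a minimal autograd sketch (the sample points and the name xg are illustrative, not from any particular model); it evaluates d tanh(x)/dx = 1 − tanh(x)² at a few inputs:
xg = torch.tensor([-5.0, -2.0, 0.0, 2.0, 5.0], requires_grad=True)
torch.tanh(xg).sum().backward()  # sum() so each element gets its own d tanh(x_i)/dx_i
print(xg.grad)  # roughly [0.0002, 0.0707, 1.0000, 0.0707, 0.0002]
The gradient is 1.0 at x = 0 but already near zero by |x| = 5, which is exactly the slowdown described above.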
2️⃣ Sigmoid (Logistic Function)
- Formula: $f(x) = \frac{1}{1 + e^{-x}}$
- Range: (0, 1)
- Behavior:
- Outputs values between 0 and 1, making it useful for probability-based tasks.
- Maps large negative values to near 0 and large positive values to near 1.
- Advantages:
✅ Good for binary classification problems.
✅ Outputs can be interpreted as probabilities.
- Disadvantages:
❌ Vanishing Gradient Problem: for inputs far from zero in either direction, gradients become very small, slowing training (see the gradient comparison after the example below).
❌ Not zero-centered: Can cause slower convergence in deep networks.
Example in PyTorch:
sigmoid_output = torch.sigmoid(x)
print(sigmoid_output) # tensor([0.1192, 0.2689, 0.5000, 0.7311, 0.8808])
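The same kind of autograd check (reusing the import torch from above; xs is just an illustrative name) shows why Sigmoid's vanishing gradient is usually considered worse than Tanh's: its derivative sigmoid(x) * (1 − sigmoid(x)) never exceeds 0.25, versus a peak of 1.0 for Tanh:
xs = torch.tensor([-5.0, -2.0, 0.0, 2.0, 5.0], requires_grad=True)
torch.sigmoid(xs).sum().backward()
print(xs.grad)  # roughly [0.0066, 0.1050, 0.2500, 0.1050, 0.0066]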
🔑 Key Differences
Feature | Tanh | Sigmoid |
---|---|---|
Formula | $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ | $f(x) = \frac{1}{1 + e^{-x}}$ |
Range | (-1, 1) | (0, 1) |
Zero-centered? | ✅ Yes | ❌ No |
Vanishing Gradient? | ✅ Yes | ✅ Yes (worse than Tanh) |
Best for | Hidden layers | Output layer for binary classification |
Probability Interpretation? | ❌ No | ✅ Yes |
🛠️ When to Use Each?
- Use Tanh in hidden layers: it's zero-centered and its gradients are larger than Sigmoid's, so signals flow better during backpropagation.
- Use Sigmoid in the output layer for binary classification when probabilities are needed (a small model sketch follows below).
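As a rough sketch of both guidelines in one place (the layer sizes and variable names here are made up for illustration, not taken from any particular model), a tiny binary classifier could use Tanh in the hidden layer and Sigmoid on the single output unit:
import torch
import torch.nn as nn
model = nn.Sequential(
    nn.Linear(4, 8),   # 4 input features -> 8 hidden units (hypothetical sizes)
    nn.Tanh(),         # zero-centered activation in the hidden layer
    nn.Linear(8, 1),
    nn.Sigmoid(),      # squashes the output to (0, 1) so it reads as a probability
)
probs = model(torch.randn(3, 4))  # 3 random example inputs, 4 features each
print(probs.shape)  # torch.Size([3, 1]), every value strictly between 0 and 1
In practice the final Sigmoid is often folded into the loss (e.g. nn.BCEWithLogitsLoss, which takes raw logits) for numerical stability, but the layout above matches the guideline as stated.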
🚀 Which is Better?
- Tanh is generally better than Sigmoid for hidden layers because it’s zero-centered (see the quick check below).
- Sigmoid is still useful for binary classification output layers but is rarely used in hidden layers due to its vanishing gradient issue.
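A quick way to see that zero-centering difference (a sketch using randomly generated data, nothing from a real dataset): for roughly zero-mean inputs, Tanh outputs average near 0 while Sigmoid outputs average near 0.5, so every downstream unit receives only positive activations, which tends to slow convergence:
import torch
z = torch.randn(10000)           # synthetic zero-mean inputs
print(torch.tanh(z).mean())      # close to 0 -> zero-centered
print(torch.sigmoid(z).mean())   # close to 0.5 -> always positive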
Let me know if you need further clarification! 🚀