Tanh vs Sigmoid: Which is Better?
Both Tanh (Hyperbolic Tangent) and Sigmoid are activation functions commonly used in neural networks. However, they have key differences in their range, gradient behavior, and suitability for deep learning.
1️⃣ Tanh (Hyperbolic Tangent)
- Formula: $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
- Range: (-1, 1)
- Behavior:
- Outputs values between -1 and 1 (zero-centered).
- Smooth and differentiable everywhere.
- Advantages:
✅ Zero-centered output, making optimization easier.
✅ Works well when inputs contain both positive and negative values.
- Disadvantages:
❌ Vanishing Gradient Problem: for large $|x|$, gradients become very small, slowing learning.
Example in PyTorch:

```python
import torch

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
tanh_output = torch.tanh(x)
print(tanh_output)  # tensor([-0.9640, -0.7616, 0.0000, 0.7616, 0.9640])
```
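The vanishing-gradient behavior is easy to observe with autograd. Since the derivative of tanh is $1 - \tanh^2(x)$, the gradient collapses toward zero as $|x|$ grows; a minimal sketch:

```python
import torch

# tanh'(x) = 1 - tanh(x)^2 shrinks rapidly as |x| grows
x = torch.tensor([0.0, 2.0, 5.0], requires_grad=True)
torch.tanh(x).sum().backward()
print(x.grad)  # roughly 1.0000, 0.0707, 0.0002 -- near-zero at |x| = 5
```

A neuron stuck in the saturated region therefore receives almost no learning signal.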
2️⃣ Sigmoid (Logistic Function)
- Formula: $f(x) = \frac{1}{1 + e^{-x}}$
- Range: (0, 1)
- Behavior:
- Outputs values between 0 and 1, making it useful for probability-based tasks.
- Maps large negative values to near 0 and large positive values to near 1.
- Advantages:
✅ Good for binary classification problems.
✅ Outputs can be interpreted as probabilities.
- Disadvantages:
❌ Vanishing Gradient Problem: for very large or very small inputs, gradients shrink toward zero, slowing training.
❌ Not zero-centered: can cause slower convergence in deep networks.
Example in PyTorch (reusing `x` from above):

```python
sigmoid_output = torch.sigmoid(x)
print(sigmoid_output)  # tensor([0.1192, 0.2689, 0.5000, 0.7311, 0.8808])
```
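Sigmoid's gradient problem is worse than Tanh's: its derivative $\sigma(x)(1 - \sigma(x))$ peaks at only 0.25 (at $x = 0$), versus 1.0 for Tanh, so gradients shrink even through unsaturated neurons. A quick check:

```python
import torch

# Sigmoid's gradient is largest at x = 0, where it is only 0.25
x = torch.tensor([0.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)  # tensor([0.2500])
```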
📊 Key Differences
| Feature | Tanh | Sigmoid |
|---|---|---|
| Formula | $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ | $f(x) = \frac{1}{1 + e^{-x}}$ |
| Range | (-1, 1) | (0, 1) |
| Zero-centered? | ✅ Yes | ❌ No |
| Vanishing Gradient? | ✅ Yes | ✅ Yes (worse than Tanh) |
| Best for | Hidden layers | Output layer for binary classification |
| Probability Interpretation? | ❌ No | ✅ Yes |
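The two functions are also closely related: $\tanh(x) = 2\,\sigma(2x) - 1$, i.e. Tanh is just a rescaled, zero-centered Sigmoid. A quick numerical check:

```python
import torch

# tanh(x) == 2 * sigmoid(2x) - 1 for all x
x = torch.linspace(-3, 3, 7)
print(torch.allclose(torch.tanh(x), 2 * torch.sigmoid(2 * x) - 1, atol=1e-6))  # True
```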
🛠️ When to Use Each?
- Use Tanh in hidden layers because it’s zero-centered and allows better gradient flow.
- Use Sigmoid in the output layer for binary classification when probabilities are needed.
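Putting both rules together, here is a sketch of a small binary classifier (layer sizes are arbitrary, chosen only for illustration):

```python
import torch
import torch.nn as nn

# Tanh in the hidden layer (zero-centered), Sigmoid on the output (probability)
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Tanh(),      # zero-centered hidden activation
    nn.Linear(8, 1),
    nn.Sigmoid(),   # squashes the logit into (0, 1)
)
probs = model(torch.randn(2, 4))
print(probs.shape)  # torch.Size([2, 1]); every value lies in (0, 1)
```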
🏆 Which is Better?
- Tanh is generally better than Sigmoid for hidden layers because it's zero-centered.
- Sigmoid is still useful for binary classification output layers but is rarely used in hidden layers due to its vanishing gradient issue.
Let me know if you need further clarification!