Tanh vs Sigmoid: Which is Better?
Both Tanh (Hyperbolic Tangent) and Sigmoid are activation functions commonly used in neural networks. However, they have key differences in their range, gradient behavior, and suitability for deep learning.
1️⃣ Tanh (Hyperbolic Tangent)
- Formula: $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
- Range: (-1, 1)
- Behavior:
- Outputs values between -1 and 1 (zero-centered).
- Smooth and differentiable everywhere.
- Advantages:
✅ Zero-centered output, making optimization easier.
✅ Works well when inputs contain both positive and negative values.
- Disadvantages:
❌ Vanishing Gradient Problem: for large $|x|$, the gradient approaches zero, which slows learning (see the gradient check after the example below).
Example in PyTorch:
import torch
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
tanh_output = torch.tanh(x)
print(tanh_output) # tensor([-0.9640, -0.7616, 0.0000, 0.7616, 0.9640])
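To see that vanishing gradient numerically, here is a minimal autograd sketch (the sample points and the name xg are illustrative, not from any particular model); it evaluates d tanh(x)/dx = 1 − tanh(x)² at a few inputs:
xg = torch.tensor([-5.0, -2.0, 0.0, 2.0, 5.0], requires_grad=True)
torch.tanh(xg).sum().backward()  # sum() so each element gets its own d tanh(x_i)/dx_i
print(xg.grad)  # roughly [0.0002, 0.0707, 1.0000, 0.0707, 0.0002]
The gradient is 1.0 at x = 0 but already near zero by |x| = 5, which is exactly the slowdown described above.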
2️⃣ Sigmoid (Logistic Function)
- Formula: $f(x) = \frac{1}{1 + e^{-x}}$
- Range: (0, 1)
- Behavior:
- Outputs values between 0 and 1, making it useful for probability-based tasks.
- Maps large negative values to near 0 and large positive values to near 1.
- Advantages:
✅ Good for binary classification problems.
✅ Outputs can be interpreted as probabilities.
- Disadvantages:
❌ Vanishing Gradient Problem: for inputs far from zero in either direction, gradients become very small, slowing training (see the gradient comparison after the example below).
❌ Not zero-centered: Can cause slower convergence in deep networks.
Example in PyTorch:
sigmoid_output = torch.sigmoid(x)
print(sigmoid_output) # tensor([0.1192, 0.2689, 0.5000, 0.7311, 0.8808])
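The same kind of autograd check (reusing the import torch from above; xs is just an illustrative name) shows why Sigmoid's vanishing gradient is usually considered worse than Tanh's: its derivative sigmoid(x) * (1 − sigmoid(x)) never exceeds 0.25, versus a peak of 1.0 for Tanh:
xs = torch.tensor([-5.0, -2.0, 0.0, 2.0, 5.0], requires_grad=True)
torch.sigmoid(xs).sum().backward()
print(xs.grad)  # roughly [0.0066, 0.1050, 0.2500, 0.1050, 0.0066]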
🔑 Key Differences
Feature | Tanh | Sigmoid |
---|---|---|
Formula | $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ | $f(x) = \frac{1}{1 + e^{-x}}$ |
Range | (-1, 1) | (0, 1) |
Zero-centered? | ✅ Yes | ❌ No |
Vanishing Gradient? | ✅ Yes | ✅ Yes (worse than Tanh) |
Best for | Hidden layers | Output layer for binary classification |
Probability Interpretation? | ❌ No | ✅ Yes |
🛠️ When to Use Each?
- Use Tanh in hidden layers: it's zero-centered and its gradients are larger than Sigmoid's, so signals flow better during backpropagation.
- Use Sigmoid in the output layer for binary classification when probabilities are needed (a small model sketch follows below).
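As a rough sketch of both guidelines in one place (the layer sizes and variable names here are made up for illustration, not taken from any particular model), a tiny binary classifier could use Tanh in the hidden layer and Sigmoid on the single output unit:
import torch
import torch.nn as nn
model = nn.Sequential(
    nn.Linear(4, 8),   # 4 input features -> 8 hidden units (hypothetical sizes)
    nn.Tanh(),         # zero-centered activation in the hidden layer
    nn.Linear(8, 1),
    nn.Sigmoid(),      # squashes the output to (0, 1) so it reads as a probability
)
probs = model(torch.randn(3, 4))  # 3 random example inputs, 4 features each
print(probs.shape)  # torch.Size([3, 1]), every value strictly between 0 and 1
In practice the final Sigmoid is often folded into the loss (e.g. nn.BCEWithLogitsLoss, which takes raw logits) for numerical stability, but the layout above matches the guideline as stated.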
🚀 Which is Better?
- Tanh is generally better than Sigmoid for hidden layers because it’s zero-centered (see the quick check below).
- Sigmoid is still useful for binary classification output layers but is rarely used in hidden layers due to its vanishing gradient issue.
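A quick way to see that zero-centering difference (a sketch using randomly generated data, nothing from a real dataset): for roughly zero-mean inputs, Tanh outputs average near 0 while Sigmoid outputs average near 0.5, so every downstream unit receives only positive activations, which tends to slow convergence:
import torch
z = torch.randn(10000)           # synthetic zero-mean inputs
print(torch.tanh(z).mean())      # close to 0 -> zero-centered
print(torch.sigmoid(z).mean())   # close to 0.5 -> always positive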
Let me know if you need further clarification! 🚀