March 20, 2025

ReLU vs Tanh: Which Is the Better Activation Function?

Both ReLU (Rectified Linear Unit) and Tanh (Hyperbolic Tangent) are widely used activation functions, but they behave differently and are suited for different scenarios.


1️⃣ ReLU (Rectified Linear Unit)

  • Formula: f(x) = \max(0, x)
  • Behavior:
    • Outputs x if x > 0, otherwise 0.
    • Non-linear but suppresses negative values to zero.
  • Advantages:
    ✅ Computationally efficient (simple max operation).
    ✅ Helps avoid vanishing gradients (compared to sigmoid/tanh).
  • Disadvantages:
    ❌ Dying ReLU Problem: neurons that always receive negative inputs output 0 and can stop learning entirely (see the gradient sketch after the code example below).

Example in PyTorch:

import torch
import torch.nn.functional as F

# ReLU zeroes out negative inputs and passes positive inputs through unchanged.
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
relu_output = F.relu(x)
print(relu_output) # tensor([0., 0., 0., 1., 2.])
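To make the dying-ReLU point concrete, here is a minimal sketch (the variable name inputs and the exact values are only illustrative) that backpropagates through F.relu and prints the gradient it passes back:

import torch
import torch.nn.functional as F

# A fresh tensor that tracks gradients so we can inspect what ReLU passes back.
inputs = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0], requires_grad=True)
out = F.relu(inputs)

# Reduce to a scalar so there is a single value to backpropagate from.
out.sum().backward()

# Gradient is 1 where inputs > 0 and 0 where inputs <= 0: the region in which
# a "dead" neuron receives no updates.
print(inputs.grad)  # tensor([0., 0., 0., 1., 1.])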

2️⃣ Tanh (Hyperbolic Tangent)

  • Formula: f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
  • Behavior:
    • Output range: (-1, 1) (zero-centered).
    • Smooth and differentiable everywhere.
  • Advantages:
    ✅ Zero-centered output helps with faster convergence.
    ✅ Suitable for cases where both positive and negative values matter.
  • Disadvantages:
    ❌ Vanishing Gradient Problem: for large values of |x|, gradients become very small, slowing learning (see the gradient sketch after the code example below).
    ❌ More computationally expensive than ReLU.

Example in PyTorch:

# Reuses the same x as in the ReLU example; outputs are squashed into (-1, 1).
tanh_output = torch.tanh(x)
print(tanh_output) # tensor([-0.9640, -0.7616, 0.0000, 0.7616, 0.9640])
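To see the vanishing-gradient problem directly, here is a minimal sketch (again with illustrative values) that backpropagates through torch.tanh; since d/dx tanh(x) = 1 - tanh(x)^2, the gradient is nearly zero once |x| is large:

import torch

# Larger-magnitude inputs to show how quickly the tanh gradient shrinks.
inputs = torch.tensor([-5.0, -1.0, 0.0, 1.0, 5.0], requires_grad=True)
out = torch.tanh(inputs)
out.sum().backward()

# Gradient is 1 - tanh(x)^2: close to 1 near 0, nearly 0 for large |x|.
print(inputs.grad)  # approximately [0.00018, 0.42, 1.0, 0.42, 0.00018]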

🔑 Key Differences

| Feature | ReLU | Tanh |
|---|---|---|
| Formula | f(x) = \max(0, x) | f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} |
| Range | [0, ∞) | (-1, 1) |
| Gradient | 1 for x > 0, 0 for x ≤ 0 | Approaches 0 as x moves away from 0 |
| Zero-centered? | ❌ No | ✅ Yes |
| Computational cost | ✅ Faster (simple max) | ❌ Slower (exponentials involved) |
| Best for | Deep networks, avoiding vanishing gradients | Shallow networks, when negative values matter |

🛠️ When to Use Each?

  • Use ReLU in deep networks where vanishing gradients are a problem.
  • Use Tanh in shallow networks or when values need to be zero-centered.
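As a rough sketch of what that choice looks like in code (the layer sizes below are arbitrary placeholders, not tuned values), the activation is simply swapped in when the model is defined:

import torch.nn as nn

# Deeper stack: ReLU keeps gradients from shrinking layer after layer.
deep_model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# Shallow model where zero-centered activations are preferred: use Tanh.
shallow_model = nn.Sequential(
    nn.Linear(64, 32), nn.Tanh(),
    nn.Linear(32, 10),
)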

🚀 Which is Better?

  • ReLU is better for deep learning because it avoids vanishing gradients.
  • Tanh is useful when both positive and negative values are needed but is not ideal for very deep networks.

