March 20, 2025

ReLU vs. Tanh: Which Is the Better Activation Function?

Both ReLU (Rectified Linear Unit) and Tanh (Hyperbolic Tangent) are widely used activation functions, but they behave differently and are suited for different scenarios.


1๏ธโƒฃ ReLU (Rectified Linear Unit)

  • Formula: f(x) = max(0, x)
  • Behavior:
    • Outputs x if x > 0, otherwise 0.
    • Non-linear but suppresses negative values to zero.
  • Advantages:
    ✅ Computationally efficient (a simple max operation).
    ✅ Helps avoid vanishing gradients (compared to sigmoid/tanh).
  • Disadvantages:
    ❌ Dying ReLU Problem: neurons that only ever receive negative inputs output 0, get zero gradient, and can stop learning (see the gradient sketch after the example below).

Example in PyTorch:

import torch.nn.functional as F
import torch

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
relu_output = F.relu(x)
print(relu_output) # tensor([0., 0., 0., 1., 2.])
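
To make the dying ReLU problem concrete, here is a minimal sketch (an illustrative addition, not part of the original example) that uses autograd: the gradient of ReLU is 0 wherever the input is not positive, so a neuron whose pre-activations are always negative receives no gradient and stops updating.

import torch
import torch.nn.functional as F

# Track gradients so we can inspect d(ReLU)/dx at each input
x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0], requires_grad=True)
F.relu(x).sum().backward()
print(x.grad)  # tensor([0., 0., 0., 1., 1.]), i.e. zero gradient for non-positive inputs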

2๏ธโƒฃ Tanh (Hyperbolic Tangent)

  • Formula: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
  • Behavior:
    • Output range: (-1, 1) (zero-centered).
    • Smooth and differentiable everywhere.
  • Advantages:
    ✅ Zero-centered output helps in faster learning.
    ✅ Suitable for cases where both positive and negative values matter.
  • Disadvantages:
    โŒ Vanishing Gradient Problem: For large values of xxx, gradients become very small, slowing learning.
    โŒ More computationally expensive than ReLU.

Example in PyTorch:

tanh_output = torch.tanh(x)
print(tanh_output) # tensor([-0.9640, -0.7616, 0.0000, 0.7616, 0.9640])
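
As a quick check of the vanishing gradient point, here is a minimal sketch (again an illustrative addition): the derivative of tanh is 1 - tanh(x)^2, which shrinks toward zero as the input grows large in magnitude.

import torch

# The derivative of tanh(x) is 1 - tanh(x)^2, which decays rapidly for large inputs
x = torch.tensor([0.0, 2.0, 5.0, 10.0], requires_grad=True)
torch.tanh(x).sum().backward()
print(x.grad)  # roughly tensor([1.0e+00, 7.1e-02, 1.8e-04, 8.2e-09])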

🔑 Key Differences

| Feature            | ReLU                                        | Tanh                                          |
|--------------------|---------------------------------------------|-----------------------------------------------|
| Formula            | f(x) = max(0, x)                            | f(x) = (e^x - e^(-x)) / (e^x + e^(-x))        |
| Range              | [0, ∞)                                      | (-1, 1)                                       |
| Gradient           | 1 for x > 0, 0 for x ≤ 0                    | Approaches 0 for large positive or negative x |
| Zero-centered?     | ❌ No                                        | ✅ Yes                                         |
| Computational cost | ✅ Faster (simple max)                       | ❌ Slower (exponentials involved)              |
| Best for           | Deep networks, avoiding vanishing gradients | Shallow networks, when negative values matter |

๐Ÿ› ๏ธ When to Use Each?

  • Use ReLU in deep networks where vanishing gradients are a problem.
  • Use Tanh in shallow networks or when values need to be zero-centered (a minimal sketch of both choices follows below).
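
As a rough illustration of these two choices, here is a minimal PyTorch sketch; the layer sizes are arbitrary placeholders, not from the post.

import torch.nn as nn

# Deeper stack with ReLU: cheap to compute and less prone to vanishing gradients
deep_relu_net = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# Shallow network with Tanh: zero-centered activations in (-1, 1)
shallow_tanh_net = nn.Sequential(
    nn.Linear(64, 32), nn.Tanh(),
    nn.Linear(32, 10),
)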

🚀 Which is Better?

  • ReLU is better for deep learning because it avoids vanishing gradients.
  • Tanh is useful when both positive and negative values are needed but is not ideal for very deep networks.

