March 20, 2025

ReLU vs. Leaky ReLU: What Is the Difference?

Both ReLU (Rectified Linear Unit) and Leaky ReLU are popular activation functions in deep learning, used in neural networks to introduce non-linearity. However, they handle negative inputs differently, which affects training dynamics and performance.


1️⃣ ReLU (Rectified Linear Unit)

  • Formula: f(x) = max(0, x)
  • Behavior:
    • If x > 0, the output is x (linear for positive values).
    • If x ≤ 0, the output is 0 (negative values are completely suppressed).
  • Advantages:
    • Computationally efficient (simple max operation).
    • Helps avoid vanishing gradients (better than sigmoid/tanh).
  • Disadvantages:
    • Dying ReLU Problem: if a neuron's pre-activation is negative for every input, it always outputs zero and receives zero gradient, so its weights may never update again, leaving a "dead neuron" (see the gradient sketch after the example below).

Example in PyTorch

import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
relu_output = F.relu(x)
print(relu_output) # Output: tensor([0., 0., 0., 1., 2.])
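
To make the zero-gradient issue concrete, here is a minimal sketch (the tensor x_grad is introduced only for this illustration): for negative inputs ReLU's gradient is exactly zero, which is why a neuron that only ever sees negative pre-activations stops learning.

x_grad = torch.tensor([-2.0, -1.0, 1.0, 2.0], requires_grad=True)
F.relu(x_grad).sum().backward()
print(x_grad.grad) # Output: tensor([0., 0., 1., 1.]) (zero gradient for negative inputs)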

2️⃣ Leaky ReLU

  • Formula: f(x) = x if x > 0, and f(x) = αx if x ≤ 0, where α is a small positive constant (e.g., 0.01).
  • Behavior:
    • Allows small negative outputs instead of zero, avoiding the dying ReLU problem.
  • Advantages:
    • Prevents neurons from becoming inactive (dead).
    • Helps gradient flow even for negative inputs (see the gradient sketch after the example below).
  • Disadvantages:
    • The negative slope α is an extra hyperparameter, and a fixed value such as 0.01 is not guaranteed to be optimal for every task.

Example in PyTorch

# Reuses the imports and the tensor x from the ReLU example above
leaky_relu_output = F.leaky_relu(x, negative_slope=0.01)
print(leaky_relu_output) # Output: tensor([-0.0200, -0.0100, 0.0000, 1.0000, 2.0000])
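
To see the improved gradient flow, here is the same sketch as before, now with Leaky ReLU: negative inputs receive a small gradient equal to the slope α instead of zero.

x_grad = torch.tensor([-2.0, -1.0, 1.0, 2.0], requires_grad=True)
F.leaky_relu(x_grad, negative_slope=0.01).sum().backward()
print(x_grad.grad) # Output: tensor([0.0100, 0.0100, 1.0000, 1.0000])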

🔑 Key Differences

| Feature | ReLU | Leaky ReLU |
| --- | --- | --- |
| Formula | f(x) = max(0, x) | f(x) = x if x > 0, else αx |
| Negative values | Zeroed out | Small negative values allowed |
| Dying ReLU problem | Yes (neurons may stop learning) | No (neurons stay active) |
| Computational cost | Fast (single max operation) | Slightly higher (multiplication for negative inputs) |
| Gradient flow | Zero for negative inputs | Small gradient (α) for negative inputs |
| Usage | Works well in most cases | Useful when many neurons become inactive |

🛠️ When to Use Each?

  • Use ReLU when your model does not suffer from the dying ReLU problem.
  • Use Leaky ReLU when you notice neurons becoming inactive or your network is not learning well in deeper layers.
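
In PyTorch, switching between the two is a one-line change in the model definition. A minimal sketch (the layer sizes 128, 64, and 10 are arbitrary, chosen only for illustration):

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 64),
    nn.LeakyReLU(negative_slope=0.01), # swap in nn.ReLU() to use plain ReLU
    nn.Linear(64, 10),
)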

Which is Better?

  • ReLU is the default choice due to its simplicity and efficiency.
  • Leaky ReLU is better when dealing with dying neurons or if the network struggles with learning.

