ReLU vs. Leaky ReLU: What Is the Difference?
Both ReLU (Rectified Linear Unit) and Leaky ReLU are popular activation functions in deep learning, used to introduce non-linearity into neural networks. They differ in how they handle negative inputs, and that difference affects training behavior.
1️⃣ ReLU (Rectified Linear Unit)
- Formula: f(x) = max(0, x)
- Behavior:
- If x > 0, the output is x (linear for positive values).
- If x ≤ 0, the output is 0 (negative values are completely suppressed).
- Advantages:
- Computationally efficient (simple max operation).
- Helps avoid vanishing gradients (unlike sigmoid/tanh, which saturate for large inputs).
- Disadvantages:
- Dying ReLU problem: a neuron that keeps receiving negative inputs outputs zero and gets zero gradient, so it may never recover, leading to "dead neurons" (see the gradient check after the example below).
Example in PyTorch
```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
relu_output = F.relu(x)
print(relu_output)  # Output: tensor([0., 0., 0., 1., 2.])
```
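To make the dying ReLU point concrete, here is a minimal sketch (not part of the original example; the tensor name x_grad is just illustrative) that checks the gradient ReLU passes back for each input. Negative inputs receive exactly zero gradient, which is why a neuron stuck in the negative region stops learning.

```python
# Hypothetical check: ReLU's gradient is zero for every negative input.
x_grad = torch.tensor([-2.0, -1.0, 1.0, 2.0], requires_grad=True)
F.relu(x_grad).sum().backward()
print(x_grad.grad)  # tensor([0., 0., 1., 1.])
```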
2️⃣ Leaky ReLU
- Formula: f(x) = x if x > 0, else αx, where α is a small positive constant (e.g., 0.01).
- Behavior:
- Allows small negative outputs instead of zero, avoiding the dying ReLU problem.
- Advantages:
- Prevents neurons from becoming inactive (dead).
- Helps gradient flow even for negative inputs.
- Disadvantages:
- The negative slope α is an extra hyperparameter to choose, and the common default (0.01) is not always optimal.
Example in PyTorch (continuing from the ReLU example above)
```python
leaky_relu_output = F.leaky_relu(x, negative_slope=0.01)
print(leaky_relu_output)  # Output: tensor([-0.0200, -0.0100, 0.0000, 1.0000, 2.0000])
```
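The same hedged gradient check as before, now with Leaky ReLU: negative inputs receive a small but nonzero gradient equal to the negative slope, which is what keeps the neuron trainable.

```python
# Hypothetical check: Leaky ReLU passes back a gradient of negative_slope
# (0.01 here) for negative inputs instead of zero.
x_grad = torch.tensor([-2.0, -1.0, 1.0, 2.0], requires_grad=True)
F.leaky_relu(x_grad, negative_slope=0.01).sum().backward()
print(x_grad.grad)  # tensor([0.0100, 0.0100, 1.0000, 1.0000])
```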
🔑 Key Differences
| Feature | ReLU | Leaky ReLU |
|---|---|---|
| Formula | f(x) = max(0, x) | f(x) = x if x > 0, else αx |
| Negative Values | Zeroes out negatives | Allows small negative values |
| Dying ReLU Problem | Possible (neurons may stop learning) | Mitigated (neurons keep a small gradient) |
| Computational Cost | Fast (single max operation) | Slightly higher (multiplication for negative inputs) |
| Gradient Flow | Zero for negative inputs | Small (α) gradient for negative inputs |
| Usage | Works well in most cases | Useful when many neurons become inactive |
🛠️ When to Use Each?
- Use ReLU when your model does not suffer from the dying ReLU problem.
- Use Leaky ReLU when you notice neurons becoming inactive or when the network is not learning well in deeper layers (the sketch below shows how little changes when you swap between the two in a model).
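As a minimal sketch (the layer sizes and variable names are arbitrary, chosen only for illustration), switching between the two in PyTorch is a matter of swapping a single activation module:

```python
import torch
import torch.nn as nn

# Identical two-layer networks; only the activation module differs.
relu_net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
leaky_net = nn.Sequential(nn.Linear(16, 32), nn.LeakyReLU(negative_slope=0.01), nn.Linear(32, 1))

out = leaky_net(torch.randn(4, 16))  # batch of 4 samples, 16 features each
print(out.shape)  # torch.Size([4, 1])
```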
Which is Better?
- ReLU is the default choice due to its simplicity and efficiency.
- Leaky ReLU is a better choice when dead neurons appear or the network struggles to learn.