Loss Functions

Loss functions are a way to quantify the performance of an algorithm. They act as a feedback mechanism that shows how well the algorithm has learned from its training dataset. A high value means a large loss, i.e. the algorithm is failing to capture the majority of the data. Our goal is to minimize the loss function.

In deep learning, loss functions capture the mathematical relationship between the model's parameters (the weights and biases of the neural network) and the quality of its outputs.
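As a minimal illustration of that relationship (the one-parameter linear model and toy data below are invented for this sketch), the loss value changes as the parameter changes:

```python
import numpy as np

# Toy data for a one-parameter linear model y = w * x
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # generated with the "true" weight w = 2

def squared_loss(w):
    """Loss as a function of the model parameter w."""
    return np.mean((y - w * x) ** 2)

print(squared_loss(0.5))  # far from w = 2 -> large loss (10.5)
print(squared_loss(2.0))  # at the true weight -> zero loss
```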

Loss Functions are not Cost Functions.

A loss function is defined on a single data point, while a cost function is the aggregate of the loss over the entire dataset. The cost function is what the optimization algorithm minimizes during training.
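Here is a short sketch of that distinction, using squared error as an illustrative per-sample loss (the function names are my own):

```python
import numpy as np

def loss(y_true, y_pred):
    """Loss for a single data point: the squared error."""
    return (y_true - y_pred) ** 2

def cost(y_true, y_pred):
    """Cost over the whole dataset: the mean of the per-sample losses."""
    return np.mean([loss(t, p) for t, p in zip(y_true, y_pred)])

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(cost(y_true, y_pred))  # 0.375
```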

Regression Losses

| | Mean Squared Error (MSE) | Mean Absolute Error (MAE) | Huber Loss |
| --- | --- | --- | --- |
| Definition | The average of the squared differences between predictions and actual observations. | The average of the absolute differences between predictions and actual observations. | Combines the advantageous characteristics of the MAE and MSE functions into a single loss function. |
| Mathematically | MSE = (1/n) * Σ (yᵢ − ŷᵢ)² | MAE = (1/n) * Σ \|yᵢ − ŷᵢ\| | Piecewise: quadratic for small errors, linear for large ones (see the equation below). |
| Also known as | L2 loss | L1 loss | Smooth absolute error |
| Use case | When you want large errors to be penalized more heavily than smaller ones. | When the distribution of errors is expected to be asymmetric, or when you want to focus on the magnitude of errors without emphasizing outliers. | Governed by the delta (δ) parameter, a threshold that decides whether the loss is computed quadratically or linearly. |
| Advantage | Simple and commonly used. | Robust to outliers. | Its hybrid nature makes it less sensitive to outliers, like MAE, while still penalizing minor errors, like MSE. |
| Disadvantage | Sensitive to outliers. | Optimization can be more challenging because the gradient has constant magnitude regardless of error size, making it less informative for the direction of parameter updates. | Requires tuning of the δ parameter. |
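To make the MSE and MAE formulas from the table concrete, here is a minimal NumPy sketch (the function names are mine; a Huber implementation follows in the next section):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error (L2 loss): average of squared differences."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean Absolute Error (L1 loss): average of absolute differences."""
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(mse(y_true, y_pred))  # 0.375
print(mae(y_true, y_pred))  # 0.5
```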

MAE and MSE are summarized in the table and sketch above. Let's continue by understanding the Huber Loss in detail.

Huber Loss

The mathematical equation for Huber Loss is as follows:

L(y, f(x)) = (1/2) * (f(x) − y)²  if |f(x) − y| ≤ δ
L(y, f(x)) = δ * |f(x) − y| − (1/2) * δ²  otherwise

The Huber Loss therefore has two modes, selected by the delta (δ) parameter: quadratic for small errors and linear for large ones.

Quadratic component for small errors, (1/2) * (f(x) − y)²: brings in the MSE-like behavior of penalizing errors quadratically. Within Huber Loss it is applied to errors smaller than delta, which pushes the model toward more accurate predictions.

Linear component for large errors, δ * |f(x) − y| − (1/2) * δ²: uses a linear calculation of loss similar to MAE, which is less sensitive to the error size. This ensures the trained model does not over-penalize large errors, especially when the dataset contains outliers or unlikely-to-occur samples.
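Putting the two components together, a straightforward NumPy sketch of Huber Loss (the default δ = 1.0 is an arbitrary illustrative choice) might look like this:

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Huber Loss: quadratic below the delta threshold, linear above it."""
    error = y_true - y_pred
    is_small = np.abs(error) <= delta
    quadratic = 0.5 * error ** 2                        # MSE-like, small errors
    linear = delta * np.abs(error) - 0.5 * delta ** 2   # MAE-like, large errors
    return np.mean(np.where(is_small, quadratic, linear))

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 17.0])  # last point is an outlier
print(huber(y_true, y_pred))  # the outlier is penalized linearly, not quadratically
```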

If you like the blog so far, make sure to like and share it!

Classification Losses

| | Binary Cross-Entropy Loss (Log Loss) | Categorical Cross-Entropy Loss | Hinge Loss |
| --- | --- | --- | --- |
| Definition | Measures how far off a model's predicted probabilities are from the actual target values by summing the negative logarithm of the predicted probabilities. | Takes the negative logarithm of the predicted probability assigned to the correct class for each sample and sums these values across all samples; an extension of Binary Cross-Entropy Loss to more than two classes. | Quantifies the classification error of a model by measuring the distance between its predictions and the decision boundary. |
| Use case | Logistic regression problems and ANNs that predict the likelihood of a sample belonging to a class, typically using a sigmoid activation internally. | Multiclass classification. | Maximum-margin classification. |
| Advantages | Standard choice for binary classification tasks. | Common choice for multiclass classification. | Effective for SVMs; encourages robust decision boundaries. |
| Disadvantages | May struggle with class imbalances and noisy labels. | Requires one-hot encoding of class labels. | Not differentiable everywhere, which can limit certain optimization methods. |
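As an illustrative sketch of two of these losses (the clipping epsilon in the cross-entropy is my own safeguard against log(0), and the hinge version assumes labels in {−1, +1}):

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Log Loss: negative log-likelihood of the true binary labels (0 or 1)."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

def hinge(y_true, scores):
    """Hinge Loss: labels in {-1, +1}, scores are raw model outputs (margins)."""
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))

y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.2, 0.7, 0.6])
print(binary_cross_entropy(y_true, y_prob))

y_signed = np.array([1, -1, 1, 1])
scores = np.array([2.1, -0.4, 0.3, 1.5])
print(hinge(y_signed, scores))  # only the points inside the margin contribute
```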


Conclusion

And there you have it! We have covered the fundamentals of loss functions. I hope this blog serves as a good place for quick reviews. Like, share, and comment!