KL Divergence - a lucid introduction

 

KL Divergence for Machine Learning

The objective of life is just to minimize a KL objective.

We often come across the Kullback-Leibler Divergence (KL Divergence) in the machine learning literature, for example in deep generative models like VAEs. In this post I try to present a lucid explanation of KL Divergence, a super important mathematical concept in machine learning.

Simply put, KL Divergence measures the difference between two probability distributions.

I will go over the definition and the various interpretations of KL Divergence, and over time I will try to argue the following claim: "Both supervised learning and reinforcement learning are simply minimizing a KL divergence objective."

What's the KL Divergence?

The Kullback-Leibler divergence (hereafter written as KL divergence) is a measure of how one probability distribution differs from another. Classically, in Bayesian theory, there is some true distribution P(X) and we would like to estimate it with an approximate distribution Q(X). In this context, the KL divergence measures how far the approximation Q is from the true distribution P (note that it is not a true distance metric, since it is not symmetric).

Mathematically, consider two probability distributions P and Q on some space X. The Kullback-Leibler divergence from Q to P, written $D_{\mathrm{KL}}(P \,\|\, Q)$, is defined as

$$ D_{\mathrm{KL}}(P \,\|\, Q) = \mathbb{E}_{x \sim P}\!\left[\log \frac{P(x)}{Q(x)}\right] = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}, $$

where the sum is replaced by an integral when P and Q are continuous.
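To make the definition concrete, here is a minimal NumPy sketch that evaluates the discrete sum above; the distributions p and q below are made-up examples. (SciPy also exposes the same quantity, e.g. via scipy.stats.entropy(p, q), if you prefer a library routine.)

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q), in nats.

    p and q are arrays of probabilities over the same support,
    each summing to 1, with q > 0 wherever p > 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Made-up example: a "true" distribution P and an approximation Q.
p = [0.5, 0.4, 0.1]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # D_KL(P || Q)
print(kl_divergence(q, p))  # D_KL(Q || P): generally a different value
```

Running this prints two different numbers: D_KL(P ‖ Q) and D_KL(Q ‖ P) do not agree in general, which is the asymmetry noted above.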