Planetmath Browser (2008—2009)
All content from PlanetMath.org, FDL
→ The original article on PlanetMath.org
Shannon's Theorem (Entropy)
Definition (Discrete)
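Let $X$ be a discrete random variable on a finite set $\mathcal{X} = \{x_1, \ldots, x_n\}$, with probability distribution function $p(x) = \Pr(X = x)$. The entropy $H(X)$ of $X$ is defined as

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x), \qquad (1)$$

where the logarithm is taken to a fixed base (base 2 measures entropy in bits, base $e$ in nats) and $0 \log 0 = 0$ by convention.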
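If $X$ and $Y$ are random variables on $\mathcal{X}$ and $\mathcal{Y}$ respectively, the joint entropy of $X$ and $Y$ is

$$H(X, Y) = -\sum_{(x, y) \in \mathcal{X} \times \mathcal{Y}} p(x, y) \log p(x, y),$$

where $p(x, y) = \Pr(X = x, Y = y)$ is the joint probability distribution function of $X$ and $Y$.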
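As a minimal sketch of definition (1) and of the joint entropy, the Python below assumes a distribution is given as a plain list of probabilities, and a joint distribution as a dict mapping $(x, y)$ pairs to $p(x, y)$. The function names `entropy` and `joint_entropy` are illustrative, not from any library.

```python
import math
from collections.abc import Iterable, Mapping


def entropy(probs: Iterable[float], base: float = 2.0) -> float:
    """Shannon entropy -sum p log p of a discrete distribution.

    Zero probabilities are skipped, which implements the 0 log 0 = 0 convention.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def joint_entropy(joint: Mapping[tuple, float], base: float = 2.0) -> float:
    """Joint entropy H(X, Y) from a table {(x, y): p(x, y)}."""
    # H(X, Y) is just the entropy of the distribution over pairs (x, y).
    return entropy(joint.values(), base)


# A fair coin carries one bit; a fair six-sided die carries log2(6) bits.
print(entropy([0.5, 0.5]))
print(entropy([1 / 6] * 6))
# Two independent fair coins: H(X, Y) = H(X) + H(Y) = 2 bits.
coins = {(x, y): 0.25 for x in "HT" for y in "HT"}
print(joint_entropy(coins))
```

Passing `base=math.e` instead of the default 2 measures the same quantities in nats.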
Discussion
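Shannon entropy was introduced by Claude Shannon in 1948 in his landmark paper “A Mathematical Theory of Communication.” The entropy is a functional of the probability distribution function $p(x)$: it depends only on the probabilities $p(x_1), \ldots, p(x_n)$, not on the values $x_1, \ldots, x_n$ themselves.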
Characterization
We write $H(X)$ as $H_n(p_1, \ldots, p_n)$, where $p_i = p(x_i)$. The Shannon entropy $H_n$ satisfies the following properties.
- For any $n$, $H_n(p_1, \ldots, p_n)$ is a continuous and symmetric function of the variables $p_1, p_2, \ldots, p_n$.
- An event of probability zero does not contribute to the entropy, i.e. for any $n$,
$$H_{n+1}(p_1, \ldots, p_n, 0) = H_n(p_1, \ldots, p_n).$$
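- Entropy is maximized when the probability distribution $p$ is uniform. For all $n$,
$$H_n(p_1, \ldots, p_n) \le H_n\left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right) = \log n.$$
This follows from Jensen's inequality, since $\log$ is concave:
$$H(X) = \mathbb{E}\left[\log \frac{1}{p(X)}\right] \le \log \mathbb{E}\left[\frac{1}{p(X)}\right] \le \log n,$$
where the last step holds because $\mathbb{E}[1/p(X)] = \sum_{x : p(x) > 0} p(x) \cdot \frac{1}{p(x)}$ counts the outcomes of positive probability, of which there are at most $n$.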
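- If $p_{ij}$, $1 \le i \le m$, $1 \le j \le r_i$, are non-negative real numbers summing up to one, $n = r_1 + \cdots + r_m$, and $q_i = p_{i1} + p_{i2} + \cdots + p_{ir_i}$, then
$$H_n(p_{11}, \ldots, p_{1r_1}, \ldots, p_{m1}, \ldots, p_{mr_m}) = H_m(q_1, \ldots, q_m) + \sum_{i=1}^{m} q_i \, H_{r_i}\left(\frac{p_{i1}}{q_i}, \ldots, \frac{p_{ir_i}}{q_i}\right),$$
where terms with $q_i = 0$ are omitted from the sum.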
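The first three properties can be spot-checked numerically. The sketch below uses hypothetical helper names (`entropy`, `random_distribution`) and random distributions drawn with a fixed seed, and asserts symmetry, the zero-probability property, and the uniform upper bound $\log n$ for a few sizes $n$. A check on samples illustrates the properties; it does not prove them.

```python
import math
import random


def entropy(probs, base=2.0):
    """Shannon entropy with the 0 log 0 = 0 convention."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def random_distribution(n, rng):
    """Draw a random probability vector of length n by normalizing uniforms."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)  # fixed seed so the checks are reproducible
for n in range(2, 8):
    p = random_distribution(n, rng)
    h = entropy(p)
    # Property 1 (symmetry): permuting the arguments leaves H_n unchanged.
    assert math.isclose(h, entropy(sorted(p)))
    # Property 2: appending an outcome of probability zero adds nothing.
    assert math.isclose(h, entropy(p + [0.0]))
    # Property 3: H_n(p) <= H_n(1/n, ..., 1/n) = log n (small float tolerance).
    assert h <= entropy([1 / n] * n) + 1e-12
    assert math.isclose(entropy([1 / n] * n), math.log(n, 2))
print("properties 1-3 hold on the sampled distributions")
```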
If we partition the $n$ outcomes of the random experiment into $m$ groups, where group $i$ contains $r_i$ elements, we can do the experiment in two steps: first determine the group to which the actual outcome belongs, and second find the outcome within this group. The probability that you will observe group $i$ is $q_i$. The conditional probability distribution function given group $i$ is $(p_{i1}/q_i, \ldots, p_{ir_i}/q_i)$. The entropy
$$H_{r_i}\left(\frac{p_{i1}}{q_i}, \ldots, \frac{p_{ir_i}}{q_i}\right)$$
is the entropy of the probability distribution conditioned on group $i$. Property 4 says that the total information is the sum of the information you gain in the first step, $H_m(q_1, \ldots, q_m)$, and a weighted sum of the entropies conditioned on each group.
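A similar hedged check for the grouping property (Property 4): the helper `grouping_sides`, a hypothetical name, computes both sides of the identity for a list of groups $[[p_{11}, \ldots, p_{1r_1}], \ldots]$, skipping groups of probability zero in the weighted sum.

```python
import math


def entropy(probs, base=2.0):
    """Shannon entropy with the 0 log 0 = 0 convention."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def grouping_sides(groups, base=2.0):
    """Both sides of the grouping property for groups [[p_11, ..., p_1r1], ...]."""
    flat = [p for group in groups for p in group]
    q = [sum(group) for group in groups]  # q_i: probability of landing in group i
    left = entropy(flat, base)
    # Step 1 picks the group; step 2 picks the outcome within the chosen group.
    right = entropy(q, base) + sum(
        qi * entropy([p / qi for p in group], base)
        for qi, group in zip(q, groups)
        if qi > 0  # a group of probability zero contributes nothing
    )
    return left, right


left, right = grouping_sides([[0.1, 0.2], [0.3], [0.15, 0.15, 0.1]])
print(left, right)  # equal up to floating-point rounding
```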
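Khinchin showed in 1957 that the only functions satisfying the above properties are of the form
$$H_n(p_1, \ldots, p_n) = -k \sum_{i=1}^{n} p_i \log p_i,$$
where $k$ is a positive constant; choosing $k$ amounts to choosing the base of the logarithm.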
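To see what a positive constant $k$ in the form $-k \sum_i p_i \log p_i$ does, the sketch below evaluates it with the natural logarithm: $k = 1$ gives entropy in nats, and $k = 1/\ln 2$ gives the same value as the base-2 logarithm, i.e. bits. The function name `khinchin_form` is illustrative only.

```python
import math


def khinchin_form(probs, k):
    """Return -k * sum(p * ln p), skipping p = 0 (the 0 log 0 = 0 convention)."""
    return -k * sum(p * math.log(p) for p in probs if p > 0)


p = [0.5, 0.25, 0.25]
nats = khinchin_form(p, k=1.0)              # k = 1: natural log, units of nats
bits = khinchin_form(p, k=1 / math.log(2))  # k = 1/ln 2: same as log base 2, bits
print(f"{nats:.4f} nats = {bits:.4f} bits")
```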
Definition (Continuous)
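Entropy in the continuous case is called differential entropy. For a continuous random variable with probability density function $f$, it is defined as
$$h[f] = -\int_{-\infty}^{\infty} f(x) \log f(x)\, dx.$$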
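As a rough numerical illustration of the differential entropy $h[f] = -\int f(x) \ln f(x)\, dx$, the sketch below assumes a Gaussian density with standard deviation $\sigma$ and natural logarithms (nats), approximates the integral with a midpoint rule, and compares it with the known closed form $\tfrac{1}{2}\ln(2\pi e \sigma^2)$. The truncation to $[-10\sigma, 10\sigma]$, the step count, and the helper names are arbitrary choices for this sketch.

```python
import math


def gaussian_pdf(x, sigma):
    """Density of N(0, sigma^2)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


def differential_entropy(pdf, a, b, steps=100_000):
    """Approximate -integral of f ln f over [a, b] with the midpoint rule (nats)."""
    dx = (b - a) / steps
    total = 0.0
    for i in range(steps):
        fx = pdf(a + (i + 0.5) * dx)
        if fx > 0:  # 0 ln 0 = 0, as in the discrete case
            total -= fx * math.log(fx) * dx
    return total


sigma = 2.0
numeric = differential_entropy(lambda x: gaussian_pdf(x, sigma), -10 * sigma, 10 * sigma)
exact = 0.5 * math.log(2 * math.pi * math.e * sigma**2)
print(numeric, exact)
# Uniform density 2 on [0, 1/2]: h = ln(1/2), which is negative.
print(differential_entropy(lambda x: 2.0, 0.0, 0.5, steps=1000))
```

Unlike discrete entropy, differential entropy can be negative, as the uniform example shows.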
Discussion: Continuous Entropy
Despite its seductively analogous form, continuous entropy cannot be obtained as a limiting case of discrete entropy.
We wish to obtain a measure that is generally finite as the “bin size” goes to zero. In the discrete case, the bin size is the (implicit) width $\Delta$ of each of the $n$ bins (where $n$ may be finite or infinite) whose probabilities are the $p_i$. As we generalize to the continuous domain, we must make this width explicit.
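To do this, start with a continuous probability density function $f$, discretized into bins of width $\Delta$ as shown in the figure. By the mean value theorem for integrals, each bin $[i\Delta, (i+1)\Delta]$ contains a point $x_i$ such that

$$f(x_i)\,\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\, dx, \qquad (2)$$

and thus the integral of $f$ can be approximated (in the Riemann sense) by

$$\int_{-\infty}^{\infty} f(x)\, dx = \lim_{\Delta \to 0} \sum_{i=-\infty}^{\infty} f(x_i)\,\Delta, \qquad (3)$$

where this limit and “bin size goes to zero” are equivalent.

We will denote by $H^{\Delta}$ the entropy of the discretized distribution, whose $i$-th bin has probability $f(x_i)\,\Delta$:

$$H^{\Delta} := -\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log\left(f(x_i)\,\Delta\right). \qquad (4)$$

Expanding the logarithm gives

$$H^{\Delta} = -\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log f(x_i) - \sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log \Delta, \qquad (5)$$

and since by (2) the bin probabilities sum to $\int_{-\infty}^{\infty} f(x)\, dx = 1$,

$$H^{\Delta} = -\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log f(x_i) - \log \Delta. \qquad (6)$$

As $\Delta \to 0$,

$$\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log f(x_i) \to \int_{-\infty}^{\infty} f(x) \log f(x)\, dx \qquad (7)$$

and

$$\log \Delta \to -\infty, \qquad (8)$$

so, whenever the integral in (7) is finite, $H^{\Delta} \to \infty$: the discrete entropy of ever finer discretizations diverges. Removing the divergent term $-\log \Delta$ leads to the definition of the differential (continuous) entropy:

$$h[f] = \lim_{\Delta \to 0} \left[H^{\Delta} + \log \Delta\right] = -\int_{-\infty}^{\infty} f(x) \log f(x)\, dx. \qquad (9)$$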
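The limit $h[f] = \lim_{\Delta \to 0}[H^{\Delta} + \log \Delta]$ can be watched numerically. This sketch assumes a standard normal density, natural logarithms, bins $[i\Delta, (i+1)\Delta)$ aligned at zero, and ignores bins beyond $12\sigma$; the helper names are hypothetical. Each bin probability is computed exactly from `math.erfc`, so it equals $f(x_i)\Delta$ for the mean-value point $x_i$ of that bin. $H^{\Delta}$ itself keeps growing, while $H^{\Delta} + \log \Delta$ approaches $\tfrac{1}{2}\ln(2\pi e)$.

```python
import math


def upper_tail(x, sigma):
    """P(X > x) for X ~ N(0, sigma^2); erfc avoids cancellation in the far tail."""
    return 0.5 * math.erfc(x / (sigma * math.sqrt(2.0)))


def discretized_entropy(delta, sigma=1.0, span=12.0):
    """H^Delta in nats for N(0, sigma^2) cut into bins [i*delta, (i+1)*delta).

    Only bins within span*sigma of zero are summed (an assumption of this sketch).
    """
    n = math.ceil(span * sigma / delta)
    h = 0.0
    for i in range(n):
        # Exact probability of bin i on the positive half-line, i.e. f(x_i)*delta
        # for the mean-value point x_i of that bin.
        p = upper_tail(i * delta, sigma) - upper_tail((i + 1) * delta, sigma)
        if p > 0:
            h -= p * math.log(p)
    # The density is symmetric, so the negative half-line contributes the same amount.
    return 2.0 * h


h_exact = 0.5 * math.log(2 * math.pi * math.e)  # h[f] for the standard normal
for delta in (1.0, 0.1, 0.01, 0.001):
    h_delta = discretized_entropy(delta)
    # H^Delta grows as delta shrinks; H^Delta + log(delta) settles near h[f].
    print(f"delta={delta:g}  H^delta={h_delta:.6f}  "
          f"H^delta+log(delta)={h_delta + math.log(delta):.6f}")
print(f"h[f] = {h_exact:.6f}")
```

Computing the bin probabilities exactly, rather than as $f(\text{midpoint})\,\Delta$, keeps the discretized distribution summing to one, matching the role of the mean-value points in the derivation.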