# What is the interpretation of individual contributions to the Shannon entropy?

+ 5 like - 0 dislike
272 views

If $X=\{ x_1,x_2,\dots,x_n\}$ are assigned probabilities $p(x_i)$, then the entropy is defined as

$\sum_{i=1}^n\ p(x_i)\,\cdot\left(-\log p(x_i)\right).$

One may call $I(x_i)=-\log p(x_i)$ the information associated with $x_i$ and consider the above an expectation value. In some systems it makes sense to view $p$ as the rate of occurrence of $x_i$; a low $p(x_i)$ then means that the "value of your surprise" whenever $x_i$ happens is high, which corresponds to $I(x_i)$ being larger. It's also worth noting that if $p$ is a constant function, we get a Boltzmann-like situation.

Question: Now I wonder, given $\left|X\right|>1$, how I can interpret, for a fixed index $j$, a single term $p(x_j)\,\cdot\left(-\log p(x_j)\right)$. What does this "$x_j^\text{th}$ contribution to the entropy" or "price" represent? What is $-p\cdot\log(p)$ if there are also other probabilities? Thoughts: It's zero if $p$ is one or zero. In the first case, the surprise of something that will occur with certainty is none, and in the second case it will never occur and hence costs nothing. Now

$\left(-p\cdot\log(p)\right)'=\log(\frac{1}{p})-1.$

With respect to $p$, the function has a maximum which, oddly, is at the same time a fixed point, namely $\dfrac{1}{e}=0.368\dots$. That is to say, the maximal contribution of a single term $p(x_j)\,\cdot\left(-\log p(x_j)\right)$ to the entropy will arise if for some $x_j$ you have $p(x_j)\approx 37\%$.
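A quick numerical sanity check of both claims (maximum at $1/e$, and $1/e$ being a fixed point) can be sketched as follows. This is only an illustration: the helper name `entropy_term` and the grid resolution are my own choices, and the natural log is assumed.

```python
import math

def entropy_term(p: float) -> float:
    """Single contribution -p*ln(p) to the Shannon entropy, in nats."""
    # The limit p -> 0 of -p*ln(p) is 0, so handle p == 0 explicitly.
    return 0.0 if p == 0.0 else -p * math.log(p)

# Scan a fine grid on [0, 1] and pick the p with the largest single term.
grid = [k / 100_000 for k in range(100_001)]
p_star = max(grid, key=entropy_term)

print(p_star, 1 / math.e)        # grid maximiser next to 1/e
print(entropy_term(1 / math.e))  # value of the term at p = 1/e
```

The second printed line should agree with $1/e$ itself up to rounding, which is the fixed-point property; note that this part relies on using the natural log.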

My question arose when someone asked me what the meaning of $x^x$ having a minimum at $x_0=\dfrac{1}{e}$ is. This is naturally $e^{x\log(x)}$ and I gave an example about signal transfer. The extremum is the individual contribution with maximal entropy, and I wanted to argue that, after optimization of the encoding/minimization of the entropy, events that happen $p(x_j)\approx 37\%$ of the time will in total be "most boring for you to send". They occur relatively often and the optimal length of encoding might not be too short. But I lack an interpretation of the individual entropy contribution to see if this idea makes sense, or what a better reading of it is.

It also relates to those information units, e.g. the nat. One over $e$ is the minimum, whether you work in base $e$ (with the natural log) or with $\log_2$, and $-\log_2(\dfrac{1}{e})=\log_2(e)=\dfrac{1}{\ln(2)}$.
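To spell out why the location of the extremum does not depend on the base, here is a short derivation for a general base $b>1$ (the symbol $b$ is introduced only for this sketch):

$$\frac{d}{dp}\left(-p\log_b p\right) = -\log_b p-\frac{1}{\ln b} = -\frac{\ln p+1}{\ln b},$$

which vanishes exactly when $\ln p=-1$, i.e. at $p=\dfrac{1}{e}$ for every base. The value at the extremum does depend on the base,

$$-\frac{1}{e}\log_b\frac{1}{e}=\frac{1}{e\,\ln b},$$

so it equals $\dfrac{1}{e}$ only for $b=e$; the fixed-point property is specific to nats.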

edit: Related: I just stumbled upon $\frac{1}{e}$ as probability: 37% stopping rule.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK
retagged Feb 15, 2015
Are you unhappy with the idea that the Shannon entropy is the "average information"? That is, the expectation value of the random variable $I(X)$. In this case $-p_i\log p_i$ is just the weighted contribution of the event $x_i$ to this average.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Mark Mitchison
@MarkMitchison: To answer your question: No, I'm not unhappy with that interpretation for the whole sum (I've pointed out that it takes the form of an expectation value).

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK

+ 2 like - 0 dislike

This is a bit of a negative answer, but consider instead an expectation of some other quantity, such as the energy: $$\langle E \rangle = \sum_i p_i E_i.$$ Now, it's obvious what $E_i$ means - it's the energy of state $i$ - but what does $p_iE_i$ mean? The answer is not very much really - it's the contribution of state $i$ to the expected energy, but it's very rarely if ever useful to consider this except in the context of summing up all the states' contributions.

In the context of information theory, $-p_i\log p_i$ is the same. $-\log p_i$ is the meaningful thing - it's the "surprisal", or information gained upon learning that state $i$ is in fact the true state. $-p_i\log p_i$ is the contribution of state $i$ to the Shannon entropy, but it isn't really meaningful except in the context of summing up all the contributions from all the states.

In particular, as far as I've ever been able to see, the value that maximises it, $1/e$, isn't a particularly special probability in information theory terms. The reason is that you always have to add up the contributions from the other states as well, and this changes the maximum.

In particular, for a two-state system, there is another state whose probability has to be $1-p$. Consequently, its Shannon entropy is given by $H_2 = -p\log p - (1-p)\log (1-p)$, and this function has its maximum not at $1/e$ but at $1/2$.
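As a rough numerical illustration of this point (a sketch only; `binary_entropy` and the grid are names and choices made up for this example, with the natural log assumed), one can scan $H_2$ over $p$ and compare it with the single-term optimum $1/e$:

```python
import math

def binary_entropy(p: float) -> float:
    """H_2(p) = -p ln p - (1-p) ln(1-p), in nats."""
    total = 0.0
    for q in (p, 1.0 - p):
        # Terms with q == 0 contribute nothing (limit of -q ln q).
        if q > 0.0:
            total -= q * math.log(q)
    return total

# Locate the maximiser of the full two-state entropy on a grid.
grid = [k / 1000 for k in range(1001)]
p_max = max(grid, key=binary_entropy)

print(p_max)                        # maximiser of H_2 on the grid
print(binary_entropy(0.5))          # entropy at p = 1/2
print(binary_entropy(1 / math.e))   # entropy at p = 1/e, for comparison
```

Once the $1-p$ term is included, the entropy at $p=1/e$ comes out smaller than at $p=1/2$, so the single-term maximum does not survive the sum.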

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
answered Feb 5, 2015 by (495 points)
I'm not sure if your answer is more than "I don't know an interpretation", but thanks for the response. Regarding the last bit, I'm also looking out for an interpretation of $\zeta(s)=\sum_{i=1}^Np_i^{-s}$, which has $H=\zeta'(-1)$. But that's another story :)
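For what it's worth, the relation $H=\zeta'(-1)$ follows from $\frac{d}{ds}p_i^{-s}=-\ln(p_i)\,p_i^{-s}$ and is easy to check numerically. The following is just a sketch: the distribution `probs` is an arbitrary made-up example, and the derivative is approximated by a central finite difference.

```python
import math

# Hypothetical example distribution; any normalised probabilities work.
probs = [0.5, 0.3, 0.2]

def zeta(s: float) -> float:
    """zeta(s) = sum_i p_i^(-s) for the distribution above."""
    return sum(p ** (-s) for p in probs)

def shannon_entropy() -> float:
    """H = -sum_i p_i ln p_i, in nats."""
    return -sum(p * math.log(p) for p in probs)

# Central finite difference approximation of zeta'(-1).
h = 1e-6
zeta_prime = (zeta(-1 + h) - zeta(-1 - h)) / (2 * h)

print(zeta(-1))                       # normalisation: sum of the p_i
print(zeta_prime, shannon_entropy())  # the two should agree closely
```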

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK
@NikolajK it's not so much "I don't know an interpretation" (although I don't) as "here's a reason why you wouldn't expect there to be an interpretation."

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
@NikolajK your other thing seems related to the Rényi entropy. I do know a nice interpretation of the Rényi entropy, though its exact relation to your formula would need some thought.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
Yeah, I was thinking about that argument for a second. I guess "I don't know anyone who knows an interpretation for this related quantity either" is indeed more information. Did you intend to post two different links? There are indeed several q-analogs, e.g. Tsallis entropy.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK
It's not just a related quantity, it's a more general one that includes yours, but anyway. Yes, I meant to post two links. The second is an interpretation of the Rényi entropy. I don't know a good interpretation of the Tsallis entropy; it always seemed a bit arbitrary to me.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
Aah, I'm not sure if it works. My idea was to set up a situation where you can send either signal $A$ or $B$, but where there's a cost to send one of the signals but not the other. Then by trying to maximise (total information transmitted)/(expected cost) you might end up maximising $-p(A)\log p(A)$ to get $p(A)=1/e$ as the optimum. But the exact thing I thought of doesn't work, so I need to think more about it.