What is the interpretation of individual contributions to the Shannon entropy?


If the elements of $X=\{ x_1,x_2,\dots,x_n\}$ are assigned probabilities $p(x_i)$, then the entropy is defined as

$\sum_{i=1}^n\ p(x_i)\,\cdot\left(-\log p(x_i)\right).$

One may call $I(x_i)=-\log p(x_i)$ the information associated with $x_i$ and consider the above an expectation value. In some systems it makes sense to view $p(x_i)$ as the rate of occurrence of $x_i$; then the lower $p(x_i)$, the larger the "value of your surprise" $I(x_i)$ whenever $x_i$ happens. It's also worth noting that if $p$ is a constant function, we get a Boltzmann-like situation.
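A minimal sketch of this reading, assuming base-2 logarithms (so values are in bits) and using illustrative helper names `surprisal` and `entropy` that are not from any library: the entropy is computed literally as the expectation of $I(x_i)$, and the constant-$p$ case reduces to $\log_2 n$, the analogue of $S=k\log W$.

```python
import math

def surprisal(p):
    """Information I(x) = -log2 p(x), in bits, of an outcome with probability p > 0."""
    return -math.log2(p)

def entropy(probs):
    """Shannon entropy in bits, computed as the expectation of the surprisal.

    Outcomes with p == 0 are skipped (the convention 0 * log 0 = 0).
    """
    return sum(p * surprisal(p) for p in probs if p > 0)

# The rarer the outcome, the larger the surprise.
print(surprisal(0.5))    # 1.0
print(surprisal(0.01))   # ≈ 6.64

# Constant p over n outcomes: H = log2 n, the analogue of Boltzmann's S = k log W.
n = 8
print(entropy([1 / n] * n))   # 3.0
```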

Question: Now I wonder, given $\left|X\right|>1$, how I can interpret, for a fixed index $j$, a single term $p(x_j)\,\cdot\left(-\log p(x_j)\right)$. What does this "$j^\text{th}$ contribution to the entropy" or "price" represent? What is $-p\cdot\log(p)$ if there are also other probabilities?


Thoughts: It's zero if $p$ is one or zero (the latter in the limit $p\to 0$). In the first case, the surprise of something that will occur with certainty is none, and in the second case the event will never occur and hence costs nothing. Now, with the natural logarithm,

$\left(-p\cdot\log(p)\right)'=\log(\frac{1}{p})-1.$
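A quick numerical sanity check of this derivative, assuming the natural logarithm as in the formula: the closed form is compared with a central finite difference. The helper names (`contribution`, `derivative_formula`, `central_difference`) are illustrative only.

```python
import math

def contribution(p):
    """Single entropy term -p ln p, in nats; defined as 0 at p = 0 (its limit)."""
    return p * math.log(1 / p) if p > 0 else 0.0

def derivative_formula(p):
    """Closed form of d/dp (-p ln p) = ln(1/p) - 1, valid for p > 0."""
    return math.log(1 / p) - 1

def central_difference(f, x, h=1e-6):
    """Symmetric finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for p in (0.1, 0.3, 0.5, 0.9):
    print(p, derivative_formula(p), central_difference(contribution, p))

# The term vanishes at p = 1 and tends to 0 as p -> 0.
print(contribution(1.0), contribution(1e-12))
```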

With respect to $p$, the function has a maximum which, oddly, is at the same time a fixed point, namely $\dfrac{1}{e}=0.368\dots$, since $-\frac{1}{e}\log\frac{1}{e}=\frac{1}{e}$. That is to say, the maximal contribution of a single term $p(x_j)\,\cdot\left(-\log p(x_j)\right)$ to the entropy arises if $p(x_j)\approx 37\%$.
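Both claims (maximum at $1/e$, and $1/e$ being a fixed point of $-p\ln p$) can be checked by brute force. This is a sketch with an arbitrarily chosen grid resolution `N`; `contribution` is an illustrative name.

```python
import math

def contribution(p):
    """Single entropy term -p ln p, in nats; 0 at p = 0."""
    return p * math.log(1 / p) if p > 0 else 0.0

# Brute-force search for the maximiser on a fine grid of probabilities.
N = 100_000
p_star = max((i / N for i in range(N + 1)), key=contribution)

print(p_star, 1 / math.e)           # both ≈ 0.3679
print(contribution(p_star))         # maximum value, also ≈ 1/e
print(contribution(1 / math.e))     # fixed point: f(1/e) = 1/e up to rounding
```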

My question arose when someone asked me what it means that $x^x$ has a minimum at $x_0=\dfrac{1}{e}$. Since $x^x=e^{x\log(x)}$, this is the same point, and I gave an example about signal transfer. The extremum is the individual contribution with maximal entropy, and I wanted to argue that, after optimizing the encoding (so that the expected code length approaches the entropy), events that happen with probability $p(x_j)\approx 37\%$ will in total be the "most boring for you to send": they occur relatively often, and their optimal encoding length might not be too short. But I lack an interpretation of the individual entropy contribution to see whether this idea makes sense, or what a better reading of it is.
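One way to make the "in total" reading concrete, as a sketch under the assumption of ideal code lengths $\ell_j=-\log_2 p(x_j)$ (real prefix codes such as Huffman codes round these to integers): the term $p(x_j)\,\ell_j$ is then the average number of bits per transmitted symbol spent on $x_j$. The distribution below is a made-up example, and `bits_per_symbol` is an illustrative name.

```python
import math

def bits_per_symbol(probs):
    """Average number of bits per transmitted symbol spent on each outcome,
    assuming ideal code lengths of -log2 p bits (real prefix codes round these)."""
    return [p * -math.log2(p) if p > 0 else 0.0 for p in probs]

probs = [0.5, 0.37, 0.13]   # hypothetical example distribution
shares = bits_per_symbol(probs)
for p, s in zip(probs, shares):
    print(f"p = {p:.2f}: code length {-math.log2(p):.3f} bits, share {s:.3f} bits/symbol")
print("total (= entropy):", sum(shares))

# Upper bound on any single share: max_p (-p log2 p) = log2(e)/e ≈ 0.531 bits.
print(math.log2(math.e) / math.e)
```

In this made-up example the $p=0.37$ outcome takes the largest share, about 0.531 bits per symbol and within $10^{-4}$ of the bound, while the more frequent $p=0.5$ outcome takes 0.5 bits per symbol.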

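It also relates to the information units, e.g. the nat. The location $\dfrac{1}{e}$ of the extremum is the same whether you work in base $e$ (with the natural log) or with $\log_2$, since the two logarithms differ only by the constant factor $\ln(2)$; and $-\log_2(\dfrac{1}{e})=\log_2(e)=\dfrac{1}{\ln(2)}\approx 1.443$.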
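A small check of the base independence and of the nat-to-bit factor, as a sketch using an arbitrary grid over the open interval $(0,1)$; `contribution` is an illustrative name.

```python
import math

def contribution(p, base):
    """-p log_base(p); switching base only rescales every value by 1/ln(base)."""
    return p * math.log(1 / p, base)

grid = [i / 100_000 for i in range(1, 100_000)]   # open interval (0, 1)
argmax_nats = max(grid, key=lambda p: contribution(p, math.e))
argmax_bits = max(grid, key=lambda p: contribution(p, 2))
print(argmax_nats, argmax_bits)   # same location, ≈ 1/e ≈ 0.3679

# One nat in bits: -log2(1/e) = log2(e) = 1/ln 2 ≈ 1.443.
print(-math.log2(1 / math.e), 1 / math.log(2))
```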

edit: Related: I just stumbled upon $\frac{1}{e}$ as probability: 37% stopping rule.
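In the classic secretary (best-choice) problem the link to $-p\log p$ is explicit. Assuming $n$ candidates arrive in uniformly random order, only relative ranks are observed, the first $k$ are rejected, and then the first candidate better than all of those seen so far is taken, the probability of picking the best one is $\frac{k}{n}\sum_{j=k}^{n-1}\frac{1}{j}$, which tends to $-x\ln x$ with $x=k/n$ as $n$ grows. A sketch of the exact formula (`success_probability` is an illustrative name):

```python
import math

def success_probability(k, n):
    """Probability of selecting the single best of n randomly ordered candidates
    when the first k are rejected and the next one better than all of them is taken."""
    if k == 0:
        return 1 / n  # no rejection phase: always take the first candidate
    return (k / n) * sum(1 / j for j in range(k, n))

n = 1000
best_k = max(range(n), key=lambda k: success_probability(k, n))
x = best_k / n
print(best_k, x, success_probability(best_k, n))   # cutoff fraction and success probability, both near 1/e
print(-x * math.log(x))                            # the large-n limit -x ln x at the same x
```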

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK

asked Feb 4, 2015 in Theoretical Physics by NikolajK (200 points)
retagged Feb 15, 2015

Are you unhappy with the idea that the Shannon entropy is the "average information"? That is, the expectation value of the random variable $I(X)$. In this case $-p_i\log p_i$ is just the weighted contribution of the event $x_i$ to this average.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Mark Mitchison

commented Feb 4, 2015 by Mark Mitchison (270 points)

@MarkMitchison: To answer your question: No, I'm not unhappy with that interpretation for the whole sum (I've pointed out that it takes the form of an expectation value.)

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK

commented Feb 5, 2015 by NikolajK (200 points)


1 Answer

This is a bit of a negative answer, but consider instead an expectation of some other quantity, such as the energy: $$ \langle E \rangle = \sum_i p_i E_i. $$ Now, it's obvious what $E_i$ means - it's the energy of state $i$ - but what does $p_iE_i$ mean? The answer is not very much really - it's the contribution of state $i$ to the expected energy, but it's very rarely if ever useful to consider this except in the context of summing up all the states' contributions.

In the context of information theory, $-p_i\log p_i$ is the same. $-\log p_i$ is the meaningful thing - it's the "surprisal", or information gained upon learning that state $i$ is in fact the true state. $-p_i\log p_i$ is the contribution of state $i$ to the Shannon entropy, but it isn't really meaningful except in the context of summing up all the contributions from all the states.

In particular, as far as I've ever been able to see, the value that maximises it, $1/e$, isn't a particularly special probability in information theory terms. The reason is that you always have to add up the contributions from the other states as well, and this changes the maximum.
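A short Lagrange-multiplier calculation (a sketch, with the natural logarithm and the constraint $\sum_k p_k=1$) shows how the other states move the optimum:

$$ \frac{\partial}{\partial p_i}\left[-\sum_{k=1}^n p_k\ln p_k-\lambda\left(\sum_{k=1}^n p_k-1\right)\right]=-\ln p_i-1-\lambda=0 \quad\Longrightarrow\quad p_i=e^{-1-\lambda}\ \text{for every } i. $$

All $p_i$ are therefore equal, so the constraint gives $p_i=1/n$ and $\lambda=\ln n-1$. Dropping the constraint ($\lambda=0$) recovers the single-term maximiser $p=1/e$; the multiplier is what shifts the stationary point from $1/e$ to $1/n$.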

For example, for a two-state system, there is another state whose probability has to be $1-p$. Consequently, its Shannon entropy is given by $H_2 = -p\log p - (1-p)\log (1-p)$, and this function has its maximum not at $1/e$ but at $1/2$.
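To see the two maximisers side by side, a brute-force grid search (a sketch with base-2 logs; `binary_entropy` and `single_term` are illustrative names) over $p$ for the single term $-p\log_2 p$ and for the full two-state entropy $H_2$:

```python
import math

def binary_entropy(p):
    """H_2(p) = -p log2 p - (1-p) log2(1-p), in bits; 0 at p = 0 or 1."""
    return sum(q * math.log2(1 / q) for q in (p, 1 - p) if q > 0)

def single_term(p):
    """Contribution -p log2 p of one state on its own."""
    return p * math.log2(1 / p) if p > 0 else 0.0

grid = [i / 10_000 for i in range(10_001)]
print(max(grid, key=binary_entropy))   # 0.5: maximum of the full two-state entropy
print(max(grid, key=single_term))      # ≈ 0.3679: maximum of one term on its own
```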

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel

answered Feb 5, 2015 by Nathaniel (495 points)

I'm not sure if your answer is more than "I don't know an interpretation", but thanks for the response. Regarding the last bit, I'm also looking out for an interpretation of $\zeta(s)=\sum_{i=1}^Np_i^{-s}$, which has $H=\zeta'(-1)$. But that's another story :)

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK

commented Feb 5, 2015 by NikolajK (200 points)
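The identity $H=\zeta'(-1)$ from NikolajK's comment holds with the natural logarithm, because $\frac{d}{ds}p^{-s}=-\ln(p)\,p^{-s}$ gives $\zeta'(-1)=-\sum_i p_i\ln p_i$; normalisation also gives $\zeta(-1)=\sum_i p_i=1$. A small numerical check on an arbitrary example distribution (helper names are illustrative):

```python
import math

def zeta(s, probs):
    """zeta(s) = sum_i p_i^(-s) for a finite distribution (nonzero p_i only)."""
    return sum(p ** (-s) for p in probs if p > 0)

def zeta_prime(s, probs):
    """Exact derivative, using d/ds p^(-s) = -ln(p) * p^(-s)."""
    return sum(-math.log(p) * p ** (-s) for p in probs if p > 0)

probs = [0.5, 0.3, 0.2]                          # arbitrary example distribution
shannon = -sum(p * math.log(p) for p in probs)   # Shannon entropy in nats

h = 1e-6
fd = (zeta(-1 + h, probs) - zeta(-1 - h, probs)) / (2 * h)   # finite-difference zeta'(-1)

print(zeta(-1, probs))                      # normalisation: ≈ 1
print(zeta_prime(-1, probs), fd, shannon)   # all three agree
```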

@NikolajK it's not so much "I don't know an interpretation" (although I don't) as "here's a reason why you wouldn't expect there to be an interpretation."

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel

commented Feb 5, 2015 by Nathaniel (495 points)

@NikolajK your other thing seems related to the Rényi entropy. I do know a nice interpretation of the Rényi entropy, though its exact relation to your formula would need some thought.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel

commented Feb 5, 2015 by Nathaniel (495 points)
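With the $\zeta$ from NikolajK's comment, $\sum_i p_i^\alpha=\zeta(-\alpha)$, so the Rényi entropy of order $\alpha$ can at least be written as $H_\alpha=\frac{\ln\zeta(-\alpha)}{1-\alpha}$ (natural log), and its $\alpha\to1$ limit is the Shannon entropy $\zeta'(-1)$. A sketch on an arbitrary example distribution, with illustrative helper names:

```python
import math

def zeta(s, probs):
    """zeta(s) = sum_i p_i^(-s), so that sum_i p_i^alpha = zeta(-alpha)."""
    return sum(p ** (-s) for p in probs if p > 0)

def renyi_entropy(alpha, probs):
    """Renyi entropy of order alpha (alpha >= 0, alpha != 1), in nats."""
    return math.log(zeta(-alpha, probs)) / (1 - alpha)

probs = [0.5, 0.3, 0.2]                          # arbitrary example distribution
shannon = -sum(p * math.log(p) for p in probs)   # = zeta'(-1)

for alpha in (0.0, 0.5, 0.999, 1.001, 2.0):
    print(alpha, renyi_entropy(alpha, probs))
print("Shannon:", shannon)   # the alpha -> 1 limit
```

Order 0 gives $\ln 3$ here (the log of the number of outcomes with nonzero probability), and order 2 gives $-\ln\sum_i p_i^2$.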

Yeah, I was thinking about that argument for a second. I guess "I don't know anyone who knows an interpretation for this related quantity either" is indeed more information. Did you intend to post two different links? There are indeed several q-analogs, e.g. the Tsallis entropy.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK

commented Feb 5, 2015 by NikolajK (200 points)
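A purely formal observation, not an interpretation: with the same $\zeta(s)=\sum_i p_i^{-s}$, the Tsallis entropy is $S_q=\frac{1-\sum_i p_i^q}{q-1}=\frac{\zeta(-1)-\zeta(-q)}{q-1}$, i.e. a difference quotient of $\zeta$ at $s=-1$, so it tends to $\zeta'(-1)=H$ as $q\to1$. A sketch with illustrative helper names and an arbitrary example distribution:

```python
import math

def zeta(s, probs):
    """zeta(s) = sum_i p_i^(-s); zeta(-1) = 1 by normalisation."""
    return sum(p ** (-s) for p in probs if p > 0)

def tsallis_entropy(q, probs):
    """Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1), for q != 1."""
    return (1 - zeta(-q, probs)) / (q - 1)

probs = [0.5, 0.3, 0.2]                          # arbitrary example distribution
shannon = -sum(p * math.log(p) for p in probs)   # = zeta'(-1), in nats

for q in (2.0, 1.1, 1.001):
    print(q, tsallis_entropy(q, probs))
print("Shannon:", shannon)   # the q -> 1 limit
```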

It's not just a related quantity, it's a more general one that includes yours, but anyway. Yes, I meant to post two links. The second is an interpretation of the Rényi entropy. I don't know a good interpretation of the Tsallis entropy; it always seemed a bit arbitrary to me.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel

commented Feb 5, 2015 by Nathaniel (495 points)

What I meant is that the links point at the same page. You mean the Rényi entropy is less arbitrary because of the interpretations pointed out in the paper?

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK

commented Feb 5, 2015 by NikolajK (200 points)

Oh sorry. The first was just a Wikipedia link. Yes, the point is that paper makes the Rényi entropy feel like a physically and informationally meaningful thing, whereas to me the Tsallis seems like a mysterious equation that comes from nowhere. (But if you know an interpretation I'd like to hear it.)

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel

commented Feb 5, 2015 by Nathaniel (495 points)

Just stumbled upon $\frac{1}{e}$ as probability: 37% stopping rule.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK

commented Feb 14, 2015 by NikolajK (200 points)

Interesting - thanks. I've thought of another case where it comes up as well, which I've been meaning to post as an answer. Let me see if I have time to do that now.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel

commented Feb 14, 2015 by Nathaniel (495 points)

Aah, I'm not sure if it works. My idea was to set up a situation where you can send either signal $A$ or $B$, but where there's a cost to send one of the signals but not the other. Then by trying to maximise (total information transmitted)/(expected cost) you might end up maximising $-p(A)\log p(A)$ to get $p(A)=1/e$ as the optimum. But the exact thing I thought of doesn't work, so I need to think more about it.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel

commented Feb 14, 2015 by Nathaniel (495 points)
