PhysicsOverflow is a next-generation academic platform for physicists and astronomers, including a community peer review system and a postgraduate-level discussion forum analogous to MathOverflow.


What is the interpretation of individual contributions to the Shannon entropy?

+ 4 like - 0 dislike
75 views

If $X=\{ x_1,x_2,\dots,x_n\}$ are assigned probabilities $p(x_i)$, then the entropy is defined as

$\sum_{i=1}^n\ p(x_i)\,\cdot\left(-\log p(x_i)\right).$

One may call $I(x_i)=-\log p(x_i)$ the information associated with $x_i$ and consider the above an expectation value. In some systems it makes sense to view $p(x_i)$ as the rate of occurrence of $x_i$, and then a low $p(x_i)$ means a high "surprise" whenever $x_i$ happens, corresponding to $I(x_i)$ being large. It's also worth noting that if $p$ is a constant function, we get a Boltzmann-like situation.
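In code, the entropy is just the $p$-weighted average of the surprisal; here is a minimal Python sketch (the function names `surprisal` and `entropy` are mine, not from the post):

```python
import math

def surprisal(p, base=math.e):
    """Information content I = -log_b(p) of an outcome with probability p."""
    return -math.log(p, base)

def entropy(probs, base=math.e):
    """Shannon entropy: the p-weighted average of the surprisal."""
    return sum(p * surprisal(p, base) for p in probs if p > 0)

# Rarer events carry more surprise ...
print(surprisal(0.01, 2) > surprisal(0.5, 2))  # True
# ... and a constant (uniform) p gives the Boltzmann-like log|X|:
print(round(entropy([0.25] * 4, 2), 12))  # 2.0 bits = log2(4)
```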

Question: Now I wonder, given $\left|X\right|>1$, how I can interpret, for a fixed index $j$, the single term $p(x_j)\,\cdot\left(-\log p(x_j)\right)$. What does this "$j^\text{th}$ contribution to the entropy" or "price" represent? What is $p\cdot\log(p)$ if there are also other probabilities?


Thoughts: It's zero if $p$ is one or zero. In the first case, the surprise of something that occurs with certainty is zero; in the second case, the event never occurs and hence costs nothing. Now

$\left(-p\cdot\log(p)\right)'=\log(\frac{1}{p})-1.$

With respect to $p$, the function $-p\log p$ has a maximum which, curiously, is at the same time a fixed point, namely $\dfrac{1}{e}=0.368\dots$ That is to say, the maximal contribution of a single term $p(x_j)\,\cdot\left(-\log p(x_j)\right)$ to the entropy arises if for some $x_j$ you have $p(x_j)\approx 37\%$.
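Both claims are easy to check numerically; a small sketch (helper name `term` is mine):

```python
import math

def term(p):
    """Single contribution -p*log(p) to the entropy (natural log)."""
    return -p * math.log(p)

p_star = 1 / math.e
# Fixed point: term(1/e) == 1/e, since -log(1/e) = 1.
print(abs(term(p_star) - p_star) < 1e-12)  # True
# Maximum: nudging p in either direction decreases the contribution.
print(all(term(p_star) > term(p_star + d) for d in (0.01, -0.01)))  # True
```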

My question arose when someone asked me what the meaning is of $x^x$ having a minimum at $x_0=\dfrac{1}{e}$. This function is naturally $e^{x\log(x)}$, and I gave an example about signal transfer. The extremum is the individual contribution with maximal entropy, and I wanted to argue that, after optimization of the encoding/minimization of the entropy, events that happen with a probability $p(x_j)\approx 37\%$ will in total be the "most boring for you to send": they occur relatively often, and the optimal length of their encoding might not be too short. But I lack an interpretation of the individual entropy contribution to see if this idea makes sense, or what a better reading of it is.

It also relates to the information units, e.g. the nat. The extremum sits at $\dfrac{1}{e}$ whether you work in base $e$ (with the natural log) or with $\log_2$, and $-\log_2\!\left(\dfrac{1}{e}\right)=\dfrac{1}{\ln(2)}$.
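A quick numeric check (my own sketch) of the base-independence: the derivative of $-p\log_b p$ is $(-\ln p - 1)/\ln b$, which vanishes at $p=1/e$ for every base $b$, so a brute-force search should land near $0.368$ in each base.

```python
import math

def term(p, base):
    return -p * math.log(p, base)

def argmax_on_grid(base, n=100000):
    """Brute-force the maximiser of -p*log_b(p) on a fine grid in (0,1)."""
    return max(range(1, n), key=lambda k: term(k / n, base)) / n

for b in (math.e, 2, 10):
    print(round(argmax_on_grid(b), 4))  # ~0.3679 each time, i.e. ~1/e
```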


Edit: Related: I just stumbled upon $\frac{1}{e}$ as a probability in the 37% stopping rule.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK
asked Feb 4, 2015 in Theoretical Physics by NikolajK (195 points) [ no revision ]
retagged Feb 15, 2015
Are you unhappy with the idea that the Shannon entropy is the "average information"? That is, the expectation value of the random variable $I(X)$. In this case $-p_i\log p_i$ is just the weighted contribution of the event $x_i$ to this average.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Mark Mitchison
@MarkMitchison: To answer your question: No, I'm not unhappy with that interpretation of the whole sum (I've pointed out that it takes the form of an expectation value).

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK

1 Answer

+ 2 like - 0 dislike

This is a bit of a negative answer, but consider instead an expectation of some other quantity, such as the energy: $$ \langle E \rangle = \sum_i p_i E_i. $$ Now, it's obvious what $E_i$ means - it's the energy of state $i$ - but what does $p_iE_i$ mean? The answer is not very much really - it's the contribution of state $i$ to the expected energy, but it's very rarely if ever useful to consider this except in the context of summing up all the states' contributions.

In the context of information theory, $-p_i\log p_i$ is the same. $-\log p_i$ is the meaningful thing - it's the "surprisal", or information gained upon learning that state $i$ is in fact the true state. $-p_i\log p_i$ is the contribution of state $i$ to the Shannon entropy, but it isn't really meaningful except in the context of summing up all the contributions from all the states.

In particular, as far as I've ever been able to see, the value that maximises it, $1/e$, isn't a particularly special probability in information theory terms. The reason is that you always have to add up the contributions from the other states as well, and this changes the maximum.

In particular, for a two-state system, there is another state whose probability has to be $1-p$. Consequently, its Shannon entropy is given by $H_2 = -p\log p - (1-p)\log (1-p)$, and this function has its maximum not at $1/e$ but at $1/2$.
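The answer's two-state counterexample can be verified directly; a minimal sketch (the helper name `H2` matches the answer's notation):

```python
import math

def H2(p):
    """Binary Shannon entropy (nats): the full two-term sum."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

# The sum peaks at p = 1/2, even though each single term -p*log(p)
# on its own would peak at p = 1/e ~ 0.368.
grid = [k / 1000 for k in range(1, 1000)]
print(max(grid, key=H2))  # 0.5
```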

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
answered Feb 5, 2015 by Nathaniel (495 points) [ no revision ]
I'm not sure if your answer is more than "I don't know an interpretation", but thanks for the response. Regarding the last bit, I'm also looking for an interpretation of $\zeta(s)=\sum_{i=1}^N p_i^{-s}$, which has $H=\zeta'(-1)$. But that's another story :)

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK
@NikolajK it's not so much "I don't know an interpretation" (although I don't) as "here's a reason why you wouldn't expect there to be an interpretation."

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
@NikolajK your other thing seems related to the Rényi entropy. I do know a nice interpretation of the Rényi entropy, though its exact relation to your formula would need some thought.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
Yeah, I was thinking about that argument for a second. I guess "I don't know anyone who knows an interpretation for this related quantity either" is indeed more information. Did you intend to post two different links? There are indeed several q-analogs, e.g. the Tsallis entropy.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK
It's not just a related quantity, it's a more general one that includes yours, but anyway. Yes, I meant to post two links. The second is an interpretation of the Rényi entropy. I don't know a good interpretation of the Tsallis entropy; it always seemed a bit arbitrary to me.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
Interesting - thanks. I've thought of another case where it comes up as well, which I've been meaning to post as an answer. Let me see if I have time to do that now.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
Aah, I'm not sure if it works. My idea was to set up a situation where you can send either signal $A$ or $B$, but where there's a cost to send one of the signals but not the other. Then by trying to maximise (total information transmitted)/(expected cost) you might end up maximising $-p(A)\log p(A)$ to get $p(A)=1/e$ as the optimum. But the exact thing I thought of doesn't work, so I need to think more about it.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel

user contributions licensed under cc by-sa 3.0 with attribution required
