Let's posit an event E, which you assume will occur with probability 0.3.
Then someone tells you it will actually occur with probability 0.5. How much information, in bits, does that message contain?
My textbook does not address this question. I have an idea regarding the solution, but seek other opinions/analysis...
Caveat: the naive formula, as found in any communications book, does not apply... i.e. log2(1/p) = log2(2) = 1 bit is not the answer. That would be the answer if E were assumed to have P = 0.5 and you then received a message that it had occurred. In this case, however, the message predicts E with P = 0.5, which you had a priori assigned P = 0.3.
Further, you discount the accuracy of his message, assigning it an 80% chance of reliability (hence a 20% chance that your original 0.3 estimate is correct). Now how much information does his message convey?
If they specified the probability to 1 binary place after the decimal (i.e. 0.1 base 2) that would be 1-bit. If they specified it to 5 binary places (i.e. 0.10000 base 2) that would be 5-bits :-)
Information, in the Shannon sense, is always a "reduction of uncertainty". So you have to have some uncertainty to reduce. "Uncertainty" is the entropy of a probability distribution. You need an uncertainty over the probability of event E - a probability distribution over the probability of event E (a "prior" - e.g. the probability that the probability of event E is between 0.29 and 0.31 is 0.6 ...).
To make it concrete, let's say event E is "coin comes up heads". You want to estimate the probability of E. You need an initial distribution over the interval [0, 1], which will change as evidence comes in. Because this is a distribution over a continuous, real-valued variable (the "bias" of the coin) it must be a probability *density* function - the probability of any specific value (like 0.3) is zero (infinitesimal). You can only assign a finite probability to an interval (or the union of intervals).
So you have some pdf (probability density function), which is zero outside [0, 1], nonnegative in [0, 1]. And whose integral from 0 to 1 = 1. To get the probability that the bias lies in a particular subinterval [a, b] you integrate the pdf over [a, b].
To get the entropy (in bits) of the pdf you integrate -pdf(x)log2(pdf(x)) with x ranging from 0 to 1.
Now some evidence comes in, a report that the probability of event E is 0.5. Still ill-formed - you need a means of updating the pdf given the report. You get a new pdf after the report, compute its entropy, subtract that entropy from the entropy of the old pdf, and that is the information in bits the report provides.
So let's say your original pdf was the uniform distribution: pdf(x) = 1, x in [0, 1]. This has an entropy of 0 bits, which may seem strange, but for continuous distributions only differences in entropy have meaning.
Now let's say that after the report pdf(x) = 32 for x in [31/64, 33/64], and zero elsewhere. This distribution has an entropy of -5 bits (which should now not seem strange); old entropy - new entropy = 0 - (-5) = 5 bits. The report has provided 5 bits of information (it has reduced your uncertainty by 5 bits). Intuitively, it has shrunk the interval of uncertainty by a factor of 32.
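To check the arithmetic: a uniform density of height h integrates -h*log2(h) over its interval, which works out to -log2(h). A quick sketch (the helper name is mine):

```python
import math

# Differential entropy (bits) of a uniform density on an interval of
# width w: the pdf height is h = 1/w, and -∫ h*log2(h) dx over the
# interval collapses to -log2(h).
def uniform_entropy_bits(width):
    height = 1.0 / width          # density must integrate to 1
    return -math.log2(height)

old = uniform_entropy_bits(1.0)       # pdf(x) = 1 on [0, 1]          -> 0 bits
new = uniform_entropy_bits(2 / 64)    # pdf(x) = 32 on [31/64, 33/64] -> -5 bits
print(old - new)                      # information in the report: 5 bits
```

Shrinking the interval by a factor of 32 = 2^5 is exactly where the 5 bits comes from.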
It's not the formula that's naive, but its application.
Right.
That's a little muddled. You need an estimate of P(E) before the report and an estimate of P(E) after the report, where "estimate" is a pdf.
Still muddled. Let's go back to flipping coins. Say you start with the uniform distribution. You flip the coin 10 times and it comes up heads 3 times. Calculate the pdf and its entropy. Now you get a report "I flipped that same coin 40 times, and I got 20 heads". Pool the data, calculate the new pdf and its entropy, subtract new entropy from old. Exercise left to reader.
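For the record, that exercise can be ground out numerically. Under a uniform prior, h heads in n flips gives a Beta(h+1, n-h+1) posterior; pooling the report gives 23 heads in 50 flips. A brute-force sketch (helper names are mine):

```python
import numpy as np
from math import factorial

def beta_pdf(x, a, b):
    # Beta(a, b) density; for integer a, b the normalizing Beta
    # function is a ratio of factorials.
    B = factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)
    return x ** (a - 1) * (1 - x) ** (b - 1) / B

def entropy_bits(a, b, n=400_000):
    # Differential entropy via a midpoint Riemann sum over (0, 1);
    # midpoints avoid the zeros of the density at the endpoints.
    dx = 1.0 / n
    x = (np.arange(n) + 0.5) * dx
    p = beta_pdf(x, a, b)
    return float(-np.sum(p * np.log2(p)) * dx)

h_old = entropy_bits(4, 8)      # uniform prior + 3 heads in 10 flips -> Beta(4, 8)
h_new = entropy_bits(24, 28)    # pooled: 23 heads in 50 flips        -> Beta(24, 28)
print(h_old - h_new)            # bits of information in the report (~0.9)
```

The report is worth a bit less than one bit here - 40 extra flips only roughly halves the width of the posterior.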
Hehe.
But, I'm no expert in information theory. Just know the basics.
Does the 'will actually occur with probability 0.5' constitute a measurement of the probability? And, what is the error distribution of that measurement? Or, is it a prediction from mathematical first principles (like, a theorem)?
And what was the error distribution of the first 'measurement'?
You have to integrate -p log p over the full range (0 to 1 in the case of the probability of an event) and think of the second measurement as increasing the (already present) information, just as two data points have more info than one.
As Michael posted, the information content is a measure of the reduction in uncertainty (or increase in certainty). When E shows up, the reduction would have been from .3 to 1.0 which would have been log2(1/.3) - log2(1/1) or 1.74 bits.
But this update of the uncertainty would have been from .3 to .5, so it's log2(1/.3) - log2(1/.5) = -log2(.3) - (-log2(.5)) = 1.74 - 1 = .74 bits
So that update gave you .74 bits of additional information about event E (or about the state of the universe that produces event E).
When E shows up later, that acts as another update which carries 1 bit of additional information about E (i.e. you know for sure the universe was in the state required to produce the event E).
Now, if you had some measure of uncertainty about receiving the message from the person, you could also talk about how much information the receipt of the message gave you about that message (instead of about E).
I have not seen such issues explained in a textbook either. But what I write here makes logical sense to me.
You would still need to produce an updated probability about E. If this 80% chance the message was right meant the updated probability was
.8 * .5 + .2 * .3 = 0.46
then the update would be from 0.30 to 0.46, so the information gain would be
log2(.46) - log2(.3) = -1.12 - (-1.74) = .62 bits
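That arithmetic as a sketch (the function name is mine; log2(p_new) - log2(p_old) is the same quantity written as a difference of surprisals):

```python
import math

def surprisal_bits(p):
    # "Surprise" of event E at assessed probability p: log2(1/p) bits.
    return math.log2(1.0 / p)

p_old = 0.30
p_new = 0.8 * 0.5 + 0.2 * 0.3            # message trusted 80% -> 0.46
gain = surprisal_bits(p_old) - surprisal_bits(p_new)
print(round(p_new, 2), round(gain, 2))   # 0.46, 0.62 bits
```

Note that if the report had pushed your probability down instead of up, this difference would come out negative, which is the point argued below.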
If you received an update that caused your probability to be decreased, then that update would carry a negative amount of information about the upcoming event E, which means your uncertainty about the event had increased (your certainty had decreased) - the update took information about E away from you.
We, i.e. humans, do it all the time. It's called Bayesian statistics.
Points for honesty. However, I do wonder, why people will say "I'm no expert", then proceed to give an opinion... information theory may be useful here...
Dude, you've gone way off the deep end. No need for estimates, pdf, or anything. I posed the simplest case, practically page 1 of Shannon's paper: you learn that an event E with probability p occurred, what is the information in that message?
Ans: log(1/p)
The point is, we posit the probability p. It's given, no need for distributions etc.
No need for all that. The 'new pdf' is 0.5. It's given.
Not necessarily wrong, per se, but way way overcomplicated.
Again, estimates and statistics are unnecessary, a red herring.
egad I think you need to reduce the entropy of your thinking...
No, this is where it gets a bit tricky. The 'new event' of the message "p = .5" is not the event E, but the message itself. The message is the event!
There you go! Now: the event F = "the prob. of E is .5" has occurred. To determine its information, we need to know the conditional probability of F, given that we assumed p(E) = .3.
hmmmm...
Not bad, so far...
NEGATIVE information?!?! Someone is seriously confused - information is never less than zero.
Problem for the student: which message, in this example, would contain zero bits of information?
Bwahahahaha! Another good one to add to my collection ;-)
...Jim Thompson
--
James E. Thompson, P.E.
Analog Innovations, Inc.
Analog/Mixed-Signal ASIC's and Discrete Systems
Phoenix, Arizona   Voice: (480) 460-2350   Fax: (480) 460-2142
E-mail address at website: formatting link
(mens et manus | Brass Rat | 1962)
I love to cook with wine. Sometimes I even put it in the food.
I've seen lots of cases where the experts sit back and say nothing till some schmuck, like me, takes a stab at the problem and really screws it up. Then the experts jump in and straighten out my sorry ass. And everyone is happy. ;-)