Let's posit an event E, which you assume will occur with probability 0.3.
Then someone tells you it will actually occur with probability 0.5. How much information, in bits, does that message contain?
My textbook does not address this question. I have an idea regarding the solution, but seek other opinions/analysis...
Caveat: the naive formula, as found in any communications book, does not apply... i.e. log2(1/p) = log2(2) = 1 bit is not the answer. That would be the answer if E were assumed to have P = 0.5 and you then received a message that it had occurred. In this case, however, the message predicts E with P = 0.5, which you had a priori assigned P = 0.3.
Further, you discount the accuracy of his message, assigning it an 80% chance of reliability (hence a 20% chance that your original 0.3 estimate is correct). Now how much information does his message convey?
If they specified the probability to 1 binary place after the decimal (i.e. 0.1 base 2) that would be 1-bit. If they specified it to 5 binary places (i.e. 0.10000 base 2) that would be 5-bits :-)
Information, in the Shannon sense, is always a "reduction of uncertainty". So you have to have some uncertainty to reduce. "Uncertainty" is the entropy of a probability distribution. You need an uncertainty over the probability of event E - a probability distribution over the probability of event E (a "prior" - e.g. the probability that the probability of event E is between 0.29 and 0.31 is 0.6 ...).
To make it concrete, let's say event E is "coin comes up heads". You want to estimate the probability of E. You need an initial distribution over the interval [0, 1], which will change as evidence comes in. Because this is a distribution over a continuous, real-valued variable (the "bias" of the coin) it must be a probability *density* function - the probability of any specific value (like 0.3) is zero (infinitesimal). You can only assign a finite probability to an interval (or the union of intervals).
So you have some pdf (probability density function), which is zero outside [0, 1], nonnegative in [0, 1]. And whose integral from 0 to 1 = 1. To get the probability that the bias lies in a particular subinterval [a, b] you integrate the pdf over [a, b].
To get the entropy (in bits) of the pdf you integrate -pdf(x)log2(pdf(x)) with x ranging from 0 to 1.
Now some evidence comes in, a report that the probability of event E is 0.5. Still ill-formed - you need a means of updating the pdf given the report. You get a new pdf after the report, compute its entropy, subtract that entropy from the entropy of the old pdf, and that is the information in bits the report provides.
So let's say your original pdf was the uniform distribution: pdf(x) = 1, x in [0, 1]. This has an entropy of 0 bits, which may seem strange, but for continuous distributions only differences in entropy have meaning.
Now let's say that after the report pdf(x) = 32 for x in [31/64, 33/64], and zero elsewhere. This distribution has an entropy of -5 bits (which should now not seem strange); old entropy - new entropy = 0 - (-5) = 5 bits. The report has provided 5 bits of information (it has reduced your uncertainty by 5 bits). Intuitively, it has shrunk the interval of uncertainty by a factor of 32.
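To check the arithmetic: a uniform density of height h integrates -h*log2(h) over its interval, which works out to -log2(h). A quick sketch (the helper name is mine):

```python
import math

# Differential entropy (bits) of a uniform density on an interval of
# width w: the pdf height is h = 1/w, and -∫ h*log2(h) dx over the
# interval collapses to -log2(h).
def uniform_entropy_bits(width):
    height = 1.0 / width          # density must integrate to 1
    return -math.log2(height)

old = uniform_entropy_bits(1.0)       # pdf(x) = 1 on [0, 1]          -> 0 bits
new = uniform_entropy_bits(2 / 64)    # pdf(x) = 32 on [31/64, 33/64] -> -5 bits
print(old - new)                      # information in the report: 5 bits
```

Shrinking the interval by a factor of 32 = 2^5 is exactly where the 5 bits comes from.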
It's not the formula that's naive, but its application.
Right.
That's a little muddled. You need an estimate of P(E) before the report and an estimate of P(E) after the report, where "estimate" is a pdf.
Still muddled. Let's go back to flipping coins. Say you start with the uniform distribution. You flip the coin 10 times and it comes up heads 3 times. Calculate the pdf and its entropy. Now you get a report "I flipped that same coin 40 times, and I got 20 heads". Pool the data, calculate the new pdf and its entropy, subtract new entropy from old. Exercise left to reader.
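For the record, that exercise can be ground out numerically. Under a uniform prior, h heads in n flips gives a Beta(h+1, n-h+1) posterior; pooling the report gives 23 heads in 50 flips. A brute-force sketch (helper names are mine):

```python
import numpy as np
from math import factorial

def beta_pdf(x, a, b):
    # Beta(a, b) density; for integer a, b the normalizing Beta
    # function is a ratio of factorials.
    B = factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)
    return x ** (a - 1) * (1 - x) ** (b - 1) / B

def entropy_bits(a, b, n=400_000):
    # Differential entropy via a midpoint Riemann sum over (0, 1);
    # midpoints avoid the zeros of the density at the endpoints.
    dx = 1.0 / n
    x = (np.arange(n) + 0.5) * dx
    p = beta_pdf(x, a, b)
    return float(-np.sum(p * np.log2(p)) * dx)

h_old = entropy_bits(4, 8)      # uniform prior + 3 heads in 10 flips -> Beta(4, 8)
h_new = entropy_bits(24, 28)    # pooled: 23 heads in 50 flips        -> Beta(24, 28)
print(h_old - h_new)            # bits of information in the report (~0.9)
```

The report is worth a bit less than one bit here - 40 extra flips only roughly halves the width of the posterior.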
Hehe.
But, I'm no expert in information theory. Just know the basics.
Does the 'will actually occur with probability 0.5' constitute a measurement of the probability? And, what is the error distribution of that measurement? Or, is it a prediction from mathematical first principles (like, a theorem)?
And what was the error distribution of the first 'measurement'?
You have to integrate -p log p over the full range (0 to 1 in the case of the probability of an event) and think of the second measurement as increasing the (already present) information, just as two data points have more info than one.
As Michael posted, the information content is a measure of the reduction in uncertainty (or increase in certainty). When E shows up, the reduction would have been from .3 to 1.0 which would have been log2(1/.3) - log2(1/1) or 1.74 bits.
But this update of the uncertainty would have been from .3 to .5, so it's log2(1/.3) - log2(1/.5) = -log2(.3) - (-log2(.5)) = 1.74 - 1 = .74 bits
So that update gave you .74 bits of additional information about event E (or about the state of the universe that produces event E).
When E shows up later, that acts as another update which carries 1 bit of additional information about E (i.e. you know for sure the universe was in the state required to produce the event E).
Now, if you had some measure of uncertainty about receiving the message from the person, you could also talk about how much information the receipt of the message gave you about that message (instead of about E).
I have not seen such issues explained in a textbook either. But what I write here makes logical sense to me.
You would still need to produce an updated probability about E. If this 80% chance the message was right meant the updated probability was
.8 * .5 + .2 * .3 = 0.46
then the update would be from 0.30 to 0.46, so the information gain would be
log2(.46) - log2(.3) = -1.12 - (-1.74) = .62 bits
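That arithmetic as a sketch (the function name is mine; log2(p_new) - log2(p_old) is the same quantity written as a difference of surprisals):

```python
import math

def surprisal_bits(p):
    # "Surprise" of event E at assessed probability p: log2(1/p) bits.
    return math.log2(1.0 / p)

p_old = 0.30
p_new = 0.8 * 0.5 + 0.2 * 0.3            # message trusted 80% -> 0.46
gain = surprisal_bits(p_old) - surprisal_bits(p_new)
print(round(p_new, 2), round(gain, 2))   # 0.46, 0.62 bits
```

Note that if the report had pushed your probability down instead of up, this difference would come out negative, which is the point argued below.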
If you received an update that caused your probability to be decreased, then that update would carry a negative amount of information about the upcoming event E, which means your uncertainty about the event had increased (your certainty had decreased) - the update took information about E away from you.
We, i.e. humans, do it all the time. It's called Bayesian statistics.
Points for honesty. However, I do wonder, why people will say "I'm no expert", then proceed to give an opinion... information theory may be useful here...
Dude, you've gone way off the deep end. No need for estimates, pdf, or anything. I posed the simplest case, practically page 1 of Shannon's paper: you learn that an event E with probability p occurred, what is the information in that message?
Ans: log(1/p)
The point is, we posit the probability p. It's given, no need for distributions etc.
No need for all that. The 'new pdf' is 0.5. It's given.
Not necessarily wrong, per se, but way way overcomplicated.
Again, estimates and statistics are unnecessary, a red herring.
egad I think you need to reduce the entropy of your thinking...
No, this is where it gets a bit tricky. The 'new event' of the message "p = .5" is not the event E, but the message itself. The message is the event!
There you go! Now: the event F = "the prob. of E is .5" has occurred. To determine its information, we need to know the conditional probability of F, given that we assumed p(E) = .3.
hmmmm...
Not bad, so far...
NEGATIVE information?!?! Someone is seriously confused - information is never less than zero.
Problem for the student: which message, in this example, would contain zero bits of information?
Bwahahahaha! Another good one to add to my collection ;-)
...Jim Thompson
--
James E. Thompson, P.E.
Analog Innovations, Inc.
Analog/Mixed-Signal ASIC's and Discrete Systems
Phoenix, Arizona   Voice: (480) 460-2350   Fax: (480) 460-2142
E-mail address at website: formatting link
(mens et manus | Brass Rat | 1962)
I love to cook with wine. Sometimes I even put it in the food.
I've seen lots of cases where the experts sit back and say nothing till some schmuck, like me, takes a stab at the problem and really screws it up. Then the experts jump in and straighten out my sorry ass. And everyone is happy. ;-)