Why did Shannon only consider sequences satisfying the same distribution when computing capacity?

Why did Shannon only consider sequences satisfying the same distribution when computing channel capacity? What happens if we are allowed to pick sequences satisfying any distribution?

These questions are motivated by the recent paper "A brief introduction on Shannon's information theory" by Chen.

This is about whether Shannon's limit can be broken or not.

Yan

Reply to
yanli0008

If by "sequences satisfying the same distribution" you mean that Shannon put some constraint other than dimensionality on coding -- no, he did not, that assertion is incorrect.

If by that phrase you mean why did he not consider non-Gaussian noise, it's because analysis with non-Gaussian noise is difficult, and his paper was already a giant one.

Other people have investigated Shannon-type limits in the presence of non-Gaussian noise. In general, you can often do better than a superficial reading of Shannon's paper would indicate, but all of this is _already known_, _has been known for decades_, and _was not worked out by people looking for investors_.

There are no perpetual motion machines. If you're an investor, beware. If you're a shill, shame on you.

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com 

I'm looking for work -- see my website!
Reply to
Tim Wescott

On February 19, 2017, at 5:03:13 UTC+8, Tim Wescott wrote:

I read Shannon's paper again; if I understand correctly, the channel capacity limit is obtained in the following way: suppose we have an alphabet of n letters and each letter appears with probability p_i. Now we consider sequences of these letters in which each letter appears with the frequency specified by p_i, and we count how many sequences sharing this same distribution we can pick such that they are still distinguishable after going through the channel. Each distribution gives such a number. The channel capacity limit is then found by choosing the optimal p_i that maximizes this number.

Is this correct?

If this is correct, then by "same distribution" I mean the optimal distribution p_i giving the maximum number of distinguishable sequences.
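As a concrete rendering of the counting argument above, here is a rough Python sketch of my own (not from the post or from Shannon's paper): the number of length-N sequences in which letter i appears with frequency p_i grows like 2^(N*H(p)), where H(p) is the entropy of the distribution.

from math import comb, log2

def multinomial(counts):
    # Number of distinct sequences with exactly these letter counts.
    total, n = 1, 0
    for c in counts:
        n += c
        total *= comb(n, c)
    return total

def entropy(p):
    # Shannon entropy in bits per symbol.
    return -sum(pi * log2(pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.25]                      # an example (assumed) source distribution
for N in (20, 100, 400):
    counts = [round(pi * N) for pi in p]   # letter counts matching the frequencies p_i
    exact = log2(multinomial(counts))      # log2 of the number of such sequences
    bound = N * entropy(p)                 # the asymptotic estimate N*H(p)
    print(f"N={N:4d}: log2(#sequences) = {exact:7.1f},   N*H(p) = {bound:7.1f}")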

Tell me where I made any mistake, thanks.

Yan

Reply to
yanli0008

Shannon's capacity theorem assumes that your p_i = 1/n for all i. For data that does not, you compress the data. Shannon proved that, too. Google for "Shannon" and "entropy".
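A short numerical illustration of that compression point, a sketch of my own rather than anything from the post: a skewed source has entropy below log2(n) bits per symbol, so it can be compressed, and the compressed bitstream then looks approximately uniform.

from math import log2

def entropy(p):
    # Shannon entropy in bits per symbol.
    return -sum(pi * log2(pi) for pi in p if pi > 0)

n = 4
uniform = [1 / n] * n
skewed  = [0.7, 0.1, 0.1, 0.1]             # an assumed, non-equiprobable source

print(f"log2(n)    = {log2(n):.3f} bits/symbol")
print(f"H(uniform) = {entropy(uniform):.3f} bits/symbol")
print(f"H(skewed)  = {entropy(skewed):.3f} bits/symbol")
# An ideal entropy coder squeezes the skewed source to about H(skewed)
# bits/symbol; the compressed bits are then approximately equiprobable.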

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com 

I'm looking for work -- see my website!
Reply to
Tim Wescott

The most important 'but' here, is that the Gaussian result is soluble, and the small-noise approximation converges to that result as a limit. Gaussian equals small-noise limit in most real cases.

Also, the central limit theorem tells us to expect a Gaussian distribution; those big displays of balls bouncing off pins are a great example.
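A quick simulation of that bean-machine picture, my own sketch rather than anything from the post: each ball's final bin is the sum of many independent +/-1 bounces, and the histogram of those sums comes out roughly Gaussian.

import random
from collections import Counter

def drop_ball(rows=12):
    # Final horizontal position after `rows` independent left/right bounces.
    return sum(random.choice((-1, +1)) for _ in range(rows))

# Drop many balls and print a rough histogram of final positions.
counts = Counter(drop_ball() for _ in range(20000))
for position in sorted(counts):
    print(f"{position:+3d} | {'#' * (counts[position] // 200)}")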

Reply to
whit3rd

Blessed are those that would denigrate Shannon's Theorems, for they shall be forever labeled shamans >:-} ...Jim Thompson

--
| James E.Thompson                                 |    mens     | 
| Analog Innovations                               |     et      | 
| Analog/Mixed-Signal ASIC's and Discrete Systems  |    manus    | 
| STV, Queen Creek, AZ 85142    Skype: skypeanalog |             | 
| Voice:(480)460-2350  Fax: Available upon request |  Brass Rat  | 
| E-mail Icon at http://www.analog-innovations.com |    1962     | 

     Thinking outside the box... producing elegant solutions.
Reply to
Jim Thompson

The central limit theorem tells us to expect Gaussian, but experiment often tells us otherwise. In particular, if you add up a bazillion teeny random variables then the result will tend to Gaussian -- unless even one of those teeny random variables has an infinite variance, in which case all of the averaging in the world isn't going to make it finite.

Doesn't make much difference at 300MHz, but it sure does at 300kHz.
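A numerical illustration of that caveat, my own sketch using the Cauchy distribution as the classic infinite-variance example: averaging more samples tightens a Gaussian sample mean like 1/sqrt(N), but does essentially nothing for a Cauchy sample mean.

import random
import statistics

def spread_of_means(sampler, n_per_mean, n_means=500):
    # Robust spread (stdev of the middle half) of n_means sample means.
    means = sorted(sum(sampler() for _ in range(n_per_mean)) / n_per_mean
                   for _ in range(n_means))
    middle = means[n_means // 4 : 3 * n_means // 4]
    return statistics.pstdev(middle)

gauss  = lambda: random.gauss(0.0, 1.0)
cauchy = lambda: random.gauss(0.0, 1.0) / random.gauss(0.0, 1.0)  # ratio of normals is Cauchy

for n in (10, 100, 1000):
    print(f"N={n:5d}   gaussian mean spread: {spread_of_means(gauss, n):.4f}   "
          f"cauchy mean spread: {spread_of_means(cauchy, n):.4f}")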

--
Tim Wescott 
Control systems, embedded software and circuit design 
I'm looking for work!  See my website if you're interested 
http://www.wescottdesign.com
Reply to
Tim Wescott

On February 18, 2017, at 7:28:31 UTC-5, Tim Wescott wrote:

Tim, are you talking about the data compression rate? In that case, surely we consider each probability distribution separately. However, for channel capacity, the uniform distribution 1/n does not always achieve the capacity.

Thanks,

Yan

Reply to
yanli0008

Go read Shannon's 1948 paper. Come back when you fully understand it.

--
Tim Wescott 
Control systems, embedded software and circuit design 
I'm looking for work!  See my website if you're interested 
http://www.wescottdesign.com
Reply to
Tim Wescott

[snip]


Is that the polite way of saying, "Go away and don't come back" ?>:-} ...Jim Thompson

--
| James E.Thompson                                 |    mens     | 
| Analog Innovations                               |     et      | 
| Analog/Mixed-Signal ASIC's and Discrete Systems  |    manus    | 
| STV, Queen Creek, AZ 85142    Skype: skypeanalog |             | 
| Voice:(480)460-2350  Fax: Available upon request |  Brass Rat  | 
| E-mail Icon at http://www.analog-innovations.com |    1962     | 

     Thinking outside the box... producing elegant solutions.
Reply to
Jim Thompson

Only if he doesn't apply nose to grindstone and read the paper.

And by "read" I don't mean just let his eyes travel over the words.

This is explained in a bazillion different books on communications theory. Granted, it's not something that you can understand without effort, but it's there to understand.

--
Tim Wescott 
Control systems, embedded software and circuit design 
I'm looking for work!  See my website if you're interested 
http://www.wescottdesign.com
Reply to
Tim Wescott

On February 20, 2017, at 3:04:20 UTC-5, Tim Wescott wrote:

This is so arrogant. Anyway, if you are claiming that 1/n always achieves the channel capacity for any channel, you must be kidding.

If that is the case, then tell me what the meaning of the formula C = max_X (H(X) - H(X|Y)) is? Just assuming X to be the 1/n distribution would do the job, by your logic.
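To make that formula concrete, here is a sketch of my own (not from the thread): for a binary Z-channel, a brute-force search over input distributions shows that the capacity-achieving p(X=1) is not 1/2. Note that H(X) - H(X|Y) and H(Y) - H(Y|X) are the same mutual information; the latter is easier to evaluate here.

from math import log2

def h2(p):
    # Binary entropy in bits.
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def mutual_information(p1, f):
    # I(X;Y) for a Z-channel: P(Y=1) = p1*(1-f), and H(Y|X) = p1*h2(f).
    return h2(p1 * (1 - f)) - p1 * h2(f)

f = 0.5                                     # assumed crossover probability for input '1'
best_p1, best_I = max(((p / 10000, mutual_information(p / 10000, f))
                       for p in range(10001)), key=lambda t: t[1])
print(f"uniform input: I = {mutual_information(0.5, f):.4f} bits/use")
print(f"optimal input: p(X=1) = {best_p1:.3f},  C = {best_I:.4f} bits/use")

For f = 0.5 this gives p(X=1) = 0.4 and C of about 0.32 bits per use, while the uniform input only reaches about 0.31.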

Maybe you are a super expert who does not want to explain such a fundamental question; in that case, please just do not come back and answer in an arrogant manner. That would be appreciated!

Reply to
yanli0008

Equal probability for each state with delta-function autocorrelation achieves maximum entropy, i.e. maximum information per bit. Any other distribution would _reduce_ channel capacity.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs 
Principal Consultant 
ElectroOptical Innovations LLC 
Optics, Electro-optics, Photonics, Analog Electronics 

160 North State Road #203 
Briarcliff Manor NY 10510 

hobbs at electrooptical dot net 
http://electrooptical.net
Reply to
Phil Hobbs

If the channel is capable of representing n distinct symbols then a data coding that maximises your surprise at seeing the next symbol is as good as you can ever get. Namely the maximum entropy solution where all symbols are equally likely to occur in your data.

What "Y" are you taking as being known a priori here?

If there is channel dependent noise then obviously the bad channels have to carry less data, but the original model was simpler.

The Shannon entropy paper is still worth you reading:

formatting link

Starts with the noiseless case and then proceeds to add noise.

--
Regards, 
Martin Brown
Reply to
Martin Brown

That is a very rude and arrogant thing to say. If he has some misunderstandings, why shouldn't he come here to discuss them? You are not the SED overlord. If you don't wish to discuss the topic with him then I suggest *you* not post. :-P

--

Rick C
Reply to
rickman

On February 21, 2017, at 10:28:12 UTC-5, Martin Brown wrote:
> On February 20, 2017, at 3:04:20 UTC-5, Tim Wescott wrote:

Martin,

Thanks. 'Y' is the probability distribution at the output side of the given channel. Please look at pages 22-24 of Shannon's paper (the subscript `y` there). Of course, I know that for some channels the capacity-achieving input does not really depend on the channel; in those cases you just need to maximize the entropy of the input, as with AWGN. However, I am talking about general channels, and about whether there is any generalization of Shannon's theory.

I did not say Shannon's theory is incorrect. I am asking what happens if all probability distributions are taken into consideration. If only one probability distribution is allowed to be picked (see my first two posts for the exact meaning), then Shannon's result is perfectly right. No doubt.
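For general discrete memoryless channels, the optimization over all input distributions being discussed here is what the Blahut-Arimoto algorithm computes numerically. A minimal sketch of my own (not from the thread), applied to a Z-channel transition matrix as an example:

from math import exp, log, log2

def blahut_arimoto(W, iters=500):
    # W[x][y] = P(Y=y | X=x).  Returns (capacity in bits/use, optimal p(x)).
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx                       # start from the uniform input
    for _ in range(iters):
        # q(x|y): posterior over inputs for each output, under the current p
        q = [[p[x] * W[x][y] for x in range(nx)] for y in range(ny)]
        for y in range(ny):
            s = sum(q[y]) or 1.0
            q[y] = [v / s for v in q[y]]
        # p(x) update: favor inputs whose outputs point back to them strongly
        r = [exp(sum(W[x][y] * log(q[y][x]) for y in range(ny) if W[x][y] > 0))
             for x in range(nx)]
        total = sum(r)
        p = [v / total for v in r]
    # Capacity is the mutual information I(X;Y) under the final p.
    py = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    C = sum(p[x] * W[x][y] * log2(W[x][y] / py[y])
            for x in range(nx) for y in range(ny) if W[x][y] > 0)
    return C, p

# Example (assumed) channel: a Z-channel where input 0 is always received
# correctly and input 1 is flipped to 0 half the time.
W = [[1.0, 0.0],
     [0.5, 0.5]]
C, p = blahut_arimoto(W)
print(f"capacity = {C:.4f} bits/use, optimal p(X) = {[round(v, 3) for v in p]}")

For this channel the iteration settles near p(X) = (0.6, 0.4) with C of about 0.32 bits per use, matching the brute-force search earlier in the thread, and the optimum is plainly not uniform.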

Yan

Reply to
yanli0008

On February 21, 2017, at 10:41:43 UTC-5, rickman wrote:

Thanks, Rick.

Reply to
yanli0008

Arrogant is refusing to study a complicated subject and then getting bent out of shape when someone suggests you do your homework.

And yes, that IS a hint.

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com 

I'm looking for work -- see my website!
Reply to
Tim Wescott

I explained things sufficiently. He did not want to believe. The material is publicly available.

What's the issue again?

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com 

I'm looking for work -- see my website!
Reply to
Tim Wescott

If he wanted it explained, kindly, with references and details, he would've paid for the tutoring. :)

Which, if I might dare to suggest... Tim might even be willing to do. (But you might not like the price. :^) )

Tim

--
Seven Transistor Labs, LLC 
Electrical Engineering Consultation and Contract Design 
Website: http://seventransistorlabs.com
Reply to
Tim Williams
