Why did Shannon only consider sequences satisfying the same distribution when computing capacity? What happens if we are allowed to pick sequences satisfying any distribution?
These questions are motivated by the recent paper "A brief introduction on Shannon's information theory" by Chen.
This is about whether Shannon's limit can be broken or not.
If by "sequences satisfying the same distribution" you mean that Shannon put some constraint other than dimensionality on coding -- no, he did not, that assertion is incorrect.
If by that phrase you mean why did he not consider non-Gaussian noise, it's because analysis with non-Gaussian noise is difficult, and his paper was already a giant one.
Other people have investigated Shannon-type limits in the presence of non-Gaussian noise. In general, you can often do better than a superficial reading of Shannon's paper would indicate, but all of this is _already known_, _has been known for decades_, and _was not written by people looking for investors_.
There are no perpetual motion machines. If you're an investor, beware. If you're a shill, shame on you.
--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
I'm looking for work -- see my website!
I read Shannon's paper again. The channel capacity limit is obtained in the following way, if I understand correctly: suppose we have an alphabet of n letters and each appears with probability p_i. Now we consider sequences of these letters in which each letter appears with the frequency specified by p_i. Now we count how many sequences sharing this same distribution we can pick such that they are distinguishable after going through the channel. Surely, each distribution gives such a number. The capacity of the channel is then found by choosing the p_i that maximize this number.
Is this correct?
If this is correct, then by "same distribution" I mean the optimal distribution p_i giving the maximum number of distinguishable sequences.
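The counting argument described above can be checked numerically for a small alphabet. The sketch below (illustrative parameters, not from the thread) counts binary sequences whose empirical frequency matches a biased distribution and compares that count with the AEP estimate 2^(nH):

```python
import math

# Biased binary alphabet -- illustrative values, not from the thread.
p = [0.8, 0.2]   # p[i] = probability of letter i
n = 20           # sequence length

# Shannon entropy in bits per symbol.
H = -sum(pi * math.log2(pi) for pi in p)

# Count sequences whose count of 1s is within n*eps of n*p[1];
# these form the "typical set" of the asymptotic equipartition property.
eps = 0.05
typical = sum(math.comb(n, k) for k in range(n + 1)
              if abs(k - n * p[1]) <= n * eps + 1e-9)

print(f"H = {H:.4f} bits/symbol")
print(f"typical sequences: {typical}, versus 2^(nH) ~ {2 ** (n * H):.0f}")
```

With these numbers the typical set has 21,489 members while 2^(nH) is roughly 2.2e4, so even at n = 20 the count is already close to the AEP estimate.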
Shannon's capacity theorem assumes that your p_i = 1/n for all i. For data that does not, you compress the data. Shannon proved that, too. Google for "Shannon" and "entropy".
--
Tim Wescott
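The entropy-as-compression-limit point is easy to sanity-check on a toy alphabet; the distributions below are illustrative, not anything from the thread:

```python
import math

def entropy(p):
    """Shannon entropy in bits per symbol (terms with p_i = 0 contribute 0)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# A four-letter alphabet with a skewed distribution compresses below the
# naive 2 bits/symbol; the uniform distribution does not.
skewed = [0.5, 0.25, 0.125, 0.125]
uniform = [0.25] * 4

print(entropy(skewed))   # 1.75 bits/symbol
print(entropy(uniform))  # 2.0 bits/symbol
```

A Huffman code for the skewed source (codeword lengths 1, 2, 3, 3) achieves the 1.75 bits/symbol figure exactly, since every probability is a power of two.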
The most important 'but' here is that the Gaussian result is soluble, and the small-noise approximation converges to that result as a limit. Gaussian equals the small-noise limit in most real cases.
Also, the central limit theorem tells us to expect a Gaussian distribution; those big displays of balls bouncing from pins are a great example.
Blessed are those that would denigrate Shannon's Theorems, for they shall be forever labeled shamans >:-} ...Jim Thompson
--
| James E.Thompson | mens |
| Analog Innovations | et |
| Analog/Mixed-Signal ASIC's and Discrete Systems | manus |
| STV, Queen Creek, AZ 85142 Skype: skypeanalog | |
| Voice:(480)460-2350 Fax: Available upon request | Brass Rat |
| E-mail Icon at http://www.analog-innovations.com | 1962 |
Thinking outside the box... producing elegant solutions.
The central limit theorem tells us to expect Gaussian, but experiment often tells us otherwise. In particular, if you add up a bazzilion teeny random variables then the result will tend to Gaussian -- unless even one of those teeny random variables has an infinite variance, in which case all of the averaging in the world isn't going to make it finite.
Doesn't make much difference at 300MHz, but it sure does at 300kHz.
--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
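Tim's caveat is easy to demonstrate in simulation. The sketch below (illustrative, with an assumed fixed seed) compares sample means of a finite-variance uniform variable with those of a Cauchy variable, whose variance is infinite:

```python
import math
import random
import statistics

random.seed(0)  # assumed fixed seed so the run is repeatable

def sample_means(draw, n, trials=300):
    """Sample means of n draws, repeated `trials` times."""
    return [sum(draw() for _ in range(n)) / n for _ in range(trials)]

def uniform_rv():
    return random.uniform(-1.0, 1.0)      # finite variance (1/3)

def cauchy_rv():
    # Standard Cauchy via the inverse CDF -- infinite variance.
    return math.tan(math.pi * (random.random() - 0.5))

# Finite variance: the spread of the sample mean shrinks like 1/sqrt(n).
u10 = statistics.stdev(sample_means(uniform_rv, 10))
u1000 = statistics.stdev(sample_means(uniform_rv, 1000))

# Infinite variance: the mean of n Cauchy draws is itself standard Cauchy,
# so no amount of averaging shrinks it.
c10 = statistics.stdev(sample_means(cauchy_rv, 10))
c1000 = statistics.stdev(sample_means(cauchy_rv, 1000))

print(f"uniform: spread of mean, n=10: {u10:.3f}   n=1000: {u1000:.3f}")
print(f"cauchy:  spread of mean, n=10: {c10:.3f}   n=1000: {c1000:.3f}")
```

The uniform spreads drop by roughly a factor of ten going from n = 10 to n = 1000; the Cauchy spreads stay large no matter how big n gets, which is the point about a single infinite-variance term ruining the averaging.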
Tim, are you talking about the data compression rate? In that case, surely we consider each probability distribution separately. However, for channel capacity, it cannot always be 1/n that achieves the capacity.
Go read Shannon's 1948 paper. Come back when you fully understand it.
--
Tim Wescott
Is that the polite way of saying, "Go away and don't come back" ?>:-} ...Jim Thompson
Only if he doesn't apply nose to grindstone and read the paper.
And by "read" I don't mean just let his eyes travel over the words.
This is explained in a bazzilion different books on communications theory. Granted, it's not something that you can understand without effort, but it's there to understand.
--
Tim Wescott
This is so arrogant. Anyway, claiming that 1/n always achieves channel capacity for any channel -- you must be kidding.
If that is the case, just tell me what is the meaning of the formula C = max_X (H(X) - H(X|Y))? Just assuming X to be the 1/n distribution would do the job, in your logic.
Maybe you are a super expert and do not want to explain such a fundamental question; in that case, just do not come back and answer in an arrogant manner. That would be appreciated!
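The asymmetric case can be checked directly. The sketch below (a Z-channel with an assumed crossover probability of 1/2, chosen for illustration) scans over all input distributions and finds that the capacity-achieving input is not uniform:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

f = 0.5  # assumed crossover: input 1 is received as 0 with probability f

def mutual_info(q):
    """I(X;Y) in bits for P(X=1) = q over the Z-channel."""
    p_y1 = q * (1 - f)           # only input 1 can produce output 1
    return h2(p_y1) - q * h2(f)  # H(Y) - H(Y|X); input 0 adds no noise

# Maximize over ALL input distributions by brute-force scan.
best_q = max((i / 10000 for i in range(10001)), key=mutual_info)

print(f"capacity-achieving P(X=1) = {best_q:.3f}")  # 0.400, not 0.5
print(f"capacity              = {mutual_info(best_q):.4f} bits")
print(f"rate at uniform input = {mutual_info(0.5):.4f} bits")
```

At f = 1/2 the optimum is exactly P(X=1) = 0.4 and the capacity is log2(5) - 2, about 0.322 bits, while the uniform input only manages about 0.311 bits -- so the uniform input is not optimal for every channel, only for symmetric ones.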
Equal probability for each state with delta-function autocorrelation achieves maximum entropy, i.e. maximum information per bit. Any other distribution would _reduce_ channel capacity.
Cheers
Phil Hobbs
--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC
Optics, Electro-optics, Photonics, Analog Electronics
160 North State Road #203
Briarcliff Manor NY 10510
hobbs at electrooptical dot net
http://electrooptical.net
If the channel is capable of representing n distinct symbols then a data coding that maximises your surprise at seeing the next symbol is as good as you can ever get. Namely the maximum entropy solution where all symbols are equally likely to occur in your data.
What "Y" are you taking as being known a priori here?
If there is channel dependent noise then obviously the bad channels have to carry less data, but the original model was simpler.
The Shannon entropy paper is still worth reading:
formatting link
Starts with the noiseless case and then proceeds to add noise.
That is a very rude and arrogant thing to say. If he has some misunderstandings, why shouldn't he come here to discuss them? You are not the SED overlord. If you don't wish to discuss the topic with him, then I suggest that *you* not post. :-P
Thanks. 'Y' is the probability distribution at the output side of the given channel. Please look at pages 22-24 in Shannon's paper (the subscript `y` there). Of course, I know that for some channels the capacity-achieving input essentially does not depend on the channel itself; in those cases you just need to maximize the entropy of the input, just like AWGN. However, I am talking about general channels, and about whether there is any generalization of Shannon's theory.
I did not say Shannon's theory is incorrect. I am asking what happens if all probability distributions are taken into consideration. If only one probability distribution may be picked (see my first two posts for the exact meaning), Shannon's result is perfectly right. No doubt.
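For a general discrete memoryless channel, the maximization over all input distributions that defines C = max_X (H(X) - H(X|Y)) can be carried out numerically; the standard tool is the Blahut-Arimoto algorithm. A minimal sketch (the transition matrix at the bottom is an assumed Z-channel example, not anything from the thread):

```python
import math

def blahut_arimoto(P, iters=2000):
    """Capacity in bits of a discrete memoryless channel with transition
    matrix P[x][y] = P(y|x), maximized over ALL input distributions."""
    nx, ny = len(P), len(P[0])
    q = [1.0 / nx] * nx  # start from the uniform input distribution
    for _ in range(iters):
        # Output distribution induced by the current input q.
        r = [sum(q[x] * P[x][y] for x in range(nx)) for y in range(ny)]
        # w[x] = exp(D(P(.|x) || r)), the per-input information gain.
        w = [math.exp(sum(P[x][y] * math.log(P[x][y] / r[y])
                          for y in range(ny) if P[x][y] > 0))
             for x in range(nx)]
        z = sum(q[x] * w[x] for x in range(nx))
        q = [q[x] * w[x] / z for x in range(nx)]
    return math.log2(z), q  # capacity (bits) and the optimal input

# Z-channel with crossover 1/2: the optimum is NOT the uniform input.
C, q = blahut_arimoto([[1.0, 0.0], [0.5, 0.5]])
print(f"capacity = {C:.4f} bits, optimal input = [{q[0]:.3f}, {q[1]:.3f}]")
```

On this example the algorithm recovers P(X=1) close to 0.4 and C close to 0.322 bits, the known Z-channel optimum; for a symmetric channel it returns the uniform input, which is the special case being argued about above.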