statistics folly

Newsgroups: sci.math
From: gearhead
Date: Fri, 16 Jul 2010 10:24:41 -0700 (PDT)
Subject: stats/probability question

I'm an engineering undergrad in an intro stats course. We had a question in the book that's really dumb.

problem as stated:

Your candidate has 55% of the votes in the entire school. But only 100 students will show up to vote. What is the probability that the underdog (the one with 45% support) will win? To find out, set up a simulation. a) Describe how you will simulate a component and its outcomes. b) Describe how you will simulate a trial. c) Describe the response variable.

The answer in the back of the book says using a two digit random number to determine each vote (00-54 for your candidate, 55-99 for the underdog) you would run a string of trials with 100 votes to each trial.
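The answer key's procedure can be sketched in a few lines of Python. This is only an illustration of the method as described above: the function names, the number of trials, and the seed are assumptions, and a 50-50 tie is counted as "underdog does not win".

```python
import random

def simulate_trial(rng, n_voters=100, underdog_cutoff=55):
    """One trial: each voter draws a two-digit number 00-99.

    00-54 is a vote for your candidate, 55-99 a vote for the underdog.
    Returns the number of underdog votes in this trial.
    """
    return sum(1 for _ in range(n_voters) if rng.randrange(100) >= underdog_cutoff)

def estimate_underdog_win_rate(n_trials=20000, seed=1):
    """Fraction of trials in which the underdog gets more than half of 100 votes."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(n_trials) if simulate_trial(rng) > 50)
    return wins / n_trials

if __name__ == "__main__":
    print(estimate_underdog_win_rate())
```

Note that the response variable here is "did the underdog win this trial", not the underdog's share of the vote.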

Now, this is one misconceived exercise. Let me explain why.

Say the school has 1000 students. If all of them show up, the underdog has 0% chance of winning. If exactly one voter shows up, underdog has 45% chance of winning. In an election where 100 voters show up, underdog's chance of winning the election HAS to lie somewhere between 0% and 45%. No ifs, ands or buts. The probability of a win for underdog can never exceed 45%. When the exercise asks "how often will the underdog win" I interpret that as meaning what are his chances, i.e., the probability that he will win. But if you run a simulation, you can get anything, including results above 45%. I don't think simulating has any validity here, at least the procedure suggested in the answer key. That is a lot of simulating to do by hand, 100 per trial, but it is nowhere close to even starting to answer the actual question. You would first of all have to know the population of the school and then do some very demanding simulations that would only be practical on a computer.

Practical considerations aside, the question is meaningless unless you know something about the magnitude of the school population. Consider: if the total population is 108, the underdog cannot win, because he only has 49 (48.6 rounded up) supporters total. Chance of winning 0%. Period. "Underdog" has NO CHANCE of winning the election. But if you run a simulation the way the book suggests, he's going to win some. In fact he wins about half. I'm saying the book is wrong. Back to our school of 1000 students, out of whom 450 would vote for "underdog." If only 100 students vote, what are his chances of winning? Simulation will send you on the wrong track here unless you're ready for some head scratching and a big grind on the computer. I'm sure this problem has a neat theoretical solution.

In class today I saw this problem and just was mystified until I worked out the implications, and now it's clear that it's just incredibly stupid. How would you convince the teacher of that? If I point out that it's impossible to get any answer above 45%, she might say, well this isn't theoretical, we're just running simulations, which is the whole point of the game. To convince her I might have to work out the actual correct simulation methodology, which is likely a very big headache and something I don't have time for. So I may just let it slide and not even bring it up. But I'm still interested in the theoretical solution, if anybody can cough it up. It's a probability problem now, not empirical statistics.


Reply to
Michael Robinson

They're really just wording the question kinda poorly (and they're also assuming the student population is very, very large -- as you point out, if there are only 100 kids at the school, you can come up with very definitive answers). What they really mean is something like:

-- You're performing sampling where 45% of the time you get answer A (someone votes for the underdog), and 55% of the time you get answer B (a vote for the other guy). If you perform 100 random samples, what's the likelihood that you'll get more than 50 'A' answers?

This is a standard statistics question, along the lines of, "If you roll a fair die 100 times, what's the likelihood you'll get '3' 20 or more times?"

Part of engineering is figuring out what your "customer" really wants when their own description is kinda flaky. :-)

---Joel

Reply to
Joel Koltner

Quite common in medical: "This new procedure has a 15% success rate!" (applause) ... "How many candidates were in the patient pool for the study?" ... "Twenty" (silence)

One question I always pondered is, why are they teaching this in engineering school anyhow? When I started at university it was all engineering stuff. Plus math, chemistry, mechanical engineering, but all pretty well geared towards us becoming EEs some day.

--
Regards, Joerg

http://www.analogconsultants.com/
Reply to
Joerg


If the school population is many, many magnitudes larger than the number of voters, the chance that underdog will win just reduces to 45% (the same as the underdog's chance of winning if only one student votes). And in the case where the school population is relatively small, the simulation methodology suggested is so bad it's not even wrong. Sampling will always return about 45%, and we have seen that the chances of the underdog winning can range as low as zero. The exercise is meaningless. I think I should go for a walk.


Reply to
Michael Robinson


Hmm, that's not right. As long as there are many more than 100 students at the school, the answer should be about the same. The standard deviation of a sample goes as the square root of the number of samples, so 100 samples means the error is about 10%. Since 10% is about what the 45% person needs to win, I would guess that this happens about as often as a one-standard-deviation excursion: about 13%. What is the answer in the book?

George H. (Of course this is a 'back of the envelope' calculation and there may be factors of 2 or pi floating around)

Reply to
George Herold

"Michael Robinson" wrote in message news:spKdnR2-x6_1Mt3RnZ2dnUVZ snipped-for-privacy@giganews.com...

Only if you do a very large number of samples as well.

Consider the following: You have a very large school, 45% want the underdog. If only one person shows up to vote, what's the likelihood the underdog wins? Clearly 45%, right?

But now let's have THREE people show up to vote... what are the underdog's chances? (U=underdog, N=non-underdog)

Votes  Likelihood  Winner  Underdog weighting
UUU    0.091       U       0.091
UUN    0.111       U       0.111
UNU    0.111       U       0.111
UNN    0.136       N
NUU    0.111       U       0.111
NUN    0.136       N
NNU    0.136       N
NNN    0.166       N

Underdog's chance: 0.425

(hopefully the formatting doesn't get too messed up there...)

As you can see, the answer isn't 45% but rather 42.5%. By taking a limited number of samples, you get a certain amount of "noise" in the final outcome due to the inability to cast a fraction of a vote; that makes the outcome different from the case where every voter is counted (in which case the noise averages out to zero).
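The three-voter enumeration can be checked mechanically. The following is a minimal sketch, assuming a very large school so that each vote is an independent draw with a 45% chance of going to the underdog; the function name is made up for illustration.

```python
from itertools import product

P_UNDERDOG = 0.45  # chance that any single voter picks the underdog

def underdog_win_probability(n_voters):
    """Enumerate every vote sequence and add up those where U has a majority."""
    total = 0.0
    for votes in product("UN", repeat=n_voters):
        u_count = votes.count("U")
        n_count = n_voters - u_count
        likelihood = P_UNDERDOG ** u_count * (1 - P_UNDERDOG) ** n_count
        if u_count > n_count:
            total += likelihood
    return total

if __name__ == "__main__":
    print(round(underdog_win_probability(1), 3))  # 0.45
    print(round(underdog_win_probability(3), 3))  # 0.425
```

Brute-force enumeration doubles in cost with every extra voter, so it is only practical for small turnouts; for 100 voters the binomial formula is the sensible route.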

Somewhat-related EE application: Sigma-delta modulators?

---Joel

Reply to
Joel Koltner

Some EE ends up using it? :-)

Stats show up an awful lot in...

-- Communication texts, worrying about the effect of noise on signal intelligibility --> Those trying to cook up new modulation formats should worry about this

-- Error-correcting codes --> Those worrying about choosing error-correction schemes should worry about it

-- Phil Hobbs' book :-)

-- Tim Wescott's book :-)

I think the real answer is that curricula often have historical roots that are hard to change even when the material becomes of marginal use for most students. Many a practicing BSEE can do just fine recalling no more statistics than, e.g., how to calculate a mean...

---Joel

Reply to
Joel Koltner

Uh, no.

The probability distribution of the resulting vote is a binomial distribution with a peak at 55 votes for the winner and 45 votes for the loser. It'll have a variance of 100 * 0.45 * 0.55 = 24.75. With that many votes it'll be pretty close to a normal distribution, so the probability that a vote will go the wrong way is about 16%.

So when you get back from your walk, you probably want to brush up on your statistics.

Doing this by simulation makes no sense unless the aim of the exercise is to teach the student how to do Monte Carlo simulation, or to help them get a feel for that 16% probability of a wrong vote.

--
Tim Wescott
Wescott Design Services
Reply to
Tim Wescott

Also Monte Carlo in SPICE, named after _the_ casino city. Actually, formally it's a whole country unto itself.

Ok, yes, I agree that we all need it. My point really was, isn't this sort of stuff the job of a high school to teach? There has got to be a reason why we all must go to high school before heading towards engineering :-)

--
Regards, Joerg

http://www.analogconsultants.com/
Reply to
Joerg

Ah, sorry, I had missed that point. :-)

I did have a stats class in high school, but there was another one in college as well... that was rather more advanced.

Although I'd have to say I learned more about stats when they started being applied in engineering classes rather than just being somewhat abstract mathematical tools.

---Joel

Reply to
Joel Koltner


Tim do you mind showing a bit more of your work? How did you get 16% from a variance of 24.75?

Thanks,

George H.


Reply to
George Herold


There's an 87.5% chance he'll reply, with a 99% chance you won't understand ( and neither will I) :(

Reply to
TTman

A variance of 24.75 means a sigma of around 5. 50 votes is 5 votes away from the center (of 55 votes), or one sigma. There's a 34% probability that you'll hit a vote between 50 and 55, plus a 50% probability that you'll hit a vote somewhere between 55 and 100. That's an 84% chance of a correct vote, with 16% remaining for claims of stolen elections and arguments over voting procedures in Miami.
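The one-sigma argument can be compared against the exact binomial tail. This is a sketch under the thread's large-school assumption (100 independent votes, each with a 45% chance of going to the underdog); the helper names are illustrative, and a 50-50 tie is not counted as an underdog win, which is one reason the exact figure comes out a little below the one-sigma estimate.

```python
import math

def normal_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def binomial_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))

n, p = 100, 0.45                        # voters, underdog's share of support
mean = n * p                            # expected underdog votes: 45
sigma = math.sqrt(n * p * (1 - p))      # sqrt(24.75), a bit under 5

approx_no_cc = normal_tail((50 - mean) / sigma)    # the "one sigma" estimate
approx_cc = normal_tail((50.5 - mean) / sigma)     # with continuity correction
exact = binomial_tail(n, p, 51)                    # underdog needs 51 or more

if __name__ == "__main__":
    print(f"sigma = {sigma:.3f}")
    print(f"normal, boundary at 50   : {approx_no_cc:.4f}")
    print(f"normal, boundary at 50.5 : {approx_cc:.4f}")
    print(f"exact binomial, X >= 51  : {exact:.4f}")
```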

--
Tim Wescott
Wescott Design Services
Reply to
Tim Wescott

College stats is well beyond high school stats. College stats (at least the one that I took) is a 4th year class from the mathematics department that leaves many of the math majors in the dust.

--
Tim Wescott
Wescott Design Services
Reply to
Tim Wescott

I have some hand-waving about random processes in my book, but I don't think there's much real statistics in there. Cite a page number and I'll look, though.

If you want to get more than an intuitive grasp of the response of a control system to random input (either noise or a command that's modeled as stochastic) you need rather more material under your belt than I provide in that book.

Of course, once you _do_ get the necessary information, you can apply it using the book...

--
Tim Wescott
Wescott Design Services
Reply to
Tim Wescott

You can get an exact answer using the binomial distribution only in trials with replacement. This problem describes trials without replacement.

If you use the binomial distribution here you will get an approximation.

If the school population is very large, then the approximation would be a good one because the trial will be close to one with replacement -- in other words, you'll seldom count a student twice when you do your sampling of 100 out of a much larger population.

The simple fact is that the book's naively constructed simulation solution will give an answer that approaches validity only assuming very large school population (and there's no point in doing a sim then because you already know the answer).

For any school population where the outcome is worth calculating -- say, a few hundred students -- the suggested sim is dead wrong. The "underdog's" chance of winning varies. Always less than 45%, approaching zero as the school pop approaches 109 or 108. While the sim always returns values centering around 45%.

Now can you see why I said it's a dumb problem?
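For a school of known size, sampling without replacement is described by the hypergeometric distribution, so the finite-population case can be computed directly rather than simulated. The sketch below makes several assumptions that are not in the exercise: the 100 voters are a uniformly random subset of the school, the number of underdog supporters is 45% of the school rounded to a whole student, and a 50-50 tie is not a win. The function name is made up for illustration.

```python
import math

def underdog_win_probability(school_size, n_voters=100, support=0.45):
    """P(underdog gets a majority) when n_voters are drawn without replacement."""
    underdog_supporters = round(school_size * support)
    others = school_size - underdog_supporters
    all_turnouts = math.comb(school_size, n_voters)
    winning_turnouts = 0
    # The underdog needs strictly more than half of the votes cast.
    for k in range(n_voters // 2 + 1, n_voters + 1):
        # math.comb returns 0 when k exceeds the available supporters.
        winning_turnouts += (math.comb(underdog_supporters, k)
                             * math.comb(others, n_voters - k))
    return winning_turnouts / all_turnouts

if __name__ == "__main__":
    for size in (100, 108, 200, 1000, 1600, 100000):
        print(size, underdog_win_probability(size))
```

Running it over a range of school sizes shows how quickly the result settles toward the binomial (with-replacement) figure as the school grows.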

Reply to
Michael Robinson

Yes, that's been discussed.

Are you saying that the 1600 student school that I attended makes calculations about it somehow not worthwhile?

(And no, I didn't do the math, so I don't have a good grasp of how closely the binomial distribution would approximate in this case -- but it's probably close)

Yes, but if you're upset at a problem that's not stated clearly, why didn't you clearly state your objections?

--
Tim Wescott
Wescott Design Services
Reply to
Tim Wescott

Ok, then I may have a stats deficiency in my brain cell portfolio :-)

--
Regards, Joerg

http://www.analogconsultants.com/
Reply to
Joerg

Oh, so Obama's election was just a statistical anomaly. I feel so much better now. MikeK

Reply to
amdx


Actually, the US electors who got out to vote for Obama represent a very large "school", and the likely sampling error on the result was about 0.12% of his winning margin - 9,522,083 - out of 131,257,328 votes cast. The square root of 131,257,328 is about 11,457.

He won quite decisively - the biggest margin of any non-incumbent candidate so far.

-- Bill Sloman, Nijmegen

Reply to
Bill Sloman
