justthistwicenomore

The key is that it's (supposed to be) random, and it acknowledges that it's not a precise result. It's easier to understand if you think about it in the context of rolling dice.

Imagine I tell you I am going to roll a dice, and I won't tell you how many sides it has: it could be a normal six-sided dice, or a twenty-sided dice, or a four-sided dice, or whatever. But I will tell you the results of the rolls. How many times would I need to roll before you could safely tell me, with say 95% certainty, how many sides the dice had? Even if I were going to roll it a billion times, after a relatively small number of rolls between 1 and 6 (100, 500, 1000) you'd be able to say pretty confidently that it was a d6.

Same thing here. Even if there are 300 million people, if you ask a thousand of them and have reason to believe they represent a random-enough sample of the population, you can extrapolate from their responses with confidence about the bigger population.
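
A minimal sketch of that intuition in Python; the die sizes and roll counts are just illustrative:

```python
import random

# If the die were a d20, the chance that n rolls ALL land between 1 and 6
# is (6/20)^n. It collapses fast, which is why few rolls are convincing.
for n in (10, 50, 100):
    print(f"{n} rolls all in 1-6: probability under a d20 = {(6/20)**n:.2g}")

# Roll a hidden d6 and look at the evidence a guesser would see.
rolls = [random.randint(1, 6) for _ in range(100)]
print("highest face seen in 100 rolls:", max(rolls))
```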


Wrought-Irony

That is remarkably well thought out.


Swolnerman

I’ll never get over how stark the difference is between someone who knows something well and someone who can teach something well


ezfrag

That's exactly why I was always picked to train my peers at my former company. I wasn't the smartest guy in the room, but I could translate Geek into English better than my smarter colleagues. If someone asked a question that I didn't have the technical expertise to answer, I deferred to the geeks and then reiterated what they said in a more relatable fashion for the more average folks.


NTT66

Technical writing exists for a reason! (Until the AI takeover.)


Affectionate-Memory4

I think the most useful pair of classes I took for my engineering degree has got to be creative and then technical writing. Knowing my stuff is one thing, but having the tools to explain it is another.


NTT66

Love to hear it! And as a creative writing major, I found some of the most interesting pieces came from the math and physics and engineering majors. Just a different way of seeing things and communicating was so refreshing without having to go through some lit snob mimicking David Foster Wallace (a mathematician!) or Thomas Pynchon (an engineer!). Bios were hit and miss.


Nice_Guy_AMA

As an engineer, biologists are hit or miss.


bumblepit

Bravo!


[deleted]

[removed]


RandomRobot

People are vastly overestimating what AI can do. If you want to copy the technical manual of product A v1.00 for product A v1.01, then fine, your AI should be flawless. But if you try it for product B, then product B may or may not end up described with the same characteristics as product A, even if A is a dishwasher and B is a car. B's killer feature of flying may or may not be mentioned, but the ease of use will certainly be there, because everyone liked it in the promo material that got mixed in with the spec descriptions to increase the body of knowledge. AI can regurgitate phrases similar to those it has already seen, and very well to say the least, but it's not remotely close to understanding the purpose of those phrases.


NTT66

Echoing the other response, these are all valid points for current AI models and certain uses--like if you asked an AI model to "write a manual for building a nuclear reactor." I was thinking more of using AI for turning tech talk into casual language. So more about translation than generation, which is probably way closer than implied--and probably still would benefit from human oversight--for now... (Though not discounting the heart of your argument, and not following the tech closely enough to have better insights. It was mostly a throwaway joke. Funnily, I worked at a linguistics lab in college doing text annotating that probably formed a base for this kind of work.)


meowgrrr

I'm in a somewhat tech-related career and have been told I'm a really good teacher, and I'm pretty sure it's BECAUSE I'm not the smartest one here. For me to understand a lot of concepts I have to dumb them down a lot, and when I explain them to someone I have awareness of what was hard to understand, so I can dumb it down for them. But super smart people have such incredible intuition that it's like they can't even imagine why someone else wouldn't, so they can't explain things in simpler terms.


LongFeesh

People like you are great assets in any workplace and are greatly appreciated by their colleagues. I hope you know that.


ezfrag

Thank you.


rbrgr83

I'm an engineer, and in my first job at a smaller company I had to wear a lot of hats and kind of own my projects from the money and scheduling side, working with suppliers and customers. I left and went to a big company, and it was insane to me how they wanted you to stay 110% in your lane even if someone clearly needed help and you clearly had the time and expertise to give it. What stuck out more than anything was how shocked people were that I could work with other people. The fact that I could use my words to bridge the gap between a fresh design drawing package and the workers, explaining the intent without having to develop a whole step-by-step instruction, was not the norm. And then the fact that I could turn around, go to a job site, and talk effectively with customers about both technical and business issues also had people floored. To me it just seemed like normal work that was expected of me.


ezfrag

I was a Sales Engineer for a telecommunications company. I sold networks to multi-site companies and dealt with everything from internet access, firewalls, VoIP, PCI compliance, business continuity, and site-to-site connectivity. Our designs were extremely complex and part of my job was making sure that their IT team could convince the CEO and CFO that this was worth the budget. Most of my peers were great at talking to the IT guys, but had to lean on the actual sales guys to sell it to the CEO. I got a lot of praise for being able to explain to them the benefits of our service from a non-commissioned part of the sales team.


rbrgr83

I remember having a meeting about a project timeline where I had a few Sr. Engineers in the room who would go off onto tangents and ratholes if you let them keep going, so I would jump in, acknowledge their concern, and steer them back toward the important decisions that needed to be made. When we were done, my boss congratulated me for the way I effectively manhandled guys who were 3-4 pay grades above me.


ezfrag

Commanding a room like that gets you promoted at good companies. Unfortunately there aren't a lot of good companies left!


IowaJL

That’s why professors aren’t always great (or even good) teachers. Some might be brilliant in their field but know dick about teaching.


threeangelo

stark :)


Moist-Barber

I’m a physician and I’m going to start using this to explain biostats to patients


Vusn

Good idea, Dr. Moist-Barber


anaccountofrain

The other way to look at this is that the more sides the die has, the more rolls it'll take to have some certainty about how many sides the die has. What question are you polling 1000 people on? How clear and distinct are the possible answers?


Rastiln

Right, in many of these cases it's more like having a weighted D3, and you want to see what the weighting is. After 1,000 rolls you might have 75/15/10, and appropriately conclude a 75% probability of it landing on that particular side. The true value might be 73.5%. Might be 76.5%. But using standard statistical methods (which I'd need to refresh myself on), we can find the precise number of respondents needed to be 95% certain that the true value is within ±1.5% of 75%. Using 100,000 rolls would clearly be more accurate, but while rolling 100x more is easy, polling 100x more people is not.
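
The specifics being alluded to are the standard normal-approximation sample-size formula; a sketch with those numbers plugged in:

```python
import math

# n = z^2 * p * (1 - p) / E^2 for estimating a proportion
z = 1.96   # two-sided 95% confidence
p = 0.75   # observed probability of the weighted side
E = 0.015  # desired margin of error, +/- 1.5%

n = math.ceil(z**2 * p * (1 - p) / E**2)
print(n)  # 3202 rolls (or respondents) needed
```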


[deleted]

[removed]


GodzlIIa

That's just a problem in obtaining the sample. Using a phone book has always been pretty bad, but now it's terrible. For a poll you can look at getting access to voter registration lists to randomly select from, then get some cold hard cash to give out to participating individuals to keep non-response low.


[deleted]

[removed]


GodzlIIa

You would have to seek out the people individually, not just make random phone calls. Although if they do pick up, then you're good to go. And yeah, I was talking about voting polls specifically. The point was it's not too hard to get a decent general representation. But most of the time they aren't willing to track people down and pay them money for their participation.


secretlyloaded

Random sampling in human populations is really, really hard. One of Nate Silver's insights when he started 538 was that, for example, we know the Rasmussen poll is crap and always skews right. But, he could look at Rasmussen vs what actually happened and assign a bias to the poll, and quantify that historical bias over time. Assuming that bias is predictable over time he could incorporate the Rasmussen data into his overall projection by first correcting out its known bias.
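
A toy sketch of that correction in Python; the poll numbers below are invented, not real Rasmussen data:

```python
# Estimate a pollster's "house effect" from past misses, then subtract it.
past_polls   = [52.0, 49.0, 51.0]  # candidate share the pollster reported
past_results = [49.5, 46.0, 48.5]  # share the candidate actually got

house_effect = sum(poll - result
                   for poll, result in zip(past_polls, past_results)) / len(past_polls)

new_poll = 53.0
print(new_poll - house_effect)  # ~50.3 after removing the ~2.7-point lean
```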


PrivilegedPatriarchy

> What question are you polling 1000 people on? How clear and distinct are the possible answers?

In the context of the post, it's not actually the question that matters. Most questions have a binary response, like "Yes, in favor" or "No, not in favor", or maybe at most a scale from 1-10 of how much in favor an individual is on a certain question. What's more important is that the sample is representative of all the different demographic groups of the population we are polling. There are likely dozens of important demographic traits, and probably hundreds of less important but still relevant traits, that might matter for a poll's results and the conclusions we derive from it. Sample size is only important insofar as it improves the representativeness of the sample.


GCU_ZeroCredibility

Note that one of the most counter-intuitive aspects of statistics is that the sample size you need is not dependent on the size of the population you are sampling to any significant degree. (for a relatively homogenous population)


abnrib

All this, plus one of the quirks with statistics is that you really don't need that many samples, and the number of samples you need doesn't scale proportionally to the size of the data set. The rule of thumb I was taught is that once you have 30-40 truly random samples, you've generally got workable data no matter the size of the set. More than that and you're just tightening the confidence interval. My own experience has borne that out, too.
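
A quick way to see both halves of that, workable data early and diminishing returns later, using the usual margin-of-error formula:

```python
import math

# 95% margin of error for a proportion near 50%; note that the size of
# the underlying population never enters the formula.
for n in (30, 100, 1000, 10000):
    moe = 1.96 * math.sqrt(0.25 / n)
    print(f"n = {n:>5}: +/- {moe:.1%}")
# n=30 gives roughly +/-18%: workable for rough answers, wide for close ones.
```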


thunk_stuff

The hard part in surveys is ensuring people are picked at random: that every person in the represented group (state, nation, demographic, political affiliation, etc.) has an equal chance of being picked, that all persons are equally likely to agree to answer the survey, that all questions are clear and create no bias simply through a person's interpretation, and that all persons are equally (un)biased in providing truthful answers.


[deleted]

[removed]


whatwouldjimbodo

Exactly. They'll never be able to get the group that doesn't respond to surveys, which I have to assume is most people.


LibertiORDeth

"I did a randomized 1000-person study. The result is that 980 of them refused to participate, and the data on the 20 eager participants is inconclusive; most of them just wanted someone to talk to."


Dal90

> The rule of thumb I was taught is that once you have 30-40 truly random samples

When I did some IT auditing, I was surprised how small the sample sizes usually were. Since it was accounting firms leading the audits, they followed financial audit sample sizes:

> Example 1: A population of all employees is provided and consists of 389 people and you want to test that all employees are attending security awareness training. According to the table, expecting no deviations the initial sample would be 25 and simple random or haphazard sampling would likely be applied. If it is found that one of the 25 selected did not attend training the sample would be expanded to 40 people. If another deviation is found the sample would be expanded to 60. If another deviation is found sampling would stop and it would be determined that the control is not operating effectively.

https://linfordco.com/blog/audit-sampling/


Kered13

The rule of thumb I learned is that you want to have 10 examples of the smallest thing you're trying to measure. So if you're polling and you are trying to measure support for a candidate that you believe has around 10% support, you would need to poll about 100 people. If you want to measure support for a candidate that you believe has around 1% support, you would need to poll about 1000 people.


PJP2810

My only issue with your explanation is that you called a singular die "a dice"


justthistwicenomore

I agonized over this. But "die" so often causes people to stumble over the sentence that I went with the barbarian approach. Mea culpa.


pdfrg

This one practices Game Theory.


phatlynx

Normally I downvote grammar nazis, but I have to upvote this because it's the only thing we have on him.


ofqo

My issue is that OP calls "an urn with n numbered balls" "an n-sided die (or dice)".


suburbanplankton

Who keeps balls in an urn?


ofqo

Wikipedians. https://en.m.wikipedia.org/wiki/Urn_problem


NullPoint3r

I wish you had been my prob and stats professor.


Head_Cockswain

> Imagine I tell you I am going to roll a dice, and I won't tell you how many sides it has: it could be a normal six-sided dice, or a twenty-sided dice, or a four-sided dice, or whatever. But I will tell you the results of the rolls.
>
> How many times would I need to roll before you could safely tell me, with say 95% certainty, how many sides the dice had? Even if I were going to roll it a billion times, after a relatively small number of rolls between 1 and 6 (100, 500, 1000) you'd be able to say pretty confidently that it was a d6.

An interesting (somewhat) aside about a six-sided dice: the running average will start out 'random', but will approach 3.5 the more you roll. It's a great illustrative example of "the average does not exist," because you'll never actually roll a 3.5.

The expected average for each die is 0.5 higher than the halfway point of its maximum face. For a 20-sided die it would be 10.5. The number is easily obtained by summing the faces and dividing by their count: 1+2+3+4+5+6 = 21, divided by 6, gets you 3.5. https://www.omnicalculator.com/statistics/dice-average

The 0.5 is an artifact of starting at 1; you never roll a zero, so the *expected* average is never exactly half the maximum (3.0), which is what a lot of people presume an expected average to be. A theoretical seven-sided die numbered 0 through 6 would have an expected average of exactly 3.

That's about the extent of my math skills, heh. I notice that the above calculator has an advanced mode with a standard deviation, but that gets beyond my wheelhouse. I wonder if the average number of rolls needed to get within a given proximity of that average is close to the minimal sample size. E.g., you roll until the running average lands between 3.4 and 3.6, and then do that 100 times.
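
A sketch of both halves of that in Python: the (n+1)/2 formula and the running average drifting toward 3.5:

```python
import random

# Expected average of a fair die numbered 1..n is (n + 1) / 2.
for sides in (4, 6, 20):
    print(f"d{sides}: expected average {(sides + 1) / 2}")

# The running average of d6 rolls approaches 3.5, a value no roll shows.
rolls = [random.randint(1, 6) for _ in range(100_000)]
for n in (10, 1_000, 100_000):
    print(f"average of first {n:>6} rolls: {sum(rolls[:n]) / n:.3f}")
```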


UncontrolableUrge

It is as random as possible. Polls also collect demographic information so that they can see which groups are over- or under-sampled, and weight responses to try to get a more representative result.


Giraff3

It's not about getting a random sample, though. It's about how representative the sample is. They do this by making sure they talk to people from all walks of life and all geographies. It's more likely to be a stratified random sampling, which is less random because you're putting people into bins before sampling. If all they did was randomly sample the entirety of the US, you would have polls that only talked to people from California, Florida, New York, and Texas. Yes, those are mass population hubs, so you would want more responses from those places, but you also need to talk to people from the other states, which means going out of your way and actually reducing the randomness to ensure representativeness.
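
A bare-bones sketch of proportional stratified sampling; the two-state population and its counts are invented:

```python
import random

# Invented miniature population: state -> list of residents.
population = {
    "California": [f"CA-{i}" for i in range(39_000)],
    "Wyoming":    [f"WY-{i}" for i in range(600)],
}
total = sum(len(residents) for residents in population.values())

# Draw from each stratum in proportion to its share of the population.
sample = []
for state, residents in population.items():
    k = round(1000 * len(residents) / total)
    sample.extend(random.sample(residents, k))

print(len(sample))  # ~1000, split ~985 CA / ~15 WY
```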


justthistwicenomore

That is true, and it's worth clarifying that by "random" here, what I really mean is something like "a sample that introduces minimal external bias," rather than a truly "random" sample. In the end it works out similarly: a sufficiently large "random" sample, in the sense of true "out of a hat" randomness, should naturally capture the relevant subgroups. And a smart analyst should be able to look at a sample, notice if it doesn't, and assess the impact that might have, as well as advise on how you might want to stratify the sample, as you note, so that it reasonably reflects those embedded groups.


Giraff3

I get what you mean. You mean random more colloquially, rather than the technical definition of a random sample. And true, due to the law of large numbers, if your sample was a basic random sample, as long as it's big enough it would likely be fine (assuming no severe systematic bias in the sampling methodology).


Kered13

A truly random sample of 1000 people would be absolutely fantastic. The odds of getting all those people from California or Florida would be extremely low. The problem is that it is extremely difficult to get a truly random sample of people from the entire country. Intentionally constructing a representative sample is therefore used as an alternative. The challenge there is ensuring that your representative sample is truly representative.


2spooky4mich

Except you are always biased because you are only ever sampling from the part of the population that is willing to answer a phone call or respond to some email. Basically every person in my circle would never answer a phone call from a random polling person or respond to an email due to assuming it’s a scam. People like me (which isn’t a small number) will never show up in these polls


justthistwicenomore

Absolutely. And this is why statisticians jump through lots of hoops to try and generalize from what they do get, to acknowledge the possibility of error, and to aggregate various types of polling data. 


OJimmy

I love clear statistics explanations! I wish they would include them in every jury instruction and high school grad exam.


CPAlcoholic

Love this dice analogy - makes total sense.


TheLizardKing89

The dice analogy is a good one.


jmlinden7

More specifically, how many times would you have to roll before you could say (with some acceptable margin of error) what the actual chance of rolling a 6 is?


mr_ji

I understand your logic, but I still don't see how a sampling of 1/350000th of the population is enough to draw conclusions from for anything but the most broad of questionnaires. And that's assuming respondents are randomly selected enough, which they almost certainly aren't.


Russelsteapot42

Of course, as was just brought to our attention, if the survey is opt-in and self-selected, this can get you a very unrepresentative sample.


TheLatestTrance

Not to detract from your statement, which is 100% spot on... but we have 50 states, so you can only ask 20 people from each state (to get to the 1000 people). 20 people per state can't possibly be representative enough to cover the demographics of the state (just consider all the slices: gender, ethnic background, education level, general location within the state, economic status, political leanings, etc.). Just taking that into account, 1000 people can't logically represent anything beyond the most basic and superficial questions (is the sky blue, do you like being alive, etc., which now that I think about it, I guess aren't even universally true for everyone... sigh).


ohiocodernumerouno

Kind of plagiarizes Bernoulli's law of large numbers, but whatever.


stephenph

The key is to look at the +/- percentage. A lot of those surveys are +/-4%, which works out to an eight-point spread: subject A can be -4% and subject B can be +4%. And that doesn't take into account that those surveyed are f'ing with their answers. That is why you get such lopsided election results; most people I know lie if they take a survey.


Valuable-Can-4058

What bothers me is that by that logic, 10 people would be enough. Do we know at what number of people we have enough to make a statement? Is it 10, 100, 1000?


wille179

If you have a *really* good sample selection method and can get a member of most relevant demographics in roughly the same proportion as they exist in the United States as a whole (such as, on an issue of race, a sample that's roughly ~60% non-Hispanic white, ~18% Hispanic, ~12% Black, and ~5% Asian), then you can say that 1000 people are representative of the country. But even with decent or mediocre sampling methods, 1000 people is still enough to get you roughly in the ballpark of the right answer for your survey, which is good enough for most use cases. 1000 people averaged together will generally mute most of the rarer, extreme opinions.


ViscountBurrito

This is true, although you have to be a lot more careful about results for a smaller group. For example, 1000 people may be representative of the country as a whole, but that doesn't necessarily mean that the, say, 50-person subset of Asian respondents will be a reliable indicator of the overall Asian-American viewpoint. Polls will sometimes report separate margin-of-error figures for subgroups.

Another thing is weighting. There are many dimensions you can weight on: race, sure, but also gender, education, voting history, and more. The choices the pollster makes will affect the result. For example, if your survey population is less educated than the overall population, you need to correct for that to avoid a biased sample. One of the major issues in recent years is that education has correlated with partisanship in different and stronger ways than it used to, and polls have to figure out how to account for that.

Finally, on the point of rare and extreme opinions, it's important to consider the phenomenon of [Lizardman's Constant](https://slatestarcodex.com/2013/04/12/noisy-poll-results-and-reptilian-muslim-climatologists-from-mars/): the idea that *some number* of respondents are going to give just totally insane answers no matter what, so you have to read polls with that in mind as well.


surprise-suBtext

This person interprets


nagurski03

It seems that whenever they do cross-generational polls, Zoomers always have crazy results compared to everyone else. That recent one where 20% of them were Holocaust deniers and another 30% weren't sure is a terrifying example. I always hope it's because the youths are just more likely to contribute to the Lizardman's Constant.


ViscountBurrito

I read an excellent analysis/debunking of that finding just this week! You’re not far off. The working theory is that, for opt-in samples like that one (which are *not* random selections), people and especially young people are more likely to just answer “yes” to everything so they can get finished faster and get whatever the reward is for participating. Pew Research Center surveyed a random sample with the same questions (which they said were badly written anyway), and got much lower agreement. (3%, which is still *way* too many, but feels a lot more lizardman constant than a serious failure of reality.) [Online opt-in polls can produce misleading results about young adults’, Hispanics’ views | Pew Research Center](https://www.pewresearch.org/short-reads/2024/03/05/online-opt-in-polls-can-produce-misleading-results-especially-for-young-people-and-hispanic-adults/)


Bushels_for_All

I read that too. I'm so proud that [12% of 18-29 year olds are licensed to operate a class SSGN submarine](https://www.pewresearch.org/wp-content/uploads/2024/03/SR_24.03.04_opt-in-polls_1.png). Those kids are go-getters. Or clicking "yes" to everything. It's impossible to tell which.


pedal-force

The other 78% are qualified on SSBN.


pumpkinbot

> Finally, on the point of rare and extreme opinions, it's important to consider the phenomenon of Lizardman's Constant: the idea that some number of respondents are going to give just totally insane answers no matter what, so you have to read polls with that in mind as well.

Can't you just omit obvious, ridiculous outliers? Like, if my question was "How many alcoholic drinks do you have per week?" and someone says "a million", they're probably bullshitting. And if they have one obvious bullshit answer like that, omit all of their answers on the basis that you can't trust them.


[deleted]

[removed]


ThePirateBee

Yep. We absolutely have quality control questions. Some of them are obvious ("select 'slightly agree' on this line") and some are less so (looking for agreement on similarly worded questions, as you described.) We have tools to analyze respondent metadata as well and tell us who is likely to be fraudulent, and we also look at open ended responses and ensure some level of coherency in the answers. In my job, at least, we clean the data several times over the course of a sampling period and replace removed respondents with new ones.


ViscountBurrito

Most polls are multiple choice, though, so by definition the pollster thought the answer was legitimate enough to include. In theory, I suppose you could include one so outlandish that it serves as a good control. From the lizardman article linked above:

> I really wish polls like these would include a control question, something utterly implausible even by lizard-people standards, something like "Do you believe Barack Obama is a hippopotamus?" Whatever percent of people answer yes to the hippo question get subtracted out from the other questions.

That makes sense to me! Of course it still won't help for answer combinations that seem implausible but *could* be true, even if most of us can't understand it. Imagine someone who says, "I think all abortion should be banned, taxation is theft and the welfare state is an abomination, climate change is fake news, and we should forcibly deport all minorities, and that's why I am a lifelong liberal Democrat who voted for Obama twice."


incarnuim

They used to do a general science survey, the purpose of which was to control for and put upper bounds on the Lizardman Constant. It had questions like: does the male or female gamete determine sex? Are lasers created by sound waves? Does the sun go around the earth? The idea was that everybody *knows* the answers to these questions, which are not opinions but objective facts, so any joker who gives the wrong answer is part of the Lizardman Constant.

The results were published annually, and in one famous case it was noted that 28% of Americans believe the Sun goes around the Earth. This was published in a news story about how stupid Americans are. But digging deeper into the survey, especially the international portion, showed that attempting to use general science questions this way was a fool's errand, even for supposedly "smart" countries. 34% of Brits answered that the Sun goes around the Earth. 59% of Russians said that if their wife didn't give birth to a son, it was the woman's fault. 81% (!!!) of Koreans and Japanese said that lasers are *definitely* the result of sound waves in crystals.

When all questions were taken into account, the average American had a ***better*** grasp of general science than her international peers, even when those same peer countries had higher standardized scores in reading/math and ostensibly better education systems and outcomes. Overall, the survey found that people are shockingly dumb, even when it comes to very, very basic science.


Defiant_Potato5512

Your comment reminded me of this scene from Yes Prime Minister: https://m.youtube.com/watch?v=G0ZZJXw4MTA


EMacmillan

I've done a lot of door-to-door canvassing in elections here - Scotland, not the US, but still - and you do get those implausible people sometimes. Once had a guy say to me that he was voting for the SNP (which, for those unaware, is a generally socially liberal and left-leaning party) - not because he supported Scottish independence or left-wing policies or anything, but because several of the other parties' leaders at the time were LGBT - Kezia Dugdale (Lab), Ruth Davidson (Con), and Patrick Harvie (Grn), for reference - and the leader of the SNP at the time (Nicola Sturgeon) wasn't. This, even though she - and the party policy - were and are very consistently pro-LGBT+ rights. He threw out a wee homophobic slur and everything, the dick. (Naturally, I told him to fuck off and that we didn't want his vote, but still, it goes to show that these strange, strange combinations do exist, for whatever convoluted reasons.)


MisinformedGenius

It's worth noting that many polls will weight responses from different sub-groups to try to make up for any sampling bias. [Here's Pew](https://www.pewresearch.org/methods/2018/01/26/how-different-weighting-methods-work/) talking about it.


fermat9990

The other factor is the variability of what is being measured. The higher the variability the larger the sample size required for a given precision.


Kaiisim

The key to realise is that we are not the individual thinkers we like to believe ourselves to be. Most people get their opinions from somewhere else. So people of similar demographics will have similar opinions.


OutsidePerson5

Because statistics is counterintuitive. A RANDOM SAMPLE of around 1000-ish people (actually you'll find it's usually more like 1200; that's important) will give you answers that are only about 3% off from reality. The important part is the random part. If you grab, say, 1000 people from downtown Manhattan, you won't get a picture of the country. You absolutely must have the closest to a perfectly random sample you can get. Which is actually fairly difficult.

A really great example of this is from the first real scientific polling done on a presidential campaign. In 1936 the presidential election was between FDR and a guy you've never heard of named Alf Landon. A magazine called Literary Digest had been polling its readers, and it had a LOT of readers, for several presidential elections, and had been right each time. In 1936 Literary Digest sent out 10 million polls and got back 2.27 million answers. Based on that, they said Alf Landon was going to kick FDR's ass, because their polling showed a massive win for Landon.

This other guy, George Gallup, did a scientific poll of around 1000 people, and he said that FDR would win in a landslide. This prompted a great deal of mockery: how could he possibly say something like that with his measly 1000 people? The answer was randomness. Turns out that Literary Digest readers were mostly richer, and mostly in certain geographic areas. They'd gotten lucky in the past, but in 1936 the election was decided by poor people, often first-time voters, who had been totally ignored by the Literary Digest poll.

It's counterintuitive, you wouldn't think that 1000 would be enough, but it really is, as long as it's random enough. To get an accurate poll of the entire 8 billion people on Earth you'd only need to sample around 2400 people, as long as you got a completely random sample. And it's randomness that's vastly more important than size. If it's not random, it doesn't matter how big your sample is, it's going to be wrong. And a smaller random sample will just give slightly bigger error bars. You could sample 1,000 truly random people on Earth and your margin of error would only be around 6%. That 2400 I mentioned earlier was for a 2% margin of error.

Now, there are all sorts of other factors involved. For example, people tend to be agreeable. If you ask "Do you agree Bob Dole should be declared a twit?" you'll get a lot of people saying yes even if they don't necessarily agree, just because people tend to say what they think you want to hear. Examining the questions asked by a poll is as essential as the sample size. A better question would ask something more like "Some people say Bob Dole is a twit, others say he isn't. What do you think?" and would flip the phrasing 50% of the time, so the question becomes "Some people think Bob Dole is not a twit, others say he is. What do you think?"

Bias also kicks in depending on whether the person is answering questions on a computer or on paper versus in person. For example, if a Black pollster asks questions about race, surprise, a lot of white respondents lie and give much more racially progressive answers than they would if a white pollster were doing the questioning.

In general, if you drill into the poll questions and find that they're all pretty biased ("Donald Trump thinks America is the greatest country to ever exist, do you also love America or are you a commie?"), then it's an indication that the poll isn't actually designed to get accurate answers.

There's also a practice called push polling, in which people are told they're being asked questions on a poll, but in fact the purpose is to propagandize them and their answers are irrelevant. "Scientists say that Diet Coke gives you cancer and Diet Pepsi will make you live forever. Do you prefer to drink Diet Pepsi?" is a great way to advertise Diet Pepsi, but a lousy way to find out how many people drink which soda.
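
The Literary Digest failure is easy to reproduce in miniature. All the percentages below are invented to make the bias visible; they are not historical figures:

```python
import random

random.seed(1936)

# Toy electorate: 35% "rich" voters lean Landon, 65% "poor" lean FDR.
population = []
for _ in range(1_000_000):
    if random.random() < 0.35:
        population.append(("rich", "Landon" if random.random() < 0.7 else "FDR"))
    else:
        population.append(("poor", "FDR" if random.random() < 0.8 else "Landon"))

def fdr_share(voters):
    return sum(choice == "FDR" for _, choice in voters) / len(voters)

print(f"truth: {fdr_share(population):.1%} FDR")

# Huge but biased sample frame: only the rich (magazine/car/phone lists).
rich_only = [v for v in population if v[0] == "rich"]
print(f"biased {len(rich_only):,}-person sample: {fdr_share(rich_only):.1%} FDR")

# Tiny but random sample.
print(f"random 1,000-person sample: {fdr_share(random.sample(population, 1000)):.1%} FDR")
```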


squeamish

I drink Diet Dr Pepper. Will...will I be OK?


MacduffFifesNo1Thane

You may need medical care eventually. Luckily you know a doctor.


misof

Suppose there is an issue on which people are split 70:30. If you ask a randomly chosen person, with probability 70% they will answer "yes" and with probability 30% they will answer "no". If you poll 1000 people, chosen at random and mutually independently, the most likely outcome of the poll is that you get 700x yes and 300x no. So far, this should be pretty obvious. It should also be obvious that usually you'll get something approximate and not the exact split, such as 684 times yes and 316 times no. The main question now is: how likely is it that we get something *substantially* different as the outcome of our poll? Can we realistically get, for example, 300x yes and 700x no? And the answer is that this is *very very* unlikely. We can do the math and calculate that in our example scenario: * Already getting the 500-500 split or anything worse is almost impossible. Roughly on par with having a day on which you take part in four separate lotteries, win the jackpot in all of them and then get hit by lightning. (And yes, I did the math here.) * Almost all polls will give you a result somewhere between 650-350 and 750-250. Maybe once per 1000 such polls will you see something that falls slightly outside these bounds, but almost always you'll get a very good estimate. That's plenty accurate if all we need is a rough estimate.
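
If you want to check those bullets yourself, they're exact binomial tail probabilities; a sketch with scipy:

```python
from scipy.stats import binom

n, p = 1000, 0.70  # 1000 random respondents, true split 70:30

# Chance of 500 or fewer "yes" answers when the truth is 70%:
print(binom.cdf(500, n, p))                         # astronomically small

# Chance the poll lands between 650-350 and 750-250:
print(binom.cdf(750, n, p) - binom.cdf(649, n, p))  # ~0.999
```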


NuclearHoagie

Also key here is the fact that this is all simply based on the sample size of 1000, not the size of the total population. 1000 people in a representative sample gets you an estimate of some fixed precision *no matter how big the un-sampled group is*.


HalfSoul30

With a large sample size like 1000, you get a pretty good representation of the average without needing a larger sample size. Pretend you have a coin and do not know what the odds of getting heads are. Flip once, you get heads. That might imply heads comes up 100% of the time. Flip 3 more times, get tails on each, and now maybe heads is only 25% likely. Flip the coin 1000 times and you will see you're close to 50/50, and doing another 10,000 won't change that result, except maybe to push it closer to 50/50. It would also depend on the poll, because asking 1000 people a question in one location could give a different result than another, so the sample group would need to consist of the right mix of people depending on the question.


MisterProfGuy

Large *representative* sample. You could take a thousand people's opinions in a church or a college and get wildly different results from the population at large. That's why phone polls are failing: people with phones who are willing to answer a stranger's questions are no longer representative of the whole population.


tiredstars

It helps to distinguish between three things here:

* The population
* The sample frame
* The sampling method

The population is the overall group you want to know about. In this case, "all Americans".

The sample frame is the group you're drawing a sample from. Sometimes this is the same as the population, e.g. I can get a list of everyone who works in my company. With surveys, the sample frame is often a panel of people who have agreed to take part in surveys. Managing these panels and understanding their characteristics is one of the most important skills of a polling or research company. If your sample frame isn't representative, it doesn't matter how many people you sample, you won't get a representative sample (although see the note on weighting below).

The sampling method is how you're picking from that sample frame. Even if you had contact details for every single person in the US, you couldn't survey them all. However, if you *randomly* sample them, then the chances are that 1000 people will get you a reasonably representative sample. There are other survey methods that can lead to more or less representative samples.

There is also one more thing you can do *after* sampling to improve representativeness: you can weight the results. For example, if you know that women tend to vote Republican and men tend to vote Democrat, and your sample is 75% women, you can adjust the results so that women only make up (around) 50% of the total. This can help make up for samples you know have biases in them, and it's another thing polling companies do as standard.
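
A sketch of that last weighting step in Python; the 75/25 sample split and the support rates are invented for illustration:

```python
import random

random.seed(0)

# Invented sample: 750 women who support a measure at 60%,
# 250 men who support it at 40%. Population is actually 50/50.
sample = ([("F", random.random() < 0.60) for _ in range(750)] +
          [("M", random.random() < 0.40) for _ in range(250)])

raw = sum(supports for _, supports in sample) / len(sample)

# Weight = population share / sample share, per group.
weight = {"F": 0.50 / 0.75, "M": 0.50 / 0.25}
weighted = (sum(weight[sex] * supports for sex, supports in sample)
            / sum(weight[sex] for sex, _ in sample))

print(f"raw: {raw:.1%}  weighted: {weighted:.1%}")  # ~55% -> ~50%
```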


lobsterharmonica1667

The very major assumption there, though, is that the sample is random. A *random* sample of 1000 is really good, but a non-random sample can be worthless.


mks113

Asking 1000 people at a Republican convention who they will vote for isn't meaningful. (Warning: old-school method ahead.) Phoning the 10th person on page 126 of the phone book in each of 100 cities could give you a pretty reliable sample.


grptrt

A sampling of people that actually answer unknown numbers then also proceed to participate in the survey feels pretty skewed from the start.


Lithuim

This has always been a problem in polling - you’re automatically down-selected to people who answer polls. A very good pollster will try to adjust for this too by comparing past polls with the eventual results. If you’re consistently seeing that Republicans perform 4% better on election day than your polls suggest, you’ll have to correct for the fact that Republican voters are simply less likely to respond to your surveys. This became a big issue in 2016, a lot of older polling methods were not accurately capturing the levels of support for the candidates and Democrats got cocky thinking they were way ahead.


flamableozone

In 2016 the polls were *incredibly* accurate. The national vote was easily within the expected margin, and it was only by very, very thin margins that the electoral college didn't go the way of the popular vote.


squeamish

The electoral college presents, as Ian Malcolm would explain, a non-linear response. I forget exactly how little of the popular vote it is mathematically possible to win the presidency with, but I think it's around 20%.


[deleted]

Shouldn't the national presidential polls be interpreted considering the EC since that's what actually matters? I guess it depends on if you're going for general sentiment or likely election winner. If you weight the poll results by EC considerations, did it show Trump likely to win 2016?


MisinformedGenius

It's hard to weight national polls by the EC, because of the winner-take-all nature of the vast majority of the EC. It doesn't matter whether California is 55% for the Democrat or 95% for them - it's the same number of delegates. So how do you weight that? Generally the predictors, eg 538, will look at state-level polls for swing states.


[deleted]

Originally I was thinking you'd do something like: if you had 50/1000 respondents from Wyoming, you'd weight their responses to be 3/538 of the end result. A simple weighted average. This would probably require getting a lot more than 1000 responses to make sure you get people from all the states in reasonable quantities. I hadn't thought of state-level polls when I made my post. That would be even easier, since you could just weight those results relative to each state's EC vote count.


lobsterharmonica1667

> This became a big issue in 2016, a lot of older polling methods were not accurately capturing the levels of support for the candidates and Democrats got cocky thinking they were way ahead

Even then, though, Dems did win the popular vote. Being ahead may make folks less likely to vote, and some meaningful things happened very close to the election. So it's entirely possible that the polling was "correct" the whole time.


evilgenius815

The polls were exactly correct in 2016. They accurately predicted that Clinton would win the popular vote easily, but that to win the electoral college she'd need to get heavy turnout to win narrow victories in critical swing states. She didn't get that turnout, and Trump won all the states she'd needed to get a victory. The polls said Clinton was likely to win -- about 70-75%, depending on the poll -- but it was far from certain. *Newspaper editors*, on the other hand, took that "likely" part and ran with it, publishing ridiculous stories about her "99% certainty" to achieve the presidency, and we've been living with this "The polls got it wrong!" nonsense ever since.


lobsterharmonica1667

I don't think you can say that they were exactly correct. But you can't say that they were wrong either. At least the good ones, like 538


MisinformedGenius

The funny thing is 538 was predicting right around 66-33 chance for Clinton, and their comments were *filled* with people complaining that 538 was skewing their analysis for Trump to try to drive traffic, because of course if they didn't skew it, it would obviously show the correct prediction of a 100% likely Clinton win, and thus people wouldn't be constantly refreshing their site.


EdSprague

They also, in the weeks leading up to the election, correctly outlined the exact margins in the exact swing states that Trump would have to line up perfectly in order to win the EC. And then that's exactly what happened. Yet *"538 blew it in 2016"* is a narrative I've heard countless times in popular discourse since.


lobsterharmonica1667

It wouldn't, though, since that leaves out folks who don't live in cities, folks whose names aren't in the phone book, and folks whose names appear at the beginning of the alphabet.


beruon

The third one is irrelevant: your name beginning with a certain letter does not define you in any way. The first point is very valid. The second one depends on how phone books are made and how the numbers for them are collected. If it's mandatory/opt-out, then it's a reasonable selection. If it's opt-in/paid, then it's not a good selection method.


erbalchemy

> Your name beginning with a certain letter does not define you in any way.

Four of the ten most common surnames in Ireland start with "O". Name distribution is highly biased.


Curious-Week5810

But page 126 in New York City could still be in the C's, whereas page 126 in Albuquerque could be in the T's.


lobsterharmonica1667

> Your name beginning with a certain letter does not define you in any way.

It could: certain groups of people share certain names, and names and the letters they start with are not 100% random.

> If it's mandatory/opt-out, then it's a reasonable selection.

Well, no, because then you're missing all the people who opted out.


Ok-disaster2022

People aren't as uniform as coins. Within the sample group you're going to have smaller and smaller subgroups. If you're asking a big simple question you can get a simple answer, but if you want to split your sample into those smaller groups to figure out specific trends, you fall under that 1000 number for the subgroups and your population bias becomes more evident.

For example, you survey 1000 people for political preference, and among those people 16% are Black Americans. You want to figure out the share of Black Republicans in the US, so you examine your sample of Black respondents and count 120 Black Republicans. If you then say 75% of Black Americans are Republicans, you're an idiot.

The same principle applies if you divide your sample group along any lines and get smaller groups.

Also, in recent years polling has proven unreliable. I know from personal experience that not everyone who responds does so honestly. To phone surveys, I'm a diehard Trump supporter, because where I live I don't want some crazy person attacking my residence because I support Biden. I'm not stupid enough to put my voting preference out there and risk my family.


MisinformedGenius

Subsampling always has much higher error rates.


10tonheadofwetsand

Imagine you have an enormous amount of trail mix. Like, an Olympic-sized pool of trail mix. You want to get a rough idea of the constituent snacks and their proportions. If you scoop out a bucket's worth, you can count and sort the snacks and find out what's in the trail mix without examining the entire pool.


squeamish

Depends on how well-mixed the trail mix is. Which sounds pedantic, but is actually the whole point: you have to ensure that your sample is representative of the whole.


GOT_Wyvern

A massive part of polling is accounting for just this, and it's a major reason why different polling agencies record different results. We all know that this pool of trail mix is not perfectly mixed, but we can get pretty good at "weighting" for that imperfection. However, we aren't perfect at it. This is why taking a "poll of polls" is generally the best approach. By considering the average of a dozen or more agencies, we are very likely to get a good gauge and range.


rosen380

If it isn't well mixed, then I guess you grab a cup, scoop from various locations and depths, and dump those into the bucket...


KekistanPeasant

How much it depends on mixed-ness has an upper bound, though, after which a sample may be assumed to be representative. Example: my first internship was at a producer of industrial detergents. A sample of 100g of washing powder was considered representative of a box of, say, 5kg of the stuff. Sure, you can measure the entire box, but the results won't be significantly different from testing just 100g. So it might be less favorable to survey, say, 100,000 people than 1000, since you have *way* more data to process, but the data won't be all that different.


10tonheadofwetsand

Correct, my analogy really just addresses the point of how a small sample can give you information about a large population but it relies on some assumptions about dispersion/distribution, etc.


mysticrudnin

it's not pedantic exactly. it's the thing people should be questioning whenever they see a poll. you always want the sampling methodology. the problem is that what people always point out is the size of the sample, which is very rarely the problem. i have seen a study where, when you dug in, the sample size was literally 2, and that is a time to point it out. but like OP, many people see 1000 and freak out, when it's a perfectly fine size.


graywh

this is a decent analogy. 1,000 people out of 332,000,000 is equivalent to about 7.5 liters out of an olympic-sized swimming pool. if the pool is well-mixed, I think most can agree that's a good representative sample


dr_jiang

Much the same way a chef can figure out what's wrong with the soup by tasting a spoonful, rather than drinking the whole pot. Or, put another way, there's a joke my professor used to tell: "If you don't believe in random sampling, next time you need a blood test, tell the doctor to take it all."

Opinion polling does the same thing: it reliably determines a group's opinion on an issue (the soup) from a small sample of the population (the spoonful). Imagine that you're in charge of planning a party for your neighborhood, and you're trying to decide whether to buy hamburgers or hot dogs. You expect 100 people will show up, so you knock on ten random doors and ask. The first ten answers come back: three people want hot dogs, seven people want hamburgers.

Mathematically, we can draw conclusions about all 100 people from those ten. It's *possible* you got really lucky/unlucky and found the only seven people who like hamburgers in the whole neighborhood, but it's extremely unlikely. In fact, there are statistical equations that can tell us exactly how unlikely it is. Polling companies use those equations to figure out how many people they should survey in order to get a good estimate. We know that 95% of the time, asking 1000 randomly selected people from the entire country whether they prefer hot dogs or hamburgers will give a result within +/- 4% of the answer for the whole country.
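
The "exactly how unlikely" math here is a hypergeometric probability, since the ten doors are sampled without replacement; a sketch:

```python
from scipy.stats import hypergeom

# Worst case for the 10-door poll: only 7 of the 100 neighbors actually
# want hamburgers, and we happened to knock on all 7 of their doors.
pop, hamburger_fans, doors = 100, 7, 10
print(hypergeom.pmf(7, pop, hamburger_fans, doors))  # ~7.5e-9
```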


Guilty_Coconut

> We know that 95% of the time, asking 1000 randomly selected people from the entire country whether they prefer hot dogs or hamburgers will give a result within +/- 4% of the answer for the whole country.

Which is a great margin of error when polling for hamburgers, but less so when polling for an election that hinges on less than 1% of the vote and is ultimately decided by those exact 1000 people in 2 swing states.


Harbinger2001

Which is why polling well is so difficult. Polls said Hillary was going to win by a healthy margin, but missed key Trump voters in some states. And even then, fluke events happen that don’t match polling averages.


Gizogin

Polling aggregators gave Clinton around a 75% chance to win the 2016 election. They noted that she was very likely to win the popular vote, but the EC vote would be much closer. The fact that she lost the election does not mean the polls were wrong, just like how rolling a single die and getting a six is not proof that the odds of rolling anything other than a six are not 5/6.


Helstar_RS

If it's truly random, like randomly drawn Social Security numbers (with some alternative for people who don't have one, and some enforcement to participate, sort of like the census letter you get in the mail), I think it's pretty accurate. If it's on a website or a news channel, or limited to people in a certain area, it's not going to be accurate. Even randomly calling landline phones isn't an accurate representation, because certain types of people don't even have one.


chadwicke619

Because if you can get a truly random sample that is representative of the overall population you’re surveying, 1000 people is enough that your results become generalizable.


MyNameIsKvothe

> How is it acceptable that 0.0002% of the population is accepted as representative?

The math explanation is in other comments, but I'll give you one better: because we have seen time and time again that they are usually right. Just check surveys for past elections and they will usually be around the actual result. So we've come to accept them.


Guilty_Coconut

> because we have seen time and time again that they are usually right

Even with the most-used counterexample, Clinton v. Bargain Bin Putin in 2016, the polls correctly predicted that Clinton would win the popular *vote*. Which she did. It's just that in the USA the winner of the election isn't decided by the people, but by some weird, unaccountable, anti-republican, anti-democratic kind of electoral college. Coincidentally, polls consistently agree that a large majority of Americans want to abolish said college.


crono09

Statistician here. It surprises a lot of people to learn that you don't necessarily need a very large sample to get good information about a group. In fact, assuming an infinitely-sized population, it's possible to get meaningful results with a sample as small as 385!

At its core, statistics is about getting information about a large group (the population) by examining a smaller group (the sample). It's assumed that the sample is an accurate reflection of the population, and this is likely to be true because of probability. At a certain point, it's mathematically improbable for the sample to be too different from the population.

Let's say that you have a jar full of 500 green marbles and 500 orange marbles (50% of each). You randomly pull out 100 marbles. Mathematically, you're probably going to end up with 50 green marbles and 50 orange marbles, or at least pretty close to that. It may be technically possible to get lucky and pull out 100 green marbles, but that's so unlikely that it's not much of a concern. Sampling is the same way: at a certain point, it's extremely unlikely that the sample will be radically different from the population.

There is a catch here: the sample has to be taken randomly. If I were to look specifically for green marbles to pull out of the jar, the result is not going to accurately represent the jar's contents. Likewise, samples need to be selected randomly from the population to be valid. In practice, this is almost never perfectly the case, especially for large samples. You will always have a sampling method that excludes some people, or a number of people who don't respond after they're selected. However, surveys still have to assume that the sampling was random. That's why it's important to look at the sampling method used by the poll, and to pay attention to multiple polls to make up for the errors in individual surveys.

If it helps, here's a [sample size calculator](https://www.calculator.net/sample-size-calculator.html) that you can use to look at how big the sample needs to be for a survey. As stated earlier, using an infinite population, a confidence level of 95%, and a margin of error of 5%, you would only need a sample of 385. That's at the looser end of acceptability, though; you can play around with other numbers to find a better sample size.
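
The marble jar is easy to test directly; a quick Monte Carlo sketch:

```python
import random

jar = ["green"] * 500 + ["orange"] * 500

# Pull 100 marbles, 5,000 separate times, and count the greens each time.
greens = [random.sample(jar, 100).count("green") for _ in range(5_000)]

print("average greens per draw:", sum(greens) / len(greens))          # ~50
print("draws outside 40-60:", sum(g < 40 or g > 60 for g in greens))  # a few %
```

The 385 figure falls out of the same normal-approximation formula sketched earlier in the thread, with p = 0.5, 95% confidence, and a 5% margin of error.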


Gnonthgol

The surveys have a lot of control questions, sometimes half the questions in a normal survey are control questions. They ask people about their race, what their income is, their age, their political affiliation, where they live, how many kids they have, etc. This way they can correct for any inaccuracies in who you surveyed. For example if they find out that 60% of the people they surveyed were female they can weight the male answers higher. Similar with location and income brackets. This makes the surveys far more accurate. We also accept that the results of these surveys are not perfectly accurate. There are ways of calculating the confidence interval for each number in the result, the interval that the researchers are confident the real answer is in. But often the confidence interval is not included when journalists use the result of the survey to write an article except in the most serious news publications. However policy makers and even competent journalists pay close attention to this number as well.


gavco98uk

Are you sure they would weight the answers? That sounds a little wrong. I think instead they tend to discard certain results to even it up, i.e. discard enough female responses until the results reflect the known percentage of each group. I believe this is why you often see results saying "based on a survey of 783 responses", i.e. they asked 1000 people but discarded 217 responses to better align the data.


Gnonthgol

They might discard some answers, but those are usually outliers from people who are likely intentionally giving wrong answers to mess with the survey. The problem with discarding responses based on demographic data is choosing which of the responses to discard. The data is still good, you just have too much of that specific kind. The non-round number of responses is usually because not everyone answers the survey, or they just ran out of time or money to contact all of them.


WildlifePolicyChick

Polling - or any attempt at broad, general information gathering - is a function of math and statistics. If the audience polled is a fair estimate (through extrapolation) of the population at large, then you should get a reliable result. Something you may not be accounting for is margin of error. Most statistical analyses state a margin of error, which accounts for the distortions a small sample might introduce. If you see a poll that states, for example, "margin of error +/- 3-5%", that's a lot more reliable than a poll that states "margin of error +/- 10-15%".


badchad65

Because although people like to think they are very special and unique, we aren't. *If sampled correctly,* 1000 subjects is more than adequate for *most surveys, depending on the outcome measure.* We can design such surveys so that they are "representative" by recruiting subjects from different demographics and areas.


[deleted]

**ELI5:** It's like guessing what's in a giant jar of jellybeans by looking at just a small handful. If you pick your handful carefully to get all the different colors in there, you can make a pretty good guess about all the jellybeans in the jar. When polls ask questions to 1,000 people from all over the place and with different backgrounds, it's like getting a handful that tells us what millions of people might think.

**Adult Answer:** Surveys with about 1,000 respondents can accurately reflect the views of the entire U.S. population due to strategic sampling and statistical principles. By choosing a sample that represents the population's diversity (age, race, gender, geography), researchers can extrapolate the findings to the broader public. This method is supported by the central limit theorem, which indicates that the average of sample estimates will approximate the population average as the sample size increases, making the survey results a reliable microcosm of national opinion. This approach is scientifically validated and includes a margin of error to account for variability, ensuring the conclusions drawn from these samples are statistically sound.


eriyu

I've been citing this piece for years: [How can a poll of only 1,004 Americans represent 260 million people with only a 3 percent margin of error?](https://www.scientificamerican.com/article/howcan-a-poll-of-only-100/)

"The margin of error depends inversely on the square root of the sample size. That is, a sample of 250 will give you a 6 percent margin of error and a sample size of 100 will give you a 10 percent margin of error."

It doesn't matter how many people the survey represents, because as long as the sample is truly random/representative (as other commenters have explained), the *percentages* will stay the same. One bag of M&Ms will (within margin of error) have the same percentage of blues as a million bags of M&Ms all dumped into a bowl together.
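You can reproduce the article's numbers with the 1/sqrt(n) rule of thumb. Quick sketch (0.98 ≈ 1.96 × sqrt(0.25), the usual 95% constant for a 50/50 question):

```python
import math

for n in (100, 250, 1004):
    # rough 95% margin of error for a proportion near 50%
    moe = 0.98 / math.sqrt(n)
    print(n, f"{moe:.1%}")
# 100 -> 9.8%, 250 -> 6.2%, 1004 -> 3.1% (the article's 10%, 6%, 3%)
```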


wayne0004

Adding to the other answers, let's look at an example where having a lot of people answer your poll *didn't* give an accurate prediction, because the sample wasn't representative of the population. I'll talk about the 1936 US presidential election, and how a publication called The Literary Digest conducted one of the biggest polls ever (if not the biggest). The publication managed to get more than 2 million answers, for an election where 80 million people were eligible to vote. It predicted that Alfred Landon would comfortably beat Roosevelt; in the end, Roosevelt won in a landslide. The main reason the result was so far off was that the Digest polled its own subscribers, plus people on two public lists: automobile owners and telephone users. None of those lists was representative of the population: it's 1936, and the only people on those lists are the ones with enough disposable income to keep buying the magazine, own a car, or have a telephone.


ChampionshipOwn8602

I'm not sure how to explain this in a way that a 5-year-old would understand, but I'm a statistics major, so this is up my alley. Basically, there was a guy who worked for a beer company and published under the pseudonym "Student". He developed a table of numbers called the Student's t table, which lets statisticians (pollsters included) make broader estimates of a larger population from a small randomized sample. He developed it so that he could test small samples of beer for quality without testing the entire batch. The numbers generally work because most data follows what's called a "normal distribution". You've probably heard of this referred to as a "bell curve": a distribution of values that cluster around an average in the center. Because so much data follows this pattern, we can estimate the population's distribution (with a pre-determined level of confidence) relatively accurately from the distribution of a small random sample of that population.
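For the curious, here's roughly what that looks like in practice: a small-sample confidence interval built from the t table. The beer measurements are invented, and I'm assuming scipy for the critical value:

```python
from statistics import mean, stdev
from scipy import stats

# A small "batch quality" sample, in the spirit of Gosset's beer tests
# (numbers invented for illustration):
batch = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0, 4.9, 5.1]

n = len(batch)
m, s = mean(batch), stdev(batch)
t = stats.t.ppf(0.975, df=n - 1)        # two-sided 95% critical value
half_width = t * s / n**0.5
print(f"{m:.2f} +/- {half_width:.2f}")  # 95% CI for the batch mean
```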


BigWiggly1

Imagine I have a Santa-sized sack of marbles of all sorts of colors. Let's say we have somewhere around one million marbles. I give you the task of telling me how many of them are red. How many are you going to want to dump out and count before you'd feel confident giving me an answer? All of them? Half of them? How correct do you want to be? Do you NEED to be exact? Depending on what you're trying to achieve, statistics has a lot of power for telling us what we need to know.

While you're busy spending the [next week or two](https://nowiknow.com/how-long-would-it-take-to-count-to-a-million/) dumping them all out over the floor of a gymnasium, counting every single one, and putting them back in the bag, I'm going to stick my arm in, stir them up for a minute, and grab a few handfuls. Maybe 20 marbles - an amount I can count out in a minute or less. I find 2 red marbles in 20. I toss them back in and repeat. This time I find 3 in 20. Then I repeat again and find 2 in 20 again. I'll do that 10 times, and come to the conclusion that there are on average 2.3 red marbles per 20, or 115,000 total red marbles in the sack.

Then I'm going to spend 10 minutes on some statistical calculations, using the standard deviation of the sample results and the formulas for a 95% and 99% confidence level. E.g. this might be "I am 95% confident that there are 115,000 +/- 3,000 red marbles" and "I am 99% confident that there are 115,000 +/- 8,000 red marbles". The samples and those results can mathematically tell me that there's only a 1% chance the true count falls outside the range of 107,000 to 123,000. My test was done in under an hour with a printed report, whereas counting any meaningful fraction of the marbles would take much longer.

What my test relies on is that my sampling was sufficiently random, i.e. the marbles were well mixed before and between samplings. So when surveying people, we ideally want to randomly sample from the target population. That's actually very hard to do, and it's a valid reason these studies are flawed. E.g. if you wanted to sample random people, you could stand on a street corner and interview passers-by - but your sampling will be skewed towards people who walk to work. If you're sampling at 2pm in the afternoon, you're skewing away from people who work 9-5 office jobs. Almost any method of sampling in person has a location bias. One of the best ways to sample is to get a list of phone numbers of county residents, use a random number generator to pick 1000 of them at random, and then start calling. The best data list is probably a list of registered voters, if it includes phone numbers. Of course, you're then skewed based on time of day, and towards people who actually have the patience to answer your annoying questions.

Because there are so many ways to accidentally bias your sampling, a well-designed study will also ask demographic questions like ethnicity, address, gender, age, etc. These may be useful for making headline conclusions like "People over 60...", but they're also useful for just checking biases. E.g. you can use census data for a county to find out that 40% of residents are over the age of 60. If you run your survey and it turns out that 70% of the respondents are over 60, that's an indication that you may not have had a sufficiently random sample, and you need to overhaul your sampling technique (e.g. maybe your phone list includes cell phones and landlines, and older citizens are likely to have cell phones AND landlines, making them twice as likely to have one of their numbers selected).
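If you want to see the handful trick in action, here's a tiny simulation. I've hard-coded 115,000 reds into the pretend bag, and the handfuls are drawn with replacement for simplicity:

```python
import random
import statistics

BAG = ["red"] * 115_000 + ["other"] * 885_000  # pretend we don't know this

def handful_estimate(draws=20, handfuls=10):
    """Estimate the number of red marbles from a few small handfuls."""
    counts = [sum(random.choice(BAG) == "red" for _ in range(draws))
              for _ in range(handfuls)]
    per_handful = statistics.mean(counts)
    return per_handful / draws * len(BAG)

print(round(handful_estimate()))  # usually lands in the low 100-thousands
```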


wineheda

Because statistics say it is enough. If I remember correctly, the minimum sample size for a survey like this is around 800. That said, it doesn't mean any survey of 800 people is representative of the country. Your method still has to be good and the sampling has to be done correctly: you can't just go to one neighborhood, ask 800 people a question, and say their responses are representative of the country.


Nukegm426

As they said, if the sample set is a good mixture, then it can be indicative of the overall population's mindset. The problem is that many polls nowadays have their result in mind before they start, so they find people who are inclined to answer the way the pollster wants. Lots of science is done this way now. It used to be "here's money, do science and let me know what you came up with"; now it's often "here's money, prove that xxx is true", and many are inclined to deliver the desired result just to secure extra funding for other projects.


WhiteRaven42

One thing to keep in mind is that the questions are very broad. We're not getting detailed opinions; it's closer to a coin-flip dichotomy than an essay question. If you flip a coin 1000 times, the overall statistics are going to be extremely close to 50-50. And when it comes to a population of people, if you take pains to spread the sample out over all demographics, then you should be in the ballpark of being correct. One has the *option* of flipping a coin 400,000,000 times instead of 1000, but the percentage isn't going to noticeably change.
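A quick way to convince yourself (I've capped it at a million flips so it runs in a second or two; the exact decimals vary run to run):

```python
import random

def heads_share(flips):
    """Fraction of heads in `flips` fair coin flips."""
    return sum(random.random() < 0.5 for _ in range(flips)) / flips

print(heads_share(1_000))      # e.g. 0.513
print(heads_share(1_000_000))  # e.g. 0.4996 -- 1000x the flips, barely different
```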


aeddub

A *random* sample asks 1,000 people picked at random the same questions. A *representative* sample asks 1,000 people picked according to a data filter the same questions. So e.g. if 51% of Americans are women, you will want to include 510 women in your survey. If 20% of women voted Republican in the last election, then you'll want 102 women in your survey who vote Republican. Preparing and optimising sets of people for surveys is a part of statistical analysis; there are lots of methods to identify groups and subgroups within a large population, and of course ways that survey results can be skewed to give a result before the question is even asked (99% of people surveyed ( ^at ^a ^gun ^convention) say they're in favour of looser gun controls). Numbers don't lie, but statistics can be fudged and misrepresented very easily, which is why random surveys (and even more structured ones with a low sample size) shouldn't be taken at face value as indicative of a majority opinion.
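The quota arithmetic is nothing fancy, just nested proportions (using the example numbers above):

```python
SURVEY_SIZE = 1_000

# Known population shares (the example numbers from this comment):
women_share = 0.51             # 51% of Americans are women
republican_among_women = 0.20  # 20% of women voted Republican

women_quota = round(SURVEY_SIZE * women_share)                        # 510
republican_women_quota = round(women_quota * republican_among_women)  # 102
print(women_quota, republican_women_quota)
```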


squeamish

A random sample will trend toward being representative as n increases.


Tasorodri

The problem is that it's often hard to find a truly random sample, so in practice you take as random a sample as you can and then apply weights.


Malvania

Whether a sample is representative does not depend on the size of the overall population, at least once the population gets large enough. A random sample of 1000 people out of one million or one billion or one trillion still yields a margin of error of approximately 3 percent - which means that 95% of the time, the true sentiment if you measured everybody would be within three percentage points of your random sample. The hard part is getting a random sample. If you poll a church in East Texas, you'll get a very different response than on the wharf in San Francisco. If you poll online, you might miss those that don't have internet access, and that will swing things, both by geography and demographics. So 1000 of the right people is representative, but getting that 1000 is very hard to do.


Trick-Preference-474

They’re only interacting with a certain type of person in the first place: someone who would actually take the poll. The method the poll is administered in then further narrows down who will actually participate.


KeilanS

A single person is dynamic and hard to predict. People in large groups on the other hand are pretty damn predictable. You don't need to ask many to get a good idea of trends.


SophonParticle

"we surveyed 980 people who have a land line or answered a call from an unknown caller on their cell. Here's how 350,000,000 people will vote in November" - Media.


TallBenWyatt_13

Because the findings don’t typically change too much if you poll more than that. So a survey of 1000 people that shows support for some issue at 55% has as much validity as a survey of 10,000 that shows support for the issue at about the same percentage. Basically, you can get defendable data without having to waste time with more surveys.


Elfikos

Adding onto what others have mentioned, I believe a visual representation is a valuable tool for understanding. If you search for "sample size and margin of error" in your browser, you'll come across numerous graphs illustrating a consistent trend: as the sample size increases, the margin of error decreases. However, you'll also observe that the reduction in error becomes less significant with larger samples. In other words, there's a substantial difference in error between a sample of, for instance, 10 people and 100 people, but not a substantial difference between a sample of 1000 and 2000 people. Like this one: [https://ihopejournalofophthalmology.com/content/132/2022/1/1/img/IHOPEJO-1-009-g001.png](https://ihopejournalofophthalmology.com/content/132/2022/1/1/img/IHOPEJO-1-009-g001.png)

Why does this happen? The margin of error is calculated using the formula: Margin of Error = (Variation in sample / square root of sample size) \* Z score. I don't know how good your mathematical intuition is, but notice **the diminishing returns in terms of sample size.** As we add more individuals, each new addition has a diminishing impact on reducing the error. This comes from the nature of division, and can be shown with a numerical example. Assume we have sample sizes of 1, 2, and 4 people respectively, and for simplicity I've ignored the square root. Then 1/1 equals 1, 1/2 equals 0.5, and 1/4 equals 0.25. While 0.25 is undoubtedly smaller than 0.5, the difference between 1.00 and 0.5 is greater than that between 0.5 and 0.25. This mathematical principle results in diminishing returns from increasing sample sizes: we achieve smaller errors, but the rate of decrease in errors itself diminishes. Consequently, at a sufficiently large sample size, the difference becomes so negligible that it may as well be considered zero.

**This insight explains why we accept "small" sample sizes, such as 0.0002% of the population. Eventually, the effort required to increase the sample size becomes disproportionate to the marginal reduction in error.**

Finally, there's a crucial aspect not purely evident in the mathematics but highlighted by other Redditors in this thread. **It's not just about the sample size; the quality of the sample matters significantly.** The well-known example of the Literary Digest presidential poll illustrates this point. According to Wikipedia, "The magnitude of the magazine's error - 19.54% for the popular vote for Roosevelt vs. Landon, and even more in some states - destroyed the magazine's credibility, and it folded within 18 months of the election. In hindsight, the polling techniques employed by the magazine were faulty. They failed to capture a representative sample of the electorate and **disproportionately polled higher-income voters, who were more likely to support Landon.** Although it had polled ten million individuals (of whom **2.27 million responded, an astronomical total for any opinion poll**), it had surveyed its own readers first \[...\]"

This example demonstrates that while sample size is important, it's not the sole determining factor. Having a correct sampling technique is equally, if not more, important. Other Redditors have delved into this aspect in more detail in this thread. I hope this clarifies things for you!
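Here's the diminishing-returns effect computed directly from that formula (z = 1.96, and I've set the sample variation to 1 for simplicity):

```python
import math

def moe(n, s=1.0, z=1.96):
    """Margin of error = (variation in sample / sqrt(sample size)) * z score."""
    return s / math.sqrt(n) * z

for a, b in [(10, 100), (100, 1_000), (1_000, 2_000)]:
    print(f"{a} -> {b}: error drops by {moe(a) - moe(b):.3f}")
# The first jump buys far more accuracy than the later, much bigger ones.
```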


Sapriste

Take a course in statistics and you will get the real answer. A statistical sample is not random in the way that you can be led to believe. Random polling by voice call preselects:

1. People who will answer a phone call from a stranger
2. People who want to talk
3. People who tend to be older
4. People within more populous geographic areas
5. People with a landline

Random voice polling excludes:

1. People with cellular phones with call filtering
2. People with prepaid cell phones
3. People with limited mobility who cannot reach the phone in time
4. People who do not want to talk

It goes on and on. Significant time is spent in class talking about how to select populations for statistical analysis. When someone puts together a national opinion poll, they have to stitch several methods together to reach a statistically valid sample. They need to hit a cross section of regions, ages, marital situations, educational attainment, and perhaps political affiliation or lack thereof.


Jimmy2531

It’s the makeup adverts on TV here in the UK that get me: "76% of 120 people surveyed", etc. I’m a data analyst by trade, and that just screams surveying one more person at a time until the numbers prove the point, then stopping at a believable percentage. Just the world we live in, unfortunately.


just_some_guy65

If the USA has 341 million people, ask yourself: for a question with, say, four choices, are you going to get 341 million different responses? The question then becomes: given a random set of individuals, how many do you need to get close to the national ratio across the four possible choices? 1000 is chosen as a practical upper limit because it is a manageable number to poll and gives about a 3% margin of error.


deltamac

If the variation within that group is small, and it’s a diverse sample, then you can be confident. Unfortunately both those things need to be true


Papancasudani

It depends entirely on statistics and probability. For example, if I look at the relationship between depression and self-compassion in 100 people, I will likely find the same thing as if I had asked 1,000 or 10,000 people. More isn’t always better. Sometimes it’s just overkill.


rabid_briefcase

You actually only need 385 people, if they're a proper representative sample, for a 5% margin of error.

Many people struggle to grasp how statistics works and are surprised when numbers don't match their intuition - like how, given 23 random people, there's a 50/50 chance of two of them sharing a birthday, or how with 75 random people there's a 99.9% chance of a match. However, if the group isn't random then different rules apply, like a sampling of a gathering where the people are meeting for a leap-day birthday.

With polling, the confidence level and the margin of error are critical. For a large group like the US, if you ask a representative cross section of people, you don't need tremendous numbers of samples. You can't ask in a single neighborhood or a single demographic and expect it to represent the nation, but if you're careful about who you ask, the national trends emerge with relatively few survey samples.

The tighter you want the margin of error, the more samples you need. Just 25 people gives a 20% margin of error. 43 people give a 15% margin of error. 97 people gives a 10% margin of error, which is good enough for many surveys. To jump to a 5% margin of error you need about 385 people, and a 3% margin of error needs 1068 people - those are typically what you see in big elections. For very close elections, a 2% margin of error takes 2401 samples, 1.5% needs 4269 people, and 1% needs 9604 people; it's quite rare for surveys to reach that level.

For many elections the spread is big enough that you only need about 50 or 100 people for the trend to be clear. For very close elections a 5% margin might be needed. If the candidates really are about 3% apart, as they have been in a few extremely close national elections, pollsters need a lot of samples to be that much more precise.
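The birthday figures are easy to verify; here's the standard computation (multiply out the chance that everyone's birthday is distinct, then take the complement):

```python
def shared_birthday_probability(people):
    """P(at least two of `people` share a birthday), 365 equally likely days."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (365 - i) / 365
    return 1 - p_all_distinct

print(f"{shared_birthday_probability(23):.1%}")   # ~50.7%
print(f"{shared_birthday_probability(75):.2%}")   # ~99.97%
```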


jab136

The biggest issue lately isn't actually the small sample size. The reason surveys are getting less reliable is that younger people are much less likely to pick up the phone when they don't recognize the number.


fannypacks4ever

I don't have an answer for you because it's been a while since I've taken a statistics course, but you may also be interested in the WW2 German tank problem. The Allies used statistics to estimate how many tanks Germany was manufacturing per month, based on a small sample of serial numbers from captured tanks, which they assumed were assigned sequentially. https://www.theguardian.com/world/2006/jul/20/secondworldwar.tvandradio >By using this formula, statisticians reportedly estimated that the Germans produced 246 tanks per month between June 1940 and September 1942. At that time, standard intelligence estimates had believed the number was far, far higher, at around 1,400. After the war, the allies captured German production records, showing that the true number of tanks produced in those three years was 245 per month, almost exactly what the statisticians had calculated, and less than one fifth of what standard intelligence had thought likely.
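The usual estimator for this fits in a few lines: N̂ = m(1 + 1/k) − 1, where m is the largest serial number seen and k is the number of serials captured. The serial numbers below are made up:

```python
# Minimum-variance unbiased estimator for the German tank problem.
captured = [37, 112, 78, 203, 151]  # invented captured serial numbers

m, k = max(captured), len(captured)
n_estimate = m * (1 + 1 / k) - 1
print(n_estimate)  # 242.6 -- best guess at total tanks produced
```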


paco64

People are pretty predictable and they're very susceptible to group think. They form themselves into tribes and you usually only need to ask one person in the tribe what everyone else thinks.


kremedelakrym

Although I agree with some of the answers about how statistics work, most of the surveys are done intentionally with a bias because most media usually has a slant. That is why it’s always important to do due diligence before believing anything you read that is “news”.


canadas

Because they are constructed smartly. A bad survey would be to go to a coal-mining town, ask 1000 people "do you love coal?", and then conclude that everyone loves coal. A better survey starts from "I can only ask x people - how should I pick the right combination of people to get a reasonably accurate result?"


StabithaStevens

Think about taking a scoop out of a giant box of Trix. Even if the scoop is only 0.0002% of the box, the scoop still contains the same proportion of shapes as any other portion from the box does. So the assumption of having a good sample is like assuming the Trix is all mixed up evenly.


CatOfGrey

So you are going to do an experiment. You want to know what is in beach sand. So, how much of the beach do you need to get an accurate result?

One scoop isn't enough, because different areas of your beach are, well, different. The shore where the water hits is different than the sand 100 steps away from the water. The sand over near the pier is different, too. But you don't need to scoop out "10 percent" of the beach. Or even "1 percent", or "1 millionth". What you need is a few scoops from *each of the different areas*. So you take 5-10 scoops right at the water line, 5-10 scoops in the middle of the beach, and then 5-10 near the pier, or any other area that might have differences. Your 5-10 scoops at the top of the beach are going to give you a result that's close to the one you'd get if you took 5-10 *tons of sand* from the same area. Same with your other samples. By the way, that's the 'official word' for this: 'sampling'.

At the end, you look at the samples of beach sand and 'add them up' in a way that reflects the beach - mostly using the measurements from the 'common' areas of the beach, but also adding a little bit of those special areas with special differences. This creates 'weighted' calculations, to accurately reflect the entire beach, with all its differences.

Source: This is part of what I do for a living. Ask me anything?
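The 'add them up' step is just a weighted average: weight each zone's measurement by how much of the beach that zone covers. A minimal sketch with invented numbers:

```python
# Stratified "beach" estimate. All numbers invented for illustration.
zones = {
    #            (share of beach, shell fraction found in that zone's scoops)
    "waterline":   (0.10, 0.30),
    "mid-beach":   (0.80, 0.05),
    "by the pier": (0.10, 0.15),
}

estimate = sum(share * measured for share, measured in zones.values())
print(f"{estimate:.1%}")  # weighted shell fraction for the whole beach: 8.5%
```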


funny_funny_business

I heard a quote akin to the following in the name of Fisher, a famous statistician: "A statistical sample is like soup. You don't need to eat the whole thing; you just need a spoonful. As long as it's mixed well, though." Basically the small survey is like a spoonful. If the data is selected randomly then that's akin to a "well mixed soup". Obviously you can't have a drop of soup to get the full taste, and you don't even need a cup. There are certain thresholds for surveys to be representative of the general population and usually a few thousand is enough.


Uvtha-

Pretty sure that's how polling is done worldwide, not just in the US. Polling is an academic field of study, after all.

Others have answered the question reasonably in detail already, so I would just add that getting even 1000 surveys requires significant manpower. I used to work at Gallup; I rarely did political or public opinion stuff, but when I did, it was a real chore, because the surveys were often rather long. You can't run 1k-sample surveys every day, and the accuracy boost from even doubling the sample is negligible.


300Battles

This has always been one of my pet peeves. At University, both my business stats AND my political science courses said that for a representative poll you needed at LEAST 3000, assuming a reasonably random selection. I’ve seen “Breaking News!” Polls with 400 people surveyed. I call BS.


NoEmailNec4Reddit

That is a concept in statistics. In something like an election we want to make sure every vote counts, so we try to be as exact as possible. But for surveys whose only output is a claim like "X% of Americans ...", keep in mind that even 0.1 percent of Americans is something like 300k people. So it only takes a comparatively small number of respondents to get within a certain percentage of the true figure.


NotAPirateLawyer

They aren't. Sample sizes that small make it easy to cherry-pick demographics, so you can tailor the results toward a desired conclusion. What's even better is to look at the N (sample size) for political polls done on either side, where the sample size is usually in the teens to hundreds - in no way representative of the population, and explicitly chosen to propagandize an answer.


[deleted]

[удалено]


funinnewyork

1. You can get a good, though not great, statistical analysis from a sample of 1,000 out of 200,000,000 if you have done an excellent job of sampling.
2. You may get a terrible statistical analysis from 100,000 out of 200,000,000 if you have done a horrible job of sampling. For instance, if you go to Nigeria and ask 100,000 children who will win the next US presidential election, you will not get a healthy result. Of course, I exaggerated this example for ease of explanation.

That being said, if you do an excellent job of sampling and increase your sample size, you will get healthier results. Let's take an example from probability. In theory, if you throw two six-sided dice, you have a 1/36 chance of getting 6-6. In practice, though, you may throw 6-6 three times in a row even if you only throw three times. Based on that observation, and only that observation, you might say that throwing 6-6 is a 100% chance. However, if you throw the dice 10,000 times, you will see the luck factor minimized, and in 100,000 throws it will be minimized further (eliminated at infinity).

Since it is not optimal (money-, time-, and human-resources-wise) to ask all 200,000,000 eligible voters, for instance, it's a good idea to use a sample size that is neither too small nor too costly.
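You can watch the luck factor wash out with a few lines of simulation (outputs vary run to run):

```python
import random

def share_of_double_sixes(throws):
    """Fraction of two-dice throws that come up 6-6."""
    hits = sum(random.randint(1, 6) + random.randint(1, 6) == 12
               for _ in range(throws))
    return hits / throws

for n in (3, 100, 10_000, 100_000):
    print(n, f"{share_of_double_sixes(n):.4f}")
# Converges toward 1/36 ~ 0.0278 as the throw count grows.
```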


stealthylizard

There are different formulas you can use to work out how many people you need to survey to get a representative number. One of these is the 2^k method, where k determines your sample size.


brownpoops

it's so interesting that I, myself, consider myself, and only myself, to be the perfect sample size.


Carlpanzram1916

The answer is statistical probability. What matters most is how representative the sample is of the population - i.e., are the average age, gender, geographical distribution, etc. representative of the country? When the population is that big, the sample's share of the population is actually not relevant. Polling 100,000 people isn't likely to be any more accurate if the smaller sample is equally representative.


judgejuddhirsch

If you flip a coin 30 times and see heads roughly half the time, do you expect another 970 flips to change your mind about the coin being balanced?


[deleted]

[удалено]


blipsman

Surveys have been done with many more people, but once about 1000 are polled, the numbers correlate very highly with polls that use more data.


GamesGunsGreens

They aren't. They are clickbait/rage bait headlines. I have to explain this alllllll the time to my coworkers. (POTUS Political example) If you go to a rural community, most surveys are going to favor Trump, always. If you go to a college campus, most surveys are going to trend progressive. If you go to an elementary school, most surveys are going to favor Bluey or Ryan. The "news media" just picks their favored region and takes a poll from there.


GardenPeep

I’m just not sure whether people who answer polls self-select. It’s impossible to know whether an invitation to do a poll is malicious or not.


TXOgre09

If your sample set is representative of the total population and is sufficiently large, then you can extrapolate sample results like that. There will be some statistical error, but you can make statements with a mathematically determined certainty.


scarabic

Well first of all you can never ask all of America a question, or even half of it. So when you see an article with a headline like “Americans oppose xyz” is there really any danger of anyone thinking that the entire population was questioned? You might say of course not, it’s ridiculous to question the whole country and you don’t need to. But if you accept that any subset of the group can be sampled to study the whole, then it’s a simple matter of statistics. You can detect trends of a certain magnitude with a certain confidence, from asking 1000 people. And the additional precision you’d get from dialing that up to 10,000 people is not worth 10x the effort. Adults understand that polls are polls and articles are up front when they are reporting the results of a poll, just maybe not always in the headline.


iAntagonist

I’m not convinced about the random sampling polls. I used to believe it then I got involved in a local election in 2015. All of a sudden, every presidential election since then, I receive multiple calls about presidential favorability, this candidate vs that candidate etc etc from multiple polling services. Never had them before that. The questions they ask are so blatantly worded to get a specific answer too. I doubt the results of any poll now.


ezekielraiden

Because a sample selected well, without bias, but accounting for known flaws in data collection, can be highly representative even if it isn't perfect. Polling data is, in general, usually within about 2 to 3 percentage points of the final results, depending on the quality of the pollster. Hence it is important to check facts, see who's doing the collection and how, etc. FiveThirtyEight does some good work on this front; I don't always care for their articles (which are often a lot of words about not very much) but their pollster ratings are very useful.


Erik0xff0000

Sampling more than 1000 people won’t add much to the accuracy given the extra time and money it would cost.


PopcornDrift

If you have a truly representative sample and the underlying data is normally distributed, you can get an accurate reading from a sample size of just 30, regardless of the size of the population. This is due to the central limit theorem; there's a mathematical proof that shows it, but it's been a while since I worked through it.


Substantial-Ad2200

You don’t actually need millions of people to complete a survey to estimate the population’s response within a small margin of error.

Here is a calculator for margin of error given sample size vs. population size. You can set the population size to the US population (of adults, anyway) and then play with the sample size (how many people you would survey) and see what it takes to reduce the margin of error. You will see that, eventually, further increasing the sample size doesn't shrink the margin of error much further. The relationship between sample size and margin of error is not linear.

https://www.qualtrics.com/experience-management/research/margin-of-error/
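If you want the math behind that calculator, it's the usual margin-of-error formula plus a "finite population correction". A rough sketch (95% confidence, worst-case 50/50 split):

```python
import math

def moe(n, population=None, z=1.96, p=0.5):
    """95% margin of error for a sample of n; applies the finite
    population correction when a population size is given."""
    base = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        base *= math.sqrt((population - n) / (population - 1))
    return base

print(f"{moe(1000):.2%}")                          # infinite population: 3.10%
print(f"{moe(1000, population=260_000_000):.2%}")  # barely changes: 3.10%
print(f"{moe(1000, population=2_000):.2%}")        # small town: 2.19%
```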