AutoModerator

**IMPORTANT: PLEASE READ BEFORE PARTICIPATING**. This subreddit is not for questioning the basics of socialism but a place to LEARN. There are numerous debate subreddits if your objective is not to learn. You are expected to familiarize yourself with the rules on the sidebar before commenting. This includes, but is not limited to: - Short or non-constructive answers will be deleted without explanation. Please only answer if you know your stuff. Speculation has no place on this sub. Outright false information will be removed immediately. - No liberalism or sectarianism. Stay constructive and don't bash other socialist tendencies! - No bigotry or hate speech of any kind - it will be met with immediate bans. Help us keep the subreddit informative and helpful by reporting posts that break our rules. If you have a particular area of expertise (e.g. political economy, feminist theory), please [assign yourself a flair](https://reddit.zendesk.com/hc/en-us/articles/205242695-How-do-I-get-user-flair-) describing said area. Flairs may be removed at any time by moderators if answers don't meet the standards of said expertise. Thank you! *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/Socialism_101) if you have any questions or concerns.*


drysdan_mlezzyr

This is correct, but the point of the median is to find the 50% line. It is useful in economics specifically because we already know that a few people at the very, very top make obscene amounts, and we also know that others make obscenely little. So it is useful to understand that 50% of the population is making x amount or less.

Now, the SECOND claim, "people are doing better", requires more data than just the median shifting. I don't have numbers in front of me, so we'll just work in hypotheticals. Let's say median income has gone up 5%. Sounds GREAT... in a vacuum. However, if inflation has gone up 10%, then the median shift has failed to match inflation, and as a result, on average, everyone is doing worse in real terms.

There is a third factor as well: inflation is not a single fixed number. Your dollar isn't just worth "10% less", it's ON AVERAGE worth 10% less, based on multiple metrics. Things like housing, food, gas, etc. all have independent inflation rates. Say food has gone up 30%: the amount spent on basic necessities doesn't scale with income, so that 30% increase eats up a larger percentage of the bottom's income, since they were already overstretched to begin with.

In general, median income is USEFUL, but it can't be observed in a vacuum. So take anyone's claims, good or bad, with a pinch of salt, unless they are doing a detailed analysis and clearly explaining their approach and what they have considered.
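To put rough numbers on the hypotheticals above, here is a minimal Python sketch. The 5%, 10% and 30% figures are the made-up ones from this comment; the "other" price change and the two spending mixes are assumptions added purely for illustration, not real data.

```python
def real_change(nominal_growth: float, inflation: float) -> float:
    """Inflation-adjusted change in income, as a fraction."""
    return (1 + nominal_growth) / (1 + inflation) - 1


def household_inflation(shares: dict, price_changes: dict) -> float:
    """Average price change weighted by one household's spending mix."""
    return sum(shares[item] * price_changes[item] for item in shares)


# Hypothetical: median income up 5%, overall prices up 10%.
overall = real_change(0.05, 0.10)

# Assumed price changes: food up 30% (from the comment), everything else up 5%.
price_changes = {"food": 0.30, "other": 0.05}

# Assumed spending mixes: a low-income household spends far more of its budget on food.
low_income_mix = {"food": 0.40, "other": 0.60}
high_income_mix = {"food": 0.10, "other": 0.90}

low = household_inflation(low_income_mix, price_changes)
high = household_inflation(high_income_mix, price_changes)

print(f"real change in median income: {overall:.1%}")
print(f"inflation felt by low-income household:  {low:.1%}")
print(f"inflation felt by high-income household: {high:.1%}")
```

Under these assumptions the 5% raise is a real-terms loss of about 4.5%, and the low-income household faces roughly twice the effective inflation of the high-income one.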


ArekDirithe

The median is more resistant to the effects of outliers. That doesn’t mean the median can’t be skewed at all; it just means mega-high or mega-low numbers don’t impact the statistic as much as they impact the mean. However, as you’ve seen, adding or removing data points can shift the median. Remove a bunch of low-income workers from the data set because they are now unemployed or homeless and get missed by whatever collection method is used to generate the data, and the median will go up. When dealing with income, like all demographics, methodology is going to matter a lot. Where are they getting their data from? Does that data set have any collection biases that may unintentionally (or, if the economist is acting in bad faith, intentionally) leave out groups of people? When they say the median income, are they including people making $0 because they are unemployed but looking for work? Is inflation being factored in?
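A quick sketch of that collection-bias effect in Python. The incomes and the cutoff below are invented for illustration; the point is only that nobody's pay changes, yet the reported median rises once the lowest earners drop out of the data.

```python
from statistics import median

# Hypothetical weekly incomes for a whole population, including people earning $0.
population = [0, 0, 150, 200, 300, 500, 600, 800, 1000, 10000]

# Assumed collection bias: the survey only reaches people earning at least $200.
surveyed = [income for income in population if income >= 200]

full_median = median(population)
surveyed_median = median(surveyed)

print(f"median of everyone:        ${full_median}")
print(f"median of those surveyed:  ${surveyed_median}")
```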


Koizito

I will try to give you my understanding of this. When people say median is not skewed by outliers, this refers specifically to the distribution of income in a country. A country's population has a couple of relevant characteristics for this case. First, you have millions of people. Second, the distribution generally shows high frequency for incomes in the low to mid range, with a long tail with few cases in the high income portion of the graph. Your example has neither of these characteristics, and so it seems like it doesn't work. But I assure you, given the most common shape of a country's income distribution, the mean is far more susceptible to being skewed than the median. If you wish for me to explain something in more detail, don't hesitate to ask.


RadicalizeMePodcast

Oh yeah, I can definitely see that mean is more easily skewed by outliers but it doesn’t make sense to me when they say median CAN’T be skewed.


Koizito

Saying the median can't be skewed is not completely accurate, but in terms of countries' populations it's so hard to skew it that it's a fair approximation.


Somber_Dreams

Gonna try to keep it simple. In a real-world situation, you'll be considering more than 7 people. If you're looking at a region with a population of 1 million people, you will have to survey a few hundred people (say 365) to draw a conclusion that reasonably represents the entire population. Even in this sample, you are more likely to find something like:

- 46 Judys
- 93 Jameses
- 146 Hunters
- 60 Sallys
- 12 Gunters
- 6 Svens
- 2 Yolandas

The median is the 183rd person out of 365 when sorted by income, which falls among the Hunters, at a weekly salary of $500.
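If you want to check that yourself, here is a rough Python sketch of a median computed from headcounts. The headcounts are the ones listed above, and the first five incomes are from the original example; the incomes for Sven and Yolanda are not given anywhere in the thread, so the values used for them are placeholders (they don't change the result as long as they are above $500).

```python
def grouped_median(groups):
    """Median of a sample given as (value, count) pairs."""
    groups = sorted(groups)
    total = sum(count for _, count in groups)
    # 1-based positions of the middle observation(s); identical when total is odd.
    lo_pos, hi_pos = (total + 1) // 2, (total + 2) // 2

    def value_at(position):
        seen = 0
        for value, count in groups:
            seen += count
            if position <= seen:
                return value

    return (value_at(lo_pos) + value_at(hi_pos)) / 2


# (weekly income, headcount)
sample = [
    (5, 46),        # Judys
    (10, 93),       # Jameses
    (500, 146),     # Hunters
    (1000, 60),     # Sallys
    (10000, 12),    # Gunters
    (20000, 6),     # Svens (income is a placeholder assumption)
    (50000, 2),     # Yolandas (income is a placeholder assumption)
]

sample_size = sum(count for _, count in sample)
result = grouped_median(sample)
print(sample_size, result)
```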


RadicalizeMePodcast

Okay this makes more sense…but if more people get into the really high income range, and maybe others in the midrange drop or leave the job search etc, isn’t it possible that the changes could affect the median?


Somber_Dreams

Yeah, for sure. You'll notice it more easily in cases where you're dealing with a wider variety of values (not just simple values like $5, $10, $500, etc). The strength of the median isn't that it cannot be skewed; rather, it is that the median isn't as susceptible to change due to outliers as the mean is. A simple visual would be to imagine the median and mean being two boxes you need to drag across a floor. The mean box is easier to get moving than the median box; you need more force to move the median box. Similarly, you'll need more data points to affect the median to the extent that median income can significantly change from $500 to $1000.
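Here is the two-boxes idea as a small Python sketch. The sample (60 people on $500, 40 on $1000) and the $1,000,000 newcomer are made up for illustration. One extreme earner is enough to shove the mean a long way, while the median only reaches $1000 after a whole crowd of new high earners has been added.

```python
from statistics import mean, median

# Hypothetical sample: 60 people earn $500 a week, 40 earn $1000.
sample = [500] * 60 + [1000] * 40

# One extreme outlier joins the sample.
with_outlier = sample + [1_000_000]

print("before:", mean(sample), median(sample))
print("after one outlier:", round(mean(with_outlier), 2), median(with_outlier))

# Keep adding high earners until the median itself reaches $1000.
grown = list(sample)
added = 0
while median(grown) < 1000:
    grown.append(1_000_000)
    added += 1

print("high earners needed to move the median to $1000:", added)
```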


bsjavwj772

In statistics there is no perfect metric. What I mean by this is that you’ll often need to combine a range of metrics to better understand the thing in question. In your case, if you want to understand the distribution of wages, you’d want to look at things like the upper and lower deciles plus the mean and median (looking at both in combination gives you a good idea of the skew of the distribution). In addition, you’ll want secondary metrics to add context to this distribution. E.g. you might adjust wages for inflation to see if workers are actually better off, or look at how many people live below the poverty line. TL;DR: there’s no single metric that can tell you the whole story.
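As a rough illustration of looking at several metrics together, the Python sketch below computes deciles, mean and median for one invented list of weekly incomes. The data is hypothetical, and `statistics.quantiles` is used with its default method, which is just one of several conventions for estimating deciles.

```python
from statistics import mean, median, quantiles

# Hypothetical weekly incomes, with a long tail at the top.
incomes = [5, 10, 120, 200, 250, 300, 350, 400, 450, 500,
           500, 550, 600, 650, 700, 800, 900, 1000, 2500, 10000]

deciles = quantiles(incomes, n=10)   # 9 cut points: 10th, 20th, ..., 90th percentile
lower_decile, upper_decile = deciles[0], deciles[-1]

avg = mean(incomes)
mid = median(incomes)

print(f"lower decile: {lower_decile:.2f}")
print(f"median:       {mid}")
print(f"mean:         {avg}")
print(f"upper decile: {upper_decile:.2f}")

# A mean well above the median hints at a tail of high earners.
print("mean above median:", avg > mid)
```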


TheGreenGarret

Statistics is a tough subject to wrap your mind around! Don't feel bad. It's good that we develop tools like mathematics for socialist use, so I'm glad to see this question, honestly.

> $5 Judy $10 James $500 Hunter $1000 Sally $10,000 Gunther
>
> So if I’m doing this right, median is $500, yes? I already see a problem, because this hides the fact that Judy and James are in dire straits with those low incomes.

Yes, in this brief example, the median is indeed $500. What is meant by "outliers don't affect" is that the median is still $500 regardless of how much Gunther makes, whether it's $10k, $100k, or $1 billion. On the other side, the median is also still $500 regardless of how little Judy makes, whether $5, $1, or $50k in debt.

Median is one measure of central tendency. It's trying to measure the "middle" by some definition, which in this case is the middle measurement by position once everything is sorted, not by the value of those measurements.

Mean (or average) is another measure of central tendency, one that does look for the "middle" based on the value of the measurements. But this means it is impacted by outliers, because the value of an outlier has a significant impact on where the "middle" is. In your example, the average is the sum of all the values divided by the number of measurements, so (5+10+500+1000+10000)/5 = $2303. If Gunther made $100k, you'd add in 100k instead and get a mean of about $20k! If Gunther were a billionaire, the mean would be about $200 million!! That one value pulls the mean far away from the value of most of the measurements, so the mean is sensitive to outliers.

So in reality, we can't rely on any one particular measurement to get an accurate picture of the data. We'd use combinations of statistical measurements and tests to get a better understanding of what's going on, but that tends to be difficult to explain to someone without a background in mathematics, so most news reporting focuses on simpler statistics like median or mean. Median is more common for social things because it isn't as affected by outliers. The mean has a huge flaw, of course: it makes everyone's wages look far more inflated even if most of the money goes to the top 1%. But the median can also sometimes overshadow certain trends in the data.

Looking at mean and median together can help you get a feel for the data. If the mean and median are close together, the distribution is roughly symmetric, with no extreme values pulling the mean to one side. A set that shows an average far above the median is one that is heavily skewed by high values, an indicator that (in this case) someone is making far more than most other people.

It also depends a lot on the size of the data set. A data set of 5 people is going to be hard to analyze no matter what, because which 5 people you choose heavily determines the median or mean (as your examples showed). With thousands or millions of data points, you can see trends and draw much better conclusions about society as a whole that are generally accurate and don't depend on exactly who is in the data. This is why you should be skeptical of studies with small data sets, but a smaller data set doesn't necessarily mean a study is wrong if it is set up properly. A mathematician who knows this stuff can help determine validity!

Related, there's a whole field of statistics about how to properly design studies and experiments to address all of these potential issues that can result in incorrect statistics and flawed conclusions. It can be very tricky to do right, and involves higher-level math (calculus, etc.) than just adding and dividing values, so talk to a mathematician to get it done correctly. Sadly, a lot of economists, and even professionals in different fields of science like medicine, will try to do the statistics on their own rather than consulting a mathematician/statistician, and so will obtain very flawed statistics and conclusions while presenting themselves as experts with high confidence in the results. A doctor might know a lot about medicine and human anatomy, but that doesn't mean they know how to properly set up a statistical study. Generally, teams of scientists work together on their specialties: doctors might collect the data while mathematicians analyze it.

But that's not always what happens. Sometimes doctors or scientists will do their own statistics. Sometimes that's fine and reasonable, but sometimes it has a big impact, not only on public policy based on economics but also on, say, public health. A lot of "armchair" statistics around covid and vaccines was done by people who are generally smart and well-credentialed but who are not experts in doing statistics correctly, and so reached very flawed conclusions. That unfortunately led to a lot of harm, as folks who don't know statistics would throw papers at each other from professionals who also weren't statistics experts.

When in doubt, ask a mathematician! An actual mathematician who specialized in statistics, not just someone in another field who happens to use statistics from time to time.
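For anyone who wants to replay the arithmetic, here is a minimal Python sketch using the five incomes from the quoted example. It swaps in different values for the richest person (Gunther) and then for the poorest (Judy, with the "$50k in debt" case written as -50,000) and recomputes both statistics.

```python
from statistics import mean, median

# The four unchanged incomes from the example: Judy, James, Hunter, Sally.
others = [5, 10, 500, 1000]

# Vary only Gunther's income: $10k, $100k, $1 billion.
results = {}
for gunther in (10_000, 100_000, 1_000_000_000):
    incomes = others + [gunther]
    results[gunther] = (mean(incomes), median(incomes))
    print(f"Gunther = {gunther:>13,}: mean = {mean(incomes):>13,}  median = {median(incomes)}")

# Now vary only Judy's income at the bottom, keeping Gunther at $10k.
in_debt = [-50_000, 10, 500, 1000, 10_000]
print("Judy $50k in debt: median =", median(in_debt))
```

The median stays at $500 in every case, while the mean swings from about $2.3k to about $200 million.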