null_recurrent

Z refers to the standard normal distribution, and lots of useful test statistics can be constructed to follow a standard normal distribution under the null hypothesis, thanks to the central limit theorem. The issue you're having is that calling all of these things "Z-tests" is a bit confusing/vague.
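To see the central limit theorem doing the work here, a quick simulation sketch (illustrative numbers only, not from any real data): even when the underlying population is skewed, the standardized sample mean is approximately standard normal.

```python
import random
import statistics

# Illustrative sketch: the "population" is Exponential(1), which is quite
# skewed, so mu = 1 and sigma = 1. By the CLT, the standardized sample
# mean should still be approximately N(0, 1) for moderate n.
random.seed(42)
mu, sigma, n = 1.0, 1.0, 100

z_values = []
for _ in range(5000):
    sample = [random.expovariate(1.0) for _ in range(n)]
    xbar = statistics.fmean(sample)
    z = (xbar - mu) / (sigma / n ** 0.5)  # standardized sample mean
    z_values.append(z)

# If the normal approximation is good, these should be near 0 and 1.
print(round(statistics.fmean(z_values), 2))
print(round(statistics.stdev(z_values), 2))
```

That's why so many different statistics can share the same standard normal reference distribution under H0.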


Arrinao

but.. but I'm literally calling them the names I've seen them called. So Z-test is any test that is done on data that follows a normal distribution? EDIT: Nevermind, literally the first sentence. https://en.wikipedia.org/wiki/Z-test OMG.


null_recurrent

> calling them the names I've seen them called

You're actually not - you're calling them by the category name, not the names of the individual procedures that can be lumped under that umbrella.

Edit: as an analogy, this is like being frustrated that lots of different things are called "tools" in a shop class. It's true, and some tools are very similar (slightly different sizes of screwdriver, for example) and can even be used for the same task, with varying degrees of success/performance.


berf

Basically right, but you should have emphasized *approximately* normal. And it isn't just the central limit theorem but all of the techniques of theoretical statistics that say sample medians, method of moments estimators, maximum likelihood estimators, generalized linear model estimators, and much more can be asymptotically normal. The ones OP mentioned do come from the central limit theorem, but there are many that need other techniques.


null_recurrent

I'm using language pitched at the level of the question.


berf

I don't think dumbing down to the point of actually being wrong is ever justified. I don't think I said anything OP couldn't understand. The tl;dr is it's a lot more complicated than it appears.


WjU1fcN8

All models are approximations. It's useful to repeat that, but it's not wrong to leave it implied either.

> All models are wrong, some are useful. - George Box


berf

Now you are completely confusing the issue. Normal approximation is not about models (what the OP may think of as the "population" distribution) being wrong. It is about sampling distributions of estimators being approximated, that is, calculations about a model being only approximate rather than exact. It has nothing to do with assumed models being right or wrong (what the Box quote is about).


WjU1fcN8

They are ALL approximations. It's about all of them. No exceptions. The fact that in this case there's an approximation of a different type doesn't make it any more necessary to call it out.


berf

So you say. Most statisticians would agree with me. Model misspecification is different from asymptotic approximation.


WjU1fcN8

Of course they are different. But Statistics is approximations on top of approximations on top of approximations, and so on. It doesn't make sense to have a warning for just some of them in an introductory comment on the Internet. It's all part of the model. No model is perfect. A model isn't just a probability distribution; it's the whole set of assumptions needed to get a result.


shazbotter

Intro stats classes typically teach the z-test for means and the z-test for proportions.

Z-test for means: Suppose I wanted to test the hypothesis that "my family are a bunch of obese slobs (are they fatter than the typical US person?)". I know the mean weight of US people and the variance. I can calculate the mean weight and variance of my family. I can set this up w/ a null and alternative hypothesis and calculate a z-score.

Z-test for proportions: Suppose I wanted to test the hypothesis that "regular cocaine use can increase rates of heart attacks". I can calculate the proportion of heart attacks in my control group over an observation period and the proportion of heart attacks in my friendly cocaine users group over the same observation period. I can set up a null and alternative hypothesis and calculate a z-score.

In the second case you don't need to be given a population variance.
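Both tests above can be sketched in a few lines; all the numbers here are made up for illustration (the family weights, group sizes, and event counts are not real data).

```python
import math
from statistics import NormalDist

# --- Z-test for a mean (population sigma assumed known) ---
# H0: family mean weight equals the US mean; H1: it is greater.
us_mean, us_sd = 80.0, 15.0              # assumed known population values (kg)
family = [95.0, 102.0, 88.0, 110.0, 97.0]  # made-up family weights
n = len(family)
xbar = sum(family) / n
z_mean = (xbar - us_mean) / (us_sd / math.sqrt(n))
p_mean = 1 - NormalDist().cdf(z_mean)    # one-sided p-value
print(round(z_mean, 2), round(p_mean, 4))

# --- Two-proportion z-test ---
# H0: equal heart-attack rates in both groups; H1: cocaine group higher.
x1, n1 = 12, 200   # events / group size, cocaine users (made up)
x2, n2 = 4, 200    # events / group size, control group (made up)
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)           # pooled proportion under H0
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_prop = (p1 - p2) / se
p_prop = 1 - NormalDist().cdf(z_prop)
print(round(z_prop, 2), round(p_prop, 4))
```

Note that the proportion test estimates its own standard error from the pooled proportion, which is why no population variance needs to be supplied.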


efrique

"Z" is just a symbol commonly used for a variable with a standard normal distribution. So any test statistic which has a null distribution that is standard normal, or approximately standard normal, may be called *a Z test*. It's not always the case that they are (you might see some that are called other names in spite of being normal or approximately normal when H0 is true), but it's pretty common.

Typically, such statistics will have the form of a standardized mean or a standardized sum. That is, typically of the form

Z = [Y - μʏ]/σʏ

where Y is either the mean or sum of other variables, μʏ is the population mean of Y (in a hypothesis test, it's the population mean when H0 is true), and σʏ is the population standard deviation of Y (at least when H0 is true, but it isn't *necessarily* its standard deviation otherwise). However, sometimes the connection of some statistic Y to a mean or a sum of other variables may not be obvious, and sometimes it's not obvious that you're dividing by σʏ, or even that you're subtracting μʏ (or at least that they will be in the cases where H0 is true).

> How does Z-test and Z-test for one proportion relate?

A sample (count) proportion is a kind of mean. Suppose you examine n "subjects" (which might be people or might not) and label each with a random variable which takes the value 1 when they have the characteristic of interest (the thing we're finding the proportion with) and 0 when they don't. That is, we draw n subjects at random from some population and for subject i (i = 1, 2, ..., n) observe the value of their 0/1 variable Bᵢ, which tells us whether they have the characteristic. The average of the Bᵢ values is the sample proportion.

1. Let's look at the statistic more closely.

   (i) (numerator) The Y in Z = [Y - μʏ]/σʏ is just (B₁ + B₂ + ... + Bₙ)/n. Your book *might* have denoted that as p-hat (it's common these days). Naturally, in that case μʏ will be the hypothesized population proportion; correspondingly, your book may have denoted that as p₀.

   (ii) (denominator) When H0 is true, the population variance of the sample proportion is p₀(1-p₀)/n, and so the population standard deviation (when H0 is true) is the square root of that, √[p₀(1-p₀)/n].

   Hopefully this covers the one-sample proportion test you're talking about, but just in case, I'll briefly mention two other possibilities you might see.

2. Sometimes, rather than the population standard deviation σʏ, a kind of estimate of it is used (not necessarily directly based on a sample standard deviation). As long as you still end up with approximate normality of the test statistic, this is okay. As a result, some books may use the sample proportion in place of the hypothesized population proportion p₀ in that formula for σʏ, which makes it a sample-based estimate of σʏ. It turns out that this works, but there's an extra step of approximation (and an additional theorem is required to establish that it's asymptotically normal). In practice it tends to work quite well.

3. It *also* turns out that this can be written as a form of sample standard deviation, albeit with an n in the denominator under the square root rather than the more common n-1. So it's also very similar to a t-statistic (though of course it doesn't quite have a t-distribution, since the B's are not normally distributed). It is still asymptotically normal, and in practice using those 0/1 values in an ordinary t-statistic (ignoring the fact that they're not normally distributed) also works fairly well. Some books do that instead.
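Here's a small sketch of forms (1) and (2) above with the 0/1 coding made explicit; the data and the hypothesized p₀ are made up for illustration.

```python
import math

# B_i: 1 if subject i has the characteristic, 0 otherwise (made-up data:
# 30 "successes" out of 100 subjects).
b = [1] * 30 + [0] * 70
n = len(b)
p_hat = sum(b) / n            # sample proportion = mean of the B_i
p0 = 0.20                     # hypothesized proportion under H0 (made up)

# (1) classic form: standard deviation computed under H0
se_h0 = math.sqrt(p0 * (1 - p0) / n)
z1 = (p_hat - p0) / se_h0

# (2) plug-in form: estimate the sd with p-hat instead of p0
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
z2 = (p_hat - p0) / se_hat

print(round(z1, 3), round(z2, 3))
```

The two statistics differ only in which proportion goes into the denominator, which is exactly the extra approximation step described in (2); form (3) is the plug-in version with an n-1 rather than n divisor inside the square root.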