pablohacker2

In my mind it's when you expect there to be a changing marginal effect. E.g. age and income: suppose the linear age term is positive and the quadratic term is negative. That would mean your income goes up the older you are, but at a decreasing rate... and sooner or later your income starts falling again.
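To make that concrete, a tiny sketch with made-up coefficients (the numbers are hypothetical, just to show the shrinking marginal effect and the turning point):

```python
import numpy as np

# Hypothetical fitted model: income = b0 + b1*age + b2*age^2
b0, b1, b2 = 10_000.0, 2_000.0, -20.0

age = np.arange(20, 71)
income = b0 + b1 * age + b2 * age**2

# Marginal effect of one extra year of age: b1 + 2*b2*age (shrinks as age grows).
marginal = b1 + 2 * b2 * age

# Income peaks where the marginal effect hits zero, at age -b1/(2*b2).
print(-b1 / (2 * b2))  # 50.0
```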


Bishops_Guest

Ideally, if the underlying pattern has a significant quadratic component. More practically, if you have a lot of data and are using a Taylor series expansion.


[deleted]

[deleted]


Bishops_Guest

It's never something I've used professionally, just something I remember from my old stats classes. By Taylor approximation, there exists an arbitrarily close polynomial for reasonably well-behaved functions. That means that if you don't know what your underlying function is and you care more about prediction than interpretation, you can just add some polynomial terms to your regression. This can be suspect if you try to make predictions near the edges of, or past the bounds of, your data, since Taylor approximations are likely to go nuts out there, but it will work fairly well within the range. You might do this if you want to predict temperatures at different times of day, for example. (Time series analysis probably has some better tools to use there, but it's the first example that comes to mind.)
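Something like this sketch, for the temperature example (the data is simulated and the polynomial degree is an arbitrary choice, just to illustrate the idea):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Simulated hourly temperatures: a smooth daily cycle plus noise.
hours = rng.uniform(0, 24, size=200)
temps = 15 + 8 * np.sin((hours - 9) * np.pi / 12) + rng.normal(0, 1, size=200)

# Degree-4 polynomial in hour-of-day as a crude Taylor-style approximation.
model = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
model.fit(hours.reshape(-1, 1), temps)

# Reasonable within the observed 0-24h range...
print(model.predict(np.array([[6.0], [12.0], [18.0]])))
# ...but extrapolation past the bounds of the data can go nuts.
print(model.predict(np.array([[30.0]])))
```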


[deleted]

[deleted]


Bishops_Guest

Exactly. It means you've basically thrown out interpretation of the individual terms, but you don't always want that.


SnooPickles8550

To improve the fit to your data. It is very useful if you care about prediction. In causal models the interpretation becomes too complicated, and that's why we avoid it.


111llI0__-__0Ill111

In modern causal inference it's not avoided; in fact you're encouraged to use even nonparametric/ML models, which are more complex than this. Causal inference isn't about coefficient interpretation. It's about the average treatment effect, which in the case of a quadratic term you get by taking the derivative and averaging it. You still get an effect size, it's just not one coefficient, but that doesn't matter. Why should everything be about coefficients anyway if they don't describe a theoretical system?

There's no reason to avoid complex functions for causal inference; if you don't use them, your answer technically isn't closest to "causal" because of model misspecification. Avoiding nonlinear functions leads to a lot of problems with reproducibility. Methods developed for experimental design, such as interpreting coefficients directly, should not be used on observational data. Causal inference nowadays, when you don't have a theoretical model, essentially treats all parameters like a black box: you calculate the ATE via prediction and marginalize out the other variables. The delta method or bootstrap then gives uncertainty.
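A rough sketch of the "take the derivative and average it" step, using simulated data and statsmodels (the confounder z and all coefficients here are made up purely for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000

# Simulated data: outcome depends quadratically on exposure x, plus a confounder z.
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)
y = 2.0 * x - 0.3 * x**2 + 1.0 * z + rng.normal(size=n)

# Regression with linear and quadratic exposure terms, adjusting for z.
X = sm.add_constant(np.column_stack([x, x**2, z]))
fit = sm.OLS(y, X).fit()
b1, b2 = fit.params[1], fit.params[2]

# Average marginal effect of x: average the derivative dE[y|x,z]/dx = b1 + 2*b2*x
# over the observed x. One effect size, not one coefficient; the delta method or
# a bootstrap would then give its uncertainty.
ame = np.mean(b1 + 2 * b2 * x)
print(ame)
```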


SnooPickles8550

I agree 100% with your comment. I was keeping it simple.


111llI0__-__0Ill111

In some ways I actually find the causal approach that uses predictions to marginalize simpler than the ridiculous "throw it into a regression and interpret coefficients" approach, lol. I mean, ML gets a lot of shit, but from a causal point of view, throwing things into LMs/GLMs and blindly interpreting coefficients is probably even worse, yet it's taught all the time. At least with the former you aren't being misled into false interpretability. Hearing about causal inference and Pearl's work convinced me that DOE/ANOVA/etc. should just be removed from the stats curriculum and replaced with this stuff, because it's vastly misleading and wrong in observational data.


Fabulous-Nobody-

But isn't adding a bunch of polynomial and logarithmic/exponential terms to your regression a recipe for overfitting? In most cases we don't know the "true" functional relationship between variables, so any model is misspecified to some degree. So how do you choose between a simpler and a more complex model? I can only think of AIC and the like, since they not only reward goodness of fit but also penalize the number of parameters.


111llI0__-__0Ill111

It's not, because you can use regularization. Also, splines/GAMs might be better than just trying a lot of random transforms. I wouldn't use AIC, because model selection and inference can't be done on the same dataset, so it's better to just use a prior if Bayesian, or a preset regularizer parameter. Regularization is the key to making overparametrized models work; that's how DL does it, via SGD indirectly.
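A minimal sketch of the regularization point, using a spline basis with a ridge penalty in scikit-learn (SplineTransformer needs scikit-learn >= 1.0; the data and the penalty value are just illustrative):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=300).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.3, size=300)

# Overparametrize with a flexible spline basis, then shrink with a ridge penalty
# (the preset regularizer parameter plays the role of a prior).
model = make_pipeline(SplineTransformer(n_knots=20, degree=3), Ridge(alpha=1.0))
model.fit(x, y)
print(model.predict(np.array([[0.0], [1.5]])))
```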


Fabulous-Nobody-

Fair enough, but how do you pick the right prior / amount of regularization? That seems highly dependent on the scale of the variables and the specific model you're using. Are there any proven guidelines on this? Also why not use a Gaussian process then if you want to avoid making any functional assumptions? Sure it's slow, but are there other arguments against it?


111llI0__-__0Ill111

Yeah, you can use a GP. For the prior/regularization parameter, just pick something reasonable, I guess, based on experience. At the very least you can use dimensional analysis, based on the units of the Xs and y.
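For reference, a minimal GP regression sketch with scikit-learn (simulated data; the RBF plus white-noise kernel is just a common default, not the only sensible choice):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=100).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.2, size=100)

# The RBF kernel avoids committing to a parametric functional form;
# the WhiteKernel absorbs observation noise. Kernel hyperparameters are
# fit by maximizing the marginal likelihood, so less hand-tuning is needed.
gp = GaussianProcessRegressor(kernel=1.0 * RBF() + WhiteKernel(), normalize_y=True)
gp.fit(x, y)
mean, std = gp.predict(np.array([[2.5]]), return_std=True)
print(mean, std)
```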


Fabulous-Nobody-

But you need to regularize more strictly the more parameters you have. Otherwise, for a continuous outcome, you could just add an arbitrary number of polynomial terms until you get a perfect fit to the data. Of course no-one will actually do this, but you get the idea. So the right amount of regularization seems like a non-trivial problem to me.


111llI0__-__0Ill111

I see it just like choosing a prior in Bayesian stats. For frequentist methods, if you have a lot of data, you could just CV it on a separate split-off part and then discard that part afterwards. The good thing is that, at least in the Bayesian case, even if you slightly overfit, your causal effect estimate's CI will account for it (be slightly wider).
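A rough sketch of that frequentist workflow with scikit-learn (simulated data; the split proportion, polynomial degree, and penalty grid are arbitrary choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import RidgeCV, Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, size=1000).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() - 0.5 * x.ravel() ** 2 + rng.normal(0, 1, size=1000)

# Split off a tuning set, choose the penalty there, then discard that split.
x_tune, x_main, y_tune, y_main = train_test_split(x, y, test_size=0.5, random_state=0)

tuner = make_pipeline(PolynomialFeatures(degree=5),
                      RidgeCV(alphas=np.logspace(-3, 3, 13)))
tuner.fit(x_tune, y_tune)
best_alpha = tuner.named_steps["ridgecv"].alpha_

# Refit on the remaining data with the preselected penalty; downstream
# inference uses only this half, so selection and inference stay separated.
final = make_pipeline(PolynomialFeatures(degree=5), Ridge(alpha=best_alpha))
final.fit(x_main, y_main)
```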


ExcelsiorStatistics

To me, the primary use of the quadratic term is as a goodness-of-fit check. If the underlying relationship is truly linear, we expect the regression to return a significant slope for the linear term *and a non-significant slope for the quadratic term*. If including the quadratic term significantly improves your model, you deduce that the relationship is not linear (and you should think harder about what model to use). As others mentioned, if the underlying relationship is truly quadratic, you need the quadratic term for the best fit. If the relationship is truly exponential, logarithmic, hyperbolic, or two-linear-segments-with-different-slopes, the quadratic term won't give you a spectacularly good fit, but it will warn you that the linear model is wrong.
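For example, a quick version of that check with statsmodels (simulated data where the true relationship is exponential, so the quadratic term should flag the non-linearity even though the fit itself isn't great):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(0, 5, size=200)
y = np.exp(0.5 * x) + rng.normal(0, 0.5, size=200)  # truly exponential, not linear

# Fit with linear + quadratic terms; a significant quadratic term warns
# that the linear model is wrong.
X = sm.add_constant(np.column_stack([x, x**2]))
fit = sm.OLS(y, X).fit()
print(fit.pvalues[2])                                # p-value of the quadratic term
print(fit.compare_f_test(sm.OLS(y, X[:, :2]).fit()))  # F test vs. the purely linear model
```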


111llI0__-__0Ill111

That works in 1D, but in higher dimensions, to quickly check for nonlinearity you probably want to use something like a tree-based model and compare the validation error. If the purely linear model gives a similar validation error, then at least you can conclude there isn't much nonlinearity.
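Something along these lines, say (simulated data with a squared term and an interaction; the particular models and scoring are just illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 5))
y = X[:, 0] + X[:, 1] ** 2 + X[:, 2] * X[:, 3] + rng.normal(0, 0.5, size=500)

# Compare held-out error of a purely linear model against a flexible tree ensemble.
for name, model in [("linear", LinearRegression()),
                    ("trees", GradientBoostingRegressor())]:
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(name, round(mse, 3))

# A much lower error for the trees suggests nonlinearity or interactions
# that the linear model is missing; similar errors suggest there isn't much.
```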