How many observations are needed for regression?

Missing data has the potential to adversely affect a regression analysis by reducing the total usable sample size.

The best solution is to avoid missing data in the first place, to whatever extent possible. When missing values are impossible or too costly to avoid, one approach is to replace them with plausible estimates, a process known as imputation. Another, easier approach is to consider only models whose predictors have no or few missing values. This may be unsatisfactory, however, because even a predictor variable with a large number of missing values can contain useful information.
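For illustration, here is a minimal imputation sketch using scikit-learn's SimpleImputer; the toy DataFrame and its column names are invented for the example and are not from the original text.

```python
# A minimal imputation sketch, assuming a hypothetical pandas DataFrame `df`
# with numeric predictor columns "x1", "x2" and a response "y".
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "x1": [1.0, 2.0, None, 4.0, 5.0],
    "x2": [2.1, None, 1.8, 2.4, 2.0],
    "y":  [3.0, 4.1, 2.9, 5.2, 5.0],
})

# Replace each missing predictor value with that column's mean.
# (Median or model-based imputation are common alternatives.)
imputer = SimpleImputer(strategy="mean")
df[["x1", "x2"]] = imputer.fit_transform(df[["x1", "x2"]])

print(df)
```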

In small datasets, a lack of observations can lead to poorly estimated models with large standard errors. Such models are said to lack statistical power because there is insufficient data to detect significant associations between the response and the predictors. So, how much data do we need to conduct a successful regression analysis?
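To make the standard-error point concrete, here is a small simulation sketch; the true model, coefficients, and sample sizes are invented purely for illustration.

```python
# A small simulation of how coefficient standard errors shrink as the
# sample size grows (the model and numbers here are illustrative only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

for n in (10, 50, 500):
    x = rng.normal(size=n)
    y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)   # true slope = 0.5
    X = sm.add_constant(x)
    fit = sm.OLS(y, X).fit()
    print(f"n = {n:4d}: slope = {fit.params[1]:.3f}, SE = {fit.bse[1]:.3f}")
```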

A common rule of thumb is that 10 observations per predictor variable is a pragmatic lower bound for the sample size. However, it is not so much the number of observations that determines whether a regression model is going to be useful, but rather whether the resulting model satisfies the LINE conditions (Linearity, Independence of errors, Normality of errors, and Equal error variances). In some circumstances, a model fit to fewer than 10 observations per predictor might be perfectly fine if, say, the model fits the data well and the LINE conditions seem reasonable, while in other circumstances a model fit to a few hundred observations per predictor might be quite poor if, say, the model fits the data badly and one or more conditions are seriously violated.
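As a sketch of how such checks might look in practice, here is a quick pair of graphical diagnostics; the data are simulated so the snippet runs on its own, and with real data you would substitute your own fitted model.

```python
# A quick graphical check of two of the LINE conditions; the data here are
# simulated only so the snippet is self-contained.
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)
fit = sm.OLS(y, sm.add_constant(x)).fit()

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Linearity / equal variances: residuals vs. fitted values should show no
# trend and a roughly constant spread around zero.
axes[0].scatter(fit.fittedvalues, fit.resid)
axes[0].axhline(0, linestyle="--")
axes[0].set_xlabel("Fitted values")
axes[0].set_ylabel("Residuals")

# Normality: residuals should hug the reference line in a Q-Q plot.
# (Independence is usually judged from the study design itself.)
sm.qqplot(fit.resid, line="s", ax=axes[1])
plt.tight_layout()
plt.show()
```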

However, it is difficult to say exactly how much data would be needed. We could adequately model an interaction with a relatively small number of observations if the interaction effect were pronounced and there were little statistical error. Conversely, in datasets with only weak interaction effects and relatively large statistical error, it might take a much larger number of observations to obtain a satisfactory model. In practice, we have methods for assessing the LINE conditions, so it is possible to consider whether an interaction model approximately satisfies the assumptions on a case-by-case basis.
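A Monte Carlo sketch of this trade-off follows; the effect sizes and noise levels are made up for the demonstration, not drawn from any real study.

```python
# Monte Carlo estimate of the power to detect an interaction term, showing
# how interaction strength and noise drive the number of observations needed.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def interaction_power(n, beta_int, sigma, n_sims=1000, alpha=0.05):
    hits = 0
    for _ in range(n_sims):
        x1 = rng.normal(size=n)
        x2 = rng.normal(size=n)
        y = x1 + x2 + beta_int * x1 * x2 + rng.normal(scale=sigma, size=n)
        X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
        fit = sm.OLS(y, X).fit()
        hits += fit.pvalues[3] < alpha      # p-value of the interaction term
    return hits / n_sims

# Strong interaction, little noise: a modest n already has high power.
print(interaction_power(n=30, beta_int=0.8, sigma=0.5))
# Weak interaction, lots of noise: the same n has far less power.
print(interaction_power(n=30, beta_int=0.1, sigma=2.0))
```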

From a different perspective, if we are designing a study and need to know how much data to collect, then we need to get into sample size and power calculations, which rapidly become quite complex. Frank Harrell is an active member of this site, so I hope this question gets his attention and we can hear directly from him.
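One common analytic route uses Cohen's f² effect size together with the noncentral F distribution; here is a sketch using only scipy, with conventional example numbers rather than anything from this thread.

```python
# Analytic power for a regression F-test of n_predictors terms, based on
# Cohen's f2 and the noncentral F distribution.
from scipy.stats import f as f_dist, ncf

def regression_power(n, n_predictors, f2, alpha=0.05):
    u = n_predictors           # numerator degrees of freedom
    v = n - n_predictors - 1   # denominator degrees of freedom
    nc = f2 * (u + v + 1)      # noncentrality parameter
    f_crit = f_dist.ppf(1 - alpha, u, v)
    return ncf.sf(f_crit, u, v, nc)   # P(reject H0 | effect of size f2)

# A "medium" effect (f2 = 0.15) with 3 predictors and 77 observations gives
# roughly 80% power at the conventional alpha = 0.05.
print(regression_power(n=77, n_predictors=3, f2=0.15))
```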

One scenario would be that all your parameters have about the same estimated magnitude of effect, but their uncertainties vary, so that some are significant and others are not. You definitely don't want to conclude in this case that "variables A and B are important; variables C, D, and E are not". The confidence intervals will give you this information.

I added my correlation matrix. Do you think that doing regression with this correlation matrix is reasonable? I just want to find any possible relation between the independent variables and the dependent variable.

The estimates will probably have large variance, so statistical significance should not be the focus.

You could look at regression diagnostics for collinearity; that might help. But I would recommend looking at a variety of subset models to see how the fit changes and which combinations of variables do well and which do poorly. I really think bootstrapping the data will show you something about the stability of the choice of predictors; a sketch follows below. You just want to see whether one or two variables stand head and shoulders above the rest. But you may not find anything.
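Here is a sketch of that bootstrap idea; the X matrix and y vector are simulated so the snippet runs on its own, and with real data you would plug in your own predictors and response.

```python
# Bootstrap sketch of how stable the choice of predictors is: refit the
# model on resampled rows and count how often each predictor looks significant.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

n, p = 40, 6
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)   # only x0, x1 matter here

n_boot = 500
selected = np.zeros(p)
for _ in range(n_boot):
    idx = rng.integers(0, n, size=n)               # resample rows with replacement
    fit = sm.OLS(y[idx], sm.add_constant(X[idx])).fit()
    selected += fit.pvalues[1:] < 0.05             # which predictors look "significant"

# Fraction of bootstrap samples in which each predictor was significant:
print(selected / n_boot)
```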

Since there is some correlation between these predictors, presumably their estimated coefficients are "worth" less than one degree of freedom each. And what about, say, regression splines or other local regression: do we have to account for the fact that only a subset of the observations is used in the construction of each component?

And if we use a kernel to apply weights, does that affect the effective number of observations used?
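One way to make this concrete is through the trace of the hat or smoother matrix, which generalises the parameter count to linear smoothers; below is a numpy sketch on made-up data, not a definitive treatment of the question.

```python
# "Effective degrees of freedom" sketch: for any linear smoother y_hat = S y,
# tr(S) generalises the parameter count. Shown for OLS and a kernel smoother.
import numpy as np

rng = np.random.default_rng(7)
n = 50
x = np.sort(rng.uniform(0, 10, size=n))

# OLS hat matrix H = X (X'X)^{-1} X'; its trace equals the number of columns.
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.solve(X.T @ X, X.T)
print("OLS edf:", np.trace(H))                 # = 2 (intercept + slope)

# Nadaraya-Watson kernel smoother: row i holds the normalised kernel weights,
# so the weighting scheme directly determines the effective df.
bandwidth = 0.5
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
S = K / K.sum(axis=1, keepdims=True)
print("Kernel smoother edf:", np.trace(S))     # grows as the bandwidth shrinks
```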
