Section author: Danielle J. Navarro and David R. Foxcroft


Author’s note – I’ve mentioned it before, but I’ll quickly mention it again. This reference list is appallingly incomplete. Please don’t assume that these are the only sources I’ve relied upon. The final version of this book will have a lot more references. And if you see anything clever-sounding in this book that doesn’t seem to have a reference, I can absolutely promise you that the idea was someone else’s. This is an introductory textbook: none of the ideas are original. I’ll take responsibility for all the errors, but I can’t take credit for any of the good stuff. Everything smart in this book came from someone else, and they all deserve proper attribution for their excellent work. I just haven’t had the chance to give it to them yet.

Adair, J. G. (1984). The Hawthorne effect: A reconsideration of the methodological artifact. Journal of Applied Psychology, 69(2), 334–345.

Agresti, A. (2012). Categorical Data Analysis (3rd ed.). Wiley.

Agresti, A. (2018). An Introduction to Categorical Data Analysis (3rd ed.). Wiley.

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.

Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27(1), 17–21.

Bickel, P. J., Hammel, E. A., & O’Connell, J. W. (1975). Sex bias in graduate admissions: Data from Berkeley. Science, 187(4175), 398–404.

Box, G. E. P. (1953). Non-normality and tests on variances. Biometrika, 40(3–4), 318–335.

Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791–799.

Box, J. F. (1987). Guinness, Gosset, Fisher, and small samples. Statistical Science, 2(1), 45–52.

Brown, M. B., & Forsythe, A. B. (1974). Robust tests for equality of variances. Journal of the American Statistical Association, 69(346), 364–367.

Campbell, D. T., & Stanley, J. C. (1963). Experimental and Quasi-Experimental Designs for Research. Houghton Mifflin.

Cochran, W. G. (1954). The chi-squared test of goodness of fit. The Annals of Mathematical Statistics, 23(3), 315–345.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press.

Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56(293), 52–64.

Ellis, P. D. (2010). The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results. Cambridge University Press.

Ellman, M. (2002). Soviet repression statistics: Some comments. Europe-Asia Studies, 54(7), 1151–1172.

Evans, J. S. B. T., Barston, J. L., & Pollard, P. (1983). On the conflict between logic and belief in syllogistic reasoning. Memory & Cognition, 11(3), 295–306.

Everitt, B. S. (1996). Making Sense of Statistics in Psychology: A Second-Level Course. Oxford University Press.

Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272–299.

Fisher, R. A. (1922a). On the interpretation of chi-squared from contingency tables, and the calculation of p. Journal of the Royal Statistical Society, 85(1), 87–94.

Fisher, R. A. (1922b). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A, 222, 309–368.

Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver & Boyd.

Forbes, C., Evans, M., Hastings, N., & Peacock, B. (2010). Statistical Distributions (4th ed.). Wiley.

Fox, J., & Weisberg, S. (2011). An R Companion to Applied Regression (2nd ed.). Sage.

Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102(6), 460.

Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60(4), 328–331.

Geschwind, N. (1972). Language and the brain. Scientific American, 226(4), 76–83.

Hays, W. L. (1994). Statistics (5th ed.). Harcourt Brace.

Hedges, L. V. (1981). Distribution theory for Glass’s estimator of effect size and related estimators. Journal of Educational Statistics, 6(2), 107–128.

Hedges, L. V., & Olkin, I. (1985). Statistical Methods for Meta-Analysis. Academic Press.

Hogg, R. V., McKean, J. W., & Craig, A. T. (2005). Introduction to Mathematical Statistics (6th ed.). Pearson.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70.

Hothersall, D. (2004). History of Psychology. McGraw-Hill.

Hróbjartsson, A., & Gøtzsche, P. C. (2010). Placebo interventions for all clinical conditions. The Cochrane Database of Systematic Reviews, 1, CD003974.

Hsu, J. C. (1996). Multiple Comparisons: Theory and Methods. Chapman and Hall.

Ioannidis, J. P. A. (2005). Why most published research findings are false. CHANCE, 18(4), 40–47.

Jeffreys, H. (1961). The Theory of Probability (3rd ed.). Clarendon Press.

Johnson, V. E. (2013). Revised standards for statistical evidence. Proceedings of the National Academy of Sciences, 110(48), 19313–19317.

Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80(4), 237–251.

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795.

Keynes, J. M. (1923). A Tract on Monetary Reform. Macmillan and Company.

Kruschke, J. K. (2015). Doing Bayesian Data Analysis: A Tutorial with R and BUGS. Academic Press.

Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260), 583–621.

Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PLoS ONE, 9(9), e105825.

Larntz, K. (1978). Small-sample comparisons of exact levels for chi-squared goodness-of-fit statistics. Journal of the American Statistical Association, 73(362), 253–263.

Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press.

Lehmann, E. L. (2011). Fisher, Neyman, and the Creation of Classical Statistics. Springer.

Levene, H. (1960). Robust tests for equality of variances. In I. Olkin et al. (Eds.), Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling (pp. 278–292). Stanford University Press.

McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: The case of r and d. Psychological Methods, 11(4), 386–401.

McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153–157.

Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115.

Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 50(302), 157–175.

Peterson, C., & Seligman, M. E. (1984). Causal explanations as a risk factor for depression: Theory and evidence. Psychological Review, 91(3), 347–374.

Pfungst, O. (1911). Clever Hans (The Horse of Mr. von Osten): A Contribution to Experimental Animal and Human Psychology. Henry Holt.

Rosenthal, R. (1966). Experimenter Effects in Behavioral Research. Appleton.

Sahai, H., & Ageel, M. I. (2000). The Analysis of Variance: Fixed, Random and Mixed Models. Springer.

Shaffer, J. P. (1995). Multiple hypothesis testing. Annual Review of Psychology, 46(1), 561–584.

Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3–4), 591–611.

Sokal, R. R., & Rohlf, F. J. (2011). Biometry: The Principles and Practice of Statistics in Biological Research (4th ed.). W. H. Freeman.

Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677–680.

Stigler, S. M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press.

Student. (1908). The probable error of a mean. Biometrika, 6(1), 1–25.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.

Welch, B. L. (1947). The generalization of “Student’s” problem when several different population variances are involved. Biometrika, 34(1–2), 28–35.

Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38(3–4), 330–336.

Wilkinson, L. (2006). The Grammar of Graphics (2nd ed.). Springer.

Yates, F. (1934). Contingency tables involving small numbers and the chi-squared test. Supplement to the Journal of the Royal Statistical Society, 1(2), 217–235.