Resampling for Logistic Regression
A recent question on Reddit asked about using resampling with logistic regression. The responses suggest two ways to do it, one parametric and one non-parametric. I implemented both of them and then invented a third, which is hybrid of the two.
You can read the details of the implementation in the extended version of this article.
Or you can click here to run the Jupyter notebook on Colab
Different ways of computing sampling distributions – and the statistics derived from them, like standard errors and confidence intervals – yield different results. None of them are right or wrong; rather, they are based on different modeling assumptions.
In this example, it is easy to implement multiple models and compare the results. If they were substantially different, we would need to think more carefully about the modeling assumptions they are based on and choose the one we think is the best description of the data-generating process.
But in this example, the differences are small enough that they probably don’t matter in practice. So we are free to choose whichever is easiest to implement, or fastest to compute, or convenient in some other way.
It is a common error to presume that the result of an analytic method is uniquely correct, and that results from computational methods like resampling are approximations to it. Analytic methods are often fast to compute, but they are always based on modeling assumptions and often based on approximations, so they are no more correct than computational methods.