I am trying to get prediction intervals using the bootstrap: I train 1,000 linear regressions, each on a different subset of my training data.

Say I have 1,000,000 rows in my dataset: what would be a good size for each subset? (I have tried 50,000 and 5,000 rows, and I suspect that the fewer rows each model sees, the higher the variance of the predictions across the models.)
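
To make the setup concrete, here is a minimal sketch of the procedure on synthetic data. It is not my actual code: the function name `bootstrap_predictions`, the data sizes, the number of models, and the noise level are all illustrative assumptions, and the fit uses plain least squares via NumPy.

```python
import numpy as np

def bootstrap_predictions(X, y, x_new, subset_size, n_models=200, seed=0):
    """Fit n_models OLS regressions, each on a random subset of rows
    (drawn with replacement), and return every model's prediction at x_new."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.column_stack([np.ones(n), X])      # prepend an intercept column
    x_new_b = np.concatenate([[1.0], x_new])   # same for the query point
    preds = np.empty(n_models)
    for i in range(n_models):
        idx = rng.integers(0, n, size=subset_size)  # resampled row indices
        coef, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        preds[i] = x_new_b @ coef
    return preds

# Synthetic stand-in for the real dataset (assumed sizes, not the real ones).
rng = np.random.default_rng(42)
n, p = 20_000, 5
X = rng.normal(size=(n, p))
beta = np.arange(1, p + 1, dtype=float)
y = X @ beta + rng.normal(scale=2.0, size=n)
x_new = np.ones(p)

spread = {}
for m in (500, 5_000):
    preds = bootstrap_predictions(X, y, x_new, subset_size=m)
    spread[m] = preds.std()
    lo, hi = np.percentile(preds, [2.5, 97.5])
    print(f"subset={m}: std={spread[m]:.3f}, 95% range=[{lo:.2f}, {hi:.2f}]")
```

In this sketch the smaller subsets give a visibly wider spread of predictions, which is the effect I describe above. The percentile range printed here only reflects the variation between the fitted models at `x_new`.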

Also, should I resample the columns as well (e.g. fit each model on a random 30 of the 50 features)?

