$\begingroup$

I am trying to get prediction intervals using the bootstrap: I train 1,000 linear regressions, each on a different subset of my training data.

Say I have 1,000,000 rows in my dataset. What would be a good size for each subset? (I have tried 50,000 and 5,000 rows, and I suspect that the fewer rows each subset has, the higher the variance of the predictions across the models.)
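To make the setup concrete, here is a minimal sketch of the procedure on synthetic data. Everything in it is an assumption for illustration rather than my real setup: the data, the row and model counts (scaled down so it runs quickly), the choice to sample rows with replacement, and the helper name `bootstrap_predictions`. It fits each model with ordinary least squares via NumPy and compares the spread of the predictions at one point for two subset sizes.

```python
import numpy as np

def bootstrap_predictions(X, y, x_new, subset_size, n_models, rng):
    """Fit n_models least-squares regressions on resampled rows
    and return each model's prediction at x_new."""
    n_rows = X.shape[0]
    Xb = np.column_stack([np.ones(n_rows), X])   # prepend an intercept column
    x_new_b = np.concatenate([[1.0], x_new])
    preds = np.empty(n_models)
    for i in range(n_models):
        # Assumption: rows are drawn with replacement, as in a standard bootstrap
        idx = rng.integers(0, n_rows, size=subset_size)
        coef, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        preds[i] = x_new_b @ coef
    return preds

# Synthetic stand-in for the real dataset (illustrative sizes only)
rng = np.random.default_rng(0)
n_rows, n_features = 20_000, 5
X = rng.normal(size=(n_rows, n_features))
true_coef = np.arange(1, n_features + 1, dtype=float)
y = X @ true_coef + rng.normal(scale=2.0, size=n_rows)
x_new = np.ones(n_features)

spread = {}
for subset_size in (200, 5_000):
    preds = bootstrap_predictions(X, y, x_new, subset_size, 200, rng)
    # Spread of the fitted predictions only; no residual noise is added here
    spread[subset_size] = preds.std()
    low, high = np.percentile(preds, [2.5, 97.5])
    print(f"subset={subset_size}: std={spread[subset_size]:.3f}, "
          f"2.5%-97.5% range=({low:.3f}, {high:.3f})")
```

In this sketch the smaller subsets give a visibly wider range of predictions, which matches what I see on my data.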

Also, should I bootstrap over the columns as well (e.g. using 30 out of the 50 features for each model)?

$\endgroup$
