I am trying to get prediction intervals via bootstrapping: I train 1,000 linear regressions, each on a different subset of my training data.
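A minimal sketch of the procedure described above, assuming plain OLS via `np.linalg.lstsq`, rows drawn with replacement, and a hypothetical helper `bootstrap_prediction_interval` run on synthetic data. One assumption worth flagging: the spread of the models' fitted values alone measures uncertainty in the regression line (closer to a confidence interval). To also cover the noise in new observations, this sketch adds a residual resampled from each model's own training residuals.

```python
import numpy as np

def bootstrap_prediction_interval(X, y, X_new, n_models=1000, subset_size=50_000,
                                  alpha=0.05, seed=0):
    """Hypothetical helper: fit n_models OLS models on row subsets drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Add an intercept column to the training and query matrices.
    Xd = np.column_stack([np.ones(n), X])
    Xn = np.column_stack([np.ones(X_new.shape[0]), X_new])
    preds = np.empty((n_models, Xn.shape[0]))
    for b in range(n_models):
        idx = rng.integers(0, n, size=subset_size)  # rows sampled with replacement
        coef, *_ = np.linalg.lstsq(Xd[idx], y[idx], rcond=None)
        resid = y[idx] - Xd[idx] @ coef
        # Add one resampled residual per query point to reflect observation noise.
        noise = rng.choice(resid, size=Xn.shape[0], replace=True)
        preds[b] = Xn @ coef + noise
    # Empirical quantiles across models give the interval bounds.
    lo, hi = np.quantile(preds, [alpha / 2, 1 - alpha / 2], axis=0)
    return lo, hi

# Synthetic data (an assumption for illustration): 5 features, unit noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(20_000, 5))
beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta + rng.normal(scale=1.0, size=20_000)
X_new = rng.normal(size=(3, 5))
lo, hi = bootstrap_prediction_interval(X, y, X_new, n_models=200, subset_size=5_000)
```

The sizes in the usage lines are kept small so the example runs quickly; with 1,000,000 rows the same function applies with `n_models=1000`.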
Say I have 1,000,000 rows in my dataset: what would be a good size for each subset? (I have tried 50,000 and 5,000 rows, and I suspect that the fewer rows you take, the higher the variance of the predictions across the models.)
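The suspicion about subset size can be checked directly. For OLS the variance of a fitted value scales roughly as 1/m for a subset of m rows, so the spread across models should shrink roughly as 1/sqrt(m). That also means the spread you observe describes a regression fitted on m rows, not one fitted on all 1,000,000. A minimal sketch, assuming synthetic data and a hypothetical helper `prediction_spread`:

```python
import numpy as np

def prediction_spread(X, y, x_new, subset_size, n_models=200, seed=0):
    """Hypothetical helper: std, across bootstrap OLS models, of the fitted value at x_new."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])  # intercept + features
    xd = np.concatenate([[1.0], x_new])
    preds = np.empty(n_models)
    for b in range(n_models):
        idx = rng.integers(0, n, size=subset_size)  # m rows drawn with replacement
        coef, *_ = np.linalg.lstsq(Xd[idx], y[idx], rcond=None)
        preds[b] = xd @ coef
    return preds.std(ddof=1)

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(20_000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=20_000)
x_new = np.array([0.5, -1.0, 2.0, 0.0, 1.0])

spread_small = prediction_spread(X, y, x_new, subset_size=500)
spread_large = prediction_spread(X, y, x_new, subset_size=5_000)
ratio = spread_small / spread_large  # theory suggests roughly sqrt(10), about 3.2
```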
Also, should I bootstrap on columns as well (e.g. taking 30 out of 50 features for each model)?
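For reference, sampling columns per model is the "random subspace" idea used in random forests. A minimal sketch of what it would look like here, assuming synthetic data where all 50 features carry signal and a hypothetical helper `random_subspace_predictions`. Note that each linear model then omits part of the signal, so the spread across models also reflects which columns were dropped, not only sampling variability; the last lines compare against using all 50 columns.

```python
import numpy as np

def random_subspace_predictions(X, y, X_new, n_models=200, subset_size=5_000,
                                n_features=30, seed=0):
    """Hypothetical helper: each OLS model sees bootstrap rows and a random column subset."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    preds = np.empty((n_models, X_new.shape[0]))
    for b in range(n_models):
        rows = rng.integers(0, n, size=subset_size)           # rows with replacement
        cols = rng.choice(p, size=n_features, replace=False)  # columns without replacement
        Xb = np.column_stack([np.ones(subset_size), X[rows][:, cols]])
        coef, *_ = np.linalg.lstsq(Xb, y[rows], rcond=None)
        Xq = np.column_stack([np.ones(X_new.shape[0]), X_new[:, cols]])
        preds[b] = Xq @ coef
    return preds

# Synthetic data (an assumption for illustration): 50 informative features.
rng = np.random.default_rng(2)
X = rng.normal(size=(20_000, 50))
beta = rng.normal(size=50)
y = X @ beta + rng.normal(size=20_000)
X_new = rng.normal(size=(4, 50))

preds = random_subspace_predictions(X, y, X_new)  # 30 of 50 columns per model
spread_subspace = preds.std(axis=0, ddof=1)
spread_full = random_subspace_predictions(X, y, X_new, n_features=50).std(axis=0, ddof=1)
```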