# Questions tagged [boosting]

A family of algorithms combining weakly predictive models into a strongly predictive model. The most common approach is called gradient boosting, and the most commonly used weak models are classification/regression trees.
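As a quick illustration of the tag definition (a hand-rolled sketch on made-up toy data, not any particular library's implementation): least-squares gradient boosting repeatedly fits a weak learner, here a depth-1 stump, to the current residuals.

```python
import numpy as np

def fit_stump(x, r):
    """Brute-force a depth-1 regression tree (stump) on residuals r."""
    best = None
    for s in np.unique(x):
        left, right = r[x <= s], r[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= s, left.mean(), right.mean())
        sse = ((r - pred) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    return best[1:]  # (threshold, left value, right value)

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    """Each round fits a stump to the residuals (the negative gradient
    of squared loss) and adds a shrunken copy to the ensemble."""
    f = np.full(len(y), y.mean())
    for _ in range(n_rounds):
        s, lv, rv = fit_stump(x, y - f)
        f += lr * np.where(x <= s, lv, rv)
    return f

# toy data: noisy step function
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = (x > 0.5).astype(float) + rng.normal(0, 0.1, 200)
f = gradient_boost(x, y)
print(((y - f) ** 2).mean())  # far below the variance of y
```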

762
questions

**0**

votes

**1** answer

11 views

### How do you interpret your features when you standardize your data?

Let's say I have built a boosting tree or neural network and I standardized my features beforehand. When I built my model, I split my data into training, validation, and test sets - each with their ...

**0**

votes

**0** answers

14 views

### XGBoost classifier returning different proba and predict (with output_margin=True) [closed]

It's my understanding that for an XGBoost classifier with objective='multi:softprob', the output of ...
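That is my understanding as well (treat it as an assumption rather than XGBoost's documented internals): for `objective='multi:softprob'` the probabilities should be the softmax of the per-class raw margins, so `predict_proba` and `predict(..., output_margin=True)` should agree up to that transform. A numpy-only sketch of the mapping, with hypothetical margins:

```python
import numpy as np

def softmax(margins):
    """Row-wise softmax; subtracting the row max keeps exp() stable."""
    z = margins - margins.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# hypothetical raw margins: 2 samples x 3 classes
margins = np.array([[1.2, -0.3, 0.1],
                    [0.0, 2.0, -1.0]])
proba = softmax(margins)
print(proba.sum(axis=1))     # each row sums to 1
print(proba.argmax(axis=1))  # same argmax as the raw margins
```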

**0**

votes

**0** answers

13 views

### Regression with zero-inflated outcome

I am trying to fit and tune a regression gradient boosting model where my target variable is zero-inflated (80% zeros) and the rest of the values are distributed as positive and negative values (not ...

**0**

votes

**0** answers

8 views

### Gradient Boosting vs Forward Stagewise Additive Model

Given that the famous AdaBoost and Gradient Boosting are both some kind of approximation to Forward Stagewise Additive Modeling, why not directly fit a model using Forward Stagewise Additive Modeling? On ...

**0**

votes

**0** answers

10 views

### Why does gradient boosting use a first-order Taylor expansion approximation?

The target of boosting at step $m$ is (see Wikipedia):
$$F_{m}(x)=F_{m-1}(x)+\underset{h_{m} \in \mathcal{H}}{\arg \min }\left[\sum_{i=1}^{n} L\left(y_{i}, F_{m-1}\left(x_{i}\right)+h_{m}\left(x_{i}\right)\right)\right]$$
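For context, writing $g_i = \left[\partial L(y_i, F)/\partial F\right]_{F=F_{m-1}(x_i)}$, the first-order expansion replaces the inner loss by

$$L\left(y_i, F_{m-1}(x_i)+h(x_i)\right) \approx L\left(y_i, F_{m-1}(x_i)\right) + g_i\, h(x_i),$$

so minimizing over $h$ reduces to fitting $h$ to the negative gradient $-g_i$, which any off-the-shelf regression learner can do.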

**1**

vote

**1** answer

26 views

### Predicted probabilities seem too low with Gradient Boosting Machine on `iris` data

I'm doing a test run of the Gradient Boosting Machine algorithm on the iris data with the caret package.
...

**0**

votes

**0** answers

8 views

### Combine CatBoost with deep learning classifier

I'm using CatBoost to solve a binary classification problem.
Most of my features are binary, but the order of features does matter.
I've come up with a Recurrent Neural Network that has similar ...

**0**

votes

**0** answers

14 views

### Spark Gradient Boosted Tree give predicted probability wildly different from actual probability

In your experience of using GBT (in Spark or in general) for binary classification, have you encountered predicted probabilities very different from the actual probabilities?
Train and test have same ...

**0**

votes

**0** answers

5 views

### Splits in Decision Trees vs Dendrograms

Gradient boosting is a supervised learning algorithm that splits/grows decision trees to improve predictions iteratively.
Hierarchical clustering is an unsupervised learning algorithm that splits/...

**1**

vote

**0** answers

22 views

### Question about [The Elements of Statistical Learning], page 357 [closed]

Here is the book link http://web.stanford.edu/~hastie/Papers/ESLII.pdf
I am very confused about the statement here:
I am familiar with CART and gradient boosting machine but I have no idea what we ...

**0**

votes

**0** answers

17 views

### How to deal with unbalanced time series data for machine learning?

My understanding when it comes to unbalanced datasets is that we can randomly sample from the dominant class.
What are some ways to deal with unbalanced data when we have time series data and the ...

**2**

votes

**1** answer

37 views

### How does LightGBM deal with incremental learning (and concept drift)?

With some research I found that it updates the leaves (it does not create new ones or remove old ones). Is that right? How does this happen?
Another question is when the incremental learning is done in concept ...

**1**

vote

**1** answer

37 views

### Steps in gradient boosting algorithm

Can someone please explain step 2(c) in the gradient boosting algorithm below. I was under the impression that the 2(c) computation is nothing but the mean of the corresponding terminal node ...
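For squared loss that impression is correct; for other losses 2(c) is a loss-specific argmin (often approximated by a Newton step for deviance). The node value is the constant $\gamma$ minimizing $\sum_{i \in R_{jm}} L\left(y_i, f_{m-1}(x_i) + \gamma\right)$, which is the mean of the residuals for squared loss and their median for absolute loss. A sketch with hypothetical residuals for a single terminal node:

```python
import numpy as np

# hypothetical residuals y_i - f_{m-1}(x_i) falling into one terminal node
node_resid = np.array([0.5, 1.5, -0.2, 0.8, 3.0])

# squared loss: argmin_g sum (r - g)^2  ->  the node mean
gamma_l2 = node_resid.mean()

# absolute loss: argmin_g sum |r - g|  ->  the node median
gamma_l1 = np.median(node_resid)

print(gamma_l2, gamma_l1)  # approx. 1.12 and 0.8
```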

**0**

votes

**0** answers

18 views

### How the first tree in gradient boosting classifier is constructed and the split criteria [duplicate]

I am aware of how GB classifiers are constructed as regression trees and how predictions are made, but I am not sure how the initial tree and its node splitting are done.
Can someone please explain how the ...

**2**

votes

**1** answer

30 views

### Performance drops when adding a feature using XGBoost

I did some feature engineering with my data set. When I added one of the new features, the performance dropped significantly. How is this possible? I thought XGBoost was robust to irrelevant variables.

**2**

votes

**0** answers

27 views

### Low OOB error but high CV error with MABoost

I am using Mirror Ascent Boosting (R package maboost) to learn a 3-class predictor over a set of 123 patients (very small, I know). Classes are almost balanced. I am getting excellent OOB errors (...

**1**

vote

**1** answer

38 views

### Int vs. float in regression modeling

This is a general question to understand a concept.
I have a dataframe in which all columns have float values (precision varies from 2 to 8 digits).
I use GBM to train my model. When I train my model ...

**0**

votes

**0** answers

18 views

### How to compare feature selection regression-based algorithm with tree-based algorithms?

I'm trying to compare which feature selection model is more efficient for a specific domain. Nowadays the state of the art in this domain (GWAS) is regression-based algorithms (LR, LMM, SAIGE, etc.), ...

**1**

vote

**1** answer

52 views

### CatBoost does not overfit - how is that possible?

I'm fitting and evaluating a CatBoostRegressor and an XGBRegressor on the same regression problem. I tried matching their ...

**0**

votes

**0** answers

16 views

### Termination Condition for AdaBoost.R2

I can't quite wrap my head around the termination condition of AdaBoost.R2 as defined by Drucker in this paper. On page 2 of the paper he states to "repeat the following while the average loss* $\bar{...

**1**

vote

**0** answers

32 views

### Decision tree- Alternative model to predict this data?

My data looks something this (for example):
...

**0**

votes

**1** answer

53 views

### What is minimized/optimized when we use AdaBoost?

When I learned about CART, we learned that at each split, we try to minimize some measure (usually Gini index) of the split. That is, we determine the predictor and threshold that decreases the Gini ...
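AdaBoost can be read the same way, except the quantity being greedily minimized is the ensemble's exponential loss $\sum_i e^{-y_i F(x_i)}$ rather than a per-split impurity; both the classifier weight $\alpha_m$ and the example reweighting fall out of that objective. A self-contained sketch on made-up data, with simple threshold stumps as weak learners:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
y = np.sign(x + rng.normal(0, 0.3, 100))  # noisy labels in {-1, +1}
y[y == 0] = 1.0

w = np.ones(100) / 100            # example weights
F = np.zeros(100)                 # ensemble score
thresholds = np.linspace(-0.95, 0.95, 20)

for _ in range(10):
    # weak learner: the stump sign(x - t) with lowest weighted error
    errs = [(w * (np.sign(x - t) != y)).sum() for t in thresholds]
    t = thresholds[int(np.argmin(errs))]
    h = np.sign(x - t)
    eps = (w * (h != y)).sum()
    alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))  # minimizes exp-loss
    F += alpha * h
    w *= np.exp(-alpha * y * h)   # upweight the examples h got wrong
    w /= w.sum()

print(np.exp(-y * F).mean())      # exponential loss, driven below its initial value 1.0
```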

**1**

vote

**1** answer

68 views

### How do you interpret the prediction output of gbm() in R for a classification problem?

I created a model using the gbm() function in library(gbm). Within the gbm() function, I set the distribution as "adaboost". I have a binary response [0, 1]. I used the predict.gbm function for ...
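Without seeing the exact call, the usual gotcha here is scale: `predict.gbm` returns scores on the link (log-odds-type) scale by default, and `type = "response"` maps them back to probabilities. A sketch of the two inverse maps (my understanding: the logistic for `"bernoulli"` scores, and $1/(1+e^{-2f})$ for exponential-loss `"adaboost"` scores, where $F = \tfrac12 \log$-odds):

```python
import math

def bernoulli_response(f):
    """Invert a log-odds score: P(y = 1) = 1 / (1 + exp(-f))."""
    return 1.0 / (1.0 + math.exp(-f))

def adaboost_response(f):
    """Conventional inverse for exponential-loss scores F = 0.5 * log-odds."""
    return 1.0 / (1.0 + math.exp(-2.0 * f))

print(bernoulli_response(0.0), adaboost_response(0.0))  # 0.5 0.5
```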

**0**

votes

**0** answers

16 views

### How to increase accuracy and reduce overfitting in XGBoost [duplicate]

I am working on a multi-class classification problem. I got 95% accuracy on the validation data set but 25% accuracy on the test data set, and when submitting my predictions I got a 75% score. Please help me fix ...

**2**

votes

**1** answer

53 views

### Combining XGBoost and LightGBM

I'm working on a text classification problem and I am comparing LightGBM and XGBoost performances. Both on train and test sets I get roughly the same accuracy metrics, but what looks amusing to me is ...

**0**

votes

**0** answers

28 views

### using the test population as an eval_set when doing hyperparameter optimization

I'm looking at this guide for hyperparameters optimization of boosting regressors using hyperopt.
I noticed that for each trial, it uses the following code for the ...

**1**

vote

**1** answer

41 views

### What is the difference between Gini index and Gini coefficient?

I am building a decision tree from scratch. I have been using entropy so far (calculated this way):
...
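For the terminology itself: the Gini index (impurity) used in tree splitting is $1-\sum_k p_k^2$ over the class proportions in a node, while the Gini coefficient is an inequality/ranking measure from a different tradition (in binary scoring problems it is often reported as $2\cdot\mathrm{AUC}-1$). A sketch of the impurity next to the entropy already in use:

```python
import numpy as np

def gini_index(p):
    """Gini impurity of class proportions p: 1 - sum_k p_k^2."""
    p = np.asarray(p, dtype=float)
    return 1.0 - float((p ** 2).sum())

def entropy(p):
    """Shannon entropy in bits: -sum_k p_k log2 p_k, with 0 log 0 := 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

print(gini_index([0.5, 0.5]), entropy([0.5, 0.5]))  # 0.5 1.0
```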

**1**

vote

**0** answers

24 views

### “Jumping” among several interpolation techniques?

I am comparing several interpolation methods using monthly climatic data, through RMSE and a 10-fold cross-validation scheme.
What I'm observing is that the performances vary from one month to ...

**1**

vote

**1** answer

33 views

### The accumulative tree structure in a tree based gradient boosting

I'm playing with gradient boosting methods and with its python packages out there. I tried lightgbm, started with a high-dimensional input to predict a task. A left ...

**0**

votes

**0** answers

19 views

### What is the split criterion used by CatBoost?

I'm trying to understand the split criteria used by catboost in the "plain" boosting mode (not interested in the "ordered" mode complication).
In "algorithm 2 - Building a tree" they are saying that ...

**0**

votes

**0** answers

14 views

### Required sample size for establishing equivalence of a gradient boosting model on a different population

I have a trained gradient boosted trees (regression) model with a given R2 metric (obtained via cross-validation).
Now I want to verify that the same model is valid for a very different population.
Is ...

**3**

votes

**1** answer

76 views

### Spelling out a detail in the gradient boosting machine algorithm for binary classification

This is a very long question, but perhaps people who are trying to deeply understand the Gradient Boosting Machine algorithm will think it's interesting.
I've been working on understanding the ...

**0**

votes

**0** answers

23 views

### Which is the best classification Algorithm to be used for finding the “second best class”?

I have a dataframe containing skillsets of players in different positions. I can build a classification problem for predicting the position of player based on the skillsets.
However, the problem ...

**0**

votes

**0** answers

23 views

### Estimate distribution from mean and prediction intervals

I'm using an ML-model (gradient boosting) to predict mean, upper and lower quantiles of a target variable which is gamma distributed.
I want to construct distributions for the predictions and figured ...

**1**

vote

**1** answer

39 views

### Comparison of regression models in terms of the importance of variables

I would like to compare models (multiple regression, LASSO, Ridge, GBM) in terms of the importance of variables. But I'm not sure if the procedure is correct, because the values obtained are not on ...

**3**

votes

**1** answer

53 views

### How to avoid overfitting in an XGBoost model

I am trying to classify data from a dataset of 35K data points and 12 features.
First I divided the data into train and test sets for cross-validation.
After cross-validation I built an XGBoost ...

**1**

vote

**0** answers

38 views

### Gradient boosting (GB) splitting methods (categorical features)

Regarding categorical features: ordinary trees treat them in two main ways. CART considers only binary splits, computing the mean response value (y_mean_i per category ...

**0**

votes

**0** answers

25 views

### how does using decision stump lead to an additive model?

In chapter 8 of ISLR it says boosting using stumps leads to an additive model. How would I derive $$f(X) = \sum^p_{j=1} f_j(X_j)$$ from $$\hat{f}(x) = \sum^B_{b=1} \lambda \hat{f}^b(x)$$?
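One route (a sketch, in the book's notation): each stump $\hat f^b$ splits on a single feature, call it $x_{j(b)}$, so $\hat f^b(x)$ depends on $x$ only through $x_{j(b)}$. Grouping the $B$ stumps by their split feature gives

$$\hat f(x) = \sum_{b=1}^{B} \lambda \hat f^b(x) = \sum_{j=1}^{p} f_j(x_j), \qquad f_j(x_j) := \sum_{b:\, j(b)=j} \lambda \hat f^b(x),$$

which is additive in the individual coordinates of $x$.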

**0**

votes

**0** answers

39 views

### Why is the step length by default equal to 1 in gradient boosting?

On ESL p. 359, it explains steepest descent.
But in (10.37) it is trying to minimize the distance to $g_{im}$. It looks like the default step length is 1. Why is that?
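A sketch of how ESL's Algorithm 10.3 supplies the scale separately: (10.37) fits the tree to the negative gradient under squared error, which only fixes the direction, and a subsequent line search then chooses the actual step length:

$$h_m = \underset{h}{\arg\min} \sum_{i=1}^{n} \left(-g_{im} - h(x_i)\right)^2, \qquad \rho_m = \underset{\rho}{\arg\min} \sum_{i=1}^{n} L\left(y_i, f_{m-1}(x_i) + \rho\, h_m(x_i)\right).$$

So the "unit step" applies only to the surrogate fit; the effective step length is $\rho_m$ (or, in the tree version, the per-node constants $\gamma_{jm}$).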

**0**

votes

**0** answers

89 views

### Tuning threshold from multiclass ROC for Gradient Boosting Classifier?

I have created a ROC curve based on the output of a multiclass Gradient Boosting Classifier (See Figure below implemented from Yellowbrick ROCAUC: http://www.scikit-yb.org/en/latest/api/classifier/...

**0**

votes

**0** answers

12 views

### What does “a distribution is consistent with a hypothesis class” mean?

What does "a distribution is consistent with a hypothesis class" mean?
I came across the following statement in this pdf
To see this, first note that for every
distribution $P$ consistent with $...

**3**

votes

**0** answers

24 views

### When should one use Bradley-Terry instead of gradient boosted trees for pairwise ranking

Both the Bradley-Terry model and Gradient boosted trees can be used to learn a ranking from pairwise comparisons (e.g. with libraries choix and XGboost).
How do they relate to each other? Is there ...

**3**

votes

**1** answer

76 views

### XGBOOST objective function derivation algebra

I need some help, please, with the derivation of the XGBoost objective function. I am following this online tutorial (Math behind GBM and XGBoost).
How do you go from here
$$
loss = \sum_{i=1}^{n} \left( ...
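For reference, the standard second-order route (as in the XGBoost paper, with $g_i$ and $h_i$ the first and second derivatives of the loss at the previous prediction $\hat y_i^{(t-1)}$):

$$\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}\left[g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i)\right] + \gamma T + \tfrac{1}{2}\lambda\sum_{j=1}^{T} w_j^2,$$

and setting the derivative with respect to each leaf weight $w_j$ to zero gives $w_j^{*} = -\dfrac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$.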

**2**

votes

**2** answers

161 views

### Overfitting in extreme gradient boosting

My situation is:
36,197 observations/ 125 outcomes in training data
26 predictors
A relatively successful prediction model has been built on a similar dataset using just logistic regression; I ...

**2**

votes

**1** answer

31 views

### Calculate minimum accuracy for a boosting algorithm

Suppose you are working on a binary classification problem, and there are 3 models, each with 70% accuracy. If you want to ensemble these models using majority voting, what will be the minimum ...
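One way to ground the numbers, under an extra independence assumption (which the question does not state): the majority of three independent 70% models is right whenever at least two of them are.

```python
from math import comb

p = 0.7  # accuracy of each model, assumed mutually independent
# majority of 3 is correct when at least 2 of the 3 are correct
acc = comb(3, 2) * p**2 * (1 - p) + comb(3, 3) * p**3
print(acc)  # approx. 0.784
```

The adversarial minimum is lower: each example the majority gets wrong consumes at least two units of the combined 3 × 30% error budget, so majority accuracy is at least 1 − 0.9/2 = 55%, and that bound is attainable by overlapping the three error sets pairwise.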

**2**

votes

**1** answer

53 views

### weak learning of 3-piece classifiers using decision stumps

I have a question about Example 10.1 in Shalev-Shwartz and Ben-David's "Understanding Machine Learning." The example means to illustrate weak learning of 3-piece classifiers $\mathcal H$ using ...

**4**

votes

**0** answers

58 views

### How to explain that a random forest doesn't learn at all while logistic regression learns very well?

My prediction task is as follows:
Use name to predict people's ethnicity (into 4 categories: "English", "French", "Chinese", and "All others") as a multiclass classification problem. The name ...

**1**

vote

**0** answers

53 views

### Calculate Gini Importance for Boosting Trees

From my understanding, Gini Importance means Mean Decrease in MSE for regression objectives, and Mean Decrease in Impurity for classification objectives. Typical random forest packages like ...

**0**

votes

**0** answers

10 views

### What's a good estimate of error measurements when trying to predict values inside two bands?

I am using gradient boosting to predict two quantiles (upper and lower). The predicted value can be above, below, or in bounds. The problem I am facing is that counting the number of values in bound ...