# Questions tagged [prediction]

Prediction of unknown random quantities, using a statistical model.

1,553 questions

**0** votes · **0** answers · 16 views

### Understanding notation in Bias-Variance decomposition in Elements of Statistical Learning

I'm going through Elements of Statistical Learning and I'm having a bit of trouble understanding this bit of notation from Chapter 2 (this example is (2.27))
$$EPE(x_0) = E_{y_0|x_0}E_T(y_0 - \hat{y}...
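A sketch of how this notation is usually read. $E_T$ averages over training sets $T$, which are the source of randomness in $\hat{y}_0$. $E_{y_0|x_0}$ averages over the conditional distribution of the new response $y_0$ at the fixed input $x_0$. Write $f(x_0) = E(y_0 \mid x_0)$ and assume the new $y_0$ is independent of $T$. Adding and subtracting $f(x_0)$ and $E_T\hat{y}_0$ then makes the cross terms vanish, which gives the standard decomposition:

```latex
\begin{aligned}
EPE(x_0) &= E_{y_0|x_0} E_T \big(y_0 - \hat{y}_0\big)^2 \\
&= \operatorname{Var}(y_0 \mid x_0)
   + E_T\big[\hat{y}_0 - E_T\hat{y}_0\big]^2
   + \big[E_T\hat{y}_0 - f(x_0)\big]^2 \\
&= \operatorname{Var}(y_0 \mid x_0) + \operatorname{Var}_T(\hat{y}_0) + \operatorname{Bias}^2(\hat{y}_0).
\end{aligned}
```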

**1** vote · **0** answers · 24 views

### How to simulate predicted probabilities

Can you help me out with the following brain twister?
I predict the probability (p) of a sale for each potential customer. On average, p is 0.003. The model mainly gives me p values in the range 0....
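The question is truncated. If the aim is to turn per-customer probabilities into simulated outcomes such as total sales, one minimal approach treats each sale as an independent Bernoulli draw with the model's $p_i$. That independence is an assumption. The function name `simulate_sales` is hypothetical, and the probability vector is randomly generated as a stand-in for real model output, chosen to average about 0.003 as in the question.

```python
import numpy as np

def simulate_sales(p, n_sims=2_000, seed=0):
    """Simulate total sales per run, treating each customer as an
    independent Bernoulli(p_i) trial (an assumption, not a given)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    # Rows are simulated runs, columns are customers; True means a sale.
    draws = rng.random((n_sims, p.size)) < p
    return draws.sum(axis=1)

# Hypothetical stand-in for model output: 2,000 customers, mean p about 0.003.
p = np.random.default_rng(1).beta(0.5, 166.0, size=2_000)
totals = simulate_sales(p)
print("expected total (sum of p):", p.sum())
print("simulated mean / variance:", totals.mean(), totals.var())
```

Under the independence assumption, the simulated totals should have a mean close to $\sum_i p_i$ and a variance close to $\sum_i p_i(1-p_i)$. Comparing against these values is a quick way to confirm the simulation is set up correctly.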

**0** votes · **0** answers · 15 views

### Interpreting logit regression in Stata, predicted values

I am running a logit regression in Stata, and need help interpreting my results.
I want to run the command three times. The first time, I'm regressing Y on X1. The second time, I am regressing Y on x1 ...

**0** votes · **0** answers · 10 views

### Why are the inference results of autoregressive model different according to batch size?

I trained the Transformer network and got inference results from the network using a batch size of 100. The results were a fixed-shape [100 x 256 (max decoding length)] integer matrix. When I used batch-...

**1** vote · **0** answers · 106 views

### How to incorporate the uncertainty of the model coefficients in the prediction interval of a multiple linear regression [closed]

The question is a bit similar to question 147242. I'm dealing with a multiple linear regression model, say:
$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2
$$
and I'm looking for an algebraic equation to ...

**0** votes · **1** answer · 30 views

### Predicting outcomes with categorical predictors

My dataset is formulated in a contingency table. My predictor variables are categorical and my dependent variable is the number of observations observed. How do I predict outcomes and find residuals?
...

**1** vote · **1** answer · 20 views

### Using GLS to fix heteroscedasticity

I have a dataset of global solar irradiance (ghi), diffuse solar irradiance, i.e. solar radiation bouncing off trees, clouds, etc. (dhi), and cloud cover. I theorize that I can estimate the dhi given ghi ...

**2** votes · **2** answers · 64 views

### Inference, Prediction, & Model Fit?

I have a background in statistics (for social science), but I am confused about the ways in which Data Science textbooks (in particular, An Introduction to Statistical Learning and Practical ...

**0** votes · **1** answer · 33 views · +50 bounty

### Estimate an error bound for an estimate

I have two datasets of historical data (say, quarterly revenues for companies over time). The first contains the actual results and the other contains the available estimates for these results ...

**0** votes · **0** answers · 9 views

### Bounded Model Prediction Error

I have a predictive model (not ML based; it uses first principles from a science textbook) and I would like to have a confident bound on the error of the predictions. I am able to collect many ...

**0** votes · **0** answers · 6 views

### Extrapolation of data from a range of data set [closed]

I have absorbance data from the wavelength range 350 nm to 750 nm. I want to extrapolate data for the wavelength range 250 nm to 259 nm.
Could you please suggest R code for the ...

**0** votes · **0** answers · 9 views

### Are two machine learning models preferable to one when predicting a time series with several zeros?

I have a data series of non-negative integers. The majority of values are zero, but occasionally the values can be quite high. The task is to predict the next value given a set of features.
Let's ...

**0** votes · **0** answers · 23 views

### Flatten the curve or end the pandemic early? Best way to make a decision?

I am trying to solve this reward/risk problem, which depends on individual preferences. I have two countries with two different hospitalization capacities. Both of them want to flatten the ...

**0** votes · **0** answers · 18 views

### Deep learning: LSTM out-of-sample prediction

I am trying to do out-of-sample prediction of housing price index with deep learning LSTM.
I've practiced the code with a sample data (apt_data_sc) splitting it with 70%,30% training and test set (...

**0** votes · **0** answers · 7 views

### Significance test for mean predicted probabilities between two groups

I have survey weighted predicted probabilities from a multivariate Poisson regression, and calculated the adjusted prevalence difference of my outcome between two groups using the predictions. While ...

**0** votes · **0** answers · 17 views

### How to predict probabilities from multinomial models, when levels are coded?

I'm using a discrete choice experiment (DCE), and I've estimated the answers with a nested logit model (using the mlogit package in R), which gives quite good results.
However, I've coded ...

**0** votes · **0** answers · 14 views

### Standard error for the difference in predictions using R

I want to calculate the standard error for the difference between two predictions, but have little idea how.
Please see the code below:
...

**1** vote · **1** answer · 26 views

### Predicted probabilities seem too low with Gradient Boosting Machine on `iris` data

I'm doing a test run of the Gradient Boosting Machine algorithm on the iris data with the caret package.
...

**0** votes · **0** answers · 16 views

### Quantify the correlation of arbitrarily sized collection of time series

Given a set $N$ of continuous time series of equal lengths, is there a metric for how well they all correlate with each other?
I've considered one metric, the average Pearson correlation $\hat{P}$, given by ...
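The excerpt cuts off before $\hat{P}$ is defined. The sketch below assumes it is the mean of the Pearson correlations over all distinct pairs of series. The helper name `average_pairwise_pearson` and the demo series are illustrative only.

```python
import numpy as np

def average_pairwise_pearson(series):
    """Mean Pearson correlation over all distinct pairs of series.

    `series` is an (N, T) array: N time series, each of length T.
    """
    x = np.asarray(series, dtype=float)
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two series")
    r = np.corrcoef(x)  # rows are treated as variables -> (N, N) matrix
    # Upper triangle without the diagonal counts each pair exactly once.
    iu = np.triu_indices(n, k=1)
    return r[iu].mean()

t = np.linspace(0, 2 * np.pi, 200)
demo = np.vstack([np.sin(t), 2 * np.sin(t) + 1, -np.sin(t)])
print(average_pairwise_pearson(demo))
```

In the demo, the three pairwise correlations are $+1$, $-1$ and $-1$, so the average is $-1/3$. This shows that positive and negative correlations cancel in a plain average. Averaging $|r|$ instead is one variant to consider when only the strength of association matters.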

**1** vote · **1** answer · 42 views

### Estimating prediction error and confidence band

Like a lot of amateurs, I would like to see how well the evolution of Covid-19 is predictable. So I imported the data (here, for Italy) and fitted a logistic curve. Then I added the 90% and 95% ...

**0** votes · **0** answers · 12 views

### Is there a possibility to combine WLS with bootstrapping methodology for prediction purposes in R?

This is my first post, so here I go: I used R to create a bootstrap prediction interval for a one-predictor logarithmic regression model. Here are the steps for the creation of the bootstrap ...

**2** votes · **1** answer · 32 views

### What are the standard errors of the predictions from predict.lm in R?

In R, ?predict says:
If the logical se.fit is TRUE, standard errors of the predictions are calculated.
An example:
...
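R's documentation describes `se.fit` as the standard error of the predicted means. For OLS, that is $\hat\sigma\sqrt{x_0^\top (X^\top X)^{-1} x_0}$, the uncertainty in the fitted mean at $x_0$. A new observation adds $\hat\sigma^2$ under the square root. The NumPy sketch below computes both quantities on simulated data. The function name is hypothetical and this is not R's internal implementation.

```python
import numpy as np

def ols_prediction_se(X, y, x0):
    """Standard errors at new design rows x0 for an OLS fit of y on X.

    X must already contain an intercept column if one is wanted.
    Returns (se_fit, se_pred):
      se_fit:  SE of the estimated mean response (what se.fit describes)
      se_pred: SE for one new observation, sqrt(se_fit**2 + sigma2_hat)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.atleast_2d(np.asarray(x0, dtype=float))
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)  # unbiased residual variance
    xtx_inv = np.linalg.inv(X.T @ X)
    # Quadratic form x0' (X'X)^{-1} x0, one value per row of x0.
    h0 = np.einsum("ij,jk,ik->i", x0, xtx_inv, x0)
    se_fit = np.sqrt(sigma2 * h0)
    se_pred = np.sqrt(sigma2 * (1.0 + h0))
    return se_fit, se_pred

# Simulated data for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, 50)
X = np.column_stack([np.ones_like(x), x])
se_fit, se_pred = ols_prediction_se(X, y, [[1.0, 5.0], [1.0, 20.0]])
print(se_fit, se_pred)
```

The second query point ($x = 20$) lies outside the observed range, so its `se_fit` is larger. Extrapolation inflates the uncertainty of the fitted mean.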

**2** votes · **1** answer · 28 views

### Adding predictors in ROC curves: how does it affect AUC?

I have a general question about ROC curves and how adding predictors affects AUC values.
Let's say I have a model that contains only predictor A and produces an AUC of 0.6.
I then add into the model a ...

**1** vote · **1** answer · 28 views

### Median of Predictions With Standard Error using Margins command in Stata

I understand that by default, the "margins" command in Stata calculates the predicted value of the dependent variable for each observation, then reports the mean value of the predictions. Is it ...

**0** votes · **1** answer · 24 views

### Zero Denominator in Yule's Q-statistics?

In *Concept Drift Adaptation by Exploiting Historical Knowledge*, they use Yule's Q-statistic to compute the diversity of a collection of predictions.
I think the context is not really important.
I ...
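The sketch below assumes the usual pairwise Q-statistic from the ensemble-diversity literature, computed from two classifiers' correctness on the same examples. Under that definition the denominator is zero exactly when $N^{11}N^{00} = 0$ and $N^{01}N^{10} = 0$, for example when both classifiers are always correct. The value returned in that case is a convention you have to choose. The excerpt does not say which one the paper uses, so `zero_denominator` is a hypothetical parameter.

```python
import math

def yule_q(correct_a, correct_b, zero_denominator=math.nan):
    """Yule's Q between two classifiers' correctness vectors (1 = correct).

    N11: both correct, N00: both wrong, N10: only A correct, N01: only B.
    Q = (N11*N00 - N01*N10) / (N11*N00 + N01*N10).
    If the denominator is 0, Q is undefined and `zero_denominator` is returned.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("vectors must have equal length")
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif not a and not b:
            n00 += 1
        elif a:
            n10 += 1
        else:
            n01 += 1
    num = n11 * n00 - n01 * n10
    den = n11 * n00 + n01 * n10
    return zero_denominator if den == 0 else num / den

print(yule_q([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # N11=2, N00=1, N10=1, N01=1 -> 1/3
print(yule_q([1, 1, 1], [1, 1, 1]))              # denominator 0 -> nan
```

One option is to return `nan` and drop such pairs from the ensemble average. Another is to treat them as $Q = 1$, the value the formula gives for identical classifiers whenever the denominator is nonzero.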

**1** vote · **0** answers · 27 views

### Calculating the number of infected people in the background population?

I'm wondering if there's a method to calculate/predict the number of infected people in the population, given R0, the number of people currently testing positive, the number of deaths, the sensitivity of tests, ...

**0** votes · **0** answers · 32 views

### Calculating the difference between percentages derived from two different numbers (help)

I recently came across a graph comparing education levels between foreign-born immigrants and US-born citizens that was divided into four categories.
Less than high school,
High school graduate,
Some ...

**1** vote · **1** answer · 33 views

### Predicting house prices with machine learning. Problem with time-varying variables

I'm currently trying to cross-sectionally predict house prices using statistical learning methods. I have collected prices from 2009 until 2020. I have loads of time-invariant variables on the ...

**0** votes · **0** answers · 22 views

### Confidence Score of Random Forest Regressor Model

For confidence of any input or data point, we have packages ranger and grf in R as suggested ...

**0** votes · **1** answer · 38 views

### Logistic regression always predicting 1 for my small data set [duplicate]

I have a data set of 8 rows, of which seven are predicted, and I need to predict the 8th row's data input. I can't see where my logic goes wrong; please let me know where I am getting ...

**0** votes · **2** answers · 42 views

### How can I identify the appropriate variables for a prediction model using linear regression?

I need to create a predictive model for the pricing of Airbnb using a linear regression. The data set contains 34 variables and I do not know which of them are suitable. I have already divided the ...

**0** votes · **0** answers · 26 views

### Bachelor Thesis: ML Soccer prediction - what model?

I am starting with ML and am kinda lost.
My Bachelor Thesis revolves around the prediction of soccer games and my mentor thought it would be fun to do that with ML - interesting yes, fun not so much.
...

**0** votes · **1** answer · 54 views

### Why is logistic regression predicting TRUE values at a much higher rate than in the training data?

I am trying to use logistic regression to make predictions in R. I am confused as to why a model is predicting TRUE for 90% of predictions, when the training data ...

**3** votes · **1** answer · 34 views

### Definition of predictive hazard function?

In a Bayesian context, the posterior predictive probability density function is
$$f_p(t) = \int f(t\mid \theta)\pi(\theta\mid \text{Data})d\theta,$$
where $\pi(\theta\mid \text{Data})$ is the ...
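One common way to define a predictive hazard is as the ratio of the posterior predictive density to the posterior predictive survival function, $h_p(t) = f_p(t)/S_p(t)$ with $S_p(t) = \int S(t\mid\theta)\,\pi(\theta\mid\text{Data})\,d\theta$. This is generally not the same as the posterior mean of $h(t\mid\theta)$. The Monte Carlo sketch below assumes, purely for illustration, an exponential model with rate $\theta$ and a Gamma(shape $a$, rate $b$) posterior. Under those assumptions the exact answer is $h_p(t) = a/(b+t)$. The posterior mean of $h(t\mid\theta) = \theta$, by contrast, is the constant $a/b$.

```python
import numpy as np

def predictive_hazard(t, theta_draws):
    """Monte Carlo h_p(t) = f_p(t) / S_p(t) for an exponential model.

    theta_draws: posterior samples of the exponential rate.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    theta = np.asarray(theta_draws, dtype=float)[:, None]  # shape (S, 1)
    surv = np.exp(-theta * t)   # S(t | theta) for each draw and each t
    dens = theta * surv         # f(t | theta) for each draw and each t
    f_p = dens.mean(axis=0)     # posterior predictive density
    s_p = surv.mean(axis=0)     # posterior predictive survival
    return f_p / s_p

a, b = 3.0, 2.0  # hypothetical Gamma(shape, rate) posterior
rng = np.random.default_rng(0)
theta = rng.gamma(shape=a, scale=1.0 / b, size=200_000)
t = np.array([0.0, 1.0, 5.0])
print(predictive_hazard(t, theta))  # Monte Carlo estimate
print(a / (b + t))                  # exact values: 1.5, 1.0, 3/7
```

The predictive hazard decreases over time even though every $h(t\mid\theta)$ is constant. This happens because surviving units are increasingly likely to have small $\theta$.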

**1** vote · **0** answers · 22 views

### Time series prediction model with minute data

I'm trying to apply prediction to my data, which is taken from the sensor every 15 min
...

**0** votes · **0** answers · 17 views

### Organizing Data for hourly and daily predictions

Let's suppose I'm using SVM (Regression) to predict variable y and I have multiple input variables (x_i) which are data from sensors at intervals of 10 minutes.
From an operational point of view, I ...

**0** votes · **1** answer · 37 views

### Predicting a Markov chain next state using previously predicted states

Suppose we have a Markov chain with two states A and B.
The associated transition matrix is:
\begin{equation}
P_{mc}=
\begin{...
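The transition matrix is cut off in the excerpt, so the numbers below are hypothetical. The sketch contrasts two ways of predicting $k$ steps ahead. The first propagates the full state distribution, $\pi_k = \pi_0 P^k$. The second repeatedly feeds back only the single most likely predicted state, which throws away the uncertainty at each step.

```python
import numpy as np

STATES = ["A", "B"]
# Hypothetical row-stochastic matrix: P[i, j] = Pr(next = j | current = i).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

def distribution_after(k, start):
    """Exact k-step state distribution pi_0 P^k from a known start state."""
    pi = np.zeros(len(STATES))
    pi[STATES.index(start)] = 1.0
    return pi @ np.linalg.matrix_power(P, k)

def plug_in_path(k, start):
    """Take the most likely next state (argmax) and feed it back, k times."""
    path, i = [], STATES.index(start)
    for _ in range(k):
        i = int(np.argmax(P[i]))
        path.append(STATES[i])
    return path

print(distribution_after(3, "B"))
print(plug_in_path(3, "B"))
```

With this matrix, starting from B, the exact distribution after three steps is (0.7, 0.3). A is therefore more likely, yet the plug-in path predicts B, B, B, because B is the single most likely successor of B at every step.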

**0** votes · **0** answers · 19 views

### Predicting game outcomes with moving averages of goals scored

I decided to make a sports gambling script so I could quit my job and never work again.
I just started and I read about Poisson distributions (which kind of approximate the chance of X goals getting ...
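A common starting point is to treat each team's goals as a Poisson count whose rate comes from, say, a moving average of goals scored. The sketch below turns two such rates into win/draw/loss probabilities by summing the joint probability over scorelines. It assumes the two teams' goal counts are independent; the excerpt does not establish that. The rates are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def outcome_probs(lam_home, lam_away, max_goals=15):
    """(P(home win), P(draw), P(away win)) under independent Poisson goals.

    Scorelines with more than max_goals for either side are ignored;
    for typical football rates that mass is negligible.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        ph = poisson_pmf(h, lam_home)
        for a in range(max_goals + 1):
            p = ph * poisson_pmf(a, lam_away)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

# Hypothetical moving averages of goals scored per game.
print(outcome_probs(1.6, 1.1))
```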

**0** votes · **0** answers · 20 views

### Why does the RMSE value go up? [duplicate]

I have been trying to predict the glucose values of patients by using regression algorithms. I used Support Vector Regression (RMSE: 65), Logistic Regression (RMSE: 86), Linear Regression (RMSE: 64) ...

**1** vote · **2** answers · 58 views

### Is there a guide for when to implement time series techniques?

I am interested in getting a better sense as to when to use time series techniques.
Let's say you have a data set with units sold as the response. Your goal is to predict units sold on any given ...

**0** votes · **0** answers · 27 views

### Manually predict lognormal survreg model considering parameters uncertainty

I'm analyzing environmental data using the "NADA" R library, which relies heavily on the "survival" package.
I am dealing with left-censored data, which are nonetheless strictly positive. To deal with ...

**0** votes · **0** answers · 9 views

### Time series model for cryptoprice prediction

I am fairly new to the topic of statistics and data science.
My first dataset consists of per-minute BTC prices since 2013.
The second dataset consists of posts from a social media platform ...

**1** vote · **0** answers · 28 views

### Explanation(s) for unimodal distribution of prediction probability computed by Random Forest

I have a typical binary classification problem with a sample of ~700 instances where I fitted multiple classification models including logistic regression, SVM and Random Forest.
The instances are ...

**0** votes · **0** answers · 26 views

### Low sample size with independent observations

I am looking at sports team level data (summarized by average in each season) over several seasons and would like to predict/classify the winner of the championship. In a single season, the data has ...

**1** vote · **0** answers · 16 views

### Predicting in Cox's time-varying proportional hazards model (theoretically as well as using Python)

Are there any ideas on predicting the remaining lifetime (say at time $t_0$) in Cox's time-varying proportional hazards model? I'm interested in theoretical ideas as well as practical ones for the ...

**0** votes · **1** answer · 55 views

### Predicting probability of non-payment for vehicle loans up to 90 days in advance

So I have an interesting problem that I'm working on. I have a dataset of customers from a bank for car loans. For each customer, I also have their associated payment information including repayment ...

**0** votes · **0** answers · 32 views

### Predicted probabilities very close to 0 and 1 in GLM model

I've added new attributes to the binary GLM model. AUC climbed to 98%, logistic loss decreased to 0.45. Training set has ~50 cases.
I can see that predicted probabilities are extremely close to 0 and ...

**0** votes · **0** answers · 9 views

### Performing classification when the potential options of classes are different for every data row/user

I have a problem that I am trying to solve using ML but am not able to determine techniques, hence asking for advice. Appreciate your urgent response!
I have a dataset of several users. Each user has ...

**0** votes · **0** answers · 18 views

### Deriving variance of prediction error for mean prediction

A regression model $y_i = a + b x_i + e_i$ is given. When a single value $x_{i0}$ is observed, the model is $y_{i0} = a + b x_{i0} + e_{i0}$.
The prediction variance for a single out-of-sample prediction is $\sigma^2$ [ ...
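A sketch of the standard derivation. It assumes OLS estimates $\hat a, \hat b$ from $n$ observations with independent errors of variance $\sigma^2$, and reads "mean prediction" as estimating the mean response $a + b x_{i0}$. The key step is to write the prediction in centred form, because $\bar{y}$ and $\hat b$ are uncorrelated:

```latex
\begin{aligned}
\hat{y}_{i0} &= \hat{a} + \hat{b}\,x_{i0} = \bar{y} + \hat{b}\,(x_{i0} - \bar{x}),
  \qquad \operatorname{Cov}(\bar{y}, \hat{b}) = 0, \\
\operatorname{Var}\big(\hat{y}_{i0} - (a + b\,x_{i0})\big)
  &= \frac{\sigma^2}{n} + (x_{i0} - \bar{x})^2 \,\frac{\sigma^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}
   = \sigma^2 \left[\frac{1}{n} + \frac{(x_{i0} - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right], \\
\operatorname{Var}\big(\hat{y}_{i0} - y_{i0}\big)
  &= \sigma^2 \left[1 + \frac{1}{n} + \frac{(x_{i0} - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2}\right].
\end{aligned}
```

The single-prediction variance differs only by the extra $\sigma^2$ contributed by the new error $e_{i0}$, which is independent of the fitted coefficients. If "mean prediction" instead means the average of $m$ new observations at $x_{i0}$, that extra term becomes $\sigma^2/m$.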

**1** vote · **1** answer · 37 views

### Out of sample prediction

I have a model in which I estimate the impact of price on acreage. My data is composed of 10 years. So I use these 10 years to estimate the model and get to coefficients. In next step, I want to use ...