# Questions tagged [modeling]

This tag describes the process of creating a statistical or machine learning model. Always add a more specific tag.

1,993 questions

**0**

votes

**1**answer

5 views

### Is there any formal guideline that would indicate the necessity for adjustment for baseline when analysing change from baseline?

As in the title. I have seen many critical discussions, along with mathematical explanations, of why the baseline shouldn't (or mustn't) be used as a covariate when analysing the change from baseline. ...

**0**

votes

**0**answers

4 views

### Do LS-means correspond only to type 3 (marginal) ANOVA or do they match the type 2 too?

I understand that sequential type 1 ANOVA corresponds to unadjusted, raw (data-based) arithmetic means, and marginal type 3 ANOVA corresponds to LS-means (model-based predictions). In case of no ...

**0**

votes

**0**answers

20 views

### Using own data as starting parameters for modeling

I'm currently having a heated debate with coworkers on whether it's acceptable to use estimates derived directly from the data as starting parameters for modeling.
For example, if I want to fit a ...

**0**

votes

**0**answers

19 views

### Whether to cap the dependent variable while treating the outliers?

I am trying to run a linear regression model in R where the objective is to identify what is driving credit card spend, including both primary and secondary. I have a dataset with 10000 obs
I ...

**0**

votes

**0**answers

10 views

### How do the units of the SIR model cancel out?

I was having trouble trying to understand the parameters of the simplest SIR model.
If beta is the effective contact rate and s is the percentage of people who are susceptible, then how do the units ...

**1**

vote

**0**answers

16 views

### Synthesizing crowd-sourced data to get a consensus estimate

I have some data of different people's estimates for the value of a numerical variable for a bunch of different objects, and I want to get a consensus estimate of the value of the variable for each ...

**1**

vote

**1**answer

26 views

### Formal method to predict probability of a continuous variable

I am trying to build a regression to model the CDF, i.e. to predict the probability that a continuous variable exceeds an arbitrary threshold.
I explored using quantile regression, but it seems that I have ...

**1**

vote

**0**answers

5 views

### Exposure onset unknown in time to event analyses

I want to model the time to an event (cancer) in a group of patients exposed to, e.g., a carcinogenic substance. I now have some people where I do not know if they were already exposed to the substance ...

**1**

vote

**0**answers

26 views

### How to find the factor that contributes more to an event

I have two datasets.
Dataset #1 contains information on patients who have one of three diseases and the hospital where they are treated. Each patient has only one disease.
...

**0**

votes

**0**answers

9 views

### Literature on applying XGBoost to Time Series Data

I'm currently working on a time-series model with very limited data. However, most of the independent variables I have are cross-sectional data, not time-dependent. As such I want to apply some ...

**0**

votes

**0**answers

14 views

### Hedge fund rank on their returns or rating predictions modeling problem - How to find patterns between return and metrics

Problem:
Hi, I'm a new machine learning practitioner. I have a dataset about hedge funds. It contains monthly hedge fund returns and some financial metrics. I calculated metrics for every month from ...

**2**

votes

**2**answers

65 views

### Inference, Prediction, & Model Fit?

I have a background in statistics (for social science), but I am confused about the ways in which Data Science textbooks (in particular, An Introduction to Statistical Learning and Practical ...

**0**

votes

**0**answers

9 views

### Removing the effect of Time series X on time series Y, when their relation is unknown

I am working on a dataset of 6 years of measurements of a water quality parameter called 'chla' (parameter 'X'), measured by a sensor each year from May to October. The parameter has its own trend ...

**0**

votes

**0**answers

15 views

### Alternative to Poisson process with non-independent events?

Correct me if I'm wrong, but as far as I know, a Poisson distribution assumes the events are independent of each other. I have a dataset of events over several years where the occurrence of an event may ...

**0**

votes

**0**answers

18 views

### Can I treat count data as continuous in quantile regression?

I have data where the response is the dengue incidence per year from 2013 to 2019. The incidence per year is large: many values of 1000, with a minimum of 228.
I am ...

**0**

votes

**0**answers

10 views

### Bounded Model Prediction Error

I have a predictive model (not ML based, uses first principles from a science textbook) and I would like to have a confident bound on the error of the predictions. I am able to collect many ...

**0**

votes

**0**answers

27 views

### Using ARIMA to explain time series data

I am creating models to analyze energy use in four U.S. states. The goal is to create a model to explain historical data (1960-2009) as well as to create a forecast for 2025 and 2050. I am using R and ...

**1**

vote

**0**answers

24 views

### Expressibility of VAR(1) models

Am I correct in understanding that vector autoregressive (VAR) models of order one can capture seemingly more general modeling frameworks such as VAR(p) models, for orders $p > 1$, and ARMA models?
...

**0**

votes

**0**answers

18 views

### ANOVA of Model Performance metric

I have conducted multiple simulations of a hydrological model under a matrix of scenarios and calculated a Nash-Sutcliffe Efficiency (NSE) value for each simulation. The NSE is a model performance ...

**1**

vote

**1**answer

76 views

### How is the gamma distribution used in the model developed by the Imperial College COVID-19 Response Team?

The Imperial College COVID-19 Response Team report mentions, "Individual infectiousness is assumed to be variable, described by a gamma distribution with mean 1 and shape parameter 0.25." With that ...

**7**

votes

**2**answers

151 views

### Detailed description/scripts of mathematical models for Coronavirus

Pretty much the title, I am looking for some more in-depth explanation of the models used in the papers from Imperial College and The Lancet. In the second one, they are using something called a ...

**0**

votes

**0**answers

8 views

### How to connect distribution selection and model selection in generalized linear models [duplicate]

I am trying to better understand the general process of choosing a distribution family and linear predictor for a generalized linear model. There are plenty of examples out there for specific data ...

**0**

votes

**0**answers

57 views

### Recursive Linear Least Squares

In the following question 10, the value of $Q_{N}$ is asked. By using the given $\alpha$ value, I assume the value of $Q_{N}$ converges to zero since $\phi = 1$ and the $Q_{N}^{-1}$ and $\theta(N+1)$ will ...

**0**

votes

**0**answers

14 views

### Standardising coefficients of Logistic Regression Model

While trying to interpret and use the coefficients of a logistic regression model, I am facing two sets of problems:
Standardizing and bringing all the coefficients to a common scale?
...

**0**

votes

**0**answers

18 views

### Time series vs dataset with timestamp feature

I am not able to provide the exact values of the dataset due to data privacy issues.
The variables I am using in my dataset are:
Date (2007 to 2019),
[A, B, C, D (Categorical Variables that don't ...

**0**

votes

**0**answers

16 views

### LASSO CV returns diverging tuning parameter

I am trying to fit a series of LASSO models, but my CV code keeps returning increasingly large tuning parameters.
Specifically, I am working with a dataset with about 980,000 observations, about 180 ...

**2**

votes

**1**answer

22 views

### Correct interpretation of estimates in Poisson regression output

I am learning to use and validate the Poisson regression model and interpret the results. I am using some data on grassland plant diversity in response to fertilizer and light. The experimental design ...

**0**

votes

**0**answers

28 views

### Are positively biased bootstrap-derived GAM predictions indicative of model issues?

All,
I am using a negative binomial GAM fitted with mgcv::gam to estimate counts for new data, and I wanted to use bootstrapping to find a 95% confidence interval for point estimates. In my ...

**1**

vote

**0**answers

15 views

### Difference between Gaussian Process Regression and Kriging - Regressive vs Interpolative?

I am using different machine learning models to model a noisy dataset for a study. I came across the fitrgp model in MATLAB to model the data using Gaussian process regression. I am also using dacefit ...

**0**

votes

**1**answer

15 views

### Model specifications - Independent variables interaction: hierarchy principle

I am testing the effect of commodity demand shocks on the foreign exchange market. Because my hypotheses include three-way interaction effects, I test them using a hierarchical regression model. ...

**0**

votes

**0**answers

18 views

### Can PCA be used for categorical variables when the only thing you want to explore is the explained variance and not the ordination of the data?

I need to know how effective a set of categorical variables is in explaining the pattern of species distribution, in order to model a potential distribution.
I already explored with numerical ...

**0**

votes

**0**answers

10 views

### Calculating a cumulative effect

I was running a GLMM with insect count data which we collected once in summer and once in autumn at wetlands and non-wetlands. Now, reviewers ask me to check for cumulative effects without any further ...

**0**

votes

**0**answers

7 views

### How to approach modelling based on question trees

I'm trying to cluster individuals based on survey data. The thing is, this survey sometimes has question trees, like this:
Do you have kids? (Q1)
Do all the kids go to school? (Q1.1)
What's the ...

**1**

vote

**0**answers

29 views

### Is it valid to make predictions on both the train and test set?

This seems like such an elementary question, but I can't seem to find a straight answer.
My data is on 300 U.S. counties, and each county has info on income, race, age, etc. Goal is to predict, say, ...

**0**

votes

**0**answers

13 views

### Why do we need to include the main effect terms when we have an interaction/composite term? [duplicate]

I remember this was one of the things I heard quite often in my stats classes. Some statistical software would even warn me that the model is incomplete if I include an interaction term ...

**2**

votes

**1**answer

24 views

### Why use an offset variable as a predictor instead of just converting outcome to a rate?

I am reporting the results of an analysis where we tested the effect of various demographic predictors on the number of counselling sessions undertaken by participants during a clinical trial. I ran a ...

**0**

votes

**1**answer

17 views

### Does a mixed model of change evaluate mean change or change in means?

The paired t-test evaluates mean change. Mixed models evaluate change in means. How can the two be compatible? For balanced data it works - paired t-test = random intercept model = GLS with compound ...

**0**

votes

**0**answers

17 views

### In R, does passing the IPW weights to the geeglm function in the geepack package handle the Missing At Random (MAR) scenario?

Dear R users good at statistical modelling, please help me with this question.
I want to use GEE in my analysis. Unfortunately, I have missing observations. It seems it is at least MAR, as I cannot ...

**0**

votes

**0**answers

13 views

### Model probability [virus infection] based on two interconnected events

Suppose we study the influence of two interconnected events on the probability of contracting a virus. The probability depends on two things that must necessarily be present:
1) the person goes to a ...

**0**

votes

**0**answers

70 views

### Mixed model instead of RM anova?

I have data points collected at 4 time points for $N$ subjects. I need to understand if there is a difference in mean readings at these 4 time points, and also if age and gender influence these mean ...

**0**

votes

**2**answers

33 views

### Interpretation of the coefficients in quantile regression - a discrepancy between sources

Various sources say that the interpretation of quantile regression is much like that of linear regression, with the difference that it is now about medians rather than means.
Like ...

**1**

vote

**1**answer

28 views

### How to interpret the output from the robust regression in terms of the expected value?

Regression is a way to model the relationship between the conditional expected value and the predictors. But in robust regression we don't have the expected value (say, arithmetic mean for ...

**1**

vote

**0**answers

32 views

### GLM for Mixed Additive/Multiplicative Effects in R?

I'm in a situation where I'm fitting a GLM to a multivariate data set and presenting the outputs to a client. However (long story ...) I have just found out that the client needs the results to take ...

**0**

votes

**0**answers

16 views

### Nested model, trt, week and period, how to code this?

I have a dataset and I want to see if there is a difference between trt, week and period. The data:
...

**2**

votes

**2**answers

31 views

### Difference between learning algorithm and model

Are logistic regression, linear regression, and SVM learning algorithms or models?
I see that some literature says K-NN is a learning algorithm and not a model.

**1**

vote

**0**answers

32 views

### Handling rare levels in a categorical variable? (or maybe it's not categorical at all)

I have a dataset where I'm trying to predict completion time of an application. There are a number of numeric and categorical predictors, with one group of predictors being holds. An application may ...

**0**

votes

**0**answers

20 views

### Trouble Analyzing Experimental Data

I'm having trouble analyzing the data I collected from experiments.
In my experimental design I'm testing how sound intensities and the interval between sound stimuli influence the reflex response in ...

**2**

votes

**1**answer

38 views

### What does it mean “they differ in parameter space” regarding the compound symmetry and random intercept model?

I read discussions on how the random intercept model is not equivalent to compound symmetry. I understand that the CS model allows for a case where the responses are more similar across subjects ...

**0**

votes

**1**answer

18 views

### Why is the binomial model preferable to the hypergeometric model?

I am working through notes on the formulation of statistical models,
looking at the following example of estimating a population proportion.
The following is said:
"estimation of a proportion is often ...

**0**

votes

**0**answers

35 views

### Why are total indices in Sobol decomposition so misleading when the output is random?

When we do sensitivity analysis with Sobol indices we usually report two sets of results:
First-order indices which represent the portion of variance in model output that can be explained by varying a ...