Questions tagged [modeling]

This tag describes the process of creating a statistical or machine learning model. Always add a more specific tag.

0
votes
1answer
5 views

Is there any formal guideline that would indicate the necessity for adjustment for baseline when analysing change from baseline?

As in the title. I have seen many critical discussions, along with mathematical explanations, of why the baseline shouldn't (or mustn't) be used as a covariate when analysing the change from baseline. ...
0
votes
0answers
4 views

Do LS-means correspond only to type 3 (marginal) ANOVA or do they match the type 2 too?

I understand that sequential type 1 ANOVA corresponds to unadjusted, raw (data-based) arithmetic means, and marginal type 3 ANOVA corresponds to LS-means (model-based predictions). In case of no ...
0
votes
0answers
20 views

Using own data as starting parameters for modeling

I'm currently having a heated debate with coworkers on whether it's acceptable to use estimates derived directly from the data as starting parameters for modeling. For example, if I want to fit a ...
0
votes
0answers
19 views

Whether to cap the dependent variable while treating the outliers?

So I am trying to run a linear regression model in R where the objective is to identify what's driving the credit card spends including both primary and secondary. I have a dataset with 10000 obs I ...
0
votes
0answers
10 views

How do the units of the SIR model cancel out?

I was having trouble trying to understand the parameters of the simplest SIR model. If beta is the effective contact rate and s is the percentage of people who are susceptible, then how do the units ...
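A brief dimensional sketch, assuming the standard normalized SIR form (the excerpt is truncated, so the asker's exact equations are not known here; $i$, $r$ and $\gamma$ are the usual infected fraction, recovered fraction and recovery rate, introduced for illustration):

$$
\frac{ds}{dt} = -\beta\, s\, i, \qquad
\frac{di}{dt} = \beta\, s\, i - \gamma\, i, \qquad
\frac{dr}{dt} = \gamma\, i
$$

If $s$, $i$ and $r$ are fractions of the population, they are dimensionless, so the left-hand sides have units of $1/\text{time}$. Under that assumption $\beta$ and $\gamma$ must each carry units of $1/\text{time}$, and every term balances: $[\beta\, s\, i] = (1/\text{time}) \cdot 1 \cdot 1$.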
1
vote
0answers
16 views

Synthesizing crowd-sourced data to get a consensus estimate

I have some data of different people's estimates for the value of a numerical variable for a bunch of different objects, and I want to get a consensus estimate of the value of the variable for each ...
1
vote
1answer
26 views

Formal method to predict probability of a continuous variable

I am trying to build a regression to model the CDF, i.e., to predict the probability that a continuous variable exceeds an arbitrary threshold. I explored using quantile regression, but it seems that I have ...
1
vote
0answers
5 views

Exposure onset unknown in time to event analyses

I want to model the time to an event (cancer) in a group of patients exposed to, e.g., a carcinogenic substance. I now have some people where I do not know if they were already exposed to the substance ...
1
vote
0answers
26 views

How to find the factor that contributes most to an event

I have two datasets. Dataset#1 consists of information on patients who each have one of three diseases and the hospital where they are treated. Each patient has only one disease. ...
0
votes
0answers
9 views

Literature on applying XGBoost to Time Series Data

I'm currently working on a time-series model with very limited data. However, most of the independent variables I have are cross-sectional rather than time-dependent. As such I want to apply some ...
0
votes
0answers
14 views

Hedge fund rank on their returns or rating predictions modeling problem - How to find patterns between return and metrics

Problem: Hi, I'm a new machine learning practitioner. I have a dataset about hedge funds. It contains monthly hedge fund returns and some financial metrics. I calculated metrics for every month from ...
2
votes
2answers
65 views

Inference, Prediction, & Model Fit?

I have a background in statistics (for social science), but I am confused about the ways in which Data Science textbooks (in particular, An Introduction to Statistical Learning and Practical ...
0
votes
0answers
9 views

Removing the effect of Time series X on time series Y, when their relation is unknown

I am working on a dataset of six years of measurements of a water quality parameter called 'chla' (parameter 'X'), measured by a sensor each year from May to October. The parameter has its own trend ...
0
votes
0answers
15 views

Alternative to Poisson process with non-independent events?

Correct me if I'm wrong, but as far as I know, a Poisson distribution assumes the events are independent of each other. I have a dataset of events over several years where the occurrence of an event may ...
0
votes
0answers
18 views

Can I treat count data as continuous in Quantile regression?

I have data where the response is the number of dengue cases per year from 2013 to 2019. The yearly counts are large: many values are in the thousands, and the minimum is 228. I am ...
0
votes
0answers
10 views

Bounded Model Prediction Error

I have a predictive model (not ML based, uses first principles from a science textbook) and I would like to have a confident bound on the error of the predictions. I am able to collect many ...
0
votes
0answers
27 views

Using ARIMA to explain time series data

I am creating models to analyze energy use in four U.S. states. The goal is to create a model to explain historical data (1960-2009) as well as to create a forecast for 2025 and 2050. I am using R and ...
1
vote
0answers
24 views

Expressibility of VAR(1) models

Am I correct in understanding that vector autoregressive (VAR) models of order one can capture seemingly more general modeling frameworks such as VAR(p) models, for orders $p > 1$, and ARMA models? ...
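A minimal sketch of the VAR(p) part of this question, using the standard companion-form construction: the stacked state $z_t = (y_t, y_{t-1}, \dots, y_{t-p+1})$ follows a VAR(1) whose coefficient matrix has the original coefficient matrices in its top block row and identity blocks below. The AR(2) coefficients, starting values and shocks are made-up illustrative numbers, and `companion_matrix` and `matvec` are hypothetical helper names. The ARMA case is not covered here.

```python
def companion_matrix(coefs):
    """Stack VAR(p) coefficient matrices A_1..A_p (each k x k) into a kp x kp matrix."""
    p = len(coefs)
    k = len(coefs[0])
    size = k * p
    F = [[0.0] * size for _ in range(size)]
    # Top block row holds [A_1 A_2 ... A_p].
    for j, A in enumerate(coefs):
        for r in range(k):
            for c in range(k):
                F[r][j * k + c] = float(A[r][c])
    # Identity blocks below the top row shift each lag down by one position.
    for i in range(k * (p - 1)):
        F[k + i][i] = 1.0
    return F


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Illustrative scalar AR(2): y_t = 0.5 y_{t-1} + 0.3 y_{t-2} + e_t  (k = 1, p = 2)
A1, A2 = [[0.5]], [[0.3]]
shocks = [0.1, -0.2, 0.05, 0.0, 0.3]  # made-up innovations

# Direct AR(2) recursion starting from y_0 = 1, y_1 = 2.
y = [1.0, 2.0]
for e in shocks:
    y.append(0.5 * y[-1] + 0.3 * y[-2] + e)

# Equivalent VAR(1) recursion on the stacked state z_t = (y_t, y_{t-1}).
F = companion_matrix([A1, A2])
z = [2.0, 1.0]
path = []
for e in shocks:
    z = matvec(F, z)
    z[0] += e  # the shock enters only the first block of the stacked state
    path.append(z[0])

print(F)
print(all(abs(a - b) < 1e-12 for a, b in zip(path, y[2:])))
```

Note that the stacked system has a degenerate innovation: only the first block of $z_t$ receives a shock, so it is a VAR(1) with a singular error covariance rather than an unrestricted one.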
0
votes
0answers
18 views

ANOVA of Model Performance metric

I have conducted multiple simulations of a hydrological model under a matrix of scenarios and calculated a Nash-Sutcliffe Efficiency (NSE) value for each simulation. The NSE is a model performance ...
1
vote
1answer
76 views

How is the gamma distribution used in the model developed by the Imperial College COVID-19 Response Team?

The Imperial College COVID-19 Response Team report mentions, "Individual infectiousness is assumed to be variable, described by a gamma distribution with mean 1 and shape parameter 0.25." With that ...
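As a rough illustration of what the quoted parameterization implies (not of how the report's model uses it), a gamma distribution with mean 1 and shape 0.25 has scale = mean / shape = 4 and variance = shape × scale² = 4. The sketch below draws from it with Python's standard library; the seed and sample size are arbitrary choices.

```python
import random

shape = 0.25           # shape parameter quoted in the report
mean = 1.0             # mean quoted in the report
scale = mean / shape   # gamma mean = shape * scale
variance = shape * scale ** 2

random.seed(0)  # arbitrary seed, only for reproducibility
# random.gammavariate takes the shape first, then the scale.
draws = [random.gammavariate(shape, scale) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
share_below_mean = sum(d < mean for d in draws) / len(draws)

print(f"scale={scale}, variance={variance}")
print(f"sample mean={sample_mean:.3f}, share below mean={share_below_mean:.3f}")
```

With a shape below 1 the density is strongly right-skewed, so a majority of draws fall below the mean while a few draws are much larger.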
7
votes
2answers
151 views

Detailed description/scripts of mathematical models for Coronavirus

Pretty much the title, I am looking for some more in-depth explanation of the models used in the papers from Imperial College and The Lancet. In the second one, they are using something called a ...
0
votes
0answers
8 views

How to connect distribution selection and model selection in generalized linear models [duplicate]

I am trying to better understand the general process of choosing a distribution family and linear predictor for a generalized linear model. There are plenty of examples out there for specific data ...
0
votes
0answers
57 views

Recursive Linear Least Squares

In the following question 10, the value of $Q_{N}$ is asked. By using the given $\alpha$ value, I assume the value of $Q_{N}$ converges to zero since $\phi = 1$ and the $Q_N^{-1}$ and $\theta(N+1)$ will ...
0
votes
0answers
14 views

Standardising coefficients of Logistic Regression Model

While trying to interpret and use the coefficients of a logistic regression model, I'm facing two sets of problems: how do I standardize all the coefficients and bring them to a common scale? ...
0
votes
0answers
18 views

TimeSeries vs Dataset with timestamp feature

I am not able to provide the exact values of the dataset due to data privacy issues. The variables I am using in my dataset are: Date (2007 to 2019), [A, B, C, D (Categorical Variables that doesn't ...
0
votes
0answers
16 views

LASSO CV returns diverging tuning parameter

I am trying to fit a series of LASSO models, but my CV code keeps returning increasingly larger tuning parameters. Specifically, I am working with a dataset with about 980,000 observations, about 180 ...
2
votes
1answer
22 views

Correct interpretation of estimates in Poisson regression output

I am learning to use and validate the Poisson regression model and interpret the results. I am using some data on grassland plant diversity in response to fertilizer and light. The experimental design ...
0
votes
0answers
28 views

Are positively biased bootstrap-derived GAM predictions indicative of model issues?

Hi all, I am using a negative binomial GAM fitted with mgcv::gam to estimate counts for new data, and I wanted to use bootstrapping to find a 95% confidence interval for point estimates. In my ...
1
vote
0answers
15 views

Difference between Gaussian Process Regression and Kriging - Regressive vs Interpolative?

I am using different machine learning models to model a noisy dataset for a study. I came across the fitrgp model in MATLAB, which models the data using Gaussian process regression. I am also using dacefit ...
0
votes
1answer
15 views

Model specifications - Independent variables interaction: hierarchy principle

I am testing the effect of commodity demand shocks on the foreign exchange market. Because my hypotheses include three-way interaction effects, I test them using a hierarchical regression model. ...
0
votes
0answers
18 views

Can PCA be used for categorical variables when the only thing you want to explore is the explained variance and not the ordination of the data?

I need to know how effective a set of categorical variables is in explaining the pattern of species distribution, in order to model a potential distribution. I have already explored this with numerical ...
0
votes
0answers
10 views

Calculating a cumulative effect

I ran a GLMM on insect count data that we collected once in summer and once in autumn at wetlands and non-wetlands. Now, reviewers ask me to check for cumulative effects without any further ...
0
votes
0answers
7 views

How to approach modelling based on question trees

I'm trying to cluster individuals based on survey data. The thing is, this survey sometimes has question trees, like this: Do you have kids? (Q1) Do all the kids go to school? (Q1.1) What's the ...
1
vote
0answers
29 views

Is it valid to make predictions on both the train and test set?

This seems like such an elementary question, but I can't seem to find a straight answer. My data is on 300 U.S. counties, and each county has info on income, race, age, etc. Goal is to predict, say, ...
0
votes
0answers
13 views

Why do we need to include the main effect terms when we have interaction/composite term? [duplicate]

I remember this was one of the things I heard quite often in my stats classes. Some statistical software would even warn me that the model is incomplete if I include an interaction term ...
2
votes
1answer
24 views

Why use an offset variable as a predictor instead of just converting outcome to a rate?

I am reporting the results of an analysis where we tested the effect of various demographic predictors on the number of counselling sessions undertaken by participants during a clinical trial. I ran a ...
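One way to see the difference the title asks about, as a minimal sketch with made-up numbers (the `counts` and `weeks` values are hypothetical and not from the question): in an intercept-only Poisson model with a log link and offset log(exposure), the maximum-likelihood rate is total events divided by total exposure, whereas converting each outcome to a rate first and averaging gives every participant equal weight regardless of exposure.

```python
# Hypothetical data: session counts and weeks of follow-up per participant.
counts = [2, 0, 5, 1]
weeks = [4.0, 1.0, 10.0, 2.0]

# Intercept-only Poisson model with offset log(weeks):
# the maximum-likelihood rate is total events over total exposure.
offset_rate = sum(counts) / sum(weeks)

# Converting each outcome to a rate and averaging weights every
# participant equally, however long they were observed.
mean_of_rates = sum(c / w for c, w in zip(counts, weeks)) / len(counts)

print(f"offset-model rate: {offset_rate:.4f} per week")
print(f"mean of individual rates: {mean_of_rates:.4f} per week")
```

The two summaries differ whenever exposure varies across participants; the offset version also keeps the outcome as a count, so the Poisson variance assumption is applied to the counts rather than to ratios.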
0
votes
1answer
17 views

Does mixed model of change evaluate mean change or change in means?

The paired t-test evaluates mean change. Mixed models evaluate change in means. How can the two be compatible? For balanced data it works - paired t-test = random intercept model = GLS with compound ...
0
votes
0answers
17 views

In R, does passing the IPW weights to the geeglm function in the geepack package handle the Missing At Random (MAR) scenario?

Dear R users good at statistical modelling, please help me with this question. I want to use GEE in my analysis. Unfortunately, I have missing observations. It seems it is at least MAR, as I cannot ...
0
votes
0answers
13 views

Model probability [virus infection] based on two interconnected events

Suppose we study the influence of two interconnected events on the probability of contracting a virus. The probability depends on two things that must necessarily be present: 1) the person goes to a ...
0
votes
0answers
70 views

Mixed model instead of RM anova?

I have data points collected at 4 time points for $N$ subjects. I need to understand if there is a difference in mean readings at these 4 time points, and also if age and gender influence these mean ...
0
votes
2answers
33 views

Interpretation of the coefficients in quantile regression - a discrepancy between sources

Various sources say that the interpretation of quantile regression is much like that of linear regression, with the difference that it is now about medians rather than means. Like ...
1
vote
1answer
28 views

How to interpret the output from the robust regression in terms of the expected value?

Regression is a way to model the relationship between the conditional expected value and the predictors. But in the robust regression we don't have the expected value (say, arithmetic mean for ...
1
vote
0answers
32 views

GLM for Mixed Additive/Multiplicative Effects in R?

I'm in a situation where I'm fitting a GLM to a multivariate data set and presenting the outputs to a client. However (long story ...) I have just found out that the client needs the results to take ...
0
votes
0answers
16 views

Nested model, trt, week and period, how to code this?

I have a dataset and I want to see if there is a difference between trt, week and period. The data: ...
2
votes
2answers
31 views

Difference between learning algorithm and model

Are logistic regression, linear regression, and SVM learning algorithms or models? I see that some literature says K-NN is a learning algorithm and not a model.
1
vote
0answers
32 views

Handling rare levels in a categorical variable? (or maybe it's not categorical at all)

I have a dataset where I'm trying to predict completion time of an application. There are a number of numeric and categorical predictors, with one group of predictors being holds. An application may ...
0
votes
0answers
20 views

Trouble Analyzing Experimental Data

I'm having trouble analyzing the data I collected from experiments. In my experimental design I'm testing how sound intensities and the interval between sound stimuli influence the reflex response in ...
2
votes
1answer
38 views

What does “they differ in parameter space” mean regarding the compound symmetry and random intercept models?

I read discussions on how the random intercept model is not equivalent to compound symmetry. I understand that the CS model allows for a case where the responses are more similar across subjects ...
0
votes
1answer
18 views

Why is the binomial model preferable to the hypergeometric model?

I am working through notes on the formulation of statistical models and looking at the following example of estimating a population proportion. The following is said: "estimation of a proportion is often ...
0
votes
0answers
35 views

Why are total indices in Sobol decomposition so misleading when the output is random?

When we do sensitivity analysis with Sobol indices we usually report two sets of results: First-order indices which represent the portion of variance in model output that can be explained by varying a ...
