User-based vs Clustering-based Collaborative Filtering

Reading about recommender systems in this blog, I found that KNN (k-nearest neighbors) can be used for user-item (user-based) collaborative filtering to find similar users. But in another category of ...
Confusion on scikit-learn nested cross validation example

There are a ton of threads on nested cross-validation. "An intuitive understanding of each fold of a nested cross validation for parameter/model tuning" gives a good explanation. scikit-learn has an ...
LOOCV in Caret package (randomForest example) - not unique results

Here are my doubts: as far as I know, there is only a single way to perform LOOCV for a model (i.e. testing each one of the N elements against the model trained on the other N-1 elements). Namely, ...
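The leave-one-out procedure described in this excerpt can be sketched as follows. This is only an illustration under stated assumptions: the "model" is a hypothetical mean-only predictor, and the data values are made up. It is not the caret or randomForest setup from the question. With a deterministic learner like this one the N splits and the result are fixed; a learner with internal randomness, such as a random forest, can give different results between runs unless its seed is fixed.

```python
from statistics import mean

def loocv_squared_errors(values):
    """Return the N held-out squared errors of a mean-only predictor."""
    errors = []
    for i, held_out in enumerate(values):
        training = values[:i] + values[i + 1:]  # the other N-1 elements
        prediction = mean(training)             # "model" fitted on N-1 elements
        errors.append((held_out - prediction) ** 2)
    return errors

# Hypothetical data, used only to exercise the loop.
data = [2.0, 4.0, 6.0, 8.0]
errors = loocv_squared_errors(data)
loocv_mse = mean(errors)
print(loocv_mse)
```

Running the function twice on the same data returns the same list, which is the sense in which LOOCV itself has no randomness in how the folds are formed.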
How does this solution relate to the actual real-life stats problem?

4.62 in Newbold (8 ed): A new warehouse is being designed and a decision concerning the number of loading docks is required. There are two models based on truck-arrival assumptions for the ...
Statistical Model used for predicting number of deaths due to COVID-19 [closed]

What statistical model is used to predict the number of deaths due to COVID-19? Suppose I have a dataset with the number of deaths for the last two months and would like to predict the number of deaths for the next month?...
How to analyze repeated measures when condition changes at each time point?

I have a dataset from a repeated measures experiment that I am trying to analyze. The experiment had 4 possible conditions. Participants were measured on 1 condition and then again on a second ...
Cox frailty model in R

I ran a Cox frailty model (model 1) in R. After adding a new covariate to the model (model 2), the AIC decreases, which suggests that the new model is better than model 1, but the variance of the random effect of model 2 is ...
Iterative solution to Gamma distribution MLE problem

I'm trying to follow the derivation for the MLE parameters of the gamma distribution in [1]. The standard approach is to derive an expression for the log likelihood, differentiate with respect to ...
Is there any formal guideline that would indicate the necessity for adjustment for baseline when analysing change from baseline?

As in the topic. I saw many critical discussions along with mathematical explanation on why the baseline shouldn't (or mustn't) be employed as a covariate, when analysing the change from baseline. ...
Formula of the Chebyshev's inequality for an asymmetric interval

The formula for Chebyshev's inequality for the asymmetric two-sided case is: $$\Pr( l < X < h ) \ge \frac{ 4 [ ( \mu - l )( h - \mu ) - \sigma^2 ] }{ ( h - l )^2 } \ .$$ What I don't understand ...
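As a quick numerical sanity check of the formula quoted in this excerpt, the sketch below simply evaluates the right-hand side. The function name and the example values are hypothetical. In the symmetric case $l = \mu - k\sigma$, $h = \mu + k\sigma$ the expression reduces to the familiar $1 - 1/k^2$.

```python
def asymmetric_chebyshev_bound(mu, sigma, l, h):
    """Evaluate 4[(mu - l)(h - mu) - sigma^2] / (h - l)^2 for l < mu < h."""
    if not l < mu < h:
        raise ValueError("this sketch assumes l < mu < h")
    return 4.0 * ((mu - l) * (h - mu) - sigma ** 2) / (h - l) ** 2

# Symmetric interval with k = 2: should match 1 - 1/k**2.
k = 2.0
symmetric = asymmetric_chebyshev_bound(mu=0.0, sigma=1.0, l=-k, h=k)

# Asymmetric interval (hypothetical numbers).
asymmetric = asymmetric_chebyshev_bound(mu=0.0, sigma=1.0, l=-2.0, h=4.0)
print(symmetric, asymmetric)
```

When the expression evaluates to a negative number, the bound carries no information, since a probability is never below zero.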
What is the ACF plot of $x_t = 0.9 x_{t-2} + w_t$

I am just learning time series, and I am wondering about the following AR(2) model: $x_t = 0.9 x_{t-2} + w_t$, $w_t \sim N(0, \sigma_w^2)$. Please show me the plot of its Autocorrelation Function, or ...
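For a model of this form, the theoretical autocorrelations can be sketched from the Yule-Walker recursion $\rho(h) = \phi\,\rho(h-2)$ with $\rho(0) = 1$. At lag 1 the recursion gives $\rho(1) = \phi\,\rho(1)$, so $\rho(1) = 0$ when $\phi \neq 1$. The function name below is hypothetical, and the sketch returns the values rather than drawing the plot.

```python
def ar2_lag2_acf(phi, max_lag):
    """Theoretical ACF of x_t = phi * x_{t-2} + w_t, assuming |phi| < 1."""
    rho = [1.0, 0.0]  # rho(0) = 1; rho(1) = phi * rho(1) forces rho(1) = 0
    for h in range(2, max_lag + 1):
        rho.append(phi * rho[h - 2])  # Yule-Walker recursion
    return rho[: max_lag + 1]

acf = ar2_lag2_acf(phi=0.9, max_lag=6)
for lag, value in enumerate(acf):
    print(lag, round(value, 4))
```

The odd lags are zero and the even lags decay geometrically as $0.9^{h/2}$, so the plot would show spikes only at even lags.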
How to calculate correlation coefficient and AIC for non-linear estimation in R or Statistica?

I need to compare two non-linear models of growth. The first one is calculated with nls function in R and with non-linear estimation function in Statistica - both programs gave identical results and ...
R: read.csv imports my numeric columns with lots of missing as NULL, how to prevent? [closed]

My data has 63 columns, and one column, 'hours', has more than 50% missing values and is converted to NULL when importing. But the column is very important and needs to be used after cleaning....
KL divergence of categorical distribution with continuous inputs

I want to simulate a process. I have a probability distribution and I have d classes to choose from. The inputs of my distribution are 3d points and it maps each of these points to a d-dimensional ...
Virtual seminars and workshops in Statistics and/or Machine Learning [closed]

I was wondering if there are any webinars in Statistics or Machine Learning one could join through Zoom during these bizarre times. I know that economists have a list of online resources here: http:...
No significant p values after multiple comparison of 126 tests

I am wondering if there are any other reasonable ways to adjust for multiple comparisons when you have such a large number of tests. I have a study with 126 brain regions being scanned in a group (N=20)...
Properties Of Bivariate Distribution Function [closed]

I have trouble with the problem below. Actually, I have no idea how to solve the question. I tried to implement the 4 properties of a bivariate distribution function, but I couldn't. I have derived to ...
mixed models vs anova for within-subject design

I know this topic may have been handled before and I apologise for being so lazy in checking all the other posts. Here's my concern: I have a group of babies (so everything is within-subjects) whose ...
AIC model selection for group studies

In some areas, it is common to fit a model separately to multiple clusters in a data set, for instance fitting a cognitive model separately to data from each participant in an experiment. Model ...
Is there a way to normalize my data of multiple groups to use as a random effect or to incorporate it into my model

I have Retention Efficiency (RE) percentages of 5 food types for sponges. RE is found by ((incurrent food - excurrent food)/incurrent food). I am running a generalized mixed effect model to see ...
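The RE formula given in this excerpt can be written as a small helper. This is a minimal sketch: the function name and the example numbers are hypothetical, and the result is returned as a fraction, to be multiplied by 100 for a percentage.

```python
def retention_efficiency(incurrent, excurrent):
    """RE = (incurrent food - excurrent food) / incurrent food, as a fraction."""
    if incurrent <= 0:
        raise ValueError("incurrent food must be positive")
    return (incurrent - excurrent) / incurrent

# Hypothetical measurement: 200 units flowing in, 50 units flowing out.
re_fraction = retention_efficiency(incurrent=200.0, excurrent=50.0)
print(f"{100 * re_fraction:.1f}%")
```

Note that RE becomes negative whenever the excurrent value exceeds the incurrent one, which matters when choosing a response distribution for the model.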
variation partitioning with a GAMM model including an auto-correlation structure in R

I would like to undertake variation partitioning in a GAM framework in R, as described here: http://r.789695.n4.nabble.com/variance-explained-by-each-term-in-a-GAM-td836513.html However, my gam ...
Regression - Interpretation of coefficients and probability

I am very confused about the output of my regressions. First of all, I am not even sure if I could divide my sample as I did, meaning that by subsampling as I did the variable ESG score is both ...
Estimating ratio of two PDFs where one of them is noisy

I have a list $L_1$ of positive integers, such as $[1, 2, 1, 3, 10, ...]$. There are repetitions. From this list, I sample (with repetition) according to some method (not relevant to my question), and ...
Using the STAN math library [closed]

I would like to use a Matern covariance function for Gaussian process regression in STAN (through RStan). The standard exponential covariance function works without issues ...
Why does component-wise median not make sense in higher dimensions?

I would like to compute the median of a higher-dimensional point set by computing the component-wise median for each individual dimension. The point that consists of the medians of each individual ...
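One way to see a possible issue with the component-wise median is a small constructed example. The points below are hypothetical: three unit vectors in 3D, all lying on the plane $x + y + z = 1$. Their component-wise median is the origin, which is neither one of the points nor inside their convex hull.

```python
from statistics import median

def componentwise_median(points):
    """Take the median of each coordinate independently."""
    return tuple(median(coords) for coords in zip(*points))

# Three hypothetical points; each satisfies x + y + z = 1.
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
center = componentwise_median(points)

print(center)       # coordinates of the component-wise median
print(sum(center))  # compare with 1.0, the value shared by all input points
```

Because every point in the convex hull of the inputs also satisfies $x + y + z = 1$, a coordinate sum different from 1 shows that the result falls outside the hull.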
When and when not to use an A/A test?

I'm curious about the circumstances under which an A/A test is appropriate, vs. when it is not. Here is my current understanding: A/A testing is an empirical method, and the point is to ...
Can a random variable be expressed as a sum of deterministic and random variable?

Say we have a sequence of random variables $\{X_t:t\geq 0\}$ following an unknown stochastic process with distribution $X_t\sim N(\mu_X,\sigma_X^2)$. This idea came to me from the additive noise model....
Compare linear regression models for same variables but different data

I have created a linear regression model for height and weight using UK data, and want to compare this with the height and weight relationship of other countries. What would be an appropriate method ...
Combining information from multiple distributions

I have 13 classes. For each class, I have a different distribution: e.g. For each distribution, the y-axis indicates the probability and the x-axis indicates a count value. Given some input data, I ...
graph convolution network

I am trying to understand papers and lectures on graph convolution networks but whenever I open some paper, I get lost on the very first page. I started with some videos like this and this and papers ...
Custom metrics for multiclass classification when class errors have different weights

I have a multiclass classification problem (e.g. the target variable is made up of 4 different outcomes: Product A, Product B, Product C and NO Product). Not all the errors are equal: for example, if the ...
How to test Multinomial Logistic Regression assumption in R

So I'm currently trying to use a multinomial logistic regression model in R on a data set with 13 variables (mix of continuous and categorical) and 33,000 observations, where the dependent variable ...
First-difference and lags

I am a newbie to time-series econometrics. I want to estimate a model for the association between greenhouse gas emissions and new green technologies. The estimation equation I want to use is CO_{t} = ...
Choosing a rotation method for ESEM

I am trying to decide on which oblique rotation method to use for my ESEM analysis (with MLR estimator). MPLUS provides a number of options (GEOMIN, QUARTIMIN, OBLIMIN, CRAWFER, etc.). I was wondering ...
Linking Correlated Dependent Variable with Independent Variable

I have a Monte Carlo model that generates a distribution of possibilities $X_i$ for the non-normal stochastic process $Z$ it describes. The distribution of $X$ and $Z$ is fat tailed but for the most ...
How can the prediction of a model be assessed?

I just played around with the VGG16 and ResNet56 models trained on the ImageNet dataset and realized, after running some tests, that the prediction confidence of both networks is really high even if ...
What does “version” mean here?

In a paper I read the following statement: "Assumption 1. There is a version of $f(x)$ that is twice continuously differentiable" Note that $f(x)=E(Y|x)$ is an unknown function to be estimated ...
Calculating Confidence Interval for Estimated Parameters of SEIR model

I used a Poisson log-likelihood objective function to fit a curve to data on reported infected cases of COVID-19 using an SEIR model in order to estimate its coefficients. How ...
Transforming a random sample [duplicate]

For a distribution $p(X)$, let $x_1,\ldots,x_n$ be an independent sample from $p(X)$. Consider the one-to-one transformation $h(\cdot)$ such that $Y = h(X)$. If we apply the transformation to each of ...
Determining the power of the test in this question

The following problem is from Devore's Probability and Statistics for Engineering and the Sciences, 8th edition, exercise 8.1 question 33: Reconsider the accompanying sample data on expense ratio(%...
How to use linear regression for prediction [closed]

A taxi company monitoring the safety of its cabs kept track of the number of miles tires had been driven (in thousands) and the depth of the tread remaining (in mm). Their data are displayed in the ...
Can't seem to fit a random slope intercept model because I have missing values; any way around it?

I have a data set with 3 fixed-effect categories: region (2 levels), genus (2 levels), and food (5 levels). I am looking to see if sponges have different retention efficiency of the different food type ...
R: Question about central limit theorem

Hello everyone :) Can you help me, please? I really don't understand my teacher's videos and it is the last part of our 20-page assignment :O In question 1 they ask us to create a Poisson distribution ...
Measures of dispersion which are scale invariant and can handle a mean of 0

Is there a measure of dispersion which is scale-invariant, s.t. I can compare it between datasets of different scale, and which does not have the problem of the Coefficient of Variation, which is undefined ...
Given a sample of $N$ observed values I'd like to test the null hypothesis that they arose from an arbitrary PDF (for which I have the analytical form). There are tests in place that can handle some ...