I am looking to create a function that simulates data arising from a mediation process, where a predictor (X) has an indirect effect on the outcome (Y) through the mediator (M).

I consulted the answers to the following questions:

I would like the function to simulate:

- the mediator and outcome if the user inputs the predictor,
- the predictor and outcome if the user inputs the mediator, or
- the predictor and mediator if the user inputs the outcome.

I would like the user to be able to specify various conditions for simulating the data arising from mediation, including the correlation between `X` and `Y`, the correlation between `X` and `M`, the correlation between `M` and `Y`, and the proportion of the effect mediated. The proportion of the effect mediated (Pm) is the ratio of the indirect effect (`ab`) to the total effect (Wen & Fan, 2015). I would like the function to simulate the data that would yield a mediation model with the conditions specified by the user.

For instance, I would like the function to estimate:

- the total effect if the user inputs the correlation between `X` and `M`, the correlation between `M` and `Y`, and `proportionMediated` (Pm)
- `proportionMediated` if the user inputs the correlation between `X` and `M`, the correlation between `M` and `Y`, and the correlation between `X` and `Y`
- the correlation between `X` and `M` and the correlation between `M` and `Y` (assuming they are equal) if the user inputs the correlation between `X` and `Y` and `proportionMediated`
- the correlation between `X` and `M` if the user inputs the correlation between `M` and `Y`, the correlation between `X` and `Y`, and `proportionMediated`
- the correlation between `M` and `Y` if the user inputs the correlation between `X` and `M`, the correlation between `X` and `Y`, and `proportionMediated`
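One caveat about the cases listed above, offered as a sketch under stated assumptions: if all three variables are standardized, the `a` path equals the correlation between `X` and `M` and the total effect equals the correlation between `X` and `Y`, but the `b` path is a partial regression coefficient and generally differs from the correlation between `M` and `Y`. The hypothetical helper below (`pathsFromCorrelations` is not part of the function in this question) converts three correlations into the implied standardized path coefficients and Pm.

```r
# Sketch: standardized path coefficients implied by three correlations.
# Assumes X, M, and Y are standardized; the function name is hypothetical.
pathsFromCorrelations <- function(rXM, rMY, rXY) {
  a <- rXM                                        # X -> M path
  cTotal <- rXY                                   # total effect of X on Y
  b <- (rMY - rXM * rXY) / (1 - rXM^2)            # M -> Y path, controlling for X
  cPrime <- (rXY - rXM * rMY) / (1 - rXM^2)       # direct effect, controlling for M
  list(a = a, b = b, cTotal = cTotal, cPrime = cPrime,
       proportionMediated = (a * b) / cTotal)
}

paths <- pathsFromCorrelations(rXM = .5, rMY = .6, rXY = .4)
# The direct and indirect effects should sum to the total effect
all.equal(paths$cPrime + paths$a * paths$b, paths$cTotal)
```

Under these assumptions, any three of the four quantities (the three correlations and Pm) determine the fourth, which is what the cases listed above rely on.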

I used the answer to the first link (above) in writing the beginnings of a function:

```
simulateIndirectEffect <- function(x, m, y, a, b, cTotal, proportionMediated, seed){
  if(missing(seed)){
    seed <- round(runif(1, 0, 1000)*100)
  }

  # Derive whichever parameter(s) the user did not supply
  if(missing(cTotal)){
    cTotal <- (a * b) / proportionMediated
  } else if(missing(proportionMediated)){
    proportionMediated <- (a * b) / cTotal
  } else if(missing(a) && missing(b)){
    a <- sqrt(proportionMediated * cTotal)
    b <- sqrt(proportionMediated * cTotal)
  } else if(missing(a)){
    a <- (proportionMediated * cTotal) / b
  } else if(missing(b)){
    b <- (proportionMediated * cTotal) / a
  }

  ab <- a * b
  cPrime <- cTotal - ab

  if(!missing(x)){
    sampleSize <- length(x)
    set.seed(seed + 1)
    m <- a*x + sqrt(1-a^2) * rnorm(sampleSize) #what should I change error term to?
    error <- 1 - (cPrime^2 + b^2 + 2*a*cPrime*b)
    set.seed(seed + 2)
    y <- cPrime*x + b*m + error*rnorm(sampleSize) #what should I change error term to?
  } else if(!missing(m)){
    sampleSize <- length(m)
    set.seed(seed + 1)
    #x <- #Not sure what to put here
    set.seed(seed + 2)
    #y <- #Not sure what to put here
  } else if(!missing(y)){
    sampleSize <- length(y)
    set.seed(seed + 1)
    #x <- #Not sure what to put here
    set.seed(seed + 2)
    #m <- #Not sure what to put here
  }

  simulatedData <- as.data.frame(cbind(x, m, y))
  return(simulatedData)
}
```

I have three questions:

- How can we simulate `m` and `y` given `x` (and the conditions specified) in the above function?
- How can we simulate `x` and `y` given `m` (and the conditions specified) in the above function?
- How can we simulate `x` and `m` given `y` (and the conditions specified) in the above function?
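One possible way to frame all three cases at once, offered as a sketch rather than a definitive answer: if the three variables are assumed to be standardized and jointly normal with a known correlation matrix, the two unobserved variables can be drawn from their conditional distribution given the observed one. The function name `simulateGivenOne`, its arguments, and the correlation matrix `R` below are all hypothetical and chosen for illustration.

```r
# Sketch: draw the two unobserved variables conditional on the observed one.
# Assumes standardized, jointly normal variables with correlation matrix R
# (R must have row/column names and be positive definite).
simulateGivenOne <- function(values, given, R, seed = 1) {
  set.seed(seed)
  z <- as.numeric(scale(values))                    # standardize the observed variable
  others <- setdiff(rownames(R), given)
  condMean <- z %o% R[others, given]                # n x 2 matrix of conditional means
  condCov <- R[others, others] -
    R[others, given] %o% R[others, given]           # 2 x 2 conditional covariance
  noise <- matrix(rnorm(2 * length(z)), ncol = 2) %*% chol(condCov)
  out <- cbind(z, condMean + noise)
  colnames(out) <- c(given, others)
  as.data.frame(out)[, rownames(R)]                 # return columns in the order of R
}

# Illustrative correlation matrix: a = b = sqrt(.24), total effect = .6
a <- sqrt(.24); cPrime <- .6 - .24
rMY <- a + a * cPrime
vars <- c("x", "m", "y")
R <- matrix(c(1, a, .6,
              a, 1, rMY,
              .6, rMY, 1), nrow = 3, dimnames = list(vars, vars))

simFromM <- simulateGivenOne(rnorm(5000, mean = 100, sd = 15), given = "m", R = R)
round(cor(simFromM), 2)
```

The same call with `given = "x"` or `given = "y"` covers the other two cases. Note that this reproduces the correlation structure only; it says nothing about the causal ordering of the variables.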

Note that the function above does not appear to simulate the mediation data per the conditions specified. For instance, when I simulate data based on a total effect of .6 and a proportion of the effect mediated of .4, the correlations are far too high. I want the correlation between `x` and `y` to be .6 (i.e., the total effect), but it is .99 in the simulated data (see below). I suspect that a random variable generated by `rnorm()` with a mean of 0 and SD of 1 is too small to serve as the error term, but I am not sure what to use instead.

```
> predictor <- rnorm(1000, mean = 50, sd = 10)
> myData <- simulateIndirectEffect(x = predictor, cTotal = .6, proportionMediated = .4, seed = 12345)
> round(cor(myData), 2)
     x    m    y
x 1.00 0.98 0.99
m 0.98 1.00 0.99
y 0.99 0.99 1.00
```
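A likely explanation for the inflated correlations, offered as a sketch under stated assumptions: the formulas `sqrt(1-a^2)` and `1 - (cPrime^2 + b^2 + 2*a*cPrime*b)` only hold when `x` has a variance of 1, whereas the predictor here has an SD of 10, so the signal swamps the unit-variance noise. In addition, the second expression is a residual variance, so it would need a square root before multiplying `rnorm()`. The hypothetical function below (`simulateGivenX` is not the function in this question) standardizes `x` first and assumes equal `a` and `b` paths.

```r
# Sketch: simulate m and y from a supplied x under a standardized mediation model.
# Assumes equal a and b paths and a nonnegative indirect effect; name is hypothetical.
simulateGivenX <- function(x, cTotal, proportionMediated, seed = 12345) {
  set.seed(seed)
  n <- length(x)
  xz <- as.numeric(scale(x))              # standardize so that a equals cor(x, m)
  ab <- proportionMediated * cTotal       # indirect effect
  a <- b <- sqrt(ab)                      # assumption: a and b are equal
  cPrime <- cTotal - ab                   # direct effect
  m <- a * xz + sqrt(1 - a^2) * rnorm(n)  # residual SD keeps var(m) at 1
  residVarY <- 1 - (cPrime^2 + b^2 + 2 * a * b * cPrime)
  stopifnot(residVarY >= 0)               # otherwise the conditions are infeasible
  y <- cPrime * xz + b * m + sqrt(residVarY) * rnorm(n)
  data.frame(x = xz, m = m, y = y)
}

set.seed(2024)
predictor <- rnorm(5000, mean = 50, sd = 10)
simData <- simulateGivenX(predictor, cTotal = .6, proportionMediated = .4)
round(cor(simData), 2)
```

Under these assumptions the population correlations are sqrt(.24) ≈ .49 between `x` and `m`, .60 between `x` and `y`, and b + a·c′ ≈ .67 between `m` and `y`; sample values will vary around these.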

References:

Wen, Z., & Fan, X. (2015). Monotonicity of effect sizes: Questioning kappa-squared as mediation effect size measure. Psychological Methods, 20, 193-203. doi: 10.1037/met0000029