# Hypothesis Test for a Linear Combination of Three or More Probabilities

Suppose that you have a parameter w. As you change the value of that parameter, the probability of a given event occurring increases or decreases. I want to use the method of finite differences to estimate the second derivative of that probability at a point, which we'll call w1. The probability of the event occurring at w1 is p1. Define w0 as w1 - h and w2 as w1 + h, where h is a small step size. The probability of the event occurring at w0 is p0 and at w2 is p2.

An estimate of the second derivative of the event probability with respect to w, evaluated at w1, is (p2 - 2*p1 + p0)/(h^2). Since h^2 is a known positive constant, I am effectively estimating the linear combination of probabilities

L = p2 - 2*p1 + p0

with the estimate L̄ = p̄2 - 2*p̄1 + p̄0, where p̄i is the sample proportion observed at wi.
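As a minimal sketch of the estimator above, the following R snippet computes the central second difference from three probabilities. The function name `second_diff` and the step size `h = 0.1` are illustrative assumptions; the probabilities are the ones used in the simulation further down.

```r
# Central finite-difference estimate of the second derivative at w1.
# p0, p1, p2 are the probabilities at w1 - h, w1, and w1 + h.
# NOTE: second_diff and h = 0.1 are hypothetical, chosen for illustration only.
second_diff <- function(p0, p1, p2, h) {
  (p2 - 2 * p1 + p0) / h^2
}

# Linear combination L for p0 = 0.55, p1 = 0.5, p2 = 0.4
L_true <- 0.4 - 2 * 0.5 + 0.55

# Second-derivative estimate for the assumed step size
d2 <- second_diff(p0 = 0.55, p1 = 0.5, p2 = 0.4, h = 0.1)
```

Because h^2 is positive, the sign of the second-derivative estimate is always the sign of L, which is why testing L = 0 is enough.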

I want to test
H0: L = 0
H1: L != 0

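I tried the Wald test:

z2 = (sum_i beta_i*p̄i - lambda)^2 / sum_i (beta_i^2 * p̄i*q̄i / n_i)

where beta is the weight vector (1,-2,1), lambda = 0, q̄i = 1 - p̄i, and n_i is the number of trials used to estimate p̄i. Under H0, z2 ~ Chi-Square(1), approximately.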
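For reference, here is a small sketch of that Wald statistic as a standalone R function, assuming independent binomial samples at each of the three points. The helper name `wald_lincomb` and the counts in the example call are made up for illustration.

```r
# Wald test of H0: sum(beta * p) = lambda for independent binomial samples.
# x: number of successes per group, n: number of trials per group.
# NOTE: wald_lincomb is a hypothetical helper, not part of any package.
wald_lincomb <- function(x, n, beta, lambda = 0) {
  p_hat   <- x / n
  L_hat   <- sum(beta * p_hat)
  # Estimated variance of L_hat, assuming the groups are independent
  var_hat <- sum(beta^2 * p_hat * (1 - p_hat) / n)
  z2      <- (L_hat - lambda)^2 / var_hat
  # Two-sided p-value from the chi-square(1) reference distribution
  p_value <- pchisq(z2, df = 1, lower.tail = FALSE)
  list(estimate = L_hat, z2 = z2, p_value = p_value)
}

# Illustrative (invented) counts: 1000 trials at each of w0, w1, w2
res <- wald_lincomb(x = c(550, 500, 400), n = rep(1000, 3), beta = c(1, -2, 1))
```

Rejecting when `res$p_value < 0.05` is equivalent to the check `pchisq(z2, df = 1) > 0.95` used in the simulation.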

I ran the following simulation to see if the Wald test works well for this:

Initialize probabilities.

p2 = 0.4
p1 = 0.5
p0 = 0.55


Generate a Uniform(0,1) random number. If it is less than the given probability (p2, p1, or p0), return 1; otherwise return 0. These simulated outcomes are how we'll estimate p2, p1, and p0.

# Simulate one Bernoulli trial: returns 1 with probability p, otherwise 0.
game = function(p) {
  u = runif(1, min = 0, max = 1)
  if (u < p) {
    return(1)
  } else {
    return(0)
  }
}


We will begin a simulation and keep track of our estimates of p2, p1, and p0 (called p2e, p1e, and p0e in the code). After each game beyond the first 10, we will run a hypothesis test with:
H0: L = |p2e - 2*p1e + p0e| = 0
H1: L = |p2e - 2*p1e + p0e| > 0
We will repeat this hypothesis test until we can reject H0 with 95% confidence. Once that confidence level is reached, we will stop sampling and record s = sign(p2e - 2*p1e + p0e). If s < 0, then L was correctly identified as negative. If s > 0, it was mistakenly identified as positive due to random chance. We'll then keep track of the percentage of the 1000 iterations of this process that correctly identified L as negative.

results = c()
j = 1
while (j <= 1000) {
  # Outcomes observed so far at w2, w1, and w0 in this iteration.
  w2 = c()
  w1 = c()
  w0 = c()
  i = 1
  while (i < Inf) {
    w2 = c(w2, game(p2))
    w1 = c(w1, game(p1))
    w0 = c(w0, game(p0))
    # Running estimates of the three probabilities after i games each.
    p2e = sum(w2)/i
    p1e = sum(w1)/i
    p0e = sum(w0)/i
    # Wald statistic for H0: L = 0.
    l = p2e - (2*p1e) + p0e
    d = ((p2e*(1-p2e)) + (4*p1e*(1-p1e)) + (p0e*(1-p0e)))/i
    z2 = (l^2)/d
    # The smaller i is, the higher the probability that d is zero and thus
    # z2 is undefined. I set i > 10 because I couldn't figure out how to use
    # tryCatch().
    if (i > 10) {
      if (pchisq(q = z2, df = 1, lower.tail = TRUE) > 0.95) {
        # Record 1 if L was correctly identified as negative, 0 otherwise.
        if (l < 0) {
          result = 1
        } else {
          result = 0
        }
        break
      }
    }
    i = i + 1
  }
  results = c(results, result)
  j = j + 1
}

# Proportion of iterations that identified L as negative.
percent_correct = sum(results)/(j-1)


The true L is -0.05 and I set my confidence to 95%. Thus, if the Wald test were applicable here, we would expect that in approximately 950 of the 1000 iterations, L was correctly identified as negative and in approximately 50 of the 1000 iterations it was mistakenly identified as positive. This should be the case because we steadily increased our sample size n until 95% confidence was achieved. However, this is not what I found. I got 871 successes. Pr(Binomial(n=1000,p=0.95)<=871) is virtually zero. This seems to suggest that either (1) the Wald test doesn't work here, (2) I'm using it wrong, or (3) there's some error in my code.
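The tail probability quoted above can be checked with base R's `pbinom`. This is only a sanity check of the arithmetic, under the assumption that each of the 1000 iterations would independently be correct with probability 0.95.

```r
# Probability of 871 or fewer correct sign calls out of 1000 iterations,
# if each iteration were independently correct with probability 0.95.
tail_prob <- pbinom(871, size = 1000, prob = 0.95)

# Observed proportion of correct sign calls
obs_rate <- 871 / 1000
```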

I ran another simulation using the process outlined in Martín Andrés et al. (2012) and didn't get much more encouraging results.

Does anyone have any suggestions? Thanks in advance.

Martín Andrés, Antonio; Herranz, Inmaculada; Álvarez Hernández, María (2012). The optimal method to make inferences about a linear combination of proportions. Journal of Statistical Computation and Simulation, 82, 123-135. doi:10.1080/00949655.2010.530601

Method of finite differences: http://en.wikipedia.org/wiki/Finite_difference_method

• +1. At first sight, your problem looks like it's due to not accounting for the (relatively strong negative) covariances between the estimates of the individual proportions. But I won't try to guess what your code does or what you intend it to do... . You will greatly increase your chances of getting good help if you would explain the code. – whuber Mar 24 at 14:16
• @whuber Thanks for the advice. I tried to flesh out my code a bit. Could you explain why you think there's covariance between the estimates of the proportions? – Patrick Mar 25 at 2:54
• It's unclear, because you haven't given enough context, but typically anyone estimating a linear combination of proportions is involved in estimating common proportions of some whole. In that case, because the proportions must sum to unity, they will be negatively correlated. – whuber Mar 25 at 14:27
• Ah ok. Now I see why you might suspect covariance. I'll edit my question shortly. – Patrick Mar 26 at 1:55
• Thank you for providing the extra detail. Your problem lies in the stopping algorithm: by ending the simulation once a certain confidence is reached, you are biasing the results. See ucslk.com/questions/310119 for a thread that directly addresses this. – whuber Mar 26 at 10:19
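
The stopping-rule effect described in the last comment can be illustrated with a small simulation. This is a hedged sketch, not the code from the question: it sets all three probabilities to 0.5 so that H0 (L = 0) is true, then compares testing after every game (stopping at the first rejection) with a single test at a fixed sample size. The values of `p`, `n_max`, and `n_rep`, and the helper `one_run`, are arbitrary illustrative choices.

```r
set.seed(1)
beta  <- c(1, -2, 1)
p     <- c(0.5, 0.5, 0.5)   # assumed values: L = 0, so H0 is true
n_max <- 500                # assumed cap on games per run
n_rep <- 500                # assumed number of simulated runs
crit  <- qchisq(0.95, df = 1)

# One run: returns whether H0 was rejected at ANY look after game 10
# (sequential) and whether it was rejected at the single look at n_max (fixed).
one_run <- function() {
  # n_max x 3 matrix of cumulative success counts, one column per group
  x <- sapply(p, function(prob) cumsum(rbinom(n_max, 1, prob)))
  n <- seq_len(n_max)
  p_hat <- x / n                                   # row i is divided by i
  l  <- as.vector(p_hat %*% beta)                  # running estimate of L
  d  <- as.vector((p_hat * (1 - p_hat)) %*% beta^2) / n
  z2 <- l^2 / d
  look <- n > 10
  c(sequential = any(z2[look] > crit, na.rm = TRUE),
    fixed      = isTRUE(z2[n_max] > crit))
}

# Rejection rates under H0 for the two strategies
rates <- rowMeans(replicate(n_rep, one_run()))
```

In runs of this kind the `sequential` rate tends to come out well above the nominal 5%, while the `fixed` rate stays near it. The 95% level describes a single test at a sample size chosen in advance, not the first time a repeatedly computed statistic crosses the threshold.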