Likelihood of rejecting a fair coin (repeated significance testing)

Suppose I have a fair coin and I flip it numerous times, testing after every flip using Pearson's $\chi^2$ test of fit to fairness. What is the likelihood that I will, at some point, reject that the coin is fair (for a given $\alpha$)? If (as I suspect) that likelihood is $1$, is there an expected number of flips after which I'll reject that the coin is fair?





(This precise example comes up when trying to explain to coworkers by analogy why one can't repeatedly peek at split-test results without accounting for the repeated testing.)
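For intuition, here is a minimal simulation sketch of that peeking effect; the 1,000-flip horizon, $\alpha = 0.05$, and its critical value $3.841$ are illustrative assumptions, not part of the question itself:

```python
import random

def first_rejection(n_flips=1000, crit=3.841, seed=None):
    """Flip a fair coin, testing after every flip with the Pearson statistic
    T = 4 * n * (mean - 1/2)**2; return the index of the first rejection,
    or None if the null is never rejected within the horizon."""
    rng = random.Random(seed)
    heads = 0
    for n in range(1, n_flips + 1):
        heads += rng.random() < 0.5
        if 4 * n * (heads / n - 0.5) ** 2 > crit:
            return n
    return None

# Fraction of runs in which "peeking" rejects a genuinely fair coin.
trials = 2000
hits = sum(first_rejection(seed=i) is not None for i in range(trials))
print(f"rejected at least once in {hits / trials:.0%} of runs")  # well above 5%
```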










Tags: statistical-significance chi-squared goodness-of-fit frequentist

asked 6 hours ago by msh210

2 Answers






          This is a problem involving what are essentially "ruin probabilities" of a stochastic process. The fact that the tests go on forever is not sufficient to imply that the probability of eventual rejection is one. You would need to establish this formally via analysis of the ruin probability. It is also notable that the tests are not independent, since the test statistic at higher $n$ values is related to the test statistic at lower $n$ values. I will set up the problem for you below, and give you an outline of how you can prove your conjecture.





Setting up the problem: You have a sequence $X_1, X_2, X_3, \ldots \sim \text{IID Bern}(\theta)$ and your null hypothesis of a fair coin is $H_0: \theta = \tfrac{1}{2}$. After $n$ tosses the expected number of positive indicators is $\mathbb{E}(n \bar{X}_n) = \tfrac{1}{2} n$, so the Pearson test statistic (under the null hypothesis) is:



$$\begin{equation} \begin{aligned}
T(\mathbf{x}_n)
&= n \Bigg[ \frac{(\bar{x}_n - \tfrac{1}{2})^2}{\tfrac{1}{2}} + \frac{(1-\bar{x}_n - \tfrac{1}{2})^2}{\tfrac{1}{2}} \Bigg] \\[6pt]
&= n \Bigg[ \frac{(\bar{x}_n - \tfrac{1}{2})^2}{\tfrac{1}{2}} + \frac{(\tfrac{1}{2}-\bar{x}_n)^2}{\tfrac{1}{2}} \Bigg] \\[6pt]
&= 4 n (\bar{x}_n - \tfrac{1}{2})^2.
\end{aligned} \end{equation}$$
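(As a quick sanity check of that simplification, one can compare the two-cell Pearson form against the closed form on arbitrary counts; the numbers below are purely illustrative.)

```python
# Illustrative check: the two-cell Pearson statistic with expected counts n/2
# matches the closed form 4 * n * (mean - 1/2)**2.
n, heads = 100, 61          # arbitrary example counts
tails = n - heads
pearson = (heads - n / 2) ** 2 / (n / 2) + (tails - n / 2) ** 2 / (n / 2)
closed = 4 * n * (heads / n - 0.5) ** 2
assert abs(pearson - closed) < 1e-9
print(pearson, closed)      # both equal 4.84
```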



Higher values of the test statistic are more conducive to the alternative hypothesis, and for large $n$ we have the asymptotic null distribution $T(\mathbf{X}_n) \sim \text{ChiSq}(1)$. We define the critical point $\chi_{1,\alpha}^2$ by:



$$\alpha = \int \limits_{\chi_{1,\alpha}^2}^\infty \text{ChiSq}(r|1) \, dr.$$
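In practice this critical point comes from the chi-squared quantile function; e.g., with SciPy (a sketch, with $\alpha = 0.05$ as an assumed example):

```python
from scipy.stats import chi2

alpha = 0.05
crit = chi2.ppf(1 - alpha, df=1)  # upper-alpha quantile of ChiSq(1)
print(crit)                       # approx. 3.841
```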



Then, assuming you use the chi-squared approximation for your p-value (rather than the exact distribution of the test statistic), we have the rejection region:



$$\text{Reject } H_0 \quad \quad \quad \iff \quad \quad \quad 4n (\bar{x}_n - \tfrac{1}{2})^2 > \chi_{1, \alpha}^2.$$



          Hence, the "ruin probability" you are looking for is:



$$W(\alpha) \equiv \mathbb{P} \Big( (\exists n \in \mathbb{N}): 4n (\bar{X}_n - \tfrac{1}{2})^2 > \chi_{1, \alpha}^2 \Big).$$





Establishing the result: You have conjectured that $W(\alpha) = 1$ for all $0 < \alpha < 1$. The rejection events in your sequence are not independent, which makes the problem tricky. Nevertheless, it is possible to form an infinite sequence of events that each have a positive lower bound on the probability of rejection, regardless of the previous data. We do this by splitting the tosses into subsequences and conditioning on a sample mean of one-half in the previous data.



To prove your conjecture, define a sequence of positive integer values $m_1^*, m_2^*, m_3^*, \ldots$ so that we divide the sequence of tosses into finite subsequences of these lengths. Let $\bar{X}_k^*$ be the sample mean corresponding to the $k$th subsequence, and note that these sample means are independent.



          Define the test statistics up to the ends of these subsequences as:



$$T_k \equiv T(\mathbf{x}_{n_k}) = 4 n_k \Bigg( \frac{1}{n_k} \sum_{i=1}^{n_k} x_i - \frac{1}{2} \Bigg)^2 \quad \quad \quad \quad n_k \equiv \sum_{i=1}^k m_i^*.$$



          With a little algebra we can show that:



$$T_k = 0 \quad \quad \quad \implies \quad \quad \quad T_{k+1} = 4 m_{k+1}^* (\bar{x}_{k+1}^* - \tfrac{1}{2})^2 \times \frac{m_{k+1}^*}{n_k + m_{k+1}^*}.$$



          Under this condition we have:



$$\text{Reject } H_0 \quad \quad \quad \iff \quad \quad \quad 4 m_{k+1}^* (\bar{x}_{k+1}^* - \tfrac{1}{2})^2 > \frac{n_k + m_{k+1}^*}{m_{k+1}^*} \cdot \chi_{1, \alpha}^2.$$



Now, choose some value $0 < \alpha^* < \alpha$ so that $\chi_{1, \alpha}^2 < \chi_{1, \alpha^*}^2$. Choose each value $m_k^* \in \mathbb{N}$ sufficiently large so that you have:



$$\frac{n_k + m_{k+1}^*}{m_{k+1}^*} \cdot \chi_{1, \alpha}^2 \leqslant \chi_{1, \alpha^*}^2 \quad \quad \text{for all } k = 0,1,2,\ldots.$$
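(Such a choice is always feasible: the inequality rearranges to $m_{k+1}^* \geqslant n_k \, \chi_{1,\alpha}^2 / (\chi_{1,\alpha^*}^2 - \chi_{1,\alpha}^2)$, so the block lengths need only grow roughly geometrically. A sketch that constructs such a sequence, with illustrative $\alpha = 0.05$ and $\alpha^* = 0.01$:)

```python
import math
from scipy.stats import chi2

a = chi2.ppf(1 - 0.05, df=1)   # chi-squared critical value at alpha = 0.05
b = chi2.ppf(1 - 0.01, df=1)   # chi-squared critical value at alpha* = 0.01

blocks, n = [], 0
for k in range(10):
    # Smallest m with (n + m)/m * a <= b, i.e. m >= n * a / (b - a).
    m = max(1, math.ceil(n * a / (b - a)))
    assert (n + m) / m * a <= b + 1e-9
    blocks.append(m)
    n += m
print(blocks)  # block lengths grow roughly geometrically
```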



The condition $4 m_{k+1}^* (\bar{x}_{k+1}^* - \tfrac{1}{2})^2 > \chi_{1, \alpha^*}^2$ is sufficient to ensure rejection in the $(k+1)$th subsequence, even with the lowest possible evidence for rejection in the previous data. This establishes that the probability of rejection in any one of the subsequences is at least $\alpha^*$, regardless of the previous data. Hence, we have:



$$W(\alpha) \geqslant 1 - \prod_{i=1}^\infty (1-\alpha^*) = 1 - 0 = 1.$$
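The infinite-horizon probability cannot be simulated directly, but truncated Monte Carlo estimates (a rough sketch; the horizons, $\alpha = 0.05$ cutoff, and trial count are illustrative) show the rejection probability creeping upward with the horizon, consistent with $W(\alpha) = 1$:

```python
import random

def rejected_by(horizon, rng, crit=3.841):
    """True if the running Pearson test rejects at some n <= horizon."""
    heads = 0
    for n in range(1, horizon + 1):
        heads += rng.random() < 0.5
        if 4 * n * (heads / n - 0.5) ** 2 > crit:
            return True
    return False

rng = random.Random(0)
for horizon in (10, 100, 1000, 10000):
    estimate = sum(rejected_by(horizon, rng) for _ in range(500)) / 500
    print(horizon, estimate)  # estimates increase with the horizon
```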






answered 3 hours ago by Ben (score 2)





















• You do not make explicit, although it is fairly obvious, that the subsequences are not overlapping. – mdewey, 21 mins ago

































For one test with a given $\alpha$, you have a $p$-value and a confidence level $q = 1-p$.



Now, if you do $n$ independent tests, all with the same $p$-value, your confidence $q^n \rightarrow 0$ as $n$ increases. So you are almost sure to fail the test if you repeat independent tests for a large enough $n$.
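A quick numeric illustration of that decay, taking $q = 0.95$ (i.e. $\alpha = 0.05$) as an assumed example:

```python
q = 0.95  # confidence retained after one test at alpha = 0.05
for n in (1, 10, 100, 1000):
    print(n, q ** n)  # 0.95, ~0.60, ~0.006, ~5e-23
```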



Now a $\chi^2$ test for $n+1$ coin tosses is far from independent of the test for the first $n$ tosses, so the convergence of the confidence to 0 has to be much slower than $q^n$. However, the first $100^{n+1}$ tosses are almost independent of the first $100^n$ tosses.



So a test with a fixed $p$-value, repeated for arbitrarily large $n$, is almost sure to fail eventually.



For a small value of $n$, however, the successive Pearson tests suggested will not be independent, but it is intuitive that their $p$-value is strictly lower than implied by $\alpha$.






answered 5 hours ago by Sebapi (score 0)