Likelihood of rejecting a fair coin (repeated significance testing)
Suppose I have a fair coin and I flip it numerous times, testing after every flip using Pearson's $\chi^2$ test of fit to fairness. What is the likelihood that I will, at some point, reject that the coin is fair (for given $\alpha$)? If (as I suspect) that's $1$, then is there an expected number of flips after which I'll reject that the coin is fair?
(This precise example comes up when trying to explain to coworkers by analogy why one can't repeatedly peek at split-test results without accounting for the repeated testing.)
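A minimal simulation sketch of the procedure (illustrative only: the seed, flip budget, and $\alpha = 0.05$ are arbitrary choices, and the statistic is written in its simplified two-cell form $4n(\bar{x}_n - \tfrac{1}{2})^2$):

```python
# Flip a fair coin, run Pearson's chi-squared test after every flip,
# and report the first flip at which the test rejects fairness.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)                 # arbitrary seed
alpha = 0.05
crit = chi2.ppf(1 - alpha, df=1)               # critical value, 1 df

n_flips = 100_000                              # arbitrary flip budget
flips = rng.integers(0, 2, size=n_flips)       # fair coin: 1 = heads
n = np.arange(1, n_flips + 1)
T = 4 * n * (np.cumsum(flips) / n - 0.5) ** 2  # statistic after each flip

hits = np.nonzero(T > crit)[0]
print("first rejection at flip", hits[0] + 1 if hits.size else None)
```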
statistical-significance chi-squared goodness-of-fit frequentist
asked 6 hours ago
msh210
2 Answers
This is a problem involving what are essentially "ruin probabilities" of a stochastic process. The fact that the tests go on forever is not sufficient to imply that the probability of eventual rejection is one. You would need to establish this formally via analysis of the ruin probability. It is also notable that the tests are not independent, since the test statistic at higher $n$ values is related to the test statistic at lower $n$ values. I will set up the problem for you below, and give you an outline of how you can prove your conjecture.
Setting up the problem: You have a sequence $X_1, X_2, X_3, \ldots \sim \text{IID Bern}(\theta)$ and your null hypothesis of a fair coin is $H_0: \theta = \tfrac{1}{2}$. After $n$ tosses the expected number of positive indicators is $\mathbb{E}(n \bar{X}_n) = \tfrac{1}{2} n$, so the Pearson test statistic (under the null hypothesis) is:
$$\begin{aligned}
T(\mathbf{x}_n)
&= n \Bigg[ \frac{(\bar{x}_n - \tfrac{1}{2})^2}{\tfrac{1}{2}} + \frac{(1 - \bar{x}_n - \tfrac{1}{2})^2}{\tfrac{1}{2}} \Bigg] \\[6pt]
&= n \Bigg[ \frac{(\bar{x}_n - \tfrac{1}{2})^2}{\tfrac{1}{2}} + \frac{(\tfrac{1}{2} - \bar{x}_n)^2}{\tfrac{1}{2}} \Bigg] \\[6pt]
&= 4 n (\bar{x}_n - \tfrac{1}{2})^2.
\end{aligned}$$
Higher values of the test statistic are more conducive to the alternative hypothesis, and for large $n$ we have the asymptotic null distribution $T(\mathbf{X}_n) \sim \text{ChiSq}(1)$. We define the critical point $\chi_{1,\alpha}^2$ by:
$$\alpha = \int \limits_{\chi_{1,\alpha}^2}^\infty \text{ChiSq}(r \mid 1) \, dr.$$
Then, assuming you use the chi-squared approximation for your p-value (rather than the exact distribution of the test statistic) we have the rejection region:
$$\text{Reject } H_0 \quad \quad \quad \iff \quad \quad \quad 4n (\bar{x}_n - \tfrac{1}{2})^2 > \chi_{1, \alpha}^2.$$
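As a quick sanity check (illustrative counts, not part of the original answer), the simplified statistic agrees with the two-cell Pearson statistic from scipy.stats.chisquare, and the rejection rule compares it against the $\text{ChiSq}(1)$ critical value:

```python
# Verify 4n(xbar - 1/2)^2 equals the two-cell Pearson statistic, then
# apply the rejection rule T > chi^2_{1, alpha}.
from scipy.stats import chi2, chisquare

alpha = 0.05
crit = chi2.ppf(1 - alpha, df=1)            # chi^2_{1, alpha}, about 3.841

heads, n = 530, 1000                        # hypothetical observed counts
T = 4 * n * (heads / n - 0.5) ** 2          # T = 3.6 here
stat, _ = chisquare([heads, n - heads])     # expected counts default to n/2
assert abs(T - stat) < 1e-9                 # same statistic
print("reject H0:", T > crit)               # False, since 3.6 < 3.841
```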
Hence, the "ruin probability" you are looking for is:
$$W(\alpha) \equiv \mathbb{P} \Big( (\exists n \in \mathbb{N}): 4n (\bar{X}_n - \tfrac{1}{2})^2 > \chi_{1, \alpha}^2 \Big).$$
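This ruin probability can be probed by Monte Carlo (a sketch under arbitrary settings): the fraction of simulated fair-coin sequences rejected at least once within a finite horizon $N$ is a lower bound on $W(\alpha)$, and it creeps upward as $N$ grows:

```python
# Estimate P(reject at least once within the first N tosses), a lower
# bound on the ruin probability W(alpha).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
alpha, n_sims = 0.05, 2_000
crit = chi2.ppf(1 - alpha, df=1)

for N in (100, 1_000, 10_000):
    rejected = 0
    for _ in range(n_sims):
        flips = rng.integers(0, 2, size=N)
        n = np.arange(1, N + 1)
        T = 4 * n * (np.cumsum(flips) / n - 0.5) ** 2
        rejected += bool(np.any(T > crit))
    print(N, rejected / n_sims)
```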
Establishing the result: You have conjectured that $W(\alpha) = 1$ for all $0 < \alpha < 1$. The rejection events in your sequence are not independent, which makes the problem tricky. Nevertheless, it is possible to form an infinite sequence of events, each with a positive lower bound on the probability of rejection regardless of the previous data. We do this by splitting the tosses into subsequences and conditioning on a sample mean of one-half in the previous data.
To prove your conjecture, define a sequence of positive integer values $m_1^*, m_2^*, m_3^*, \ldots$ so that we divide the sequence of tosses into finite subsequences of these lengths. Let $\bar{X}_k^*$ be the sample mean corresponding to the $k$th subsequence, and note that these sample means are independent.
Define the test statistics up to the ends of these subsequences as:
$$T_k \equiv T(\mathbf{x}_{n_k}) = 4 n_k \Bigg( \frac{1}{n_k} \sum_{i=1}^{n_k} x_i - \frac{1}{2} \Bigg)^2 \quad \quad \quad \quad n_k \equiv \sum_{i=1}^k m_i^*.$$
With a little algebra (using the fact that $T_k = 0$ means the first $n_k$ tosses contain exactly $\tfrac{1}{2} n_k$ heads) we can show that:
$$T_k = 0 \quad \quad \quad \implies \quad \quad \quad T_{k+1} = 4 m_{k+1}^* (\bar{x}_{k+1}^* - \tfrac{1}{2})^2 \times \frac{m_{k+1}^*}{n_k + m_{k+1}^*}.$$
Under this condition we have:
$$\text{Reject } H_0 \quad \quad \quad \iff \quad \quad \quad 4 m_{k+1}^* (\bar{x}_{k+1}^* - \tfrac{1}{2})^2 > \frac{n_k + m_{k+1}^*}{m_{k+1}^*} \cdot \chi_{1, \alpha}^2.$$
Now, choose some value $0 < \alpha^* < \alpha$ so that $\chi_{1, \alpha}^2 < \chi_{1, \alpha^*}^2$, and choose each value $m_{k+1}^* \in \mathbb{N}$ sufficiently large that (taking $n_0 \equiv 0$):
$$\frac{n_k + m_{k+1}^*}{m_{k+1}^*} \cdot \chi_{1, \alpha}^2 \leqslant \chi_{1, \alpha^*}^2 \quad \quad \text{for all } k = 0,1,2,\ldots$$
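Solving the displayed inequality for the subsequence lengths gives $m_{k+1}^* \geqslant n_k \chi_{1,\alpha}^2 / (\chi_{1,\alpha^*}^2 - \chi_{1,\alpha}^2)$, so one valid choice can be generated greedily (a sketch; the particular values $\alpha = 0.05$ and $\alpha^* = 0.01$ are arbitrary illustrations):

```python
# Generate subsequence lengths m_1*, m_2*, ... satisfying
# (n_k + m_{k+1}*) / m_{k+1}* x chi2_alpha <= chi2_alphastar, with n_0 = 0.
import math
from scipy.stats import chi2

alpha, alpha_star = 0.05, 0.01
c_a = chi2.ppf(1 - alpha, df=1)        # chi^2_{1, alpha}
c_s = chi2.ppf(1 - alpha_star, df=1)   # chi^2_{1, alpha*}

n_k, lengths = 0, []
for _ in range(6):
    m = max(1, math.ceil(n_k * c_a / (c_s - c_a)))
    lengths.append(m)
    n_k += m

print(lengths)  # lengths grow roughly geometrically (factor about 2.4 here)
```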
The condition $4 m_{k+1}^* (\bar{x}_{k+1}^* - \tfrac{1}{2})^2 > \chi_{1, \alpha^*}^2$ is then sufficient to ensure rejection at the end of the $(k+1)$th subsequence, even with the lowest possible evidence for rejection in the previous data. This establishes that the probability of rejection in any one of the subsequences is at least $\alpha^*$, regardless of the previous data. Hence, we have:
$$W(\alpha) \geqslant 1 - \prod_{i=1}^\infty (1-\alpha^*) = 1 - 0 = 1.$$
answered 3 hours ago
Ben
You do not make explicit, although it is fairly obvious, that the subsequences are not overlapping.
– mdewey
21 mins ago
For one test with a given $\alpha$, you have a $p$-value and a confidence level $q = 1 - p$.
Now, if you do $n$ independent tests all with the same $p$-value, your confidence $q^n \rightarrow 0$ as $n$ increases. So you are almost sure to fail the test if you repeat independent tests for a large enough $n$.
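A quick numeric illustration of this decay (computed from the closed form, with $q = 0.95$, i.e. a nominal level of $0.05$ per test):

```python
# Probability of zero rejections across n independent tests at level 0.05.
alpha = 0.05
for n in (1, 10, 100, 500):
    print(f"n = {n}: confidence = {(1 - alpha) ** n:.3g}")
```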
Now, a $\chi^2$ test on $n+1$ coin tosses is far from independent of the test on the first $n$ tosses, so the convergence of the confidence to $0$ must be much slower than $q^n$. However, the test on the first $100^{n+1}$ tosses is almost independent of the test on the first $100^n$ tosses.
So a test with a fixed significance level, repeated for arbitrarily large $n$, is almost sure to reject eventually.
For small values of $n$, however, the successive Pearson tests will not be independent, but it is intuitive that the probability of at least one rejection among them is strictly greater than $\alpha$.
answered 5 hours ago
Sebapi