Likelihood of rejecting a fair coin (repeated significance testing)

Suppose I have a fair coin and I flip it numerous times, testing after every flip using Pearson's $\chi^2$ test of fit to fairness. What is the likelihood that I will, at some point, reject that the coin is fair (for a given $\alpha$)? If (as I suspect) that's $1$, then is there an expected number of flips after which I'll reject that the coin is fair?





(This precise example comes up when trying to explain to coworkers by analogy why one can't repeatedly peek at split-test results without accounting for the repeated testing.)
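To make the point concrete, here is a minimal simulation sketch of the setup (for two cells the Pearson statistic reduces to $4n(\bar{x}_n - \tfrac{1}{2})^2$; the function name and parameters are illustrative, not from any particular library):

```python
# Flip a fair coin, run the Pearson test after every flip, and record when
# (if ever) the running test first rejects fairness at level alpha.
import numpy as np
from scipy.stats import chi2

def first_rejection(n_max=10_000, alpha=0.05, seed=None):
    rng = np.random.default_rng(seed)
    crit = chi2.ppf(1 - alpha, df=1)          # chi-squared critical value, 1 df
    heads = 0
    for n in range(1, n_max + 1):
        heads += rng.integers(0, 2)           # one fair coin flip
        t = 4 * n * (heads / n - 0.5) ** 2    # two-cell Pearson statistic
        if t > crit:
            return n                          # first flip count with a rejection
    return None                               # no rejection within n_max flips

runs = [first_rejection(seed=s) for s in range(200)]
print(sum(r is not None for r in runs) / len(runs))
# A substantial fraction of runs reject the fair coin within 10,000 flips.
```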










statistical-significance chi-squared goodness-of-fit frequentist

asked 6 hours ago by msh210






















2 Answers






This is a problem involving what are essentially "ruin probabilities" of a stochastic process. The fact that the tests go on forever is not sufficient to imply that the probability of eventual rejection is one. You would need to establish this formally via analysis of the ruin probability. It is also notable that the tests are not independent, since the test statistic at higher $n$ values is related to the test statistic at lower $n$ values. I will set up the problem for you below, and give an outline of how you can prove your conjecture.

Setting up the problem: You have a sequence $X_1, X_2, X_3, \ldots \sim \text{IID Bern}(\theta)$ and your null hypothesis of a fair coin is $H_0: \theta = \tfrac{1}{2}$. After $n$ tosses the expected number of positive indicators is $\mathbb{E}(n \bar{X}) = \tfrac{1}{2} n$, so the Pearson test statistic (under the null hypothesis) is:

$$\begin{aligned}
T(\mathbf{x}_n)
&= n \Bigg[ \frac{(\bar{x}_n - \tfrac{1}{2})^2}{\tfrac{1}{2}} + \frac{(1 - \bar{x}_n - \tfrac{1}{2})^2}{\tfrac{1}{2}} \Bigg] \\[6pt]
&= n \Bigg[ \frac{(\bar{x}_n - \tfrac{1}{2})^2}{\tfrac{1}{2}} + \frac{(\tfrac{1}{2} - \bar{x}_n)^2}{\tfrac{1}{2}} \Bigg] \\[6pt]
&= 4 n (\bar{x}_n - \tfrac{1}{2})^2.
\end{aligned}$$

Higher values of the test statistic are more conducive to the alternative hypothesis, and for large $n$ we have the asymptotic null distribution $T(\mathbf{X}_n) \sim \text{ChiSq}(1)$. We define the critical point $\chi_{1,\alpha}^2$ by:

$$\alpha = \int_{\chi_{1,\alpha}^2}^\infty \text{ChiSq}(r \mid 1) \, dr.$$

Then, assuming you use the chi-squared approximation for your p-value (rather than the exact distribution of the test statistic), we have the rejection region:

$$\text{Reject } H_0 \quad \iff \quad 4 n (\bar{x}_n - \tfrac{1}{2})^2 > \chi_{1, \alpha}^2.$$

Hence, the "ruin probability" you are looking for is:

$$W(\alpha) \equiv \mathbb{P} \Big( (\exists n \in \mathbb{N}): 4 n (\bar{X}_n - \tfrac{1}{2})^2 > \chi_{1, \alpha}^2 \Big).$$

Establishing the result: You have conjectured that $W(\alpha) = 1$ for all $0 < \alpha < 1$. The rejection events in your sequence are not independent, which makes the problem tricky. Nevertheless, it is possible to form an infinite series of events which each have a positive lower bound on the probability of rejection, regardless of the previous data. We do this by splitting the tosses into subsequences and conditioning on a sample mean of one-half in the previous data.

To prove your conjecture, define a sequence of positive integer values $m_1^*, m_2^*, m_3^*, \ldots$ so that we divide the sequence of tosses into finite subsequences of these lengths. Let $\bar{X}_k^*$ be the sample mean corresponding to the $k$th subsequence, and note that these sample means are independent.

Define the test statistics up to the ends of these subsequences as:

$$T_k \equiv T(\mathbf{x}_{n_k}) = 4 n_k \Bigg( \frac{1}{n_k} \sum_{i=1}^{n_k} x_i - \frac{1}{2} \Bigg)^2, \qquad n_k \equiv \sum_{i=1}^k m_i^*.$$

With a little algebra we can show that:

$$T_k = 0 \quad \implies \quad T_{k+1} = 4 m_{k+1}^* (\bar{x}_{k+1}^* - \tfrac{1}{2})^2 \times \frac{m_{k+1}^*}{n_k + m_{k+1}^*}.$$
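As a quick numerical sanity check of this identity (a minimal sketch; the sizes $n_k = 100$, $m_{k+1}^* = 40$ and the seed are arbitrary illustrative choices):

```python
# Check the identity numerically, assuming T_k = 0 (i.e. exactly half heads
# among the first n_k tosses). The sizes n_k and m are arbitrary examples.
import numpy as np

rng = np.random.default_rng(1)
n_k, m = 100, 40                      # n_k even, so T_k = 0 is attainable
first = np.repeat([1, 0], n_k // 2)   # first n_k tosses with sample mean 1/2
block = rng.integers(0, 2, size=m)    # the (k+1)th subsequence
xbar_star = block.mean()

t_next = 4 * (n_k + m) * (np.concatenate([first, block]).mean() - 0.5) ** 2
claim = 4 * m * (xbar_star - 0.5) ** 2 * m / (n_k + m)
assert np.isclose(t_next, claim)      # the algebraic identity holds
```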



Under this condition we have:

$$\text{Reject } H_0 \quad \iff \quad 4 m_{k+1}^* (\bar{x}_{k+1}^* - \tfrac{1}{2})^2 > \frac{n_k + m_{k+1}^*}{m_{k+1}^*} \cdot \chi_{1, \alpha}^2.$$

Now, choose some value $0 < \alpha^* < \alpha$ so that $\chi_{1, \alpha}^2 < \chi_{1, \alpha^*}^2$. Choose each value $m_k^* \in \mathbb{N}$ sufficiently large so that you have:

$$\frac{n_k + m_{k+1}^*}{m_{k+1}^*} \cdot \chi_{1, \alpha}^2 \leqslant \chi_{1, \alpha^*}^2 \qquad \text{for all } k = 0, 1, 2, \ldots.$$
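Rearranging, it suffices to take $m_{k+1}^* \geqslant n_k \, \chi_{1,\alpha}^2 / (\chi_{1,\alpha^*}^2 - \chi_{1,\alpha}^2)$. A short sketch of this construction (the levels $\alpha = 0.05$ and $\alpha^* = 0.01$ are arbitrary illustrative choices, not part of the argument):

```python
# Generate subsequence lengths m_{k+1}* satisfying
#   (n_k + m)/m * crit <= crit_star,  i.e.  m >= n_k * crit / (crit_star - crit).
# The levels alpha and alpha_star are illustrative choices.
from math import ceil
from scipy.stats import chi2

alpha, alpha_star = 0.05, 0.01
crit = chi2.ppf(1 - alpha, df=1)            # ~3.841
crit_star = chi2.ppf(1 - alpha_star, df=1)  # ~6.635

n_k, lengths = 0, []
for _ in range(6):
    m = max(1, ceil(n_k * crit / (crit_star - crit)))
    lengths.append(m)
    n_k += m
print(lengths)   # e.g. [1, 2, 5, 12, 28, 67] -- the lengths grow geometrically
```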



The condition $4 m_{k+1}^* (\bar{x}_{k+1}^* - \tfrac{1}{2})^2 > \chi_{1, \alpha^*}^2$ is then sufficient to ensure rejection in the $(k+1)$th subsequence, even with the lowest possible evidence for rejection in the previous data. This establishes that the probability of rejection in any one of the subsequences is at least $\alpha^*$, regardless of the previous data. Hence, we have:

$$W(\alpha) \geqslant 1 - \prod_{i=1}^\infty (1 - \alpha^*) = 1 - 0 = 1.$$
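As an empirical check of the key step (a sketch reusing the illustrative $\alpha = 0.05$, $\alpha^* = 0.01$ and one $(n_k, m_{k+1}^*)$ pair from the construction above): conditioning on the worst case $T_k = 0$, the next block still rejects with probability close to $\alpha^*$.

```python
# Estimate the per-block rejection probability in the worst case T_k = 0.
# The levels and block sizes are the illustrative choices from above.
import numpy as np
from scipy.stats import chi2

crit = chi2.ppf(1 - 0.05, df=1)   # rejection threshold at alpha = 0.05
n_k, m = 48, 67                   # one (n_k, m_{k+1}*) pair from above
rng = np.random.default_rng(2)

xbars = rng.integers(0, 2, size=(100_000, m)).mean(axis=1)
t_next = 4 * m * (xbars - 0.5) ** 2 * m / (n_k + m)   # T_{k+1} given T_k = 0
print((t_next > crit).mean())     # close to alpha* = 0.01
                                  # (discreteness makes the exact value a bit lower)
```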






answered 3 hours ago by Ben





















• You do not make explicit, although it is fairly obvious, that the subsequences are not overlapping. – mdewey, 21 mins ago

For one test at a given level $\alpha$, you have (under the null) a probability $\alpha$ of rejecting and a "confidence" $q = 1 - \alpha$ of not rejecting.

Now, if you do $n$ independent tests, all at the same level, your confidence $q^n \rightarrow 0$ as $n$ increases. So you are almost sure to reject at some point if you repeat independent tests for a large enough $n$.
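For instance (a small numeric illustration, taking $\alpha = 0.05$ so $q = 0.95$):

```python
# Decay of the probability q^n of never rejecting across n independent tests,
# taking alpha = 0.05 as an illustrative level.
q = 1 - 0.05
for n in (10, 50, 100, 500):
    print(n, q ** n)   # ~0.599, ~0.077, ~0.0059, ~7e-12
```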



Now, a $\chi^2$ test after $n+1$ coin tosses is far from independent of the test after the first $n$ tosses, so the convergence of this confidence to $0$ must be much slower than $q^n$. However, the test based on the first $100^{n+1}$ tosses is almost independent of the test based on the first $100^n$ tosses.



So a test at a fixed level, repeated for arbitrarily large $n$, is almost sure to reject eventually.



For small values of $n$, however, the successive Pearson tests will not be independent, but it is intuitive that the overall probability of at least one rejection is still strictly greater than $\alpha$.






answered 5 hours ago by Sebapi




















