In linear regression, why should we include degree 2 variables when we only want interaction terms?





Suppose I am interested in a linear regression model
$$Y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2,$$
because I would like to see whether an interaction between the two covariates has an effect on $Y$.

In a professor's course notes (a professor with whom I do not have contact), it states:

When including interaction terms, you should include their second-degree terms; i.e.,
$$Y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$$
is the model that should be fit.

Why should one include second-degree terms when we are only interested in the interaction?
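
For reference, a minimal sketch of how I would fit the two candidate models in R (the variable names y, x1 and x2 are placeholders, not from any particular data set):

fit_interaction <- lm(y ~ x1 + x2 + x1:x2)                      # main effects plus the interaction only
fit_quadratic   <- lm(y ~ x1 + x2 + x1:x2 + I(x1^2) + I(x2^2))  # the course notes' recommendation: also add both squared terms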










  • If the model has $x_1x_2$, it should include $x_1$ and $x_2$, but $x_1^2$ and $x_2^2$ are optional. – user158565

  • Your professor's opinion seems to be unusual. It might stem from a specialized background or set of experiences, because "should" is definitely not a universal requirement. You might find stats.stackexchange.com/questions/11009 to be of some interest. – whuber

  • @user158565 Hi! May I ask why we should also include $x_1$ and $x_2$? I did not originally think of that, but now that you mention it...! – Kevin C

  • @whuber Hi! Thanks for the link! I think including the main effects makes sense, but I have trouble extending that to having to include the second-order terms. // user158565 I think the link above answered that, thank you! – Kevin C

  • Would you please post a link to the data? – James Phillips

















2 Answers
It depends on the goal of the inference. If you want to infer whether there is an interaction in a causal context, for instance, your professor's recommendation does make sense: misspecification of the functional form can lead to wrong inferences about the interaction.



Here is a simple example in which there is no interaction between $x_1$ and $x_2$ in the structural equation of $y$, yet you would wrongly conclude that $x_1$ interacts with $x_2$: the apparent interaction is just bias due to the omission of the squared term of $x_1$.



rm(list = ls())
set.seed(10)
n  <- 1e3
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)               # x2 is correlated with x1
y  <- x1 + x2 + x1^2 + rnorm(n)   # true model: main effects plus x1^2, no interaction
summary(lm(y ~ x1 + x2 + x1:x2))  # misspecified fit: interaction included, x1^2 omitted

Call:
lm(formula = y ~ x1 + x2 + x1:x2)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7781 -0.8326 -0.0806  0.7598  7.7929 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.30116    0.04813   6.257 5.81e-10 ***
x1           1.03142    0.05888  17.519  < 2e-16 ***
x2           1.01806    0.03971  25.638  < 2e-16 ***
x1:x2        0.63939    0.02390  26.757  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.308 on 996 degrees of freedom
Multiple R-squared:  0.7935,    Adjusted R-squared:  0.7929 
F-statistic:  1276 on 3 and 996 DF,  p-value: < 2.2e-16


Now, if you go back and include the squared term in your regression, the apparent interaction disappears.



summary(lm(y ~ x1 + x2 + x1:x2 + I(x1^2)))   

Call:
lm(formula = y ~ x1 + x2 + x1:x2 + I(x1^2))

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4574 -0.7073  0.0228  0.6723  3.7135 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.0419958  0.0398423  -1.054    0.292    
x1           1.0296642  0.0458586  22.453   <2e-16 ***
x2           1.0017625  0.0309367  32.381   <2e-16 ***
I(x1^2)      1.0196002  0.0400940  25.430   <2e-16 ***
x1:x2       -0.0006889  0.0313045  -0.022    0.982    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.019 on 995 degrees of freedom
Multiple R-squared:  0.8748,    Adjusted R-squared:  0.8743 
F-statistic:  1739 on 4 and 995 DF,  p-value: < 2.2e-16


Of course, this reasoning applies not only to quadratic terms but to misspecification of the functional form in general. If you limit yourself to modeling with linear regression, you will need to include such nonlinear terms manually; an alternative is to use more flexible regression methods, such as kernel ridge regression.
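
For instance, here is a minimal base-R sketch of kernel ridge regression with a Gaussian kernel, reusing the simulated x1, x2 and y from above; the bandwidth sigma and the penalty lambda are arbitrary illustrative values, not tuned choices:

X <- cbind(x1, x2)
gauss_kernel <- function(A, B, sigma = 1) {
  # squared Euclidean distances between the rows of A and the rows of B
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
  exp(-d2 / (2 * sigma^2))
}
lambda <- 0.1
K     <- gauss_kernel(X, X)
alpha <- solve(K + lambda * diag(nrow(K)), y)  # dual coefficients
y_hat <- K %*% alpha                           # fitted values, with no functional form specified by hand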






– Carlos Cinelli

The two models you listed in your question can be re-expressed to make it clear how the effect of $X_1$ is postulated to depend on $X_2$ (or the other way around) in each model.



    The first model can be re-expressed like this:



$$Y = \beta_0 + (\beta_1 + \beta_3 X_2)X_1 + \beta_2 X_2 + \epsilon,$$



which shows that, in this model, $X_1$ is assumed to have a linear effect on $Y$ (controlling for the effect of $X_2$), but the magnitude of this linear effect - captured by the slope coefficient of $X_1$ - changes linearly as a function of $X_2$. For example, the effect of $X_1$ on $Y$ may increase in magnitude as the values of $X_2$ increase.



    The second model can be re-expressed like this:



$$Y = \beta_0 + (\beta_1 + \beta_3 X_2)X_1 + \beta_4 X_1^2 + \beta_2 X_2 + \beta_5 X_2^2 + \epsilon,$$



    which shows that, in this model, the effect of $X_1$ on $Y$ (controlling for the effect of $X_2$) is assumed to be quadratic rather than linear. This quadratic effect is captured by including both $X_1$ and $X_1^2$ in the model. While the coefficient of $X_1^2$ is assumed to be independent of $X_2$, the coefficient of $X_1$ is assumed to depend linearly on $X_2$.



    Using either model would imply that you are making entirely different assumptions about the nature of the effect of $X_1$ on $Y$ (controlling for the effect of $X_2$).



Usually, people fit the first model. They might then plot the residuals from that model against $X_1$ and $X_2$ in turn. If the residuals reveal a quadratic pattern as a function of $X_1$ and/or $X_2$, the model can be augmented accordingly so that it includes $X_1^2$ and/or $X_2^2$ (and possibly their interaction), as sketched below.
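
A minimal sketch of that workflow in R (assuming vectors y, x1 and x2 in the workspace; the names and the diagnostic choices are only illustrative):

fit1 <- lm(y ~ x1 + x2 + x1:x2)                          # first model: main effects plus interaction
plot(x1, resid(fit1), xlab = "x1", ylab = "residuals")   # look for curvature in x1
plot(x2, resid(fit1), xlab = "x2", ylab = "residuals")   # look for curvature in x2
# if a quadratic pattern appears, augment the model with the squared term(s)
fit2 <- update(fit1, . ~ . + I(x1^2) + I(x2^2))
anova(fit1, fit2)                                        # compare the nested fits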



Note that I simplified the notation you used for consistency and also made the error term explicit in both models.






– Isabella Ghement