Weird: terminating background bash kills the parent?


























I apologize if the answer is really simple. Logged into a Linux server, I was practicing different job control builtins and I arrived at the suspend command. Being curious, I did the first thing anyone would do: type "suspend" and see what happens.



user@server:~$ suspend
-bash: suspend: cannot suspend a login shell


So then I created a subshell, and tried to suspend it:



user@server:~$ bash
user@server:~$ suspend

[1]+ Stopped bash
user@server:~$


This was fine. Or so I thought! Being satisfied that the suspend command worked, I decided to end that subshell:



user@server:~$ kill %1

[1]+ Stopped bash
user@server:~$ user@server:~$


Strange, I thought. Ignoring the fact that I failed to actually terminate that subshell, I got two prompts on that line. So I hit enter to get a tidier prompt, and:



user@server:~$ user@server:~$ logout
user@server:~$ Connection to server closed.
user@client:~$


This came as a surprise. It also happens with a local terminal; it does not require being connected to a remote server. A local terminal goes back to the login prompt, and a terminal in a desktop session closes.



So how does trying to kill a backgrounded subshell result in the parent dying?

















      bash
















      asked Jul 19 '17 at 10:57









      ajhlinuxuser























          2 Answers














          I can reproduce it on Ubuntu 16 like this:




          • Create a new Gnome Terminal window.


          • Run a child bash; then suspend


          • kill %1



          The window dies. UPDATE: if we use kill -KILL this doesn't reproduce!



          TL;DR:




          My current hypothesis (not entirely conclusive) from the analysis below is that when the child bash receives the SIGTERM, it seizes the terminal by forcing itself into the foreground process group. The parent bash is likely blocking the SIGTTIN signal, so its TTY read receives an EIO, and it bails. A bash which has suspended itself with suspend should not assert itself into the foreground when it resumes executing due to a fatal signal.




          To obtain more information, I attached strace -f -p <pid> to the parent shell to see the system calls.



          It looks like it may be exiting because, for some reason, it receives a -1 return from a read of the standard input, with errno being EIO: in other words, I/O error on standard input.



          Here is the tail end of the strace log: PID 18860 is the parent, 18910 is the child:



          Epilog of child exiting:



          18910 exit_group(0)                     = ?
          18910 +++ exited with 0 +++


          Parent's TTY read is interrupted in a restartable way by the SIGCHLD:



          18860 <... read resumed> 0x7ffe891c6717, 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
          18860 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=18910, si_uid=1001, si_status=0, si_utime=0, si_stime=1} ---


          Parent's signal handling calls wait4 to collect child:



          18860 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED, NULL) = 18910
          18860 wait4(-1, 0x7ffe891c6010, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)


          Parent executes return to the kernel from signal handler:



          18860 rt_sigreturn({mask=})           = 0


          And now comes the strange kicker: the resumed read bails with an I/O error:



          18860 read(0, 0x7ffe891c6717, 1)        = -1 EIO (Input/output error)


          And the parent begins to exit:



          18860 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
          18860 ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0
          18860 ioctl(0, TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
          [ ... ]
          18860 write(2, "exit\n", 5) = 5
          18860 rt_sigaction(SIGINT, {0x460390, , SA_RESTORER, 0x7f598a157860}, {0x460390, , SA_RESTORER, 0x7f598a157860}, 8) = 0
          18860 stat("/local/home/kaz/.bash_history", {st_mode=S_IFREG|0600, st_size=57362, ...}) = 0
          18860 open("/local/home/kaz/.bash_history", O_WRONLY|O_APPEND) = 3
          18860 write(3, "echo $$\nbash\nkill %1\n", 21) = 21
          18860 close(3) = 0
          [ ... ]
          etc.


          It really looks as if the termination is a response to the I/O error, which is almost certainly unexpected.



          So the question is, what has the termination of the child done to cause this subsequent I/O error? If the child gets no opportunity to do anything (kill -KILL %1) then it doesn't repro, suggesting that the child bash takes some steps which put the TTY into a state whereby it generates -1/EIO.



          It does look like the kernel may be implicated in this as a possible root cause.



          Also, I tried this a few more times. Sometimes the ioctl(0, ...) calls that the parent issues when exiting also fail with -1/EIO; sometimes they don't.



          In the kernel, tty_read can bail with EIO for a few reasons. The next step would be to add some printk debugging to see exactly which. Here it is from 4.12.2, courtesy of free-electrons.com:



          static ssize_t tty_read(struct file *file, char __user *buf, size_t count,
                                  loff_t *ppos)
          {
              int i;
              struct inode *inode = file_inode(file);
              struct tty_struct *tty = file_tty(file);
              struct tty_ldisc *ld;

              if (tty_paranoia_check(tty, inode, "tty_read"))
                  return -EIO;
              if (!tty || tty_io_error(tty))
                  return -EIO;

              /* We want to wait for the line discipline to sort out in this
                 situation */
              ld = tty_ldisc_ref_wait(tty);
              if (!ld)
                  return hung_up_tty_read(file, buf, count, ppos);
              if (ld->ops->read)
                  i = ld->ops->read(tty, file, buf, count);
              else
                  i = -EIO;
              tty_ldisc_deref(ld);

              if (i > 0)
                  tty_update_time(&inode->i_atime);

              return i;
          }


          It's almost certainly not due to the line discipline not having a read function (the last EIO). That leaves a failed paranoia check, tty being null, or tty_io_error being true.



          It's not the paranoia check, because when that goes off, it logs a warning message. I don't see one in my kernel log. The check has to be enabled at compile time, and it checks for the tty pointer being null. tty being null for some reason can't be ruled out.



          tty_io_error tests a flag in the TTY structure:



          static inline bool tty_io_error(struct tty_struct *tty)
          {
              return test_bit(TTY_IO_ERROR, &tty->flags);
          }


          If that got set somehow, we would have a persistent EIO return from read attempts and probably other syscalls. This is, however, something that is indicated by lower-level TTY drivers, like serial code.



          So perhaps the ld->ops->read(tty, file, buf, count); line discipline operation is returning -EIO. The TTY should be in the POSIX line discipline, numbered N_TTY, at all times here. I see the file name hasn't changed in twenty years; it is still n_tty.c. We want n_tty_read.



          This has only one EIO situation:



          if (test_bit(TTY_OTHER_CLOSED, &tty->flags)) {
              retval = -EIO;
              break;
          }


          That flag is related to TTY/PTY interaction though. The PTY here should be a device controlled by the gnome terminal; there is no reason why that would close in this situation.



          Ah, but look what happens on entry into n_tty_read:



          c = job_control(tty, file);
          if (c < 0)
              return c;


          Here is where I strongly suspect the "smoking gun" may be. This code has EIO returns and has to do with job control. This ends up in the following function, with the sig argument being SIGTTIN.



          int __tty_check_change(struct tty_struct *tty, int sig)
          {
              unsigned long flags;
              struct pid *pgrp, *tty_pgrp;
              int ret = 0;

              if (current->signal->tty != tty)
                  return 0;

              rcu_read_lock();
              pgrp = task_pgrp(current);

              spin_lock_irqsave(&tty->ctrl_lock, flags);
              tty_pgrp = tty->pgrp;
              spin_unlock_irqrestore(&tty->ctrl_lock, flags);

              if (tty_pgrp && pgrp != tty->pgrp) {
                  if (is_ignored(sig)) {
                      if (sig == SIGTTIN)
                          ret = -EIO;
                  } else if (is_current_pgrp_orphaned())
                      ret = -EIO;
                  else {
                      kill_pgrp(pgrp, sig, 1);
                      set_thread_flag(TIF_SIGPENDING);
                      ret = -ERESTARTSYS;
                  }
              }
              rcu_read_unlock();

              if (!tty_pgrp)
                  tty_warn(tty, "sig=%d, tty->pgrp == NULL!\n", sig);

              return ret;
          }


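          Here, there are two conditions for EIO. One is that the calling task trying to read from the TTY is not in the foreground process group, and is ignoring the SIGTTIN signal. The other is that the calling task is not in the foreground process group and its own process group is orphaned.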



          This precisely conforms to POSIX (Issue 7, 2016) which says:




          Any attempts by a process in a background process group to read from its controlling terminal cause its process group to be sent a SIGTTIN signal unless one of the following special cases applies: if the reading process is ignoring the SIGTTIN signal or the reading thread is blocking the SIGTTIN signal, or if the process group of the reading process is orphaned, the read() shall return -1, with errno set to [EIO] and no signal shall be sent. The default action of the SIGTTIN signal shall be to stop the process to which it is sent. [11.1.3 The Controlling Terminal]
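The branch structure of the `__tty_check_change` excerpt, for the read case where `sig` is SIGTTIN, can be restated as a small user-space decision function. This is only a sketch for reasoning about the cases: the enum and the function name are invented here and are not kernel APIs, and the model leaves out the kernel's early returns (the TTY not being the caller's controlling terminal, or the TTY having no foreground group at all).

```c
#include <stdbool.h>

/* Hypothetical names: a user-space model of the branches taken by
 * __tty_check_change() when a process reads from its controlling terminal.
 * Nothing here is a kernel interface. */
enum tty_read_outcome {
    READ_PROCEEDS,       /* caller is in the foreground group */
    READ_FAILS_EIO,      /* read() returns -1 with errno EIO, no signal sent */
    READ_SENDS_SIGTTIN   /* SIGTTIN goes to the caller's process group */
};

enum tty_read_outcome model_tty_read(bool in_foreground_pgrp,
                                     bool sigttin_ignored_or_blocked,
                                     bool pgrp_orphaned)
{
    if (in_foreground_pgrp)
        return READ_PROCEEDS;
    /* Background reader that ignores or blocks SIGTTIN: first EIO case. */
    if (sigttin_ignored_or_blocked)
        return READ_FAILS_EIO;
    /* Background reader in an orphaned process group: second EIO case. */
    if (pgrp_orphaned)
        return READ_FAILS_EIO;
    /* Otherwise the group is sent SIGTTIN, whose default action is to stop it. */
    return READ_SENDS_SIGTTIN;
}
```

Under the hypothesis in this answer, the parent shell would land in the second row: `model_tty_read(false, true, false)` yields `READ_FAILS_EIO`.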




          The thing is, we don't expect the parent shell to become orphaned.



          Might it simply be that the exiting child bash forces itself into the foreground as it exits, leaving the parent unexpectedly in the background?



          Indeed, what I'm seeing in one of my strace logs is that the parent bash is exiting before the child one, and the child is doing tcsetpgrp to make itself the foreground. I.e. in some cases, the parent doesn't even get the SIGCHLD signal; it gets the I/O error from the terminating child's TTY interference and bails. Then the child finishes its termination.
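The first EIO condition can be tried in isolation, without bash, by giving a fresh session a pseudo-terminal and reading from it in a background process group that ignores SIGTTIN. The following is a minimal sketch, assuming Linux with `/dev/ptmx` available; the function name is made up for this example, and it says nothing about what bash itself does.

```c
#define _XOPEN_SOURCE 600
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno seen by a background reader that
 * ignores SIGTTIN, 0 if the read did not fail, or -1 on setup failure. */
int errno_of_background_read(void)
{
    int status = 0;
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0 || grantpt(master) < 0 || unlockpt(master) < 0)
        return -1;
    const char *slave_name = ptsname(master);
    if (slave_name == NULL)
        return -1;

    pid_t leader = fork();
    if (leader < 0)
        return -1;
    if (leader == 0) {
        /* New session: the pty slave becomes the controlling terminal and
         * this process's group becomes its foreground group. */
        if (setsid() < 0)
            _exit(255);
        int slave = open(slave_name, O_RDWR);
        if (slave < 0)
            _exit(255);
        ioctl(slave, TIOCSCTTY, 0);   /* harmless if already acquired by open */

        pid_t reader = fork();
        if (reader < 0)
            _exit(255);
        if (reader == 0) {
            char c;
            setpgid(0, 0);             /* move out of the foreground group */
            signal(SIGTTIN, SIG_IGN);  /* ignore the job-control signal */
            ssize_t n = read(slave, &c, 1);
            _exit(n < 0 ? errno : 0);  /* report errno via the exit status */
        }
        if (waitpid(reader, &status, 0) != reader || !WIFEXITED(status))
            _exit(255);
        _exit(WEXITSTATUS(status));
    }

    if (waitpid(leader, &status, 0) != leader) {
        close(master);
        return -1;
    }
    close(master);
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 255)
        return -1;
    return WEXITSTATUS(status);
}
```

If the kernel behaves as the POSIX text and the excerpt of `__tty_check_change` describe, the value returned should equal `EIO`. The errno is passed back through the exit status only to keep the sketch short; a pipe would be more robust.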





























          • Take a look at The terminal ends up in the wrong process group, and the parent shell can't do anything about it. You either get EOF on stdin, -1/EIO on the read, or a SIGHUP.
            – Isaac
            7 hours ago

































          It looks like a bug in bash. It replicates on my Ubuntu GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu).




          • It doesn't need suspend as it occurs also after kill -STOP bash_pid.

          • It doesn't occur if you kill -9 %1 instead of kill %1.

          • It doesn't occur if you kill pid instead of kill %1.

          • It doesn't occur if the subprocess is something other than bash (try dash or sleep 999). In that case, however, the bash behavior is still unexpected to me -- bash shouldn't SIGCONT sleep 999 in this case, but it apparently does.

          • It doesn't occur in other shells (including dash executing a dash subprocess), where kill behaves in a more expected way. Our stopped-and-killed subprocess remains stopped (ps uw consistently shows the subprocess in state T). After you wake the subprocess with SIGCONT, it processes the SIGTERM and dies without affecting its parent.
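The observation in the last bullet, that a stopped process only acts on SIGTERM once it is continued, can be checked without any shell involved. Below is a minimal sketch, assuming a POSIX system; the struct and function names are invented for illustration, and the 200 ms pause is an arbitrary grace period rather than a guarantee.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result record for this demonstration. */
struct stopped_term_result {
    bool stopped_first;        /* child was observed in the stopped state */
    bool alive_after_sigterm;  /* child had not terminated while stopped */
    int  fatal_signal;         /* signal that killed it after SIGCONT, or 0 */
};

struct stopped_term_result demo_sigterm_on_stopped_child(void)
{
    struct stopped_term_result r = { false, false, 0 };
    int status = 0;

    pid_t pid = fork();
    if (pid < 0)
        return r;
    if (pid == 0) {
        signal(SIGTERM, SIG_DFL);  /* make sure SIGTERM is fatal here */
        raise(SIGSTOP);            /* stop, like a suspended job */
        for (;;)
            pause();
    }

    /* Wait until the child has actually stopped. */
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status))
        r.stopped_first = true;

    /* SIGTERM sent to a stopped process stays pending. */
    kill(pid, SIGTERM);
    struct timespec grace = { 0, 200 * 1000 * 1000 };
    nanosleep(&grace, NULL);
    r.alive_after_sigterm = (waitpid(pid, &status, WNOHANG) == 0);

    /* Once continued, the child receives the pending SIGTERM and dies. */
    kill(pid, SIGCONT);
    if (waitpid(pid, &status, 0) == pid && WIFSIGNALED(status))
        r.fatal_signal = WTERMSIG(status);
    return r;
}
```

This matches what the bullet reports for a plain kill of a stopped process: the child stays in state T with SIGTERM pending, and only the SIGCONT lets it terminate. The sketch does not reproduce bash's handling of kill %1.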





























          • It doesn't seem to be a bug: The terminal ends up in the wrong process group, and the parent shell can't do anything about it. You either get EOF on stdin, -1/EIO on the read, or a SIGHUP.
            – Isaac
            7 hours ago












          • @Isaac That unparsable and confusing statement doesn't deny it's a bug. If you have arguments to back up your own opinion, please do provide a new answer. This will increase visibility and allow others to upvote or downvote it. Kaz's analysis in this thread goes amazingly deep though.
            – kubanczyk
            39 mins ago












          • It is not "my own opinion". If you care to follow the provided link (click on the text) you will find out that that is the answer of Chet Ramey (the lead developer of bash) to the "bug report" @kubanczyk
            – Isaac
            33 mins ago










          • You say "doesn't seem to be a bug" and I don't see Chet Ramey saying that.
            – kubanczyk
            23 mins ago











          Your Answer








          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "106"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f379507%2fweird-terminating-background-bash-kills-the-parent%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          2 Answers
          2






          active

          oldest

          votes








          2 Answers
          2






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          2














          I can reproduce it in a Ubuntu 16 like this:




          • Create a new Gnome Terminal window.


          • Run a child bash; then suspend


          • kill %1



          The window dies. UPDATE: if we use kill -KILL this doesn't reproduce!



          TL; DR:




          My current hypothesis (not entirely conclusive) from the below analysis is that when the child bash receives the SIGTERM, it seizes the terminal by forcing itself into the foreground process group. The parent Bash is likely blocking the SIGTTIN signal and so its TTY read receives an EIO, and it bails. A bash which has suspended itself with suspend should not assert itself into the foreground when it resumes executing due to a fatal signal.




          To obtain more information, I attached strace -f -p <pid> to the parent shell to see the system calls.



          It looks like it may be exiting because, for some reason, it receives a -1 return from a read of the standard input, with errno being EIO: in other words, I/O error on standard input.



          Here is the tail end of the strace log: PID 18860 is the parent, 18910 is the child:



          Epilog of child exiting:



          18910 exit_group(0)                     = ?
          18910 +++ exited with 0 +++


          Parent's TTY read is interrupted in a restartable way by the SIGCHLD:



          18860 <... read resumed> 0x7ffe891c6717, 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
          18860 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=18910, si_uid=1001, si_status=0, si_utime=0, si_stime=1} ---


          Parent's signal handling calls wait4 to collect child:



          18860 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED, NULL) = 18910
          18860 wait4(-1, 0x7ffe891c6010, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)


          Parent executes return to the kernel from signal handler:



          18860 rt_sigreturn({mask=})           = 0


          And now comes the strange kicker, what the heck? The resumed read bails with I/O error:



          18860 read(0, 0x7ffe891c6717, 1)        = -1 EIO (Input/output error)


          And the parent begins to exit:



          18860 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
          18860 ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0
          18860 ioctl(0, TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
          [ ... ]
          18860 write(2, "exitn", 5) = 5
          18860 rt_sigaction(SIGINT, {0x460390, , SA_RESTORER, 0x7f598a157860}, {0x460390, , SA_RESTORER, 0x7f598a157860}, 8) = 0
          18860 stat("/local/home/kaz/.bash_history", {st_mode=S_IFREG|0600, st_size=57362, ...}) = 0
          18860 open("/local/home/kaz/.bash_history", O_WRONLY|O_APPEND) = 3
          18860 write(3, "echo $$nbashnkill %1n", 21) = 21
          18860 close(3) = 0
          [ ... ]
          etc.


          It really looks as if the termination is a response to the I/O error, which is almost certainly unexpected.



          So the question is, what has the termination of the child done to cause this subsequent I/O error? If the child gets no opportunity to do anything (kill -KILL %1) then it doesn't repro, suggesting that the child bash takes some steps which put the TTY into a state whereby it generates -1/EIO.



          It does look like the kernel may be implicated in this as a possible root cause.



          Also, I tried this a few more times. Sometimes the ioctl(0, ...) calls that the parent issues when exiting also fail with -1/EIO; sometimes they don't.



          In the kernel, tty_read can bail with EIO for a few reasons. The next step would be to add some printk debugging to see exactly which. Here it is from 4.12.2, courtesy of free-electrons.com:



          static ssize_t tty_read(struct file *file, char __user *buf, size_t count,
          loff_t *ppos)
          {
          int i;
          struct inode *inode = file_inode(file);
          struct tty_struct *tty = file_tty(file);
          struct tty_ldisc *ld;

          if (tty_paranoia_check(tty, inode, "tty_read"))
          return -EIO;
          if (!tty || tty_io_error(tty))
          return -EIO;

          /* We want to wait for the line discipline to sort out in this
          situation */
          ld = tty_ldisc_ref_wait(tty);
          if (!ld)
          return hung_up_tty_read(file, buf, count, ppos);
          if (ld->ops->read)
          i = ld->ops->read(tty, file, buf, count);
          else
          i = -EIO;
          tty_ldisc_deref(ld);

          if (i > 0)
          tty_update_time(&inode->i_atime);

          return i;
          }


          It's almost certainly not due to the line discipline not having a read function (the last EIO). Either a failed paranoia check, or tty being null, or the tty_io_error being true.



          It's not the paranoia check, because when that goes off, it logs a warning message. I don't see one in my kernel log. The check has to be enabled at compile time, and it checks for the tty pointer being null. tty being null for some reason can't be ruled out.



          tty_io_error tests a flag in the TTY structure:



          static inline bool tty_io_error(struct tty_struct *tty)
          {
          return test_bit(TTY_IO_ERROR, &tty->flags);
          }


          If that got set somehow, we would have a persistent EIO return from read attempts and probably other syscalls. This is is, however, something that is indicated by lower-level TTY drivers, like serial code.



          So perhaps the ld->ops->read(tty, file, buf, count); line discipline operation is returning -EIO. The TTY should be in the POSIX line discipline at all times here numbered N_TTY. I see the file name hasn't changed in twenty years; it is still in n_tty.c. We want n_tty_read



          This has only one EIO situation:



                      if (test_bit(TTY_OTHER_CLOSED, &tty->flags)) {
          retval = -EIO;
          break;
          }


          That flag is related to TTY/PTY interaction though. The PTY here should be a device controlled by the gnome terminal; there is no reason why that would close in this situation.



          Ah, but look what happens on entry into n_tty_read:



          c = job_control(tty, file);
          if (c < 0)
          return c;


          Here is where I strongly suspect the "smoking gun" may be. This code has EIO returns and has to do with job control. This ends up in the following function, with the sig argument being SIGTTIN.



          int __tty_check_change(struct tty_struct *tty, int sig)
          {
          unsigned long flags;
          struct pid *pgrp, *tty_pgrp;
          int ret = 0;

          if (current->signal->tty != tty)
          return 0;

          rcu_read_lock();
          pgrp = task_pgrp(current);

          spin_lock_irqsave(&tty->ctrl_lock, flags);
          tty_pgrp = tty->pgrp;
          spin_unlock_irqrestore(&tty->ctrl_lock, flags);

          if (tty_pgrp && pgrp != tty->pgrp) {
          if (is_ignored(sig)) {
          if (sig == SIGTTIN)
          ret = -EIO;
          } else if (is_current_pgrp_orphaned())
          ret = -EIO;
          else {
          kill_pgrp(pgrp, sig, 1);
          set_thread_flag(TIF_SIGPENDING);
          ret = -ERESTARTSYS;
          }
          }
          rcu_read_unlock();

          if (!tty_pgrp)
          tty_warn(tty, "sig=%d, tty->pgrp == NULL!n", sig);

          return ret;
          }


          Here, there are two conditions for EIO. One is that the calling task trying to read from the TTY is not in the foreground process group, and is ignoring the SIGTTIN signal.



          This precisely conforms to POSIX (Issue 7, 2016) which says:




          Any attempts by a process in a background process group to read from its controlling terminal cause its process group to be sent a SIGTTIN signal unless one of the following special cases applies: if the reading process is ignoring the SIGTTIN signal or the reading thread is blocking the SIGTTIN signal, or if the process group of the reading process is orphaned, the read() shall return -1, with errno set to [EIO] and no signal shall be sent. The default action of the SIGTTIN signal shall be to stop the process to which it is sent. [11.1.3 The Controlling Terminal]




          The thing is, we don't expect the parent shell to become orphaned.



          Might it simply be that the exiting child bash forces itself into the foreground as it exits, leaving the parent unexpectedly in the background?



          Indeed, what I'm seeing in one of my strace logs is that the parent bash is exiting before the child one, and the child is doing tcsetpgrp to make itself the foreground. I.e. in some cases, the parent doesn't even get the SIGCHLD signal; it gets the I/O error from the terminating child's TTY interference and bails. Then the child finishes its termination.






          share|improve this answer























          • Take a look at The terminal ends up in the wrong process group, and the parent shell can't do anything about it. You either get EOF on stdin, -1/EIO on the read, or a SIGHUP.
            – Isaac
            7 hours ago
















          2














          I can reproduce it in a Ubuntu 16 like this:




          • Create a new Gnome Terminal window.


          • Run a child bash; then suspend


          • kill %1



          The window dies. UPDATE: if we use kill -KILL this doesn't reproduce!



          TL; DR:




          My current hypothesis (not entirely conclusive) from the below analysis is that when the child bash receives the SIGTERM, it seizes the terminal by forcing itself into the foreground process group. The parent Bash is likely blocking the SIGTTIN signal and so its TTY read receives an EIO, and it bails. A bash which has suspended itself with suspend should not assert itself into the foreground when it resumes executing due to a fatal signal.




          To obtain more information, I attached strace -f -p <pid> to the parent shell to see the system calls.



          It looks like it may be exiting because, for some reason, it receives a -1 return from a read of the standard input, with errno being EIO: in other words, I/O error on standard input.



          Here is the tail end of the strace log: PID 18860 is the parent, 18910 is the child:



          Epilog of child exiting:



          18910 exit_group(0)                     = ?
          18910 +++ exited with 0 +++


          Parent's TTY read is interrupted in a restartable way by the SIGCHLD:



          18860 <... read resumed> 0x7ffe891c6717, 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
          18860 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=18910, si_uid=1001, si_status=0, si_utime=0, si_stime=1} ---


          Parent's signal handling calls wait4 to collect child:



          18860 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED, NULL) = 18910
          18860 wait4(-1, 0x7ffe891c6010, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)


          Parent executes return to the kernel from signal handler:



          18860 rt_sigreturn({mask=})           = 0


          And now comes the strange kicker, what the heck? The resumed read bails with I/O error:



          18860 read(0, 0x7ffe891c6717, 1)        = -1 EIO (Input/output error)


          And the parent begins to exit:



          18860 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
          18860 ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0
          18860 ioctl(0, TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
          [ ... ]
          18860 write(2, "exitn", 5) = 5
          18860 rt_sigaction(SIGINT, {0x460390, , SA_RESTORER, 0x7f598a157860}, {0x460390, , SA_RESTORER, 0x7f598a157860}, 8) = 0
          18860 stat("/local/home/kaz/.bash_history", {st_mode=S_IFREG|0600, st_size=57362, ...}) = 0
          18860 open("/local/home/kaz/.bash_history", O_WRONLY|O_APPEND) = 3
          18860 write(3, "echo $$\nbash\nkill %1\n", 21) = 21
          18860 close(3) = 0
          [ ... ]
          etc.


          It really looks as if the termination is a response to the I/O error, which is almost certainly unexpected.



          So the question is, what has the termination of the child done to cause this subsequent I/O error? If the child gets no opportunity to do anything (kill -KILL %1) then it doesn't repro, suggesting that the child bash takes some steps which put the TTY into a state whereby it generates -1/EIO.
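Why kill -KILL would give the child "no opportunity to do anything" can be illustrated with a toy experiment. This is a sketch in Python, not bash's actual code: the child installs a SIGTERM handler that stands in for whatever exit-time work bash does, and `child_cleanup_output` is a hypothetical helper name.

```python
import os
import signal
import time


def child_cleanup_output(sig):
    """Send `sig` to a child with a SIGTERM handler; return what it wrote."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)

        def on_term(signum, frame):
            os.write(write_fd, b"cleanup")  # stands in for exit-time work
            os._exit(0)

        signal.signal(signal.SIGTERM, on_term)
        os.write(write_fd, b"ready")  # tell the parent the handler is installed
        while True:
            time.sleep(0.01)

    os.close(write_fd)
    assert os.read(read_fd, 5) == b"ready"
    os.kill(pid, sig)
    os.waitpid(pid, 0)

    # Collect anything the child wrote after "ready", up to end of file.
    leftover = b""
    while True:
        chunk = os.read(read_fd, 64)
        if not chunk:
            break
        leftover += chunk
    os.close(read_fd)
    return leftover
```

With SIGTERM the handler runs and the parent sees its output; with SIGKILL the process is removed without running any of its own code, which is consistent with the TTY being left alone in that case.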



          It does look like the kernel may be implicated in this as a possible root cause.



          Also, I tried this a few more times. Sometimes the ioctl(0, ...) calls that the parent issues when exiting also fail with -1/EIO; sometimes they don't.



          In the kernel, tty_read can bail with EIO for a few reasons. The next step would be to add some printk debugging to see exactly which. Here it is from 4.12.2, courtesy of free-electrons.com:



          static ssize_t tty_read(struct file *file, char __user *buf, size_t count,
                                  loff_t *ppos)
          {
              int i;
              struct inode *inode = file_inode(file);
              struct tty_struct *tty = file_tty(file);
              struct tty_ldisc *ld;

              if (tty_paranoia_check(tty, inode, "tty_read"))
                  return -EIO;
              if (!tty || tty_io_error(tty))
                  return -EIO;

              /* We want to wait for the line discipline to sort out in this
                 situation */
              ld = tty_ldisc_ref_wait(tty);
              if (!ld)
                  return hung_up_tty_read(file, buf, count, ppos);
              if (ld->ops->read)
                  i = ld->ops->read(tty, file, buf, count);
              else
                  i = -EIO;
              tty_ldisc_deref(ld);

              if (i > 0)
                  tty_update_time(&inode->i_atime);

              return i;
          }


          It's almost certainly not due to the line discipline lacking a read function (the last EIO). That leaves a failed paranoia check, tty being null, or tty_io_error being true.



          It's not the paranoia check, because when that goes off, it logs a warning message. I don't see one in my kernel log. The check has to be enabled at compile time, and it checks for the tty pointer being null. tty being null for some reason can't be ruled out.



          tty_io_error tests a flag in the TTY structure:



          static inline bool tty_io_error(struct tty_struct *tty)
          {
              return test_bit(TTY_IO_ERROR, &tty->flags);
          }


          If that got set somehow, we would have a persistent EIO return from read attempts and probably other syscalls. This is, however, something that is indicated by lower-level TTY drivers, like serial code.



          So perhaps the ld->ops->read(tty, file, buf, count); line discipline operation is returning -EIO. The TTY should be in the POSIX line discipline, numbered N_TTY, at all times here. I see the file name hasn't changed in twenty years; it is still n_tty.c. We want n_tty_read.



          This has only one EIO situation:



          if (test_bit(TTY_OTHER_CLOSED, &tty->flags)) {
              retval = -EIO;
              break;
          }


          That flag is related to TTY/PTY interaction though. The PTY here should be a device controlled by the gnome terminal; there is no reason why that would close in this situation.



          Ah, but look what happens on entry into n_tty_read:



          c = job_control(tty, file);
          if (c < 0)
              return c;


          Here is where I strongly suspect the "smoking gun" may be. This code has EIO returns and has to do with job control. This ends up in the following function, with the sig argument being SIGTTIN.



          int __tty_check_change(struct tty_struct *tty, int sig)
          {
              unsigned long flags;
              struct pid *pgrp, *tty_pgrp;
              int ret = 0;

              if (current->signal->tty != tty)
                  return 0;

              rcu_read_lock();
              pgrp = task_pgrp(current);

              spin_lock_irqsave(&tty->ctrl_lock, flags);
              tty_pgrp = tty->pgrp;
              spin_unlock_irqrestore(&tty->ctrl_lock, flags);

              if (tty_pgrp && pgrp != tty->pgrp) {
                  if (is_ignored(sig)) {
                      if (sig == SIGTTIN)
                          ret = -EIO;
                  } else if (is_current_pgrp_orphaned())
                      ret = -EIO;
                  else {
                      kill_pgrp(pgrp, sig, 1);
                      set_thread_flag(TIF_SIGPENDING);
                      ret = -ERESTARTSYS;
                  }
              }
              rcu_read_unlock();

              if (!tty_pgrp)
                  tty_warn(tty, "sig=%d, tty->pgrp == NULL!\n", sig);

              return ret;
          }


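          Here, there are two conditions for EIO. One is that the calling task trying to read from the TTY is not in the foreground process group and is ignoring the SIGTTIN signal. The other is that the calling task is not in the foreground process group and its own process group is orphaned.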



          This precisely conforms to POSIX (Issue 7, 2016) which says:




          Any attempts by a process in a background process group to read from its controlling terminal cause its process group to be sent a SIGTTIN signal unless one of the following special cases applies: if the reading process is ignoring the SIGTTIN signal or the reading thread is blocking the SIGTTIN signal, or if the process group of the reading process is orphaned, the read() shall return -1, with errno set to [EIO] and no signal shall be sent. The default action of the SIGTTIN signal shall be to stop the process to which it is sent. [11.1.3 The Controlling Terminal]
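The rule quoted above boils down to a small decision table. Here is a toy model of it in Python; it is not kernel code, and the function and parameter names are invented for illustration.

```python
import errno


def background_tty_read_outcome(in_foreground_group,
                                sigttin_ignored_or_blocked,
                                group_orphaned):
    """Say what a read() from the controlling terminal would do (toy model)."""
    if in_foreground_group:
        # Foreground readers are not subject to the job-control check.
        return "read proceeds"
    if sigttin_ignored_or_blocked or group_orphaned:
        # Special cases: no signal is sent, read() fails with errno EIO.
        return errno.errorcode[errno.EIO]
    # Default case: the whole background process group is sent SIGTTIN.
    return "SIGTTIN"
```

In this model, a shell that has ended up outside the foreground group while ignoring or blocking SIGTTIN gets "EIO" back, which is the outcome seen in the strace log.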




          The thing is, we don't expect the parent shell to become orphaned.



          Might it simply be that the exiting child bash forces itself into the foreground as it exits, leaving the parent unexpectedly in the background?



          Indeed, what I'm seeing in one of my strace logs is that the parent bash is exiting before the child one, and the child is doing tcsetpgrp to make itself the foreground. I.e. in some cases, the parent doesn't even get the SIGCHLD signal; it gets the I/O error from the terminating child's TTY interference and bails. Then the child finishes its termination.
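For anyone who wants to check the foreground question directly, a process can compare its own process group with the terminal's foreground group. A minimal sketch in Python, assuming a POSIX system; `is_foreground` is a hypothetical helper, not something bash provides.

```python
import os


def is_foreground(fd=0):
    """Return True/False for a controlling terminal on fd, else None."""
    try:
        # tcgetpgrp reports the terminal's current foreground process group.
        return os.tcgetpgrp(fd) == os.getpgrp()
    except OSError:
        # For example ENOTTY: fd is not this process's controlling terminal.
        return None
```

The write side of the same interface is tcsetpgrp, which is the call the child bash is seen making in the trace described above.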





























          • Take a look at The terminal ends up in the wrong process group, and the parent shell can't do anything about it. You either get EOF on stdin, -1/EIO on the read, or a SIGHUP.
            – Isaac
            7 hours ago




































          edited Jul 19 '17 at 23:48

























          answered Jul 19 '17 at 22:41









          Kaz

          4,56811432











































          It looks like a bug in bash. I can reproduce it on Ubuntu with GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu).




          • It doesn't need suspend; it also occurs after kill -STOP bash_pid.

          • It doesn't occur if you kill -9 %1 instead of kill %1.

          • It doesn't occur if you kill pid instead of kill %1.

          • It doesn't occur if the subprocess is something other than bash (try dash or sleep 999). Even then, bash's behavior is still unexpected to me: bash shouldn't SIGCONT sleep 999 in this case, but it apparently does.

          • It doesn't occur in other shells (including dash executing a dash subprocess), where kill behaves in a more expected way: the stopped-and-killed subprocess remains stopped (ps uw consistently shows the subprocess in state T). After you wake the subprocess with SIGCONT, it processes the SIGTERM and dies without affecting its parent.
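The sequence in the last bullet (stop, SIGTERM, then SIGCONT) can be reproduced without any shell involved. A rough sketch in Python, assuming a POSIX system; `stop_term_cont` is a made-up name, and the child here is a plain sleeping process rather than a shell.

```python
import os
import signal
import time


def stop_term_cont():
    """Stop a child, send SIGTERM, then SIGCONT; report each stage."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        signal.signal(signal.SIGTERM, signal.SIG_DFL)  # default action: terminate
        os.write(write_fd, b"x")  # tell the parent the child is ready
        while True:
            time.sleep(1)

    os.close(write_fd)
    os.read(read_fd, 1)
    os.close(read_fd)

    os.kill(pid, signal.SIGSTOP)
    _, status = os.waitpid(pid, os.WUNTRACED)
    was_stopped = os.WIFSTOPPED(status)

    os.kill(pid, signal.SIGTERM)
    time.sleep(0.1)
    # No status change yet: SIGTERM stays pending while the child is stopped.
    still_there = os.waitpid(pid, os.WNOHANG) == (0, 0)

    os.kill(pid, signal.SIGCONT)
    _, status = os.waitpid(pid, 0)
    killed_by_term = (os.WIFSIGNALED(status)
                      and os.WTERMSIG(status) == signal.SIGTERM)
    return was_stopped, still_there, killed_by_term
```

The child only dies once SIGCONT arrives, which matches the behavior described for the other shells; the difference in bash appears to be that kill %1 also sends the SIGCONT itself.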





























          • It doesn't seem to be a bug: The terminal ends up in the wrong process group, and the parent shell can't do anything about it. You either get EOF on stdin, -1/EIO on the read, or a SIGHUP.
            – Isaac
            7 hours ago












          • @Isaac That unparsable and confusing statement doesn't deny it's a bug. If you have arguments to back up your own opinion, please do provide a new answer. This will increase visibility and allow others to upvote or downvote it. Kaz's analysis in this thread goes amazingly deep though.
            – kubanczyk
            39 mins ago












          • It is not "my own opinion". If you care to follow the provided link (click on the text) you will find out that that is the answer of Chet Ramey (the lead developer of bash) to the "bug report" @kubanczyk
            – Isaac
            33 mins ago










          • You say "doesn't seem to be a bug" and I don't see Chet Ramey saying that.
            – kubanczyk
            23 mins ago






































          edited 27 mins ago

























          answered Jul 19 '17 at 20:42









          kubanczyk

          859514
















          • It doesn't seem to be a bug: The terminal ends up in the wrong process group, and the parent shell can't do anything about it. You either get EOF on stdin, -1/EIO on the read, or a SIGHUP.
            – Isaac
            7 hours ago












          • @Isaac That unparsable and confusing statement doesn't deny it's a bug. If you have arguments to back up your own opinion, please do provide a new answer. This will increase visibility and allow others to upvote or downvote it. Kaz's analysis in this thread goes amazingly deep, though.
            – kubanczyk
            39 mins ago












          • It is not "my own opinion". If you care to follow the provided link (click on the text), you will find that it is the answer Chet Ramey (the lead developer of bash) gave to the "bug report". @kubanczyk
            – Isaac
            33 mins ago










          • You say "doesn't seem to be a bug" and I don't see Chet Ramey saying that.
            – kubanczyk
            23 mins ago



































