Weird: terminating background bash kills the parent?
I apologize if the answer is really simple. While logged into a Linux server, I was practicing different job control builtins and arrived at the suspend command. Being curious, I did the first thing anyone would do: typed "suspend" to see what happens.
user@server:~$ suspend
-bash: suspend: cannot suspend a login shell
So then I created a subshell and tried to suspend it:
user@server:~$ bash
user@server:~$ suspend
[1]+ Stopped bash
user@server:~$
This was fine. Or so I thought! Satisfied that the suspend command worked, I decided to end that subshell:
user@server:~$ kill %1
[1]+ Stopped bash
user@server:~$ user@server:~$
Strange, I thought. Apart from the fact that I had failed to terminate the subshell, I got two prompts on that line. So I hit Enter to get a tidier prompt, and:
user@server:~$ user@server:~$ logout
user@server:~$ Connection to server closed.
user@client:~$
This came as a surprise. It also happens with a local terminal; it does not require a connection to a remote server. A local console goes back to the login prompt, and a terminal window in a desktop session closes.
So how does trying to kill a backgrounded subshell result in the parent dying?
bash
asked Jul 19 '17 at 10:57
ajhlinuxuser
373
2 Answers
I can reproduce it on Ubuntu 16 like this:
1. Create a new GNOME Terminal window.
2. Run a child bash; then suspend.
3. kill %1
The window dies. UPDATE: if we use kill -KILL, this doesn't reproduce!
TL;DR:
My current hypothesis (not entirely conclusive), based on the analysis below, is that when the child bash receives the SIGTERM, it seizes the terminal by forcing itself into the foreground process group. The parent bash is likely blocking (or ignoring) the SIGTTIN signal, so its TTY read receives an EIO, and it bails. A bash which has suspended itself with suspend should not assert itself into the foreground when it resumes executing due to a fatal signal.
To obtain more information, I attached strace -f -p <pid> to the parent shell to see its system calls.
It looks like it may be exiting because, for some reason, it receives a -1 return from a read of standard input, with errno set to EIO: in other words, an I/O error on standard input.
Here is the tail end of the strace log; PID 18860 is the parent, 18910 is the child:
Epilog of child exiting:
18910 exit_group(0) = ?
18910 +++ exited with 0 +++
The parent's TTY read is interrupted in a restartable way by the SIGCHLD:
18860 <... read resumed> 0x7ffe891c6717, 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
18860 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=18910, si_uid=1001, si_status=0, si_utime=0, si_stime=1} ---
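The ERESTARTSYS line means the kernel restarts the interrupted read transparently only if the SIGCHLD handler was installed with SA_RESTART; otherwise read fails with EINTR. A minimal sketch of that difference, using a pipe in place of the terminal and two short-lived children; the function name and the 0.1 s / 0.4 s timings are illustrative assumptions, and the timing-based setup can misbehave on a heavily loaded machine.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The handler does nothing; its only job is to interrupt the blocked read. */
static void on_sigchld(int sig) { (void)sig; }

/* Hypothetical demo: blocks in read() on a pipe while one child exits
   (raising SIGCHLD) and a second child writes a byte later.
   Returns read()'s result and stores the errno it left in *err. */
ssize_t read_during_sigchld(int use_sa_restart, int *err)
{
    struct sigaction sa, old;
    int fds[2];
    char c;
    pid_t quick, writer;
    ssize_t n;

    if (pipe(fds) != 0) {
        *err = errno;
        return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
    sigaction(SIGCHLD, &sa, &old);

    quick = fork();
    if (quick < 0)
        abort();
    if (quick == 0) {               /* exits while the parent is inside read() */
        usleep(100000);
        _exit(0);
    }
    writer = fork();
    if (writer < 0)
        abort();
    if (writer == 0) {              /* supplies the byte about 0.4 s later */
        usleep(400000);
        _exit(write(fds[1], "x", 1) == 1 ? 0 : 1);
    }
    close(fds[1]);

    errno = 0;
    n = read(fds[0], &c, 1);        /* SA_RESTART: 1; otherwise: -1/EINTR */
    *err = errno;

    sigaction(SIGCHLD, &old, NULL); /* restore before reaping the children */
    waitpid(quick, NULL, 0);
    waitpid(writer, NULL, 0);
    close(fds[0]);
    return n;
}
```

With SA_RESTART the call returns 1 once the writer delivers its byte; without it, the first child's exit makes read return -1 with errno EINTR.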
The parent's signal handling calls wait4 to collect the child:
18860 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED, NULL) = 18910
18860 wait4(-1, 0x7ffe891c6010, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
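These two wait4 calls are the usual non-blocking reaping loop: ask for any child (-1) whose state changed, with WNOHANG so the call never blocks, until there is nothing left to collect (here ECHILD). A minimal sketch of that pattern with waitpid(), which glibc typically implements with this same wait4 system call on x86-64; reap_children and spawn_exiting_child are hypothetical names, not bash functions.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch of the reaping step seen in the trace: collect every
   child whose state changed, never blocking, until waitpid() returns 0
   (children remain but none changed) or -1 (ECHILD: no children left).
   On Linux, WUNTRACED has the same value that strace prints as WSTOPPED.
   Call it from the main loop: printf() is not async-signal-safe. */
int reap_children(void)
{
    int status, count = 0;
    pid_t pid;

    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED | WCONTINUED)) > 0) {
        count++;
        if (WIFEXITED(status))
            printf("%ld exited with status %d\n", (long)pid, WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            printf("%ld killed by signal %d\n", (long)pid, WTERMSIG(status));
        else if (WIFSTOPPED(status))
            printf("%ld stopped by signal %d\n", (long)pid, WSTOPSIG(status));
        else if (WIFCONTINUED(status))
            printf("%ld continued\n", (long)pid);
    }
    return count;
}

/* Hypothetical test helper: starts a child that exits at once with code. */
pid_t spawn_exiting_child(int code)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(code);
    return pid;
}
```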
The parent returns from the signal handler via rt_sigreturn:
18860 rt_sigreturn({mask=}) = 0
And now comes the strange kicker. What the heck? The resumed read bails with an I/O error:
18860 read(0, 0x7ffe891c6717, 1) = -1 EIO (Input/output error)
And the parent begins to exit:
18860 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
18860 ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0
18860 ioctl(0, TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
[ ... ]
18860 write(2, "exit\n", 5) = 5
18860 rt_sigaction(SIGINT, {0x460390, , SA_RESTORER, 0x7f598a157860}, {0x460390, , SA_RESTORER, 0x7f598a157860}, 8) = 0
18860 stat("/local/home/kaz/.bash_history", {st_mode=S_IFREG|0600, st_size=57362, ...}) = 0
18860 open("/local/home/kaz/.bash_history", O_WRONLY|O_APPEND) = 3
18860 write(3, "echo $$\nbash\nkill %1\n", 21) = 21
18860 close(3) = 0
[ ... ]
etc.
It really looks as if the termination is a response to the I/O error, which is almost certainly unexpected.
So the question is: what has the termination of the child done to cause this subsequent I/O error? If the child gets no opportunity to do anything (kill -KILL %1), then it doesn't reproduce, suggesting that the child bash takes some steps which put the TTY into a state where it generates -1/EIO.
It does look like the kernel may be implicated in this as a possible root cause.
Also, I tried this a few more times. Sometimes the ioctl(0, ...) calls that the parent issues when exiting also fail with -1/EIO; sometimes they don't.
In the kernel, tty_read can bail with EIO for a few reasons. The next step would be to add some printk debugging to see exactly which one. Here is the function from Linux 4.12.2, courtesy of free-electrons.com:
static ssize_t tty_read(struct file *file, char __user *buf, size_t count,
            loff_t *ppos)
{
    int i;
    struct inode *inode = file_inode(file);
    struct tty_struct *tty = file_tty(file);
    struct tty_ldisc *ld;
    if (tty_paranoia_check(tty, inode, "tty_read"))
        return -EIO;
    if (!tty || tty_io_error(tty))
        return -EIO;
    /* We want to wait for the line discipline to sort out in this
       situation */
    ld = tty_ldisc_ref_wait(tty);
    if (!ld)
        return hung_up_tty_read(file, buf, count, ppos);
    if (ld->ops->read)
        i = ld->ops->read(tty, file, buf, count);
    else
        i = -EIO;
    tty_ldisc_deref(ld);
    if (i > 0)
        tty_update_time(&inode->i_atime);
    return i;
}
It's almost certainly not due to the line discipline lacking a read function (the last EIO). So it is either a failed paranoia check, tty being null, or tty_io_error being true.
It's not the paranoia check, because when that goes off, it logs a warning message, and I don't see one in my kernel log. The check has to be enabled at compile time, and it checks for the tty pointer being null. tty being null for some reason can't be ruled out.
tty_io_error tests a flag in the TTY structure:
static inline bool tty_io_error(struct tty_struct *tty)
{
    return test_bit(TTY_IO_ERROR, &tty->flags);
}
If that flag got set somehow, we would see a persistent EIO return from read attempts, and probably from other syscalls. This, however, is something that lower-level TTY drivers, like the serial code, indicate.
So perhaps the ld->ops->read(tty, file, buf, count) line discipline operation is returning -EIO. The TTY should be in the POSIX line discipline, numbered N_TTY, at all times here. I see the file name hasn't changed in twenty years; it is still in n_tty.c. We want n_tty_read, which has only one EIO situation:
    if (test_bit(TTY_OTHER_CLOSED, &tty->flags)) {
        retval = -EIO;
        break;
    }
That flag is related to TTY/PTY interaction, though. The PTY here should be a device controlled by GNOME Terminal; there is no reason why it would close in this situation.
Ah, but look what happens on entry into n_tty_read:
    c = job_control(tty, file);
    if (c < 0)
        return c;
Here is where I strongly suspect the "smoking gun" may be. This code has EIO returns and has to do with job control. It ends up in the following function, with the sig argument being SIGTTIN:
int __tty_check_change(struct tty_struct *tty, int sig)
{
    unsigned long flags;
    struct pid *pgrp, *tty_pgrp;
    int ret = 0;
    if (current->signal->tty != tty)
        return 0;
    rcu_read_lock();
    pgrp = task_pgrp(current);
    spin_lock_irqsave(&tty->ctrl_lock, flags);
    tty_pgrp = tty->pgrp;
    spin_unlock_irqrestore(&tty->ctrl_lock, flags);
    if (tty_pgrp && pgrp != tty->pgrp) {
        if (is_ignored(sig)) {
            if (sig == SIGTTIN)
                ret = -EIO;
        } else if (is_current_pgrp_orphaned())
            ret = -EIO;
        else {
            kill_pgrp(pgrp, sig, 1);
            set_thread_flag(TIF_SIGPENDING);
            ret = -ERESTARTSYS;
        }
    }
    rcu_read_unlock();
    if (!tty_pgrp)
        tty_warn(tty, "sig=%d, tty->pgrp == NULL!\n", sig);
    return ret;
}
Here, there are two conditions for EIO. One is that the calling task trying to read from the TTY is not in the foreground process group and is ignoring the SIGTTIN signal. The other is that the task is not in the foreground process group and its process group is orphaned.
This conforms precisely to POSIX (Issue 7, 2016), which says:
Any attempts by a process in a background process group to read from its controlling terminal cause its process group to be sent a SIGTTIN signal unless one of the following special cases applies: if the reading process is ignoring the SIGTTIN signal or the reading thread is blocking the SIGTTIN signal, or if the process group of the reading process is orphaned, the read() shall return -1, with errno set to [EIO] and no signal shall be sent. The default action of the SIGTTIN signal shall be to stop the process to which it is sent. [11.1.3 The Controlling Terminal]
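The decision that __tty_check_change makes for a read (sig == SIGTTIN) can be restated as a small pure function, which makes it easier to see which case the parent shell falls into. This is a simplified model, not kernel code: the flag names are hypothetical, and it folds "ignored" and "blocked" into one flag following the POSIX wording (my understanding is that the kernel's is_ignored() also checks the blocked mask).

```c
#include <stdbool.h>

/* Outcome of a read() on a terminal, as decided by the job-control check. */
enum tty_read_outcome {
    TTY_READ_PROCEEDS,   /* the read goes ahead */
    TTY_READ_EIO,        /* read() returns -1/EIO and no signal is sent */
    TTY_READ_SIGTTIN     /* SIGTTIN goes to the reader's group; read restarts later */
};

/* Simplified, hypothetical model of __tty_check_change(tty, SIGTTIN). */
enum tty_read_outcome model_tty_read_check(bool tty_is_controlling,
                                           bool tty_has_fg_pgrp,
                                           bool reader_in_fg_pgrp,
                                           bool sigttin_ignored_or_blocked,
                                           bool reader_pgrp_orphaned)
{
    if (!tty_is_controlling)             /* current->signal->tty != tty */
        return TTY_READ_PROCEEDS;
    if (!tty_has_fg_pgrp || reader_in_fg_pgrp)
        return TTY_READ_PROCEEDS;        /* no tty_pgrp, or pgrp == tty->pgrp */
    if (sigttin_ignored_or_blocked)      /* is_ignored(SIGTTIN) */
        return TTY_READ_EIO;
    if (reader_pgrp_orphaned)            /* is_current_pgrp_orphaned() */
        return TTY_READ_EIO;
    return TTY_READ_SIGTTIN;             /* kill_pgrp(...); -ERESTARTSYS */
}
```

Under the hypothesis in this answer, the parent shell matches the call model_tty_read_check(true, true, false, true, false): its terminal is controlling, another group owns the foreground, and SIGTTIN is ignored or blocked, so the read returns EIO.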
The thing is, we don't expect the parent shell to become orphaned.
Might it simply be that the exiting child bash forces itself into the foreground as it exits, leaving the parent unexpectedly in the background?
Indeed, what I'm seeing in one of my strace logs is that the parent bash exits before the child does, and the child is calling tcsetpgrp to make itself the foreground process group. In other words, in some cases the parent doesn't even get the SIGCHLD signal; it gets the I/O error from the terminating child's TTY interference and bails. Then the child finishes its termination.
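To check the "child grabbed the foreground" explanation directly, the parent can compare tcgetpgrp() on its terminal with its own getpgrp() at the moment a read fails with EIO. A minimal diagnostic sketch; in_foreground and read_byte_verbose are hypothetical names, not bash code.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical diagnostic: 1 if the caller's process group is the terminal's
   foreground group, 0 if another group (say, an exiting child that called
   tcsetpgrp) owns it, -1 if fd is not a terminal or tcgetpgrp() fails. */
int in_foreground(int fd)
{
    pid_t fg = tcgetpgrp(fd);

    if (fg == -1)
        return -1;
    return fg == getpgrp();
}

/* Hypothetical reader: retries reads interrupted by signals; when read()
   fails with EIO, reports who owns the terminal before returning -1. */
ssize_t read_byte_verbose(int fd, char *c)
{
    ssize_t n;

    do
        n = read(fd, c, 1);
    while (n == -1 && errno == EINTR);

    if (n == -1 && errno == EIO) {
        int saved = errno;

        fprintf(stderr, "read: EIO; foreground pgrp %ld, own pgrp %ld\n",
                (long)tcgetpgrp(fd), (long)getpgrp());
        errno = saved;
    }
    return n;
}
```

If the hypothesis holds, the message printed at the failing read would show the exiting child's process group as the foreground group.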
Take a look at The terminal ends up in the wrong process group, and the parent shell can't do anything about it. You either get EOF on stdin, -1/EIO on the read, or a SIGHUP.
– Isaac
7 hours ago
It looks like a bug in bash. It reproduces on my Ubuntu with GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu).
- It doesn't need suspend, as it also occurs after kill -STOP bash_pid.
- It doesn't occur if you use kill -9 %1 instead of kill %1.
- It doesn't occur if you use kill pid instead of kill %1.
- It doesn't occur if the subprocess is something other than bash (try dash or sleep 999). In that case, however, the bash behavior is still unexpected to me: bash shouldn't send SIGCONT to sleep 999 in this case, but it apparently does.
- It doesn't occur in other shells (including dash executing a dash subprocess), and they kill in a more expected way. Our stopped-and-killed subprocess remains stopped (ps uw consistently shows the subprocess in state T). After you wake the subprocess with SIGCONT, it processes the SIGTERM and dies without affecting its parent.
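The "expected" behavior in the last point rests on a general rule: a stopped process does not run, so a SIGTERM it handles stays pending until SIGCONT resumes it. A minimal sketch, assuming Linux/POSIX signal semantics; the child installs its own SIGTERM handler so the result does not depend on the default disposition, and the function name and exit status 42 are arbitrary choices.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child's SIGTERM handler: exit with a recognizable status. */
static void on_sigterm(int sig) { (void)sig; _exit(42); }

/* Hypothetical demo: returns 0 if a stopped child keeps its SIGTERM pending
   until SIGCONT and only then handles it, -1 otherwise. */
int stopped_child_defers_sigterm(void)
{
    int ready[2], status;
    char c;
    ssize_t n;
    pid_t pid;

    if (pipe(ready) != 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sigterm;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGTERM, &sa, NULL);
        if (write(ready[1], "r", 1) != 1)   /* tell the parent: handler is set */
            _exit(1);
        for (;;)
            pause();
    }
    close(ready[1]);
    n = read(ready[0], &c, 1);
    close(ready[0]);
    if (n != 1)
        goto fail;

    kill(pid, SIGSTOP);
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
        goto fail;

    kill(pid, SIGTERM);            /* what a plain "kill" of the job sends */
    usleep(200000);
    if (waitpid(pid, &status, WNOHANG) != 0)
        goto fail;                 /* it should still exist, stopped */

    kill(pid, SIGCONT);            /* resumes; the pending SIGTERM is handled */
    if (waitpid(pid, &status, 0) != pid)
        goto fail;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 42) ? 0 : -1;

fail:
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return -1;
}
```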
It doesn't seem to be a bug: The terminal ends up in the wrong process group, and the parent shell can't do anything about it. You either get EOF on stdin, -1/EIO on the read, or a SIGHUP.
– Isaac
7 hours ago
@Isaac That unparsable and confusing statement doesn't deny it's a bug. If you have arguments to back up your own opinion, please provide a new answer. This will increase its visibility and allow others to upvote or downvote it. Kaz's analysis in this thread goes amazingly deep, though.
– kubanczyk
39 mins ago
It is not "my own opinion". If you care to follow the provided link (click on the text), you will find that it is the answer of Chet Ramey (the lead developer of bash) to the "bug report". @kubanczyk
– Isaac
33 mins ago
You say "doesn't seem to be a bug", and I don't see Chet Ramey saying that.
– kubanczyk
23 mins ago
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f379507%2fweird-terminating-background-bash-kills-the-parent%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
I can reproduce it in a Ubuntu 16 like this:
Create a new Gnome Terminal window.
Run a child
bash
; thensuspend
kill %1
The window dies. UPDATE: if we use kill -KILL
this doesn't reproduce!
TL; DR:
My current hypothesis (not entirely conclusive) from the below analysis is that when the child bash receives the
SIGTERM
, it seizes the terminal by forcing itself into the foreground process group. The parent Bash is likely blocking theSIGTTIN
signal and so its TTYread
receives anEIO
, and it bails. A bash which has suspended itself withsuspend
should not assert itself into the foreground when it resumes executing due to a fatal signal.
To obtain more information, I attached strace -f -p <pid>
to the parent shell to see the system calls.
It looks like it may be exiting because, for some reason, it receives a -1 return from a read
of the standard input, with errno
being EIO
: in other words, I/O error on standard input.
Here is the tail end of the strace
log: PID 18860
is the parent, 18910
is the child:
Epilog of child exiting:
18910 exit_group(0) = ?
18910 +++ exited with 0 +++
Parent's TTY read
is interrupted in a restartable way by the SIGCHLD
:
18860 <... read resumed> 0x7ffe891c6717, 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
18860 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=18910, si_uid=1001, si_status=0, si_utime=0, si_stime=1} ---
Parent's signal handling calls wait4
to collect child:
18860 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED, NULL) = 18910
18860 wait4(-1, 0x7ffe891c6010, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
Parent executes return to the kernel from signal handler:
18860 rt_sigreturn({mask=}) = 0
And now comes the strange kicker, what the heck? The resumed read
bails with I/O error:
18860 read(0, 0x7ffe891c6717, 1) = -1 EIO (Input/output error)
And the parent begins to exit:
18860 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
18860 ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0
18860 ioctl(0, TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
[ ... ]
18860 write(2, "exitn", 5) = 5
18860 rt_sigaction(SIGINT, {0x460390, , SA_RESTORER, 0x7f598a157860}, {0x460390, , SA_RESTORER, 0x7f598a157860}, 8) = 0
18860 stat("/local/home/kaz/.bash_history", {st_mode=S_IFREG|0600, st_size=57362, ...}) = 0
18860 open("/local/home/kaz/.bash_history", O_WRONLY|O_APPEND) = 3
18860 write(3, "echo $$nbashnkill %1n", 21) = 21
18860 close(3) = 0
[ ... ]
etc.
It really looks as if the termination is a response to the I/O error, which is almost certainly unexpected.
So the question is, what has the termination of the child done to cause this subsequent I/O error? If the child gets no opportunity to do anything (kill -KILL %1
) then it doesn't repro, suggesting that the child bash
takes some steps which put the TTY into a state whereby it generates -1/EIO
.
It does look like the kernel may be implicated in this as a possible root cause.
Also, I tried this a few more times. Sometimes the ioctl(0, ...)
calls that the parent issues when exiting also fail with -1/EIO
; sometimes they don't.
In the kernel, tty_read
can bail with EIO
for a few reasons. The next step would be to add some printk
debugging to see exactly which. Here it is from 4.12.2, courtesy of free-electrons.com:
static ssize_t tty_read(struct file *file, char __user *buf, size_t count,
loff_t *ppos)
{
int i;
struct inode *inode = file_inode(file);
struct tty_struct *tty = file_tty(file);
struct tty_ldisc *ld;
if (tty_paranoia_check(tty, inode, "tty_read"))
return -EIO;
if (!tty || tty_io_error(tty))
return -EIO;
/* We want to wait for the line discipline to sort out in this
situation */
ld = tty_ldisc_ref_wait(tty);
if (!ld)
return hung_up_tty_read(file, buf, count, ppos);
if (ld->ops->read)
i = ld->ops->read(tty, file, buf, count);
else
i = -EIO;
tty_ldisc_deref(ld);
if (i > 0)
tty_update_time(&inode->i_atime);
return i;
}
It's almost certainly not due to the line discipline not having a read
function (the last EIO
). Either a failed paranoia check, or tty
being null, or the tty_io_error
being true.
It's not the paranoia check, because when that goes off, it logs a warning message. I don't see one in my kernel log. The check has to be enabled at compile time, and it checks for the tty
pointer being null. tty
being null for some reason can't be ruled out.
tty_io_error
tests a flag in the TTY structure:
static inline bool tty_io_error(struct tty_struct *tty)
{
return test_bit(TTY_IO_ERROR, &tty->flags);
}
If that got set somehow, we would have a persistent EIO
return from read
attempts and probably other syscalls. This is is, however, something that is indicated by lower-level TTY drivers, like serial code.
So perhaps the ld->ops->read(tty, file, buf, count);
line discipline operation is returning -EIO
. The TTY should be in the POSIX line discipline at all times here numbered N_TTY
. I see the file name hasn't changed in twenty years; it is still in n_tty.c
. We want n_tty_read
This has only one EIO
situation:
if (test_bit(TTY_OTHER_CLOSED, &tty->flags)) {
retval = -EIO;
break;
}
That flag is related to TTY/PTY interaction though. The PTY here should be a device controlled by the gnome terminal; there is no reason why that would close in this situation.
Ah, but look what happens on entry into n_tty_read
:
c = job_control(tty, file);
if (c < 0)
return c;
Here is where I strongly suspect the "smoking gun" may be. This code has EIO
returns and has to do with job control. This ends up in the following function, with the sig
argument being SIGTTIN
.
int __tty_check_change(struct tty_struct *tty, int sig)
{
unsigned long flags;
struct pid *pgrp, *tty_pgrp;
int ret = 0;
if (current->signal->tty != tty)
return 0;
rcu_read_lock();
pgrp = task_pgrp(current);
spin_lock_irqsave(&tty->ctrl_lock, flags);
tty_pgrp = tty->pgrp;
spin_unlock_irqrestore(&tty->ctrl_lock, flags);
if (tty_pgrp && pgrp != tty->pgrp) {
if (is_ignored(sig)) {
if (sig == SIGTTIN)
ret = -EIO;
} else if (is_current_pgrp_orphaned())
ret = -EIO;
else {
kill_pgrp(pgrp, sig, 1);
set_thread_flag(TIF_SIGPENDING);
ret = -ERESTARTSYS;
}
}
rcu_read_unlock();
if (!tty_pgrp)
tty_warn(tty, "sig=%d, tty->pgrp == NULL!n", sig);
return ret;
}
Here, there are two conditions for EIO
. One is that the calling task trying to read from the TTY is not in the foreground process group, and is ignoring the SIGTTIN
signal.
This precisely conforms to POSIX (Issue 7, 2016) which says:
Any attempts by a process in a background process group to read from its controlling terminal cause its process group to be sent a SIGTTIN signal unless one of the following special cases applies: if the reading process is ignoring the SIGTTIN signal or the reading thread is blocking the SIGTTIN signal, or if the process group of the reading process is orphaned, the read() shall return -1, with errno set to [EIO] and no signal shall be sent. The default action of the SIGTTIN signal shall be to stop the process to which it is sent. [11.1.3 The Controlling Terminal]
The thing is, we don't expect the parent shell to become orphaned.
Might it simply be that the exiting child bash forces itself into the foreground as it exits, leaving the parent unexpectedly in the background?
Indeed, what I'm seeing in one of my strace
logs is that the parent bash is exiting before the child one, and the child is doing tcsetpgrp
to make itself the foreground. I.e. in some cases, the parent doesn't even get the SIGCHLD
signal; it gets the I/O error from the terminating child's TTY interference and bails. Then the child finishes its termination.
Take a look at The terminal ends up in the wrong process group, and the parent shell can't do anything about it. You either get EOF on stdin, -1/EIO on the read, or a SIGHUP.
– Isaac
7 hours ago
add a comment |
I can reproduce it in a Ubuntu 16 like this:
Create a new Gnome Terminal window.
Run a child
bash
; thensuspend
kill %1
The window dies. UPDATE: if we use kill -KILL
this doesn't reproduce!
TL; DR:
My current hypothesis (not entirely conclusive) from the below analysis is that when the child bash receives the
SIGTERM
, it seizes the terminal by forcing itself into the foreground process group. The parent Bash is likely blocking theSIGTTIN
signal and so its TTYread
receives anEIO
, and it bails. A bash which has suspended itself withsuspend
should not assert itself into the foreground when it resumes executing due to a fatal signal.
To obtain more information, I attached strace -f -p <pid>
to the parent shell to see the system calls.
It looks like it may be exiting because, for some reason, it receives a -1 return from a read
of the standard input, with errno
being EIO
: in other words, I/O error on standard input.
Here is the tail end of the strace
log: PID 18860
is the parent, 18910
is the child:
Epilog of child exiting:
18910 exit_group(0) = ?
18910 +++ exited with 0 +++
Parent's TTY read
is interrupted in a restartable way by the SIGCHLD
:
18860 <... read resumed> 0x7ffe891c6717, 1) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
18860 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=18910, si_uid=1001, si_status=0, si_utime=0, si_stime=1} ---
Parent's signal handling calls wait4
to collect child:
18860 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED|WCONTINUED, NULL) = 18910
18860 wait4(-1, 0x7ffe891c6010, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
Parent executes return to the kernel from signal handler:
18860 rt_sigreturn({mask=}) = 0
And now comes the strange kicker, what the heck? The resumed read
bails with I/O error:
18860 read(0, 0x7ffe891c6717, 1) = -1 EIO (Input/output error)
And the parent begins to exit:
18860 ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
18860 ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0
18860 ioctl(0, TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
[ ... ]
18860 write(2, "exitn", 5) = 5
18860 rt_sigaction(SIGINT, {0x460390, , SA_RESTORER, 0x7f598a157860}, {0x460390, , SA_RESTORER, 0x7f598a157860}, 8) = 0
18860 stat("/local/home/kaz/.bash_history", {st_mode=S_IFREG|0600, st_size=57362, ...}) = 0
18860 open("/local/home/kaz/.bash_history", O_WRONLY|O_APPEND) = 3
18860 write(3, "echo $$nbashnkill %1n", 21) = 21
18860 close(3) = 0
[ ... ]
etc.
It really looks as if the termination is a response to the I/O error, which is almost certainly unexpected.
So the question is, what has the termination of the child done to cause this subsequent I/O error? If the child gets no opportunity to do anything (kill -KILL %1
) then it doesn't repro, suggesting that the child bash
takes some steps which put the TTY into a state whereby it generates -1/EIO
.
It does look like the kernel may be implicated in this as a possible root cause.
Also, I tried this a few more times. Sometimes the ioctl(0, ...)
calls that the parent issues when exiting also fail with -1/EIO
; sometimes they don't.
In the kernel, tty_read
can bail with EIO
for a few reasons. The next step would be to add some printk
debugging to see exactly which. Here it is from 4.12.2, courtesy of free-electrons.com:
static ssize_t tty_read(struct file *file, char __user *buf, size_t count,
loff_t *ppos)
{
int i;
struct inode *inode = file_inode(file);
struct tty_struct *tty = file_tty(file);
struct tty_ldisc *ld;
if (tty_paranoia_check(tty, inode, "tty_read"))
return -EIO;
if (!tty || tty_io_error(tty))
return -EIO;
/* We want to wait for the line discipline to sort out in this
situation */
ld = tty_ldisc_ref_wait(tty);
if (!ld)
return hung_up_tty_read(file, buf, count, ppos);
if (ld->ops->read)
i = ld->ops->read(tty, file, buf, count);
else
i = -EIO;
tty_ldisc_deref(ld);
if (i > 0)
tty_update_time(&inode->i_atime);
return i;
}
It's almost certainly not due to the line discipline not having a read
function (the last EIO
). Either a failed paranoia check, or tty
being null, or the tty_io_error
being true.
It's not the paranoia check, because when that goes off, it logs a warning message. I don't see one in my kernel log. The check has to be enabled at compile time, and it checks for the tty
pointer being null. tty
being null for some reason can't be ruled out.
tty_io_error
tests a flag in the TTY structure:
static inline bool tty_io_error(struct tty_struct *tty)
{
return test_bit(TTY_IO_ERROR, &tty->flags);
}
If that got set somehow, we would have a persistent EIO
return from read
attempts and probably other syscalls. This is is, however, something that is indicated by lower-level TTY drivers, like serial code.
So perhaps the ld->ops->read(tty, file, buf, count);
line discipline operation is returning -EIO
. The TTY should be in the POSIX line discipline at all times here numbered N_TTY
. I see the file name hasn't changed in twenty years; it is still in n_tty.c
. We want n_tty_read
This has only one EIO
situation:
if (test_bit(TTY_OTHER_CLOSED, &tty->flags)) {
retval = -EIO;
break;
}
That flag is related to TTY/PTY interaction though. The PTY here should be a device controlled by the gnome terminal; there is no reason why that would close in this situation.
Ah, but look what happens on entry into n_tty_read
:
c = job_control(tty, file);
if (c < 0)
return c;
Here is where I strongly suspect the "smoking gun" may be. This code has EIO
returns and has to do with job control. This ends up in the following function, with the sig
argument being SIGTTIN
.
int __tty_check_change(struct tty_struct *tty, int sig)
{
unsigned long flags;
struct pid *pgrp, *tty_pgrp;
int ret = 0;
if (current->signal->tty != tty)
return 0;
rcu_read_lock();
pgrp = task_pgrp(current);
spin_lock_irqsave(&tty->ctrl_lock, flags);
tty_pgrp = tty->pgrp;
spin_unlock_irqrestore(&tty->ctrl_lock, flags);
if (tty_pgrp && pgrp != tty->pgrp) {
if (is_ignored(sig)) {
if (sig == SIGTTIN)
ret = -EIO;
} else if (is_current_pgrp_orphaned())
ret = -EIO;
else {
kill_pgrp(pgrp, sig, 1);
set_thread_flag(TIF_SIGPENDING);
ret = -ERESTARTSYS;
}
}
rcu_read_unlock();
if (!tty_pgrp)
tty_warn(tty, "sig=%d, tty->pgrp == NULL!n", sig);
return ret;
}
Here, there are two conditions for EIO
. One is that the calling task trying to read from the TTY is not in the foreground process group, and is ignoring the SIGTTIN
signal.
This precisely conforms to POSIX (Issue 7, 2016) which says:
Any attempts by a process in a background process group to read from its controlling terminal cause its process group to be sent a SIGTTIN signal unless one of the following special cases applies: if the reading process is ignoring the SIGTTIN signal or the reading thread is blocking the SIGTTIN signal, or if the process group of the reading process is orphaned, the read() shall return -1, with errno set to [EIO] and no signal shall be sent. The default action of the SIGTTIN signal shall be to stop the process to which it is sent. [11.1.3 The Controlling Terminal]
The thing is, we don't expect the parent shell to become orphaned.
Might it simply be that the exiting child bash forces itself into the foreground as it exits, leaving the parent unexpectedly in the background?
Indeed, what I'm seeing in one of my strace
logs is that the parent bash is exiting before the child one, and the child is doing tcsetpgrp
to make itself the foreground. I.e. in some cases, the parent doesn't even get the SIGCHLD
signal; it gets the I/O error from the terminating child's TTY interference and bails. Then the child finishes its termination.
Take a look at The terminal ends up in the wrong process group, and the parent shell can't do anything about it. You either get EOF on stdin, -1/EIO on the read, or a SIGHUP.
– Isaac
7 hours ago
edited Jul 19 '17 at 23:48
answered Jul 19 '17 at 22:41
Kaz
It looks like a bug in bash. It reproduces on my Ubuntu with GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu).
- It doesn't need suspend, as it also occurs after kill -STOP bash_pid.
- It doesn't occur if you use kill -9 %1 instead of kill %1.
- It doesn't occur if you use kill pid instead of kill %1.
- It doesn't occur if the subprocess is something other than bash (try dash or sleep 999). In that case, however, the bash behavior is still unexpected to me: bash shouldn't send SIGCONT to sleep 999 in this case, but it apparently does.
- It doesn't occur in other shells (including dash executing a dash subprocess), and they kill in a more expected way. Our stopped-and-killed subprocess remains stopped (ps uw consistently shows the subprocess in state T). After you wake the subprocess with SIGCONT, it processes the SIGTERM and dies without affecting its parent.
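The last observation matches a general kernel rule: a stopped process does not act on a SIGTERM until it is continued. Here is a minimal C sketch of that rule, independent of any shell (Linux/POSIX assumed; stop_then_term and struct stop_term_result are hypothetical names for this illustration). A child is stopped with SIGSTOP, as kill -STOP or suspend would do, and then sent SIGTERM; it should stay alive until SIGCONT arrives, at which point the pending SIGTERM terminates it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

struct stop_term_result {
    int alive_after_term;  /* 1 if SIGTERM alone left the stopped child alive */
    int fatal_signal;      /* signal that finally ended the child, or -1 */
};

struct stop_term_result stop_then_term(void)
{
    struct stop_term_result r = { 0, -1 };
    int status;

    pid_t pid = fork();
    if (pid < 0)
        return r;
    if (pid == 0) {
        for (;;)
            pause();               /* stand-in for an idle subshell */
    }

    /* Stop the child (like `suspend` or `kill -STOP`) and confirm it. */
    kill(pid, SIGSTOP);
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status)) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return r;
    }

    /* SIGTERM to a stopped process stays pending: the child cannot run. */
    kill(pid, SIGTERM);
    struct timespec delay = { 0, 100L * 1000 * 1000 };  /* 100 ms */
    nanosleep(&delay, NULL);
    r.alive_after_term = (waitpid(pid, &status, WNOHANG) == 0);

    /* SIGCONT resumes it, and the pending SIGTERM is then delivered. */
    kill(pid, SIGCONT);
    if (waitpid(pid, &status, 0) == pid && WIFSIGNALED(status))
        r.fatal_signal = WTERMSIG(status);
    return r;
}
```

The 100 ms sleep only gives SIGTERM a chance to act if it were going to; because the child is stopped it cannot run, so the WNOHANG check is not a race.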
It doesn't seem to be a bug: The terminal ends up in the wrong process group, and the parent shell can't do anything about it. You either get EOF on stdin, -1/EIO on the read, or a SIGHUP.
– Isaac
7 hours ago
@Isaac That unparsable and confusing statement doesn't deny it's a bug. If you have arguments to back up your own opinion, please do provide a new answer. This will increase visibility and allow others to upvote or downvote it. Kaz's analysis in this thread goes amazingly deep, though.
– kubanczyk
39 mins ago
It is not "my own opinion". If you care to follow the provided link (click on the text) you will find out that that is the answer of Chet Ramey (the lead developer of bash) to the "bug report" @kubanczyk
– Isaac
33 mins ago
You say "doesn't seem to be a bug" and I don't see Chet Ramey saying that.
– kubanczyk
23 mins ago
edited 27 mins ago
answered Jul 19 '17 at 20:42
kubanczyk