processes hanging when trying to access a file












I'm running Ubuntu 16.04.5 with an ext4 filesystem, and I have a file that causes processes to hang when they read it.



For example:



$ cat /path/to/file.txt


and then from a separate terminal:



$ sudo cat /proc/24147/stack
[<ffffffff811937dd>] wait_on_page_bit_killable+0xcd/0xf0
[<ffffffff81193c86>] generic_file_read_iter+0x486/0x6b0
[<ffffffff8121395e>] new_sync_read+0x9e/0xe0
[<ffffffff812139c9>] __vfs_read+0x29/0x40
[<ffffffff81213f96>] vfs_read+0x86/0x130
[<ffffffff81214ce5>] SyS_read+0x55/0xc0
[<ffffffff81829d4e>] entry_SYSCALL_64_fastpath+0x22/0xc1
[<ffffffffffffffff>] 0xffffffffffffffff


I have many processes hung reading this file. Here's another example with a different call stack:



$ less /path/to/file.txt


and then from a separate terminal:



$ sudo cat /proc/23006/stack
[<ffffffff8140f954>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff812a1e83>] ext4_map_blocks+0x443/0x5a0
[<ffffffff812edb98>] ext4_mpage_readpages+0x368/0x920
[<ffffffff8129f626>] ext4_readpages+0x36/0x40
[<ffffffff811a0b89>] __do_page_cache_readahead+0x199/0x240
[<ffffffff811a0d6d>] ondemand_readahead+0x13d/0x250
[<ffffffff811a101e>] page_cache_sync_readahead+0x2e/0x50
[<ffffffff81193d4a>] generic_file_read_iter+0x54a/0x6b0
[<ffffffff8121395e>] new_sync_read+0x9e/0xe0
[<ffffffff812139c9>] __vfs_read+0x29/0x40
[<ffffffff81213f96>] vfs_read+0x86/0x130
[<ffffffff81214ce5>] SyS_read+0x55/0xc0
[<ffffffff81829d4e>] entry_SYSCALL_64_fastpath+0x22/0xc1
[<ffffffffffffffff>] 0xffffffffffffffff
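
To list every stuck reader instead of checking PIDs one at a time, here is a minimal sketch. It assumes the hung processes show up in state `D` (uninterruptible sleep) in `/proc/<pid>/stat`, which has not been confirmed on this machine. The function names `parse_stat_line` and `find_blocked` are hypothetical helpers, not part of any existing tool.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def find_blocked(proc_root="/proc"):
    """List (pid, comm) for every process whose state field is 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was malformed
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in find_blocked():
        print(pid, comm)
```

Each PID it prints can then be inspected with `sudo cat /proc/<pid>/stack` as above. The `proc_root` parameter exists only so the scan can be pointed at a test directory.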


I need to narrow down the actual problem a bit. The context is somewhat involved: I'm running Apache with mod_wsgi, Python code writes to the file (it's a log file), and the volume is a RAID array on top of the instance store on an AWS instance.



Is it possible to get a Linux box into a state where cat'ing a file causes the terminal to hang, as shown above? Knowing that would help me determine what additional context (if any) would be useful here.



I should mention that this happens on a particular production machine (which sees heavy use) every month or so. I can restart the machine to recover, but I'd like to understand the state it's in and, ideally, prevent this from happening in the first place.










  • I believe this is the same: unix.stackexchange.com/questions/350761/…

    – V13
    4 hours ago











  • @V13 I added another example from the same machine with a different call stack. I'm not sure it's the same problem as the other question you referenced, but I guess that's the thing - I have no idea what it could be and was hoping for some input to narrow it down a bit.

    – nonagon
    41 mins ago











  • Can you SIGKILL that process? If not, then it's the same question: your filesystem or disk is broken. Try running an offline fsck.

    – 炸鱼薯条德里克
    36 mins ago
















filesystems ext4






asked 5 hours ago by nonagon
edited 41 mins ago



