csum errors on btrfs containing virtual disk


























I have a btrfs disk that was damaged by a hardware failure and now throws I/O errors when copying certain files. I ran btrfs scrub, and when it reported csum errors I took the filesystem offline and ran btrfs check --check-data-csum, which returned several dozen lines like the following:



mirror 0 bytenr 549766098944 csum 1874004453 expected csum 2335064354



As far as I know, using --backup has a good chance of fixing this problem and would be the first step to take in repairing the file system. However, this was my virtual disk storage for qemu, and I'm worried that the internal consistency of the virtual disks (especially the Windows one) will be harmed if I do this.



The btrfs manpage mentions an --init-csum-tree flag alongside other dangerous options. Is this a good reason to use it, or do I have other options?



CentOS Linux 7, kernel 3.10.0-514.26.2.el7.x86_64



btrfs-progs version 4.4.1 release 1.el7



Disk is a WD red 6TB (5.5TiB) WD60EFRX, one 5.5TiB partition



Virtual disks are in .qcow2 format

















      linux centos qemu btrfs checksum
















      asked Aug 31 '17 at 15:39









      PSpacer























          2 Answers









































































If the checksums are bad, the data is probably bad. Clearing out the checksum tree (which is what --init-csum-tree does) will not fix that; it will just expose the bad data directly to userspace and prevent detection of any other bit rot in old data on the FS. Essentially, you only had one copy of the data on the disk, and that copy is corrupted, so you're past the point of worrying about whether the data in those disk images might be bad: there almost certainly is some corruption. On the bright side, if you only got a few dozen of those error messages, there won't be much corruption, since each one should correspond to 4-16KiB of data (BTRFS does checksums at the block level).
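To put a rough number on that estimate, one option is to count the distinct bytenr values in the saved btrfs check output and multiply by an assumed block size. The sketch below is a hypothetical helper, not part of btrfs-progs: the line format is taken from the error message quoted in the question, and the 4 KiB default block size is an assumption (the lower bound of the range given above).

```python
import re

# Matches lines of the form quoted in the question:
#   mirror 0 bytenr 549766098944 csum 1874004453 expected csum 2335064354
CSUM_LINE = re.compile(
    r"mirror\s+(\d+)\s+bytenr\s+(\d+)\s+csum\s+(\d+)\s+expected csum\s+(\d+)"
)


def summarize_csum_errors(text, block_size=4096):
    """Count distinct failing byte offsets and estimate the affected bytes.

    block_size is an assumption (4 KiB); adjust it to match the filesystem.
    """
    offsets = set()
    for line in text.splitlines():
        match = CSUM_LINE.search(line)
        if match:
            offsets.add(int(match.group(2)))  # group 2 is the bytenr field
    return len(offsets), len(offsets) * block_size


sample = "mirror 0 bytenr 549766098944 csum 1874004453 expected csum 2335064354\n"
count, affected = summarize_csum_errors(sample)
print(f"{count} failing block(s), roughly {affected} bytes affected")
```

Offsets are collected in a set so that the same block reported more than once is only counted once.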



In this case, I would actually suggest using btrfs restore to pull the files off the disk to a different location, or alternatively restoring from a backup. With just a single disk and therefore no data replication, there's not much you can do about checksum errors short of restoring known-good data to a new location.
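As a rough illustration of the btrfs restore route, the sketch below only assembles the command line so it can be reviewed before anything is run. The helper name, the device /dev/sdX1 and the destination /mnt/recovery are placeholders made up for this example; -v (verbose), -D (dry run) and -i (ignore errors) are btrfs restore options, but it is worth checking them against the man page for your btrfs-progs version.

```python
import shlex


def build_restore_command(device, destination, dry_run=True, ignore_errors=False):
    """Build an argument list for `btrfs restore <device> <destination>`."""
    cmd = ["btrfs", "restore", "-v"]  # -v: verbose output
    if dry_run:
        cmd.append("-D")              # -D: dry run, only list what would be restored
    if ignore_errors:
        cmd.append("-i")              # -i: keep going when an error is hit
    cmd += [device, destination]
    return cmd


# Placeholder paths: replace with the real partition and an empty target directory.
cmd = build_restore_command("/dev/sdX1", "/mnt/recovery")
print(shlex.join(cmd))

# To actually run it (as root, with the damaged filesystem unmounted):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Starting with a dry run shows which files would be restored without writing anything; the destination should be on a different, healthy disk.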

















            answered Aug 31 '17 at 16:06









            Austin Hemmelgarn













            • Last backup was a few months ago, I'll pull what I can with restore and get the rest from backup. I checked the logs and I have exactly 192 checksum errors. That's only 1.5/5767168MiB, about 0.000026% corruption. Crossing my fingers...
              – PSpacer
              Aug 31 '17 at 16:25












            • Hate to be pessimistic here, but it's worth pointing out that that's about 0.000026% corruption in used blocks. BTRFS doesn't checksum unused space (because that would be a waste of time for almost everyone), so those are 192 blocks that failed checksums. If you're lucky, they still may be stuff like empty blocks inside the disk images though.
              – Austin Hemmelgarn
              Aug 31 '17 at 17:27










            • I guess it's a crapshoot until I boot it up, but if the corruption was caused by hardware failure elsewhere in the system (looks like memory) then the corrupted blocks were probably in use.
              – PSpacer
              Sep 1 '17 at 3:31
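The figures in the comments above can be reproduced with a few lines of arithmetic. This is only a check of the quoted numbers: the 8 KiB per error is what 192 errors adding up to 1.5 MiB implies, not something stated in the thread, and the percentage is taken relative to the whole 5.5 TiB partition.

```python
errors = 192
total_mib = 5.5 * 1024 * 1024  # 5.5 TiB partition expressed in MiB
corrupt_mib = 1.5              # corrupted amount quoted in the comment

# Block size implied by 192 errors adding up to 1.5 MiB
implied_block_kib = corrupt_mib * 1024 / errors

# Share of the whole partition, as a percentage
percent = corrupt_mib / total_mib * 100

print(f"partition size: {total_mib:.0f} MiB")
print(f"implied block size: {implied_block_kib:g} KiB")
print(f"corrupted share of the partition: {percent:.6f}%")
```

Measured against used space only, the share would be higher, since the 192 failures are all in blocks that hold data.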








































Please note that there is a known issue with virtual machine images on btrfs, so your data could indeed be OK. You should expect more of these warnings/errors to pop up in the future.
https://www.spinics.net/lists/linux-btrfs/msg25940.html
















                answered 19 mins ago









                Martin
