How can I fix broken shift-JIS filenames?


























I've got some files with Shift-JIS filenames that were interpreted as ANSI, e.g.



home_03@‚¢ƒgƒ‰ƒ“ƒNŠJ‚¢‚½Aƒtƒ@ƒCƒ‹—L‚è 


when they should be in Shift-JIS, like



home_03@青いトランク開いた、ファイル有り


This is because the archive extractor I'm using doesn't support Shift-JIS. That can't really be helped. But is there a way to fix the names of the files I've already extracted?



edit:



another example



Ší‹ï‘ä@ƒXƒpƒi


should be



器具台@スパナ









      windows-8.1 filenames shift-jis














      edited 2 hours ago

























      asked 4 hours ago









      Hiccup

      1358


























          1 Answer














          Since you're using Windows, PowerShell is probably the easiest method.



          Now, PowerShell internally uses UTF-16 for its strings, so a conversion would involve four steps:




          1. Read the incorrect filename from the filesystem into PS (represented internally as a UTF-16 string)

          2. Tell PS to encode the string into a raw byte array using <incorrect encoding>. This recovers the original bytes; we can't use the PS string directly, as it's UTF-16.

          3. Tell PS to decode the byte array back into a string, interpreting it as <correct encoding>. This gives us a UTF-16 string of the raw bytes interpreted as Shift-JIS.

          4. Rename the file


          Let's start by defining the encodings. In your case, I'm guessing your source is Windows-1252 (default non-Unicode codepage for Western/English Windows).



          $srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")
          $destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")


          You could also use [System.Text.Encoding]::Default to get the current system codepage (this holds for Windows PowerShell; on PowerShell 6 and later it returns UTF-8 instead), but I prefer to be explicit.



          Then we apply the conversion steps:



          $newName = $destEnc.GetString($srcEnc.GetBytes($oldName))


          In your example, home_03@‚¢ƒgƒ‰ƒ“ƒNŠJ‚¢‚½Aƒtƒ@ƒCƒ‹—L‚è becomes home_03@ツいトランク開いたAファイル有り. While this is different from your example result (see notes at bottom), it matches what I get from the Windows-1252 => Shift-JIS conversion at http://string-functions.com/encodedecode.aspx. If this is incorrect, you may have to experiment until you find the correct source and destination encodings.
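If Windows-1252 turns out to be the wrong guess, one way to experiment is to run the same conversion with several candidate source encodings and compare the results. The following is only a sketch: the sample name is the tail of the second example from the question, and the candidate list is an assumption to be adjusted for whatever your extractor might have used.

```powershell
# Sample name: the garbled tail of the question's second example, built from
# code points so the script file's own encoding does not matter (U+0192 is "ƒ").
$f = [char]0x0192
$oldName = "${f}X${f}p${f}i"

# Candidate source encodings are guesses; extend or reorder as needed.
$candidates = 'Windows-1252', 'IBM437', 'IBM850', 'ISO-8859-1'
$destEnc = [System.Text.Encoding]::GetEncoding('Shift-JIS')

$results = foreach ($name in $candidates) {
    $srcEnc = [System.Text.Encoding]::GetEncoding($name)
    [pscustomobject]@{
        Source = $name
        Result = $destEnc.GetString($srcEnc.GetBytes($oldName))
    }
}
$results | Format-Table -AutoSize
```

For this sample, the Windows-1252 row should come out as スパナ. Candidates that cannot represent a character in the garbled name will show substitution characters instead, which is a good sign that they are not the right source encoding.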



          Putting it together with a standard loop:



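          $srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")
          $destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")
          # -LiteralPath keeps names containing [ or ] from being treated as wildcards
          Get-ChildItem | ForEach-Object { Rename-Item -LiteralPath $_.FullName -NewName $destEnc.GetString($srcEnc.GetBytes($_.Name)) }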


          Or if you prefer to recurse into subdirectories:



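          $srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")
          $destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")
          # Longest (deepest) paths first, so a directory is renamed only after its contents
          Get-ChildItem -Recurse | Sort-Object { $_.FullName.Length } -Descending | ForEach-Object { Rename-Item -LiteralPath $_.FullName -NewName $destEnc.GetString($srcEnc.GetBytes($_.Name)) }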


          Add -File to Get-ChildItem if you want to avoid renaming directories.
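Since a bulk rename is awkward to undo, it may be worth previewing it first. The sketch below assumes you run it from the folder holding the extracted files; it uses Rename-Item's standard -WhatIf switch so nothing is changed, and it skips names that the conversion leaves untouched (plain ASCII names, for instance).

```powershell
$srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")
$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")

Get-ChildItem -File | ForEach-Object {
    $newName = $destEnc.GetString($srcEnc.GetBytes($_.Name))
    # -cne is a case-sensitive comparison; skip files whose name would not change.
    if ($newName -cne $_.Name) {
        # -WhatIf only reports the planned rename; remove it to actually rename.
        Rename-Item -LiteralPath $_.FullName -NewName $newName -WhatIf
    }
}
```

Once the previewed names look right, drop -WhatIf and run it again.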





          Notes: it looks like your example included two characters that are undefined in Windows-1252 and were likely dropped when you posted the question (I worked this out by reversing the process on your example output). There's a 144 (0x90) between the first @ and Â, and a 129 (0x81) between the ½ and A. For the convenience of anyone else looking to test, here's a base64-encoded version of the raw bytes: aG9tZV8wM0CQwoKig2eDiYOTg06KSoKigr2BQYN0g0CDQ4OLl0yC6A==.
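To test against those raw bytes, something like the following should work. It is a sketch that simulates the extractor by decoding the bytes as Windows-1252, then applies the fix and checks that no bytes were lost along the way. It relies on .NET mapping the undefined bytes 0x81 and 0x90 to the matching control characters, which is what makes the round trip lossless.

```powershell
$srcEnc  = [System.Text.Encoding]::GetEncoding('Windows-1252')
$destEnc = [System.Text.Encoding]::GetEncoding('Shift-JIS')

# Raw filename bytes, taken from the base64 string in the note above.
$bytes = [Convert]::FromBase64String('aG9tZV8wM0CQwoKig2eDiYOTg06KSoKigr2BQYN0g0CDQ4OLl0yC6A==')

# Simulate the extractor: decode the Shift-JIS bytes as Windows-1252.
$garbled = $srcEnc.GetString($bytes)

# Apply the fix: back to bytes via Windows-1252, then decode as Shift-JIS.
$fixed = $destEnc.GetString($srcEnc.GetBytes($garbled))

# The round trip is lossless if the recovered bytes match the originals.
$sameBytes = [BitConverter]::ToString($srcEnc.GetBytes($garbled)) -eq [BitConverter]::ToString($bytes)
"Round trip preserved bytes: $sameBytes"
"Fixed name: $fixed"
```

With all 40 bytes intact, the fixed name should match the one the question expects, home_03@青いトランク開いた、ファイル有り.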






                edited 2 hours ago

























                answered 3 hours ago









                Bob

                45.2k20137171



