How can I fix broken shift-JIS filenames?
I've got some files whose Shift-JIS filenames have been decoded as ANSI, e.g.
home_03@‚¢ƒgƒ‰ƒ“ƒNŠJ‚¢‚½Aƒtƒ@ƒCƒ‹—L‚è
when they should be in Shift-JIS, like
home_03@青いトランク開いた、ファイル有り
This is because the archive extractor I'm using doesn't support Shift-JIS. That can't really be helped. But is there a way to fix the filenames of the files I've already extracted?
edit:
another example
Ší‹ï‘ä@ƒXƒpƒi
should be
器具台@スパナ
windows-8.1 filenames shift-jis
edited 2 hours ago
asked 4 hours ago
Hiccup
1 Answer
Since you're using Windows, PowerShell is probably the easiest method.
Now, PowerShell internally uses UTF-16 for its strings, so a conversion would involve four steps:
- Read the incorrect filename from the filesystem into PS (represented internally as a UTF-16 string)
- Tell PS to convert the string into a raw byte array as if the string were <incorrect encoding>. We can't use the PS string directly (as it's UTF-16).
- Tell PS to convert the byte array back to a string interpreting it as <correct encoding>. This will give us a UTF-16 string of the raw bytes interpreted as Shift-JIS.
- Rename the file
Let's start by defining the encodings. In your case, I'm guessing your source is Windows-1252 (default non-Unicode codepage for Western/English Windows).
$srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")
$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")
You could also use [System.Text.Encoding]::Default to get the current system codepage (in Windows PowerShell; PowerShell 6 and later return UTF-8 there), but I prefer to be explicit.
Then we apply the conversion steps:
$newName = $destEnc.GetString($srcEnc.GetBytes($oldName))
In your example, home_03@‚¢ƒgƒ‰ƒ“ƒNŠJ‚¢‚½Aƒtƒ@ƒCƒ‹—L‚è becomes home_03@ツいトランク開いたAファイル有り. While this is different from your example result (see notes at bottom), it matches what I get from the Windows-1252 => Shift-JIS conversion at http://string-functions.com/encodedecode.aspx. If this is incorrect, you may have to play around until you find the correct source and destination encodings.
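If the first guess does not produce readable Japanese, trying a few candidate pairs side by side can be quicker than editing the two variables by hand. The following is a minimal sketch, not something from the steps above: the candidate lists are my own assumptions (a few common Western code pages and Japanese encodings) and should be adjusted to your system, and the sample string is the second garbled name from the question.

```powershell
# Garbled name to test (second example from the question).
$garbled = 'Ší‹ï‘ä@ƒXƒpƒi'

# Assumed candidates; replace with whatever is plausible for your setup.
$srcCandidates  = 'Windows-1252', 'IBM437', 'ibm850'
$destCandidates = 'shift_jis', 'euc-jp'

foreach ($src in $srcCandidates) {
    foreach ($dest in $destCandidates) {
        $srcEnc  = [System.Text.Encoding]::GetEncoding($src)
        $destEnc = [System.Text.Encoding]::GetEncoding($dest)

        # Same two-step conversion: string -> raw bytes -> reinterpreted string.
        $result = $destEnc.GetString($srcEnc.GetBytes($garbled))

        # One line per pair so the readable result stands out.
        '{0,-14} -> {1,-10} : {2}' -f $src, $dest, $result
    }
}
```

For this sample, the Windows-1252 and shift_jis pair should be the one that reads as the 器具台@スパナ expected in the question. Pasting the snippet into a console works as-is; in a saved script under Windows PowerShell, the file probably needs to be UTF-8 with a BOM so the non-ASCII literal is read correctly.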
Putting it together with a standard loop:
$srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")
$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")
Get-ChildItem | %{Rename-Item $_ $destEnc.GetString($srcEnc.GetBytes($_.Name))}
Or if you prefer to recurse into subdirectories:
$srcEnc = [System.Text.Encoding]::GetEncoding("Windows-1252")
$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")
Get-ChildItem -Recurse | %{Rename-Item $_ $destEnc.GetString($srcEnc.GetBytes($_.Name))}
Add -File to Get-ChildItem if you want to avoid renaming directories.
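Since a bulk rename is awkward to undo, a preview run may be worth doing first. The sketch below is a more defensive variant of the loop, under a few assumptions of mine: it passes the full path via -LiteralPath (so characters such as [ and ] are not treated as wildcards), skips names that would not change, and handles the longest paths first so that files are renamed before the directories that contain them. The -WhatIf switch only prints what would happen; remove it once the preview looks right.

```powershell
$srcEnc  = [System.Text.Encoding]::GetEncoding("Windows-1252")
$destEnc = [System.Text.Encoding]::GetEncoding("Shift-JIS")

Get-ChildItem -Recurse |
    # A child's full path is always longer than its parent's,
    # so this ordering renames children before their directories.
    Sort-Object { $_.FullName.Length } -Descending |
    ForEach-Object {
        $newName = $destEnc.GetString($srcEnc.GetBytes($_.Name))

        # Ordinal comparison: skip pure-ASCII names, which convert to themselves.
        if (-not $newName.Equals($_.Name)) {
            # Preview only; drop -WhatIf to actually rename.
            Rename-Item -LiteralPath $_.FullName -NewName $newName -WhatIf
        }
    }
```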
Looks like your example included two characters that were invalid in Windows-1252 and were likely dropped when you posted the question (based on reversing the process using your example output). There's a 144 (0x90) between the first @ and Â, and a 129 (0x81) between the ½ and A. For the convenience of anyone else looking to test, here's a base64-encoded version of the raw bytes: aG9tZV8wM0CQwoKig2eDiYOTg06KSoKigr2BQYN0g0CDQ4OLl0yC6A==
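For anyone who wants to reproduce this, the base64 string can be turned back into bytes and decoded both ways in PowerShell. This is a small sketch with variable names of my own choosing; it only uses the standard .NET Convert and Encoding classes.

```powershell
# Raw filename bytes, as posted above.
$b64   = 'aG9tZV8wM0CQwoKig2eDiYOTg06KSoKigr2BQYN0g0CDQ4OLl0yC6A=='
$bytes = [System.Convert]::FromBase64String($b64)

# Hex dump, to spot the 0x90 and 0x81 bytes.
($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '

# Interpreted as Shift-JIS: the intended filename.
$fixed = [System.Text.Encoding]::GetEncoding('Shift-JIS').GetString($bytes)
$fixed

# Interpreted as Windows-1252: the garbled filename. The 0x90 and 0x81
# bytes have no printable character there, so they may not be visible.
$garbled = [System.Text.Encoding]::GetEncoding('Windows-1252').GetString($bytes)
$garbled
```

The Shift-JIS line should print the name the question expected, home_03@青いトランク開いた、ファイル有り, which is a quick way to confirm that the two missing bytes account for the difference.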
edited 2 hours ago
answered 3 hours ago
Bob