Distinguish ASCII from UTF-8 characters in the same file



























On Ubuntu 18.04, I created a dummy text file containing just one non-ASCII character, è, encoded in UTF-8. All the other characters are ASCII:



$ cat dummyfile
Hello
Helloè


This is the resulting hexdump:



$ hexdump -C dummyfile
00000000 48 65 6c 6c 6f 0a 48 65 6c 6c 6f c3 a8 0a |Hello.Hello...|
0000000e


The file is identified as



$ file dummyfile
dummyfile: UTF-8 Unicode text


Each character is represented by a single byte, except for è, which is encoded as the two bytes c3 a8. How can the file contents be interpreted correctly if the number of bytes used to represent each character is not constant?



My guess: when the parser encounters a byte greater than 7F, the last ASCII value (as is the case for c3), it is forced to read at least one more byte to determine which character to print. Is that right?

































  • I think that you haven't quite expressed the question that you mean to ask. Your question actually seems to be two questions: "How does file know that this is UTF-8, when it could instead be an old 8-bit encoding?" followed by "How does a UTF-8 decoder know where multiple-byte sequences begin and end?"

    – JdeBP
    5 hours ago











  • @JdeBP Maybe, unconsciously, my actual questions were the ones you wrote (even though I only used file as a further check). DopeGhoti's answer addresses the second one. For the first one, maybe file looks for bytes "whose high order bit is set" and can then guess whether the encoding is UTF-8.

    – BowPark
    5 hours ago






  • The file command on Ubuntu, as one of its tests, reads the first 96KiB of the file and checks whether there are any non-ASCII well-formed UTF-8 characters in it.

    – Mark Plotnick
    4 hours ago























text-processing unicode character-encoding ascii
















asked 6 hours ago









BowPark























1 Answer
































From the BSD manual, section 5, the page on UTF8 reads:




DESCRIPTION



The UTF-8 encoding represents UCS-4 characters as a sequence of octets, using between 1 and 6 for each character. It is backwards compatible with ASCII, so 0x00-0x7f refer to the ASCII character set.

The multibyte encoding of non-ASCII characters consist entirely of bytes whose high order bit is set. The actual encoding is represented by the following table:



[0x00000000 - 0x0000007f] [00000000.0bbbbbbb] -> 0bbbbbbb
[0x00000080 - 0x000007ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
[0x00000800 - 0x0000ffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
[0x00010000 - 0x001fffff] [00000000.000bbbbb.bbbbbbbb.bbbbbbbb] -> 11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[0x00200000 - 0x03ffffff] [000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> 111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
[0x04000000 - 0x7fffffff] [0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> 1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
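A minimal sketch of how a decoder could use the table above: the lead byte alone tells it how many bytes belong to the character, and every following byte must match the 10bbbbbb pattern. The function names `sequence_length` and `split_characters` are made up for this illustration. The sketch assumes the modern four-byte limit from RFC 3629 rather than the five- and six-byte forms listed in the table, and it does not check every overlong or surrogate case.

```python
def sequence_length(lead: int) -> int:
    """Return how many bytes the UTF-8 sequence starting with `lead` occupies."""
    if lead < 0x80:              # 0bbbbbbb: plain ASCII, one byte
        return 1
    if 0xC2 <= lead <= 0xDF:     # 110bbbbb (0xC0 and 0xC1 would be overlong)
        return 2
    if 0xE0 <= lead <= 0xEF:     # 1110bbbb
        return 3
    if 0xF0 <= lead <= 0xF4:     # 11110bbb, capped at U+10FFFF
        return 4
    # 0x80-0xBF are continuation bytes; the rest are not allowed as lead bytes.
    raise ValueError(f"0x{lead:02x} cannot start a UTF-8 sequence")


def split_characters(data: bytes) -> list:
    """Split `data` into one bytes object per encoded character."""
    chunks = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        chunk = data[i:i + n]
        # Every byte after the lead byte must look like 10bbbbbb.
        if len(chunk) != n or any((b & 0xC0) != 0x80 for b in chunk[1:]):
            raise ValueError(f"malformed sequence at offset {i}")
        chunks.append(chunk)
        i += n
    return chunks


# The 14 bytes shown in the hexdump from the question.
sample = bytes.fromhex("48656c6c6f0a48656c6c6fc3a80a")
for chunk in split_characters(sample):
    print(chunk.hex(), repr(chunk.decode("utf-8")))
```

Applied to the bytes from the question, this yields 13 chunks, with c3a8 as the only two-byte one.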


If more than a single representation of a value exists (for example, 0x00; 0xC0 0x80; 0xE0 0x80 0x80), the shortest representation is always used. Longer ones are detected as an error as they pose a potential security risk, and destroy the 1:1 character:octet sequence mapping.
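As an illustration of that shortest-form rule, Python's built-in UTF-8 codec rejects the overlong encodings of U+0000 mentioned above. Python is used here only as a convenient strict decoder, not as anything the manual page refers to, and the helper name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 according to Python's strict codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The three representations of the value 0 listed in the manual page.
candidates = {
    "shortest form": b"\x00",
    "overlong, 2 bytes": b"\xc0\x80",
    "overlong, 3 bytes": b"\xe0\x80\x80",
}
for label, raw in candidates.items():
    print(f"{label:18} {raw.hex():8} valid={is_valid_utf8(raw)}")
```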




From the Linux manual, section 7, the page on UTF8 similarly reads:




DESCRIPTION



[... UTF-8 is situationally better than UCS-2 in part because i]n addition, the majority of UNIX tools expect ASCII files and can't read 16-bit words as characters without major modifications. [...]



The UTF-8 encoding of Unicode and UCS does not have these problems and is the common way in which Unicode is used on UNIX-style operating systems.



Properties



The UTF-8 encoding has the following nice properties:




  • UCS characters 0x00000000 to 0x0000007f (the classic US-ASCII characters) are encoded simply as bytes 0x00 to 0x7f (ASCII compatibility). This means that files and strings which contain only 7-bit ASCII characters have the same encoding under both ASCII and UTF-8.




So it's not really possible to distinguish ASCII from UTF-8: every ASCII file is also a valid UTF-8 file. file looks at the first 96 KiB of a file and tries to determine what it is. Because it finds at least one well-formed multibyte UTF-8 sequence, it reports the file as UTF-8, which is a strict superset of ASCII.
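A rough sketch of that reasoning, not a reproduction of what file actually does internally: data consisting only of 7-bit bytes is reported as ASCII, and anything else that decodes cleanly is reported as UTF-8. The function name `classify` and the returned labels are assumptions made for this example.

```python
def classify(data: bytes) -> str:
    """Label a byte string as ASCII, UTF-8, or neither (hypothetical labels)."""
    # Only bytes 0x00-0x7f: identical under ASCII and UTF-8, so call it ASCII.
    if all(byte < 0x80 for byte in data):
        return "ASCII"
    # At least one high-bit byte: it is UTF-8 only if every sequence is well-formed.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown 8-bit"
    return "UTF-8"


print(classify(b"Hello\n"))                         # only bytes below 0x80
print(classify(bytes.fromhex("48656c6c6fc3a80a")))  # "Helloè" plus newline, in UTF-8
print(classify(b"Hello\xe8\n"))                     # è as a single ISO 8859-1 byte
```

The last case shows why a lone high-bit byte is not enough to conclude UTF-8: 0xe8 announces a three-byte sequence, but the newline that follows is not a continuation byte.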
































  • Thank you. Ubuntu does not have the same man page. The corresponding one is in section 7, and it is not as concise and clear as the one you quoted, which can be found in FreeBSD.

    – BowPark
    4 hours ago













  • I've added a similar citation from the Linux manual (7) to go along with the BSD manual (5) one.

    – DopeGhoti
    2 hours ago













  • Thank you so much!

    – BowPark
    1 hour ago



















edited 2 hours ago

























answered 6 hours ago









DopeGhoti












