simple command to strip header and footer from a file
I want a command to strip an XML-Header and Footer from a file:
<?xml version="1.0" encoding="UTF-8"?>
<conxml>
<MsgPain001>
<HashValue>A9C72997C702A2F841B0EEEC3BD274DE1CB7BEA4B813E030D068CB853BCFECA6</HashValue>
<HashAlgorithm>SHA256</HashAlgorithm>
<Document>
...
</Document>
<Document>
...
</Document>
</MsgPain001>
</conxml>
...
Should become just
<Document>
...
</Document>
<Document>
...
</Document>
(Note the indenting: the indent of the first Document tag should be stripped off.)
This sounds like a job for a (greedy) regex
<Document>.*</Document>
but I can't get it to work because of the linefeeds.
I need it in a pipe to compute a hash over the contained documents.
sed regular-expression
Question: asked Oct 20 '11 at 13:34 by Bastl (edited 29 mins ago by Rui F Ribeiro)
2 Answers
Using sed:
sed -n '/<Document>/,/<\/Document>/ p' yourfile.xml
Explanation:
-n makes sed silent, meaning it does not print every input line by default,
/pattern/ selects lines containing the specified pattern,
a,b (the comma) tells sed to perform an action on the lines from a to b (where a and b are defined by matching the above patterns),
p stands for print and is the action performed on the lines in the selected range.
Edit: If you'd like to additionally strip the whitespace before <Document>, it can be done this way:
sed -ne '/ <Document>/s/^ *//' -e '/<Document>/,/<\/Document>/ p' yourfile.xml
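As a rough illustration of the range address and the whitespace fix, here is a small sketch. The sample file, its indentation, and the variable names `tmp` and `extracted` are made up for the example and are not from the original post.

```shell
#!/bin/sh
# Hypothetical sample file; element contents and indentation are placeholders.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<conxml>
 <MsgPain001>
  <HashAlgorithm>SHA256</HashAlgorithm>
  <Document>
   first
  </Document>
  <Document>
   second
  </Document>
 </MsgPain001>
</conxml>
EOF

# First expression: strip leading blanks on lines containing " <Document>".
# Second expression: print only the lines from <Document> to </Document>.
extracted=$(sed -n -e '/ <Document>/s/^ *//' -e '/<Document>/,/<\/Document>/ p' "$tmp")
printf '%s\n' "$extracted"
rm -f "$tmp"
```

Note that the first expression strips the indent from every opening `<Document>` line, not only the first one. To hash the result, the sed output could be piped straight into a checksum tool (for example `sha256sum`, assuming GNU coreutils is available) rather than going through a variable, since command substitution drops trailing newlines.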
thanks, I'm sed noob. What about indenting whitespace? What does the ',' do ?
– Bastl
Oct 20 '11 at 13:50
It works with whitespace as well as any other characters surrounding <Document>. See the update of my answer for deeper explanation.
– rozcietrzewiacz
Oct 20 '11 at 13:59
good. that's nearly perfect. Now I need to strip off preceding whitespace from the first line. Is it possible inside your command?
– Bastl
Oct 20 '11 at 14:06
Yes, though it'll be a bit more complicated - see update. (At this point, I am not sure if it is the simplest way.)
– rozcietrzewiacz
Oct 20 '11 at 14:36
@Bastl Note that if there's any text between </Document> and the next <Document>, it'll be stripped.
– Gilles
Oct 20 '11 at 17:29
To prevent text from being stripped between </Document> and the next <Document>, you may have to use a series of sed commands (cf. Gilles' comment above).
Essentially, sed collects the entire file in the hold buffer (so that the file contents can be treated as a single line) and marks the first and last Document tags for further processing.
# version 1
# marker: HERE
cat file.xml |
sed -n '1h;1!H;${;g;s/\(<Document>.*<\/Document>\)/HERE\1HERE/g;p;}' |
sed -n -e '/HERE<Document>/,/<\/Document>HERE/ p' |
sed -e 's/^ *HERE\(<Document>\)/\1/' -e 's/\(<\/Document>\)HERE *$/\1/'
# version 2 (using the Bash shell)
# marker: $'\01'
cat file.xml |
sed -n $'1h;1!H;${;g;s/\\(<Document>.*<\\/Document>\\)/\01\\1\01/g;p;}' |
sed -n -e $'/\01<Document>/,/<\\/Document>\01/ p' |
sed -e $'s/^ *\01//' -e $'s/\01 *$//' |
cat -vet
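The same hold-buffer idea can probably be folded into a single sed call. The following is only a sketch under assumptions: the sample input, the variable names `xml`, `m` and `kept`, and the choice of the byte 001 as a marker that never occurs in the data are all illustrative. It also relies on `.` and `[^...]` matching embedded newlines in the pattern space, as GNU sed does.

```shell
#!/bin/sh
# Hypothetical sample input; the "between" line stands for text that sits
# between two Document elements.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<conxml>
  <Document>
  a
  </Document>
  between
  <Document>
  b
  </Document>
</conxml>
EOF

# Marker byte (octal 001), assumed not to occur in the input.
m=$(printf '\001')

# 1h;1!H   collect all lines in the hold space
# ${g;...} on the last line, copy them back and edit the file as one string:
#   - put the marker before the first <Document>, then delete through it
#   - keep everything up to the last </Document> and drop the rest
kept=$(sed -n "1h;1!H;\${g;s/<Document>/${m}&/;s/^[^${m}]*${m}//;s/\(.*<\/Document>\).*/\1/;p;}" "$xml")
printf '%s\n' "$kept"
rm -f "$xml"
```

Unlike the line-range approach, this keeps the `between` line, and only the indent in front of the first `<Document>` is removed.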
... but I guess all this could be done more elegantly (and reliably) using xmlstarlet!
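For what the xmlstarlet route might look like, here is a minimal sketch. It assumes xmlstarlet is installed and that the Document elements carry no XML namespace (otherwise the XPath would need a prefix bound with `-N`). The sample file and the variable names `xml` and `docs` are invented for the example.

```shell
#!/bin/sh
# Hypothetical sample input without namespaces.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<conxml><MsgPain001><Document>a</Document><Document>b</Document></MsgPain001></conxml>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
  # sel -t -c XPATH copies every node matched by the XPath expression.
  docs=$(xmlstarlet sel -t -c '//Document' "$xml")
  printf '%s\n' "$docs"
else
  docs=''
  echo 'xmlstarlet not found; nothing extracted' >&2
fi
rm -f "$xml"
```

One caveat for the hashing use case: an XML tool re-serializes the nodes it copies, so its output is not guaranteed to be byte-identical to the original text, and a hash over it may differ from a hash over the raw lines.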
First answer: answered Oct 20 '11 at 13:45 by rozcietrzewiacz (edited Oct 20 '11 at 14:36)
Second answer: answered Oct 21 '11 at 12:48 by jon