simple command to strip header and footer from a file

I want a command to strip the XML header and footer from a file:

<?xml version="1.0" encoding="UTF-8"?>
<conxml>
<MsgPain001>
    <HashValue>A9C72997C702A2F841B0EEEC3BD274DE1CB7BEA4B813E030D068CB853BCFECA6</HashValue>
    <HashAlgorithm>SHA256</HashAlgorithm>
    <Document>
                ...
    </Document>
    <Document>
                ...
    </Document>
</MsgPain001>
</conxml>

...

Should become just

<Document>
         ...
    </Document>
    <Document>
          ...
    </Document>

(Note the indentation: the indent of the first Document tag should be stripped off.)

This sounds like a job for a (greedy) regex such as

<Document>.*</Document>

but I can't get it to work because of the line feeds.

I need it in a pipe to compute a hash over the contained documents.

edited 29 mins ago

Rui F Ribeiro

asked Oct 20 '11 at 13:34

Bastl

2 Answers

Using sed:

 sed -n '/<Document>/,/<\/Document>/ p' yourfile.xml

Explanation:

-n makes sed silent: it does not print every input line automatically,

/pattern/ selects lines containing the specified pattern,

a,b (the comma) tells sed to apply the following command to every line from one matching a up to the next one matching b (here a and b are the two patterns above),

p stands for print; it is the command applied to the lines selected above.
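To see the `-n` / range / `p` combination at work, here is a small self-contained run. The input is a hypothetical, trimmed-down stand-in for the question's file, fed in with `printf` instead of reading `yourfile.xml`. Note that the `/` inside `</Document>` has to be escaped as `\/`, because `/` also delimits the address regex.

```shell
#!/bin/sh
# Hypothetical sample input, shaped like the file in the question.
input='<?xml version="1.0" encoding="UTF-8"?>
<conxml>
<MsgPain001>
    <HashAlgorithm>SHA256</HashAlgorithm>
    <Document>
        <A>1</A>
    </Document>
    <Document>
        <B>2</B>
    </Document>
</MsgPain001>
</conxml>'

# -n      : do not print lines automatically
# /a/,/b/ : range from a line matching a to the next line matching b;
#           a new range starts at every later <Document> line
# p       : print the lines inside the range
# \/      : escaped slash, because / delimits the regex
printf '%s\n' "$input" | sed -n '/<Document>/,/<\/Document>/p'
```

This prints both `<Document>` blocks with their original indentation and drops the XML declaration and the `<conxml>`, `<MsgPain001>` and `<HashAlgorithm>` lines.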

Edit: If you'd like to additionally strip the whitespace before <Document>, it can be done this way:

 sed -ne '/ <Document>/s/^ *//' -e '/<Document>/,/<\/Document>/ p' yourfile.xml
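The `/ <Document>/s/^ *//` expression above removes the indent from every indented `<Document>` line, not only the first one. If you want exactly the output shown in the question (only the very first line unindented), one option, not taken from this answer, is a second `sed` that edits line 1 only. The function name `extract_documents` and the sample input are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print all <Document> blocks, then remove the
# leading blanks of the very first output line only.
extract_documents() {
    sed -n '/<Document>/,/<\/Document>/p' |
        sed '1s/^[[:space:]]*//'
}

# Hypothetical sample input.
input='<conxml>
    <Document>
        <A>1</A>
    </Document>
    <Document>
        <B>2</B>
    </Document>
</conxml>'

printf '%s\n' "$input" | extract_documents
```

To hash the result in a pipe, append a checksum tool, e.g. `extract_documents < yourfile.xml | sha256sum` with GNU coreutils (on macOS, `shasum -a 256` is the usual equivalent).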

edited Oct 20 '11 at 14:36

answered Oct 20 '11 at 13:45

rozcietrzewiacz

Thanks, I'm a sed noob. What about indenting whitespace? What does the ',' do?

– Bastl
Oct 20 '11 at 13:50

It works with whitespace as well as any other characters surrounding <Document>. See the update of my answer for a more detailed explanation.

– rozcietrzewiacz
Oct 20 '11 at 13:59

Good, that's nearly perfect. Now I need to strip off the preceding whitespace from the first line. Is that possible within your command?

– Bastl
Oct 20 '11 at 14:06

Yes, though it'll be a bit more complicated - see update. (At this point, I am not sure if it is the simplest way.)

– rozcietrzewiacz
Oct 20 '11 at 14:36

@Bastl Note that if there's any text between </Document> and the next <Document>, it'll be stripped.

– Gilles
Oct 20 '11 at 17:29

To prevent text between </Document> and the next <Document> from being stripped, you may have to use a series of sed commands (cf. Gilles' comment above).

Essentially, the first sed reads the entire file into the hold buffer (so that the file contents can be treated as a single line) and puts a marker before the first <Document> and after the last </Document>; the following sed commands use these markers to select that span and then remove them.
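The hold-buffer ("slurp") idiom used by the first sed command can be tried on its own. In this sketch (the function name `slurp_join` and the three-line input are made up), `1h` copies the first line to the hold space, `1!H` appends every later line to it preceded by a newline, and on the last line (`$`) `g` copies the accumulated text back, so a single substitution can see the embedded newlines.

```shell
#!/bin/sh
# Hypothetical helper demonstrating the slurp idiom.
# 1h        : line 1 -> hold space (overwrite)
# 1!H       : every other line -> appended to hold space after a newline
# ${g;...}  : on the last line, copy hold space into the pattern space
# s/\n/|/g  : the whole input is now one pattern space, so \n matches
slurp_join() {
    sed -n '1h;1!H;${g;s/\n/|/g;p;}'
}

printf 'first\nsecond\nthird\n' | slurp_join   # prints: first|second|third
```

Because the whole input ends up in the pattern space, this approach holds the entire file in memory.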

    # version 1
    # marker: HERE
    cat file.xml |
    sed -n '1h;1!H;${;g;s/\(<Document>.*<\/Document>\)/HERE\1HERE/g;p;}' |
    sed -n -e '/HERE<Document>/,/<\/Document>HERE/ p' |
    sed -e 's/^ *HERE\(<Document>\)/\1/' -e 's/\(<\/Document>\)HERE *$/\1/'

    # version 2    (using the Bash shell)
    # marker: $'\001'
    cat file.xml |
    sed -n $'1h;1!H;${;g;s/\\(<Document>.*<\\/Document>\\)/\001\\1\001/g;p;}' |
    sed -n -e $'/\001<Document>/,/<\\/Document>\001/ p' |
    sed -e $'s/^ *\001//' -e $'s/\001 *$//' |
    cat -vet    # shows non-printing characters and line ends; drop it before hashing

... but I guess all this could be done more elegantly (& reliably) using xmlstarlet!
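Gilles' caveat is easy to check against version 1. The sketch below wraps its three sed stages in a function (the name `strip_envelope` and the input, including the `<Comment>` line between the two documents, are hypothetical) and assumes GNU sed, where `.` also matches the newlines collected in the pattern space.

```shell
#!/bin/sh
# Hypothetical wrapper around "version 1" (marker: HERE):
#  stage 1: slurp the input and wrap first <Document> ... last </Document> in HERE
#  stage 2: keep only the lines from the marked opening tag to the marked closing tag
#  stage 3: drop the markers (and the indent before the first <Document>)
strip_envelope() {
    sed -n '1h;1!H;${g;s/\(<Document>.*<\/Document>\)/HERE\1HERE/g;p;}' |
        sed -n -e '/HERE<Document>/,/<\/Document>HERE/ p' |
        sed -e 's/^ *HERE\(<Document>\)/\1/' -e 's/\(<\/Document>\)HERE *$/\1/'
}

# Hypothetical input with text between the two documents.
input='<conxml>
    <Document>
        <A>1</A>
    </Document>
    <Comment>keep me</Comment>
    <Document>
        <B>2</B>
    </Document>
</conxml>'

printf '%s\n' "$input" | strip_envelope
```

Unlike the plain range command in the first answer, the `<Comment>` line survives here, because the selected span runs from the first `<Document>` to the last `</Document>`.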

answered Oct 21 '11 at 12:48

jon