Grouping input using awk, sed, grep

I have an input file as below:



1001   Ivanov I.I: chess 
2021 Petrov P.P: chess, football
3352 Sidorov S.S:
1000 Putin V.V: judo
8773 Schwarzenegger A: judo, chess, football


I expect the output as follows:



-- chess -- 
Ivanov I.I
Petrov P.P
Schwarzenegger A
-- football --
Petrov P.P
Schwarzenegger A
-- judo --
Putin V.V
Schwarzenegger A

  • Are the <p> paragraph </p> tags in your code block meant to be part of the input data, or are they your attempt to present the input file without them? To show code, you need to precede each block of code with a blank line, and each line of code needs to be prefixed with 4 spaces. For help with the Markdown syntax, click the small (?) button at the top right of the edit box.
    – Peter.O
    Aug 21 '15 at 4:42








  • This post does not meet our quality standards. Also see How to Ask.
    – Tejas
    Aug 21 '15 at 10:22

Tags: sed, awk, grep

edited Nov 25 at 14:46 by Rui F Ribeiro
asked Aug 21 '15 at 3:35 by sharanaprasad mailar

2 Answers

Accepted answer










Here's a somewhat ugly one-liner that does exactly what you specified in the question.



for tag in `cat input | cut -d: -f2 | sed 's/<\/p>//g' | sed '/^[[:space:]]*$/d' | tr "," "\n" | sed 's/[[:space:]]//g' | sort -u`; do echo "<p>-- $tag --</p>"; grep "$tag" input | awk '{print $2, $3}' | sed 's/://g' | sed 's/^/<p>/' | sed 's/$/<\/p>/'; done



However, I'm not writing this post only to solve this precise problem: I deliberately built the solution from parts that each do one specific thing, and below I explain what each part does. So if you want to learn how to use these tools, and not just solve this particular problem, read on!



Let's go through how it works, bit by bit:



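for tag in ... - loops over the words produced by the command between the backticks, assigning each one in turn to a new variable called tag. Each value is one of the tags you need to group by (chess, football, judo).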



cat input - reads the file named input; change this to the name of your file.



| is a pipe: it sends the output of one command to the input of the next, so data can flow through a series of commands.



cut -d: -f2 - using a colon (:) as the delimiter character, takes the second field, which gives us the text after the colon on each line.
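To see what this step produces on its own, you can feed it a couple of sample lines. This is a minimal sketch; after_colon is a hypothetical helper name, not part of the answer:

```shell
# Hypothetical helper: print the text after the first colon on each line.
# cut -d: -f2 selects the second colon-separated field.
after_colon() {
    cut -d: -f2
}

printf '%s\n' '2021 Petrov P.P: chess, football' '3352 Sidorov S.S:' | after_colon
# prints " chess, football" and then an empty line for Sidorov
```

Note that cut prints a line that contains no colon at all unchanged, unless you add the -s option.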



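sed 's/<\/p>//g' - removes the closing paragraph tag from each line of the input. The / inside the tag has to be escaped as \/ because / is also the delimiter of sed's s command.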
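Because / is also the delimiter of sed's s command, the / in </p> must be escaped. Alternatively, sed lets you choose a different delimiter character, which avoids the escaping altogether. A small sketch comparing the two (both function names are hypothetical):

```shell
# Two equivalent ways to delete every closing paragraph tag.
strip_p_escaped() {
    sed 's/<\/p>//g'    # escape the / inside the pattern
}
strip_p_altdelim() {
    sed 's|</p>||g'     # use | as the delimiter, so / needs no escaping
}

echo 'chess, football</p>' | strip_p_escaped    # prints: chess, football
echo 'chess, football</p>' | strip_p_altdelim   # prints: chess, football
```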



At this point, the data would look something like this:



 chess 
chess, football

judo
judo, chess, football


Next, let's get rid of the lines that are empty or contain only whitespace; they come from people without any tags, such as Sidorov S.S.



sed '/^[[:space:]]*$/d' - removes all lines that only have whitespace in them. Great!



tr "," "\n" - replaces all commas with newlines, so that each tag will be on a separate line.



sed 's/[[:space:]]//g' - removes all whitespace from each line (because of the g flag, not just at the beginning), leaving bare tag names.
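Because of the g flag, that sed removes every whitespace character, so a hypothetical multi-word tag such as "table tennis" would become "tabletennis". If you only want to trim the ends of each line, a sketch like this would do (trim is a hypothetical name):

```shell
# Trim whitespace only at the start and end of each line,
# keeping spaces inside multi-word tags.
trim() {
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

printf '%s\n' ' chess' ' table tennis ' | trim
# prints "chess" and "table tennis"
```

Keep in mind that the for tag in ... loop would still split such a tag at the space, so this only matters if you also rework the loop.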



sort -u - sorts the tags alphabetically and removes duplicates. Now we have a clean list of all tags, in order and without repetitions:



chess
football
judo
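Put together, the tag-extraction half of the one-liner can be written as a small function. This sketch assumes input without <p> tags, as the asker later clarified; list_tags is a hypothetical name:

```shell
# Print each distinct tag once, sorted alphabetically.
# Input: "ID Name: tag1, tag2, ..." lines on stdin.
list_tags() {
    cut -d: -f2 |                   # text after the colon
        sed '/^[[:space:]]*$/d' |   # drop lines with no tags
        tr ',' '\n' |               # one tag per line
        sed 's/[[:space:]]//g' |    # strip the spaces around each tag
        sort -u                     # sort and remove duplicates
}

list_tags <<'EOF'
1001   Ivanov I.I: chess
2021 Petrov P.P: chess, football
3352 Sidorov S.S:
1000 Putin V.V: judo
8773 Schwarzenegger A: judo, chess, football
EOF
# prints chess, football and judo, one per line
```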




Now, for each of those tags, one after another, we do the following:



echo "<p>-- $tag --</p>" - prints an opening paragraph tag, two dashes, the tag name, two dashes, and a closing paragraph tag, as specified.



grep "$tag" input - finds the lines that contain the current tag (quoting $tag is a good habit, although these tags contain no spaces).
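One caveat: grep matches substrings, so the tag chess would also match a line carrying a hypothetical tag chessboxing. GNU and BSD grep offer -w to match whole words only; it is not part of POSIX, so check that your grep supports it. A sketch:

```shell
# Two sample lines; "Example E.E" and the tag "chessboxing" are made up
# to show the pitfall.
sample='2021 Petrov P.P: chess, football
4242 Example E.E: chessboxing'

echo "$sample" | grep chess      # substring match: prints both lines
echo "$sample" | grep -w chess   # whole-word match: prints only the Petrov line
```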



awk '{print $2, $3}' - prints the second and third whitespace-separated fields, i.e. the name (last name + initials).



sed 's/://g' - removes the colon, which awk treated as part of the initials field.
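awk '{print $2, $3}' assumes every name is exactly two words. A sketch of an alternative that keeps everything before the colon and strips the leading ID instead, so it also copes with longer names and makes the separate sed 's/://g' step unnecessary (the function name and the three-word sample name are hypothetical):

```shell
# Print the name part of "ID Name: tags" lines:
# split on ":", then delete the leading digits and the spaces after them.
names() {
    awk -F: '{ sub(/^[0-9]+ +/, "", $1); print $1 }'
}

printf '%s\n' '1001   Ivanov I.I: chess ' '5555 de la Cruz J: judo' | names
# prints "Ivanov I.I" and "de la Cruz J"
```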



sed 's/^/<p>/' - adds an opening paragraph tag to the beginning of each line.



sed 's/$/<\/p>/' - adds a closing paragraph tag to the end of each line.
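The two sed calls that add the paragraph tags could also be merged into one: in the replacement part of s, & stands for the whole matched text. A sketch, using | as the delimiter so that the / in </p> needs no escaping (wrap_p is a hypothetical name):

```shell
# Wrap each input line in <p>...</p> with a single sed substitution.
wrap_p() {
    sed 's|.*|<p>&</p>|'
}

printf '%s\n' 'Ivanov I.I' 'Petrov P.P' | wrap_p
# prints "<p>Ivanov I.I</p>" and "<p>Petrov P.P</p>"
```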



done - ends the loop, and we're done. Yay!



Here are the results:



$ for tag in `cat input | cut -d: -f2 | sed 's/<\/p>//g' | sed '/^[[:space:]]*$/d' | tr "," "\n" | sed 's/[[:space:]]//g' | sort -u`; do echo "<p>-- $tag --</p>"; grep "$tag" input | awk '{print $2, $3}' | sed 's/://g' | sed 's/^/<p>/' | sed 's/$/<\/p>/'; done
<p>-- chess --</p>
<p>Ivanov I.I</p>
<p>Petrov P.P</p>
<p>Schwarzenegger A</p>
<p>-- football --</p>
<p>Petrov P.P</p>
<p>Schwarzenegger A</p>
<p>-- judo --</p>
<p>Putin V.V</p>
<p>Schwarzenegger A</p>




EDIT:
The asker clarified that the <p> tags are not part of the input. This simplifies things a bit:



$ for tag in `cat input | cut -d: -f2 | sed '/^[[:space:]]*$/d' | tr "," "\n" | sed 's/[[:space:]]//g' | sort -u`; do echo "-- $tag --"; grep "$tag" input | awk '{print $2, $3}' | sed 's/://g'; done
-- chess --
Ivanov I.I
Petrov P.P
Schwarzenegger A
-- football --
Petrov P.P
Schwarzenegger A
-- judo --
Putin V.V
Schwarzenegger A
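For readability, the same loop can be written as a small shell function. This is a sketch rather than the answer's exact command: it uses $(...) instead of backticks, quotes $tag, and takes the file name as a parameter (group_by_tag is a hypothetical name). Like the one-liner, it assumes every tag is a single word, since the for loop splits on whitespace.

```shell
# Print "-- tag --" followed by the names of everyone who has that tag.
# Usage: group_by_tag FILE
group_by_tag() {
    file=$1
    for tag in $(cut -d: -f2 "$file" | sed '/^[[:space:]]*$/d' | tr ',' '\n' | sed 's/[[:space:]]//g' | sort -u); do
        echo "-- $tag --"
        # Lines with the tag -> "Lastname Initials:" -> remove the colon.
        grep "$tag" "$file" | awk '{print $2, $3}' | sed 's/://g'
    done
}

# Try it on the sample data from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1001   Ivanov I.I: chess
2021 Petrov P.P: chess, football
3352 Sidorov S.S:
1000 Putin V.V: judo
8773 Schwarzenegger A: judo, chess, football
EOF
group_by_tag "$tmp"
rm -f "$tmp"
```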




Hope this helps you on your journey through the Unix command line and in mastering some of its most powerful tools.

  • Oh, sorry... the tags are not part of the input.
    – sharanaprasad mailar
    Aug 21 '15 at 9:56












  • Thanks so much, Walther... I removed those tags from your command and it works perfectly. Thanks a lot!
    – sharanaprasad mailar
    Aug 21 '15 at 10:07










  • No problem! If you think this was useful, please mark the solution as accepted.
    – Walther
    Aug 21 '15 at 10:47

edited Aug 21 '15 at 10:05
answered Aug 21 '15 at 9:45 by Walther













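awk -F: '{
    sub( "[ 0-9]+", "", $1 )      # strip the leading ID and spaces: $1 is now the name
    gsub( " +", "", $2 )          # remove all spaces from the tag list
    cz = split( $2, comp, "," )   # split the tags on commas; cz = number of tags
    for( c=1; c<=cz; c++ ) {
        if( comp[c] ) {
            if( ! allcomp[comp[c]] )              # first time this tag is seen:
                fifocomp[++fifoc] = comp[c]       # remember it, in order of appearance
            allcomp[comp[c]] = allcomp[comp[c]] $1 "\n"   # append this name
        }
    }
} END {
    for( c=1; c<=fifoc; c++ ) {
        print "-- " fifocomp[c] " --"
        printf "%s", allcomp[fifocomp[c]]
    }
}' file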


Output:

-- chess --
Ivanov I.I
Petrov P.P
Schwarzenegger A
-- football --
Petrov P.P
Schwarzenegger A
-- judo --
Putin V.V
Schwarzenegger A
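This awk program works in a single pass: for each line it appends the name to a per-tag string (allcomp) and records each new tag in fifocomp in order of first appearance; the END block then prints them. So the headings follow the order in which tags first appear, not alphabetical order; with the sample input the two happen to coincide. A sketch demonstrating this with a lightly reworked copy of the program (group_awk and the reordered sample lines are hypothetical):

```shell
# Hypothetical wrapper: a lightly reworked copy of the awk program, reading stdin.
group_awk() {
    awk -F: '{
        sub("[ 0-9]+", "", $1)               # drop the leading ID
        gsub(" +", "", $2)                   # drop spaces in the tag list
        n = split($2, tags, ",")
        for (i = 1; i <= n; i++) {
            if (tags[i] == "") continue
            if (!(tags[i] in names))         # first time this tag is seen
                order[++count] = tags[i]
            names[tags[i]] = names[tags[i]] $1 "\n"
        }
    } END {
        for (i = 1; i <= count; i++) {       # headings in first-seen order
            print "-- " order[i] " --"
            printf "%s", names[order[i]]
        }
    }'
}

# judo appears first here, so its heading is printed first.
printf '%s\n' '1000 Putin V.V: judo' '2021 Petrov P.P: chess, football' | group_awk
```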
edited Aug 21 '15 at 15:34
answered Aug 21 '15 at 12:47 by Peter.O