Grep a range of values with specific starting characters












I have 10GB files in which I want to count the occurrences of some specific text, i.e. TY[0-9].



Example file:



ABC,2A,2018-07-06,2018-06-20 00:00:00
BCD,TY1,2018-07-06,2018-06-20 00:00:00
EFG,TY2,2018-07-06,2018-06-20 00:00:00
IGH,2A,2018-07-06,2018-06-20 00:00:00


I want to get the count of all text starting with TY and then a digit. I tried using egrep but am not getting the correct result.



egrep  "^TY[0-9]" Filename









      awk grep






      edited 12 mins ago by Crypteya

      asked Jun 21 '18 at 18:37 by Developer






















          3 Answers
              Using awk to count the number of times the second comma-delimited field in the file starts with the string TY followed by a digit:



              awk -F, '$2 ~ /^TY[[:digit:]]/ { n++ } END { print n }' filename
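
As a quick check, the same awk test can be tried on the four sample lines from the question. This is only a sketch: the function name `count_ty_field2` and the temporary file are made up for illustration, `[0-9]` stands in for `[[:digit:]]` because some older awk implementations lack POSIX character classes, and `n+0` is printed so that a file with no matches yields 0 rather than an empty line.

```shell
# Hypothetical helper (not part of the original answer): count lines whose
# second comma-separated field starts with TY followed by a digit.
count_ty_field2() {
    awk -F, '$2 ~ /^TY[0-9]/ { n++ } END { print n+0 }' "$1"
}

# Sample data copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ABC,2A,2018-07-06,2018-06-20 00:00:00
BCD,TY1,2018-07-06,2018-06-20 00:00:00
EFG,TY2,2018-07-06,2018-06-20 00:00:00
IGH,2A,2018-07-06,2018-06-20 00:00:00
EOF

count_ty_field2 "$sample"    # prints 2 for this sample
rm -f "$sample"
```

The question's `^TY[0-9]` pattern finds nothing in this data because `^` anchors at the start of the line, while the TY values sit in the second field; anchoring against `$2` avoids that.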


              Would cut in combination with grep be quicker? Cutting out the second column gives grep less data to work with, so it may be faster than grep alone:

              cut -d, -f2 filename | grep -c '^TY[[:digit:]]'

              After some testing on my OpenBSD system with a 1.1GB file, cut+grep is almost 50% quicker than awk (8 seconds vs. 15 seconds). A pure grep solution (grep -Ec '\<TY[0-9]' filename, taken from glenn's answer) takes 13 seconds.

              So if the string is to be picked out of the second field only, one may gain some time by extracting only that field before matching.
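
To repeat this kind of comparison on another machine, something like the following sketch could be used. The function names and the generated test file are assumptions made for illustration; the file simply repeats the question's four sample lines, and `\<` is a grep extension that may not exist in every implementation. Timings will differ between systems, so the sketch only checks that the three approaches agree on the count.

```shell
# Hypothetical wrappers around the three approaches discussed above.
count_awk()  { awk -F, '$2 ~ /^TY[0-9]/ { n++ } END { print n+0 }' "$1"; }
count_cut()  { cut -d, -f2 "$1" | grep -c '^TY[0-9]'; }
count_grep() { grep -c '\<TY[0-9]' "$1"; }

# Build a test file: the question's four sample lines, repeated 25000 times.
big=$(mktemp)
awk 'BEGIN {
    for (i = 0; i < 25000; i++) {
        print "ABC,2A,2018-07-06,2018-06-20 00:00:00"
        print "BCD,TY1,2018-07-06,2018-06-20 00:00:00"
        print "EFG,TY2,2018-07-06,2018-06-20 00:00:00"
        print "IGH,2A,2018-07-06,2018-06-20 00:00:00"
    }
}' > "$big"

# Each approach should report the same count (50000 for this file).
for f in count_awk count_cut count_grep; do
    printf '%s: %s\n' "$f" "$("$f" "$big")"
done
rm -f "$big"
```

In a shell where `time` is a keyword (bash, for example), each call can be prefixed with `time` to compare run times on a larger file.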







              answered Jun 21 '18 at 18:47 by Kusalananda (edited Jun 21 '18 at 19:02)













              • In your second example, why not cut -d, -f2 inputfile | grep -c [...] rather than | grep | wc -l?

                – DopeGhoti
                Jun 21 '18 at 18:59













              • @DopeGhoti Derrp. Yes. Thanks. Made it even quicker too.

                – Kusalananda
                Jun 21 '18 at 19:02





















                  You want to use a word boundary instead of the start-of-line anchor:



                  $ grep -Ec '\<TY[0-9]' file
                  2


                  Note: that is a count of all lines containing a "TY word", not a count of all "TY words". If there can be more than one per line, print each match on its own line with -o and count those lines instead:



                  $ grep -Eo '\<TY[0-9]' file | wc -l
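
The difference between the two counts only shows up when a line holds more than one match. A small sketch, with made-up function names and demo data, assuming a grep that supports the `\<` word-start extension and the `-o` option:

```shell
# Hypothetical helpers contrasting the two counts.
# Lines that contain at least one "TY word":
count_lines()   { grep -c '\<TY[0-9]' "$1"; }
# Every "TY word", even several on one line (tr strips the padding
# that some wc implementations add):
count_matches() { grep -o '\<TY[0-9]' "$1" | wc -l | tr -d ' '; }

# Demo data (invented): line 1 has two matches, line 2 has none
# (XTY3 does not start a word at TY), line 3 has one.
demo=$(mktemp)
printf '%s\n' 'A,TY1,TY2' 'B,2A,XTY3' 'C,TY4,2A' > "$demo"

count_lines "$demo"      # prints 2
count_matches "$demo"    # prints 3
rm -f "$demo"
```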






                  answered Jun 21 '18 at 18:45 by glenn jackman























                          If you want to count the occurrences of a comma-delimited field that consists of TY followed by one or more decimal digits, you could do:



                          <file perl -lne '$n += () = /(?<![^,])TY\d+(?![^,])/g; END{print 0+$n}'


                          Which on an input like:



                          TY1,TY2,TY,TYFOO
                          TY213,X-TY2,TY4


                          Would return 4 (TY1, TY2, TY213, TY4).



                          (?<!...) and (?!...) are the negative look-behind and look-ahead operators, respectively. So here, we're looking for TY followed by one or more (+) digits (\d), provided it's neither preceded nor followed by a character other than ,.



                          Another way to do it would be to convert the commas to newlines and count the number of resulting lines that consist of TY followed by one or more digits:



                          <file tr , '\n' | LC_ALL=C grep -xEc 'TY[[:digit:]]+'


                          (on my system, that's about 10 times as fast as the perl solution)
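
The tr and grep pipeline can be checked against the sample input shown above. The sketch below wraps it in a function and adds an awk loop over the fields as a cross-check; both function names are invented for illustration, and the awk variant is an alternative written for this example, not something taken from the answer.

```shell
# Hypothetical helper: put every field on its own line, then count the
# lines that are exactly TY followed by one or more digits (-x).
count_ty_fields() {
    tr , '\n' < "$1" | LC_ALL=C grep -xEc 'TY[[:digit:]]+'
}

# Cross-check (an assumption of this sketch): test each field in awk.
count_ty_fields_awk() {
    awk -F, '{ for (i = 1; i <= NF; i++) if ($i ~ /^TY[0-9]+$/) n++ }
             END { print n+0 }' "$1"
}

# The two-line sample input from the answer.
input=$(mktemp)
printf '%s\n' 'TY1,TY2,TY,TYFOO' 'TY213,X-TY2,TY4' > "$input"

count_ty_fields "$input"        # prints 4
count_ty_fields_awk "$input"    # prints 4
rm -f "$input"
```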







                          answered Jun 21 '18 at 18:51 by Stéphane Chazelas (edited Jun 21 '18 at 19:03)





























