Count ONLY alphanumeric words in a specific column of a file



























I've been given a test file and need to count the words in a specific column of it. The catch is that some lines contain only numbers.



I know that wc will give the word count of its input, but if I remember correctly it does not distinguish between numbers and actual words (so a file containing 0184674673 HELLO would give a word count of two). Is there a non-convoluted way of solving this with a command, short of going through each line in the column, checking whether the word contains any digits, and incrementing a counter if not?



Here is a snippet of the file I've been given:




disobediently RESINY GRAPHICS lownesses prickers intractabiliti
es 85790227 villainously MINIS blinkering applicants TORPIDITIES
subtexts apportioned carded electrocardiograph SINTERED FOOTSORE
ENTHRALMENTS Sherpa FLAN OVERRULES NEWSREADER 15318116 PUTRIDNESS
EXECUTIONAL vanguard LUCENTLY CONGRUENTLY 14117187 pretending
misshapes cowslip 18714723 JUDGES INTERNATIONALIZING DISCUSSES
85192973 quorate shirking SECURES scrofula exclude NUCLIDE shipwrecks
prologuing indelicacy NUTRITIONS decompresses manlike aggregately
NEGOTIANT chewy Egypt bloodsports STOREYS worthier BELLOWING HAEMATIN
UNCONTROLLED SUFFERER CLOPPING DUALLY 5363130 DISCOMMODING ENTRANCED
brilliantness changeableness driest uncouth abjectnesses grumpiness
ache 94854804 JETSAMS barbarousness REPOSSESSIONS INCLINATION Jardine
AUTHORISED parading ties Hillyer USHER COMPLIANCES disdainful 98908803
CANDIDACY Rostov titrates DICTIONARIES optimists luted WART RAPINE
94683675 cannibal hostilely KALI ADMIRATIONS 95714958 AMPUTATED
65196125 VIEWFINDER uprated narrowing disavowing ALPINES Stahl
HEELLESS feminises LUCKINESSES patriarchate anticommunism
















































      scripting wc














      edited 53 mins ago









      Rui F Ribeiro










      asked Apr 11 '16 at 23:07









      secondubly






















          1 Answer














          If you are not restricted to wc, you can filter out the numbers with a tool like sed and then count the words using wc.



          Given your text (in testfile), the following command strips the numbers:



          $ sed -e 's/[0-9]*//g' testfile



          disobediently RESINY GRAPHICS lownesses prickers intractabiliti es villainously MINIS blinkering applicants TORPIDITIES subtexts apportioned carded electrocardiograph SINTERED FOOTSORE ENTHRALMENTS Sherpa FLAN OVERRULES NEWSREADER PUTRIDNESS EXECUTIONAL vanguard LUCENTLY CONGRUENTLY pretending misshapes cowslip JUDGES INTERNATIONALIZING DISCUSSES quorate shirking SECURES scrofula exclude NUCLIDE shipwrecks prologuing indelicacy NUTRITIONS decompresses manlike aggregately NEGOTIANT chewy Egypt bloodsports STOREYS worthier BELLOWING HAEMATIN UNCONTROLLED SUFFERER CLOPPING DUALLY DISCOMMODING ENTRANCED brilliantness changeableness driest uncouth abjectnesses grumpiness ache JETSAMS barbarousness REPOSSESSIONS INCLINATION Jardine AUTHORISED parading ties Hillyer USHER COMPLIANCES disdainful CANDIDACY Rostov titrates DICTIONARIES optimists luted WART RAPINE cannibal hostilely KALI ADMIRATIONS AMPUTATED VIEWFINDER uprated narrowing disavowing ALPINES Stahl HEELLESS feminises LUCKINESSES patriarchate anticommunism




          The regex pattern I used has the drawback that the whitespace around the removed numbers is left behind, but this does not matter for counting with wc, which treats any run of whitespace as a single separator.
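A related caveat is that deleting digits also turns a mixed token such as abc123 into abc, which wc then still counts. If such tokens should be excluded as well, one possible alternative (not the method used in this answer) is to put each token on its own line and count only the lines made up entirely of letters. This is a minimal sketch; the sample text is invented for illustration.

```shell
# Invented sample: one plain word, one number, one mixed token, one plain word.
text='HELLO 0184674673 abc123 world'

# Split on whitespace into one token per line, then count the lines
# that consist of letters only.
letters_only=$(printf '%s\n' "$text" | tr -s '[:space:]' '\n' | grep -c '^[[:alpha:]][[:alpha:]]*$')

# For comparison: stripping digits leaves "abc" behind, so it is still counted.
stripped=$(printf '%s\n' "$text" | sed -e 's/[0-9]*//g' | wc -w)
stripped=$((stripped))   # drop any padding that some wc implementations print

echo "letters only: $letters_only, after stripping digits: $stripped"
```

For input that contains only pure words and pure numbers, as in the snippet from the question, both approaches should give the same count.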



          You can pipe the filtered output straight into wc to count the words; the second number it prints is the word count.



          $ sed -e 's/[0-9]*//g' testfile | wc
          2 104 1035
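To restrict the count to one column, the same filter can be placed behind cut, and wc -w prints only the word count. The following is a minimal sketch under stated assumptions: the columns are separated by single spaces, and the temporary sample file, its contents, and the choice of column 2 are all made up for illustration.

```shell
# Hypothetical sample data: three space-separated columns.
sample=$(mktemp)
printf '%s\n' 'alpha 12345 BETA' 'gamma delta 678' '90 EPSILON zeta' > "$sample"

# Select column 2, delete every digit, then count the words that remain.
count=$(cut -d ' ' -f 2 "$sample" | sed -e 's/[0-9]*//g' | wc -w)
count=$((count))   # drop any padding that some wc implementations print

echo "$count"
rm -f "$sample"
```

If the real file uses tabs or runs of spaces between columns, the cut delimiter would need adjusting, or the column could be selected with awk instead.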





























          • I am not restricted to wc! I honestly had not thought of this, but it makes things a lot easier than my method, I'll try it out!

            – secondubly
            Apr 11 '16 at 23:27











          • Edit - I used your command (I had to cut the specific column I was reading from) but it worked like a charm! I went ahead and marked your answer as the correct one.

            – secondubly
            Apr 11 '16 at 23:34





















          answered Apr 11 '16 at 23:12









          flammi88












