Count ONLY alphanumeric words in a specific column of a file

I'm given a test file and am supposed to be able to count the words in a specific column of a file. The catch is that there are some lines that contain only numbers.

I know that wc gives the word count of its input, but if I remember correctly it does not distinguish between numbers and actual words (so a file containing 0184674673 HELLO would give a word count of two). Is there some non-convoluted way of solving this with a single command, short of going through each line in the column, checking whether the word contains any digits, and incrementing a counter if not?

Here is a snippet of the file I've been given:

disobediently RESINY GRAPHICS lownesses prickers intractabiliti
es 85790227 villainously MINIS blinkering applicants TORPIDITIES
subtexts apportioned carded electrocardiograph SINTERED FOOTSORE
ENTHRALMENTS Sherpa FLAN OVERRULES NEWSREADER 15318116 PUTRIDNESS
EXECUTIONAL vanguard LUCENTLY CONGRUENTLY 14117187 pretending
misshapes cowslip 18714723 JUDGES INTERNATIONALIZING DISCUSSES
85192973 quorate shirking SECURES scrofula exclude NUCLIDE shipwrecks
prologuing indelicacy NUTRITIONS decompresses manlike aggregately
NEGOTIANT chewy Egypt bloodsports STOREYS worthier BELLOWING HAEMATIN
UNCONTROLLED SUFFERER CLOPPING DUALLY 5363130 DISCOMMODING ENTRANCED
brilliantness changeableness driest uncouth abjectnesses grumpiness
ache 94854804 JETSAMS barbarousness REPOSSESSIONS INCLINATION Jardine
AUTHORISED parading ties Hillyer USHER COMPLIANCES disdainful 98908803
CANDIDACY Rostov titrates DICTIONARIES optimists luted WART RAPINE
94683675 cannibal hostilely KALI ADMIRATIONS 95714958 AMPUTATED
65196125 VIEWFINDER uprated narrowing disavowing ALPINES Stahl
HEELLESS feminises LUCKINESSES patriarchate anticommunism

edited 53 mins ago

Rui F Ribeiro

asked Apr 11 '16 at 23:07

secondubly

scripting wc

1 Answer

If you are not restricted to wc, you can filter out the numbers with a tool like sed and then count the words using wc.

With your text saved in testfile, the filtering step gives:

$ sed -e 's/[0-9]*//g' testfile

disobediently RESINY GRAPHICS lownesses prickers intractabiliti es villainously MINIS blinkering applicants TORPIDITIES subtexts apportioned carded electrocardiograph SINTERED FOOTSORE ENTHRALMENTS Sherpa FLAN OVERRULES NEWSREADER PUTRIDNESS EXECUTIONAL vanguard LUCENTLY CONGRUENTLY pretending misshapes cowslip JUDGES INTERNATIONALIZING DISCUSSES quorate shirking SECURES scrofula exclude NUCLIDE shipwrecks prologuing indelicacy NUTRITIONS decompresses manlike aggregately NEGOTIANT chewy Egypt bloodsports STOREYS worthier BELLOWING HAEMATIN UNCONTROLLED SUFFERER CLOPPING DUALLY DISCOMMODING ENTRANCED brilliantness changeableness driest uncouth abjectnesses grumpiness ache JETSAMS barbarousness REPOSSESSIONS INCLINATION Jardine AUTHORISED parading ties Hillyer USHER COMPLIANCES disdainful CANDIDACY Rostov titrates DICTIONARIES optimists luted WART RAPINE cannibal hostilely KALI ADMIRATIONS AMPUTATED VIEWFINDER uprated narrowing disavowing ALPINES Stahl HEELLESS feminises LUCKINESSES patriarchate anticommunism

The regex pattern I used has the drawback that the whitespace after the numbers is not removed, but this does not matter for counting with wc, which treats any run of whitespace as a single separator.
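The sed pattern removes every digit, including digits inside mixed tokens such as a1b. If you would rather leave the text untouched and only skip tokens that consist entirely of digits, a minimal sketch could look like the following. The function name count_non_numeric_words is made up for illustration, and the sketch assumes that words are separated by whitespace.

```shell
# Hypothetical helper: reads text on stdin and prints the number of
# whitespace-separated tokens that are not made up entirely of digits.
count_non_numeric_words() {
    # Put every token on its own line, squeezing runs of whitespace,
    # then count the lines that are neither empty nor purely numeric.
    tr -s '[:space:]' '\n' | grep -Evc '^[0-9]*$'
}
```

It can be used as count_non_numeric_words < testfile. For whitespace-separated input it should report the same number of words as the sed and wc approach, since a token that contains at least one non-digit still counts as one word after its digits are stripped.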

You can count the words directly by piping the filtered output into wc (the middle number in the output is the word count):

$ sed -e 's/[0-9]*//g' testfile | wc

  2     104    1035
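If only one column of the file should be counted, the filtering and counting can also be combined in awk. This is only a sketch under an assumption: "column" is taken to mean the Nth whitespace-separated field on each line, which may not match the layout of your real file. The function name count_column_words is hypothetical.

```shell
# Hypothetical helper: $1 is the 1-based column number, text is read on stdin.
# Prints how many lines have a non-numeric word in that column.
count_column_words() {
    awk -v col="$1" '
        # Skip lines that are too short to have the requested column,
        # and skip entries that consist only of digits.
        (col + 0) <= NF && $col !~ /^[0-9]+$/ { n++ }
        END { print n + 0 }   # n + 0 prints 0 when nothing matched
    '
}
```

For example, count_column_words 2 < testfile counts the non-numeric entries in the second column. Unlike the sed approach, this leaves the text unmodified and needs no separate cut step.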

answered Apr 11 '16 at 23:12

flammi88

I am not restricted to wc! I honestly had not thought of this, but it makes things a lot easier than my method, I'll try it out!

– secondubly
Apr 11 '16 at 23:27

Edit - I used your command (I had to cut the specific column I was reading from) but it worked like a charm! I went ahead and marked your answer as the correct one.

– secondubly
Apr 11 '16 at 23:34
