Count ONLY alphanumeric words in a specific column of a file
I've been given a test file and need to count the words in a specific column of it. The catch is that some lines contain only numbers.
I know that wc will give the word count of something, but if I remember correctly it does not distinguish between numbers and actual words (so a file containing 0184674673 HELLO would give a word count of two). Is there some non-convoluted way of solving this with a single command, short of going through each line in the column, checking whether the word contains any digits, and incrementing a counter if not?
Here is a snippet of the file I've been given:
disobediently RESINY GRAPHICS lownesses prickers intractabiliti
es 85790227 villainously MINIS blinkering applicants TORPIDITIES
subtexts apportioned carded electrocardiograph SINTERED FOOTSORE
ENTHRALMENTS Sherpa FLAN OVERRULES NEWSREADER 15318116 PUTRIDNESS
EXECUTIONAL vanguard LUCENTLY CONGRUENTLY 14117187 pretending
misshapes cowslip 18714723 JUDGES INTERNATIONALIZING DISCUSSES
85192973 quorate shirking SECURES scrofula exclude NUCLIDE shipwrecks
prologuing indelicacy NUTRITIONS decompresses manlike aggregately
NEGOTIANT chewy Egypt bloodsports STOREYS worthier BELLOWING HAEMATIN
UNCONTROLLED SUFFERER CLOPPING DUALLY 5363130 DISCOMMODING ENTRANCED
brilliantness changeableness driest uncouth abjectnesses grumpiness
ache 94854804 JETSAMS barbarousness REPOSSESSIONS INCLINATION Jardine
AUTHORISED parading ties Hillyer USHER COMPLIANCES disdainful 98908803
CANDIDACY Rostov titrates DICTIONARIES optimists luted WART RAPINE
94683675 cannibal hostilely KALI ADMIRATIONS 95714958 AMPUTATED
65196125 VIEWFINDER uprated narrowing disavowing ALPINES Stahl
HEELLESS feminises LUCKINESSES patriarchate anticommunism
scripting wc
edited 53 mins ago
Rui F Ribeiro
asked Apr 11 '16 at 23:07
secondubly
1 Answer
If you are not restricted to wc, you can filter out the numbers with a tool like sed and then count the words using wc.
Given your text (in testfile), the filter gives:
$ sed -e 's/[0-9]*//g' testfile
disobediently RESINY GRAPHICS lownesses prickers intractabiliti es villainously MINIS blinkering applicants TORPIDITIES subtexts apportioned carded electrocardiograph SINTERED FOOTSORE ENTHRALMENTS Sherpa FLAN OVERRULES NEWSREADER PUTRIDNESS EXECUTIONAL vanguard LUCENTLY CONGRUENTLY pretending misshapes cowslip JUDGES INTERNATIONALIZING DISCUSSES quorate shirking SECURES scrofula exclude NUCLIDE shipwrecks prologuing indelicacy NUTRITIONS decompresses manlike aggregately NEGOTIANT chewy Egypt bloodsports STOREYS worthier BELLOWING HAEMATIN UNCONTROLLED SUFFERER CLOPPING DUALLY DISCOMMODING ENTRANCED brilliantness changeableness driest uncouth abjectnesses grumpiness ache JETSAMS barbarousness REPOSSESSIONS INCLINATION Jardine AUTHORISED parading ties Hillyer USHER COMPLIANCES disdainful CANDIDACY Rostov titrates DICTIONARIES optimists luted WART RAPINE cannibal hostilely KALI ADMIRATIONS AMPUTATED VIEWFINDER uprated narrowing disavowing ALPINES Stahl HEELLESS feminises LUCKINESSES patriarchate anticommunism
The regex pattern I used has the drawback that the whitespace after each number is not removed, but this does not matter for counting: wc treats any run of whitespace as a single separator.
You can count the words directly by piping the filtered output into wc. It prints the line, word and byte counts, so the word count is the second number.
$ sed -e 's/[0-9]*//g' testfile | wc
2 104 1035
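One caveat worth spelling out: the sed filter deletes digits rather than whole words, so a mixed token such as abc123 would still be counted (as abc). The sample data has no such tokens, so the result there is the same. Below is a minimal sketch of a stricter variant, assuming words are whitespace-separated and that any token containing a digit should be skipped. The function name count_words_without_digits and the demo data are made up for illustration; this is not part of the answer above.

```shell
#!/bin/sh
# Hypothetical helper: count whitespace-separated tokens that contain no digit.
count_words_without_digits() {
    # tr turns every run of whitespace into a single newline (one token per line),
    # grep -v drops tokens containing a digit as well as empty lines,
    # wc -l counts what is left; the final tr strips padding some wc versions add.
    tr -s '[:space:]' '\n' < "$1" | grep -v -e '[0-9]' -e '^$' | wc -l | tr -d ' '
}

# Small demonstration file (made-up data, not the asker's file).
demo=$(mktemp)
printf '%s\n' 'es 85790227 villainously MINIS' 'abc123 HELLO 0184674673' > "$demo"

result=$(count_words_without_digits "$demo")
echo "$result"   # prints 4: es, villainously, MINIS, HELLO
rm -f "$demo"
```

With the sed filter from the answer, the same demo file would give 5, because abc123 is reduced to abc and still counted.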
answered Apr 11 '16 at 23:12
flammi88
I am not restricted to wc! I honestly had not thought of this, but it makes things a lot easier than my method, I'll try it out!
– secondubly
Apr 11 '16 at 23:27
Edit - I used your command (I had to cut the specific column I was reading from) but it worked like a charm! I went ahead and marked your answer as the correct one.
– secondubly
Apr 11 '16 at 23:34
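The comment mentions cutting out the column first but does not show the command. Here is a minimal sketch of how the pieces might be chained, assuming a tab-delimited file in which the column of interest is the second one. The file contents, the column number and the delimiter are all assumptions made for illustration.

```shell
#!/bin/sh
# Made-up tab-delimited sample; column 2 is the one to count (an assumption).
demo=$(mktemp)
printf 'id1\tRESINY\tx\n' > "$demo"
printf 'id2\t85790227\ty\n' >> "$demo"
printf 'id3\tlownesses\tz\n' >> "$demo"

# cut selects the column (tab is cut's default delimiter),
# sed deletes the digits, and wc -w counts the words that remain.
count=$(cut -f 2 "$demo" | sed -e 's/[0-9]*//g' | wc -w | tr -d ' ')
echo "$count"   # prints 2: RESINY and lownesses
rm -f "$demo"
```

For a file whose columns are separated by single spaces, cut -d ' ' -f 2 would select the column instead.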