Grep a range of values with specific starting characters

I have 10GB files in which I want to count the occurrences of some specific text, i.e. TY[0-9].

Example file:

ABC,2A,2018-07-06,2018-06-20 00:00:00

BCD,TY1,2018-07-06,2018-06-20 00:00:00

EFG,TY2,2018-07-06,2018-06-20 00:00:00

IGH,2A,2018-07-06,2018-06-20 00:00:00

I want to get the count of all text starting with TY and then a digit. I tried using egrep but am not getting the correct result.

egrep  "^TY[0-9]" Filename

edited 12 mins ago

Crypteya

918

asked Jun 21 '18 at 18:37

Developer

15517

awk grep


3 Answers

Using awk to count the number of times the second comma-delimited field in the file starts with the string TY followed by a digit:

awk -F, '$2 ~ /^TY[[:digit:]]/ { n++ } END { print n }' filename

I'm wondering whether using cut in combination with grep would be quick? Cutting out the second column would give grep less data to work with, and so it may be quicker than just grep alone.

cut -d, -f2 filename | grep -c '^TY[[:digit:]]'

... but I'm not sure.

After some testing on my OpenBSD system, using a 1.1GB file, the cut+grep pipeline is actually almost 50% quicker than awk (8 seconds vs. 15 seconds). A pure grep solution (grep -Ec '\<TY[0-9]' filename, taken from glenn's solution) takes 13 seconds.

So if the string is to be picked out of the second field only, one may gain some time by extracting only that field before matching.
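As a minimal sketch of the two commands above, run against the four sample lines from the question: `sample_data`, `count_awk` and `count_cut` are hypothetical helper names introduced here for illustration, and the awk variant prints `n+0` (a small change from the command above) so that input with no matches prints 0 rather than an empty line.

```shell
#!/bin/sh
# Prints the four sample lines from the question (illustrative data only).
sample_data() {
    cat <<'EOF'
ABC,2A,2018-07-06,2018-06-20 00:00:00
BCD,TY1,2018-07-06,2018-06-20 00:00:00
EFG,TY2,2018-07-06,2018-06-20 00:00:00
IGH,2A,2018-07-06,2018-06-20 00:00:00
EOF
}

# Hypothetical helper: count lines whose second comma-delimited field
# starts with TY followed by a digit, using awk alone.
count_awk() {
    awk -F, '$2 ~ /^TY[[:digit:]]/ { n++ } END { print n+0 }'
}

# Hypothetical helper: same count, but cut extracts field 2 first so that
# grep only has to look at that field.
count_cut() {
    cut -d, -f2 | grep -c '^TY[[:digit:]]'
}

sample_data | count_awk
sample_data | count_cut
```

Both helpers read standard input, so on a real file they would be called as `count_awk <filename` and `count_cut <filename`. Timings will depend on the system and the data, so it is worth measuring both on your own file.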

edited Jun 21 '18 at 19:02

answered Jun 21 '18 at 18:47

Kusalananda

127k16239393

In your second example, why not cut -d, -f2 inputfile | grep -c [...] rather than | grep | wc -l?

– DopeGhoti
Jun 21 '18 at 18:59

@DopeGhoti Derrp. Yes. Thanks. Made it even quicker too.

– Kusalananda
Jun 21 '18 at 19:02


You want to use a word boundary instead of the start-of-line anchor:

$ grep -Ec '\<TY[0-9]' file

2

Note: that is a count of all lines with a "TY word". It is not a count of all "TY word"s. If you can have more than one per line, then

$ grep -Eo '\<TY[0-9]' file | wc -l
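To make the difference between the three counts concrete, here is a small sketch. The data in `sample_multi` is made up for illustration (it is not from the question), the helper names are hypothetical, and it assumes a grep that supports the `\<` start-of-word extension and the `-o` option, such as GNU grep.

```shell
#!/bin/sh
# Made-up sample in which one line contains two "TY words".
sample_multi() {
    cat <<'EOF'
BCD,TY1,2018-07-06
EFG,TY2,TY3,2018-07-06
IGH,2A,2018-07-06
EOF
}

# Start-of-line anchor, as in the question: nothing matches because no
# line begins with TY. "|| true" hides grep's non-zero exit status when
# the count is 0.
count_anchored() {
    grep -Ec '^TY[0-9]' || true
}

# Word boundary with -c: counts matching LINES.
count_lines() {
    grep -Ec '\<TY[0-9]'
}

# Word boundary with -o: prints each match on its own line, so wc -l
# counts OCCURRENCES.
count_occurrences() {
    grep -Eo '\<TY[0-9]' | wc -l
}

sample_multi | count_anchored      # 0
sample_multi | count_lines         # 2
sample_multi | count_occurrences   # 3
```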

answered Jun 21 '18 at 18:45

glenn jackman

51.2k571110


If you want to find the number of occurrences of a comma-delimited field that consists of TY followed by one or more decimal digits, you could do:

<file perl -lne '$n += () = /(?<![^,])TY\d+(?![^,])/g; END{print 0+$n}'

Which on an input like:

TY1,TY2,TY,TYFOO

TY213,X-TY2,TY4

Would return 4 (TY1, TY2, TY213, TY4).

(?<!...) and (?!...) are respectively the negative lookbehind and negative lookahead operators. So here, we're looking for TY followed by one or more (+) digits (\d), provided it's neither preceded nor followed by a character other than ,.

Another way to do it would be to convert ,s to newlines and count the number of resulting lines that start with TY followed by one or more digits:

<file tr , '\n' | LC_ALL=C grep -xEc 'TY[[:digit:]]+'

(on my system, that's about 10 times as fast as the perl solution)
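A short sketch of the tr + grep approach on the two-line input shown above, next to the word-boundary approach for comparison. The helper names are hypothetical, and the comparison assumes a grep with the `\<` extension and `-o` (such as GNU grep). The whole-field match skips X-TY2, while the word-boundary match counts the TY2 inside it.

```shell
#!/bin/sh
# The two-line example input from this answer.
sample_fields() {
    cat <<'EOF'
TY1,TY2,TY,TYFOO
TY213,X-TY2,TY4
EOF
}

# Turn every comma into a newline so that each field sits on its own
# line, then count the lines that are exactly TY plus one or more digits.
count_whole_fields() {
    tr , '\n' | LC_ALL=C grep -xEc 'TY[[:digit:]]+'
}

# For comparison: count every TY<digit> that starts a word, wherever it
# sits inside a field.
count_word_starts() {
    grep -Eo '\<TY[0-9]' | wc -l
}

sample_fields | count_whole_fields   # 4: TY1, TY2, TY213, TY4
sample_fields | count_word_starts    # 5: also counts the TY2 in X-TY2
```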

edited Jun 21 '18 at 19:03

answered Jun 21 '18 at 18:51

Stéphane Chazelas

303k56570926
