How do I grep recursively through .gz files?
I am using a script that regularly downloads my Gmail messages and compresses each raw .eml file into a .gz file. The script creates a folder for each day and then compresses every message into its own file.
I would like a way to search through this archive for a "string".
Grep alone doesn't appear to do it. I also tried SearchMonkey.
files grep search recursive compression
edited Mar 2 '15 at 20:37 by Gilles
asked Mar 2 '15 at 16:03 by Kendor
use zgrep: zgrep - search possibly compressed files for a regular expression
– Arkadiusz Drabczyk
Mar 2 '15 at 16:10
6 Answers
If your system's zgrep supports recursion (the zutils version does; the wrapper script that ships with gzip does not), you can simply run:
zgrep -irs your-pattern-goes-here the-folder-to-search-goes-here/
If your system does not have zgrep, you can use the find command to run zcat and grep against each file like so:
find the-folder-to-search-goes-here/ -name '*.gz' -exec sh -c 'echo "Searching $1"; zcat "$1" | grep your-pattern-goes-here' sh {} \;
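Because the command above prints a "Searching ..." line for every file, the hits can be hard to spot in a large archive. A minimal sketch of a variant that prints only the names of files containing a match (similar to grep -l), assuming a POSIX sh and gzip; list_matching_gz is a hypothetical name. It uses gzip -cd, which does the same job as GNU zcat, and hands the pattern and the file names to the inner shell as arguments:

```sh
# list_matching_gz DIR PATTERN  (hypothetical helper name)
# Prints the path of every *.gz file under DIR whose content matches PATTERN.
list_matching_gz() {
    dir=$1
    pattern=$2
    # The inner script receives the pattern as $0 and a batch of files as "$@".
    find "$dir" -type f -name '*.gz' -exec sh -c '
        for f in "$@"; do
            if gzip -cd -- "$f" | grep -q -- "$0"; then
                printf "%s\n" "$f"
            fi
        done
    ' "$pattern" {} +
}
```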
Forgive my greenness on this... the files to be searched through are a couple of layers deep. ~/gmvault-db/db/2015-02 contains a folder for each month archived, and then underneath that the .gz files for that month are stored. If I'm searching for .mil within that whole tree, is this what I would do? find ~/gmvault-db/db/ -name '*.gz' -exec sh -c 'echo "Searching {}" ; zcat "{}" | grep .mil ' ;
– Kendor
Mar 2 '15 at 16:28
That's fine - the "r" in -irs will cause zgrep to search recursively. The find command operates recursively by default, so any file which ends in .gz will be zcatted and passed into grep (and the {} will be expanded to the relative path of the file which is about to be searched). So when you get a hit, it will be preceded by Searching ~/gmvault-db/db/2015-02/03/whatever.gz
– Nate from Kalamazoo
Mar 2 '15 at 16:29
Here's what I get back: find: "paths must precede expression: -exec". Here's the command I used: find ~/gmvault-db/db/ -name '*.gz' -exec sh -c 'echo "Searching {}" ; zcat "{}" | grep .mil ' ;
– Kendor
Mar 2 '15 at 16:36
take out the backslash between the '*.gz' and the -exec.
– Nate from Kalamazoo
Mar 2 '15 at 16:37
zgrep won't take the -r flag for some reason. That's mentioned in man zgrep (also see my answer).
– terdon♦
Mar 2 '15 at 17:12
xzgrep -l "string" ./*/*.eml.gz
xzgrep is a derivative of the zgrep utilities (see less /bin/xzgrep).
From the man page:
xzgrep invokes grep(1) on files which may be either uncompressed or compressed with xz(1), lzma(1), gzip(1), bzip2(1), or lzop(1). All
options specified are passed directly to grep(1).
-l prints only the names of matching files.
-R (recursion) will not work, because the script specifically rejects it; however, simple shell globbing should get us there:
./*/*.eml.gz
Run from the directory that contains the per-day folders (for example, where ./today/sample.eml.gz exists), this glob matches every file ending in .eml.gz that sits exactly one level below the current directory.
If you want to grep recursively in all .eml.gz files in the current directory, you can use:
find . -name \*.eml.gz -print0 | xargs -0 zgrep "STRING"
You have to escape the first * so that the shell does not expand it. -print0 tells find to print a null character after each file name it finds; xargs -0 reads those null-separated names from standard input and runs the command after it with the names as arguments; zgrep works like grep, but uncompresses each file first.
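A minimal sketch of the same pipeline wrapped in a reusable shell function. It assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do) and a zgrep that accepts grep's -H option (assumed here; it forces the file-name prefix even when xargs hands zgrep a single file). The name search_eml_gz is hypothetical. Quoting '*.eml.gz' has the same effect as escaping the *.

```sh
# search_eml_gz DIR PATTERN  (hypothetical helper name)
# Searches every *.eml.gz file under DIR for PATTERN.
search_eml_gz() {
    dir=$1
    pattern=$2
    # The quoted glob reaches find unexpanded, just like the escaped \*.
    # -print0 / -0 keep file names with spaces (or even newlines) intact.
    # -H prefixes each match with its file name, even for a single file.
    find "$dir" -type f -name '*.eml.gz' -print0 |
        xargs -0 zgrep -H -- "$pattern"
}
```

For example, search_eml_gz ~/gmvault-db/db '\.mil' prints every matching line preceded by the path of the .gz file it came from.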
edited Nov 30 at 17:57 by W. Bontrager
answered Mar 2 '15 at 16:20 by Jared Stafford
'-print0' and '-0' are not mandatory. xargs uses '\n' by default.
– Jaime M.
Jul 7 '15 at 8:50
They're necessary if there might be space characters in the paths; there's no reason other than complexity not to use them.
– Daniel Griscom
Sep 23 '15 at 14:38
zgrep actually seems faster than grep run on uncompressed files. It must be because compressed files can be read off the HD and decompressed faster than reading an uncompressed file from the HD.
– Geremia
Aug 19 '16 at 17:54
@JaimeM. xargs uses blanks (whitespace) by default. Sure, file names almost never have newlines in them, but spaces are not unheard of (even if most UNIXy types frown on them). That said, you can simplify without worrying about whitespace even more easily:
find . -name '*.eml.gz' -exec zgrep "STRING" {} +
That gets the same many-arguments-per-launch behavior of xargs and the safety of -print0/-0, all without the overhead of an extra process launch and piping, and fairly concisely. -exec with + is POSIX-specified, so it should be on most semi-recent UNIX-like systems, to my knowledge.
– ShadowRanger
Dec 9 '16 at 18:38
@Jared Is there a way to do a wildcard search only knowing the beginning of the file name? For example, I have .gz files that have date/time stamps at the end of their names: ABCLog04_18_18_2_21.gz
Is there a way to recursively look for files beginning with ABC*? I tried replacing *.eml.gz in your example above with ABCLog* and got an error: find: paths must precede expression: ABCLog-2018-03-12-10-16-1.log.gz Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
– DevelopingDeveloper
Apr 18 at 19:21
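The prefix question in the last comment comes down to quoting. An unquoted ABCLog* is expanded by the shell before find runs; if it matches several files in the current directory, the extra names land where find expects its expression, which produces "paths must precede expression". A minimal sketch, assuming the file names start with a fixed prefix and end in .gz; search_prefix_gz is a hypothetical name, and it uses the -exec ... {} + form from the comments instead of xargs:

```sh
# search_prefix_gz DIR PREFIX PATTERN  (hypothetical helper name)
# Searches every gzip file under DIR whose name starts with PREFIX.
search_prefix_gz() {
    dir=$1
    prefix=$2
    pattern=$3
    # "$prefix*.gz" is quoted, so find receives the wildcard itself and
    # matches it against each file name; the shell never expands it.
    find "$dir" -type f -name "$prefix*.gz" -exec zgrep -H -- "$pattern" {} +
}
```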
There's a lot of confusion here because there isn't just one zgrep. I have two versions on my system: zgrep from gzip and zgrep from zutils. The former is just a wrapper script that calls gzip -cdfq. It doesn't support the -r, --recursive switch.[1] The latter is a C++ program and it supports the -r, --recursive option.
Running zgrep --version | head -n 1 will reveal which one (if any) of them is the default:
zgrep (gzip) 1.6 is the wrapper script,
zgrep (zutils) 1.3 is the C++ executable.
If you have the latter you could run:
zgrep 'pattern' -r --format=gz /path/to/dir
Anyway, as suggested, find + zgrep will work equally well with either version of zgrep:
find /path/to/dir -name '*.gz' -exec zgrep -- 'pattern' {} +
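The version check and the two commands above can be combined into a small dispatcher. This is a minimal sketch, assuming a POSIX sh; zgrep_flavor and search_recursive_gz are hypothetical names, and the string matching relies only on the two --version first lines quoted above. Any other zgrep (for example one bundled with BSD grep) is reported as "unknown" and handled with find:

```sh
# zgrep_flavor: prints zutils, gzip, unknown or none (hypothetical helper).
zgrep_flavor() {
    if ! command -v zgrep >/dev/null 2>&1; then
        echo none
        return
    fi
    first_line=$(zgrep --version 2>/dev/null | head -n 1)
    case $first_line in
        *zutils*) echo zutils ;;
        *gzip*)   echo gzip ;;
        *)        echo unknown ;;
    esac
}

# search_recursive_gz DIR PATTERN (hypothetical helper)
search_recursive_gz() {
    dir=$1
    pattern=$2
    case $(zgrep_flavor) in
        zutils)
            # zutils' zgrep recurses on its own (form taken from the answer).
            zgrep "$pattern" -r --format=gz "$dir" ;;
        *)
            # gzip's wrapper script rejects -r, so find does the recursion.
            find "$dir" -name '*.gz' -exec zgrep -- "$pattern" {} + ;;
    esac
}
```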
If zgrep is missing from your system (highly unlikely), you could try:
find /path/to/dir -name '*.gz' -exec sh -c 'gzip -cd "$0" | grep -- "pattern"' {} \;
but there's a major downside: you won't know where the matches are, as there's no file name prepended to the matching lines.
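The missing file names can be restored without zgrep by printing each name in front of its matching lines. A minimal sketch, assuming only gzip, grep and a POSIX sh; search_gz_nozgrep is a hypothetical name. Here the inner shell receives the pattern as $0 and a batch of file names as "$@" ({} +), instead of starting one sh per file:

```sh
# search_gz_nozgrep DIR PATTERN  (hypothetical helper name)
# Prints FILE:LINE for every line matching PATTERN in *.gz files under DIR.
search_gz_nozgrep() {
    dir=$1
    pattern=$2
    find "$dir" -type f -name '*.gz' -exec sh -c '
        for f in "$@"; do
            # Decompress to stdout, filter, then prepend the file name.
            gzip -cd -- "$f" | grep -- "$0" | while IFS= read -r line; do
                printf "%s:%s\n" "$f" "$line"
            done
        done
    ' "$pattern" {} +
}
```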
1: because it would be problematic
edited Nov 13 '17 at 14:08
answered Mar 3 '15 at 10:18 by don_crissti
If zgrep from zutils is not available, you can install it in Ubuntu with sudo apt-get install zutils.
– therealmarv
Jul 27 '15 at 1:46
Continued from @therealmarv ... and then Ubuntu will use the zutils zgrep instead of the gzip one. Then -r works!
– Elijah Lynn
Mar 8 '17 at 22:08
Is there a way to print the line number of the file the pattern is matched on?
– DogEatDog
Nov 8 '17 at 18:48
@DogEatDog - just like grep -n, zgrep -n will print line numbers. It's in the manual...
– don_crissti
Nov 9 '17 at 22:55
ag is a variant of grep, with some nice extra features:
- it has a -z option for compressed files,
- it has many of ack's features,
- it is fast.
So:
ag -r -z your-pattern-goes-here folder
If it is not installed:
apt-get install silversearcher-ag (Debian and friends)
yum install the_silver_searcher (Fedora)
brew install the_silver_searcher (macOS)
edited Mar 7 '15 at 12:58
answered Mar 2 '15 at 23:43 by JJoao
I get "ag: truncated file: Success" as a result. Is there any other flag I should add?
– Yar
Sep 11 '17 at 21:10
Recursion alone is easy:
-r, --recursive
Read all files under each directory, recursively, following
symbolic links only if they are on the command line. This is
equivalent to the -d recurse option.
-R, --dereference-recursive
Read all files under each directory, recursively. Follow all
symbolic links, unlike -r.
However, for compressed files you need something like this (in bash):
shopt -s globstar
for file in /path/to/directory/**/*gz; do zcat "$file" | grep pattern; done
/path/to/directory should be the parent directory that contains the subdirectories for each day.
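As written, the loop prints matching lines but not which file they came from. A minimal sketch that keeps the same decompress-then-grep idea but labels each match with its file name, assuming bash 4+ (for globstar) and GNU grep (for --label); the function name search_gz_tree is hypothetical, and gzip -dc is used as the more portable spelling of zcat:

```shell
#!/usr/bin/env bash
# Hypothetical helper: search_gz_tree DIR PATTERN
# Prints FILE:LINE for every matching line in any *.gz file under DIR.
# Returns 0 if at least one file matched, 1 otherwise.
search_gz_tree() {
  local dir=$1 pattern=$2 file status=1
  # ** recurses into subdirectories; nullglob makes an unmatched glob expand
  # to nothing. Note: these options stay set in the calling shell afterwards.
  shopt -s globstar nullglob
  for file in "$dir"/**/*.gz; do
    # Decompress to stdout; --label makes grep report stdin under the .gz name.
    if gzip -dc -- "$file" | grep -H --label="$file" -e "$pattern"; then
      status=0
    fi
  done
  return "$status"
}
```

Usage: search_gz_tree /path/to/directory pattern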
zgrep is the obvious answer but, unfortunately, it does not support the -r flag. From man zgrep:
These grep options will cause zgrep to terminate with an error code: (-[drRzZ]|--di*|--exc*|--inc*|--rec*|--nu*).
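Because zgrep refuses -r but accepts a list of file names, another option is to let find do the recursion and pass the files to zgrep in batches. A sketch, assuming gzip's zgrep script (which accepts -H and -e and forwards them to grep); zgrep_tree is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: zgrep_tree DIR PATTERN
# find does the recursion, so zgrep only ever sees plain file arguments.
zgrep_tree() {
  local dir=$1 pattern=$2
  # "{} +" hands many files to each zgrep call (fewer processes than "\;").
  # -H keeps the file name on every match, even if a batch holds one file.
  find "$dir" -type f -name '*.gz' -exec zgrep -H -e "$pattern" {} +
}
```

Usage: zgrep_tree /path/to/directory pattern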
edited Mar 3 '15 at 0:19
answered Mar 2 '15 at 16:14
terdon♦
127k31245421
up vote
3
down vote
If your system's zgrep supports recursion, you can simply
zgrep -irs your-pattern-goes-here the-folder-to-search-goes-here/
If your system does not have zgrep, you can use the find command to run zcat and grep against each file like so:
find the-folder-to-search-goes-here/ -name '*.gz' \
  -exec sh -c 'echo "Searching {}" ; zcat "{}" | grep your-pattern-goes-here ' \;
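Embedding {} inside the sh -c string works for ordinary file names, but a name containing a quote or a $ would be parsed as shell code. A slightly safer variant of the same command passes the pattern and the file name to the inline script as positional parameters instead. A sketch under those assumptions; find_search is a hypothetical name, and gzip -dc stands in for zcat:

```shell
#!/usr/bin/env bash
# Hypothetical helper: find_search DIR PATTERN
# Same output style as the find command above: a "Searching FILE" line per
# file, followed by any matching lines from that file.
find_search() {
  local dir=$1 pattern=$2
  # Inside the inline script: $0 = "sh", $1 = the pattern, $2 = the file name.
  # Neither value is ever re-parsed as shell code.
  find "$dir" -type f -name '*.gz' -exec sh -c '
    echo "Searching $2"
    gzip -dc -- "$2" | grep -e "$1"
  ' sh "$pattern" {} \;
}
```

Usage: find_search ~/gmvault-db/db/ '\.mil'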
Forgive my greenness on this... the files to be searched through are a couple of layers deep. ~/gmvault-db/db/2015-02 contains a folder for each month archived, and underneath that the .gz files for that month are stored. If I'm searching for .mil within that whole tree, is this what I would do? find ~/gmvault-db/db/ -name '*.gz' -exec sh -c 'echo "Searching {}" ; zcat "{}" | grep .mil ' \;
– Kendor
Mar 2 '15 at 16:28
1
That's fine - the "r" in -irs will cause zgrep to search recursively. The find command operates recursively by default, so any file which ends in .gz will be zcatted and passed into grep (and the {} will be expanded to the relative path of the file which is about to be searched). So when you get a hit, it will be preceded by
Searching ~/gmvault-db/db/2015-02/03/whatever.gz
– Nate from Kalamazoo
Mar 2 '15 at 16:29
Here's what I get back: find: "paths must precede expression: -exec". Here's the command I used: find ~/gmvault-db/db/ -name '*.gz' \ -exec sh -c 'echo "Searching {}" ; zcat "{}" | grep .mil ' \;
– Kendor
Mar 2 '15 at 16:36
take out the backslash between the '*.gz' and the -exec.
– Nate from Kalamazoo
Mar 2 '15 at 16:37
4
zgrep won't take the -r flag for some reason. That's mentioned in man zgrep (also see my answer).
– terdon♦
Mar 2 '15 at 17:12
answered Mar 2 '15 at 16:22
Nate from Kalamazoo
91657
up vote
0
down vote
xzgrep -l "string" ./*/*.eml.gz
xzgrep is a derivative of the zgrep script (you can read it with less /bin/xzgrep).
From the man page:
xzgrep invokes grep(1) on files which may be either uncompressed or compressed with xz(1), lzma(1), gzip(1), bzip2(1), or lzop(1). All options specified are passed directly to grep(1).
-l prints only the names of matching files.
-R for recursion will not work, as it is specifically prohibited in the script; however, simple shell globbing should get us there:
./*/*.eml.gz
Starting from a relative path such as ./today/sample.eml.gz, this matches every file one level below the current directory whose name ends with ".eml.gz".
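The ./*/ pattern only reaches one directory level. With bash's globstar option, ** matches any number of levels, so the same xzgrep -l approach also covers a deeper month/day layout. A sketch, assuming bash 4+; the function name list_matching_eml and the ZGREP override variable are hypothetical. The files are collected into an array first so that xzgrep is never left reading standard input when the glob matches nothing:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list_matching_eml PATTERN [DIR]
# Prints the name of every *.eml.gz file under DIR (default: .) containing PATTERN.
# ZGREP is a hypothetical override; it defaults to xzgrep.
list_matching_eml() {
  local pattern=$1 dir=${2:-.}
  local -a files
  shopt -s globstar nullglob        # ** = any depth; no match = empty list
  files=("$dir"/**/*.eml.gz)
  # With no file arguments, xzgrep (like grep) would read stdin instead.
  (( ${#files[@]} > 0 )) || return 1
  "${ZGREP:-xzgrep}" -l -e "$pattern" "${files[@]}"
}
```

Usage: list_matching_eml "string" (run from the top of the archive). For very large archives the expanded file list can exceed the system's argument-length limit, in which case a find ... -exec ... {} + pipeline is the safer choice.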
answered Nov 13 '17 at 18:33
John
449211
protected by Anthon Apr 22 '16 at 10:30