How can I get the compressed file size of all the lines returned by zgrep on a .gz file? [on hold]
If I grep a text file and pipe the output to wc -c, I get the total size in bytes of the matching lines. How can I get the compressed size of the lines returned by zgrep on a .gz file?
For example, I have a file named a.gz:
zgrep abc a.gz | wc -c
bytes of abc in gz, 395714
ll *.gz gives me:
bytes of *.gz file, 113276
ll a (the uncompressed file) gives me:
bytes of a, 1501625
How can I find the compressed size of all the lines returned by zgrep abc a.gz? Piping to wc -c as above gives me the uncompressed size: 395714 is larger than the whole compressed file (113276).
linux grep gzip wc
put on hold as unclear what you're asking by Jeff Schaller, JigglyNaga, Archemar, X Tian, thrig Nov 28 at 20:17
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
1
The size of those lines as compressed in the original file, or re-compressed?
– Jeff Schaller
Nov 28 at 1:38
2
I'm not sure, but I think gzip compresses data in blocks, not line-by-line. Also, zgrep's output is the decompressed string (abc in your example), not the compressed version.
– Jeff Schaller
Nov 28 at 1:49
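To make the "re-compressed" reading concrete: zgrep decompresses before matching, so wc -c on its output counts uncompressed bytes. If the size after compressing the matches again is what's wanted, the output can be piped through gzip before counting. The following is only a sketch under assumptions: the sample data and the variable names (tmp, file, raw, recompressed) are made up for illustration, and the result depends on the gzip level used. It is not the number of bytes those lines occupy inside the original a.gz.

```shell
#!/bin/sh
# Build a small sample archive so the sketch is self-contained (made-up data);
# with real data, point "file" at your own a.gz instead.
tmp=$(mktemp -d)
file="$tmp/a.gz"
awk 'BEGIN { for (i = 1; i <= 2000; i++) {
         print "abc " ((i * 7919) % 10007)   # lines that match the pattern
         print "xyz line " i                 # lines that do not
     } }' | gzip -c > "$file"

# Uncompressed size of the matching lines (what the question measured).
raw=$(zgrep abc "$file" | wc -c)

# Size of the same lines after compressing them again on their own.
recompressed=$(zgrep abc "$file" | gzip -c | wc -c)

echo "uncompressed bytes of matches:  $raw"
echo "re-compressed bytes of matches: $recompressed"

rm -rf "$tmp"
```

The second number includes gzip's own header and trailer, so for very small outputs it can come out larger than the uncompressed count.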
3
Unless your original text is large enough to be a block size, I think it'll be compressed with data around it, potentially throwing off your calculations...?
– Jeff Schaller
Nov 28 at 1:51
1
That's a good point. It'll be hard to separate out the data compressed around it. Also, the compression algorithm might compress each occurrence of the same pattern differently depending on the data around it.
– kouichi
Nov 28 at 2:00
1
gzip doesn't know or care about lines. You can get a rough estimate of how much those lines add to the size by comparing the output of wc -c a.gz to zgrep -v pattern a.gz | gzip -c | wc -c. Notice the -v option to zgrep.
– mosvy
Nov 28 at 3:46
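The estimate described in the last comment can be sketched as below. This is an illustration under assumptions, not a definitive method: the sample data and the variable names (tmp, file, whole, without, estimate) are hypothetical. One deliberate variation from the comment: instead of reading the size of a.gz directly with wc -c, the whole file is recompressed with the same gzip defaults, so both numbers come from identical settings.

```shell
#!/bin/sh
# Build a small sample archive so the sketch is self-contained (made-up data);
# with real data, point "file" at your own a.gz instead.
tmp=$(mktemp -d)
file="$tmp/a.gz"
awk 'BEGIN { for (i = 1; i <= 2000; i++) {
         print "abc " ((i * 7919) % 10007)   # lines that match the pattern
         print "xyz line " i                 # lines that do not
     } }' | gzip -c > "$file"

# Whole content, recompressed with default settings (assumption: this
# stands in for "wc -c a.gz" so both sides use the same gzip options).
whole=$(gzip -dc "$file" | gzip -c | wc -c)

# Same content with the matching lines removed (-v inverts the match).
without=$(zgrep -v abc "$file" | gzip -c | wc -c)

# Rough contribution of the matching lines to the compressed size.
estimate=$((whole - without))
echo "compressed, all lines:       $whole"
echo "compressed, without matches: $without"
echo "estimated contribution:      $estimate"

rm -rf "$tmp"
```

The difference is only an estimate: as the earlier comments point out, gzip compresses each line in the context of the data around it, so removing lines also changes how the remaining lines compress.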
edited Nov 28 at 1:37 by Jeff Schaller
asked Nov 28 at 1:24 by kouichi