xargs interface for tar balls
I want an xargs-like interface that operates transparently on large tarballs without unpacking the whole archive at once. I've already made a shell-script prototype, xargs-tar, which unpacks a tarball into a RAM disk (/dev/shm), processes each file as soon as it appears, and deletes every processed file immediately.
Here it is, xargs-tar:
#!/bin/bash
TAR_FILE="$1"
shift
TMP_ROOT="/dev/shm" # ...or /tmp
TMP_DIR="$(mktemp -d "$TMP_ROOT/xargs-tar-XXXXXX")"
UNTAR_DIR="$TMP_DIR/untar"
FILE_LIST="$TMP_DIR/files-list"
EXEC_FILE="$TMP_DIR/exec-file"
mkdir -p "$UNTAR_DIR"
mkfifo "$FILE_LIST"
(
# single quotes for user's command and args
for i in "$@"; do
echo -n "'$i' "
done
echo '"$@"'
echo 'rm "$@"'
) > "$EXEC_FILE"
chmod u+x "$EXEC_FILE"
# Background untar. Write file list (zero terminated, no directories) to named
# pipe. The output is one line delayed to make sure we print only finished file
# names.
(
tar -v -C "$UNTAR_DIR" -xf "$TAR_FILE" \
  | awk 'BEGIN {last = "";}
         !/\/$/ {if (last != "") print last; last = $0;}
         END {if (last != "") print last;}' \
  | tr '\n' '\0'
) > "$FILE_LIST" &
cd "$UNTAR_DIR"
xargs --null -r sh "$EXEC_FILE" < "$FILE_LIST"
rm -rf "$TMP_DIR"
Example usage:
./xargs-tar palemoon.tar.bz2 wc -l
546 palemoon/libsoftokn3.so
1437 palemoon/libnss3.so
[...]
267 palemoon/libnssutil3.so
220 palemoon/libsmime3.so
6 palemoon/defaults/pref/channel-prefs.js
379727 total
which should be equivalent to (but faster, and with less disk usage than):
tar -xf palemoon.tar.bz2
find palemoon -type f -print0 | xargs -0 wc -l
rm -rf palemoon
Of course my xargs-tar prototype needs a lot of improvements, like:
- an interface to limit the maximum temp space (with respect to available memory)
- pausing the untar when the consumer is too slow
- error handling (duplicate files, whatever ...)
- support for other archive formats, not only tar
- etc.
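The "pausing the untar" item could be prototyped in the script itself before any C rewrite. A minimal sketch, assuming GNU coreutils and that the background extractor's PID is at hand; the names over_limit and throttle, and the 256 MiB cap, are made up for illustration:

```shell
#!/bin/bash
# Sketch: cap scratch-space usage by stopping/resuming the background tar.
# LIMIT_KB, over_limit and throttle are hypothetical names, not part of
# the prototype above.
LIMIT_KB=$((256 * 1024))   # assumed cap: 256 MiB

# Succeed when the scratch directory uses more than LIMIT_KB kilobytes.
over_limit() {
    local used_kb
    used_kb=$(du -sk "$1" | cut -f1)
    [ "$used_kb" -gt "$LIMIT_KB" ]
}

# Stop the extractor with SIGSTOP while over the cap; resume it with
# SIGCONT once the consumer has deleted enough files. Runs until the
# extractor exits.
throttle() {
    local dir="$1" tar_pid="$2"
    while kill -0 "$tar_pid" 2>/dev/null; do
        if over_limit "$dir"; then
            kill -STOP "$tar_pid"
            while over_limit "$dir"; do sleep 0.2; done
            kill -CONT "$tar_pid"
        fi
        sleep 0.2
    done
}
```

In the prototype this could run alongside the background subshell, e.g. `throttle "$UNTAR_DIR" $! &` right after starting the untar.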
That's why I'm thinking about starting a serious project, with an implementation in C.
My question now is: does something like this already exist? Would other people find it useful too? Or am I wasting my time?
I know tarfs, which is really useful but not exactly what I want. I want fast, simple, pipeable command lines and a portable implementation. The key point is that the un-tarred files are processed while they are still in the page cache, and then deleted immediately.
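For purely streaming consumers there is also a middle ground that needs no temp directory at all: GNU tar's --to-command runs a command once per extracted member, feeding the member's data on stdin and exporting its name as TAR_FILENAME. It doesn't batch arguments like xargs or give the command a real file on disk, but it covers cases like the wc -l example. A self-contained sketch (the demo archive is made up on the spot):

```shell
# Build a small throwaway archive so the example is self-contained.
demo=$(mktemp -d)
printf 'one\ntwo\n' > "$demo/a.txt"
printf 'x\n'        > "$demo/b.txt"
tar -C "$demo" -cf "$demo/demo.tar" a.txt b.txt

# GNU tar runs the command once per member: the member's bytes arrive on
# stdin and its name is exported as $TAR_FILENAME; no extracted file ever
# touches the disk.
tar -xf "$demo/demo.tar" \
    --to-command='sh -c "echo \"$(wc -l) $TAR_FILENAME\""'
# Prints:
#   2 a.txt
#   1 b.txt
```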
shell find pipe tar xargs
Let me ask you one question: do you do these tars every minute or more often? If you only do them a few times per day, I don't see any reason to put effort into such an activity. In the end, 99% of computers are only loaded at 1 or 2%.
– Romeo Ninov
May 13 '17 at 4:29
I thought about this for a while, and I haven't found even a single use case for it. Counting files in an archive? Easy: tar tJvf palemoon.tar.bz2 | grep -v /$ | wc -l
(or ... | grep ^- | ..., depending on what you want to count and how you want to deal with symlinks). All in all, RAM is much better left to the OS to use than filled up with temporary caches for tar files.
– Satō Katsura
May 13 '17 at 5:31
@SatoKatsura, they didn't count the files, but the lines in each file.
– ilkkachu
May 13 '17 at 10:35
@ilkkachu It's still a one-time operation. Each file is read at most once. Caching would start to make sense once each file was read at least twice. Caching in RAM would start to make sense once each file was read dozens of times.
– Satō Katsura
May 13 '17 at 14:17
Rather than “here's one way, are there others?”, I think you should ask “how do I do this?” and post your method as an answer.
– Gilles
May 13 '17 at 22:26
asked May 13 '17 at 0:59 by rudimeier; edited by Rui F Ribeiro