xargs interface for tar balls























I want an xargs-like interface that operates transparently on large tarballs without unpacking the whole archive at once. I have already written a shell script prototype, xargs-tar, which unpacks a tarball onto a RAM disk (/dev/shm), processes the files as they appear, and deletes each processed file immediately.



Here it is, xargs-tar:



#!/bin/bash

TAR_FILE="$1"
shift

TMP_ROOT="/dev/shm" # ...or /tmp
TMP_DIR="$(mktemp -d "$TMP_ROOT/xargs-tar-XXXXXX")"
UNTAR_DIR="$TMP_DIR/untar"
FILE_LIST="$TMP_DIR/files-list"
EXEC_FILE="$TMP_DIR/exec-file"

mkdir -p "$UNTAR_DIR"
mkfifo "$FILE_LIST"

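# Generate a helper script that runs the user's command on a batch of files
# and then deletes that batch.
(
    # single quotes for user's command and args
    # (this breaks if an argument itself contains a single quote)
    for i in "$@"; do
        echo -n "'$i' "
    done
    echo '"$@"'
    echo 'rm "$@"'
) > "$EXEC_FILE"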
chmod u+x "$EXEC_FILE"

# Background untar. Write file list (zero terminated, no directories) to named
# pipe. The output is one line delayed to make sure we print only finished file
# names.
(
    tar -v -C "$UNTAR_DIR" -xf "$TAR_FILE" \
    | awk 'BEGIN {last = "";}
           !/\/$/ {if (last != "") print last; last=$0;}
           END {if (last != "") print last;}' \
    | tr '\n' '\0'
) > "$FILE_LIST" &

cd "$UNTAR_DIR"
xargs --null -r sh "$EXEC_FILE" < "$FILE_LIST"

rm -rf "$TMP_DIR"
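
The "one line delayed" filter is the least obvious part of the script, so here is a minimal sketch of it in isolation, run on canned input instead of a live tar. The function name delayed_file_list and the sample listing are made up for illustration. In the real pipeline the delay matters because tar -v prints a name when it starts writing that file; a name is only passed on once the next one has arrived, or at END when tar has finished.

```shell
#!/bin/bash
# Sketch of the "one line delayed" filter on canned input (no tar involved).
# delayed_file_list is a hypothetical helper name, not part of xargs-tar.
delayed_file_list() {
    # Skip directory entries (lines ending in "/"); hold back each file name
    # until the next one arrives, then flush the final name at END.
    awk '!/\/$/ { if (last != "") print last; last = $0 }
         END    { if (last != "") print last }' |
    tr '\n' '\0'
}

# Hypothetical listing in the form printed by tar -v: directories end in "/".
listing='demo/
demo/a.txt
demo/sub/
demo/sub/b.txt'

# Turn the NULs back into newlines only to make the result visible.
result="$(printf '%s\n' "$listing" | delayed_file_list | tr '\0' '\n')"
printf '%s\n' "$result"
```

Only the two regular files should come out; the directory entries are dropped.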


Example usage:



./xargs-tar  palemoon.tar.bz2  wc -l
546 palemoon/libsoftokn3.so
1437 palemoon/libnss3.so
[...]
267 palemoon/libnssutil3.so
220 palemoon/libsmime3.so
6 palemoon/defaults/pref/channel-prefs.js
379727 total


which should be equivalent to the following, but faster and with less disk usage:



tar -xf palemoon.tar.bz2
find palemoon -type f -print0 | xargs -0 wc -l
rm -rf palemoon
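
For comparison, here is a sketch of a related approach that needs no temporary directory at all, assuming GNU tar (other tar implementations may not have this option). GNU tar's --to-command pipes each regular file to the standard input of a command and exports the member name as TAR_FILENAME. It is not a drop-in replacement for xargs-tar: the command runs once per file and sees the contents on stdin rather than a list of file names. The archive contents and the helper name count-lines below are hypothetical.

```shell
#!/bin/bash
# Sketch: per-file line counts straight from the archive, assuming GNU tar.
set -eu

work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT

# Build a small sample archive (hypothetical file names).
mkdir -p "$work/src/demo"
printf 'a\nb\n'    > "$work/src/demo/two.txt"
printf 'a\nb\nc\n' > "$work/src/demo/three.txt"
tar -C "$work/src" -cf "$work/demo.tar" demo/two.txt demo/three.txt

# Helper run once per archive member: contents on stdin, name in TAR_FILENAME.
cat > "$work/count-lines" <<'EOF'
printf '%s %s\n' "$(wc -l | tr -d ' ')" "$TAR_FILENAME"
EOF

# Nothing is written to disk; each member is streamed to the helper.
counts="$(tar -xf "$work/demo.tar" --to-command="sh $work/count-lines" | sort)"
printf '%s\n' "$counts"
```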


Of course, my xargs-tar prototype needs a lot of improvements, such as:




  • interface to limit max temp space (regarding available memory)

  • pausing untar when the consumer is too slow

  • error handling (duplicate files, whatever ...)

  • supporting other archive formats, not only tar

  • etc.
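
The first two items could be prototyped in shell before moving to C. The following is only a rough sketch under stated assumptions: the function names usage_kb, over_limit and throttle are hypothetical, the limit is checked by polling du, and the process ID passed in must be that of tar itself (not of the surrounding subshell), since stopping the subshell would not stop tar.

```shell
#!/bin/bash
# Sketch: pause a producer (e.g. the background tar) while the unpack
# directory is above a size limit. All function names are hypothetical.

# Print the disk usage of directory $1 in KiB.
usage_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# Succeed if directory $1 currently uses more than $2 KiB.
over_limit() {
    [ "$(usage_kb "$1")" -gt "$2" ]
}

# Poll until process $1 exits; keep it stopped while directory $2
# exceeds $3 KiB, and let it continue once the consumer has caught up.
throttle() {
    local pid="$1" dir="$2" limit_kb="$3"
    while kill -0 "$pid" 2>/dev/null; do
        if over_limit "$dir" "$limit_kb"; then
            kill -STOP "$pid" 2>/dev/null
        else
            kill -CONT "$pid" 2>/dev/null
        fi
        sleep 1
    done
}
```

In the prototype this might be started as throttle "$TAR_PID" "$UNTAR_DIR" 102400 & right after launching tar, where TAR_PID is an assumed variable holding tar's process ID. Polling once per second is coarse, so the limit can be overshot between checks.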


That's why I'm thinking about starting a serious project, implemented in C.



My question now is: does something like this already exist? Would other people find it useful too? Or would I be wasting my time?



I know about tarfs, which is really useful but not exactly what I want. I want fast, simple, pipeable command lines and a portable implementation. The key point is that the extracted files are processed while they are still cached and then deleted immediately.






























  • Let me ask you one question: do you unpack those tarballs every minute or more often? If you only do it a few times per day, I don't see any reason to put effort into this. In the end, 99% of computers are only 1 or 2% loaded.
    – Romeo Ninov
    May 13 '17 at 4:29










  • I thought about this for a while, and I haven't found even a single use case for it. Counting files in an archive? Easy: tar tjvf palemoon.tar.bz2 | grep -v /$ | wc -l (or ... | grep ^- | ..., depending on what you want to count and how you want to deal with symlinks). All in all, RAM is much better left to the OS to use than filled up with temporary caches for tar files.
    – Satō Katsura
    May 13 '17 at 5:31










  • @SatoKatsura, they didn't count the files, but the lines in each file.
    – ilkkachu
    May 13 '17 at 10:35










  • @ilkkachu It's still a one-time operation. Each file is read at most once. Caching would start to make sense once each file was read at least twice. Caching in RAM would start to make sense once each file was read dozens of times.
    – Satō Katsura
    May 13 '17 at 14:17






  • Rather than “here's one way, are there others?”, I think you should ask “how do I do this?” and post your method as an answer.
    – Gilles
    May 13 '17 at 22:26






















shell find pipe tar xargs














edited yesterday by Rui F Ribeiro

asked May 13 '17 at 0:59 by rudimeier







