xargs interface for tar balls
I want an xargs-like interface that operates transparently on large tarballs without unpacking the whole archive at once. I've already made a shell-script prototype, xargs-tar, which unpacks a tarball into a RAM disk (/dev/shm), processes each file as soon as it appears, and deletes every processed file immediately.
Here it is, xargs-tar:
#!/bin/bash
TAR_FILE="$1"
shift
TMP_ROOT="/dev/shm" # ...or /tmp
TMP_DIR="$(mktemp -d "$TMP_ROOT/xargs-tar-XXXXXX")"
UNTAR_DIR="$TMP_DIR/untar"
FILE_LIST="$TMP_DIR/files-list"
EXEC_FILE="$TMP_DIR/exec-file"
mkdir -p "$UNTAR_DIR"
mkfifo "$FILE_LIST"
(
# single quotes for user's command and args
for i in "$@"; do
echo -n "'$i' "
done
echo '"$@"'
echo 'rm "$@"'
) > "$EXEC_FILE"
chmod u+x "$EXEC_FILE"
# Background untar. Write file list (zero terminated, no directories) to named
# pipe. The output is one line delayed to make sure we print only finished file
# names.
(
tar -v -C "$UNTAR_DIR" -xf "$TAR_FILE" \
  | awk 'BEGIN {last = "";}
         !/\/$/ {if (last != "") print last; last = $0;}
         END {if (last != "") print last;}' \
  | tr '\n' '\0'
) > "$FILE_LIST" &
cd "$UNTAR_DIR"
xargs --null -r sh "$EXEC_FILE" < "$FILE_LIST"
rm -rf "$TMP_DIR"
Example usage:
./xargs-tar palemoon.tar.bz2 wc -l
546 palemoon/libsoftokn3.so
1437 palemoon/libnss3.so
[...]
267 palemoon/libnssutil3.so
220 palemoon/libsmime3.so
6 palemoon/defaults/pref/channel-prefs.js
379727 total
which should be equivalent to (but faster, and with less disk usage than):
tar -xf palemoon.tar.bz2
find palemoon -type f -print0 | xargs -0 wc -l
rm -rf palemoon
Of course my xargs-tar prototype needs a lot of improvements, like:
- an interface to limit the maximum temp space (with respect to available memory)
- pausing the untar when the consumer is too slow
- error handling (duplicate files, whatever ...)
- support for other archive formats, not only tar
- etc.
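The "pausing the untar" item could be prototyped in the script itself before any C rewrite. A minimal sketch, assuming GNU coreutils and that the background extractor's PID is at hand; the names over_limit and throttle, and the 256 MiB cap, are made up for illustration:

```shell
#!/bin/bash
# Sketch: cap scratch-space usage by stopping/resuming the background tar.
# LIMIT_KB, over_limit and throttle are hypothetical names, not part of
# the prototype above.
LIMIT_KB=$((256 * 1024))   # assumed cap: 256 MiB

# Succeed when the scratch directory uses more than LIMIT_KB kilobytes.
over_limit() {
    local used_kb
    used_kb=$(du -sk "$1" | cut -f1)
    [ "$used_kb" -gt "$LIMIT_KB" ]
}

# Stop the extractor with SIGSTOP while over the cap; resume it with
# SIGCONT once the consumer has deleted enough files. Runs until the
# extractor exits.
throttle() {
    local dir="$1" tar_pid="$2"
    while kill -0 "$tar_pid" 2>/dev/null; do
        if over_limit "$dir"; then
            kill -STOP "$tar_pid"
            while over_limit "$dir"; do sleep 0.2; done
            kill -CONT "$tar_pid"
        fi
        sleep 0.2
    done
}
```

In the prototype this could run alongside the background subshell, e.g. `throttle "$UNTAR_DIR" $! &` right after starting the untar.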
That's why I'm thinking about starting a serious project, with an implementation in C.
My question now is: does something like this already exist? Would other people find it useful too? Or am I wasting my time?
I know tarfs, which is really useful but not exactly what I want. I want fast, simple, pipeable command lines and a portable implementation. The key point is that the un-tarred files are processed while they are still in the page cache, and then deleted immediately.
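For purely streaming consumers there is also a middle ground that needs no temp directory at all: GNU tar's --to-command runs a command once per extracted member, feeding the member's data on stdin and exporting its name as TAR_FILENAME. It doesn't batch arguments like xargs or give the command a real file on disk, but it covers cases like the wc -l example. A self-contained sketch (the demo archive is made up on the spot):

```shell
# Build a small throwaway archive so the example is self-contained.
demo=$(mktemp -d)
printf 'one\ntwo\n' > "$demo/a.txt"
printf 'x\n'        > "$demo/b.txt"
tar -C "$demo" -cf "$demo/demo.tar" a.txt b.txt

# GNU tar runs the command once per member: the member's bytes arrive on
# stdin and its name is exported as $TAR_FILENAME; no extracted file ever
# touches the disk.
tar -xf "$demo/demo.tar" \
    --to-command='sh -c "echo \"$(wc -l) $TAR_FILENAME\""'
# Prints:
#   2 a.txt
#   1 b.txt
```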
shell find pipe tar xargs
Let me ask you one question: do you do these tars every minute or more often? If you only do them a few times per day, I don't see any reason to put effort into such an activity. In the end, 99% of computers are only loaded at 1 or 2%.
– Romeo Ninov
May 13 '17 at 4:29
I thought about this for a while, and I haven't found even a single use case for it. Counting files in an archive? Easy: tar tJvf palemoon.tar.bz2 | grep -v /$ | wc -l
(or ... | grep ^- | ..., depending on what you want to count and how you want to deal with symlinks). All in all, RAM is much better left to the OS to use than filled up with temporary caches for tar files.
– Satō Katsura
May 13 '17 at 5:31
@SatoKatsura, they didn't count the files, but the lines in each file.
– ilkkachu
May 13 '17 at 10:35
@ilkkachu It's still a one-time operation. Each file is read at most once. Caching would start to make sense once each file was read at least twice. Caching in RAM would start to make sense once each file was read dozens of times.
– Satō Katsura
May 13 '17 at 14:17
Rather than “here's one way, are there others?”, I think you should ask “how do I do this?” and post your method as an answer.
– Gilles
May 13 '17 at 22:26
asked May 13 '17 at 0:59 by rudimeier; edited by Rui F Ribeiro