compare and match multiple files by pattern
I need to work on these text files (fields are separated by commas):
$ cat File1.seed
389,0,
390,1,
391,0,
392,0,
393,0,SEED
394,0,
395,1,
$ cat File2.seed
223,0,
224,1,
225,0,
226,1,
227,0,SEED
228,1,
$ cat File3.seed
55,0,
56,0,SEED
57,1,
58,0,
59,1,
60,0,
The desired output would be:
389,0,,223,0,,,,,0
390,1,,224,1,,,,,2
391,0,,225,0,,,,,0
392,0,,226,1,,55,0,,1
393,0,SEED,227,0,SEED,56,0,SEED,0
394,0,,228,1,,57,1,,2
395,1,,,,,58,0,,1
,,,,,,59,1,,1
,,,,,,60,0,,0
As you can see, the files are aligned on the line containing the pattern "SEED", and then the 2nd columns of all the files are summed horizontally, with the result added as a last column.
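Put another way, if SEED sits on (0-based) line s of a file and the largest such index across all files is m, that file needs m - s empty rows prepended. In the sample, SEED is on lines 4, 4 and 1, so only File3.seed is shifted, by 3 rows. Here is a minimal Perl sketch of just that calculation, with the positions hard-coded and a hypothetical helper name, pad_counts:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: given the 0-based SEED line of each file,
# return how many empty rows to prepend to each file so the SEED lines line up.
sub pad_counts {
    my @seed = @_;
    my $top  = max(@seed);
    return map { $top - $_; } @seed;
}

# SEED positions in File1.seed, File2.seed and File3.seed.
my @pads = pad_counts(4, 4, 1);
print join(",", @pads), "\n";    # prints 0,0,3
```

This is the same calculation the answer's merge_files performs with findseed and @padcounts.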
text-processing files scripting
edited Nov 21 at 21:36
Rui F Ribeiro
asked Feb 10 '17 at 17:32
LLM
1 Answer
This is a working solution using Perl. Create a file, for example mergeseeds.pl:
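#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max);

# Each input line has 3 comma-separated fields: "number,value,marker".
use constant COLUMNS => 3;
# 1-based position, within each file's group of fields, of the value to sum.
use constant SUM_COL => 2;

# Read a file and return a reference to an array of its lines (newlines removed).
sub readfile($)
{
    my $f = shift;
    open(my $fh, '<', $f) or die "cannot open $f: $!\n";
    my @lines = <$fh>;
    close($fh);
    chomp @lines;
    return \@lines;
}

# Return the 0-based index of the first line containing SEED,
# or the number of lines if the file has no SEED line.
sub findseed($)
{
    my $arr  = shift;
    my $line = 0;
    for (@$arr)
    {
        return $line if (/SEED/);
        $line++;
    }
    return $line;
}

# Print one merged line followed by the sum of every file's SUM_COL field.
sub process($$$)
{
    my ($colnum, $numfiles, $line) = @_;
    my $sum = 0;
    # Append a sentinel so split() does not drop trailing empty fields.
    my @nums = split(/,/, $line . ",END");
    while ($colnum < scalar(@nums) - 1)
    {
        # Empty (padding) fields count as 0.
        $sum += $nums[$colnum - 1] || 0;
        $colnum += COLUMNS;
    }
    print "$line,$sum\n";
}

# Remove and return the first element of @$arr, or @filler once it is empty.
sub popvalue($;@)
{
    my ($arr, @filler) = @_;
    return @$arr ? (shift @$arr) : (@filler);
}

# Prepend $pad copies of $filler to @$arr.
sub pad_array($$$)
{
    my ($arr, $pad, $filler) = @_;
    while ($pad--)
    {
        unshift @$arr, $filler;
    }
}

# Pad each file (an array ref in @$files) by the matching count in @$pads.
# ",," stands for one empty "number,value,marker" group.
sub pad_arrays($$)
{
    my ($files, $pads) = @_;
    for (@$files)
    {
        pad_array($_, shift @$pads, ",,");
    }
}

sub merge_files(@)
{
    my @files    = @_;
    my $numfiles = scalar @files;
    # Shift every file down so that all SEED lines end up on the same row.
    my @seedsfound = map { findseed($_); } @files;
    my $maxseed    = max(@seedsfound);
    my @padcounts  = map { ($maxseed - $_); } @seedsfound;
    # Pass references: with the ($$) prototype, plain arrays would become counts.
    pad_arrays(\@files, \@padcounts);
    my $maxlines = max(map { scalar @$_; } @files);
    my $line = 0;
    while ($line < $maxlines)
    {
        # Files that have run out of lines contribute an empty group.
        my @items = map { popvalue($_, ",,"); } @files;
        process(SUM_COL, $numfiles, join(",", @items));
        $line++;
    }
}

# Read every named file; returns a list of array references.
sub read_files(@)
{
    my @filenames = @_;
    my @files = map { readfile($_); } @filenames;
    return @files;
}

sub usage($)
{
    my ($msg) = @_;
    print STDERR "usage: $0 <filename>...\n";
    print STDERR $msg . "\n";
    exit 1;
}

sub main(@)
{
    my @fnames = ();
    usage("ERROR: no input files given") unless @_;
    for my $f (@_)
    {
        if (! -f $f)
        {
            usage("ERROR: not a file: $f");
        }
        push(@fnames, $f);
    }
    merge_files(read_files(@fnames));
}

main(@ARGV);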
Then call it from the command line:
perl mergeseeds.pl File*.seed
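Why process() appends ",END" before splitting: Perl's split discards trailing empty fields unless it is given a negative LIMIT, and padded rows end in empty fields. The standalone sketch below (an illustration, not part of the script; row_sum is a hypothetical name) shows the difference on the first merged row, then sums every third field using split with LIMIT -1 instead of a sentinel:

```perl
use strict;
use warnings;

my $row = "389,0,,223,0,,,,";    # first merged row, before the sum is appended

my @dropped = split /,/, $row;        # trailing empty fields are discarded
my @kept    = split /,/, $row, -1;    # a negative LIMIT keeps them
print scalar(@dropped), " vs ", scalar(@kept), "\n";    # prints "5 vs 9"

# Hypothetical alternative to the ",END" sentinel: starting at 1-based field
# $first, sum every $step-th field, treating empty (padding) fields as 0.
sub row_sum {
    my ($line, $first, $step) = @_;
    my @f   = split /,/, $line, -1;
    my $sum = 0;
    for (my $i = $first - 1; $i < @f; $i += $step) {
        $sum += $f[$i] eq '' ? 0 : $f[$i];
    }
    return $sum;
}

print "$row,", row_sum($row, 2, 3), "\n";    # prints "389,0,,223,0,,,,,0"
```

Either approach works; the sentinel keeps the original loop bound simple, while LIMIT -1 avoids the extra field.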
edited Feb 18 '17 at 1:45
answered Feb 12 '17 at 3:17
Chunko
Thanks Chunko, it is a very good solution! I tried it with some files and it worked. However, I have more than 500 files; what modifications would make it work with many *.seed files? Thanks again!
– LLM
Feb 13 '17 at 20:58
I have updated the solution so that you can specify your files on the bash command line with a pattern. However, I am not able to test the new code at the moment.
– Chunko
Feb 13 '17 at 22:05
LLM, I got home and tested the change I made this morning. It required one fix, but it should work now.
– Chunko
Feb 14 '17 at 11:32
Thanks again, Chunko; however, it gives the following error: "Undefined subroutine &main::readfiles called at merge_Seeds.pl line 111."
– LLM
Feb 17 '17 at 16:07
Sorry, that is a typo in the answer. Please search and replace readfiles with read_files; I will edit the answer.
– Chunko
Feb 18 '17 at 1:44
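On the 500-file question: the shell expands File*.seed before Perl starts, and a few hundred names is normally well within the system's argument-length limit, so the command above should be enough. If a much larger set ever makes the shell report "Argument list too long", one option (not part of the answer) is to let Perl expand the pattern itself with its built-in glob. In this sketch the helper name input_files and the default pattern "*.seed" are assumptions; note that the column order of the merged output follows the order of the file list.

```perl
use strict;
use warnings;

# Hypothetical helper: use the file names given on the command line,
# or, if there are none, expand $pattern inside Perl instead of the shell.
sub input_files {
    my ($pattern, @args) = @_;
    return @args if @args;
    # Expand the pattern with Perl's built-in glob; keep only plain files.
    return grep { -f $_ } glob($pattern);
}

my @files = input_files("*.seed", @ARGV);
print scalar(@files), " file(s): @files\n";
```

In the answer's script, main(@ARGV) could then be replaced by main(input_files("*.seed", @ARGV)), so that running it with no arguments merges every *.seed file in the current directory.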