Splitting a file using Perl























I have a CSV file and would like to split it into smaller files based on matching column values, using Perl. I am working on Linux (RHEL 6).



Example input:



fruit1, fruit2,pricerate,quantity
orange, apple, 3,9
apple,lemon,8,1
orange, apple,3,8
pineapple,papaya,9,19
orange,apple,3,7
pineapple,papaya,9,10


The output should look something like this:



file1:



fruit1,fruit2,pricerate,quantity
orange,apple, 3,9
orange,apple,3,8
orange,apple,3,7


file2:



fruit1,fruit2,pricerate,quantity
pineapple,papaya,9,19
pineapple,papaya,9,10


The unmatched rows go into a separate file, say file3.










edited Nov 25 at 14:59
Rui F Ribeiro

asked Jun 8 '15 at 16:26
namai








  • Just a question: why do you think this question is related to Linux/Unix?
    – VaTo
    Jun 8 '15 at 16:32

  • Also, where would apple,lemon go?
    – choroba
    Jun 8 '15 at 16:49

  • Apologies, Saul. I should have mentioned earlier: I am trying this on Linux (RHEL 6).
    – namai
    Jun 8 '15 at 17:03

  • Hi choroba, all the unmatched ones go into a separate file.
    – namai
    Jun 8 '15 at 17:04

  • @namai No problem. Please feel free to go back and edit your question to include all of that information, along with anything else you think is relevant, so it is easier for people to help you with your problem (the more details about your problem, the better). Leaving it out frustrates people who want to help and makes them less willing to. Take this as a suggestion on my part.
    – VaTo
    Jun 8 '15 at 17:09
















1 Answer

One way to solve this is:




  • Open the input file

  • Store the first line of the input file (the header)


  • For every line in the input file after the header:




    • Read the first two columns

    • If we haven't yet opened an output file for this combination of field values, open a new output file, store its file handle in a hash, and write the header line to it.

    • Fetch this line's output file handle from the hash and write the line to that file.




Here is some example code that matches on the first two fields:



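#!/usr/bin/perl

use strict;
use warnings;

my %filehandles = ();
my $filenum     = 1;

open my $input, '<', 'fruit.csv'
    or die "Cannot open input file fruit.csv: $!";

# Read the header and remove its spaces too, so it matches the cleaned rows
my $header = <$input>;
$header =~ s/ //g;

while ( my $line = <$input> )
{
    # Remove spaces from input
    $line =~ s/ //g;
    next unless $line =~ /\S/;    # skip blank lines

    my @fields = split /,/, $line;

    # First time we see this (first field, second field) pair:
    # open the next numbered output file and write the header to it
    if ( ! $filehandles{$fields[0]}{$fields[1]} )
    {
        open $filehandles{$fields[0]}{$fields[1]}, '>', "file$filenum"
            or die "Cannot open output file file$filenum: $!";
        print { $filehandles{$fields[0]}{$fields[1]} } $header;
        $filenum++;
    }
    print { $filehandles{$fields[0]}{$fields[1]} } $line;
}

close $input;

# Close all output files
for my $by_second_field ( values %filehandles )
{
    close $_ for values %$by_second_field;
}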
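As written, the script above gives every distinct pair of first-two-column values its own numbered file in order of first appearance. With the sample data that means apple,lemon lands in file2 and pineapple,papaya in file3, rather than the layout the question asks for. If "unmatched" is taken to mean a pair that occurs on only one line (an assumption; the question does not define it further), a two-pass variant can reproduce the requested output. This is a sketch under these assumptions: the helper names `clean_line`, `group_rows`, `write_files` and `split_csv` are hypothetical, fields are split naively on commas (no quoted fields), and the unmatched file also gets the header, which the question leaves unspecified. It trims spaces around each field instead of deleting every space, so a value such as `passion fruit` keeps its inner space, and it avoids features newer than Perl 5.10 because the system Perl on RHEL 6 is 5.10.1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip the line ending and trim leading/trailing
# whitespace from each field. Assumes a naive CSV (no quoted commas).
sub clean_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    my @fields = split /,/, $line, -1;
    s/^\s+|\s+$//g for @fields;
    return join ',', @fields;
}

# Hypothetical helper: group data rows by their first two fields.
# Returns (groups, unmatched): groups holds one array ref per pair that
# occurs more than once, in order of first appearance; unmatched holds
# the rows whose pair occurs only once (an assumed reading of "unmatched").
sub group_rows {
    my (%rows_for, @key_order);
    for my $row (map { clean_line($_) } @_) {
        next if $row eq '';    # skip blank lines
        my $key = join "\0", map { defined $_ ? $_ : '' } (split /,/, $row)[0, 1];
        push @key_order, $key unless exists $rows_for{$key};
        push @{ $rows_for{$key} }, $row;
    }
    my (@groups, @unmatched);
    for my $key (@key_order) {
        my $rows = $rows_for{$key};
        if (@$rows > 1) { push @groups, $rows }
        else            { push @unmatched, @$rows }
    }
    return (\@groups, \@unmatched);
}

# Hypothetical helper: write each group to file1, file2, ... and, if any
# rows are unmatched, write them to the next number. Every output file
# starts with the header. Returns the number of files written.
sub write_files {
    my ($header, $groups, $unmatched) = @_;
    my $filenum = 0;
    for my $rows (@$groups, (@$unmatched ? $unmatched : ())) {
        my $name = 'file' . ++$filenum;
        open my $out, '>', $name or die "Cannot open output file $name: $!";
        print {$out} map { "$_\n" } $header, @$rows;
        close $out or die "Cannot close output file $name: $!";
    }
    return $filenum;
}

# Hypothetical driver: read the header and all rows, then write the files.
sub split_csv {
    my ($in_path) = @_;
    open my $in, '<', $in_path or die "Cannot open input file $in_path: $!";
    my $header = <$in>;
    die "Input file $in_path is empty\n" unless defined $header;
    my @lines = <$in>;
    close $in;
    my ($groups, $unmatched) = group_rows(@lines);
    return write_files(clean_line($header), $groups, $unmatched);
}

# Only runs when an input file is given: perl split_fruit.pl fruit.csv
split_csv($ARGV[0]) if @ARGV;
```

Run it as `perl split_fruit.pl fruit.csv` (the script name is only an example). For the sample input it writes file1 (the orange,apple rows), file2 (the pineapple,papaya rows) and file3 (the apple,lemon row), each starting with the header. Because a pair can only be classed as unmatched once the whole file has been read, this version keeps every row in memory, unlike the single-pass script above.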





answered Jun 30 '15 at 13:06
Sietse






























             
