Splitting a file using Perl

I have a CSV file and would like to use Perl to split it into smaller files based on matching column values. I am working on Linux (RHEL 6).

Example input:

```
fruit1, fruit2,pricerate,quantity
orange, apple, 3,9
apple,lemon,8,1
orange, apple,3,8
pineapple,papaya,9,19
orange,apple,3,7
pineapple,papaya,9,10
```

The output should look something like this:

file1:

```
fruit1,fruit2,pricerate,quantity
orange,apple, 3,9
orange,apple,3,8
orange,apple,3,7
```

file2:

```
fruit1,fruit2,pricerate,quantity
pineapple,papaya,9,19
pineapple,papaya,9,10
```

The unmatched rows go into a separate file, say file3.

edited Nov 25 at 14:59 by Rui F Ribeiro

asked Jun 8 '15 at 16:26 by namai

Just a question: why do you think this question is related to Linux/Unix?
– VaTo
Jun 8 '15 at 16:32

Also, where would apple,lemon go?
– choroba
Jun 8 '15 at 16:49

Apologies, Saul. I should have mentioned earlier that I am trying this on Linux (RHEL 6).
– namai
Jun 8 '15 at 17:03

Hi choroba, all the unmatched ones go in a separate file.
– namai
Jun 8 '15 at 17:04

@namai No problem. Please feel free to go back and edit your question to include all of that information, along with anything else you think is relevant, so that it is easier for people to help you with your problem (the more details, the better). Otherwise, people who want to help may get frustrated and lose interest. Take this as a suggestion on my part.
– VaTo
Jun 8 '15 at 17:09

Tags: perl, split, csv-simple

1 Answer

One way to solve this:

1. Open the input file.
2. Store the first line of the input file (the header).
3. For every line in the input file after the header:
   - Read the first two columns.
   - If no output file has been opened yet for this combination of values, open a new output file, store its filehandle in a hash, and write the header line to it.
   - Fetch this line's output filehandle from the hash and write the line to that file.
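The "read the first two columns" step has to cope with the inconsistent spacing in the sample data (`orange, apple, 3,9` versus `orange,apple,3,7`). The script below deletes every space with `s/ //g`, which also alters values that legitimately contain spaces. As an alternative, here is a minimal sketch that only trims whitespace around commas. `parse_line` and `row_key` are hypothetical helper names, and the sketch assumes simple CSV with no quoted fields containing commas (for real CSV, a parser such as the Text::CSV module is safer).

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV line on commas, trimming whitespace
# around each field but keeping spaces inside a value ("passion fruit").
# Assumes simple CSV: no quoted fields that contain commas.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^\s+|\s+$//g;             # trim leading/trailing whitespace
    return split /\s*,\s*/, $line, -1;   # -1 keeps trailing empty fields
}

# Hypothetical helper: build a lookup key from the first two columns.
sub row_key {
    my ($line) = @_;
    my @fields = parse_line($line);
    return join ',', @fields[0, 1];
}

print row_key("orange, apple, 3,9\n"), "\n";   # prints "orange,apple"
```

With this, `orange, apple, 3,9` and `orange,apple,3,7` produce the same key, so they land in the same output file without touching spaces inside values.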

Here is some example code that matches on the first two fields:

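```perl
#!/usr/bin/perl

use strict;
use warnings;

my %filehandles;    # $filehandles{$field1}{$field2} holds an output filehandle
my $filenum = 1;

open my $input, '<', 'fruit.csv'
    or die "Cannot open input file fruit.csv: $!";

# Keep the header; remove its spaces too so it matches the cleaned rows
my $header = <$input>;
$header =~ s/ //g;

while (my $line = <$input>) {
    next if $line =~ /^\s*$/;    # skip blank lines

    # Remove spaces from input
    $line =~ s/ //g;

    my @fields = split /,/, $line;

    # First time this (field1, field2) pair is seen: open a new output
    # file and write the header to it
    if ( !$filehandles{$fields[0]}{$fields[1]} ) {
        open $filehandles{$fields[0]}{$fields[1]}, '>', "file$filenum"
            or die "Cannot open output file file$filenum: $!";
        print { $filehandles{$fields[0]}{$fields[1]} } $header;
        $filenum++;
    }

    print { $filehandles{$fields[0]}{$fields[1]} } $line;
}

close $input;

# Close every output file
for my $by_field2 (values %filehandles) {
    close $_ for values %$by_field2;
}
```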
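The script above gives every distinct pair its own file, so the unmatched `apple,lemon,8,1` row ends up alone in a file of its own instead of in a separate file for unmatched rows, as the question asks. A row can only be called unmatched once the whole file has been read, so one option is to group the rows in memory first and write the files afterwards. This is a minimal sketch, assuming "unmatched" means a (fruit1, fruit2) pair that occurs only once; `group_rows`, `write_groups`, and the output file name `unmatched` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: group rows by their first two fields (after removing
# spaces, as the script above does) and treat any pair seen only once as
# unmatched. Returns two array refs: the groups in first-seen order, and
# the unmatched rows.
sub group_rows {
    my @lines = @_;
    my (%rows_for, @order);
    for my $line (@lines) {
        next if $line =~ /^\s*$/;                 # skip blank lines
        (my $clean = $line) =~ s/ //g;            # same cleanup as the answer
        $clean .= "\n" unless $clean =~ /\n\z/;   # last line may lack a newline
        my ($f1, $f2) = split /,/, $clean;
        my $key = join ',', $f1, $f2;
        push @order, $key unless exists $rows_for{$key};
        push @{ $rows_for{$key} }, $clean;
    }
    my (@groups, @unmatched);
    for my $key (@order) {
        my $rows = $rows_for{$key};
        if (@$rows > 1) { push @groups, $rows }
        else            { push @unmatched, @$rows }
    }
    return (\@groups, \@unmatched);
}

# Write each group to file1, file2, ... with the header first, and all
# unmatched rows to one extra file (the name "unmatched" is an assumption).
sub write_groups {
    my ($header, $groups, $unmatched) = @_;
    my $n = 1;
    for my $rows (@$groups) {
        open my $out, '>', "file$n" or die "Cannot open file$n: $!";
        print {$out} $header, @$rows;
        close $out or die "Cannot close file$n: $!";
        $n++;
    }
    open my $out, '>', 'unmatched' or die "Cannot open unmatched: $!";
    print {$out} $header, @$unmatched;
    close $out or die "Cannot close unmatched: $!";
}
```

To use it, read the header from the input handle, strip its spaces with `s/ //g`, read the remaining lines into `@lines`, and call `write_groups($header, group_rows(@lines))`. The trade-off is that the whole input is held in memory, unlike the streaming approach above. For very large files, a two-pass approach (count the pairs first, then write) would avoid that.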

answered Jun 30 '15 at 13:06 by Sietse
