extract data from text files to csv

up vote
1
down vote

favorite

I have a series of text files with a consistent format like:

FirstName: Mary

LastName: Smith

Address: 123 Anywhere St

City: Nowhere

State: TX

Zip: 77777

I need to extract several lines from these files and output them into a csv file in a format like:

<filename>,<FirstName>,<City>,<Zip>

I can get the fields I want with a simple grep but I don't know how to get the output the way I need it.

edited 2 days ago

Rui F Ribeiro

38.2k1475123

asked Aug 8 '16 at 21:07

Kevin Pearce

1

What's your expected output
– tachomi
Aug 8 '16 at 21:14

Sounds. like a simple Perl task. What is between Zip and the next First Name?
– waltinator
Aug 8 '16 at 21:22

add a comment |

up vote
1
down vote

favorite

I have a series of text files with a consistent format like:

FirstName: Mary

LastName: Smith

Address: 123 Anywhere St

City: Nowhere

State: TX

Zip: 77777

I need to extract several lines from these files and output them into a csv file in a format like:

<filename>,<FirstName>,<City>,<Zip>

I can get the fields I want with a simple grep but I don't know how to get the output the way I need it.

edited 2 days ago

Rui F Ribeiro

38.2k1475123

asked Aug 8 '16 at 21:07

Kevin Pearce

1

What's your expected output
– tachomi
Aug 8 '16 at 21:14

Sounds. like a simple Perl task. What is between Zip and the next First Name?
– waltinator
Aug 8 '16 at 21:22

add a comment |

up vote
1
down vote

favorite

I have a series of text files with a consistent format like:

FirstName: Mary

LastName: Smith

Address: 123 Anywhere St

City: Nowhere

State: TX

Zip: 77777

I need to extract several lines from these files and output them into a csv file in a format like:

<filename>,<FirstName>,<City>,<Zip>

I can get the fields I want with a simple grep but I don't know how to get the output the way I need it.

edited 2 days ago

Rui F Ribeiro

38.2k1475123

asked Aug 8 '16 at 21:07

Kevin Pearce

I have a series of text files with a consistent format like:

FirstName: Mary

LastName: Smith

Address: 123 Anywhere St

City: Nowhere

State: TX

Zip: 77777

I need to extract several lines from these files and output them into a csv file in a format like:

<filename>,<FirstName>,<City>,<Zip>

I can get the fields I want with a simple grep but I don't know how to get the output the way I need it.

text-processing

edited 2 days ago

Rui F Ribeiro

38.2k1475123

asked Aug 8 '16 at 21:07

Kevin Pearce

edited 2 days ago

Rui F Ribeiro

38.2k1475123

asked Aug 8 '16 at 21:07

Kevin Pearce

edited 2 days ago

Rui F Ribeiro

38.2k1475123

edited 2 days ago

Rui F Ribeiro

38.2k1475123

edited 2 days ago

Rui F Ribeiro

38.2k1475123

asked Aug 8 '16 at 21:07

Kevin Pearce

asked Aug 8 '16 at 21:07

Kevin Pearce

asked Aug 8 '16 at 21:07

Kevin Pearce

1

What's your expected output
– tachomi
Aug 8 '16 at 21:14

Sounds. like a simple Perl task. What is between Zip and the next First Name?
– waltinator
Aug 8 '16 at 21:22

add a comment |

1

What's your expected output
– tachomi
Aug 8 '16 at 21:14

Sounds. like a simple Perl task. What is between Zip and the next First Name?
– waltinator
Aug 8 '16 at 21:22

What's your expected output
– tachomi
Aug 8 '16 at 21:14

Sounds. like a simple Perl task. What is between Zip and the next First Name?
– waltinator
Aug 8 '16 at 21:22

add a comment |

3 Answers
3

active

oldest

votes

up vote
0
down vote

accepted

If you only have one record per file then this is a simple read loop.

#!/bin/bash



read_data()

{

  local first last addr city state zip



  file=$1



  while read -r header data

  do

    case $header in

      FirstName:) first=$data ;;

       LastName:) last=$data ;;

        Address:) addr=$data ;;

           City:) city=$data ;;

          State:) state=$data ;;

            Zip:) zip=$data ;;

               *) echo Ignoring bad line $header $data >&2

    esac

  done < $file

  echo "$file,$first,$last,$addr,$city,$state,$zip"

}



for file in *srcfiles*

do

  read_data $file

done

The read_data function reads each line and splits up the line into a "header" and a "data". Once we get to the end of the file we just print out the results.

We call that function once for each source file via the for loop.

Note some potential gotcha's: If there are commas in the data then this will break things, so you might want to do

  echo ""$file","$first","$last","$addr","$city","$state","$zip""

as the output to enclose everything inside "..." layout. If there's any " in the data then this may cause the CSV to be malformed as well.

Adjust the echo line to match the format you want.

edited Aug 8 '16 at 22:19

answered Aug 8 '16 at 22:09

Stephen Harris

22.8k24176

add a comment |

up vote
0
down vote

Quick and dirty approach, may suit your requirements.

grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'

Example:

grep . *

f1.txt:FirstName: Mary

f1.txt:LastName: Smith

f1.txt:Address: 123 Anywhere St

f1.txt:City: Nowhere

f1.txt:State: TX

f1.txt:Zip: 77777

f2.txt:FirstName: Joe

f2.txt:LastName: Bloggs

f2.txt:Address: 444 Anywhere St

f2.txt:City: Nowhere2

f2.txt:State: TXA

f2.txt:Zip: 77737

grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'

f1.txt,Mary,Nowhere,77777

f2.txt,Joe,Nowhere2,77737

answered Aug 8 '16 at 22:10

steve

13.7k22452

add a comment |

up vote
0
down vote

If there's a single record per file and you have GNU awk, you could do

gawk -F': +' -vOFS=, '

  BEGINFILE{delete rec}

  {rec[$1] = $2}

  ENDFILE{print FILENAME, rec["FirstName"], rec["City"], rec["Zip"]}

' file1.txt file2.txt ...

edited Aug 9 '16 at 1:19

answered Aug 8 '16 at 22:29

steeldriver

33.6k34982

Doesn't the $2 mean that additional words will get lost? (eg "City: New York") would return "New".
– Stephen Harris
Aug 9 '16 at 1:15

@StephenHarris Doh! thanks - I have adjusted the field separator hopefully to fix that
– steeldriver
Aug 9 '16 at 1:20

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f302127%2fextract-data-from-text-files-to-csv%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
0
down vote

accepted

If you only have one record per file then this is a simple read loop.

#!/bin/bash



read_data()

{

  local first last addr city state zip



  file=$1



  while read -r header data

  do

    case $header in

      FirstName:) first=$data ;;

       LastName:) last=$data ;;

        Address:) addr=$data ;;

           City:) city=$data ;;

          State:) state=$data ;;

            Zip:) zip=$data ;;

               *) echo Ignoring bad line $header $data >&2

    esac

  done < $file

  echo "$file,$first,$last,$addr,$city,$state,$zip"

}



for file in *srcfiles*

do

  read_data $file

done

The read_data function reads each line and splits up the line into a "header" and a "data". Once we get to the end of the file we just print out the results.

We call that function once for each source file via the for loop.

Note some potential gotcha's: If there are commas in the data then this will break things, so you might want to do

  echo ""$file","$first","$last","$addr","$city","$state","$zip""

as the output to enclose everything inside "..." layout. If there's any " in the data then this may cause the CSV to be malformed as well.

Adjust the echo line to match the format you want.

edited Aug 8 '16 at 22:19

answered Aug 8 '16 at 22:09

Stephen Harris

22.8k24176

add a comment |

up vote
0
down vote

accepted

If you only have one record per file then this is a simple read loop.

#!/bin/bash



read_data()

{

  local first last addr city state zip



  file=$1



  while read -r header data

  do

    case $header in

      FirstName:) first=$data ;;

       LastName:) last=$data ;;

        Address:) addr=$data ;;

           City:) city=$data ;;

          State:) state=$data ;;

            Zip:) zip=$data ;;

               *) echo Ignoring bad line $header $data >&2

    esac

  done < $file

  echo "$file,$first,$last,$addr,$city,$state,$zip"

}



for file in *srcfiles*

do

  read_data $file

done

The read_data function reads each line and splits up the line into a "header" and a "data". Once we get to the end of the file we just print out the results.

We call that function once for each source file via the for loop.

Note some potential gotcha's: If there are commas in the data then this will break things, so you might want to do

  echo ""$file","$first","$last","$addr","$city","$state","$zip""

as the output to enclose everything inside "..." layout. If there's any " in the data then this may cause the CSV to be malformed as well.

Adjust the echo line to match the format you want.

edited Aug 8 '16 at 22:19

answered Aug 8 '16 at 22:09

Stephen Harris

22.8k24176

add a comment |

up vote
0
down vote

accepted

If you only have one record per file then this is a simple read loop.

#!/bin/bash



read_data()

{

  local first last addr city state zip



  file=$1



  while read -r header data

  do

    case $header in

      FirstName:) first=$data ;;

       LastName:) last=$data ;;

        Address:) addr=$data ;;

           City:) city=$data ;;

          State:) state=$data ;;

            Zip:) zip=$data ;;

               *) echo Ignoring bad line $header $data >&2

    esac

  done < $file

  echo "$file,$first,$last,$addr,$city,$state,$zip"

}



for file in *srcfiles*

do

  read_data $file

done

The read_data function reads each line and splits up the line into a "header" and a "data". Once we get to the end of the file we just print out the results.

We call that function once for each source file via the for loop.

Note some potential gotcha's: If there are commas in the data then this will break things, so you might want to do

  echo ""$file","$first","$last","$addr","$city","$state","$zip""

as the output to enclose everything inside "..." layout. If there's any " in the data then this may cause the CSV to be malformed as well.

Adjust the echo line to match the format you want.

edited Aug 8 '16 at 22:19

answered Aug 8 '16 at 22:09

Stephen Harris

22.8k24176

If you only have one record per file then this is a simple read loop.

#!/bin/bash



read_data()

{

  local first last addr city state zip



  file=$1



  while read -r header data

  do

    case $header in

      FirstName:) first=$data ;;

       LastName:) last=$data ;;

        Address:) addr=$data ;;

           City:) city=$data ;;

          State:) state=$data ;;

            Zip:) zip=$data ;;

               *) echo Ignoring bad line $header $data >&2

    esac

  done < $file

  echo "$file,$first,$last,$addr,$city,$state,$zip"

}



for file in *srcfiles*

do

  read_data $file

done

The read_data function reads each line and splits up the line into a "header" and a "data". Once we get to the end of the file we just print out the results.

We call that function once for each source file via the for loop.

Note some potential gotcha's: If there are commas in the data then this will break things, so you might want to do

  echo ""$file","$first","$last","$addr","$city","$state","$zip""

as the output to enclose everything inside "..." layout. If there's any " in the data then this may cause the CSV to be malformed as well.

Adjust the echo line to match the format you want.

edited Aug 8 '16 at 22:19

answered Aug 8 '16 at 22:09

Stephen Harris

22.8k24176

edited Aug 8 '16 at 22:19

answered Aug 8 '16 at 22:09

Stephen Harris

22.8k24176

answered Aug 8 '16 at 22:09

Stephen Harris

22.8k24176

answered Aug 8 '16 at 22:09

Stephen Harris

22.8k24176

add a comment |

up vote
0
down vote

Quick and dirty approach, may suit your requirements.

grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'

Example:

grep . *

f1.txt:FirstName: Mary

f1.txt:LastName: Smith

f1.txt:Address: 123 Anywhere St

f1.txt:City: Nowhere

f1.txt:State: TX

f1.txt:Zip: 77777

f2.txt:FirstName: Joe

f2.txt:LastName: Bloggs

f2.txt:Address: 444 Anywhere St

f2.txt:City: Nowhere2

f2.txt:State: TXA

f2.txt:Zip: 77737

grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'

f1.txt,Mary,Nowhere,77777

f2.txt,Joe,Nowhere2,77737

answered Aug 8 '16 at 22:10

steve

13.7k22452

add a comment |

up vote
0
down vote

Quick and dirty approach, may suit your requirements.

grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'

Example:

grep . *

f1.txt:FirstName: Mary

f1.txt:LastName: Smith

f1.txt:Address: 123 Anywhere St

f1.txt:City: Nowhere

f1.txt:State: TX

f1.txt:Zip: 77777

f2.txt:FirstName: Joe

f2.txt:LastName: Bloggs

f2.txt:Address: 444 Anywhere St

f2.txt:City: Nowhere2

f2.txt:State: TXA

f2.txt:Zip: 77737

grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'

f1.txt,Mary,Nowhere,77777

f2.txt,Joe,Nowhere2,77737

answered Aug 8 '16 at 22:10

steve

13.7k22452

add a comment |

up vote
0
down vote

Quick and dirty approach, may suit your requirements.

grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'

Example:

grep . *

f1.txt:FirstName: Mary

f1.txt:LastName: Smith

f1.txt:Address: 123 Anywhere St

f1.txt:City: Nowhere

f1.txt:State: TX

f1.txt:Zip: 77777

f2.txt:FirstName: Joe

f2.txt:LastName: Bloggs

f2.txt:Address: 444 Anywhere St

f2.txt:City: Nowhere2

f2.txt:State: TXA

f2.txt:Zip: 77737

grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'

f1.txt,Mary,Nowhere,77777

f2.txt,Joe,Nowhere2,77737

answered Aug 8 '16 at 22:10

steve

13.7k22452

Quick and dirty approach, may suit your requirements.

grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'

Example:

grep . *

f1.txt:FirstName: Mary

f1.txt:LastName: Smith

f1.txt:Address: 123 Anywhere St

f1.txt:City: Nowhere

f1.txt:State: TX

f1.txt:Zip: 77777

f2.txt:FirstName: Joe

f2.txt:LastName: Bloggs

f2.txt:Address: 444 Anywhere St

f2.txt:City: Nowhere2

f2.txt:State: TXA

f2.txt:Zip: 77737

grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'

f1.txt,Mary,Nowhere,77777

f2.txt,Joe,Nowhere2,77737

answered Aug 8 '16 at 22:10

steve

13.7k22452

answered Aug 8 '16 at 22:10

steve

13.7k22452

answered Aug 8 '16 at 22:10

steve

13.7k22452

answered Aug 8 '16 at 22:10

steve

13.7k22452

add a comment |

up vote
0
down vote

If there's a single record per file and you have GNU awk, you could do

gawk -F': +' -vOFS=, '

  BEGINFILE{delete rec}

  {rec[$1] = $2}

  ENDFILE{print FILENAME, rec["FirstName"], rec["City"], rec["Zip"]}

' file1.txt file2.txt ...

edited Aug 9 '16 at 1:19

answered Aug 8 '16 at 22:29

steeldriver

33.6k34982

Doesn't the $2 mean that additional words will get lost? (eg "City: New York") would return "New".
– Stephen Harris
Aug 9 '16 at 1:15

@StephenHarris Doh! thanks - I have adjusted the field separator hopefully to fix that
– steeldriver
Aug 9 '16 at 1:20

add a comment |

up vote
0
down vote

If there's a single record per file and you have GNU awk, you could do

gawk -F': +' -vOFS=, '

  BEGINFILE{delete rec}

  {rec[$1] = $2}

  ENDFILE{print FILENAME, rec["FirstName"], rec["City"], rec["Zip"]}

' file1.txt file2.txt ...

edited Aug 9 '16 at 1:19

answered Aug 8 '16 at 22:29

steeldriver

33.6k34982

Doesn't the $2 mean that additional words will get lost? (eg "City: New York") would return "New".
– Stephen Harris
Aug 9 '16 at 1:15

@StephenHarris Doh! thanks - I have adjusted the field separator hopefully to fix that
– steeldriver
Aug 9 '16 at 1:20

add a comment |

up vote
0
down vote

If there's a single record per file and you have GNU awk, you could do

gawk -F': +' -vOFS=, '

  BEGINFILE{delete rec}

  {rec[$1] = $2}

  ENDFILE{print FILENAME, rec["FirstName"], rec["City"], rec["Zip"]}

' file1.txt file2.txt ...

edited Aug 9 '16 at 1:19

answered Aug 8 '16 at 22:29

steeldriver

33.6k34982

If there's a single record per file and you have GNU awk, you could do

gawk -F': +' -vOFS=, '

  BEGINFILE{delete rec}

  {rec[$1] = $2}

  ENDFILE{print FILENAME, rec["FirstName"], rec["City"], rec["Zip"]}

' file1.txt file2.txt ...

edited Aug 9 '16 at 1:19

answered Aug 8 '16 at 22:29

steeldriver

33.6k34982

edited Aug 9 '16 at 1:19

answered Aug 8 '16 at 22:29

steeldriver

33.6k34982

answered Aug 8 '16 at 22:29

steeldriver

33.6k34982

answered Aug 8 '16 at 22:29

steeldriver

33.6k34982

Doesn't the $2 mean that additional words will get lost? (eg "City: New York") would return "New".
– Stephen Harris
Aug 9 '16 at 1:15

@StephenHarris Doh! thanks - I have adjusted the field separator hopefully to fix that
– steeldriver
Aug 9 '16 at 1:20

add a comment |

Doesn't the $2 mean that additional words will get lost? (eg "City: New York") would return "New".
– Stephen Harris
Aug 9 '16 at 1:15

@StephenHarris Doh! thanks - I have adjusted the field separator hopefully to fix that
– steeldriver
Aug 9 '16 at 1:20

Doesn't the $2 mean that additional words will get lost? (eg "City: New York") would return "New".
– Stephen Harris
Aug 9 '16 at 1:15

@StephenHarris Doh! thanks - I have adjusted the field separator hopefully to fix that
– steeldriver
Aug 9 '16 at 1:20

add a comment |

draft saved

draft discarded

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Sstrhsrtj