extract data from text files to csv











up vote
1
down vote

favorite
1












I have a series of text files with a consistent format like:



FirstName: Mary
LastName: Smith
Address: 123 Anywhere St
City: Nowhere
State: TX
Zip: 77777


I need to extract several lines from these files and output them into a csv file in a format like:



<filename>,<FirstName>,<City>,<Zip>


I can get the fields I want with a simple grep but I don't know how to get the output the way I need it.










share|improve this question




















  • 1




    What's your expected output
    – tachomi
    Aug 8 '16 at 21:14










  • Sounds. like a simple Perl task. What is between Zip and the next First Name?
    – waltinator
    Aug 8 '16 at 21:22















up vote
1
down vote

favorite
1












I have a series of text files with a consistent format like:



FirstName: Mary
LastName: Smith
Address: 123 Anywhere St
City: Nowhere
State: TX
Zip: 77777


I need to extract several lines from these files and output them into a csv file in a format like:



<filename>,<FirstName>,<City>,<Zip>


I can get the fields I want with a simple grep but I don't know how to get the output the way I need it.










share|improve this question




















  • 1




    What's your expected output
    – tachomi
    Aug 8 '16 at 21:14










  • Sounds. like a simple Perl task. What is between Zip and the next First Name?
    – waltinator
    Aug 8 '16 at 21:22













up vote
1
down vote

favorite
1









up vote
1
down vote

favorite
1






1





I have a series of text files with a consistent format like:



FirstName: Mary
LastName: Smith
Address: 123 Anywhere St
City: Nowhere
State: TX
Zip: 77777


I need to extract several lines from these files and output them into a csv file in a format like:



<filename>,<FirstName>,<City>,<Zip>


I can get the fields I want with a simple grep but I don't know how to get the output the way I need it.










share|improve this question















I have a series of text files with a consistent format like:



FirstName: Mary
LastName: Smith
Address: 123 Anywhere St
City: Nowhere
State: TX
Zip: 77777


I need to extract several lines from these files and output them into a csv file in a format like:



<filename>,<FirstName>,<City>,<Zip>


I can get the fields I want with a simple grep but I don't know how to get the output the way I need it.







text-processing






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 2 days ago









Rui F Ribeiro

38.2k1475123




38.2k1475123










asked Aug 8 '16 at 21:07









Kevin Pearce

82




82








  • 1




    What's your expected output
    – tachomi
    Aug 8 '16 at 21:14










  • Sounds. like a simple Perl task. What is between Zip and the next First Name?
    – waltinator
    Aug 8 '16 at 21:22














  • 1




    What's your expected output
    – tachomi
    Aug 8 '16 at 21:14










  • Sounds. like a simple Perl task. What is between Zip and the next First Name?
    – waltinator
    Aug 8 '16 at 21:22








1




1




What's your expected output
– tachomi
Aug 8 '16 at 21:14




What's your expected output
– tachomi
Aug 8 '16 at 21:14












Sounds. like a simple Perl task. What is between Zip and the next First Name?
– waltinator
Aug 8 '16 at 21:22




Sounds. like a simple Perl task. What is between Zip and the next First Name?
– waltinator
Aug 8 '16 at 21:22










3 Answers
3






active

oldest

votes

















up vote
0
down vote



accepted










If you only have one record per file then this is a simple read loop.



#!/bin/bash

read_data()
{
local first last addr city state zip

file=$1

while read -r header data
do
case $header in
FirstName:) first=$data ;;
LastName:) last=$data ;;
Address:) addr=$data ;;
City:) city=$data ;;
State:) state=$data ;;
Zip:) zip=$data ;;
*) echo Ignoring bad line $header $data >&2
esac
done < $file
echo "$file,$first,$last,$addr,$city,$state,$zip"
}

for file in *srcfiles*
do
read_data $file
done


The read_data function reads each line and splits up the line into a "header" and a "data". Once we get to the end of the file we just print out the results.



We call that function once for each source file via the for loop.



Note some potential gotcha's: If there are commas in the data then this will break things, so you might want to do



  echo ""$file","$first","$last","$addr","$city","$state","$zip""


as the output to enclose everything inside "..." layout. If there's any " in the data then this may cause the CSV to be malformed as well.



Adjust the echo line to match the format you want.






share|improve this answer






























    up vote
    0
    down vote













    Quick and dirty approach, may suit your requirements.



    grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'


    Example:



    grep . *
    f1.txt:FirstName: Mary
    f1.txt:LastName: Smith
    f1.txt:Address: 123 Anywhere St
    f1.txt:City: Nowhere
    f1.txt:State: TX
    f1.txt:Zip: 77777
    f2.txt:FirstName: Joe
    f2.txt:LastName: Bloggs
    f2.txt:Address: 444 Anywhere St
    f2.txt:City: Nowhere2
    f2.txt:State: TXA
    f2.txt:Zip: 77737
    grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'
    f1.txt,Mary,Nowhere,77777
    f2.txt,Joe,Nowhere2,77737





    share|improve this answer




























      up vote
      0
      down vote













      If there's a single record per file and you have GNU awk, you could do



      gawk -F': +' -vOFS=, '
      BEGINFILE{delete rec}
      {rec[$1] = $2}
      ENDFILE{print FILENAME, rec["FirstName"], rec["City"], rec["Zip"]}
      ' file1.txt file2.txt ...





      share|improve this answer























      • Doesn't the $2 mean that additional words will get lost? (eg "City: New York") would return "New".
        – Stephen Harris
        Aug 9 '16 at 1:15










      • @StephenHarris Doh! thanks - I have adjusted the field separator hopefully to fix that
        – steeldriver
        Aug 9 '16 at 1:20











      Your Answer








      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "106"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














       

      draft saved


      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f302127%2fextract-data-from-text-files-to-csv%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      0
      down vote



      accepted










      If you only have one record per file then this is a simple read loop.



      #!/bin/bash

      read_data()
      {
      local first last addr city state zip

      file=$1

      while read -r header data
      do
      case $header in
      FirstName:) first=$data ;;
      LastName:) last=$data ;;
      Address:) addr=$data ;;
      City:) city=$data ;;
      State:) state=$data ;;
      Zip:) zip=$data ;;
      *) echo Ignoring bad line $header $data >&2
      esac
      done < $file
      echo "$file,$first,$last,$addr,$city,$state,$zip"
      }

      for file in *srcfiles*
      do
      read_data $file
      done


      The read_data function reads each line and splits up the line into a "header" and a "data". Once we get to the end of the file we just print out the results.



      We call that function once for each source file via the for loop.



      Note some potential gotcha's: If there are commas in the data then this will break things, so you might want to do



        echo ""$file","$first","$last","$addr","$city","$state","$zip""


      as the output to enclose everything inside "..." layout. If there's any " in the data then this may cause the CSV to be malformed as well.



      Adjust the echo line to match the format you want.






      share|improve this answer



























        up vote
        0
        down vote



        accepted










        If you only have one record per file then this is a simple read loop.



        #!/bin/bash

        read_data()
        {
        local first last addr city state zip

        file=$1

        while read -r header data
        do
        case $header in
        FirstName:) first=$data ;;
        LastName:) last=$data ;;
        Address:) addr=$data ;;
        City:) city=$data ;;
        State:) state=$data ;;
        Zip:) zip=$data ;;
        *) echo Ignoring bad line $header $data >&2
        esac
        done < $file
        echo "$file,$first,$last,$addr,$city,$state,$zip"
        }

        for file in *srcfiles*
        do
        read_data $file
        done


        The read_data function reads each line and splits up the line into a "header" and a "data". Once we get to the end of the file we just print out the results.



        We call that function once for each source file via the for loop.



        Note some potential gotcha's: If there are commas in the data then this will break things, so you might want to do



          echo ""$file","$first","$last","$addr","$city","$state","$zip""


        as the output to enclose everything inside "..." layout. If there's any " in the data then this may cause the CSV to be malformed as well.



        Adjust the echo line to match the format you want.






        share|improve this answer

























          up vote
          0
          down vote



          accepted







          up vote
          0
          down vote



          accepted






          If you only have one record per file then this is a simple read loop.



          #!/bin/bash

          read_data()
          {
          local first last addr city state zip

          file=$1

          while read -r header data
          do
          case $header in
          FirstName:) first=$data ;;
          LastName:) last=$data ;;
          Address:) addr=$data ;;
          City:) city=$data ;;
          State:) state=$data ;;
          Zip:) zip=$data ;;
          *) echo Ignoring bad line $header $data >&2
          esac
          done < $file
          echo "$file,$first,$last,$addr,$city,$state,$zip"
          }

          for file in *srcfiles*
          do
          read_data $file
          done


          The read_data function reads each line and splits up the line into a "header" and a "data". Once we get to the end of the file we just print out the results.



          We call that function once for each source file via the for loop.



          Note some potential gotcha's: If there are commas in the data then this will break things, so you might want to do



            echo ""$file","$first","$last","$addr","$city","$state","$zip""


          as the output to enclose everything inside "..." layout. If there's any " in the data then this may cause the CSV to be malformed as well.



          Adjust the echo line to match the format you want.






          share|improve this answer














          If you only have one record per file then this is a simple read loop.



          #!/bin/bash

          read_data()
          {
          local first last addr city state zip

          file=$1

          while read -r header data
          do
          case $header in
          FirstName:) first=$data ;;
          LastName:) last=$data ;;
          Address:) addr=$data ;;
          City:) city=$data ;;
          State:) state=$data ;;
          Zip:) zip=$data ;;
          *) echo Ignoring bad line $header $data >&2
          esac
          done < $file
          echo "$file,$first,$last,$addr,$city,$state,$zip"
          }

          for file in *srcfiles*
          do
          read_data $file
          done


          The read_data function reads each line and splits up the line into a "header" and a "data". Once we get to the end of the file we just print out the results.



          We call that function once for each source file via the for loop.



          Note some potential gotcha's: If there are commas in the data then this will break things, so you might want to do



            echo ""$file","$first","$last","$addr","$city","$state","$zip""


          as the output to enclose everything inside "..." layout. If there's any " in the data then this may cause the CSV to be malformed as well.



          Adjust the echo line to match the format you want.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Aug 8 '16 at 22:19

























          answered Aug 8 '16 at 22:09









          Stephen Harris

          22.8k24176




          22.8k24176
























              up vote
              0
              down vote













              Quick and dirty approach, may suit your requirements.



              grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'


              Example:



              grep . *
              f1.txt:FirstName: Mary
              f1.txt:LastName: Smith
              f1.txt:Address: 123 Anywhere St
              f1.txt:City: Nowhere
              f1.txt:State: TX
              f1.txt:Zip: 77777
              f2.txt:FirstName: Joe
              f2.txt:LastName: Bloggs
              f2.txt:Address: 444 Anywhere St
              f2.txt:City: Nowhere2
              f2.txt:State: TXA
              f2.txt:Zip: 77737
              grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'
              f1.txt,Mary,Nowhere,77777
              f2.txt,Joe,Nowhere2,77737





              share|improve this answer

























                up vote
                0
                down vote













                Quick and dirty approach, may suit your requirements.



                grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'


                Example:



                grep . *
                f1.txt:FirstName: Mary
                f1.txt:LastName: Smith
                f1.txt:Address: 123 Anywhere St
                f1.txt:City: Nowhere
                f1.txt:State: TX
                f1.txt:Zip: 77777
                f2.txt:FirstName: Joe
                f2.txt:LastName: Bloggs
                f2.txt:Address: 444 Anywhere St
                f2.txt:City: Nowhere2
                f2.txt:State: TXA
                f2.txt:Zip: 77737
                grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'
                f1.txt,Mary,Nowhere,77777
                f2.txt,Joe,Nowhere2,77737





                share|improve this answer























                  up vote
                  0
                  down vote










                  up vote
                  0
                  down vote









                  Quick and dirty approach, may suit your requirements.



                  grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'


                  Example:



                  grep . *
                  f1.txt:FirstName: Mary
                  f1.txt:LastName: Smith
                  f1.txt:Address: 123 Anywhere St
                  f1.txt:City: Nowhere
                  f1.txt:State: TX
                  f1.txt:Zip: 77777
                  f2.txt:FirstName: Joe
                  f2.txt:LastName: Bloggs
                  f2.txt:Address: 444 Anywhere St
                  f2.txt:City: Nowhere2
                  f2.txt:State: TXA
                  f2.txt:Zip: 77737
                  grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'
                  f1.txt,Mary,Nowhere,77777
                  f2.txt,Joe,Nowhere2,77737





                  share|improve this answer












                  Quick and dirty approach, may suit your requirements.



                  grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'


                  Example:



                  grep . *
                  f1.txt:FirstName: Mary
                  f1.txt:LastName: Smith
                  f1.txt:Address: 123 Anywhere St
                  f1.txt:City: Nowhere
                  f1.txt:State: TX
                  f1.txt:Zip: 77777
                  f2.txt:FirstName: Joe
                  f2.txt:LastName: Bloggs
                  f2.txt:Address: 444 Anywhere St
                  f2.txt:City: Nowhere2
                  f2.txt:State: TXA
                  f2.txt:Zip: 77737
                  grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2n"}'
                  f1.txt,Mary,Nowhere,77777
                  f2.txt,Joe,Nowhere2,77737






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Aug 8 '16 at 22:10









                  steve

                  13.7k22452




                  13.7k22452






















                      up vote
                      0
                      down vote













                      If there's a single record per file and you have GNU awk, you could do



                      gawk -F': +' -vOFS=, '
                      BEGINFILE{delete rec}
                      {rec[$1] = $2}
                      ENDFILE{print FILENAME, rec["FirstName"], rec["City"], rec["Zip"]}
                      ' file1.txt file2.txt ...





                      share|improve this answer























                      • Doesn't the $2 mean that additional words will get lost? (eg "City: New York") would return "New".
                        – Stephen Harris
                        Aug 9 '16 at 1:15










                      • @StephenHarris Doh! thanks - I have adjusted the field separator hopefully to fix that
                        – steeldriver
                        Aug 9 '16 at 1:20















                      up vote
                      0
                      down vote













                      If there's a single record per file and you have GNU awk, you could do



                      gawk -F': +' -vOFS=, '
                      BEGINFILE{delete rec}
                      {rec[$1] = $2}
                      ENDFILE{print FILENAME, rec["FirstName"], rec["City"], rec["Zip"]}
                      ' file1.txt file2.txt ...





                      share|improve this answer























                      • Doesn't the $2 mean that additional words will get lost? (eg "City: New York") would return "New".
                        – Stephen Harris
                        Aug 9 '16 at 1:15










                      • @StephenHarris Doh! thanks - I have adjusted the field separator hopefully to fix that
                        – steeldriver
                        Aug 9 '16 at 1:20













                      up vote
                      0
                      down vote










                      up vote
                      0
                      down vote









                      If there's a single record per file and you have GNU awk, you could do



                      gawk -F': +' -vOFS=, '
                      BEGINFILE{delete rec}
                      {rec[$1] = $2}
                      ENDFILE{print FILENAME, rec["FirstName"], rec["City"], rec["Zip"]}
                      ' file1.txt file2.txt ...





                      share|improve this answer














                      If there's a single record per file and you have GNU awk, you could do



                      gawk -F': +' -vOFS=, '
                      BEGINFILE{delete rec}
                      {rec[$1] = $2}
                      ENDFILE{print FILENAME, rec["FirstName"], rec["City"], rec["Zip"]}
                      ' file1.txt file2.txt ...






                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited Aug 9 '16 at 1:19

























                      answered Aug 8 '16 at 22:29









                      steeldriver

                      33.6k34982




                      33.6k34982












                      • Doesn't the $2 mean that additional words will get lost? (eg "City: New York") would return "New".
                        – Stephen Harris
                        Aug 9 '16 at 1:15










                      • @StephenHarris Doh! thanks - I have adjusted the field separator hopefully to fix that
                        – steeldriver
                        Aug 9 '16 at 1:20


















                      • Doesn't the $2 mean that additional words will get lost? (eg "City: New York") would return "New".
                        – Stephen Harris
                        Aug 9 '16 at 1:15










                      • @StephenHarris Doh! thanks - I have adjusted the field separator hopefully to fix that
                        – steeldriver
                        Aug 9 '16 at 1:20
















                      Doesn't the $2 mean that additional words will get lost? (eg "City: New York") would return "New".
                      – Stephen Harris
                      Aug 9 '16 at 1:15




                      Doesn't the $2 mean that additional words will get lost? (eg "City: New York") would return "New".
                      – Stephen Harris
                      Aug 9 '16 at 1:15












                      @StephenHarris Doh! thanks - I have adjusted the field separator hopefully to fix that
                      – steeldriver
                      Aug 9 '16 at 1:20




                      @StephenHarris Doh! thanks - I have adjusted the field separator hopefully to fix that
                      – steeldriver
                      Aug 9 '16 at 1:20


















                       

                      draft saved


                      draft discarded



















































                       


                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f302127%2fextract-data-from-text-files-to-csv%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      サソリ

                      広島県道265号伴広島線

                      Accessing regular linux commands in Huawei's Dopra Linux