extract data from text files to csv
I have a series of text files with a consistent format like:
FirstName: Mary
LastName: Smith
Address: 123 Anywhere St
City: Nowhere
State: TX
Zip: 77777
I need to extract several lines from these files and output them into a CSV file in a format like:
<filename>,<FirstName>,<City>,<Zip>
I can get the fields I want with a simple grep but I don't know how to get the output the way I need it.
text-processing
edited 2 days ago by Rui F Ribeiro
asked Aug 8 '16 at 21:07 by Kevin Pearce
What's your expected output?
– tachomi
Aug 8 '16 at 21:14
Sounds like a simple Perl task. What is between Zip and the next First Name?
– waltinator
Aug 8 '16 at 21:22
3 Answers
accepted
If you only have one record per file then this is a simple read loop.
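#!/bin/bash
# Read one "Header: value" record from the file named in $1
# and print it as a single comma-separated line.
read_data()
{
  local first last addr city state zip header data
  local file=$1
  # The "|| [ -n ... ]" test also processes a last line that has no trailing newline
  while read -r header data || [ -n "$header" ]
  do
    case $header in
      FirstName:) first=$data ;;
      LastName:) last=$data ;;
      Address:) addr=$data ;;
      City:) city=$data ;;
      State:) state=$data ;;
      Zip:) zip=$data ;;
      *) echo "Ignoring bad line: $header $data" >&2 ;;
    esac
  done < "$file"
  echo "$file,$first,$last,$addr,$city,$state,$zip"
}
# *srcfiles* is a placeholder: use a glob that matches your input files
for file in *srcfiles*
do
  read_data "$file"
done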
The read_data function reads each line and splits it into a "header" (the first word, e.g. FirstName:) and "data" (the rest of the line). Once we reach the end of the file, we print the results.
We call that function once for each source file via the for loop.
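As a small illustration of that splitting step (a sketch, not part of the original answer): read assigns the first whitespace-separated word to header and everything left on the line to data, so multi-word values such as a city name stay intact. The sample line is made up for the demonstration.

```shell
#!/bin/sh
# Split one sample line the way the read loop does: the first word
# goes to "header" and the rest of the line (spaces included) to "data".
line='City: New York'
read -r header data <<EOF
$line
EOF
printf 'header=[%s]\n' "$header"
printf 'data=[%s]\n' "$data"
```

This prints header=[City:] and data=[New York].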
Note some potential gotchas: if there are commas in the data, this will break things, so you might want to use
echo "\"$file\",\"$first\",\"$last\",\"$addr\",\"$city\",\"$state\",\"$zip\""
as the output line to enclose every field in "..." quotes. If there's any " in the data, the CSV may still be malformed.
Adjust the echo line to match the format you want (for the question's format, that would be "$file,$first,$city,$zip").
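Building on the quoting gotcha, here is a minimal sketch (not from the original answer) of quoting fields the way RFC 4180 describes: each field is wrapped in double quotes and any double quote inside it is doubled. The helper names csv_field and csv_row are hypothetical; inside read_data you could call csv_row "$file" "$first" "$city" "$zip" in place of the echo line to get the question's format.

```shell
#!/bin/sh
# csv_field (hypothetical helper): print one value as a quoted CSV field.
# Embedded double quotes are doubled, then the value is wrapped in quotes.
csv_field() {
  printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

# csv_row (hypothetical helper): print all arguments as one CSV line.
# The ( ) body runs in a subshell, so sep and field do not leak out.
csv_row() (
  sep=''
  for field in "$@"; do
    printf '%s' "$sep"
    csv_field "$field"
    sep=','
  done
  printf '\n'
)

csv_row 'f1.txt' 'Mary' 'New York, NY' 'He said "hi"'
# prints: "f1.txt","Mary","New York, NY","He said ""hi"""
```

This starts one sed process per field, so it may be slow if you have a very large number of files.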
edited Aug 8 '16 at 22:19
answered Aug 8 '16 at 22:09 by Stephen Harris
A quick and dirty approach that may suit your requirements:
grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2\n"}'
Example:
grep . *
f1.txt:FirstName: Mary
f1.txt:LastName: Smith
f1.txt:Address: 123 Anywhere St
f1.txt:City: Nowhere
f1.txt:State: TX
f1.txt:Zip: 77777
f2.txt:FirstName: Joe
f2.txt:LastName: Bloggs
f2.txt:Address: 444 Anywhere St
f2.txt:City: Nowhere2
f2.txt:State: TXA
f2.txt:Zip: 77737
grep . *|perl -ne 'if(/FirstName: (.*)/){$f=$1}if(/City: (.*)/){$c=$1}if(/^(.*):Zip: (.*)/){print "$1,$f,$c,$2\n"}'
f1.txt,Mary,Nowhere,77777
f2.txt,Joe,Nowhere2,77737
answered Aug 8 '16 at 22:10 by steve
If there's a single record per file and you have GNU awk, you could do:
gawk -F': +' -vOFS=, '
BEGINFILE{delete rec}
{rec[$1] = $2}
ENDFILE{print FILENAME, rec["FirstName"], rec["City"], rec["Zip"]}
' file1.txt file2.txt ...
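BEGINFILE and ENDFILE are GNU awk extensions. If gawk isn't available, a minimal sketch for other awk implementations (assuming the same one-record-per-file layout; records_to_csv is a hypothetical name) can detect each new file with FNR == 1, print the previous file's row at that point, and print the last one at END:

```shell
#!/bin/sh
# records_to_csv (hypothetical name): print filename,FirstName,City,Zip
# for each file, without relying on gawk-only BEGINFILE/ENDFILE.
records_to_csv() {
  awk -F': +' -v OFS=, '
    # Print the record collected so far (if any), then empty the array.
    function flush() {
      if (file != "")
        print file, rec["FirstName"], rec["City"], rec["Zip"]
      split("", rec)   # portable way to empty the array
    }
    FNR == 1 { flush(); file = FILENAME }   # first line of a new file
    { rec[$1] = $2 }                        # remember "Header: value"
    END { flush() }                         # print the final file
  ' "$@"
}
```

Usage: records_to_csv file1.txt file2.txt > out.csv. One difference from the gawk version: a file with no lines never triggers FNR == 1, so it produces no row.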
edited Aug 9 '16 at 1:19
answered Aug 8 '16 at 22:29 by steeldriver
Doesn't the $2 mean that additional words will get lost? (e.g. "City: New York" would return "New".)
– Stephen Harris
Aug 9 '16 at 1:15
@StephenHarris Doh! Thanks, I have adjusted the field separator to hopefully fix that.
– steeldriver
Aug 9 '16 at 1:20