Replace comma delimited file with pipe, but not remove comma or quotes etc within text qualifier field,...
up vote
0
down vote
favorite
My file is a comma delimited file and text qualifier is ~, but my requirement is to find and replace comma delimited file with |(pipe) delimited file and remove text qualifier ~ with nothing, however,
I should not remove quote or double quotes or any special character within the data present in text qualifier. eg: ~abc",~ I need it as abc",
Below is the content of my source file and the way i expect the output or manipulated file to be.
Source file:
364034,2015652205,26722,2015,4,~C25753-4~,~TC25753,~,~2WD Double Cab 144.2" SLT,~,~Y~,40506.16,43555.00,1095.00,~043,005,006,007,003,008,016,041,012,029,068,027,028,033~,3,~2WD Double Cab 144.2"~,~SLT~,6,4,~N~,~S~,~N~,~S~,~N~,~N~,~N~,~~,~ ~,~Confirmed~,~w2015k65m22t5~,~Sierra 2500HD~,~Double Cab Standard Box 2-Wheel Drive SLT~,~Rear Wheel Drive~,~Extended Cab Pickup - Standard Bed~
After cleansing I need the file like this:
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
I tried sed -i -e with multiple options but the output is not 100% right.
I tried following but it's not giving right result that i wanted
sed -i -e 's/,~/|/g' file_name
sed -i -e 's/~,/|/g' file_name
sed -i -e 's/~//g' file_name
sed -i -e 's/([0-9],[0-9])/|/g' file_name
sed -i -e 's/r//g' file_name
shell-script sed regular-expression replace
add a comment |
up vote
0
down vote
favorite
My file is a comma delimited file and text qualifier is ~, but my requirement is to find and replace comma delimited file with |(pipe) delimited file and remove text qualifier ~ with nothing, however,
I should not remove quote or double quotes or any special character within the data present in text qualifier. eg: ~abc",~ I need it as abc",
Below is the content of my source file and the way i expect the output or manipulated file to be.
Source file:
364034,2015652205,26722,2015,4,~C25753-4~,~TC25753,~,~2WD Double Cab 144.2" SLT,~,~Y~,40506.16,43555.00,1095.00,~043,005,006,007,003,008,016,041,012,029,068,027,028,033~,3,~2WD Double Cab 144.2"~,~SLT~,6,4,~N~,~S~,~N~,~S~,~N~,~N~,~N~,~~,~ ~,~Confirmed~,~w2015k65m22t5~,~Sierra 2500HD~,~Double Cab Standard Box 2-Wheel Drive SLT~,~Rear Wheel Drive~,~Extended Cab Pickup - Standard Bed~
After cleansing I need the file like this:
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
I tried sed -i -e with multiple options but the output is not 100% right.
I tried following but it's not giving right result that i wanted
sed -i -e 's/,~/|/g' file_name
sed -i -e 's/~,/|/g' file_name
sed -i -e 's/~//g' file_name
sed -i -e 's/([0-9],[0-9])/|/g' file_name
sed -i -e 's/r//g' file_name
shell-script sed regular-expression replace
format your question to make it readable
– RomanPerekhrest
May 26 '17 at 21:06
1
How abouttr -d '~' <file | tr -s ',' '|'
, or sed equivalentsed 's/~//g;s/,+/|/g' file
. Is this what you are looking for?
– jimmij
May 26 '17 at 21:13
Hi There, I am looking for basically replacing , with | ( but not when these are enclosed within ~abc,#~) then finally remove text qualifier ~ Input file is like: 1234,4826,~Ter1337,#~,~~,~46437uyy"@~ Desired manipulated input file 1234|4826|Ter1337,#||46437uyy"@
– Sudheer
May 30 '17 at 19:56
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
My file is a comma delimited file and text qualifier is ~, but my requirement is to find and replace comma delimited file with |(pipe) delimited file and remove text qualifier ~ with nothing, however,
I should not remove quote or double quotes or any special character within the data present in text qualifier. eg: ~abc",~ I need it as abc",
Below is the content of my source file and the way i expect the output or manipulated file to be.
Source file:
364034,2015652205,26722,2015,4,~C25753-4~,~TC25753,~,~2WD Double Cab 144.2" SLT,~,~Y~,40506.16,43555.00,1095.00,~043,005,006,007,003,008,016,041,012,029,068,027,028,033~,3,~2WD Double Cab 144.2"~,~SLT~,6,4,~N~,~S~,~N~,~S~,~N~,~N~,~N~,~~,~ ~,~Confirmed~,~w2015k65m22t5~,~Sierra 2500HD~,~Double Cab Standard Box 2-Wheel Drive SLT~,~Rear Wheel Drive~,~Extended Cab Pickup - Standard Bed~
After cleansing I need the file like this:
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
I tried sed -i -e with multiple options but the output is not 100% right.
I tried following but it's not giving right result that i wanted
sed -i -e 's/,~/|/g' file_name
sed -i -e 's/~,/|/g' file_name
sed -i -e 's/~//g' file_name
sed -i -e 's/([0-9],[0-9])/|/g' file_name
sed -i -e 's/r//g' file_name
shell-script sed regular-expression replace
My file is a comma delimited file and text qualifier is ~, but my requirement is to find and replace comma delimited file with |(pipe) delimited file and remove text qualifier ~ with nothing, however,
I should not remove quote or double quotes or any special character within the data present in text qualifier. eg: ~abc",~ I need it as abc",
Below is the content of my source file and the way i expect the output or manipulated file to be.
Source file:
364034,2015652205,26722,2015,4,~C25753-4~,~TC25753,~,~2WD Double Cab 144.2" SLT,~,~Y~,40506.16,43555.00,1095.00,~043,005,006,007,003,008,016,041,012,029,068,027,028,033~,3,~2WD Double Cab 144.2"~,~SLT~,6,4,~N~,~S~,~N~,~S~,~N~,~N~,~N~,~~,~ ~,~Confirmed~,~w2015k65m22t5~,~Sierra 2500HD~,~Double Cab Standard Box 2-Wheel Drive SLT~,~Rear Wheel Drive~,~Extended Cab Pickup - Standard Bed~
After cleansing I need the file like this:
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
I tried sed -i -e with multiple options but the output is not 100% right.
I tried following but it's not giving right result that i wanted
sed -i -e 's/,~/|/g' file_name
sed -i -e 's/~,/|/g' file_name
sed -i -e 's/~//g' file_name
sed -i -e 's/([0-9],[0-9])/|/g' file_name
sed -i -e 's/r//g' file_name
shell-script sed regular-expression replace
shell-script sed regular-expression replace
edited 2 days ago
Rui F Ribeiro
38.2k1475123
38.2k1475123
asked May 26 '17 at 20:40
Sudheer
1
1
format your question to make it readable
– RomanPerekhrest
May 26 '17 at 21:06
1
How abouttr -d '~' <file | tr -s ',' '|'
, or sed equivalentsed 's/~//g;s/,+/|/g' file
. Is this what you are looking for?
– jimmij
May 26 '17 at 21:13
Hi There, I am looking for basically replacing , with | ( but not when these are enclosed within ~abc,#~) then finally remove text qualifier ~ Input file is like: 1234,4826,~Ter1337,#~,~~,~46437uyy"@~ Desired manipulated input file 1234|4826|Ter1337,#||46437uyy"@
– Sudheer
May 30 '17 at 19:56
add a comment |
format your question to make it readable
– RomanPerekhrest
May 26 '17 at 21:06
1
How abouttr -d '~' <file | tr -s ',' '|'
, or sed equivalentsed 's/~//g;s/,+/|/g' file
. Is this what you are looking for?
– jimmij
May 26 '17 at 21:13
Hi There, I am looking for basically replacing , with | ( but not when these are enclosed within ~abc,#~) then finally remove text qualifier ~ Input file is like: 1234,4826,~Ter1337,#~,~~,~46437uyy"@~ Desired manipulated input file 1234|4826|Ter1337,#||46437uyy"@
– Sudheer
May 30 '17 at 19:56
format your question to make it readable
– RomanPerekhrest
May 26 '17 at 21:06
format your question to make it readable
– RomanPerekhrest
May 26 '17 at 21:06
1
1
How about
tr -d '~' <file | tr -s ',' '|'
, or sed equivalent sed 's/~//g;s/,+/|/g' file
. Is this what you are looking for?– jimmij
May 26 '17 at 21:13
How about
tr -d '~' <file | tr -s ',' '|'
, or sed equivalent sed 's/~//g;s/,+/|/g' file
. Is this what you are looking for?– jimmij
May 26 '17 at 21:13
Hi There, I am looking for basically replacing , with | ( but not when these are enclosed within ~abc,#~) then finally remove text qualifier ~ Input file is like: 1234,4826,~Ter1337,#~,~~,~46437uyy"@~ Desired manipulated input file 1234|4826|Ter1337,#||46437uyy"@
– Sudheer
May 30 '17 at 19:56
Hi There, I am looking for basically replacing , with | ( but not when these are enclosed within ~abc,#~) then finally remove text qualifier ~ Input file is like: 1234,4826,~Ter1337,#~,~~,~46437uyy"@~ Desired manipulated input file 1234|4826|Ter1337,#||46437uyy"@
– Sudheer
May 30 '17 at 19:56
add a comment |
2 Answers
2
active
oldest
votes
up vote
2
down vote
ESC=$(printf '33')
RED="${ESC}[0;31m"
NC="${ESC}[0m"
sed -e '
/./!b
/[^[:space:]]/!b
s/.*/
&,/
:loop
h
s/(n),/|1/; # An empty field
s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/; # +-.NNN
s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/; # +-NNN.MMM +-NNN. +-NNN
s/(n)~([0-9][0-9]*),/2|1/; # ~NNN
s/(n)([0-9][0-9]*)~,/2|1/; # NNN~
s/(n)~([^~]*)~,/2|1/; # ~...~
x;G
/^(.*)n1$/{
g;'"s/n([^,]*)/${RED}1${NC}/"'
i
***'"${RED}ERROR${NC}"'*** Unable to process the field shown colored.
Cause of error: What this means is that this particular field is not
Fix: You should add to the sed code in the :loop label to
digest the able to be processed by the sed code as it stands.
The record with the offending field shown colored red:
q
}
g; # all clear: recover and carry on...
/n$/!bloop
s/..$//
' csv.data
Working
- We base the solution on the various types of fields.
- Skip empty or blank lines.
- Append a "," to simplify regex used, we'll take it away at the end.
- To set the ball rolling we place a marker,
n
, at the beginning of line. This marker will travel from left to right, jumping a field processed a time. - The action begins in the
do-while
loop, in the body of which, we process a field at a time. The field beginning is signaled byn
and we process the various varieties of fields that can occur. Everytime, we bring the field processed away to the left ofn
and replace the,
by a|
. - The looping stops when the marker
n
hits the end of line/n$/
and we then take away the marker as well as the dummy,
we had placed in the beginning.
Results
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
Hi Rakesh, Thanks much for replying quickly. I incorporated the logic you provided in a Shell script which is supposed to perform the task explained for multiple files in a directory. Following is the shell script and when I execute the code it gets stuck on First file execution without performing anything. Can you please check and provide any suggestions to address the issue?
– Sudheer
May 30 '17 at 19:44
Here is the shell script #!/bin/ksh cd /CustomFolder/input/Test FILES=$(ls -t *.txt) for f in $FILES do sed -e ' # skip empty and blank lines /./!b /[^[:space:]]/!b s/$/,/ G;s/(.*)(.)/21/ :loop s/(n),/|1/ s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/ s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/ s/(n)~([0-9]*),/2|1/ s/(n)([0-9]*)~,/2|1/ s/(n)~([^~]*)~,/2|1/ /n$/!bloop s/..$// ' $f done exit 0
– Sudheer
May 30 '17 at 19:46
@Sudheer What that means is that there are some fields which are not of the type being handled presently in the code. Providing a diagnostic facility within to help you zero in on the offender.
– user218374
May 30 '17 at 22:21
add a comment |
up vote
1
down vote
I'd suggest using a dedicated CSV parser such as perl's Text::CSV
perl -MText::CSV -lne '
BEGIN{ $csv = Text::CSV->new({ quote_char => "~" , escape_char => "~" , allow_whitespace => 1}) }
print join "|", $csv->fields() if $csv->parse($_)
' file_name
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043,005,006,007,003,008,016,041,012,029,068,027,028,033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
Thanks for your suggestion but I can't use perl, I need to use standard unix shell scripting or unix commands.
– Sudheer
May 30 '17 at 19:52
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
ESC=$(printf '33')
RED="${ESC}[0;31m"
NC="${ESC}[0m"
sed -e '
/./!b
/[^[:space:]]/!b
s/.*/
&,/
:loop
h
s/(n),/|1/; # An empty field
s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/; # +-.NNN
s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/; # +-NNN.MMM +-NNN. +-NNN
s/(n)~([0-9][0-9]*),/2|1/; # ~NNN
s/(n)([0-9][0-9]*)~,/2|1/; # NNN~
s/(n)~([^~]*)~,/2|1/; # ~...~
x;G
/^(.*)n1$/{
g;'"s/n([^,]*)/${RED}1${NC}/"'
i
***'"${RED}ERROR${NC}"'*** Unable to process the field shown colored.
Cause of error: What this means is that this particular field is not
Fix: You should add to the sed code in the :loop label to
digest the able to be processed by the sed code as it stands.
The record with the offending field shown colored red:
q
}
g; # all clear: recover and carry on...
/n$/!bloop
s/..$//
' csv.data
Working
- We base the solution on the various types of fields.
- Skip empty or blank lines.
- Append a "," to simplify regex used, we'll take it away at the end.
- To set the ball rolling we place a marker,
n
, at the beginning of line. This marker will travel from left to right, jumping a field processed a time. - The action begins in the
do-while
loop, in the body of which, we process a field at a time. The field beginning is signaled byn
and we process the various varieties of fields that can occur. Everytime, we bring the field processed away to the left ofn
and replace the,
by a|
. - The looping stops when the marker
n
hits the end of line/n$/
and we then take away the marker as well as the dummy,
we had placed in the beginning.
Results
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
Hi Rakesh, Thanks much for replying quickly. I incorporated the logic you provided in a Shell script which is supposed to perform the task explained for multiple files in a directory. Following is the shell script and when I execute the code it gets stuck on First file execution without performing anything. Can you please check and provide any suggestions to address the issue?
– Sudheer
May 30 '17 at 19:44
Here is the shell script #!/bin/ksh cd /CustomFolder/input/Test FILES=$(ls -t *.txt) for f in $FILES do sed -e ' # skip empty and blank lines /./!b /[^[:space:]]/!b s/$/,/ G;s/(.*)(.)/21/ :loop s/(n),/|1/ s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/ s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/ s/(n)~([0-9]*),/2|1/ s/(n)([0-9]*)~,/2|1/ s/(n)~([^~]*)~,/2|1/ /n$/!bloop s/..$// ' $f done exit 0
– Sudheer
May 30 '17 at 19:46
@Sudheer What that means is that there are some fields which are not of the type being handled presently in the code. Providing a diagnostic facility within to help you zero in on the offender.
– user218374
May 30 '17 at 22:21
add a comment |
up vote
2
down vote
ESC=$(printf '33')
RED="${ESC}[0;31m"
NC="${ESC}[0m"
sed -e '
/./!b
/[^[:space:]]/!b
s/.*/
&,/
:loop
h
s/(n),/|1/; # An empty field
s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/; # +-.NNN
s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/; # +-NNN.MMM +-NNN. +-NNN
s/(n)~([0-9][0-9]*),/2|1/; # ~NNN
s/(n)([0-9][0-9]*)~,/2|1/; # NNN~
s/(n)~([^~]*)~,/2|1/; # ~...~
x;G
/^(.*)n1$/{
g;'"s/n([^,]*)/${RED}1${NC}/"'
i
***'"${RED}ERROR${NC}"'*** Unable to process the field shown colored.
Cause of error: What this means is that this particular field is not
Fix: You should add to the sed code in the :loop label to
digest the able to be processed by the sed code as it stands.
The record with the offending field shown colored red:
q
}
g; # all clear: recover and carry on...
/n$/!bloop
s/..$//
' csv.data
Working
- We base the solution on the various types of fields.
- Skip empty or blank lines.
- Append a "," to simplify regex used, we'll take it away at the end.
- To set the ball rolling we place a marker,
n
, at the beginning of line. This marker will travel from left to right, jumping a field processed a time. - The action begins in the
do-while
loop, in the body of which, we process a field at a time. The field beginning is signaled byn
and we process the various varieties of fields that can occur. Everytime, we bring the field processed away to the left ofn
and replace the,
by a|
. - The looping stops when the marker
n
hits the end of line/n$/
and we then take away the marker as well as the dummy,
we had placed in the beginning.
Results
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
Hi Rakesh, Thanks much for replying quickly. I incorporated the logic you provided in a Shell script which is supposed to perform the task explained for multiple files in a directory. Following is the shell script and when I execute the code it gets stuck on First file execution without performing anything. Can you please check and provide any suggestions to address the issue?
– Sudheer
May 30 '17 at 19:44
Here is the shell script #!/bin/ksh cd /CustomFolder/input/Test FILES=$(ls -t *.txt) for f in $FILES do sed -e ' # skip empty and blank lines /./!b /[^[:space:]]/!b s/$/,/ G;s/(.*)(.)/21/ :loop s/(n),/|1/ s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/ s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/ s/(n)~([0-9]*),/2|1/ s/(n)([0-9]*)~,/2|1/ s/(n)~([^~]*)~,/2|1/ /n$/!bloop s/..$// ' $f done exit 0
– Sudheer
May 30 '17 at 19:46
@Sudheer What that means is that there are some fields which are not of the type being handled presently in the code. Providing a diagnostic facility within to help you zero in on the offender.
– user218374
May 30 '17 at 22:21
add a comment |
up vote
2
down vote
up vote
2
down vote
ESC=$(printf '33')
RED="${ESC}[0;31m"
NC="${ESC}[0m"
sed -e '
/./!b
/[^[:space:]]/!b
s/.*/
&,/
:loop
h
s/(n),/|1/; # An empty field
s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/; # +-.NNN
s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/; # +-NNN.MMM +-NNN. +-NNN
s/(n)~([0-9][0-9]*),/2|1/; # ~NNN
s/(n)([0-9][0-9]*)~,/2|1/; # NNN~
s/(n)~([^~]*)~,/2|1/; # ~...~
x;G
/^(.*)n1$/{
g;'"s/n([^,]*)/${RED}1${NC}/"'
i
***'"${RED}ERROR${NC}"'*** Unable to process the field shown colored.
Cause of error: What this means is that this particular field is not
Fix: You should add to the sed code in the :loop label to
digest the able to be processed by the sed code as it stands.
The record with the offending field shown colored red:
q
}
g; # all clear: recover and carry on...
/n$/!bloop
s/..$//
' csv.data
Working
- We base the solution on the various types of fields.
- Skip empty or blank lines.
- Append a "," to simplify regex used, we'll take it away at the end.
- To set the ball rolling we place a marker,
n
, at the beginning of line. This marker will travel from left to right, jumping a field processed a time. - The action begins in the
do-while
loop, in the body of which, we process a field at a time. The field beginning is signaled byn
and we process the various varieties of fields that can occur. Everytime, we bring the field processed away to the left ofn
and replace the,
by a|
. - The looping stops when the marker
n
hits the end of line/n$/
and we then take away the marker as well as the dummy,
we had placed in the beginning.
Results
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
ESC=$(printf '33')
RED="${ESC}[0;31m"
NC="${ESC}[0m"
sed -e '
/./!b
/[^[:space:]]/!b
s/.*/
&,/
:loop
h
s/(n),/|1/; # An empty field
s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/; # +-.NNN
s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/; # +-NNN.MMM +-NNN. +-NNN
s/(n)~([0-9][0-9]*),/2|1/; # ~NNN
s/(n)([0-9][0-9]*)~,/2|1/; # NNN~
s/(n)~([^~]*)~,/2|1/; # ~...~
x;G
/^(.*)n1$/{
g;'"s/n([^,]*)/${RED}1${NC}/"'
i
***'"${RED}ERROR${NC}"'*** Unable to process the field shown colored.
Cause of error: What this means is that this particular field is not
Fix: You should add to the sed code in the :loop label to
digest the able to be processed by the sed code as it stands.
The record with the offending field shown colored red:
q
}
g; # all clear: recover and carry on...
/n$/!bloop
s/..$//
' csv.data
Working
- We base the solution on the various types of fields.
- Skip empty or blank lines.
- Append a "," to simplify regex used, we'll take it away at the end.
- To set the ball rolling we place a marker,
n
, at the beginning of line. This marker will travel from left to right, jumping a field processed a time. - The action begins in the
do-while
loop, in the body of which, we process a field at a time. The field beginning is signaled byn
and we process the various varieties of fields that can occur. Everytime, we bring the field processed away to the left ofn
and replace the,
by a|
. - The looping stops when the marker
n
hits the end of line/n$/
and we then take away the marker as well as the dummy,
we had placed in the beginning.
Results
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
edited May 31 '17 at 14:47
answered May 27 '17 at 9:49
user218374
Hi Rakesh, Thanks much for replying quickly. I incorporated the logic you provided in a Shell script which is supposed to perform the task explained for multiple files in a directory. Following is the shell script and when I execute the code it gets stuck on First file execution without performing anything. Can you please check and provide any suggestions to address the issue?
– Sudheer
May 30 '17 at 19:44
Here is the shell script #!/bin/ksh cd /CustomFolder/input/Test FILES=$(ls -t *.txt) for f in $FILES do sed -e ' # skip empty and blank lines /./!b /[^[:space:]]/!b s/$/,/ G;s/(.*)(.)/21/ :loop s/(n),/|1/ s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/ s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/ s/(n)~([0-9]*),/2|1/ s/(n)([0-9]*)~,/2|1/ s/(n)~([^~]*)~,/2|1/ /n$/!bloop s/..$// ' $f done exit 0
– Sudheer
May 30 '17 at 19:46
@Sudheer What that means is that there are some fields which are not of the type being handled presently in the code. Providing a diagnostic facility within to help you zero in on the offender.
– user218374
May 30 '17 at 22:21
add a comment |
Hi Rakesh, Thanks much for replying quickly. I incorporated the logic you provided in a Shell script which is supposed to perform the task explained for multiple files in a directory. Following is the shell script and when I execute the code it gets stuck on First file execution without performing anything. Can you please check and provide any suggestions to address the issue?
– Sudheer
May 30 '17 at 19:44
Here is the shell script #!/bin/ksh cd /CustomFolder/input/Test FILES=$(ls -t *.txt) for f in $FILES do sed -e ' # skip empty and blank lines /./!b /[^[:space:]]/!b s/$/,/ G;s/(.*)(.)/21/ :loop s/(n),/|1/ s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/ s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/ s/(n)~([0-9]*),/2|1/ s/(n)([0-9]*)~,/2|1/ s/(n)~([^~]*)~,/2|1/ /n$/!bloop s/..$// ' $f done exit 0
– Sudheer
May 30 '17 at 19:46
@Sudheer What that means is that there are some fields which are not of the type being handled presently in the code. Providing a diagnostic facility within to help you zero in on the offender.
– user218374
May 30 '17 at 22:21
Hi Rakesh, Thanks much for replying quickly. I incorporated the logic you provided in a Shell script which is supposed to perform the task explained for multiple files in a directory. Following is the shell script and when I execute the code it gets stuck on First file execution without performing anything. Can you please check and provide any suggestions to address the issue?
– Sudheer
May 30 '17 at 19:44
Hi Rakesh, Thanks much for replying quickly. I incorporated the logic you provided in a Shell script which is supposed to perform the task explained for multiple files in a directory. Following is the shell script and when I execute the code it gets stuck on First file execution without performing anything. Can you please check and provide any suggestions to address the issue?
– Sudheer
May 30 '17 at 19:44
Here is the shell script #!/bin/ksh cd /CustomFolder/input/Test FILES=$(ls -t *.txt) for f in $FILES do sed -e ' # skip empty and blank lines /./!b /[^[:space:]]/!b s/$/,/ G;s/(.*)(.)/21/ :loop s/(n),/|1/ s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/ s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/ s/(n)~([0-9]*),/2|1/ s/(n)([0-9]*)~,/2|1/ s/(n)~([^~]*)~,/2|1/ /n$/!bloop s/..$// ' $f done exit 0
– Sudheer
May 30 '17 at 19:46
Here is the shell script #!/bin/ksh cd /CustomFolder/input/Test FILES=$(ls -t *.txt) for f in $FILES do sed -e ' # skip empty and blank lines /./!b /[^[:space:]]/!b s/$/,/ G;s/(.*)(.)/21/ :loop s/(n),/|1/ s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/ s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/ s/(n)~([0-9]*),/2|1/ s/(n)([0-9]*)~,/2|1/ s/(n)~([^~]*)~,/2|1/ /n$/!bloop s/..$// ' $f done exit 0
– Sudheer
May 30 '17 at 19:46
@Sudheer What that means is that there are some fields which are not of the type being handled presently in the code. Providing a diagnostic facility within to help you zero in on the offender.
– user218374
May 30 '17 at 22:21
@Sudheer What that means is that there are some fields which are not of the type being handled presently in the code. Providing a diagnostic facility within to help you zero in on the offender.
– user218374
May 30 '17 at 22:21
add a comment |
up vote
1
down vote
I'd suggest using a dedicated CSV parser such as perl's Text::CSV
perl -MText::CSV -lne '
BEGIN{ $csv = Text::CSV->new({ quote_char => "~" , escape_char => "~" , allow_whitespace => 1}) }
print join "|", $csv->fields() if $csv->parse($_)
' file_name
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043,005,006,007,003,008,016,041,012,029,068,027,028,033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
Thanks for your suggestion but I can't use perl, I need to use standard unix shell scripting or unix commands.
– Sudheer
May 30 '17 at 19:52
add a comment |
up vote
1
down vote
I'd suggest using a dedicated CSV parser such as perl's Text::CSV
perl -MText::CSV -lne '
BEGIN{ $csv = Text::CSV->new({ quote_char => "~" , escape_char => "~" , allow_whitespace => 1}) }
print join "|", $csv->fields() if $csv->parse($_)
' file_name
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043,005,006,007,003,008,016,041,012,029,068,027,028,033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
Thanks for your suggestion but I can't use perl, I need to use standard unix shell scripting or unix commands.
– Sudheer
May 30 '17 at 19:52
add a comment |
up vote
1
down vote
up vote
1
down vote
I'd suggest using a dedicated CSV parser such as perl's Text::CSV
perl -MText::CSV -lne '
BEGIN{ $csv = Text::CSV->new({ quote_char => "~" , escape_char => "~" , allow_whitespace => 1}) }
print join "|", $csv->fields() if $csv->parse($_)
' file_name
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043,005,006,007,003,008,016,041,012,029,068,027,028,033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
I'd suggest using a dedicated CSV parser such as perl's Text::CSV
perl -MText::CSV -lne '
BEGIN{ $csv = Text::CSV->new({ quote_char => "~" , escape_char => "~" , allow_whitespace => 1}) }
print join "|", $csv->fields() if $csv->parse($_)
' file_name
364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043,005,006,007,003,008,016,041,012,029,068,027,028,033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed
answered May 26 '17 at 22:27
steeldriver
33.6k34982
33.6k34982
Thanks for your suggestion but I can't use perl, I need to use standard unix shell scripting or unix commands.
– Sudheer
May 30 '17 at 19:52
add a comment |
Thanks for your suggestion but I can't use perl, I need to use standard unix shell scripting or unix commands.
– Sudheer
May 30 '17 at 19:52
Thanks for your suggestion but I can't use perl, I need to use standard unix shell scripting or unix commands.
– Sudheer
May 30 '17 at 19:52
Thanks for your suggestion but I can't use perl, I need to use standard unix shell scripting or unix commands.
– Sudheer
May 30 '17 at 19:52
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f367488%2freplace-comma-delimited-file-with-pipe-but-not-remove-comma-or-quotes-etc-withi%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
format your question to make it readable
– RomanPerekhrest
May 26 '17 at 21:06
1
How about
tr -d '~' <file | tr -s ',' '|'
, or sed equivalentsed 's/~//g;s/,+/|/g' file
. Is this what you are looking for?– jimmij
May 26 '17 at 21:13
Hi There, I am looking for basically replacing , with | ( but not when these are enclosed within ~abc,#~) then finally remove text qualifier ~ Input file is like: 1234,4826,~Ter1337,#~,~~,~46437uyy"@~ Desired manipulated input file 1234|4826|Ter1337,#||46437uyy"@
– Sudheer
May 30 '17 at 19:56