Replace comma delimited file with pipe, but not remove comma or quotes etc within text qualifier field,...

up vote
0
down vote

favorite

My file is a comma delimited file and text qualifier is ~, but my requirement is to find and replace comma delimited file with |(pipe) delimited file and remove text qualifier ~ with nothing, however,
I should not remove quote or double quotes or any special character within the data present in text qualifier. eg: ~abc",~ I need it as abc",

Below is the content of my source file and the way i expect the output or manipulated file to be.

Source file:

364034,2015652205,26722,2015,4,~C25753-4~,~TC25753,~,~2WD Double Cab 144.2" SLT,~,~Y~,40506.16,43555.00,1095.00,~043,005,006,007,003,008,016,041,012,029,068,027,028,033~,3,~2WD Double Cab 144.2"~,~SLT~,6,4,~N~,~S~,~N~,~S~,~N~,~N~,~N~,~~,~ ~,~Confirmed~,~w2015k65m22t5~,~Sierra 2500HD~,~Double Cab Standard Box 2-Wheel Drive SLT~,~Rear Wheel Drive~,~Extended Cab Pickup - Standard Bed~

After cleansing I need the file like this:

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

I tried sed -i -e with multiple options but the output is not 100% right.

I tried following but it's not giving right result that i wanted

sed -i -e 's/,~/|/g' file_name

sed -i -e 's/~,/|/g' file_name

sed -i -e 's/~//g' file_name

sed -i -e 's/([0-9],[0-9])/|/g' file_name

sed -i -e 's/r//g' file_name

edited 2 days ago

Rui F Ribeiro

38.2k1475123

asked May 26 '17 at 20:40

Sudheer

format your question to make it readable
– RomanPerekhrest
May 26 '17 at 21:06

1

How about tr -d '~' <file | tr -s ',' '|', or sed equivalent sed 's/~//g;s/,+/|/g' file. Is this what you are looking for?
– jimmij
May 26 '17 at 21:13

Hi There, I am looking for basically replacing , with | ( but not when these are enclosed within ~abc,#~) then finally remove text qualifier ~ Input file is like: 1234,4826,~Ter1337,#~,~~,~46437uyy"@~ Desired manipulated input file 1234|4826|Ter1337,#||46437uyy"@
– Sudheer
May 30 '17 at 19:56

add a comment |

up vote
0
down vote

favorite

Below is the content of my source file and the way i expect the output or manipulated file to be.

Source file:

364034,2015652205,26722,2015,4,~C25753-4~,~TC25753,~,~2WD Double Cab 144.2" SLT,~,~Y~,40506.16,43555.00,1095.00,~043,005,006,007,003,008,016,041,012,029,068,027,028,033~,3,~2WD Double Cab 144.2"~,~SLT~,6,4,~N~,~S~,~N~,~S~,~N~,~N~,~N~,~~,~ ~,~Confirmed~,~w2015k65m22t5~,~Sierra 2500HD~,~Double Cab Standard Box 2-Wheel Drive SLT~,~Rear Wheel Drive~,~Extended Cab Pickup - Standard Bed~

After cleansing I need the file like this:

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

I tried sed -i -e with multiple options but the output is not 100% right.

I tried following but it's not giving right result that i wanted

sed -i -e 's/,~/|/g' file_name

sed -i -e 's/~,/|/g' file_name

sed -i -e 's/~//g' file_name

sed -i -e 's/([0-9],[0-9])/|/g' file_name

sed -i -e 's/r//g' file_name

edited 2 days ago

Rui F Ribeiro

38.2k1475123

asked May 26 '17 at 20:40

Sudheer

format your question to make it readable
– RomanPerekhrest
May 26 '17 at 21:06

1

How about tr -d '~' <file | tr -s ',' '|', or sed equivalent sed 's/~//g;s/,+/|/g' file. Is this what you are looking for?
– jimmij
May 26 '17 at 21:13

Hi There, I am looking for basically replacing , with | ( but not when these are enclosed within ~abc,#~) then finally remove text qualifier ~ Input file is like: 1234,4826,~Ter1337,#~,~~,~46437uyy"@~ Desired manipulated input file 1234|4826|Ter1337,#||46437uyy"@
– Sudheer
May 30 '17 at 19:56

add a comment |

up vote
0
down vote

favorite

Below is the content of my source file and the way i expect the output or manipulated file to be.

Source file:

364034,2015652205,26722,2015,4,~C25753-4~,~TC25753,~,~2WD Double Cab 144.2" SLT,~,~Y~,40506.16,43555.00,1095.00,~043,005,006,007,003,008,016,041,012,029,068,027,028,033~,3,~2WD Double Cab 144.2"~,~SLT~,6,4,~N~,~S~,~N~,~S~,~N~,~N~,~N~,~~,~ ~,~Confirmed~,~w2015k65m22t5~,~Sierra 2500HD~,~Double Cab Standard Box 2-Wheel Drive SLT~,~Rear Wheel Drive~,~Extended Cab Pickup - Standard Bed~

After cleansing I need the file like this:

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

I tried sed -i -e with multiple options but the output is not 100% right.

I tried following but it's not giving right result that i wanted

sed -i -e 's/,~/|/g' file_name

sed -i -e 's/~,/|/g' file_name

sed -i -e 's/~//g' file_name

sed -i -e 's/([0-9],[0-9])/|/g' file_name

sed -i -e 's/r//g' file_name

edited 2 days ago

Rui F Ribeiro

38.2k1475123

asked May 26 '17 at 20:40

Sudheer

Below is the content of my source file and the way i expect the output or manipulated file to be.

Source file:

364034,2015652205,26722,2015,4,~C25753-4~,~TC25753,~,~2WD Double Cab 144.2" SLT,~,~Y~,40506.16,43555.00,1095.00,~043,005,006,007,003,008,016,041,012,029,068,027,028,033~,3,~2WD Double Cab 144.2"~,~SLT~,6,4,~N~,~S~,~N~,~S~,~N~,~N~,~N~,~~,~ ~,~Confirmed~,~w2015k65m22t5~,~Sierra 2500HD~,~Double Cab Standard Box 2-Wheel Drive SLT~,~Rear Wheel Drive~,~Extended Cab Pickup - Standard Bed~

After cleansing I need the file like this:

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

I tried sed -i -e with multiple options but the output is not 100% right.

I tried following but it's not giving right result that i wanted

sed -i -e 's/,~/|/g' file_name

sed -i -e 's/~,/|/g' file_name

sed -i -e 's/~//g' file_name

sed -i -e 's/([0-9],[0-9])/|/g' file_name

sed -i -e 's/r//g' file_name

shell-script sed regular-expression replace

edited 2 days ago

Rui F Ribeiro

38.2k1475123

asked May 26 '17 at 20:40

Sudheer

edited 2 days ago

Rui F Ribeiro

38.2k1475123

asked May 26 '17 at 20:40

Sudheer

edited 2 days ago

Rui F Ribeiro

38.2k1475123

edited 2 days ago

Rui F Ribeiro

38.2k1475123

edited 2 days ago

Rui F Ribeiro

38.2k1475123

asked May 26 '17 at 20:40

Sudheer

asked May 26 '17 at 20:40

Sudheer

asked May 26 '17 at 20:40

Sudheer

format your question to make it readable
– RomanPerekhrest
May 26 '17 at 21:06

1

How about tr -d '~' <file | tr -s ',' '|', or sed equivalent sed 's/~//g;s/,+/|/g' file. Is this what you are looking for?
– jimmij
May 26 '17 at 21:13

Hi There, I am looking for basically replacing , with | ( but not when these are enclosed within ~abc,#~) then finally remove text qualifier ~ Input file is like: 1234,4826,~Ter1337,#~,~~,~46437uyy"@~ Desired manipulated input file 1234|4826|Ter1337,#||46437uyy"@
– Sudheer
May 30 '17 at 19:56

add a comment |

format your question to make it readable
– RomanPerekhrest
May 26 '17 at 21:06

1

How about tr -d '~' <file | tr -s ',' '|', or sed equivalent sed 's/~//g;s/,+/|/g' file. Is this what you are looking for?
– jimmij
May 26 '17 at 21:13

Hi There, I am looking for basically replacing , with | ( but not when these are enclosed within ~abc,#~) then finally remove text qualifier ~ Input file is like: 1234,4826,~Ter1337,#~,~~,~46437uyy"@~ Desired manipulated input file 1234|4826|Ter1337,#||46437uyy"@
– Sudheer
May 30 '17 at 19:56

format your question to make it readable
– RomanPerekhrest
May 26 '17 at 21:06

How about tr -d '~' <file | tr -s ',' '|', or sed equivalent sed 's/~//g;s/,+/|/g' file. Is this what you are looking for?
– jimmij
May 26 '17 at 21:13

Hi There, I am looking for basically replacing , with | ( but not when these are enclosed within ~abc,#~) then finally remove text qualifier ~ Input file is like: 1234,4826,~Ter1337,#~,~~,~46437uyy"@~ Desired manipulated input file 1234|4826|Ter1337,#||46437uyy"@
– Sudheer
May 30 '17 at 19:56

add a comment |

2 Answers
2

active

oldest

votes

up vote
2
down vote

ESC=$(printf '33')

RED="${ESC}[0;31m"

 NC="${ESC}[0m"



sed -e '

   /./!b

   /[^[:space:]]/!b



   s/.*/

&,/



   :loop

      h

      s/(n),/|1/;                                                  # An empty field

      s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/;                   # +-.NNN

      s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/;  # +-NNN.MMM +-NNN. +-NNN

      s/(n)~([0-9][0-9]*),/2|1/;                                # ~NNN

      s/(n)([0-9][0-9]*)~,/2|1/;                                # NNN~

      s/(n)~([^~]*)~,/2|1/;                                     # ~...~

      x;G

      /^(.*)n1$/{

         g;'"s/n([^,]*)/${RED}1${NC}/"'

         i

***'"${RED}ERROR${NC}"'*** Unable to process the field shown colored.



Cause of error: What this means is that this particular field is not 



           Fix: You should add to the sed code in the :loop label to 

                digest the able to be processed by the sed code as it stands.



The record with the offending field shown colored red:



         q

      }

      g; # all clear: recover and carry on...

   /n$/!bloop



   s/..$//

' csv.data

Working

We base the solution on the various types of fields.

Skip empty or blank lines.

Append a "," to simplify regex used, we'll take it away at the end.

To set the ball rolling we place a marker, n, at the beginning of line. This marker will travel from left to right, jumping a field processed a time.

The action begins in the do-while loop, in the body of which, we process a field at a time. The field beginning is signaled by n and we process the various varieties of fields that can occur. Everytime, we bring the field processed away to the left of n and replace the , by a |.

The looping stops when the marker n hits the end of line /n$/ and we then take away the marker as well as the dummy , we had placed in the beginning.

Results

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

edited May 31 '17 at 14:47

answered May 27 '17 at 9:49

user218374

Hi Rakesh, Thanks much for replying quickly. I incorporated the logic you provided in a Shell script which is supposed to perform the task explained for multiple files in a directory. Following is the shell script and when I execute the code it gets stuck on First file execution without performing anything. Can you please check and provide any suggestions to address the issue?
– Sudheer
May 30 '17 at 19:44

Here is the shell script #!/bin/ksh cd /CustomFolder/input/Test FILES=$(ls -t *.txt) for f in $FILES do sed -e ' # skip empty and blank lines /./!b /[^[:space:]]/!b s/$/,/ G;s/(.*)(.)/21/ :loop s/(n),/|1/ s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/ s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/ s/(n)~([0-9]*),/2|1/ s/(n)([0-9]*)~,/2|1/ s/(n)~([^~]*)~,/2|1/ /n$/!bloop s/..$// ' $f done exit 0
– Sudheer
May 30 '17 at 19:46

@Sudheer What that means is that there are some fields which are not of the type being handled presently in the code. Providing a diagnostic facility within to help you zero in on the offender.
– user218374
May 30 '17 at 22:21

add a comment |

up vote
1
down vote

I'd suggest using a dedicated CSV parser such as perl's Text::CSV

perl -MText::CSV -lne '

    BEGIN{ $csv = Text::CSV->new({ quote_char => "~" , escape_char => "~" , allow_whitespace => 1}) } 

    print join "|", $csv->fields() if $csv->parse($_)

  ' file_name

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043,005,006,007,003,008,016,041,012,029,068,027,028,033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

answered May 26 '17 at 22:27

steeldriver

33.6k34982

Thanks for your suggestion but I can't use perl, I need to use standard unix shell scripting or unix commands.
– Sudheer
May 30 '17 at 19:52

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f367488%2freplace-comma-delimited-file-with-pipe-but-not-remove-comma-or-quotes-etc-withi%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
2
down vote

ESC=$(printf '33')

RED="${ESC}[0;31m"

 NC="${ESC}[0m"



sed -e '

   /./!b

   /[^[:space:]]/!b



   s/.*/

&,/



   :loop

      h

      s/(n),/|1/;                                                  # An empty field

      s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/;                   # +-.NNN

      s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/;  # +-NNN.MMM +-NNN. +-NNN

      s/(n)~([0-9][0-9]*),/2|1/;                                # ~NNN

      s/(n)([0-9][0-9]*)~,/2|1/;                                # NNN~

      s/(n)~([^~]*)~,/2|1/;                                     # ~...~

      x;G

      /^(.*)n1$/{

         g;'"s/n([^,]*)/${RED}1${NC}/"'

         i

***'"${RED}ERROR${NC}"'*** Unable to process the field shown colored.



Cause of error: What this means is that this particular field is not 



           Fix: You should add to the sed code in the :loop label to 

                digest the able to be processed by the sed code as it stands.



The record with the offending field shown colored red:



         q

      }

      g; # all clear: recover and carry on...

   /n$/!bloop



   s/..$//

' csv.data

Working

We base the solution on the various types of fields.

Skip empty or blank lines.

Append a "," to simplify regex used, we'll take it away at the end.

To set the ball rolling we place a marker, n, at the beginning of line. This marker will travel from left to right, jumping a field processed a time.

The action begins in the do-while loop, in the body of which, we process a field at a time. The field beginning is signaled by n and we process the various varieties of fields that can occur. Everytime, we bring the field processed away to the left of n and replace the , by a |.

The looping stops when the marker n hits the end of line /n$/ and we then take away the marker as well as the dummy , we had placed in the beginning.

Results

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

edited May 31 '17 at 14:47

answered May 27 '17 at 9:49

user218374

Hi Rakesh, Thanks much for replying quickly. I incorporated the logic you provided in a Shell script which is supposed to perform the task explained for multiple files in a directory. Following is the shell script and when I execute the code it gets stuck on First file execution without performing anything. Can you please check and provide any suggestions to address the issue?
– Sudheer
May 30 '17 at 19:44

Here is the shell script #!/bin/ksh cd /CustomFolder/input/Test FILES=$(ls -t *.txt) for f in $FILES do sed -e ' # skip empty and blank lines /./!b /[^[:space:]]/!b s/$/,/ G;s/(.*)(.)/21/ :loop s/(n),/|1/ s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/ s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/ s/(n)~([0-9]*),/2|1/ s/(n)([0-9]*)~,/2|1/ s/(n)~([^~]*)~,/2|1/ /n$/!bloop s/..$// ' $f done exit 0
– Sudheer
May 30 '17 at 19:46

@Sudheer What that means is that there are some fields which are not of the type being handled presently in the code. Providing a diagnostic facility within to help you zero in on the offender.
– user218374
May 30 '17 at 22:21

add a comment |

up vote
2
down vote

ESC=$(printf '33')

RED="${ESC}[0;31m"

 NC="${ESC}[0m"



sed -e '

   /./!b

   /[^[:space:]]/!b



   s/.*/

&,/



   :loop

      h

      s/(n),/|1/;                                                  # An empty field

      s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/;                   # +-.NNN

      s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/;  # +-NNN.MMM +-NNN. +-NNN

      s/(n)~([0-9][0-9]*),/2|1/;                                # ~NNN

      s/(n)([0-9][0-9]*)~,/2|1/;                                # NNN~

      s/(n)~([^~]*)~,/2|1/;                                     # ~...~

      x;G

      /^(.*)n1$/{

         g;'"s/n([^,]*)/${RED}1${NC}/"'

         i

***'"${RED}ERROR${NC}"'*** Unable to process the field shown colored.



Cause of error: What this means is that this particular field is not 



           Fix: You should add to the sed code in the :loop label to 

                digest the able to be processed by the sed code as it stands.



The record with the offending field shown colored red:



         q

      }

      g; # all clear: recover and carry on...

   /n$/!bloop



   s/..$//

' csv.data

Working

We base the solution on the various types of fields.

Skip empty or blank lines.

Append a "," to simplify regex used, we'll take it away at the end.

To set the ball rolling we place a marker, n, at the beginning of line. This marker will travel from left to right, jumping a field processed a time.

The action begins in the do-while loop, in the body of which, we process a field at a time. The field beginning is signaled by n and we process the various varieties of fields that can occur. Everytime, we bring the field processed away to the left of n and replace the , by a |.

The looping stops when the marker n hits the end of line /n$/ and we then take away the marker as well as the dummy , we had placed in the beginning.

Results

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

edited May 31 '17 at 14:47

answered May 27 '17 at 9:49

user218374

Hi Rakesh, Thanks much for replying quickly. I incorporated the logic you provided in a Shell script which is supposed to perform the task explained for multiple files in a directory. Following is the shell script and when I execute the code it gets stuck on First file execution without performing anything. Can you please check and provide any suggestions to address the issue?
– Sudheer
May 30 '17 at 19:44

Here is the shell script #!/bin/ksh cd /CustomFolder/input/Test FILES=$(ls -t *.txt) for f in $FILES do sed -e ' # skip empty and blank lines /./!b /[^[:space:]]/!b s/$/,/ G;s/(.*)(.)/21/ :loop s/(n),/|1/ s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/ s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/ s/(n)~([0-9]*),/2|1/ s/(n)([0-9]*)~,/2|1/ s/(n)~([^~]*)~,/2|1/ /n$/!bloop s/..$// ' $f done exit 0
– Sudheer
May 30 '17 at 19:46

@Sudheer What that means is that there are some fields which are not of the type being handled presently in the code. Providing a diagnostic facility within to help you zero in on the offender.
– user218374
May 30 '17 at 22:21

add a comment |

up vote
2
down vote

ESC=$(printf '33')

RED="${ESC}[0;31m"

 NC="${ESC}[0m"



sed -e '

   /./!b

   /[^[:space:]]/!b



   s/.*/

&,/



   :loop

      h

      s/(n),/|1/;                                                  # An empty field

      s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/;                   # +-.NNN

      s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/;  # +-NNN.MMM +-NNN. +-NNN

      s/(n)~([0-9][0-9]*),/2|1/;                                # ~NNN

      s/(n)([0-9][0-9]*)~,/2|1/;                                # NNN~

      s/(n)~([^~]*)~,/2|1/;                                     # ~...~

      x;G

      /^(.*)n1$/{

         g;'"s/n([^,]*)/${RED}1${NC}/"'

         i

***'"${RED}ERROR${NC}"'*** Unable to process the field shown colored.



Cause of error: What this means is that this particular field is not 



           Fix: You should add to the sed code in the :loop label to 

                digest the able to be processed by the sed code as it stands.



The record with the offending field shown colored red:



         q

      }

      g; # all clear: recover and carry on...

   /n$/!bloop



   s/..$//

' csv.data

Working

We base the solution on the various types of fields.

Skip empty or blank lines.

Append a "," to simplify regex used, we'll take it away at the end.

To set the ball rolling we place a marker, n, at the beginning of line. This marker will travel from left to right, jumping a field processed a time.

The action begins in the do-while loop, in the body of which, we process a field at a time. The field beginning is signaled by n and we process the various varieties of fields that can occur. Everytime, we bring the field processed away to the left of n and replace the , by a |.

The looping stops when the marker n hits the end of line /n$/ and we then take away the marker as well as the dummy , we had placed in the beginning.

Results

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

edited May 31 '17 at 14:47

answered May 27 '17 at 9:49

user218374

ESC=$(printf '33')

RED="${ESC}[0;31m"

 NC="${ESC}[0m"



sed -e '

   /./!b

   /[^[:space:]]/!b



   s/.*/

&,/



   :loop

      h

      s/(n),/|1/;                                                  # An empty field

      s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/;                   # +-.NNN

      s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/;  # +-NNN.MMM +-NNN. +-NNN

      s/(n)~([0-9][0-9]*),/2|1/;                                # ~NNN

      s/(n)([0-9][0-9]*)~,/2|1/;                                # NNN~

      s/(n)~([^~]*)~,/2|1/;                                     # ~...~

      x;G

      /^(.*)n1$/{

         g;'"s/n([^,]*)/${RED}1${NC}/"'

         i

***'"${RED}ERROR${NC}"'*** Unable to process the field shown colored.



Cause of error: What this means is that this particular field is not 



           Fix: You should add to the sed code in the :loop label to 

                digest the able to be processed by the sed code as it stands.



The record with the offending field shown colored red:



         q

      }

      g; # all clear: recover and carry on...

   /n$/!bloop



   s/..$//

' csv.data

Working

We base the solution on the various types of fields.

Skip empty or blank lines.

Append a "," to simplify regex used, we'll take it away at the end.

To set the ball rolling we place a marker, n, at the beginning of line. This marker will travel from left to right, jumping a field processed a time.

The action begins in the do-while loop, in the body of which, we process a field at a time. The field beginning is signaled by n and we process the various varieties of fields that can occur. Everytime, we bring the field processed away to the left of n and replace the , by a |.

The looping stops when the marker n hits the end of line /n$/ and we then take away the marker as well as the dummy , we had placed in the beginning.

Results

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043|005|006|007|003|008|016|041|012|029|068|027|028|033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

edited May 31 '17 at 14:47

answered May 27 '17 at 9:49

user218374

edited May 31 '17 at 14:47

answered May 27 '17 at 9:49

user218374

answered May 27 '17 at 9:49

user218374

answered May 27 '17 at 9:49

user218374

Hi Rakesh, Thanks much for replying quickly. I incorporated the logic you provided in a Shell script which is supposed to perform the task explained for multiple files in a directory. Following is the shell script and when I execute the code it gets stuck on First file execution without performing anything. Can you please check and provide any suggestions to address the issue?
– Sudheer
May 30 '17 at 19:44

Here is the shell script #!/bin/ksh cd /CustomFolder/input/Test FILES=$(ls -t *.txt) for f in $FILES do sed -e ' # skip empty and blank lines /./!b /[^[:space:]]/!b s/$/,/ G;s/(.*)(.)/21/ :loop s/(n),/|1/ s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/ s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/ s/(n)~([0-9]*),/2|1/ s/(n)([0-9]*)~,/2|1/ s/(n)~([^~]*)~,/2|1/ /n$/!bloop s/..$// ' $f done exit 0
– Sudheer
May 30 '17 at 19:46

@Sudheer What that means is that there are some fields which are not of the type being handled presently in the code. Providing a diagnostic facility within to help you zero in on the offender.
– user218374
May 30 '17 at 22:21

add a comment |

Hi Rakesh, Thanks much for replying quickly. I incorporated the logic you provided in a Shell script which is supposed to perform the task explained for multiple files in a directory. Following is the shell script and when I execute the code it gets stuck on First file execution without performing anything. Can you please check and provide any suggestions to address the issue?
– Sudheer
May 30 '17 at 19:44

Here is the shell script #!/bin/ksh cd /CustomFolder/input/Test FILES=$(ls -t *.txt) for f in $FILES do sed -e ' # skip empty and blank lines /./!b /[^[:space:]]/!b s/$/,/ G;s/(.*)(.)/21/ :loop s/(n),/|1/ s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/ s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/ s/(n)~([0-9]*),/2|1/ s/(n)([0-9]*)~,/2|1/ s/(n)~([^~]*)~,/2|1/ /n$/!bloop s/..$// ' $f done exit 0
– Sudheer
May 30 '17 at 19:46

@Sudheer What that means is that there are some fields which are not of the type being handled presently in the code. Providing a diagnostic facility within to help you zero in on the offender.
– user218374
May 30 '17 at 22:21

Hi Rakesh, Thanks much for replying quickly. I incorporated the logic you provided in a Shell script which is supposed to perform the task explained for multiple files in a directory. Following is the shell script and when I execute the code it gets stuck on First file execution without performing anything. Can you please check and provide any suggestions to address the issue?
– Sudheer
May 30 '17 at 19:44

Here is the shell script #!/bin/ksh cd /CustomFolder/input/Test FILES=$(ls -t *.txt) for f in $FILES do sed -e ' # skip empty and blank lines /./!b /[^[:space:]]/!b s/$/,/ G;s/(.*)(.)/21/ :loop s/(n),/|1/ s/(n)([+-]{0,1}[.][0-9]{1,}),/2|1/ s/(n)([+-]{0,1}[0-9]{1,}([.][0-9]*){0,1}),/2|1/ s/(n)~([0-9]*),/2|1/ s/(n)([0-9]*)~,/2|1/ s/(n)~([^~]*)~,/2|1/ /n$/!bloop s/..$// ' $f done exit 0
– Sudheer
May 30 '17 at 19:46

@Sudheer What that means is that there are some fields which are not of the type being handled presently in the code. Providing a diagnostic facility within to help you zero in on the offender.
– user218374
May 30 '17 at 22:21

add a comment |

up vote
1
down vote

I'd suggest using a dedicated CSV parser such as perl's Text::CSV

perl -MText::CSV -lne '

    BEGIN{ $csv = Text::CSV->new({ quote_char => "~" , escape_char => "~" , allow_whitespace => 1}) } 

    print join "|", $csv->fields() if $csv->parse($_)

  ' file_name

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043,005,006,007,003,008,016,041,012,029,068,027,028,033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

answered May 26 '17 at 22:27

steeldriver

33.6k34982

Thanks for your suggestion but I can't use perl, I need to use standard unix shell scripting or unix commands.
– Sudheer
May 30 '17 at 19:52

add a comment |

up vote
1
down vote

I'd suggest using a dedicated CSV parser such as perl's Text::CSV

perl -MText::CSV -lne '

    BEGIN{ $csv = Text::CSV->new({ quote_char => "~" , escape_char => "~" , allow_whitespace => 1}) } 

    print join "|", $csv->fields() if $csv->parse($_)

  ' file_name

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043,005,006,007,003,008,016,041,012,029,068,027,028,033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

answered May 26 '17 at 22:27

steeldriver

33.6k34982

Thanks for your suggestion but I can't use perl, I need to use standard unix shell scripting or unix commands.
– Sudheer
May 30 '17 at 19:52

add a comment |

up vote
1
down vote

I'd suggest using a dedicated CSV parser such as perl's Text::CSV

perl -MText::CSV -lne '

    BEGIN{ $csv = Text::CSV->new({ quote_char => "~" , escape_char => "~" , allow_whitespace => 1}) } 

    print join "|", $csv->fields() if $csv->parse($_)

  ' file_name

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043,005,006,007,003,008,016,041,012,029,068,027,028,033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

answered May 26 '17 at 22:27

steeldriver

33.6k34982

I'd suggest using a dedicated CSV parser such as perl's Text::CSV

perl -MText::CSV -lne '

    BEGIN{ $csv = Text::CSV->new({ quote_char => "~" , escape_char => "~" , allow_whitespace => 1}) } 

    print join "|", $csv->fields() if $csv->parse($_)

  ' file_name

364034|2015652205|26722|2015|4|C25753-4|TC25753,|2WD Double Cab 144.2" SLT,|Y|40506.16|43555.00|1095.00|043,005,006,007,003,008,016,041,012,029,068,027,028,033|3|2WD Double Cab 144.2"|SLT|6|4|N|S|N|S|N|N|N|| |Confirmed|w2015k65m22t5|Sierra 2500HD|Double Cab Standard Box 2-Wheel Drive SLT|Rear Wheel Drive|Extended Cab Pickup - Standard Bed

answered May 26 '17 at 22:27

steeldriver

33.6k34982

answered May 26 '17 at 22:27

steeldriver

33.6k34982

answered May 26 '17 at 22:27

steeldriver

33.6k34982

answered May 26 '17 at 22:27

steeldriver

33.6k34982

Thanks for your suggestion but I can't use perl, I need to use standard unix shell scripting or unix commands.
– Sudheer
May 30 '17 at 19:52

add a comment |

Thanks for your suggestion but I can't use perl, I need to use standard unix shell scripting or unix commands.
– Sudheer
May 30 '17 at 19:52

Thanks for your suggestion but I can't use perl, I need to use standard unix shell scripting or unix commands.
– Sudheer
May 30 '17 at 19:52

add a comment |

draft saved

draft discarded

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Sstrhsrtj