sed command to swap characters
My input file layout is: mm/dd/yyyy,hh:mm,other fields
I need to format it as: yyyy-mm-dd hh:mm:00,other fields
sample input:
01/02/1998,09:30,0.4571,0.4613,0.4529,0.4592,6042175
01/02/1998,09:45,0.4592,0.4613,0.4529,0.4571,9956023
01/02/1998,10:00,0.4571,0.4613,0.455,0.4613,8939555
01/02/1998,10:15,0.4613,0.4697,0.4571,0.4697,12823627
01/02/1998,10:30,0.4676,0.4969,0.4613,0.4906,28145145
sample output:
1998-01-02 09:30:00,0.4571,0.4613,0.4529,0.4592,6042175
etc...
I tried to use:
sed -r 's/(^[0-9][0-9])(\/[0-9][0-9]\/)(\/[0-9][0-9][0-9][0-9],)/\3\1\2/g'
text-processing sed awk regular-expression perl
edited 2 days ago by Rui F Ribeiro
asked Jun 2 '15 at 11:58 by Karthik Appigatla
6 Answers
6 votes, accepted
sed -e 's/\(..\)\/\(..\)\/\(....\),\(.....\),\(.*\)/\3-\1-\2 \4:00,\5/'
Edited to include the input from the comments below:
sed -e 's#\(..\).\(..\).\(....\),\(.....\),#\3-\1-\2 \4:00,#'
It's not necessary to capture \(.*\) just to add it to the end (\5) of the replacement string.
– glenn jackman
Jun 2 '15 at 14:39
@MikeS - he doesn't need to match slashes - they're matched. He could just use another . (dot).
– mikeserv
Jun 2 '15 at 14:46
This to me looks the simplest, though I like chaos' solution in another answer as well. One minor change I would make: when you need a lot of backslashes in the search pattern, the first character after the 's' can define something else as a separator. E.g.:
sed -e 's#\(..\)/\(..\)/\(....\),\(.....\),\(.*\)#\3-\1-\2 \4:00,\5#'
(thanks @mikeserv: my text was confusing. Fixed)
– Mike S
Jun 2 '15 at 15:46
On second thought, my bad, jhilmer, if you're of the feminine persuasion. Whatever persuades you, it's a good sed script - but glenn's right about your trailing capture - it's not doing you any favors. I would drop it.
– mikeserv
Jun 2 '15 at 16:52
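The separator point raised in these comments can be checked with a small sketch. It assumes a POSIX-style sed and uses one line of the question's sample data; the variable names are only illustrative. Both spellings should give the same result, and the second avoids escaping the slashes:

```shell
line='01/02/1998,09:30,0.4571,0.4613,0.4529,0.4592,6042175'

# Default "/" separator: every literal slash in the pattern must be escaped.
with_slash=$(printf '%s\n' "$line" |
  sed -e 's/\(..\)\/\(..\)\/\(....\),\(.....\),/\3-\1-\2 \4:00,/')

# "#" as separator: the slashes in the date can be written literally.
with_hash=$(printf '%s\n' "$line" |
  sed -e 's#\(..\)/\(..\)/\(....\),\(.....\),#\3-\1-\2 \4:00,#')

printf '%s\n' "$with_slash" "$with_hash"
```

Neither form captures the rest of the line; sed leaves whatever the pattern did not match untouched.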
3 votes
That worked for me:
sed -r 's/([0-9]{2})\/([0-9]{2})\/([0-9]{4}),([0-9:]{5})/\3-\1-\2 \4:00/g'
Match 2 digits (([0-9]{2})), a slash, 2 digits (([0-9]{2})), a slash, 4 digits (([0-9]{4})), a comma, and then five characters that are digits or : (([0-9:]{5})). Replace it with the order you wish: \3-\1-\2 \4:00 (year-month-day hour:minute:00).
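As a hedged variation on the same idea, the sketch below anchors the pattern to the start of the line and spells out the HH:MM shape, so lines that do not look like data pass through unchanged. It assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do); the function name to_iso is hypothetical.

```shell
# Hypothetical helper: rewrite "mm/dd/yyyy,hh:mm," at the start of each line.
to_iso() {
  sed -E 's#^([0-9]{2})/([0-9]{2})/([0-9]{4}),([0-9]{2}:[0-9]{2}),#\3-\1-\2 \4:00,#'
}

# The second line does not match the pattern and is printed as-is.
printf '%s\n' '01/02/1998,09:30,0.4571' 'not a data line' | to_iso
```

Using # as the separator here means the slashes in the date need no escaping.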
I think I prefer the digit-and-count approach to the RE, if only because it's a LOT more intelligible.
– Sobrique
Jun 2 '15 at 15:11
3 votes
sed 'y|/|-|
s/,*\(.....\)-*\([^,]*\)/\2-\1/
s// \1:00/2
' <infile
OUTPUT:
1998-01-02 09:30:00,0.4571,0.4613,0.4529,0.4592,6042175
1998-01-02 09:45:00,0.4592,0.4613,0.4529,0.4571,9956023
1998-01-02 10:00:00,0.4571,0.4613,0.455,0.4613,8939555
1998-01-02 10:15:00,0.4613,0.4697,0.4571,0.4697,12823627
1998-01-02 10:30:00,0.4676,0.4969,0.4613,0.4906,28145145
With sed you don't usually need to try so hard - it often doesn't pay to try to explicitly enumerate the matches you're looking for. Rather, it is usually far simpler just to specify a few landmarks - delimiters - and let a pattern gobble up the interim for you.
Above, sed first uses y/// to translate / characters to - characters. Next it references the first not-comma character in pattern space (provided there are at least 5 characters) and the next four characters as \1, while possibly ignoring a trailing -. It follows that by referencing in \2 as many sequential [^,] not-comma chars as it can before the next comma in pattern space. The result - for the first substitution - is that it puts mm-dd in \1 before matching -, and then yyyy in \2. So we swap those, drop the -, and insert a new one on the other side like:
s/.../\2-\1/
And last we do it again - reusing the same pattern for a different purpose. When I do:
s// \1:00/2
I'm instructing sed to reuse the last regexp (as signified by the // empty regexp), but this time to find the second occurrence of that pattern in pattern space. This time ,* does match a comma - the one separating this field and the last. It also matches HH:MM in \1 and (because that string is immediately followed by a comma) the '' null string in \2. All that remains from there is to replace \1 with itself preceded by a <space> and followed by the :00 string. Both the intervening comma and the null string are edited away.
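To watch the pattern space change, the three commands can be run one at a time. This is only an illustrative sketch on one sample line, assuming a POSIX sed; because each step is a separate sed call, the empty // regexp cannot be reused, so the pattern is written out again in the last step.

```shell
line='01/02/1998,09:30,0.4571,0.4613,0.4529,0.4592,6042175'

# Step 1: translate every / to -.
step1=$(printf '%s\n' "$line" | sed 'y|/|-|')

# Step 2: first match is mm-dd (\1), an optional -, then yyyy (\2); swap them.
step2=$(printf '%s\n' "$step1" | sed 's/,*\(.....\)-*\([^,]*\)/\2-\1/')

# Step 3: same pattern, second occurrence (the trailing 2 flag):
# it matches ",HH:MM" and replaces it with " HH:MM:00".
step3=$(printf '%s\n' "$step2" | sed 's/,*\(.....\)-*\([^,]*\)/ \1:00/2')

printf '%s\n' "$step1" "$step2" "$step3"
```

The first printed line still has the date in mm-dd-yyyy order, the second has it reordered but still followed by a comma, and the third is the final form.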
If you feel you would rather get a little more specific after all, though, consider how much easier it might be with just a little abstraction. The primary benefit offered by regular expressions is that they provide us a means of quickly and efficiently abstracting away a repetitive task given only a clear understanding of what makes it repetitive in the first place.
If constructing your regexp becomes a repetitive task in and of itself, then, well... something's probably missing. One of the advantages of a simple regexp syntax, though, is that it, too, often makes a very good candidate for abstraction - and that is easily achieved.
For example:
d='[0-9][0-9]' T=$d:$d m=$d y=$d$d
sed -E "s|($m/$d)/($y),($T)|\2-\1 \3:00|;s|/|-|"
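To see what that abstraction expands to, the pattern can be built in a variable and printed before it is handed to sed. This is a sketch, assuming a sed that accepts -E; the variable names pattern and out are introduced here only for illustration, and the assignments are put on separate lines to keep the order of evaluation obvious.

```shell
d='[0-9][0-9]'   # two digits
T="$d:$d"        # HH:MM
m=$d             # month
y="$d$d"         # four-digit year

# The assembled extended regexp: (mm/dd)/(yyyy),(HH:MM)
pattern="($m/$d)/($y),($T)"
printf '%s\n' "$pattern"

# Apply it to one sample line; the second command turns the remaining / into -.
out=$(printf '%s\n' '01/02/1998,09:30,0.4571' |
  sed -E "s|$pattern|\2-\1 \3:00|;s|/|-|")
printf '%s\n' "$out"
```

Printing the pattern first is a cheap way to check the expansion when a substitution built from shell variables does not match.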
looks like OP wants a space after the date, not a comma
– glenn jackman
Jun 2 '15 at 13:54
@glennjackman - thanks. I didn't notice that.
– mikeserv
Jun 2 '15 at 14:19
Interesting - a bit like self-modifying code. +1
– Peter.O
Jun 2 '15 at 21:09
2 votes
I would suggest taking a slightly different approach - parse the timestamp, then spit out a formatted timestamp. And I'd use perl for this:
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;
my $input_format = '%m/%d/%Y,%H:%M';
my $output_format = '%Y-%m-%d %H:%M:%S';
while (<>) {
    my ( $date, $time, @stuff ) = split(",");
    my $timestamp = Time::Piece->strptime( "$date,$time", $input_format );
    print join( ",", $timestamp->strftime($output_format), @stuff );
}
Which you can reduce to a one-liner thus (splitting on commas, and using double quotes inside the single-quoted script):
perl -MTime::Piece -lne '($date,$time,@stuff) = split /,/; print join( ",", Time::Piece->strptime( "$date,$time", "%m/%d/%Y,%H:%M" )->strftime("%Y-%m-%d %H:%M:%S"), @stuff );'
Which with your sample data, spits out:
1998-01-02 09:45:00,0.4592,0.4613,0.4529,0.4571,9956023
1998-01-02 10:00:00,0.4571,0.4613,0.455,0.4613,8939555
1998-01-02 10:15:00,0.4613,0.4697,0.4571,0.4697,12823627
1998-01-02 10:30:00,0.4676,0.4969,0.4613,0.4906,28145145
2 votes
A possible awk solution:
awk 'BEGIN { FS = OFS = ","; } { split($1, d, "/"); $2 = d[3] "-" d[1] "-" d[2] " " $2 ":00"; $1 = ""; } { for (i = 2; i < NF; i++) printf("%s", $i OFS); printf("%s", $NF ORS);}' file
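The loop in the command above is there only to skip the emptied first field. A minimal variation, assuming the same comma-separated layout, builds the timestamp in $2 and then strips the first field from the rebuilt record with sub(). The function name to_iso_awk is hypothetical.

```shell
# Hypothetical helper: reads the named files, or stdin when none are given.
to_iso_awk() {
  awk 'BEGIN { FS = OFS = "," }
  {
    split($1, d, "/")                                 # d[1]=mm, d[2]=dd, d[3]=yyyy
    $2 = d[3] "-" d[1] "-" d[2] " " $2 ":00"          # assigning a field rebuilds $0
    sub(/^[^,]*,/, "")                                # drop the old first field and its comma
    print
  }' "$@"
}

printf '%s\n' '01/02/1998,09:30,0.4571,0.4613' | to_iso_awk
```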
1 vote
Use this:
sed -n 's_^\([^/]*\)/\([^/]*\)/\([^,]*\),\([^:]*\):\([^,]*\)_\3-\1-\2 \4:\5:00_p' file.txt
To get the correct date format, it needs -\1-\2, not -\2-\1
– Peter.O
Jun 3 '15 at 7:56
@Peter.O edited..
– heemayl
Jun 3 '15 at 8:03
answered Jun 2 '15 at 12:05 by jhilmer (accepted answer), edited Jun 2 '15 at 21:01
answered Jun 2 '15 at 12:04 by chaos (the [0-9]{2} sed answer)
answered Jun 2 '15 at 12:42 by mikeserv (the y and s sed answer), edited Jun 3 '15 at 3:24
answered Jun 2 '15 at 13:05
Sobrique
up vote
2
down vote
A possible awk solution:
awk 'BEGIN { FS = OFS = ","; } { split($1, d, "/"); $2 = d[3] "-" d[1] "-" d[2] " " $2 ":00"; $1 = ""; } { for (i = 2; i < NF; i++) printf("%s", $i OFS); printf("%s", $NF ORS);}' file
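The printf loop in the command above is there to skip the emptied first field. As a rough alternative sketch (not the answerer's code), the old date field can instead be deleted from the rebuilt record with sub(), after which a plain print is enough. The function name to_iso_awk is made up for illustration.

```shell
# Sketch only: to_iso_awk is a hypothetical helper name.
to_iso_awk() {
  awk 'BEGIN { FS = OFS = "," }
  {
    split($1, d, "/")                          # d[1]=mm, d[2]=dd, d[3]=yyyy
    $2 = d[3] "-" d[1] "-" d[2] " " $2 ":00"   # assigning a field rebuilds $0 with OFS
    sub(/^[^,]*,/, "")                         # drop the old date field and its comma
    print
  }' "$@"
}

printf '%s\n' '01/02/1998,10:15,0.4613,0.4697,0.4571,0.4697,12823627' | to_iso_awk
```

Called with a file name (to_iso_awk file) it reads that file; with no arguments it reads standard input.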
edited Jun 2 '15 at 21:13
answered Jun 2 '15 at 12:18
taliezin
up vote
1
down vote
Use this:
sed -En 's_^([^/]*)/([^/]*)/([^,]*),([^:]*):([^,]*)_\3-\1-\2 \4:\5:00_p' file.txt
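To make the capture groups easier to follow, here is a stricter variant as a sketch. It assumes every field is zero-padded exactly as in the sample input (two-digit month, day, hour and minute, four-digit year), and the function name to_iso is made up for illustration. Because it runs without -n and p, lines that do not match, such as a header row, pass through unchanged rather than being dropped.

```shell
# Sketch only: to_iso is a hypothetical helper name.
to_iso() {
  # Groups: \1 = mm, \2 = dd, \3 = yyyy, \4 = hh:mm
  # '#' is used as the s-command delimiter so the slashes in the date need no escaping.
  sed -E 's#^([0-9]{2})/([0-9]{2})/([0-9]{4}),([0-9]{2}:[0-9]{2}),#\3-\1-\2 \4:00,#' "$@"
}

printf '%s\n' '01/02/1998,09:30,0.4571,0.4613,0.4529,0.4592,6042175' | to_iso
```

The replacement lists the groups in the order year, month, day, which is what turns mm/dd/yyyy into yyyy-mm-dd.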
edited Jun 3 '15 at 8:02
answered Jun 2 '15 at 12:09
heemayl
To get the correct date format, it needs -\1-\2, not -\2-\1
– Peter.O
Jun 3 '15 at 7:56
@Peter.O edited..
– heemayl
Jun 3 '15 at 8:03