bash regex extract key = value
I have a complex string of this form:
inp="key1 = what' ever the value key2 = the value Nb.2 key3= \"last value\""
I need to get the first key associated with its first value. I want to use bash regex to extract the key, the value, and what remains from the string:
rkeyval="[[:space:]]*([_[:alnum:]]*?)[[:space:]]*=[[:space:]]*((.*?)[[:space:]]+([_[:alnum:]]+?[[:space:]]*=[[:space:]]*.*))"
if [[ $inp =~ $rkeyval ]]; then
    key=${BASH_REMATCH[1]}
    val=${BASH_REMATCH[3]}
    left=${BASH_REMATCH[4]}
    for i in $(seq 0 $(( ${#BASH_REMATCH[*]}-1 ))); do
        echo -e "$i: \"${BASH_REMATCH[$i]}\"";
    done;
else
    echo "no match"
fi
This does not work. On my Mac with Bash 4.4, there is no match:
no match
On my Red Hat Linux, I get the following output:
0: "key1 = what' ever the value key2 = the value Nb.2 key3= "last value""
1: "key1"
2: "what' ever the value key2 = the value Nb.2 key3= "last value""
3: "what' ever the value key2 = the value Nb.2 "
4: "key3= "last value""
I expect the following output:
0: "key1 = what' ever the value key2 = the value Nb.2 key3= "last value""
1: "key1"
2: "what' ever the value key2 = the value Nb.2 key3= "last value""
3: "what' ever the value"
4: "key3= "last value""
In other words, the key would be the second matching group, and the value the third.
This expression works on an online PHP regexp tester.
I want this to work in any Unix machine having an updated version of Bash.
I don't know why this does not work, or why the results differ from one platform to another, even though my regex respects the POSIX convention (or does it?). What am I doing wrong here?
bash regular-expression posix
edited Jan 9 '17 at 22:17 by Michael Homer
asked Jan 9 '17 at 18:49 by kaligne
If you "want this to work in any unix machine having an updated version of bash" then you don't want it to work on a Mac, which has Bash from over a decade ago.
– Michael Homer
Jan 9 '17 at 20:09
There is a bit of truth in that statement. However bash supports regex since version 3.something. My mac has bash 4.4.0. Bash regex should work. If really not, at least let's find a general answer for linux!
– kaligne
Jan 9 '17 at 21:35
2 Answers
An asterisk is already an optional count (it can match zero characters), so there is no need to add a ? to it.
So, would it be OK if each pair of parentheses captured a key or a value?
s='[[:space:]]*' # spaces
n='[_[:alnum:]]+' # a valid name (limited by spaces)
e="${s}=${s}" # an equal sign (=).
rkeyval="${s}(${n})${e}([^=]*) (${n})${e}([^=]*) (${n})${e}(.*)"
#            1^^^^^    2^^^^^^ 3^^^^^    4^^^^^^ 5^^^^^    6^^^
echo "$rkeyval"
That will capture like this:
if [[ $inp =~ $rkeyval ]]; then
    i=0
    while ((i<${#BASH_REMATCH[@]})); do
        printf '%s: "%s"\n' "$((i))" "${BASH_REMATCH[i++]}";
    done
else
    echo "no match"
fi
Printing:
0: "key1 = what' ever the value key2 = the value Nb.2 key3= "last value""
1: "key1"
2: "what' ever the value"
3: "key2"
4: "the value Nb.2 "
5: "key3"
6: ""last value""
And the values you want (if I understand your code correctly) could be approximated by (edit to get a perfect match):
key="${BASH_REMATCH[1]}"
val="${BASH_REMATCH[@]:2:3}"
left="${BASH_REMATCH[@]:5:2}"
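To see what the slice expansions in those assignments produce, here is a small sketch. The array m is only a stand-in for BASH_REMATCH (an assumption, so that the snippet runs on its own; its elements are copied from the output printed above), and first_val is a hypothetical name for the case where only the first value is wanted.

```shell
#!/usr/bin/env bash
# Stand-in for BASH_REMATCH after the six-group match (assumed values,
# copied from the printed output above).
m=("whole match" "key1" "what' ever the value" "key2" "the value Nb.2 " "key3" '"last value"')

# "${m[@]:2:3}" expands to elements 2, 3 and 4; in an assignment they are
# joined with single spaces (default IFS).
val="${m[@]:2:3}"
# "${m[@]:5:2}" expands to elements 5 and 6.
left="${m[@]:5:2}"
# A single element, if only the first value is wanted.
first_val=${m[2]}

printf 'val:       "%s"\n' "$val"
printf 'left:      "%s"\n' "$left"
printf 'first_val: "%s"\n' "$first_val"
```

The slices drop the = signs between keys and values, which is why the result is only an approximation of the remaining string.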
answered Jan 10 '17 at 20:17 by sorontar
POSIX does not define *? for EREs, which Bash uses, instead specifying that:
The behavior of multiple adjacent duplication symbols ( '+', '*', '?', and intervals) produces undefined results.
Bash uses the system regcomp/regexec for regular-expression matching. Apple's libc presumably does not implement the behaviour you want for *?.
There is no standard way to recover non-greedy matching semantics from greedy, though in this case at least some of them are unnecessary (the first [_[:alnum:]]*?, for example). Otherwise, you need to transform the expression to match something else or mutate the data in advance (and probably afterwards) to get the effect.
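One way to transform the expression could look like the sketch below. It rests on an assumption that the question does not state: a value never contains an = sign. Under that assumption the greedy [^=]* cannot run past the next key, so no non-greedy operator is needed. The function split_first_pair and the pattern names with_rest and last_only are hypothetical, introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: split off the first "key = value" pair using only greedy
# POSIX ERE operators. Assumption: a value never contains "=".

name='[_[:alnum:]]+'
sp='[[:space:]]'

# Hypothetical helper: sets the globals key, val and left; returns 1 if the
# string does not start with a "key =" pair.
split_first_pair() {
    local str=$1
    # Value followed by another "name =": [^=]* cannot cross the next "=",
    # so the match ends just before the following key.
    local with_rest="^${sp}*(${name})${sp}*=${sp}*([^=]*)${sp}+(${name}${sp}*=.*)\$"
    # Last pair in the string: everything after "=" is the value.
    local last_only="^${sp}*(${name})${sp}*=${sp}*(.*)\$"

    if [[ $str =~ $with_rest ]]; then
        key=${BASH_REMATCH[1]} val=${BASH_REMATCH[2]} left=${BASH_REMATCH[3]}
    elif [[ $str =~ $last_only ]]; then
        key=${BASH_REMATCH[1]} val=${BASH_REMATCH[2]} left=
    else
        return 1
    fi

    # Trim whitespace around the value, so the result does not depend on how
    # a given regex library divides spaces between adjacent sub-patterns.
    val=${val#"${val%%[![:space:]]*}"}
    val=${val%"${val##*[![:space:]]}"}
}

inp="key1 = what' ever the value key2 = the value Nb.2 key3= \"last value\""
if split_first_pair "$inp"; then
    printf 'key:  "%s"\nval:  "%s"\nleft: "%s"\n' "$key" "$val" "$left"
else
    echo "no match"
fi
```

Calling the helper again on $left would peel off the next pair, and it returns 1 once the remainder is empty, so it can also drive a while loop over all pairs.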
edited 39 mins ago by Pang
answered Jan 9 '17 at 22:12 by Michael Homer