bash regex extract key = value



























I have a complex string of this form:



inp="key1 =   what' ever the value key2 = the value Nb.2   key3= \"last value\""


I need to get the first key together with its value. I want to use a Bash regex to extract the key, the value, and whatever remains of the string:



rkeyval="[[:space:]]*([_[:alnum:]]*?)[[:space:]]*=[[:space:]]*((.*?)[[:space:]]+([_[:alnum:]]+?[[:space:]]*=[[:space:]]*.*))"

if [[ $inp =~ $rkeyval ]]; then
    key=${BASH_REMATCH[1]}
    val=${BASH_REMATCH[3]}
    left=${BASH_REMATCH[4]}

    # print every capture group, wrapped in quotes
    for i in $(seq 0 $(( ${#BASH_REMATCH[*]}-1 ))); do
        echo "$i: \"${BASH_REMATCH[$i]}\""
    done
else
    echo "no match"
fi


This does not work. On my Mac with Bash 4.4, there is no match:



no match


On my Red Hat Linux, I get the following output:



0: "key1 =   what' ever the value key2 = the value Nb.2   key3= "last value""
1: "key1"
2: "what' ever the value key2 = the value Nb.2 key3= "last value""
3: "what' ever the value key2 = the value Nb.2 "
4: "key3= "last value""


I expect the following output:



0: "key1 =   what' ever the value key2 = the value Nb.2   key3= "last value""
1: "key1"
2: "what' ever the value key2 = the value Nb.2 key3= "last value""
3: "what' ever the value"
4: "key3= "last value""


In other words, the key should end up in BASH_REMATCH[1] and the value in BASH_REMATCH[3].



This expression works on an online PHP regexp tester.



I want this to work in any Unix machine having an updated version of Bash.



I don't understand why this does not work, or why the results differ from one platform to another, even though my regex follows the POSIX conventions (or does it?). What am I doing wrong?










  • If you "want this to work in any unix machine having an updated version of bash" then you don't want it to work on a Mac, which has Bash from over a decade ago.

    – Michael Homer
    Jan 9 '17 at 20:09











  • There is a bit of truth in that statement. However, bash has supported regexes since version 3.something, and my Mac has bash 4.4.0, so Bash regexes should work. If not, let's at least find a general answer for Linux!

    – kaligne
    Jan 9 '17 at 21:35
















bash regular-expression posix






Question asked Jan 9 '17 at 18:49 by kaligne; edited Jan 9 '17 at 22:17 by Michael Homer.













2 Answers
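An asterisk already allows a count of zero (it can match zero characters), so there is no need to add a ? to make it optional. Note also that POSIX EREs, which Bash uses, do not define *? as a lazy quantifier the way PHP's PCRE does.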



So, would it be OK if each pair of parentheses captured a single key or a single value?



s='[[:space:]]*'      # optional whitespace
n='[_[:alnum:]]+'     # a valid name (delimited by spaces)
e="${s}=${s}"         # an equal sign (=) with optional whitespace around it

rkeyval="${s}(${n})${e}([^=]*) (${n})${e}([^=]*) (${n})${e}(.*)"
# groups: 1=key1  2=value1  3=key2  4=value2  5=key3  6=rest of the string
echo "$rkeyval"


It captures the input like this:



if [[ $inp =~ $rkeyval ]]; then
    i=0
    while ((i<${#BASH_REMATCH[@]})); do
        # arguments expand left to right, so $((i)) is read before i++ advances it
        printf '%s: "%s"\n' "$((i))" "${BASH_REMATCH[i++]}"
    done
else
    echo "no match"
fi


Printing:



0: "key1 =   what' ever the value key2 = the value Nb.2   key3= "last value""
1: "key1"
2: "what' ever the value"
3: "key2"
4: "the value Nb.2 "
5: "key3"
6: ""last value""


The values you want (if I understand your code correctly) can be approximated as follows (adjust them to get an exact match):



key="${BASH_REMATCH[1]}"
val="${BASH_REMATCH[@]:2:3}"
left="${BASH_REMATCH[@]:5:2}"
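The regex above hard-codes exactly three pairs. If the number of pairs varies, one possible approach (a sketch, not part of this answer) is to match only the first pair plus an optional remainder, then loop on the remainder. It assumes Bash 4+ (for `declare -A`), keys matching `[_[:alnum:]]+`, and values that are non-empty and contain no `=`; the names `kv`, `order`, and `rest` are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: split "k1 = v1 k2 = v2 ..." into pairs without *? or +?.
# Assumptions: Bash 4+, keys match [_[:alnum:]]+, values are non-empty
# and contain no '='. kv, order and rest are illustrative names.
inp="key1 =   what' ever the value key2 = the value Nb.2   key3= \"last value\""

name='[_[:alnum:]]+'
# A value starts and ends with a non-space character and never contains '='.
value='[^=[:space:]]([^=]*[^=[:space:]])?'
# Groups: 1=key, 2=value, 3=inner part of value, 4=optional tail, 5=remaining pairs.
re="^[[:space:]]*(${name})[[:space:]]*=[[:space:]]*(${value})([[:space:]]+(${name}[[:space:]]*=.*))?[[:space:]]*$"

declare -A kv=()
order=()
rest=$inp
while [[ -n $rest && $rest =~ $re ]]; do
    kv["${BASH_REMATCH[1]}"]=${BASH_REMATCH[2]}
    order+=("${BASH_REMATCH[1]}")
    rest=${BASH_REMATCH[5]}   # empty once the last pair has been consumed
done

for k in "${order[@]}"; do
    printf '%s -> [%s]\n' "$k" "${kv[$k]}"
done
```

Because a value can never contain `=`, each iteration is forced to stop right before the whitespace that precedes the next `name =`, so no lazy quantifier is needed. With the sample input this prints `key1 -> [what' ever the value]`, `key2 -> [the value Nb.2]` and `key3 -> ["last value"]`.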





    POSIX does not define *? for EREs (the extended regular expressions that Bash uses); instead, it specifies that:




    The behavior of multiple adjacent duplication symbols ( '+', '*', '?', and intervals) produces undefined results.




    Bash uses the system regcomp/regexec for regular-expression matching. Apple's libc presumably does not implement the behaviour you want for *?.
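To see how a particular system's regcomp treats `*?`, a small probe like the following can help (a hypothetical diagnostic, not from this answer). Because POSIX leaves the construct undefined, no specific output is guaranteed: a lazy reading would leave group 1 empty, a greedy reading would put `aaa` there, and a library may also reject the pattern outright.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: how does this system's regcomp/regexec handle "*?",
# which POSIX leaves undefined for EREs? Output varies by platform.
probe='^(a*?)(a*)$'
if [[ aaa =~ $probe ]]; then
    # A lazy (PCRE-like) reading would leave group 1 empty;
    # a greedy reading of "(a*)?" would put "aaa" in group 1.
    result="match: group1=[${BASH_REMATCH[1]}] group2=[${BASH_REMATCH[2]}]"
else
    # =~ returns 1 for no match and 2 if the pattern failed to compile.
    result="no match, or the pattern was rejected"
fi
echo "$result"
```

Running it on both the Mac and the Red Hat machine would show directly whether the two C libraries disagree about this construct.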



    There is no standard way to recover non-greedy matching semantics from POSIX's greedy quantifiers, though in this case at least some of them are unnecessary: the first [_[:alnum:]]*?, for example, cannot overrun the key anyway, because that class excludes both whitespace and =. Otherwise, you need to transform the expression so that it matches something else, or modify the data beforehand (and probably afterwards), to get the same effect.
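For this particular input, one way to transform the expression (a sketch, not the answer author's code) is to replace the lazy `.*?` with a value pattern that cannot contain `=` and must start and end with a non-space character. It assumes keys match `[_[:alnum:]]+`, values are non-empty and contain no `=`, and that "what remains" means everything after the first value; the question's expected output shows only the last pair in group 4, so adjust if that is what you actually need.

```bash
#!/usr/bin/env bash
# Sketch: extract the first key, its value, and the rest using only POSIX ERE
# features (no *? or +?), so the result should not depend on lazy-quantifier support.
# Assumptions: keys match [_[:alnum:]]+; values are non-empty and contain no '='.
inp="key1 =   what' ever the value key2 = the value Nb.2   key3= \"last value\""

name='[_[:alnum:]]+'
# The value may not contain '=' and must start and end with a non-space, so it
# stops right before the whitespace that precedes the next "name =".
value='[^=[:space:]]([^=]*[^=[:space:]])?'
# Groups: 1=key, 2=value, 3=inner part of value (unused), 4=everything after the value.
re="^[[:space:]]*(${name})[[:space:]]*=[[:space:]]*(${value})[[:space:]]+(${name}[[:space:]]*=.*)$"

if [[ $inp =~ $re ]]; then
    key=${BASH_REMATCH[1]}
    val=${BASH_REMATCH[2]}
    left=${BASH_REMATCH[4]}
    printf 'key=[%s]\nval=[%s]\nleft=[%s]\n' "$key" "$val" "$left"
else
    echo "no match"
fi
```

Since every part of the match is forced by the "no `=` in a value" rule rather than by greediness, there is only one way for the pattern to match. Note that this version requires at least two pairs; a string holding a single `key = value` reports "no match".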






First answer above: answered Jan 10 '17 at 20:17 by sorontar.
Second answer above: answered Jan 9 '17 at 22:12 by Michael Homer; edited by Pang.