Delete duplicate lines, with partial match











Sample text:



This is first line.
This is over_second line.
This is over_fourth line.
This is third line.
This is over_fourth delete it.
This is over_fourth and one more.
This is over_second with another text.


I need to delete lines where a partial match recurs: i.e., if over_second has already appeared on an earlier line, any later line containing it should be deleted entirely. The output should be as follows:



This is first line.
This is over_second line.
This is over_fourth line.
This is third line.


I could only come up with the regexp over_\w+ for selecting the relevant section of text, but I don't know how to recognize duplicates and delete the whole line.







Tags: deletion, lines






edited 6 hours ago by Drew
asked 6 hours ago by msinfo

  • Is the requirement "for each specific match for the regexp, retain only the first line in the buffer containing that text" ?
    – phils
    6 hours ago










  • Yes. The line with the first instance of a match should be preserved, while lines containing subsequent matches should be deleted.
    – msinfo
    6 hours ago










  • Once this process is complete for over_second, the same should be repeated for over_fourth.
    – msinfo
    6 hours ago


















3 Answers

If you don't need to do this all the time, the quickest solution might be interactive replacement with query-replace-regexp, bound to C-M-%. Start with point at the top of the buffer.

Note that if you want to delete entire lines, you'll need to include a newline in your regexp. You enter this at the prompt with C-q C-j.

So, for over_second, call C-M-%, then enter the regular expression:

    C-q C-j.*over_second.*

This matches an entire line that contains the string over_second, along with the newline that precedes it.

Then enter the empty string (just type <enter>) as the replacement value.

The first line that matches the regexp should now be highlighted. This is the one you want to keep, so type n to tell Emacs to skip it. The next matching line will then be highlighted; delete it by typing y (or <space>).

You can keep typing y until all the matches are deleted, or type ! to delete all remaining matches at once.







answered 6 hours ago by Tyler
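
For repeated use, the interactive procedure above (keep the first line containing a string, delete the later ones) can be sketched non-interactively. This is a minimal sketch and not part of the answer: the name my-keep-first-line-containing is hypothetical, and it assumes a literal string rather than a regexp, relying on the built-in flush-lines to delete the later lines.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-keep-first-line-containing (string)
  "Delete every line containing STRING except the first such line."
  (interactive "sKeep first line containing: ")
  (save-excursion
    (goto-char (point-min))
    ;; Find the first occurrence; that line is the one to keep.
    (when (search-forward string nil t)
      ;; Move past the kept line, then delete every later line that
      ;; still contains STRING.
      (forward-line 1)
      (flush-lines (regexp-quote string) (point) (point-max)))))
```

Calling it once with over_second and once with over_fourth on the sample text should leave the four lines the question asks for.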






















1. Try delete-duplicate-lines, which is bundled with Emacs.

2. The Emacs Wiki page Duplicate Lines might help.

   • It points to a blog post about it.
   • It explains why interactive search-and-replace might not help.
   • It explains how to do it with Lisp, in various ways.
   • It explains how to do it with the UNIX / GNU/Linux commands sort or uniq.










answered 6 hours ago by Drew












  • These are good suggestions, but OP was asking about deleting partial duplicates, i.e., lines that contain the same string, but aren't full duplicates of each other. I don't think delete-duplicate-lines does that?
    – Tyler
    6 hours ago

  • @Tyler: Oh, right. Maybe the other stuff on that page will help in some way...
    – Drew
    4 hours ago
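
The limitation raised in the comments can be checked directly. The sketch below (the function name my-demo-whole-line-duplicates is made up for illustration) runs delete-duplicate-lines over a small buffer: the exact duplicate should be removed, while the line that merely shares over_second with the others survives.

```elisp
;; Hypothetical demo function; only `delete-duplicate-lines' is built in.
(defun my-demo-whole-line-duplicates ()
  "Return sample text after running `delete-duplicate-lines' on it."
  (with-temp-buffer
    (insert "This is over_second line.\n"
            "This is over_second line.\n"
            "This is over_second with another text.\n")
    ;; Only lines that are identical in full count as duplicates.
    (delete-duplicate-lines (point-min) (point-max))
    (buffer-string)))
```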


















I could swear this is a duplicate, but I couldn't find it.

Try this:

    (defun my-delete-duplicate-matches (regexp)
      "Delete matching lines, except the first instance of each specific match."
      (interactive (list (read-regexp "Regexp: ")))
      (save-restriction
        ;; Act on the region only, if one is active.
        (when (use-region-p)
          (narrow-to-region (region-beginning) (region-end)))
        (save-excursion
          (goto-char (point-min))
          ;; Matched texts seen so far, compared as strings.
          (let ((matches (make-hash-table :test #'equal)))
            (save-match-data
              (while (re-search-forward regexp nil :noerror)
                (if (not (gethash (match-string 0) matches))
                    ;; First sighting: remember it and keep the line.
                    (puthash (match-string 0) t matches)
                  ;; Repeat: delete the whole line, newline included.
                  (forward-line 0)
                  (delete-region (point) (progn (forward-line 1)
                                                (point))))))))))

Caveats:

• If the same matching text appears twice on the first line in which it is found, the line will be deleted.
• More generally, if a line contains both the first instance of a given match and a repeat instance of an earlier match, the line will be deleted.
• Multi-line patterns are not supported.








edited 4 hours ago
answered 5 hours ago by phils
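
The first caveat above comes from checking matches one at a time. A possible variation, sketched here under assumptions and not taken from the answer, decides line by line: a line is deleted only if it contains a match already seen on an earlier line, so a match repeated within its own first line no longer triggers deletion. The name my-delete-lines-repeating-earlier-match is hypothetical, region narrowing is omitted for brevity, and REGEXP is assumed never to match the empty string. The second caveat still applies: a line mixing a new match with an earlier one is deleted, and its new match is not recorded.

```elisp
;; Hypothetical variation; the name is not from the answer or from Emacs.
(defun my-delete-lines-repeating-earlier-match (regexp)
  "Delete each line containing a REGEXP match seen on an earlier line.
REGEXP is assumed never to match the empty string."
  (interactive (list (read-regexp "Regexp: ")))
  (save-excursion
    (goto-char (point-min))
    (let ((seen (make-hash-table :test #'equal)))
      (while (not (eobp))
        (let ((eol (line-end-position))
              (new-on-line nil)
              (repeat nil))
          ;; Inspect every match on this line before deciding.
          (while (re-search-forward regexp eol t)
            (let ((text (match-string 0)))
              (if (gethash text seen)
                  (setq repeat t)
                (push text new-on-line))))
          (if repeat
              ;; Some match already occurred on an earlier line.
              (delete-region (line-beginning-position)
                             (progn (forward-line 1) (point)))
            ;; Keep the line and remember its matches.
            (dolist (text new-on-line)
              (puthash text t seen))
            (forward-line 1)))))))
```

On the sample text, calling it with the regexp over_\w+ (written "over_\\w+" in Lisp) handles over_second and over_fourth in a single pass.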





























