Delete duplicate lines, with partial match
Sample text:
This is first line.
This is over_second line.
This is over_fourth line.
This is third line.
This is over_fourth delete it.
This is over_fourth and one more.
This is over_second with another text.
I need to delete lines where a partial match occurs, i.e. if over_second occurs in another line, that whole line should be deleted. So the output will be as follows:
This is first line.
This is over_second line.
This is over_fourth line.
This is third line.
I could only come up with over_\w+ for selecting that part of the text, but I don't know how to recognize duplicates and delete the whole line.
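For reference, the pattern typed at an interactive prompt as over_\w+ has to be written with a doubled backslash inside an Emacs Lisp string. The small sketch below shows this; my-first-over-key is a hypothetical helper for illustration only, not part of the question or of any answer.

```elisp
;; Hypothetical helper: return the first over_WORD token on LINE.
;; At a prompt you type over_\w+; inside a Lisp string the backslash
;; must be doubled, so it becomes "over_\\w+".
(defun my-first-over-key (line)
  "Return the first over_WORD token in LINE, or nil if there is none."
  (when (string-match "over_\\w+" line)
    (match-string 0 line)))

(my-first-over-key "This is over_second line.") ; => "over_second"
(my-first-over-key "This is third line.")       ; => nil
```

The matched text ("over_second", "over_fourth") is the key whose repeats the question wants to remove.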
Is the requirement "for each specific match of the regexp, retain only the first line in the buffer containing that text"?
– phils
6 hours ago
Yes. The line with the first instance of a match should be preserved, while the lines with subsequent matches should be deleted.
– msinfo
6 hours ago
Once this process is complete for over_second, the same should be repeated for over_fourth.
– msinfo
6 hours ago
edited 6 hours ago by Drew
asked 6 hours ago by msinfo
3 Answers
If you don't need to do this all the time, the quickest solution might be interactive replacement with query-replace-regexp, bound to C-M-%. Start with point at the top of the buffer.
Note that if you want to delete entire lines, you'll need to include a newline in your regexp. You enter a newline at the prompt with C-q C-j.
So, for over_second, call C-M-%, then enter the regular expression:
C-q C-j.*over_second.*
This matches an entire line that contains the string over_second, together with the newline that precedes it. (Because of that leading newline, a matching line at the very top of the buffer is not found; this does not affect the sample text.)
Then enter the empty string (just type <enter>) as the replacement.
The first line that matches the regexp should now be highlighted. This is the one you want to keep, so type n to tell Emacs to skip it. The next matching line will then be highlighted; delete it by typing y (or <space>).
You can keep typing y until all the matches are deleted, or type ! to delete all remaining matches at once.
answered 6 hours ago by Tyler
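If you need the recipe above more than once (for over_second and then over_fourth, as the question asks), it can be scripted. This is a minimal sketch, not part of Tyler's answer, and my-keep-first-line-containing is a hypothetical name. It anchors the regexp at the start of the line (^.*STRING.*\n?) instead of starting with the preceding newline, so a matching line at the top of the buffer is handled too, and it binds case-fold-search to nil so the match is case-sensitive.

```elisp
;; Hypothetical helper: a non-interactive take on the C-M-% recipe.
;; The first matching line is kept (like answering n) and every later
;; matching line is deleted (like answering !).
(defun my-keep-first-line-containing (string)
  "Delete every line containing STRING except the first one.
STRING is matched literally and case-sensitively."
  (save-excursion
    (goto-char (point-min))
    (let ((case-fold-search nil)
          ;; A whole line containing STRING, plus its newline if it has one.
          (re (concat "^.*" (regexp-quote string) ".*\n?")))
      ;; Skip the first matching line.
      (when (re-search-forward re nil t)
        ;; Delete each later matching line by replacing it with "".
        (while (re-search-forward re nil t)
          (replace-match "" t t))))))
```

On the sample text, calling it with "over_second" and then with "over_fourth" should leave the first, over_second, over_fourth and third lines, in their original order.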
Try delete-duplicate-lines, which is part of the Emacs distribution.
The Emacs Wiki page Duplicate Lines might help:
- It points to a blog post about it.
- It explains why interactive search-and-replace might not help.
- It explains how to do it with Lisp, in various ways.
- It explains how to do it with the UNIX / GNU/Linux commands sort or uniq.
answered 6 hours ago by Drew
These are good suggestions, but OP was asking about deleting partial duplicates, i.e., lines that contain the same string, but aren't full duplicates of each other. I don't think delete-duplicate-lines does that?
– Tyler
6 hours ago
@Tyler: Oh, right. Maybe the other stuff on that page will help in some way...
– Drew
4 hours ago
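Tyler's doubt in the comment above can be checked directly. The sketch below (my-count-exact-duplicates-removed is a hypothetical helper) runs delete-duplicate-lines over a string in a temporary buffer and reports how many lines it removed. Since that command only removes lines that are identical in full to an earlier line, it should remove nothing from the question's sample.

```elisp
;; Hypothetical helper: count how many lines delete-duplicate-lines
;; removes from TEXT.  Only whole-line duplicates are affected.
(defun my-count-exact-duplicates-removed (text)
  "Return the number of lines `delete-duplicate-lines' deletes from TEXT."
  (with-temp-buffer
    (insert text)
    (let ((before (count-lines (point-min) (point-max))))
      ;; The first copy of each line is kept; later identical copies go.
      (delete-duplicate-lines (point-min) (point-max))
      (- before (count-lines (point-min) (point-max))))))
```

So partial duplicates such as the over_fourth lines need a regexp-based approach instead.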
I could swear this is a duplicate, but I couldn't find it.
Try this:
(defun my-delete-duplicate-matches (regexp)
  "Delete matching lines, except the first instance of each specific match."
  (interactive (list (read-regexp "Regexp: ")))
  (save-restriction
    ;; With an active region, only operate on the region.
    (when (use-region-p)
      (narrow-to-region (region-beginning) (region-end)))
    (save-excursion
      (goto-char (point-min))
      ;; Maps each matched string to t once it has been seen.
      (let ((matches (make-hash-table :test #'equal)))
        (save-match-data
          (while (re-search-forward regexp nil :noerror)
            (if (not (gethash (match-string 0) matches))
                ;; First occurrence of this text: remember it, keep the line.
                (puthash (match-string 0) t matches)
              ;; Repeat occurrence: delete the whole line, newline included.
              (forward-line 0)
              (delete-region (point) (progn (forward-line 1)
                                            (point))))))))))
Caveats:
If the same matching text appears twice on the first line in which it is found, the line will be deleted.
More generally, if a line contains both the first instance of a given match, and a repeat instance of an earlier match, the line will be deleted.
Multi-line patterns are not supported.
answered 5 hours ago by phils (edited 4 hours ago)
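The first two caveats come from deciding match by match. One possible alternative, sketched below and not part of phils' answer, decides line by line: a line is deleted only when every match on it was already seen on an earlier line. The function name is hypothetical, region support is left out for brevity, and it assumes REGEXP never matches the empty string.

```elisp
;; Hypothetical line-based variant of my-delete-duplicate-matches.
;; Assumes REGEXP never matches the empty string.
(defun my-delete-lines-with-only-seen-matches (regexp)
  "Delete each line whose REGEXP matches were all seen on earlier lines.
Lines with no match, or with at least one new match, are kept."
  (interactive (list (read-regexp "Regexp: ")))
  (save-excursion
    (goto-char (point-min))
    (let ((seen (make-hash-table :test #'equal)))
      (while (not (eobp))
        (let ((bol (line-beginning-position))
              (eol (line-end-position))
              (found '())
              (new nil))
          ;; Collect every match that lies on the current line.
          (while (re-search-forward regexp eol t)
            (push (match-string 0) found))
          ;; The line counts as new if any of its matches is unseen so far.
          (dolist (m found)
            (unless (gethash m seen)
              (setq new t)
              (puthash m t seen)))
          (if (and found (not new))
              ;; Every match is a repeat: delete the whole line.
              (delete-region bol (progn (forward-line 1) (point)))
            (forward-line 1)))))))
```

With over_\w+ on the sample text it should leave the same four lines as the original function, while a first line such as "over_a over_a" is kept rather than deleted.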