Detecting unique lines from a log file

I have a large log file and would like to detect the line patterns it contains rather than look at each specific line.

For example, these lines:



/path/messages-20181116:11/15/2018 14:23:05.159|worker001|clusterm|I|userx deleted job 5018
/path/messages-20181116:11/15/2018 14:41:25.662|worker001|clusterm|I|userx deleted job 4895
/path/messages-20181116:11/15/2018 14:41:25.673|worker000|clusterm|I|userx deleted job 4890
/path/messages-20181116:11/15/2018 14:41:25.681|worker000|clusterm|I|userx deleted job 4889
11/09/2018 06:18:55.115|scheduler000|clusterm|P|PROF: job profiling(low job) of 9473507.1
11/09/2018 06:18:55.118|scheduler000|clusterm|P|PROF: job profiling(low job) of 9473507.1
11/09/2018 06:18:55.120|scheduler000|clusterm|P|PROF: job profiling(low job) of 9473507.1
11/09/2018 06:18:55.140|scheduler000|clusterm|P|PROF: job dispatching took 5.005 s (10 fast)
11/09/2018 06:18:55.143|scheduler000|clusterm|P|PROF: dispatched 1 job(s)
11/09/2018 06:18:55.143|scheduler000|clusterm|P|PROF: dispatched 5 job(s)
11/09/2018 06:18:55.143|scheduler000|clusterm|P|PROF: dispatched 3 job(s)
11/09/2018 06:18:55.145|scheduler000|clusterm|P|PROF: parallel matching 14 0438 107668
11/09/2018 06:18:55.148|scheduler000|clusterm|P|PROF: sequential matching 9 0261 8203
11/09/2018 06:18:55.561|scheduler000|clusterm|P|PROF(1776285440): job sorting :wc =0.006s
11/09/2018 06:18:55.564|scheduler000|clusterm|P|PROF(1776285440): job dispatching: wc=5.005
11/09/2018 06:18:55.561|scheduler000|clusterm|P|PROF(1776285440): job sorting : wc=0.006s
11/09/2018 06:18:55.564|scheduler000|clusterm|P|PROF(1776285440): job dispatching: wc =0.015


would become something like this:



/path/messages-*NUMBER*:*DATE* *TIME*|worker001|clusterm|I|userx deleted job *NUMBER*
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF: job profiling(low job) of *NUMBER*
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF: job dispatching took *NUMBER* s (*NUMBER* fast)
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF: dispatched *NUMBER* job(s)
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF: parallel matching *NUMBER* *NUMBER* *NUMBER*
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF: sequential matching *NUMBER* *NUMBER* *NUMBER*
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF(*NUMBER*): job sorting :wc =*NUMBER*s
*DATE* *TIME*|scheduler*NUMBER*|clusterm|P|PROF(*NUMBER*): job dispatching: wc=*NUMBER*


This would greatly reduce the number of lines and make the log easier to read and analyze by eye.

Basically, I want to detect the variable words and replace them with a placeholder symbol.
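To make the idea concrete, here is a minimal sketch in Python of one possible approach, not a definitive solution. It assumes dates always look like MM/DD/YYYY and times like HH:MM:SS.mmm, as in the sample above; the names `RULES`, `normalize`, and `unique_patterns` are made up for illustration. Note that it replaces every run of digits, so `worker001` becomes `worker*NUMBER*`, and lines that differ only in spacing (such as `:wc =` versus `: wc=`) still count as separate patterns.

```python
import re

# Order matters: dates and times are replaced before the generic number rule,
# otherwise their digits would be turned into *NUMBER* first.
RULES = [
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "*DATE*"),               # e.g. 11/09/2018
    (re.compile(r"\b\d{2}:\d{2}:\d{2}(?:\.\d+)?\b"), "*TIME*"),     # e.g. 06:18:55.143
    (re.compile(r"\d+(?:\.\d+)?"), "*NUMBER*"),                     # e.g. 5018 or 5.005
]


def normalize(line):
    """Replace the variable parts of one log line with placeholders."""
    for pattern, placeholder in RULES:
        line = pattern.sub(placeholder, line)
    return line


def unique_patterns(lines):
    """Return each normalized pattern once, in order of first appearance."""
    seen = set()
    result = []
    for line in lines:
        pattern = normalize(line.rstrip("\n"))
        if pattern and pattern not in seen:
            seen.add(pattern)
            result.append(pattern)
    return result


# A few lines taken from the sample log in the question.
SAMPLE = [
    "11/09/2018 06:18:55.143|scheduler000|clusterm|P|PROF: dispatched 1 job(s)",
    "11/09/2018 06:18:55.143|scheduler000|clusterm|P|PROF: dispatched 5 job(s)",
    "11/09/2018 06:18:55.143|scheduler000|clusterm|P|PROF: dispatched 3 job(s)",
    "11/09/2018 06:18:55.140|scheduler000|clusterm|P|PROF: job dispatching took 5.005 s (10 fast)",
]

if __name__ == "__main__":
    for pattern in unique_patterns(SAMPLE):
        print(pattern)
```

`unique_patterns` accepts any iterable of lines, so an open file object could be passed in place of `SAMPLE`. New kinds of variable fields (user names, host names, and so on) would need their own entries in `RULES`, placed before the generic number rule where they contain digits.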










  • What steps have you tried to take on your own? What were the results? Please include this in your question.
    – Panki
    2 days ago










  • Have you looked at cut and uniq?
    – ctrl-alt-delor
    2 days ago










  • uniq only works on exactly matching lines, so it will not collapse two otherwise identical lines that have different timestamps. With cut, I would need to read the whole log file first, and I still don't know the patterns in advance.
    – user772266
    2 days ago












  • I did use sort -u | uniq, but it only removes identical lines: if two lines differ only in their timestamp, both are printed.
    – user772266
    2 days ago










  • I don't understand the transformations you're expecting. Do you want the dates and times replaced by *DATE* *TIME* or are those placeholders for real values of some sort? What makes a line non-unique?
    – Jeff Schaller
    2 days ago















command-line logs wildcards text






asked 2 days ago by user772266
