Search for the most frequently occurring patterns in text (to replace them with macros)


























Are there any readily available tools for finding the most frequently occurring text patterns (word combinations, phrases) in the text you are working on, to give you ideas about which ones to replace with short macros?










asked Mar 8 '13 at 10:37 by Dennis Yurichev
edited Mar 8 '13 at 11:11 by Mico








  • Interesting question, though in case of "patterns" longer than, say, one word, things might get a bit difficult due to space/newline treatment, various ways to get space after a macro etc.
    – mbork
    Mar 8 '13 at 10:42










  • Often used word combinations, phrases...
    – Dennis Yurichev
    Mar 8 '13 at 10:54










  • Welcome to TeX.SX.
    – Claudio Fiandrino
    Mar 8 '13 at 10:55










  • An interesting question, but sounds like a job for an external tool...
    – cmhughes
    Mar 8 '13 at 16:19






  • I wonder what your goal is with this. For writing a document, you're better off analyzing which combinations you find significant and worth automating. You seem to be asking for a sort of "literate LZW" algorithm.
    – Ryan Reich
    May 2 '13 at 21:35
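As cmhughes' comment above suggests, this kind of frequency analysis is easiest to prototype outside TeX. Below is a minimal sketch of such an external helper in Python; the script name, the crude de-TeXing regexes and the cut-off values are illustrative assumptions, not an existing tool.

# ngram_count.py -- list the most frequent word n-grams in a file,
# as a rough guide to which phrases might be worth turning into macros.
import re
import sys
from collections import Counter

def ngrams(words, n):
    # consecutive n-word windows over the word list
    return zip(*(words[i:] for i in range(n)))

def top_phrases(path, n_min=2, n_max=4, top=20):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # crude clean-up: drop comments and control sequences
    text = re.sub(r"%.*", " ", text)
    text = re.sub(r"\\[A-Za-z@]+\*?", " ", text)
    words = re.findall(r"[A-Za-z'-]+", text.lower())
    counts = Counter()
    for n in range(n_min, n_max + 1):
        counts.update(" ".join(g) for g in ngrams(words, n))
    return [(p, c) for p, c in counts.most_common() if c > 1][:top]

if __name__ == "__main__":
    for phrase, count in top_phrases(sys.argv[1]):
        print(f"{count:5}  {phrase}")

Run it as "python ngram_count.py file.tex" to print candidate phrases with their counts; as mbork points out above, deciding how a macro should behave with respect to the following space is a separate problem.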




























2 Answers
































EDIT: I had originally implemented a version of the Lempel-Ziv-Welch algorithm, following a comment by Ryan Reich. I've now changed the code completely, and I don't know whether the new algorithm has a name.



I fed the new code the lyrics of "The Twelve Days of Christmas", as David Carlisle asked, and obtained the following result (for plain TeX):



longdefA#1#2{def#1{#2}A}AAG{sparACAAB}AH{AEWXVTU mVAD}AA
{CDEFG}AB{HIJAparbigskipB}AC{OPparMNKL nd}AD{RSparQ
A ring}AE{Y es dancV}AF{tZ a leapV}B{par On the }C{ day of C}D
{hristmas }E{my true l}F{ove gave }G{to mepar}H{a partrid}I{ge in a p}J
{ear tree.}K{two turtl}L{e dovespar a}M{three fre}N{nch henspar}O{four
call}P{ing birds}Q{five gold}R{six geese}S{ a laying}T{seven swa}U{ns a
swim}V{ingpar}W{eight mai}X{ds a milk}Y{nine ladi}Z{en lords } A{ }B
firstAAAB secondAAKL nd AB thirdAAMNKL nd AB fourthAAACAAB
fifthAAQA ringAG sixthAAADAG seventhAATU mVADAG eighthAAWXVT
U mVADAG ninthAAAHAG tenthAAAFAHAG eleventhAA eleven pipers pipV
AFAHAG twelfthAA twelve drummers drummV eleven pipers pipVAFAH spar
ACAHIJAbye


which is 878 bytes long. Compare this with the 767 bytes of David Carlisle's xii.tex, which produces the same lyrics.
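Roughly, the code below repeatedly looks for the run of (at most nine) tokens whose replacement by a freshly defined macro saves the most characters, performs that replacement everywhere, and stops once the rewritten file no longer gets shorter. A toy sketch of that greedy loop in Python, with hypothetical function names, operating on plain strings and therefore ignoring the catcode, grouping and spacing issues the expl3 code has to handle:

def best_substring(text, max_len=9, def_overhead=9):
    # return the substring whose replacement by a 2-character macro
    # name would save the most characters, or None if nothing helps
    best, best_saving = None, 0
    for i in range(len(text)):
        for j in range(i + 3, min(i + max_len, len(text)) + 1):
            s = text[i:j]
            count = text.count(s)
            # every occurrence shrinks to a 2-char name; the macro
            # definition costs roughly its body plus a fixed overhead
            saving = count * (len(s) - 2) - (len(s) + def_overhead)
            if saving > best_saving:
                best, best_saving = s, saving
    return best

def compress(text):
    macros = []
    while True:
        s = best_substring(text)
        if s is None or len(macros) >= 26:   # only \A..\Z in this toy
            break
        name = "\\" + chr(ord("A") + len(macros))
        macros.append((name, s))
        text = text.replace(s, name)
    return macros, text

The real implementation works on token lists rather than characters and uses a more careful cost model (it also accounts for the macro-name prefix), but the overall shape is similar.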



RequirePackage{expl3}
ExplSyntaxOn
cs_generate_variant:Nn tl_replace_all:Nnn { Nxx, NVc, Nxn }
cs_generate_variant:Nn tl_replace_once:Nnn { Nxx }
cs_generate_variant:Nn tl_if_in:NnT { No }
cs_generate_variant:Nn int_min:nn { c, cc }
cs_generate_variant:Nn tl_if_eq:NNTF { cc }
cs_generate_variant:Nn tl_if_eq:nnTF { xx }
cs_generate_variant:Nn tl_if_eq:nnT { xx }
cs_generate_variant:Nn str_count:N { c }
cs_generate_variant:Nn regex_match:nnF { nV }
cs_generate_variant:Nn str_set:Nn { NV }
str_new:N l__find_prefix_str
int_new:N l__find_prefix_int
tl_new:N l__find_tl
tl_new:N l__find_chunks_tl
int_step_inline:nnnn { 1 } { 1 } { 9 }
{ seq_new:c { l__find_chunks_#1_seq } }
int_new:N l__find_common_int
int_new:N l__find_nesting_int
tl_new:N l__find_previous_tl
seq_new:N l__find_chunks_seq
int_new:N l__find_best_score_int
int_new:N l__find_macro_int
tl_new:N l__find_macros_tl
tl_new:N l__find_result_tl
int_new:N l__find_length_int
int_new:N l__find_previous_length_int
tl_new:N l__find_display_tl

cs_new_protected_nopar:Npn dot: { tex_message:D { . } }

cs_new_protected:Npn find_matches:nnN #1#2#3
{
% '#1' is the prefix, '#2' is the token list to study.
%
__find_set_prefix:n {#1}
tl_set:Nn l__find_tl { ~! #2 }
__find_escape_spaces:xN { l__find_prefix_str A } l__find_tl
int_set_eq:NN l__find_macro_int c_one
__find_get_length:V l__find_prefix_str
iow_term:x { int_use:N l__find_length_int }
int_set_eq:NN l__find_length_int c_max_int
__find_matches_aux:V l__find_prefix_str
tl_replace_once:Nnn l__find_tl { ! } { { ~ } }
tl_set:Nx #3
{
__find_preamble:
exp_not:V l__find_tl
}
}
cs_new_protected:Npn __find_escape_spaces:nN #1#2
{
regex_replace_all:nnN { cS } { c{#1} } #2
dot:
}
cs_generate_variant:Nn __find_escape_spaces:nN { x }
cs_new_nopar:Npn __find_preamble:
{
exp_not:n { long def } exp_not:c { l__find_prefix_str A } ####1####2
{
exp_not:N def ####1{####2}
exp_not:c { l__find_prefix_str A }
}
exp_not:c { l__find_prefix_str A }
}
cs_new_protected:Npn __find_matches_aux:n #1
{
int_set_eq:NN l__find_previous_length_int l__find_length_int
tl_set_eq:NN l__find_previous_tl l__find_tl
__find_escape_tl:nN {#1} l__find_tl
int_compare:nNnTF { tl_count:N l__find_tl } < c_nine
{ tl_set_eq:Nn l__find_tl l__find_previous_tl }
{
__find_set_chunks:
__find_sort_chunks:
__find_common:
__find_best_macros:
__find_undefine_tmp:
tl_set_eq:NN l__find_tl l__find_result_tl
__find_unescape_tl:nN {#1} l__find_tl
__find_get_length:n {#1}
iow_term:x { int_use:N l__find_length_int }
int_compare:nNnTF l__find_length_int < l__find_previous_length_int
{ __find_matches_aux:n {#1} }
{
iow_term:n { }
iow_term:x { l__find_display_tl }
iow_term:n { }
tl_set_eq:NN l__find_tl l__find_previous_tl
}
}
}
cs_generate_variant:Nn __find_matches_aux:n { V }

cs_new_protected:Npn __find_get_length:n #1
{
tl_set_eq:NN l__find_display_tl l__find_tl
tl_replace_once:Nxx l__find_display_tl
{ exp_not:c { #1 A } ! }
{ ~ exp_not:c { #1 A } {~} }
str_set:NV l__find_display_tl l__find_display_tl
tl_replace_all:Nxn l__find_display_tl
{ c_backslash_str #1 token_to_str:N A ~ } c_space_tl
tl_replace_all:Nnn l__find_display_tl
{ ~ c_space_tl } { ~ exp_not:c { #1 A } }
dot:
str_set:Nx l__find_display_tl { __find_preamble: l__find_display_tl }
tl_replace_all:Nxx l__find_display_tl
{ c_hash_str c_hash_str } { c_hash_str }
dot:
regex_replace_all:nnN
{ (\[A-Za-z]+) ([A-Za-z]) }
{ 1 2 }
l__find_display_tl
dot:
regex_replace_all:nnN
{ (\[A-Za-z]+) }
{ 1 c{c__iow_wrap_zwbs_marker_tl} }
l__find_display_tl
dot:
iow_wrap:nnnN { l__find_display_tl } { } { } __find_get_length_aux:n
}
cs_generate_variant:Nn __find_get_length:n { V }
cs_new_protected:Npn __find_get_length_aux:n #1
{
int_set:Nn l__find_length_int { str_count:n {#1} }
tl_set:Nn l__find_display_tl {#1}
% iow_term:n { }
% iow_term:n {#1}
% iow_term:n { }
}



cs_new_protected:Npn __find_set_prefix:n #1
{
% Check that the prefix |#1| is made only of alphabetic characters.
%
str_set:Nx l__find_prefix_str {#1}
int_set:Nn l__find_prefix_int { str_count:N l__find_prefix_str }
regex_match:nVF { Aw*Z } l__find_prefix_str
{
msg_error:nnx { find } { invalid-prefix }
{ l__find_prefix_str }
}
dot:
}

cs_new_protected:Npn __find_escape_tl:nN #1#2
{
% During the 'study' step, we manipulate the token list |#2|
% with all begin-group and end-group tokens replaced by a
% control sequence built from the prefix. We must change both
% begin-group and end-group tokens in one pass, to avoid getting an
% unbalanced result. Also replace macro parameters because they
% cannot be used as delimiters for macros. Spaces have been
% turned into a control sequence earlier. At this stage, every
% token in |#2| can be grabbed as an N-type argument.
%
regex_replace_all:nnN { cB. } { cB{ } #2
dot:
regex_replace_all:nnN { cE. } { cE} } #2
dot:
regex_replace_all:nnN { cP. } { c{ #1 # } } #2
dot:
regex_replace_all:nnN { c[BEP]. } { c{ #1 } } #2
dot:
}




cs_new_protected_nopar:Npn __find_set_chunks:
{
% Build a token list whose items are each nine consecutive tokens
% of the original token list, in a running window. So for instance
% |ABCDEFGHIJKL| would lead to the following (12) items:
% |ABCDEFGHI|, |BCDEFGHIJ|, |CDEFGHIJK|, |DEFGHIJKL|, |EFGHIJKL|,
% |FGHIJKL|, |GHIJKL|, |HIJKL|, |IJKL|, |JKL|, |KL|, |L|. The items
% of this token list are built in an |x|-expanded loop.
% A special case arises if the |find| token list is too short to
% safely perform the loop: then our whole algorithm is not going to
% do any good anyways, so we build an empty chunk list.
%
tl_set:Nx l__find_chunks_tl
{
exp_after:wN __find_set_chunks_loop:NNNNNNNNN l__find_tl
q_recursion_tail q_recursion_stop
}
}
cs_new:Npn __find_set_chunks_loop:NNNNNNNNN #1#2#3#4#5#6#7#8#9
{
% As usual in a TeX loop, first check for the end-loop marker (here,
% cs{q_recursion_tail}). If it is reached, we fill in the last few
% chunks (which become shorter and shorter as we go). Otherwise,
% add (to the token list we are building) an item containing (9)
% tokens, and loop, dropping the first of the items.
%
quark_if_recursion_tail_stop_do:Nn #9
{ __find_set_chunks_end:NNNNNNNN #1 #2 #3 #4 #5 #6 #7 #8 }
{ exp_not:n { #1 #2 #3 #4 #5 #6 #7 #8 #9 } }
__find_set_chunks_loop:NNNNNNNNN #2#3#4#5#6#7#8#9
}
cs_new:Npn __find_set_chunks_end:NNNNNNNN #1#2#3#4#5#6#7#8
{
exp_not:n
{
{ #1 #2 #3 #4 #5 #6 #7 #8 }
{ #2 #3 #4 #5 #6 #7 #8 }
{ #3 #4 #5 #6 #7 #8 }
{ #4 #5 #6 #7 #8 }
{ #5 #6 #7 #8 }
{ #6 #7 #8 }
{ #7 #8 }
{ #8 }
}
}
cs_new_protected:Npn __find_sort_chunks:
{
tl_sort:Nn l__find_chunks_tl
{
int_compare:nNnTF
{
pdftex_strcmp:D
{ exp_not:n {##1} }
{ exp_not:n {##2} }
}
> c_zero
{ sort_return_swapped: }
{ sort_return_same: }
}
}

cs_new_protected:Npn __find_common:
{
int_step_inline:nnnn { 1 } { 1 } { 9 }
{ seq_clear:c { l__find_chunks_##1_seq } }
exp_after:wN __find_common_loop:nn l__find_chunks_tl
q_recursion_tail q_recursion_tail q_recursion_stop
int_step_inline:nnnn { 4 } { 1 } { 9 }
{
seq_map_inline:cn { l__find_chunks_##1_seq }
{
tl_if_exist:cTF { l__find_chunk_ ' tl_to_str:n {####1} ' _tl }
{
tl_put_right:cn
{ l__find_chunk_ ' tl_to_str:n {####1} ' _tl } { i }
}
{
cs_set_eq:cN
{ l__find_chunk_ ' tl_to_str:n {####1} ' _tl } c_empty_tl
}
}
}
}

cs_new_protected:Npn __find_common_loop:nn #1#2
{
quark_if_recursion_tail_stop:n {#2}
int_zero:N l__find_common_int
__find_count_common_aux:nn {#1} {#2}
use:c { __find_common_ int_use:N l__find_common_int :w }
#1 X X X X X X X X X q_stop
__find_common_loop:nn {#2}
}
cs_new_protected:Npn __find_count_common_aux:nn #1#2
{
tl_if_empty:nF {#1}
{
tl_if_empty:nF {#2}
{
tl_if_eq:xxT { tl_head:n {#1} } { tl_head:n {#2} }
{
int_incr:N l__find_common_int
__find_count_common_aux:xx
{ tl_tail:n {#1} } { tl_tail:n {#2} }
}
}
}
}
cs_generate_variant:Nn __find_count_common_aux:nn { xx }

cs_new_eq:cN { __find_common_0:w } use_none_delimit_by_q_stop:w
cs_new_protected:cpn { __find_common_1:w } #1
{ __find_common_auxii:nnw { 1 } {#1} }
cs_new_protected:cpn { __find_common_2:w } #1#2
{ __find_common_auxii:nnw { 2 } { #1 #2 } }
cs_new_protected:cpn { __find_common_3:w } #1#2#3
{ __find_common_auxii:nnw { 3 } { #1 #2 #3 } }
cs_new_protected:cpn { __find_common_4:w } #1#2#3#4
{ __find_common_auxii:nnw { 4 } { #1 #2 #3 #4 } }
cs_new_protected:cpn { __find_common_5:w } #1#2#3#4#5
{
dot:
__find_common_auxii:nnw { 5 } { #1 #2 #3 #4 #5 }
}
cs_new_protected:cpn { __find_common_6:w } #1#2#3#4#5#6
{ __find_common_auxii:nnw { 6 } { #1 #2 #3 #4 #5 #6 } }
cs_new_protected:cpn { __find_common_7:w } #1#2#3#4#5#6#7
{ __find_common_auxii:nnw { 7 } { #1 #2 #3 #4 #5 #6 #7 } }
cs_new_protected:cpn { __find_common_8:w } #1#2#3#4#5#6#7#8
{ __find_common_auxii:nnw { 8 } { #1 #2 #3 #4 #5 #6 #7 #8 } }
cs_new_protected:cpn { __find_common_9:w } #1#2#3#4#5#6#7#8#9
{ __find_common_auxii:nnw { 9 } { #1 #2 #3 #4 #5 #6 #7 #8 #9 } }
cs_new_protected:Npn __find_common_auxii:nnw #1#2#3 q_stop
{
int_zero:N l__find_nesting_int
tl_map_inline:nn {#2}
{
str_case_x:nnn { exp_not:n {##1} }
{
{ exp_not:c { l__find_prefix_str c_lbrace_str } }
{ int_incr:N l__find_nesting_int }
{ exp_not:c { l__find_prefix_str c_rbrace_str } }
{
int_compare:nNnF l__find_nesting_int > c_zero
{ use_none_delimit_by_q_stop:w }
int_decr:N l__find_nesting_int
}
}
{ }
}
int_compare:nNnF l__find_nesting_int = c_zero
{ use_none_delimit_by_q_stop:w }
seq_put_right:cn { l__find_chunks_#1_seq } {#2}
use_none_delimit_by_q_stop:w
q_stop
}

cs_new_protected_nopar:Npn __find_best_macros:
{
tl_clear:N l__find_macros_tl
tl_clear:N l__find_result_tl
__find_best_macros_aux:
tl_put_left:NV l__find_result_tl l__find_macros_tl
}
cs_new_protected:Npn __find_best_macros_aux:
{
exp_after:wN __find_best_macros_auxii:NNNNNNNNN l__find_tl
q_nil q_nil q_nil q_nil q_nil q_nil q_nil q_nil q_nil q_stop
tl_if_empty:NF l__find_tl { __find_best_macros_aux: }
}
cs_new_protected:Npn __find_best_macros_auxii:NNNNNNNNN #1#2#3#4#5#6#7#8#9
{
int_zero:N l__find_best_score_int
tl_clear:N l__find_best_chunk_tl
tl_map_inline:nn
{
{#1} {#1#2} {#1#2#3} {#1#2#3#4} {#1#2#3#4#5} {#1#2#3#4#5#6}
{#1#2#3#4#5#6#7} {#1#2#3#4#5#6#7#8} {#1#2#3#4#5#6#7#8#9}
}
{
int_compare:nNnT
{ __find_score:n {##1} } > l__find_best_score_int
{
tl_set:Nn l__find_best_chunk_tl {##1}
int_set:Nn l__find_best_score_int
{ __find_score:n {##1} }
}
}
tl_if_empty:NF l__find_best_chunk_tl
{
int_incr:N l__find_macro_int
tl_put_right:Nx l__find_macros_tl
{
exp_not:c
{ l__find_prefix_str int_to_Alph:n { l__find_macro_int } }
{ exp_not:V l__find_best_chunk_tl }
}
tl_replace_all:NVc l__find_tl l__find_best_chunk_tl
{ l__find_prefix_str int_to_Alph:n { l__find_macro_int } }
dot:
}
tl_put_right:Nx l__find_result_tl { tl_head:N l__find_tl }
tl_set:Nx l__find_tl { tl_tail:N l__find_tl }
use_none_delimit_by_q_stop:w
}
cs_new:Npn __find_score:n #1
{
% Turning ####1 into a length (p+2) macro
% (e.g. |<prefix>A|) saves this number of chars.
% Good if non-negative.
cs_if_exist:cTF { l__find_chunk_ ' tl_to_str:n {#1} ' _tl }
{
int_eval:n
{
tl_count:c { l__find_chunk_ ' tl_to_str:n {#1} ' _tl }
* ( str_count:n {#1} - 3 - l__find_prefix_int )
- 2 * l__find_prefix_int - 9
}
}
{ -1 }
}

cs_new_protected:Npn __find_undefine_tmp:
{
int_step_inline:nnnn { 4 } { 1 } { 9 }
{
seq_map_inline:cn { l__find_chunks_##1_seq }
{ cs_undefine:c { l__find_chunk_ ' tl_to_str:n {####1} ' _tl } }
}
}

cs_new_protected:Npn __find_unescape_tl:nN #1#2
{
regex_replace_all:nnN { c{#1{} } { cB{ cE? } #2
dot:
regex_replace_all:nnN { c{#1}} } { cB? cE} } #2
dot:
regex_replace_all:nnN { c[BE]? } { } #2
dot:
regex_replace_all:nnN { c{#1#} } { cP# } #2
dot:
}

%%%% Hack LaTeX3's iow_wrap to allow zero-width breaking spaces
tl_const:Nx c__iow_wrap_zwbs_marker_tl
{
c_catcode_other_space_tl
c__iow_wrap_marker_tl
c_catcode_other_space_tl
zwbs
c_catcode_other_space_tl
}
cs_new_protected_nopar:Npn __iow_wrap_zwbs:
{
tl_put_right:Nn l__iow_current_line_tl
{ tex_romannumeral:D -`0 }
int_decr:N l__iow_current_line_int
}
%%% End of hack.

msg_new:nnn { find } { invalid-prefix }
{ The~prefix~used,~'#1',~must~be~made~of~letters. }

cs_set_protected:Npn FindMatches #1#2
{
find_matches:nnN {#1} {#2} l_tmpa_tl
tl_use:N l_tmpa_tl
}
ExplSyntaxOff

% Set up some basic things to behave a bit like plain TeX.
%
documentclass{article}
begin{document}
setlength{parindent}{0pt}
defbye{end{document}}
%
% End of setup.

FindMatches{}{% Let's go!

On the first day of Christmas my true love gave to mepar
a partridge in a pear tree.

bigskip

On the second day of Christmas my true love gave to mepar
two turtle dovespar
and a partridge in a pear tree.

bigskip

On the third day of Christmas my true love gave to mepar
three french henspar
two turtle dovespar
and a partridge in a pear tree.

bigskip

On the fourth day of Christmas my true love gave to mepar
four calling birdspar
three french henspar
two turtle dovespar
and a partridge in a pear tree.

bigskip

On the fifth day of Christmas my true love gave to mepar
five gold ringspar
four calling birdspar
three french henspar
two turtle dovespar
and a partridge in a pear tree.

bigskip

On the sixth day of Christmas my true love gave to mepar
six geese a layingpar
five gold ringspar
four calling birdspar
three french henspar
two turtle dovespar
and a partridge in a pear tree.

bigskip

On the seventh day of Christmas my true love gave to mepar
seven swans a swimmingpar
six geese a layingpar
five gold ringspar
four calling birdspar
three french henspar
two turtle dovespar
and a partridge in a pear tree.

bigskip

On the eighth day of Christmas my true love gave to mepar
eight maids a milkingpar
seven swans a swimmingpar
six geese a layingpar
five gold ringspar
four calling birdspar
three french henspar
two turtle dovespar
and a partridge in a pear tree.

bigskip

On the ninth day of Christmas my true love gave to mepar
nine ladies dancingpar
eight maids a milkingpar
seven swans a swimmingpar
six geese a layingpar
five gold ringspar
four calling birdspar
three french henspar
two turtle dovespar
and a partridge in a pear tree.

bigskip

On the tenth day of Christmas my true love gave to mepar
ten lords a leapingpar
nine ladies dancingpar
eight maids a milkingpar
seven swans a swimmingpar
six geese a layingpar
five gold ringspar
four calling birdspar
three french henspar
two turtle dovespar
and a partridge in a pear tree.

bigskip

On the eleventh day of Christmas my true love gave to mepar
eleven pipers pipingpar
ten lords a leapingpar
nine ladies dancingpar
eight maids a milkingpar
seven swans a swimmingpar
six geese a layingpar
five gold ringspar
four calling birdspar
three french henspar
two turtle dovespar
and a partridge in a pear tree.

bigskip

On the twelfth day of Christmas my true love gave to mepar
twelve drummers drummingpar
eleven pipers pipingpar
ten lords a leapingpar
nine ladies dancingpar
eight maids a milkingpar
seven swans a swimmingpar
six geese a layingpar
five gold ringspar
four calling birdspar
three french henspar
two turtle dovespar
and a partridge in a pear tree.
bye
}
























  • Awesome as always, Bruno! :)
    – Paulo Cereda
    May 3 '13 at 0:54






  • This is one of the more outrageous things I've ever seen here. (It is very impressive!)
    – Ryan Reich
    May 3 '13 at 0:56










  • Hehe! Thanks @RyanReich. By the way, do you know of any algorithm in the vein of LZW that would be able to produce macros with varying arguments?
    – Bruno Le Floch
    May 3 '13 at 1:10












  • @Bruno All I know about LZW is the idea, unfortunately. You mean, you want the "dictionary" to be macros that take an argument, or that take a variable number of arguments? Isn't this the same as having an array and then having the macro argument index into it?
    – Ryan Reich
    May 3 '13 at 4:15






  • @BrunoLeFloch no wonder we get stalled on the middle layer of L3, but at least nobody can any longer deny that expl3 is up to arbitrary programming tasks :-)
    – Frank Mittelbach
    May 4 '13 at 20:16

































What I present is a total kludge, stealing things from my titlecaps package, so the format is not optimal, but it might give you a place from which to proceed. A nice feature is that punctuation is screened out of the search in the target text.



When a search string and a target text (not exceeding one paragraph) are passed to \seekphrase, it outputs where each word of the search phrase appears in the target text, as a pair "target-location:search-word-index". To make the output easier to interpret, a [ is printed before a match with search-word-index = 1 and a ] after a match with search-word-index = n. Thus, to qualify as a matching "phrase" in, for example, a 3-word search, the output would have to contain an instance of something like [214:1 215:2 216:3].



Further refinement is clearly possible, but it would take more time than I have at the moment.



\documentclass{article}
\usepackage{titlecaps}
\usepackage{lipsum}
\makeatletter
\renewcommand\seek@lcwords{%
\kill@punct%
\setcounter{word@count}{0}%
\whiledo{\value{word@count} < \narg}{%
\addtocounter{word@count}{1}%
\protected@edef\current@word{\csname arg\roman{word@count}\endcsname}%
\def\found@word{F}%
\setcounter{lcword@index}{0}%
\expandafter\def\csname%
found@word\roman{word@count}\endcsname{F}%
\whiledo{\value{lcword@index} < \value{lc@words}}{%
\addtocounter{lcword@index}{1}%
\protected@edef\current@lcword{%
\csname lcword\roman{lcword@index}\endcsname}%
%% THE FOLLOWING THREE LINES ARE FROM DAVID CARLISLE
\protected@edef\tmp{\noexpand\scantokens{\def\noexpand\tmp%
{\noexpand\ifthenelse{\noexpand\equal{\current@word}{\current@lcword}}}}}%
\tmp\ifhmode\unskip\fi\tmp
%%
{\expandafter\def\csname%
found@word\roman{word@count}\endcsname{T}%
\ifthenelse{\equal{\value{lcword@index}}{1}}{[}{}%
\arabic{word@count}:\arabic{lcword@index}%
\ifthenelse{\equal{\value{lcword@index}}{\value{lc@words}}}{]}{ }%
\setcounter{lcword@index}{\value{lc@words}}}%
{}%
}%
}%
\restore@punct%
}
\let\getargsC\get@argsC

\newcommand\seekphrase[2]{%
Seeking phrase ``#1'':\\%
\Addlcwords{#1}%
\redefine@tertius%
\getargsC{#2}%
\seek@lcwords%
\Resetlcwords%
\par%
}
\makeatother

\begin{document}

\lipsum[1]
\lipsum[1]

\def\x{%
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut
purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis.
Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget,
consectetuer id, vulputate a, magna. Donec vehicula augue eu neque.
Pellentesque habitant morbi tristique senectus et netus et malesuada
fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem.
Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus
sit amet tortor gravida placerat. Integer sapien est, iaculis in,
pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices
bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar
at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci
eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis,
diam. Duis eget orci sit amet orci dignissim rutrum.
%
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut
purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis.
Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget,
consectetuer id, vulputate a, magna. Donec vehicula augue eu neque.
Pellentesque habitant morbi tristique senectus et netus et malesuada
fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem.
Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus
sit amet tortor gravida placerat. Integer sapien est, iaculis in,
pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices
bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar
at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci
eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis,
diam. Duis eget orci sit amet orci dignissim rutrum.
}

\seekphrase{et}{\x}

\seekphrase{et netus}{\x}

\seekphrase{bibendum Aenean}{\x}

\seekphrase{eget sem vel}{\x}

\end{document}


enter image description here






share|improve this answer























    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "85"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2ftex.stackexchange.com%2fquestions%2f101434%2fsearch-for-most-frequently-occurring-patterns-in-text-to-replace-them-by-macros%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    22














    EDIT: I had implemented some version of the Lempel-Ziv-Welch algorithm, following a comment of Ryan Reich. I've now changed the code completely and I don't know if the algorithm has a name.



    I fed the new code the 12 days of Christmas, as David Carlisle asked, and obtained the following result (for plain TeX)



    longdefA#1#2{def#1{#2}A}AAG{sparACAAB}AH{AEWXVTU mVAD}AA
    {CDEFG}AB{HIJAparbigskipB}AC{OPparMNKL nd}AD{RSparQ
    A ring}AE{Y es dancV}AF{tZ a leapV}B{par On the }C{ day of C}D
    {hristmas }E{my true l}F{ove gave }G{to mepar}H{a partrid}I{ge in a p}J
    {ear tree.}K{two turtl}L{e dovespar a}M{three fre}N{nch henspar}O{four
    call}P{ing birds}Q{five gold}R{six geese}S{ a laying}T{seven swa}U{ns a
    swim}V{ingpar}W{eight mai}X{ds a milk}Y{nine ladi}Z{en lords } A{ }B
    firstAAAB secondAAKL nd AB thirdAAMNKL nd AB fourthAAACAAB
    fifthAAQA ringAG sixthAAADAG seventhAATU mVADAG eighthAAWXVT
    U mVADAG ninthAAAHAG tenthAAAFAHAG eleventhAA eleven pipers pipV
    AFAHAG twelfthAA twelve drummers drummV eleven pipers pipVAFAH spar
    ACAHIJAbye


    which is 878 bytes long. This should be compared to the 767 bytes of xii.tex, which produces the same lyrics.



    RequirePackage{expl3}
    ExplSyntaxOn
    cs_generate_variant:Nn tl_replace_all:Nnn { Nxx, NVc, Nxn }
    cs_generate_variant:Nn tl_replace_once:Nnn { Nxx }
    cs_generate_variant:Nn tl_if_in:NnT { No }
    cs_generate_variant:Nn int_min:nn { c, cc }
    cs_generate_variant:Nn tl_if_eq:NNTF { cc }
    cs_generate_variant:Nn tl_if_eq:nnTF { xx }
    cs_generate_variant:Nn tl_if_eq:nnT { xx }
    cs_generate_variant:Nn str_count:N { c }
    cs_generate_variant:Nn regex_match:nnF { nV }
    cs_generate_variant:Nn str_set:Nn { NV }
    str_new:N l__find_prefix_str
    int_new:N l__find_prefix_int
    tl_new:N l__find_tl
    tl_new:N l__find_chunks_tl
    int_step_inline:nnnn { 1 } { 1 } { 9 }
    { seq_new:c { l__find_chunks_#1_seq } }
    int_new:N l__find_common_int
    int_new:N l__find_nesting_int
    tl_new:N l__find_previous_tl
    seq_new:N l__find_chunks_seq
    int_new:N l__find_best_score_int
    int_new:N l__find_macro_int
    tl_new:N l__find_macros_tl
    tl_new:N l__find_result_tl
    int_new:N l__find_length_int
    int_new:N l__find_previous_length_int
    tl_new:N l__find_display_tl

    cs_new_protected_nopar:Npn dot: { tex_message:D { . } }

    cs_new_protected:Npn find_matches:nnN #1#2#3
    {
    % '#1' is the prefix, '#2' is the token list to study.
    %
    __find_set_prefix:n {#1}
    tl_set:Nn l__find_tl { ~! #2 }
    __find_escape_spaces:xN { l__find_prefix_str A } l__find_tl
    int_set_eq:NN l__find_macro_int c_one
    __find_get_length:V l__find_prefix_str
    iow_term:x { int_use:N l__find_length_int }
    int_set_eq:NN l__find_length_int c_max_int
    __find_matches_aux:V l__find_prefix_str
    tl_replace_once:Nnn l__find_tl { ! } { { ~ } }
    tl_set:Nx #3
    {
    __find_preamble:
    exp_not:V l__find_tl
    }
    }
    cs_new_protected:Npn __find_escape_spaces:nN #1#2
    {
    regex_replace_all:nnN { cS } { c{#1} } #2
    dot:
    }
    cs_generate_variant:Nn __find_escape_spaces:nN { x }
    cs_new_nopar:Npn __find_preamble:
    {
    exp_not:n { long def } exp_not:c { l__find_prefix_str A } ####1####2
    {
    exp_not:N def ####1{####2}
    exp_not:c { l__find_prefix_str A }
    }
    exp_not:c { l__find_prefix_str A }
    }
    cs_new_protected:Npn __find_matches_aux:n #1
    {
    int_set_eq:NN l__find_previous_length_int l__find_length_int
    tl_set_eq:NN l__find_previous_tl l__find_tl
    __find_escape_tl:nN {#1} l__find_tl
    int_compare:nNnTF { tl_count:N l__find_tl } < c_nine
    { tl_set_eq:Nn l__find_tl l__find_previous_tl }
    {
    __find_set_chunks:
    __find_sort_chunks:
    __find_common:
    __find_best_macros:
    __find_undefine_tmp:
    tl_set_eq:NN l__find_tl l__find_result_tl
    __find_unescape_tl:nN {#1} l__find_tl
    __find_get_length:n {#1}
    iow_term:x { int_use:N l__find_length_int }
    int_compare:nNnTF l__find_length_int < l__find_previous_length_int
    { __find_matches_aux:n {#1} }
    {
    iow_term:n { }
    iow_term:x { l__find_display_tl }
    iow_term:n { }
    tl_set_eq:NN l__find_tl l__find_previous_tl
    }
    }
    }
    cs_generate_variant:Nn __find_matches_aux:n { V }

    cs_new_protected:Npn __find_get_length:n #1
    {
    tl_set_eq:NN l__find_display_tl l__find_tl
    tl_replace_once:Nxx l__find_display_tl
    { exp_not:c { #1 A } ! }
    { ~ exp_not:c { #1 A } {~} }
    str_set:NV l__find_display_tl l__find_display_tl
    tl_replace_all:Nxn l__find_display_tl
    { c_backslash_str #1 token_to_str:N A ~ } c_space_tl
    tl_replace_all:Nnn l__find_display_tl
    { ~ c_space_tl } { ~ exp_not:c { #1 A } }
    dot:
    str_set:Nx l__find_display_tl { __find_preamble: l__find_display_tl }
    tl_replace_all:Nxx l__find_display_tl
    { c_hash_str c_hash_str } { c_hash_str }
    dot:
    regex_replace_all:nnN
    { (\[A-Za-z]+) ([A-Za-z]) }
    { 1 2 }
    l__find_display_tl
    dot:
    regex_replace_all:nnN
    { (\[A-Za-z]+) }
    { 1 c{c__iow_wrap_zwbs_marker_tl} }
    l__find_display_tl
    dot:
    iow_wrap:nnnN { l__find_display_tl } { } { } __find_get_length_aux:n
    }
    cs_generate_variant:Nn __find_get_length:n { V }
    cs_new_protected:Npn __find_get_length_aux:n #1
    {
    int_set:Nn l__find_length_int { str_count:n {#1} }
    tl_set:Nn l__find_display_tl {#1}
    % iow_term:n { }
    % iow_term:n {#1}
    % iow_term:n { }
    }



    cs_new_protected:Npn __find_set_prefix:n #1
    {
    % Check that the prefix |#1| is made only of alphabetic characters.
    %
    str_set:Nx l__find_prefix_str {#1}
    int_set:Nn l__find_prefix_int { str_count:N l__find_prefix_str }
    regex_match:nVF { Aw*Z } l__find_prefix_str
    {
    msg_error:nnx { find } { invalid-prefix }
    { l__find_prefix_str }
    }
    dot:
    }

    cs_new_protected:Npn __find_escape_tl:nN #1#2
    {
    % During the 'study' step, we manipulate the token list |#2|
    % with all begin-group and end-group tokens replaced by a
    % control sequence built from the prefix. We must change both
    % begin-group and end-group tokens in one pass, to avoid getting an
    % unbalanced result. Also replace macro parameters because they
    % cannot be used as delimiters for macros. Spaces have been
    % turned into a control sequence earlier. At this stage, every
    % token in |#2| can be grabbed as an N-type argument.
    %
    regex_replace_all:nnN { cB. } { cB{ } #2
    dot:
    regex_replace_all:nnN { cE. } { cE} } #2
    dot:
    regex_replace_all:nnN { cP. } { c{ #1 # } } #2
    dot:
    regex_replace_all:nnN { c[BEP]. } { c{ #1 } } #2
    dot:
    }




    cs_new_protected_nopar:Npn __find_set_chunks:
    {
    % Build a token list whose items are each nine consecutive tokens
    % of the original token list, in a running window. So for instance
    % |ABCDEFGHIJKL| would lead to the following (12) items:
    % |ABCDEFGHI|, |BCDEFGHIJ|, |CDEFGHIJK|, |DEFGHIJKL|, |EFGHIJKL|,
    % |FGHIJKL|, |GHIJKL|, |HIJKL|, |IJKL|, |JKL|, |KL|, |L|. The items
    % of this token list are built in an |x|-expanded loop.
    % A special case arises if the |find| token list is too short to
    % safely perform the loop: then our whole algorithm is not going to
    % do any good anyways, so we build an empty chunk list.
    %
    tl_set:Nx l__find_chunks_tl
    {
    exp_after:wN __find_set_chunks_loop:NNNNNNNNN l__find_tl
    q_recursion_tail q_recursion_stop
    }
    }
    cs_new:Npn __find_set_chunks_loop:NNNNNNNNN #1#2#3#4#5#6#7#8#9
    {
    % As usual in a TeX loop, first check for the end-loop marker (here,
    % cs{q_recursion_tail}). If it is reached, we fill in the last few
    % chunks (which become shorter and shorter as we go). Otherwise,
    % add (to the token list we are building) an item containing (9)
    % tokens, and loop, dropping the first of the items.
    %
    quark_if_recursion_tail_stop_do:Nn #9
    { __find_set_chunks_end:NNNNNNNN #1 #2 #3 #4 #5 #6 #7 #8 }
    { exp_not:n { #1 #2 #3 #4 #5 #6 #7 #8 #9 } }
    __find_set_chunks_loop:NNNNNNNNN #2#3#4#5#6#7#8#9
    }
    cs_new:Npn __find_set_chunks_end:NNNNNNNN #1#2#3#4#5#6#7#8
    {
    exp_not:n
    {
    { #1 #2 #3 #4 #5 #6 #7 #8 }
    { #2 #3 #4 #5 #6 #7 #8 }
    { #3 #4 #5 #6 #7 #8 }
    { #4 #5 #6 #7 #8 }
    { #5 #6 #7 #8 }
    { #6 #7 #8 }
    { #7 #8 }
    { #8 }
    }
    }
    cs_new_protected:Npn __find_sort_chunks:
    {
    tl_sort:Nn l__find_chunks_tl
    {
    int_compare:nNnTF
    {
    pdftex_strcmp:D
    { exp_not:n {##1} }
    { exp_not:n {##2} }
    }
    > c_zero
    { sort_return_swapped: }
    { sort_return_same: }
    }
    }

    cs_new_protected:Npn __find_common:
    {
    int_step_inline:nnnn { 1 } { 1 } { 9 }
    { seq_clear:c { l__find_chunks_##1_seq } }
    exp_after:wN __find_common_loop:nn l__find_chunks_tl
    q_recursion_tail q_recursion_tail q_recursion_stop
    int_step_inline:nnnn { 4 } { 1 } { 9 }
    {
    seq_map_inline:cn { l__find_chunks_##1_seq }
    {
    tl_if_exist:cTF { l__find_chunk_ ' tl_to_str:n {####1} ' _tl }
    {
    tl_put_right:cn
    { l__find_chunk_ ' tl_to_str:n {####1} ' _tl } { i }
    }
    {
    cs_set_eq:cN
    { l__find_chunk_ ' tl_to_str:n {####1} ' _tl } c_empty_tl
    }
    }
    }
    }

    cs_new_protected:Npn __find_common_loop:nn #1#2
    {
    quark_if_recursion_tail_stop:n {#2}
    int_zero:N l__find_common_int
    __find_count_common_aux:nn {#1} {#2}
    use:c { __find_common_ int_use:N l__find_common_int :w }
    #1 X X X X X X X X X q_stop
    __find_common_loop:nn {#2}
    }
    cs_new_protected:Npn __find_count_common_aux:nn #1#2
    {
    tl_if_empty:nF {#1}
    {
    tl_if_empty:nF {#2}
    {
    tl_if_eq:xxT { tl_head:n {#1} } { tl_head:n {#2} }
    {
    int_incr:N l__find_common_int
    __find_count_common_aux:xx
    { tl_tail:n {#1} } { tl_tail:n {#2} }
    }
    }
    }
    }
    cs_generate_variant:Nn __find_count_common_aux:nn { xx }

    cs_new_eq:cN { __find_common_0:w } use_none_delimit_by_q_stop:w
    cs_new_protected:cpn { __find_common_1:w } #1
    { __find_common_auxii:nnw { 1 } {#1} }
    cs_new_protected:cpn { __find_common_2:w } #1#2
    { __find_common_auxii:nnw { 2 } { #1 #2 } }
    cs_new_protected:cpn { __find_common_3:w } #1#2#3
    { __find_common_auxii:nnw { 3 } { #1 #2 #3 } }
    cs_new_protected:cpn { __find_common_4:w } #1#2#3#4
    { __find_common_auxii:nnw { 4 } { #1 #2 #3 #4 } }
    cs_new_protected:cpn { __find_common_5:w } #1#2#3#4#5
    {
    dot:
    __find_common_auxii:nnw { 5 } { #1 #2 #3 #4 #5 }
    }
    cs_new_protected:cpn { __find_common_6:w } #1#2#3#4#5#6
    { __find_common_auxii:nnw { 6 } { #1 #2 #3 #4 #5 #6 } }
    cs_new_protected:cpn { __find_common_7:w } #1#2#3#4#5#6#7
    { __find_common_auxii:nnw { 7 } { #1 #2 #3 #4 #5 #6 #7 } }
    cs_new_protected:cpn { __find_common_8:w } #1#2#3#4#5#6#7#8
    { __find_common_auxii:nnw { 8 } { #1 #2 #3 #4 #5 #6 #7 #8 } }
    cs_new_protected:cpn { __find_common_9:w } #1#2#3#4#5#6#7#8#9
    { __find_common_auxii:nnw { 9 } { #1 #2 #3 #4 #5 #6 #7 #8 #9 } }
    cs_new_protected:Npn __find_common_auxii:nnw #1#2#3 q_stop
    {
    int_zero:N l__find_nesting_int
    tl_map_inline:nn {#2}
    {
    str_case_x:nnn { exp_not:n {##1} }
    {
    { exp_not:c { l__find_prefix_str c_lbrace_str } }
    { int_incr:N l__find_nesting_int }
    { exp_not:c { l__find_prefix_str c_rbrace_str } }
    {
    int_compare:nNnF l__find_nesting_int > c_zero
    { use_none_delimit_by_q_stop:w }
    int_decr:N l__find_nesting_int
    }
    }
    { }
    }
    int_compare:nNnF l__find_nesting_int = c_zero
    { use_none_delimit_by_q_stop:w }
    seq_put_right:cn { l__find_chunks_#1_seq } {#2}
    use_none_delimit_by_q_stop:w
    q_stop
    }

    cs_new_protected_nopar:Npn __find_best_macros:
    {
    tl_clear:N l__find_macros_tl
    tl_clear:N l__find_result_tl
    __find_best_macros_aux:
    tl_put_left:NV l__find_result_tl l__find_macros_tl
    }
    cs_new_protected:Npn __find_best_macros_aux:
    {
    exp_after:wN __find_best_macros_auxii:NNNNNNNNN l__find_tl
    q_nil q_nil q_nil q_nil q_nil q_nil q_nil q_nil q_nil q_stop
    tl_if_empty:NF l__find_tl { __find_best_macros_aux: }
    }
    cs_new_protected:Npn __find_best_macros_auxii:NNNNNNNNN #1#2#3#4#5#6#7#8#9
    {
    int_zero:N l__find_best_score_int
    tl_clear:N l__find_best_chunk_tl
    tl_map_inline:nn
    {
    {#1} {#1#2} {#1#2#3} {#1#2#3#4} {#1#2#3#4#5} {#1#2#3#4#5#6}
    {#1#2#3#4#5#6#7} {#1#2#3#4#5#6#7#8} {#1#2#3#4#5#6#7#8#9}
    }
    {
    int_compare:nNnT
    { __find_score:n {##1} } > l__find_best_score_int
    {
    tl_set:Nn l__find_best_chunk_tl {##1}
    int_set:Nn l__find_best_score_int
    { __find_score:n {##1} }
    }
    }
    tl_if_empty:NF l__find_best_chunk_tl
    {
    int_incr:N l__find_macro_int
    tl_put_right:Nx l__find_macros_tl
    {
    exp_not:c
    { l__find_prefix_str int_to_Alph:n { l__find_macro_int } }
    { exp_not:V l__find_best_chunk_tl }
    }
    tl_replace_all:NVc l__find_tl l__find_best_chunk_tl
    { l__find_prefix_str int_to_Alph:n { l__find_macro_int } }
    dot:
    }
    tl_put_right:Nx l__find_result_tl { tl_head:N l__find_tl }
    tl_set:Nx l__find_tl { tl_tail:N l__find_tl }
    use_none_delimit_by_q_stop:w
    }
    cs_new:Npn __find_score:n #1
    {
    % Turning ####1 into a length (p+2) macro
    % (e.g. |<prefix>A|) saves this number of chars.
    % Good if non-negative.
    cs_if_exist:cTF { l__find_chunk_ ' tl_to_str:n {#1} ' _tl }
    {
    int_eval:n
    {
    tl_count:c { l__find_chunk_ ' tl_to_str:n {#1} ' _tl }
    * ( str_count:n {#1} - 3 - l__find_prefix_int )
    - 2 * l__find_prefix_int - 9
    }
    }
    { -1 }
    }

    cs_new_protected:Npn __find_undefine_tmp:
    {
    int_step_inline:nnnn { 4 } { 1 } { 9 }
    {
    seq_map_inline:cn { l__find_chunks_##1_seq }
    { cs_undefine:c { l__find_chunk_ ' tl_to_str:n {####1} ' _tl } }
    }
    }

    cs_new_protected:Npn __find_unescape_tl:nN #1#2
    {
    regex_replace_all:nnN { c{#1{} } { cB{ cE? } #2
    dot:
    regex_replace_all:nnN { c{#1}} } { cB? cE} } #2
    dot:
    regex_replace_all:nnN { c[BE]? } { } #2
    dot:
    regex_replace_all:nnN { c{#1#} } { cP# } #2
    dot:
    }

    %%%% Hack LaTeX3's iow_wrap to allow zero-width breaking spaces
    tl_const:Nx c__iow_wrap_zwbs_marker_tl
    {
    c_catcode_other_space_tl
    c__iow_wrap_marker_tl
    c_catcode_other_space_tl
    zwbs
    c_catcode_other_space_tl
    }
    cs_new_protected_nopar:Npn __iow_wrap_zwbs:
    {
    tl_put_right:Nn l__iow_current_line_tl
    { tex_romannumeral:D -`0 }
    int_decr:N l__iow_current_line_int
    }
    %%% End of hack.

    msg_new:nnn { find } { invalid-prefix }
    { The~prefix~used,~'#1',~must~be~made~of~letters. }

    cs_set_protected:Npn FindMatches #1#2
    {
    find_matches:nnN {#1} {#2} l_tmpa_tl
    tl_use:N l_tmpa_tl
    }
    ExplSyntaxOff

    % Set up some basic things to behave a bit like plain TeX.
    %
    documentclass{article}
    begin{document}
    setlength{parindent}{0pt}
    defbye{end{document}}
    %
    % End of setup.

    FindMatches{}{% Let's go!

    On the first day of Christmas my true love gave to mepar
    a partridge in a pear tree.

    bigskip

    On the second day of Christmas my true love gave to mepar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the third day of Christmas my true love gave to mepar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the fourth day of Christmas my true love gave to mepar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the fifth day of Christmas my true love gave to mepar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the sixth day of Christmas my true love gave to mepar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the seventh day of Christmas my true love gave to mepar
    seven swans a swimmingpar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the eighth day of Christmas my true love gave to mepar
    eight maids a milkingpar
    seven swans a swimmingpar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the ninth day of Christmas my true love gave to mepar
    nine ladies dancingpar
    eight maids a milkingpar
    seven swans a swimmingpar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the tenth day of Christmas my true love gave to mepar
    ten lords a leapingpar
    nine ladies dancingpar
    eight maids a milkingpar
    seven swans a swimmingpar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the eleventh day of Christmas my true love gave to mepar
    eleven pipers pipingpar
    ten lords a leapingpar
    nine ladies dancingpar
    eight maids a milkingpar
    seven swans a swimmingpar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the twelfth day of Christmas my true love gave to mepar
    twelve drummers drummingpar
    eleven pipers pipingpar
    ten lords a leapingpar
    nine ladies dancingpar
    eight maids a milkingpar
    seven swans a swimmingpar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.
    bye
    }





    share|improve this answer



















    • 5




      Awesome as always, Bruno! :)
      – Paulo Cereda
      May 3 '13 at 0:54






    • 5




      This is one of the more outrageous things I've ever seen here. (It is very impressive!)
      – Ryan Reich
      May 3 '13 at 0:56










    • Hehe! Thanks @RyanReich. By the way, do you know of any algorithm in the vein of LZW that would be able to produce macros with varying arguments?
      – Bruno Le Floch
      May 3 '13 at 1:10












    • @Bruno All I know about LZW is the idea, unfortunately. You mean, you want the "dictionary" to be macros that take an argument, or that take a variable number of arguments? Isn't this the same as having an array and then having the macro argument index into it?
      – Ryan Reich
      May 3 '13 at 4:15






    • 2




      @BrunoLeFloch no wonder we get stalled on the middle layer of L3, but at least nobody can any longer deny that expl3 is up to arbitrary programming tasks :-)
      – Frank Mittelbach
      May 4 '13 at 20:16
















    22














    EDIT: I had implemented some version of the Lempel-Ziv-Welch algorithm, following a comment of Ryan Reich. I've now changed the code completely and I don't know if the algorithm has a name.



    I fed the new code the 12 days of Christmas, as David Carlisle asked, and obtained the following result (for plain TeX)



    longdefA#1#2{def#1{#2}A}AAG{sparACAAB}AH{AEWXVTU mVAD}AA
    {CDEFG}AB{HIJAparbigskipB}AC{OPparMNKL nd}AD{RSparQ
    A ring}AE{Y es dancV}AF{tZ a leapV}B{par On the }C{ day of C}D
    {hristmas }E{my true l}F{ove gave }G{to mepar}H{a partrid}I{ge in a p}J
    {ear tree.}K{two turtl}L{e dovespar a}M{three fre}N{nch henspar}O{four
    call}P{ing birds}Q{five gold}R{six geese}S{ a laying}T{seven swa}U{ns a
    swim}V{ingpar}W{eight mai}X{ds a milk}Y{nine ladi}Z{en lords } A{ }B
    firstAAAB secondAAKL nd AB thirdAAMNKL nd AB fourthAAACAAB
    fifthAAQA ringAG sixthAAADAG seventhAATU mVADAG eighthAAWXVT
    U mVADAG ninthAAAHAG tenthAAAFAHAG eleventhAA eleven pipers pipV
    AFAHAG twelfthAA twelve drummers drummV eleven pipers pipVAFAH spar
    ACAHIJAbye


    which is 878 bytes long. This should be compared to the 767 bytes of xii.tex, which produces the same lyrics.



    RequirePackage{expl3}
    ExplSyntaxOn
    cs_generate_variant:Nn tl_replace_all:Nnn { Nxx, NVc, Nxn }
    cs_generate_variant:Nn tl_replace_once:Nnn { Nxx }
    cs_generate_variant:Nn tl_if_in:NnT { No }
    cs_generate_variant:Nn int_min:nn { c, cc }
    cs_generate_variant:Nn tl_if_eq:NNTF { cc }
    cs_generate_variant:Nn tl_if_eq:nnTF { xx }
    cs_generate_variant:Nn tl_if_eq:nnT { xx }
    cs_generate_variant:Nn str_count:N { c }
    cs_generate_variant:Nn regex_match:nnF { nV }
    cs_generate_variant:Nn str_set:Nn { NV }
    str_new:N l__find_prefix_str
    int_new:N l__find_prefix_int
    tl_new:N l__find_tl
    tl_new:N l__find_chunks_tl
    int_step_inline:nnnn { 1 } { 1 } { 9 }
    { seq_new:c { l__find_chunks_#1_seq } }
    int_new:N l__find_common_int
    int_new:N l__find_nesting_int
    tl_new:N l__find_previous_tl
    seq_new:N l__find_chunks_seq
    int_new:N l__find_best_score_int
    int_new:N l__find_macro_int
    tl_new:N l__find_macros_tl
    tl_new:N l__find_result_tl
    int_new:N l__find_length_int
    int_new:N l__find_previous_length_int
    tl_new:N l__find_display_tl

    cs_new_protected_nopar:Npn dot: { tex_message:D { . } }

    cs_new_protected:Npn find_matches:nnN #1#2#3
    {
    % '#1' is the prefix, '#2' is the token list to study.
    %
    __find_set_prefix:n {#1}
    tl_set:Nn l__find_tl { ~! #2 }
    __find_escape_spaces:xN { l__find_prefix_str A } l__find_tl
    int_set_eq:NN l__find_macro_int c_one
    __find_get_length:V l__find_prefix_str
    iow_term:x { int_use:N l__find_length_int }
    int_set_eq:NN l__find_length_int c_max_int
    __find_matches_aux:V l__find_prefix_str
    tl_replace_once:Nnn l__find_tl { ! } { { ~ } }
    tl_set:Nx #3
    {
    __find_preamble:
    exp_not:V l__find_tl
    }
    }
    cs_new_protected:Npn __find_escape_spaces:nN #1#2
    {
    regex_replace_all:nnN { cS } { c{#1} } #2
    dot:
    }
    cs_generate_variant:Nn __find_escape_spaces:nN { x }
    cs_new_nopar:Npn __find_preamble:
    {
    exp_not:n { long def } exp_not:c { l__find_prefix_str A } ####1####2
    {
    exp_not:N def ####1{####2}
    exp_not:c { l__find_prefix_str A }
    }
    exp_not:c { l__find_prefix_str A }
    }
    cs_new_protected:Npn __find_matches_aux:n #1
    {
    int_set_eq:NN l__find_previous_length_int l__find_length_int
    tl_set_eq:NN l__find_previous_tl l__find_tl
    __find_escape_tl:nN {#1} l__find_tl
    int_compare:nNnTF { tl_count:N l__find_tl } < c_nine
    { tl_set_eq:Nn l__find_tl l__find_previous_tl }
    {
    __find_set_chunks:
    __find_sort_chunks:
    __find_common:
    __find_best_macros:
    __find_undefine_tmp:
    tl_set_eq:NN l__find_tl l__find_result_tl
    __find_unescape_tl:nN {#1} l__find_tl
    __find_get_length:n {#1}
    iow_term:x { int_use:N l__find_length_int }
    int_compare:nNnTF l__find_length_int < l__find_previous_length_int
    { __find_matches_aux:n {#1} }
    {
    iow_term:n { }
    iow_term:x { l__find_display_tl }
    iow_term:n { }
    tl_set_eq:NN l__find_tl l__find_previous_tl
    }
    }
    }
    cs_generate_variant:Nn __find_matches_aux:n { V }

    cs_new_protected:Npn __find_get_length:n #1
    {
    tl_set_eq:NN l__find_display_tl l__find_tl
    tl_replace_once:Nxx l__find_display_tl
    { exp_not:c { #1 A } ! }
    { ~ exp_not:c { #1 A } {~} }
    str_set:NV l__find_display_tl l__find_display_tl
    tl_replace_all:Nxn l__find_display_tl
    { c_backslash_str #1 token_to_str:N A ~ } c_space_tl
    tl_replace_all:Nnn l__find_display_tl
    { ~ c_space_tl } { ~ exp_not:c { #1 A } }
    dot:
    str_set:Nx l__find_display_tl { __find_preamble: l__find_display_tl }
    tl_replace_all:Nxx l__find_display_tl
    { c_hash_str c_hash_str } { c_hash_str }
    dot:
    regex_replace_all:nnN
    { (\[A-Za-z]+) ([A-Za-z]) }
    { 1 2 }
    l__find_display_tl
    dot:
    regex_replace_all:nnN
    { (\[A-Za-z]+) }
    { 1 c{c__iow_wrap_zwbs_marker_tl} }
    l__find_display_tl
    dot:
    iow_wrap:nnnN { l__find_display_tl } { } { } __find_get_length_aux:n
    }
    cs_generate_variant:Nn __find_get_length:n { V }
    cs_new_protected:Npn __find_get_length_aux:n #1
    {
    int_set:Nn l__find_length_int { str_count:n {#1} }
    tl_set:Nn l__find_display_tl {#1}
    % iow_term:n { }
    % iow_term:n {#1}
    % iow_term:n { }
    }



    cs_new_protected:Npn __find_set_prefix:n #1
    {
    % Check that the prefix |#1| is made only of alphabetic characters.
    %
    str_set:Nx l__find_prefix_str {#1}
    int_set:Nn l__find_prefix_int { str_count:N l__find_prefix_str }
    regex_match:nVF { Aw*Z } l__find_prefix_str
    {
    msg_error:nnx { find } { invalid-prefix }
    { l__find_prefix_str }
    }
    dot:
    }

    cs_new_protected:Npn __find_escape_tl:nN #1#2
    {
    % During the 'study' step, we manipulate the token list |#2|
    % with all begin-group and end-group tokens replaced by a
    % control sequence built from the prefix. We must change both
    % begin-group and end-group tokens in one pass, to avoid getting an
    % unbalanced result. Also replace macro parameters because they
    % cannot be used as delimiters for macros. Spaces have been
    % turned into a control sequence earlier. At this stage, every
    % token in |#2| can be grabbed as an N-type argument.
    %
    regex_replace_all:nnN { cB. } { cB{ } #2
    dot:
    regex_replace_all:nnN { cE. } { cE} } #2
    dot:
    regex_replace_all:nnN { cP. } { c{ #1 # } } #2
    dot:
    regex_replace_all:nnN { c[BEP]. } { c{ #1 } } #2
    dot:
    }




    cs_new_protected_nopar:Npn __find_set_chunks:
    {
    % Build a token list whose items are each nine consecutive tokens
    % of the original token list, in a running window. So for instance
    % |ABCDEFGHIJKL| would lead to the following (12) items:
    % |ABCDEFGHI|, |BCDEFGHIJ|, |CDEFGHIJK|, |DEFGHIJKL|, |EFGHIJKL|,
    % |FGHIJKL|, |GHIJKL|, |HIJKL|, |IJKL|, |JKL|, |KL|, |L|. The items
    % of this token list are built in an |x|-expanded loop.
    % A special case arises if the |find| token list is too short to
    % safely perform the loop: then our whole algorithm is not going to
    % do any good anyways, so we build an empty chunk list.
    %
    tl_set:Nx l__find_chunks_tl
    {
    exp_after:wN __find_set_chunks_loop:NNNNNNNNN l__find_tl
    q_recursion_tail q_recursion_stop
    }
    }
    cs_new:Npn __find_set_chunks_loop:NNNNNNNNN #1#2#3#4#5#6#7#8#9
    {
    % As usual in a TeX loop, first check for the end-loop marker (here,
    % cs{q_recursion_tail}). If it is reached, we fill in the last few
    % chunks (which become shorter and shorter as we go). Otherwise,
    % add (to the token list we are building) an item containing (9)
    % tokens, and loop, dropping the first of the items.
    %
    quark_if_recursion_tail_stop_do:Nn #9
    { __find_set_chunks_end:NNNNNNNN #1 #2 #3 #4 #5 #6 #7 #8 }
    { exp_not:n { #1 #2 #3 #4 #5 #6 #7 #8 #9 } }
    __find_set_chunks_loop:NNNNNNNNN #2#3#4#5#6#7#8#9
    }
    cs_new:Npn __find_set_chunks_end:NNNNNNNN #1#2#3#4#5#6#7#8
    {
    exp_not:n
    {
    { #1 #2 #3 #4 #5 #6 #7 #8 }
    { #2 #3 #4 #5 #6 #7 #8 }
    { #3 #4 #5 #6 #7 #8 }
    { #4 #5 #6 #7 #8 }
    { #5 #6 #7 #8 }
    { #6 #7 #8 }
    { #7 #8 }
    { #8 }
    }
    }
    cs_new_protected:Npn __find_sort_chunks:
    {
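    % Sort the nine-token chunks lexicographically (via pdftex's strcmp
    % primitive) so that chunks sharing a common leading run of tokens end
    % up next to each other; __find_common: below then only has to compare
    % neighbouring chunks.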
    tl_sort:Nn l__find_chunks_tl
    {
    int_compare:nNnTF
    {
    pdftex_strcmp:D
    { exp_not:n {##1} }
    { exp_not:n {##2} }
    }
    > c_zero
    { sort_return_swapped: }
    { sort_return_same: }
    }
    }

    cs_new_protected:Npn __find_common:
    {
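    % Compare each pair of neighbouring (sorted) chunks and record how many
    % leading tokens they have in common. A brace-balanced common run is
    % stored in the matching l__find_chunks_<length>_seq; runs of four to
    % nine tokens are then tallied in l__find_chunk_..._tl, one 'i' per
    % repeat beyond the first, which __find_score:n uses when scoring.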
    int_step_inline:nnnn { 1 } { 1 } { 9 }
    { seq_clear:c { l__find_chunks_##1_seq } }
    exp_after:wN __find_common_loop:nn l__find_chunks_tl
    q_recursion_tail q_recursion_tail q_recursion_stop
    int_step_inline:nnnn { 4 } { 1 } { 9 }
    {
    seq_map_inline:cn { l__find_chunks_##1_seq }
    {
    tl_if_exist:cTF { l__find_chunk_ ' tl_to_str:n {####1} ' _tl }
    {
    tl_put_right:cn
    { l__find_chunk_ ' tl_to_str:n {####1} ' _tl } { i }
    }
    {
    cs_set_eq:cN
    { l__find_chunk_ ' tl_to_str:n {####1} ' _tl } c_empty_tl
    }
    }
    }
    }

    cs_new_protected:Npn __find_common_loop:nn #1#2
    {
    quark_if_recursion_tail_stop:n {#2}
    int_zero:N l__find_common_int
    __find_count_common_aux:nn {#1} {#2}
    use:c { __find_common_ int_use:N l__find_common_int :w }
    #1 X X X X X X X X X q_stop
    __find_common_loop:nn {#2}
    }
    cs_new_protected:Npn __find_count_common_aux:nn #1#2
    {
    tl_if_empty:nF {#1}
    {
    tl_if_empty:nF {#2}
    {
    tl_if_eq:xxT { tl_head:n {#1} } { tl_head:n {#2} }
    {
    int_incr:N l__find_common_int
    __find_count_common_aux:xx
    { tl_tail:n {#1} } { tl_tail:n {#2} }
    }
    }
    }
    }
    cs_generate_variant:Nn __find_count_common_aux:nn { xx }

    cs_new_eq:cN { __find_common_0:w } use_none_delimit_by_q_stop:w
    cs_new_protected:cpn { __find_common_1:w } #1
    { __find_common_auxii:nnw { 1 } {#1} }
    cs_new_protected:cpn { __find_common_2:w } #1#2
    { __find_common_auxii:nnw { 2 } { #1 #2 } }
    cs_new_protected:cpn { __find_common_3:w } #1#2#3
    { __find_common_auxii:nnw { 3 } { #1 #2 #3 } }
    cs_new_protected:cpn { __find_common_4:w } #1#2#3#4
    { __find_common_auxii:nnw { 4 } { #1 #2 #3 #4 } }
    cs_new_protected:cpn { __find_common_5:w } #1#2#3#4#5
    {
    dot:
    __find_common_auxii:nnw { 5 } { #1 #2 #3 #4 #5 }
    }
    cs_new_protected:cpn { __find_common_6:w } #1#2#3#4#5#6
    { __find_common_auxii:nnw { 6 } { #1 #2 #3 #4 #5 #6 } }
    cs_new_protected:cpn { __find_common_7:w } #1#2#3#4#5#6#7
    { __find_common_auxii:nnw { 7 } { #1 #2 #3 #4 #5 #6 #7 } }
    cs_new_protected:cpn { __find_common_8:w } #1#2#3#4#5#6#7#8
    { __find_common_auxii:nnw { 8 } { #1 #2 #3 #4 #5 #6 #7 #8 } }
    cs_new_protected:cpn { __find_common_9:w } #1#2#3#4#5#6#7#8#9
    { __find_common_auxii:nnw { 9 } { #1 #2 #3 #4 #5 #6 #7 #8 #9 } }
    cs_new_protected:Npn __find_common_auxii:nnw #1#2#3 q_stop
    {
    int_zero:N l__find_nesting_int
    tl_map_inline:nn {#2}
    {
    str_case_x:nnn { exp_not:n {##1} }
    {
    { exp_not:c { l__find_prefix_str c_lbrace_str } }
    { int_incr:N l__find_nesting_int }
    { exp_not:c { l__find_prefix_str c_rbrace_str } }
    {
    int_compare:nNnF l__find_nesting_int > c_zero
    { use_none_delimit_by_q_stop:w }
    int_decr:N l__find_nesting_int
    }
    }
    { }
    }
    int_compare:nNnF l__find_nesting_int = c_zero
    { use_none_delimit_by_q_stop:w }
    seq_put_right:cn { l__find_chunks_#1_seq } {#2}
    use_none_delimit_by_q_stop:w
    q_stop
    }

    cs_new_protected_nopar:Npn __find_best_macros:
    {
    tl_clear:N l__find_macros_tl
    tl_clear:N l__find_result_tl
    __find_best_macros_aux:
    tl_put_left:NV l__find_result_tl l__find_macros_tl
    }
    cs_new_protected:Npn __find_best_macros_aux:
    {
    exp_after:wN __find_best_macros_auxii:NNNNNNNNN l__find_tl
    q_nil q_nil q_nil q_nil q_nil q_nil q_nil q_nil q_nil q_stop
    tl_if_empty:NF l__find_tl { __find_best_macros_aux: }
    }
    cs_new_protected:Npn __find_best_macros_auxii:NNNNNNNNN #1#2#3#4#5#6#7#8#9
    {
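    % Score every leading prefix of one to nine tokens of the remaining
    % text. If the best prefix has a positive score, define the next macro
    % <prefix><letter> for it and replace it throughout l__find_tl; then
    % move one token over to the result and let __find_best_macros_aux:
    % call this function again on what is left.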
    int_zero:N l__find_best_score_int
    tl_clear:N l__find_best_chunk_tl
    tl_map_inline:nn
    {
    {#1} {#1#2} {#1#2#3} {#1#2#3#4} {#1#2#3#4#5} {#1#2#3#4#5#6}
    {#1#2#3#4#5#6#7} {#1#2#3#4#5#6#7#8} {#1#2#3#4#5#6#7#8#9}
    }
    {
    int_compare:nNnT
    { __find_score:n {##1} } > l__find_best_score_int
    {
    tl_set:Nn l__find_best_chunk_tl {##1}
    int_set:Nn l__find_best_score_int
    { __find_score:n {##1} }
    }
    }
    tl_if_empty:NF l__find_best_chunk_tl
    {
    int_incr:N l__find_macro_int
    tl_put_right:Nx l__find_macros_tl
    {
    exp_not:c
    { l__find_prefix_str int_to_Alph:n { l__find_macro_int } }
    { exp_not:V l__find_best_chunk_tl }
    }
    tl_replace_all:NVc l__find_tl l__find_best_chunk_tl
    { l__find_prefix_str int_to_Alph:n { l__find_macro_int } }
    dot:
    }
    tl_put_right:Nx l__find_result_tl { tl_head:N l__find_tl }
    tl_set:Nx l__find_tl { tl_tail:N l__find_tl }
    use_none_delimit_by_q_stop:w
    }
    cs_new:Npn __find_score:n #1
    {
    % Turning ####1 into a length (p+2) macro
    % (e.g. |<prefix>A|) saves this number of chars.
    % Good if non-negative.
    cs_if_exist:cTF { l__find_chunk_ ' tl_to_str:n {#1} ' _tl }
    {
    int_eval:n
    {
    tl_count:c { l__find_chunk_ ' tl_to_str:n {#1} ' _tl }
    * ( str_count:n {#1} - 3 - l__find_prefix_int )
    - 2 * l__find_prefix_int - 9
    }
    }
    { -1 }
    }

    cs_new_protected:Npn __find_undefine_tmp:
    {
    int_step_inline:nnnn { 4 } { 1 } { 9 }
    {
    seq_map_inline:cn { l__find_chunks_##1_seq }
    { cs_undefine:c { l__find_chunk_ ' tl_to_str:n {####1} ' _tl } }
    }
    }

    cs_new_protected:Npn __find_unescape_tl:nN #1#2
    {
    regex_replace_all:nnN { c{#1{} } { cB{ cE? } #2
    dot:
    regex_replace_all:nnN { c{#1}} } { cB? cE} } #2
    dot:
    regex_replace_all:nnN { c[BE]? } { } #2
    dot:
    regex_replace_all:nnN { c{#1#} } { cP# } #2
    dot:
    }

    %%%% Hack LaTeX3's iow_wrap to allow zero-width breaking spaces
    tl_const:Nx c__iow_wrap_zwbs_marker_tl
    {
    c_catcode_other_space_tl
    c__iow_wrap_marker_tl
    c_catcode_other_space_tl
    zwbs
    c_catcode_other_space_tl
    }
    cs_new_protected_nopar:Npn __iow_wrap_zwbs:
    {
    tl_put_right:Nn l__iow_current_line_tl
    { tex_romannumeral:D -`0 }
    int_decr:N l__iow_current_line_int
    }
    %%% End of hack.

    msg_new:nnn { find } { invalid-prefix }
    { The~prefix~used,~'#1',~must~be~made~of~letters. }

    cs_set_protected:Npn FindMatches #1#2
    {
    find_matches:nnN {#1} {#2} l_tmpa_tl
    tl_use:N l_tmpa_tl
    }
    ExplSyntaxOff

    % Set up some basic things to behave a bit like plain TeX.
    %
    documentclass{article}
    begin{document}
    setlength{parindent}{0pt}
    defbye{end{document}}
    %
    % End of setup.

    FindMatches{}{% Let's go!

    On the first day of Christmas my true love gave to mepar
    a partridge in a pear tree.

    bigskip

    On the second day of Christmas my true love gave to mepar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the third day of Christmas my true love gave to mepar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the fourth day of Christmas my true love gave to mepar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the fifth day of Christmas my true love gave to mepar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the sixth day of Christmas my true love gave to mepar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the seventh day of Christmas my true love gave to mepar
    seven swans a swimmingpar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the eighth day of Christmas my true love gave to mepar
    eight maids a milkingpar
    seven swans a swimmingpar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the ninth day of Christmas my true love gave to mepar
    nine ladies dancingpar
    eight maids a milkingpar
    seven swans a swimmingpar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the tenth day of Christmas my true love gave to mepar
    ten lords a leapingpar
    nine ladies dancingpar
    eight maids a milkingpar
    seven swans a swimmingpar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the eleventh day of Christmas my true love gave to mepar
    eleven pipers pipingpar
    ten lords a leapingpar
    nine ladies dancingpar
    eight maids a milkingpar
    seven swans a swimmingpar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.

    bigskip

    On the twelfth day of Christmas my true love gave to mepar
    twelve drummers drummingpar
    eleven pipers pipingpar
    ten lords a leapingpar
    nine ladies dancingpar
    eight maids a milkingpar
    seven swans a swimmingpar
    six geese a layingpar
    five gold ringspar
    four calling birdspar
    three french henspar
    two turtle dovespar
    and a partridge in a pear tree.
    bye
    }
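
    For experimenting with the same idea outside TeX, here is a rough Python
    sketch of the greedy loop implemented above: repeatedly pick the substring
    whose replacement by a short macro saves the most characters, substitute it
    everywhere, and stop once nothing helps any more. It is only an
    illustration (plain strings instead of token lists, and the scoring
    constants are rough assumptions), not a translation of the expl3 code.

    import string

    def compress(text, max_len=9):
        # Greedy loop: score every substring of length 2..max_len, take the
        # one whose replacement by a two-character macro ("\X") saves the
        # most bytes, substitute it, and repeat while something is gained.
        defs, names = [], iter(string.ascii_uppercase)
        while True:
            candidates = {text[i:i + l]
                          for i in range(len(text))
                          for l in range(2, max_len + 1)
                          if i + l <= len(text)}
            best, best_gain = None, 0
            for chunk in candidates:
                n = text.count(chunk)  # non-overlapping occurrences
                # n copies of len(chunk) bytes become n two-byte calls, at a
                # definition cost of roughly len(chunk) + 4 bytes.
                gain = n * (len(chunk) - 2) - (len(chunk) + 4)
                if gain > best_gain:
                    best, best_gain = chunk, gain
            if best is None:
                return defs, text
            try:
                name = "\\" + next(names)
            except StopIteration:
                return defs, text
            defs.append((name, best))
            text = text.replace(best, name)

    song = "On the first day of Christmas my true love gave to me ..."
    macros, packed = compress(song)
    total = len(packed) + sum(len(n) + len(body) + 2 for n, body in macros)
    print(len(song), "->", total)

    Feeding it the full lyrics gives a ballpark figure for how much such
    macros can save; TeX-specific costs (catcodes, spaces after control
    words) are deliberately ignored here.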





    • 5
      Awesome as always, Bruno! :)
      – Paulo Cereda
      May 3 '13 at 0:54

    • 5
      This is one of the more outrageous things I've ever seen here. (It is very impressive!)
      – Ryan Reich
      May 3 '13 at 0:56

    • Hehe! Thanks @RyanReich. By the way, do you know of any algorithm in the vein of LZW that would be able to produce macros with varying arguments?
      – Bruno Le Floch
      May 3 '13 at 1:10

    • @Bruno All I know about LZW is the idea, unfortunately. You mean, you want the "dictionary" to be macros that take an argument, or that take a variable number of arguments? Isn't this the same as having an array and then having the macro argument index into it?
      – Ryan Reich
      May 3 '13 at 4:15

    • 2
      @BrunoLeFloch no wonder we get stalled on the middle layer of L3, but at least nobody can any longer deny that expl3 is up to arbitrary programming tasks :-)
      – Frank Mittelbach
      May 4 '13 at 20:16










    answered May 3 '13 at 0:48
    Bruno Le Floch
    What I present is a total kludge, stealing things from my titlecaps package. So the format is not optimal, but it might give a place from which to proceed. A nice feature is that punctuation is screened out of the search in the target text.



    When a search string and a target text (not exceeding one paragraph) are passed to \seekphrase, it reports where each word of the search phrase appears in the target text. I output each hit as a pair "target-location:search-word-index". To make the pairs easier to read, I print a [ before a search-word-index=1 match and a ] after a search-word-index=n match. Thus, to qualify as a matching "phrase" in, for example, a 3-word search, the output would have to contain an instance of something like [214:1 215:2 216:3].



    Further refinement is clearly possible, but it would take more time than I have at the moment.



    documentclass{article}
    usepackage{titlecaps}
    usepackage{lipsum}
    makeatletter
    renewcommandseek@lcwords{%
    kill@punct%
    setcounter{word@count}{0}%
    whiledo{value{word@count} < narg}{%
    addtocounter{word@count}{1}%
    protected@edefcurrent@word{csname argroman{word@count}endcsname}%
    deffound@word{F}%
    setcounter{lcword@index}{0}%
    expandafterdefcsname%
    found@wordroman{word@count}endcsname{F}%
    whiledo{value{lcword@index} < value{lc@words}}{%
    addtocounter{lcword@index}{1}%
    protected@edefcurrent@lcword{%
    csname lcwordroman{lcword@index}endcsname}%
    %% THE FOLLOWING THREE LINES ARE FROM DAVID CARLISLE
    protected@edeftmp{noexpandscantokens{defnoexpandtmp%
    {noexpandifthenelse{noexpandequal{current@word}{current@lcword}}}}}%
    tmpifhmodeunskipfitmp
    %%
    {expandafterdefcsname%
    found@wordroman{word@count}endcsname{T}%
    ifthenelse{equal{value{lcword@index}}{1}}{[}{}%
    arabic{word@count}:arabic{lcword@index}%
    ifthenelse{equal{value{lcword@index}}{value{lc@words}}}{]}{ }%
    setcounter{lcword@index}{value{lc@words}}}%
    {}%
    }%
    }%
    restore@punct%
    }
    letgetargsCget@argsC

    newcommandseekphrase[2]{%
    Seeking phrase ``#1'':\%
    Addlcwords{#1}%
    redefine@tertius%
    getargsC{#2}%
    seek@lcwords%
    Resetlcwords%
    par%
    }
    makeatother

    begin{document}

    lipsum[1]
    lipsum[1]

    defx{%
    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut
    purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis.
    Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget,
    consectetuer id, vulputate a, magna. Donec vehicula augue eu neque.
    Pellentesque habitant morbi tristique senectus et netus et malesuada
    fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem.
    Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus
    sit amet tortor gravida placerat. Integer sapien est, iaculis in,
    pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices
    bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar
    at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci
    eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis,
    diam. Duis eget orci sit amet orci dignissim rutrum.
    %
    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut
    purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis.
    Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget,
    consectetuer id, vulputate a, magna. Donec vehicula augue eu neque.
    Pellentesque habitant morbi tristique senectus et netus et malesuada
    fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem.
    Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus
    sit amet tortor gravida placerat. Integer sapien est, iaculis in,
    pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices
    bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar
    at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci
    eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis,
    diam. Duis eget orci sit amet orci dignissim rutrum.
    }

    seekphrase{et}{x}

    seekphrase{et netus}{x}

    seekphrase{bibendum Aenean}{x}

    seekphrase{eget sem vel}{x}

    end{document}
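
    The bracketed pairs that \seekphrase prints can be post-processed outside
    TeX to count complete phrase matches. Below is a small, purely
    illustrative Python sketch; it merely assumes the output format described
    above (e.g. [214:1 215:2 216:3] for a three-word phrase) and is not part
    of the titlecaps-based code.

    import re

    def count_phrase_matches(output, n):
        # A bracketed group is a real phrase match only if it lists the
        # indices 1..n in order, at consecutive target positions.
        matches = 0
        for group in re.findall(r"\[(.*?)\]", output):
            pairs = [tuple(map(int, p.split(":")))
                     for p in group.replace(",", " ").split()]
            ok = (len(pairs) == n
                  and [idx for _, idx in pairs] == list(range(1, n + 1))
                  and all(b - a == 1
                          for (a, _), (b, _) in zip(pairs, pairs[1:])))
            matches += ok
        return matches

    print(count_phrase_matches("[214:1 215:2 216:3]", 3))  # prints 1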


      sit amet tortor gravida placerat. Integer sapien est, iaculis in,
      pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices
      bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar
      at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci
      eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis,
      diam. Duis eget orci sit amet orci dignissim rutrum.
      }

      seekphrase{et}{x}

      seekphrase{et netus}{x}

      seekphrase{bibendum Aenean}{x}

      seekphrase{eget sem vel}{x}

      end{document}


      enter image description here






      share|improve this answer


























        5












        5








        5






        What I present is a total kludge, stealing things from my titlecaps package. So the format is not optimal, but it might give a place from which to proceed. Nice thing is that punctuation is screened out of the search in the target-text.



        When a search-string and target-text (not exceeding one paragraph) is passed to the seekphrase, it will output where each word of the search-phrase appears in the target-text. I output it as a pair "target-location:search-word-index". To try to make sense of it, I output a [ before a search-word-index=1 match and a ]after a search-word-index=n match. Thus to qualify as a matching "phrase" in, for example, a 3-word search, the output would have to contain an instance of something like [214:1 215:2, 216:3]



        Further refinement is clearly possible, but it would take more time than I have at the moment.



        documentclass{article}
        usepackage{titlecaps}
        usepackage{lipsum}
        makeatletter
        renewcommandseek@lcwords{%
        kill@punct%
        setcounter{word@count}{0}%
        whiledo{value{word@count} < narg}{%
        addtocounter{word@count}{1}%
        protected@edefcurrent@word{csname argroman{word@count}endcsname}%
        deffound@word{F}%
        setcounter{lcword@index}{0}%
        expandafterdefcsname%
        found@wordroman{word@count}endcsname{F}%
        whiledo{value{lcword@index} < value{lc@words}}{%
        addtocounter{lcword@index}{1}%
        protected@edefcurrent@lcword{%
        csname lcwordroman{lcword@index}endcsname}%
        %% THE FOLLOWING THREE LINES ARE FROM DAVID CARLISLE
        protected@edeftmp{noexpandscantokens{defnoexpandtmp%
        {noexpandifthenelse{noexpandequal{current@word}{current@lcword}}}}}%
        tmpifhmodeunskipfitmp
        %%
        {expandafterdefcsname%
        found@wordroman{word@count}endcsname{T}%
        ifthenelse{equal{value{lcword@index}}{1}}{[}{}%
        arabic{word@count}:arabic{lcword@index}%
        ifthenelse{equal{value{lcword@index}}{value{lc@words}}}{]}{ }%
        setcounter{lcword@index}{value{lc@words}}}%
        {}%
        }%
        }%
        restore@punct%
        }
        letgetargsCget@argsC

        newcommandseekphrase[2]{%
        Seeking phrase ``#1'':\%
        Addlcwords{#1}%
        redefine@tertius%
        getargsC{#2}%
        seek@lcwords%
        Resetlcwords%
        par%
        }
        makeatother

        begin{document}

        lipsum[1]
        lipsum[1]

        defx{%
        Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut
        purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis.
        Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget,
        consectetuer id, vulputate a, magna. Donec vehicula augue eu neque.
        Pellentesque habitant morbi tristique senectus et netus et malesuada
        fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem.
        Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus
        sit amet tortor gravida placerat. Integer sapien est, iaculis in,
        pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices
        bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar
        at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci
        eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis,
        diam. Duis eget orci sit amet orci dignissim rutrum.
        %
        Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut
        purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis.
        Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget,
        consectetuer id, vulputate a, magna. Donec vehicula augue eu neque.
        Pellentesque habitant morbi tristique senectus et netus et malesuada
        fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem.
        Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus
        sit amet tortor gravida placerat. Integer sapien est, iaculis in,
        pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices
        bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar
        at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci
        eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis,
        diam. Duis eget orci sit amet orci dignissim rutrum.
        }

        seekphrase{et}{x}

        seekphrase{et netus}{x}

        seekphrase{bibendum Aenean}{x}

        seekphrase{eget sem vel}{x}

        end{document}


        enter image description here






        share|improve this answer














        What I present is a total kludge, stealing things from my titlecaps package. So the format is not optimal, but it might give a place from which to proceed. Nice thing is that punctuation is screened out of the search in the target-text.



        When a search-string and target-text (not exceeding one paragraph) is passed to the seekphrase, it will output where each word of the search-phrase appears in the target-text. I output it as a pair "target-location:search-word-index". To try to make sense of it, I output a [ before a search-word-index=1 match and a ]after a search-word-index=n match. Thus to qualify as a matching "phrase" in, for example, a 3-word search, the output would have to contain an instance of something like [214:1 215:2, 216:3]



        Further refinement is clearly possible, but it would take more time than I have at the moment.



        documentclass{article}
        usepackage{titlecaps}
        usepackage{lipsum}
        makeatletter
        renewcommandseek@lcwords{%
        kill@punct%
        setcounter{word@count}{0}%
        whiledo{value{word@count} < narg}{%
        addtocounter{word@count}{1}%
        protected@edefcurrent@word{csname argroman{word@count}endcsname}%
        deffound@word{F}%
        setcounter{lcword@index}{0}%
        expandafterdefcsname%
        found@wordroman{word@count}endcsname{F}%
        whiledo{value{lcword@index} < value{lc@words}}{%
        addtocounter{lcword@index}{1}%
        protected@edefcurrent@lcword{%
        csname lcwordroman{lcword@index}endcsname}%
        %% THE FOLLOWING THREE LINES ARE FROM DAVID CARLISLE
        protected@edeftmp{noexpandscantokens{defnoexpandtmp%
        {noexpandifthenelse{noexpandequal{current@word}{current@lcword}}}}}%
        tmpifhmodeunskipfitmp
        %%
        {expandafterdefcsname%
        found@wordroman{word@count}endcsname{T}%
        ifthenelse{equal{value{lcword@index}}{1}}{[}{}%
        arabic{word@count}:arabic{lcword@index}%
        ifthenelse{equal{value{lcword@index}}{value{lc@words}}}{]}{ }%
        setcounter{lcword@index}{value{lc@words}}}%
        {}%
        }%
        }%
        restore@punct%
        }
        letgetargsCget@argsC

        newcommandseekphrase[2]{%
        Seeking phrase ``#1'':\%
        Addlcwords{#1}%
        redefine@tertius%
        getargsC{#2}%
        seek@lcwords%
        Resetlcwords%
        par%
        }
        makeatother

        begin{document}

        lipsum[1]
        lipsum[1]

        defx{%
        Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut
        purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis.
        Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget,
        consectetuer id, vulputate a, magna. Donec vehicula augue eu neque.
        Pellentesque habitant morbi tristique senectus et netus et malesuada
        fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem.
        Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus
        sit amet tortor gravida placerat. Integer sapien est, iaculis in,
        pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices
        bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar
        at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci
        eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis,
        diam. Duis eget orci sit amet orci dignissim rutrum.
        %
        Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut
        purus elit, vestibulum ut, placerat ac, adipiscing vitae, felis.
        Curabitur dictum gravida mauris. Nam arcu libero, nonummy eget,
        consectetuer id, vulputate a, magna. Donec vehicula augue eu neque.
        Pellentesque habitant morbi tristique senectus et netus et malesuada
        fames ac turpis egestas. Mauris ut leo. Cras viverra metus rhoncus sem.
        Nulla et lectus vestibulum urna fringilla ultrices. Phasellus eu tellus
        sit amet tortor gravida placerat. Integer sapien est, iaculis in,
        pretium quis, viverra ac, nunc. Praesent eget sem vel leo ultrices
        bibendum. Aenean faucibus. Morbi dolor nulla, malesuada eu, pulvinar
        at, mollis ac, nulla. Curabitur auctor semper nulla. Donec varius orci
        eget risus. Duis nibh mi, congue eu, accumsan eleifend, sagittis quis,
        diam. Duis eget orci sit amet orci dignissim rutrum.
        }

        seekphrase{et}{x}

        seekphrase{et netus}{x}

        seekphrase{bibendum Aenean}{x}

        seekphrase{eget sem vel}{x}

        end{document}


        enter image description here







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited May 2 '13 at 20:21

        answered May 2 '13 at 20:10

        Steven B. Segletes
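
The answer above reports where a given phrase occurs; deciding which phrases are frequent enough to be worth a macro is arguably easier outside TeX. Below is a minimal, hypothetical sketch (not taken from the answer; the file name, script name, and the deliberately crude TeX-markup stripping are all assumptions) that prints the most frequent word n-grams of a .tex file:

    #!/usr/bin/env python3
    # Rough n-gram frequency counter for a .tex file -- illustrative sketch only.
    # Usage: python ngrams.py document.tex [n]
    import re
    import sys
    from collections import Counter

    def ngrams(words, n):
        # Yield successive n-word tuples from the word list.
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])

    def main(path, n=3, top=20):
        text = open(path, encoding="utf-8").read()
        # Crude cleanup: drop TeX comments and control sequences, keep plain words.
        text = re.sub(r"(?<!\\)%.*", "", text)
        text = re.sub(r"\\[a-zA-Z@]+\*?", " ", text)
        words = re.findall(r"[A-Za-z']+", text.lower())
        for gram, freq in Counter(ngrams(words, n)).most_common(top):
            print(f"{freq:5d}  {' '.join(gram)}")

    if __name__ == "__main__":
        main(sys.argv[1], n=int(sys.argv[2]) if len(sys.argv) > 2 else 3)

Phrases near the top of such a list are natural candidates for short macros, and the LaTeX approach above can then confirm where a chosen phrase actually appears in the document.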





























