
I have a variable (defined with \csdef{id}{value}) which contains a normal (albeit long) string of text.

I need a macro (?) which does roughly the following:

  • Squeezes this string into a "virtual" (invisible) minipage
  • To do this, it uses standard hyphenation rules to break the string into separate lines
  • This minipage has a certain width, which I've specified
  • The lines are written, say, into a string array (maybe with the same \csdef{id}{value} command).

There are two other questions about something very similar, but they deal with specially structured text that needs parsing, which I don't have. I just need to make use of LaTeX's standard hyphenation, except "behind the scenes", without actually typesetting anything.

  • Hyphenation happens at a deep stage in TeX processing and the word parts are not available for retrieval.
    – egreg
    Commented Oct 13, 2016 at 21:45
  • @egreg Well, I guess that answers it. Could you please convert your comment into an answer so I can accept it?
    – ScumCoder
    Commented Oct 13, 2016 at 22:08
  • Is LuaTeX a possibility? (If so, we may be able to stop @egreg getting a tick, which is always a good objective.) Commented Oct 13, 2016 at 22:59

2 Answers


With LuaTeX you can recreate character tokens from typeset box nodes, something that is not possible in classic TeX.

\documentclass{article}
\usepackage{color}
\begin{document}

\def\dohyphens#1#2{%
\directlua{
res='\string\\gdef\string\\#1\iftrue\string{\else}\fi'
local G = node.id("glyph")
local D = node.id("disc")
local K = node.id("kern")
gethyph = function (head)
local l = -1
for n in node.traverse(head) do
 if n.id==G then
   if l\string~=G then
     res = res ..'\string\\HYchars\iftrue\string{\else}\fi'
   end
     res = res .. string.char(n.char)
 else 
   if l\string==G then
     res = res .. '\iffalse{\else\string}\fi'
   end
 if n.id==D then
   res = res .. '\string\\HYhyphen'
 else if (n.id ==K) then
 else if l==G or l==D then
    res = res .. '\string\\HYspace'
 end
 end
 end
 end
 l=n.id
end
res = res ..'\iffalse{\else\string}\fi'
return true
end 
luatexbase.add_to_callback('pre_linebreak_filter',gethyph,'get hyphens')
}%
\setbox0\vbox{\hsize\maxdimen\ttfamily\hyphenchar\font=`\-#2}%
\directlua{
luatexbase.remove_from_callback('pre_linebreak_filter','get hyphens')
tex.sprint(res)
}%
}

\protected\def\HYchars#1{\color{red}{#1}}
\protected\def\HYhyphen{\colorbox{yellow}{\color{blue}{---}}}
\protected\def\HYspace{ }

\dohyphens{tmp}{supercalifragilisticexpialidocious}

\typeout{tmp is}
\typeout{\meaning\tmp}

\tmp
\end{document}

The above splits up supercalifragilisticexpialidocious, puts the result into \tmp, and shows its meaning on the terminal as

tmp is
macro:->\HYchars {su}\HYhyphen \HYchars {p}\HYchars {er}\HYhyphen \HYchars {cal}\HYhyphen \HYchars {ifrag}\HYhyphen \HYchars {ilis}\HYhyphen \HYchars {tic}\HYhyphen \HYchars {ex}\HYhyphen \HYchars {pi}\HYhyphen \HYchars {ali}\HYhyphen \HYchars {do}\HYhyphen \HYchars {cious}\HYspace 

So basically, runs of letters end up in \HYchars, discretionary hyphens become \HYhyphen, inter-letter kerns are removed (although there should be no kerns, since \ttfamily is used, which also avoids ligatures), and anything else becomes \HYspace.

You can define the \HY... commands to do whatever you want; here they just add some colour and exaggerated, non-discretionary dashes.
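For example (a sketch, not part of the original answer), plain definitions like the following, placed after the coloured ones and before \tmp is used, give ordinary text with a hyphen at every break point that the patterns allow, which is handy for checking them:

% Alternative plain definitions (sketch): later \def's simply override
% the coloured versions above
\protected\def\HYchars#1{#1}% a run of letters, output as is
\protected\def\HYhyphen{-}% an ordinary hyphen at each potential break
\protected\def\HYspace{ }% anything else becomes a space

Given the \meaning of \tmp shown above, \tmp should then typeset as su-per-cal-ifrag-ilis-tic-ex-pi-ali-do-cious.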



Following the comments, it appears you want something more like this, which uses the post_linebreak_filter callback to collect the lines after line breaking.


\documentclass{article}
\usepackage{color}
\begin{document}

\def\foo{a supercalifragilisticexpialidocious one two three four
supercalifragilisticexpialidocious one two three four
apples oranges carrots bananas
difficult find finger floor Vouch}


\def\showlines{\directlua{tex.print(res)}}

\def\getlines#1#2{%
\directlua{
local N = node.id("hlist")
local G = node.id("glyph")
local D = node.id("disc")
local K = node.id("kern")
local GLUE = node.id("glue")
%
getlines = function (head)
res=''
for vnode in node.traverse(head) do
if vnode.id==N then
  res=res .. getlinetext(vnode.head)
end
end
return true
end
%
getlinetext = function (head)
local linetext=''
for n in node.traverse(head) do
 if n.id==G then
   if n.subtype==2 then
     for nn in node.traverse(n.components) do
      linetext = linetext  .. string.char(nn.char)
     end
   else
     linetext = linetext   .. string.char(n.char)
   end
 else if n.id==GLUE then
     linetext = linetext .. ' '
 else if n.id==D then
     if n.replace \string~= nil then
       for nn in node.traverse(n.replace) do
        if nn.char \string~= nil then
          if nn.char==14 then
            linetext = linetext .. 'ffi' % OT1 encoding specific
          else
            linetext = linetext .. string.char(nn.char)
           end
         end
       end
     end
 end
 end
 end
end
texio.write_nl('\string\\textline{' .. linetext .. '}')
return '\string\\textline{' .. linetext .. '}'
end 
luatexbase.add_to_callback('post_linebreak_filter',getlines,'get lines')
}%
\vbox{\hsize#1\relax#2}% show typeset result to check: add \setbox0= to hide
\directlua{
luatexbase.remove_from_callback('post_linebreak_filter','get lines')
}%
}



\getlines{5cm}{\foo}

produces the following linebreaks

\bigskip

\def\textline#1{\texttt{#1}\par}

\showlines

\end{document}
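The question asks for the lines in separate \csdef-style variables. As a sketch (not part of the answer above), \textline can store each line instead of typesetting it. This assumes the etoolbox package, which provides \csgdef and \csuse, and uses the made-up names \HYlinecount and \HypResult<n> (the latter borrowed from ScumCoder's comment). \usepackage{etoolbox} goes in the preamble; the remaining lines would replace everything from \getlines{5cm}{\foo} up to \end{document}:

\usepackage{etoolbox}% preamble: provides \csgdef and \csuse

\newcount\HYlinecount % hypothetical name: number of lines stored so far
\def\textline#1{% called once per line when \showlines runs
  \global\advance\HYlinecount by 1\relax
  \csgdef{HypResult\the\HYlinecount}{#1}}% store the line as \HypResult<n>

\getlines{5cm}{\foo}
\showlines % tex.print(res), i.e. \textline{...}\textline{...}...

\texttt{\csuse{HypResult1}}\par
\texttt{\csuse{HypResult2}}\par
lines stored: \the\HYlinecount

\csgdef is used rather than \csdef so that the stored lines survive even if \showlines is called inside a group.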
  • Thanks a lot, but I can't understand what I should do next. In your example, I would need to specify some \HypWidth parameter (say 15), and then get something like \HypResult1 -> contains "supercalifrag-"; \HypResult2 -> contains "ilisticexpiali-"; \HypResult3 -> contains "docious". I guess I could do some further processing, like using xparse, but then I'd need to count the length of each syllable, add them up, concatenate those whose total length is less than 15 (removing the \HYhyphens), leave hyphens at the end of each line, and then assign each line to a new variable.
    – ScumCoder
    Commented Oct 15, 2016 at 12:25
  • @ScumCoder Basically, I just addressed this part of your question: "I just need to make use of the standard hyphenation function of LaTeX, except "behind the scenes", without actually typesetting anything", which is exactly what this does: it gives you all the potential break points that the hyphenation algorithm finds, as TeX macros. To do the line breaking, are you assuming monospace fonts? In that case you could just add a syllable at a time. Commented Oct 15, 2016 at 17:00
  • @ScumCoder Otherwise you didn't really want the hyphenation algorithm but the line-breaking algorithm, in which case you want a different Lua callback, one that runs after line breaking, and then do as above but collect each line back as text: post_linebreak_filter, not pre_linebreak_filter. Commented Oct 15, 2016 at 17:01
  • You are absolutely right; I didn't express myself properly, sorry about that. What I meant by "hyphenation" was actually line breaking.
    – ScumCoder
    Commented Oct 15, 2016 at 19:56
  • @ScumCoder Actually, I'm sort of interested in experimenting with LuaTeX in this area, so I may add something over the weekend (no promises :-) Commented Oct 15, 2016 at 20:37

It is possible to access the word parts after hyphenation has been performed, but only in the form of an already typeset box, which is pretty much what \showhyphens in xltxtra or the macros in the testhyphens package do. However, these word parts cannot be retrieved at the token-processing stage, because hyphenation happens when paragraphs are being split into lines, and there's no user control over that (other than setting parameters).

One might imagine a two-stage process, where the word parts are written to the log file by a \showhyphens-like macro; a filter in Perl or a similar language could then retrieve the word parts from the log file between LaTeX runs.
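The first stage of such a process could be as simple as the following sketch, which uses the LaTeX kernel's own \showhyphens (the words are just examples). Each call reports its argument in the .log file inside an "Underfull \hbox" message, with the possible breaks shown as hyphens; the exact log format depends on engine and font, so any external filter would have to be written against the actual log.

\documentclass{article}
\begin{document}
% Stage one (sketch): ask TeX for the hyphenation points of some words.
% Nothing is typeset; the results go to the .log file only.
\showhyphens{supercalifragilisticexpialidocious}
\showhyphens{difficult apples oranges carrots bananas}
\end{document}

The second stage (a script that extracts these lines from the log and writes the parts somewhere the next LaTeX run can read them) is not shown.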

Maybe this is possible with LuaTeX; it would be an interesting research project.

