
I need to split a string into one or more substrings. I know that I could use the xstring package, but I'd like to do it using only inbuilt TeX/LaTeX commands. So, if I say

\def\MyTeXKnowledge{Not good enough}

what is the simplest way to extract the substrings "Not", "good" and "enough" from the macro \MyTeXKnowledge and store them in variables?

4 Answers


You need to define a macro whose parameter text contains the separator character:

\def\testthreewords#1{\threewords#1\relax}
\def\threewords#1 #2 #3\relax{ First: (#1), Second: (#2), Third: (#3) }
\testthreewords{Now good enough}

If you want to pass a macro as the argument, you need to expand it first. This can be done either once (only the first token of the argument is expanded, by one level):

\def\testthreewords#1{\expandafter\threewords#1\relax}

or completely:

\def\testthreewords#1{%
    \begingroup
    \edef\@tempa{#1}%
    \expandafter\endgroup
    \expandafter\threewords\@tempa\relax
}

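The \relax here serves as an end marker and must not occur in the argument; if it can, use a different marker such as \@nnil. The group keeps the temporary definition local. Because \@tempa contains an @, this code must sit between \makeatletter and \makeatother when it is used in a document preamble rather than in a package.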
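As a minimal sketch of the end-marker point above, here is the fully expanding version delimited by \@nnil instead of \relax, wrapped in a complete document. The choice of \@nnil and the article class are illustrative assumptions; any token that cannot appear in the argument would do.

```latex
\documentclass{article}

\makeatletter
% Fully expand the argument, then hand the result to \threewords.
% \@nnil is the end marker, so \relax may now appear in the argument.
\def\testthreewords#1{%
    \begingroup
    \edef\@tempa{#1}%
    \expandafter\endgroup
    \expandafter\threewords\@tempa\@nnil
}
% Parameter text: three pieces separated by spaces, terminated by \@nnil.
\def\threewords#1 #2 #3\@nnil{First: (#1), Second: (#2), Third: (#3)}
\makeatother

\begin{document}
\def\MyTeXKnowledge{Not good enough}
\testthreewords{\MyTeXKnowledge}
\end{document}
```

The two \expandafter tokens expand \@tempa before \endgroup discards its definition, so the words survive the group while \@tempa itself stays local.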

However, this setup fails with an error if the argument does not contain the two spaces. To be on the safe side, you should read each substring on its own and append the separator character to the end of the input as a fail-safe. Then you test whether the end has been reached:

\def\testwords#1{%
    \begingroup
    \edef\@tempa{#1\space}%
    \expandafter\endgroup
    \expandafter\readwords\@tempa\relax
}
\def\readwords#1 #2\relax{%
      \doword{#1}%  #1 = substr, #2 = rest of string
      \begingroup
      \ifx\relax#2\relax  % is #2 empty?
         \def\next{\endgroup\endtestwords}% your own end-macro if required
      \else
         \def\next{\endgroup\readwords#2\relax}%
      \fi
      \next
}
\def\doword#1{(#1)}
\def\endtestwords{}


\testwords{Now good enough}% Gives `(Now)(good)(enough)`
\testwords{Now good}% Gives `(Now)(good)`
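For readers who want something to compile directly, the following is a sketch of a minimal working example built from the macros above. The only additions are the document wrapper and the \makeatletter/\makeatother pair, which the use of \@tempa requires in a document; the article class is an arbitrary choice.

```latex
\documentclass{article}

\makeatletter
% Expand the argument fully and append a space as a fail-safe separator.
\def\testwords#1{%
    \begingroup
    \edef\@tempa{#1\space}%
    \expandafter\endgroup
    \expandafter\readwords\@tempa\relax
}
% #1 = next word, #2 = rest of the string (empty when done).
\def\readwords#1 #2\relax{%
    \doword{#1}%
    \begingroup
    \ifx\relax#2\relax
        \def\next{\endgroup\endtestwords}%
    \else
        \def\next{\endgroup\readwords#2\relax}%
    \fi
    \next
}
\makeatother

\def\doword#1{(#1)}  % what to do with each word
\def\endtestwords{}  % hook executed after the last word

\begin{document}
\def\MyTeXKnowledge{Not good enough}
\testwords{\MyTeXKnowledge}% typesets (Not)(good)(enough)
\end{document}
```

Redefining \doword is the intended way to experiment: it is called once per word, outside the temporary group, so local assignments made in it persist.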

 

  • thanks. With your help I have achieved the result that I needed. I think a modification is needed before I accept your answer, though. If I define \MyTeXKnowledge as above and then say \testthreewords{\MyTeXKnowledge} I get an error (presumably because \MyTeXKnowledge counts as only one argument). Commented Mar 6, 2011 at 22:49
  • @Ian: I wasn't sure about the exact interface you want to use. You need to expand the macro first. I will update my answer. Commented Mar 6, 2011 at 22:53
  • I have edited the wording of my question to clarify. Commented Mar 6, 2011 at 23:09
  • Any chance of an MWE? This is such a good example, but rather complicated for a newbie like myself. An MWE would let me play around with it and help me understand it!
    – 3kstc
    Commented Jul 24, 2019 at 4:40

Another way: the words are stored in the macros \worda, \wordb, etc.

\documentclass[a4paper]{article}  

\newcount\nbofwords
\makeatletter  
\def\myutil@empty{}
\def\multiwords#1 #2\@nil{% 
 \def\NextArg{#2}%
 \advance\nbofwords by  1 %   
 \expandafter\edef\csname word\@alph\nbofwords\endcsname{#1}% 
 \ifx\myutil@empty\NextArg
     \let\next\@gobble
 \fi
 \next#2\@nil
}%    

\def\GetWords#1{%
   \let\next\multiwords 
   \nbofwords=0 %
   \expandafter\next#1 \@nil %
}% 
\makeatother

\begin{document}
 \def\MyTeXKnowledge{Not good  enough the end}
\GetWords{\MyTeXKnowledge}

There are \the\nbofwords\  words:  \worda; \wordb; \wordc;\wordd;\worde.

\end{document} 

Now \MyTeXKnowledge is accepted.

  • @Ian: Now \MyTeXKnowledge is accepted and the substrings from the macro \MyTeXKnowledge are stored in variables. Commented Mar 7, 2011 at 7:48
  • @AlainMatthes: I really like this solution since we can use it for an arbitrary number of words, right? I'm just learning TeX but will try to understand better this code since it appears to solve some other question I posted. Commented May 12, 2014 at 15:58

As of 2023 there are other options, for example the expl3 programming environment:

\documentclass{book}

\ExplSyntaxOn
\NewDocumentCommand{\getNth}{mmm}
  {
    % #1 string, #2 separator, #3 index
    \seq_set_split:Nnx \l_tmpa_seq { #2 } { #1 }
    \seq_item:Nn \l_tmpa_seq { #3 }
  }
\ExplSyntaxOff

\begin{document}

\def\mywords{first second third last}

% split by spaces and get the first item
\getNth{\mywords}{ }{1}

% split by spaces and get the last item
\getNth{\mywords}{ }{-1}

\end{document}

outputs

first
last

Or if you want to apply a function to every item:

\documentclass{book}

\ExplSyntaxOn
\NewDocumentCommand{\mapToFunction}{m}
  {
    % split by space "~"
    \seq_set_split:Nnx \l_tmpa_seq { ~ } { #1 }
    \seq_map_indexed_function:NN \l_tmpa_seq \__xyz_myfunction:nn
  }
\cs_new:Nn \__xyz_myfunction:nn
  {
    % #1 is the 1-based index and #2 is the current item
    % if necessary check the index with \int_compare, \int_case, or \bool_case
    % do something
    \par #1~#2 
  }
\ExplSyntaxOff

\begin{document}

\def\mywords{first second third last}

\mapToFunction{\mywords}

\end{document}

outputs

1 first
2 second
3 third
4 last
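If the first argument is always a single macro, one expansion step (o-type) is enough and avoids fully expanding whatever the macro contains. The following is a sketch under that assumption; the command name \getNthOnce is hypothetical, and the Nno variant is generated explicitly in case the kernel does not already provide it.

```latex
\documentclass{article}

\ExplSyntaxOn
% Make sure an o-type variant of the splitting function is available.
\cs_generate_variant:Nn \seq_set_split:Nnn { Nno }

\NewDocumentCommand{\getNthOnce}{mmm}
  {
    % #1 macro holding the string, #2 separator, #3 index
    % o-type: expand the first token of #1 exactly once
    \seq_set_split:Nno \l_tmpa_seq { #2 } { #1 }
    \seq_item:Nn \l_tmpa_seq { #3 }
  }
\ExplSyntaxOff

\begin{document}

\def\mywords{first second third last}

% split by spaces and get the second item
\getNthOnce{\mywords}{ }{2}

\end{document}
```

With one-step expansion, fragile or formatting commands inside the string are left untouched, whereas x-type expansion would try to expand them.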
  • I'd not use x expansion, but possibly o; however, I'd prefer a different approach with a *-version that takes as argument a control sequence. There's large room for improvements.
    – egreg
    Commented Oct 27, 2023 at 12:06

Inspired by wolfrevo's attempt:

\documentclass{article}

\ExplSyntaxOn
% the prefix is `clint' because of the OP's avatar

\NewDocumentCommand{\definechunkcontainer}{s m O{~} m}
 {% #1 = boolean
  % #2 = symbolic name
  % #3 = separator (default a space)
  % #4 = text or control sequence
  \IfBooleanTF { #1 }
   {
    \clint_chunk_define:onn { #4 } { #2 } { #3 }
   }
   {
    \clint_chunk_define:nnn { #4 } { #2 } { #3 }
   }
 }

\NewExpandableDocumentCommand{\getchunk}{o m}
 {% #1 = chunk number; if omitted we get the number of chunks
  % #2 = symbolic name
  \IfNoValueTF { #1 }
   {
    \seq_count:c { l__clint_chunk_#2_seq }
   }
   {
    \seq_item:cn { l__clint_chunk_#2_seq } { #1 }
   }
 }

\NewDocumentCommand{\processchunks}{m o +m}
 {% #1 = symbolic name
  % #2 = optional tokens to be inserted between chunks
  % #3 = template where #1 stands for the chunk number and #2 for the chunk
  \IfNoValueTF { #2 }
   {% easier processing
    \clint_chunk_process:nn { #1 } { #3 }
   }
   {% more complex processing
    \clint_chunk_process:nnn { #1 } { #2 } { #3 }
   }
 }

\seq_new:N \l__clint_chunk_temp_seq
\cs_generate_variant:Nn \seq_set_split:Nnn { c }
\cs_generate_variant:Nn \seq_map_indexed_function:NN { c }
\cs_generate_variant:Nn \seq_map_indexed_inline:Nn { c }

% three arguments, so the signature is nnn (a fourth n would swallow
% the token following the call)
\cs_new_protected:Nn \clint_chunk_define:nnn
 {% #1 = text to be split
  % #2 = symbolic name
  % #3 = separator
  \seq_clear_new:c { l__clint_chunk_#2_seq }
  \seq_set_split:cnn { l__clint_chunk_#2_seq } { #3 } { #1 }
 }
\cs_generate_variant:Nn \clint_chunk_define:nnn { o }

\cs_new_protected:Nn \clint_chunk_process:nn
 {
  \cs_set:Nn \__clint_chunk_process_do:nn { #2 }
  \seq_map_indexed_function:cN { l__clint_chunk_#1_seq } \__clint_chunk_process_do:nn
 }

\cs_new_protected:Nn \clint_chunk_process:nnn
 {
  \seq_clear:N \l__clint_chunk_temp_seq
  \cs_set:Nn \__clint_chunk_process_do:nn { #3 }
  \seq_map_indexed_inline:cn { l__clint_chunk_#1_seq }
   {
    \seq_put_right:Nn \l__clint_chunk_temp_seq { \__clint_chunk_process_do:nn { ##1 } { ##2 } }
   }
  \seq_use:Nn \l__clint_chunk_temp_seq { #2 }
 }

\ExplSyntaxOff

\begin{document}

% a couple of containers
\definechunkcontainer{myTeXknowledge}{not good enough}

\newcommand{\gbu}{The Good -- The Bad -- The Ugly}

\definechunkcontainer*{movie}[--]{\gbu}

% now let's test

\getchunk{myTeXknowledge} (expected: 3)

\getchunk[2]{myTeXknowledge} (expected: good)

\getchunk[3]{movie} (expected: The Ugly)

\processchunks{myTeXknowledge}{#1: #2\par}

\processchunks{movie}[/]{#2}

\begin{itemize}
\processchunks{movie}{\item[#1)] #2}
\end{itemize}

\begin{enumerate}
\processchunks{movie}{\item #2}
\end{enumerate}

\end{document}

