How do I split a string?

Question

I need to split a string into one or more substrings. I know that I could use the xstring package, but I'd like to do it using only inbuilt TeX/LaTeX commands. So, if I say

\def\MyTeXKnowledge{Not good enough}

what is the simplest way to extract the substrings "Not", "good" and "enough" from the macro \MyTeXKnowledge and store them in variables?

Jing Li · Accepted Answer · 2019-06-08 10:28:53Z

You need to define a macro which has the separation character in the parameter text:

\def\testthreewords#1{\threewords#1\relax}
\def\threewords#1 #2 #3\relax{ First: (#1), Second: (#2), Third: (#3) }
\testthreewords{Now good enough}
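
Since the question asks for the pieces to be stored rather than printed, the same delimited-parameter pattern can define macros instead of typesetting text. This is only a sketch: \storethreewords, \firstword, \secondword and \thirdword are names made up for illustration, and the input is assumed to contain exactly two spaces.

\def\storethreewords#1 #2 #3\relax{%
    \def\firstword{#1}%   everything before the first space
    \def\secondword{#2}%  everything between the first and second space
    \def\thirdword{#3}%   everything up to the end marker \relax
}
\storethreewords Not good enough\relax
% now \firstword is "Not", \secondword is "good", \thirdword is "enough"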

If you want to be able to pass a macro as the argument, you need to expand it first. This can be done either once (only the first token of the argument is expanded, by one level):

\def\testthreewords#1{\expandafter\threewords#1\relax}

or completely:

\def\testthreewords#1{%
    \begingroup
    \edef\@tempa{#1}%
    \expandafter\endgroup
    \expandafter\threewords\@tempa\relax
}
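
The difference between the two variants shows up when the text is wrapped in more than one macro. The following is a sketch of a complete test file, assuming the definitions above; \testonce, \testfull and \Wrapped are hypothetical names chosen here so that both variants can coexist in one document.

\documentclass{article}
\makeatletter % needed because \@tempa contains @
\def\threewords#1 #2 #3\relax{First: (#1), Second: (#2), Third: (#3)}
\def\testonce#1{\expandafter\threewords#1\relax}
\def\testfull#1{%
    \begingroup
    \edef\@tempa{#1}%
    \expandafter\endgroup
    \expandafter\threewords\@tempa\relax
}
\makeatother
\begin{document}
\def\MyTeXKnowledge{Not good enough}
\def\Wrapped{\MyTeXKnowledge}% reaches the text only after two expansion steps
\testonce{\MyTeXKnowledge}\par % one expansion step is enough here
\testfull{\Wrapped}            % \edef expands all the way down
\end{document}

Passing \Wrapped to \testonce would not work, because after one step the argument is still the single token \MyTeXKnowledge and contains no spaces for \threewords to match.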

The \relax here is used as an end marker and must not occur in the argument; otherwise a different marker should be used, such as \@nnil. The grouping keeps the temporary definition local.

However, this setup fails with an error if the argument does not contain two spaces. To be on the safe side, you should read each substring on its own and append the separation character to the end as a fail-safe. Then you test whether the end was reached:

\def\testwords#1{%
    \begingroup
    \edef\@tempa{#1\space}%
    \expandafter\endgroup
    \expandafter\readwords\@tempa\relax
}
\def\readwords#1 #2\relax{%
      \doword{#1}%  #1 = substr, #2 = rest of string
      \begingroup
      \ifx\relax#2\relax  % is #2 empty?
         \def\next{\endgroup\endtestwords}% your own end-macro if required
      \else
         \def\next{\endgroup\readwords#2\relax}%
      \fi
      \next
}
\def\doword#1{(#1)}
\def\endtestwords{}


\testwords{Now good enough}% Gives `(Now)(good)(enough)`
\testwords{Now good}% Gives `(Now)(good)`
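
For experimenting, the definitions can be put into a complete file. The version below is a sketch rather than part of the answer: it wraps the code in \makeatletter (required for \@tempa) and swaps in a different \doword that stores each word instead of printing it. The counter \wordcount and the generated names \wordi, \wordii, ... are assumptions introduced for this example.

\documentclass{article}
\makeatletter
\def\testwords#1{%
    \begingroup
    \edef\@tempa{#1\space}%
    \expandafter\endgroup
    \expandafter\readwords\@tempa\relax
}
\def\readwords#1 #2\relax{%
    \doword{#1}%
    \begingroup
    \ifx\relax#2\relax
        \def\next{\endgroup\endtestwords}%
    \else
        \def\next{\endgroup\readwords#2\relax}%
    \fi
    \next
}
\def\endtestwords{}
\makeatother

\newcount\wordcount % hypothetical counter for this example
\def\doword#1{%
    \advance\wordcount by 1
    % define \wordi, \wordii, ... using lowercase roman numerals
    \expandafter\def\csname word\romannumeral\wordcount\endcsname{#1}%
}

\begin{document}
\def\MyTeXKnowledge{Not good enough}
\wordcount=0
\testwords{\MyTeXKnowledge}
\the\wordcount\ words: \wordi, \wordii, \wordiii.
\end{document}

Roman numerals are used only because control word names may not contain digits; any other letter-only numbering scheme would do.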

thanks. With your help I have achieved the result that I needed. I think a modification is needed before I accept your answer, though. If I define \MyTeXKnowledge as above and then say \testthreewords{\MyTeXKnowledge} I get an error (presumably because \MyTeXKnowledge counts as only one argument). — Ian Thompson, Commented Mar 6, 2011 at 22:49
@Ian: I wasn't sure about the exact interface you want to use. You need to expand the macro first. I will update my answer. — Martin Scharrer, Commented Mar 6, 2011 at 22:53
Any chance of a MWE? This is such a good example - but rather complicated for a newbie like myself. An MWE will allow me to play around with it and help me understand it! — 3kstc, Commented Jul 24, 2019 at 4:40

Alain Matthes · Accepted Answer · 2011-03-07 07:47:12Z

Another way: the words are stored in the macros \worda, \wordb, etc.

\documentclass[a4paper]{article}  

\newcount\nbofwords
\makeatletter  
\def\myutil@empty{}
\def\multiwords#1 #2\@nil{% 
 \def\NextArg{#2}%
 \advance\nbofwords by  1 %   
 \expandafter\edef\csname word\@alph\nbofwords\endcsname{#1}% 
 \ifx\myutil@empty\NextArg
     \let\next\@gobble
 \fi
 \next#2\@nil
}%    

\def\GetWords#1{%
   \let\next\multiwords 
   \nbofwords=0 %
   \expandafter\next#1 \@nil %
}% 
\makeatother

\begin{document}
 \def\MyTeXKnowledge{Not good  enough the end}
\GetWords{\MyTeXKnowledge}

There are \the\nbofwords\  words:  \worda; \wordb; \wordc;\wordd;\worde.

\end{document}

Now \MyTeXKnowledge is accepted.
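
The lettered names can also be reached by number. A small sketch, assuming the definitions above are in place; \nthword is a hypothetical helper, not part of the answer. Because \@alph only covers 1 to 26, this naming scheme handles at most 26 words.

\makeatletter
% \nthword{2} builds the name \wordb and expands it
\newcommand\nthword[1]{\csname word\@alph{#1}\endcsname}
\makeatother

% after \GetWords{\MyTeXKnowledge}:
\nthword{2}% same as \wordb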

edited Mar 7, 2011 at 7:47

answered Mar 6, 2011 at 23:40


@Ian: Now \MyTeXKnowledge is accepted and the substrings from the macro \MyTeXKnowledge are stored in variables.
– Alain Matthes
Commented Mar 7, 2011 at 7:48
@AlainMatthes: I really like this solution since we can use it for an arbitrary number of words, right? I'm just learning TeX but will try to understand better this code since it appears to solve some other question I posted.
– Sergio Parreiras
Commented May 12, 2014 at 15:58


wolfrevo · Accepted Answer · 2023-10-27 09:06:42Z

As of 2023 there are other options, for example using the expl3 programming layer:

\documentclass{book}

\ExplSyntaxOn
\NewDocumentCommand{\getNth}{mmm}
  {
    % #1 string, #2 separator, #3 index
    \seq_set_split:Nnx \l_tmpa_seq { #2 } { #1 }
    \seq_item:Nn \l_tmpa_seq { #3 }
  }
\ExplSyntaxOff

\begin{document}

\def\mywords{first second third last}

% split by spaces and get the first item
\getNth{\mywords}{ }{1}

% split by spaces and get the last item
\getNth{\mywords}{ }{-1}

\end{document}

outputs

first
last

Or if you want to apply a function to every item:

\documentclass{book}

\ExplSyntaxOn
\NewDocumentCommand{\mapToFunction}{m}
  {
    % split by space "~"
    \seq_set_split:Nnx \l_tmpa_seq { ~ } { #1 }
    \seq_map_indexed_function:NN \l_tmpa_seq \__xyz_myfunction:nn
  }
\cs_new:Nn \__xyz_myfunction:nn
  {
    % #1 is the 1-based index and #2 is the current item
    % if necessary check the index with \int_compare, \int_case, or \bool_case
    % do something
    \par #1~#2 
  }
\ExplSyntaxOff

\begin{document}

\def\mywords{first second third last}

\mapToFunction{\mywords}

\end{document}

outputs

1 first
2 second
3 third
4 last

I'd not use x expansion, but possibly o; however, I'd prefer a different approach with a *-version that takes as argument a control sequence. There's large room for improvements. — egreg, Commented Oct 27, 2023 at 12:06
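
The o-type expansion suggested in this comment could look like the following sketch. It expands the first token of the string exactly once, so a macro holding the text is unpacked while literal text is left alone. The command name \getNthOnce is made up for illustration, and the variant is generated explicitly here in case it is not predefined.

\ExplSyntaxOn
\cs_generate_variant:Nn \seq_set_split:Nnn { Nno }
\NewDocumentCommand{\getNthOnce}{mmm}
  {
    % #1 string or macro, #2 separator, #3 index
    \seq_set_split:Nno \l_tmpa_seq { #2 } { #1 }
    \seq_item:Nn \l_tmpa_seq { #3 }
  }
\ExplSyntaxOff

\def\mywords{first second third last}
\getNthOnce{\mywords}{ }{2}% second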

egreg · Accepted Answer · 2023-10-27 12:58:39Z

Inspired by wolfrevo's attempt:

\documentclass{article}

\ExplSyntaxOn
% the prefix is `clint' because of the OP's avatar

\NewDocumentCommand{\definechunkcontainer}{s m O{~} m}
 {% #1 = boolean
  % #2 = symbolic name
  % #3 = separator (default a space)
  % #4 = text or control sequence
  \IfBooleanTF { #1 }
   {
    \clint_chunk_define:onn { #4 } { #2 } { #3 }
   }
   {
    \clint_chunk_define:nnn { #4 } { #2 } { #3 }
   }
 }

\NewExpandableDocumentCommand{\getchunk}{o m}
 {% #1 = chunk number; if omitted we get the number of chunks
  % #2 = symbolic name
  \IfNoValueTF { #1 }
   {
    \seq_count:c { l__clint_chunk_#2_seq }
   }
   {
    \seq_item:cn { l__clint_chunk_#2_seq } { #1 }
   }
 }

\NewDocumentCommand{\processchunks}{m o +m}
 {% #1 = symbolic name
  % #2 = optional tokens to be inserted between chunks
  % #3 = template where #1 stands for the chunk number and #2 for the chunk
  \IfNoValueTF { #2 }
   {% easier processing
    \clint_chunk_process:nn { #1 } { #3 }
   }
   {% more complex processing
    \clint_chunk_process:nnn { #1 } { #2 } { #3 }
   }
 }

\seq_new:N \l__clint_chunk_temp_seq
\cs_generate_variant:Nn \seq_set_split:Nnn { c }
\cs_generate_variant:Nn \seq_map_indexed_function:NN { c }
\cs_generate_variant:Nn \seq_map_indexed_inline:Nn { c }

\cs_new_protected:Nn \clint_chunk_define:nnn
 {% #1 = text to be split
  % #2 = symbolic name
  % #3 = separator
  \seq_clear_new:c { l__clint_chunk_#2_seq }
  \seq_set_split:cnn { l__clint_chunk_#2_seq } { #3 } { #1 }
 }
\cs_generate_variant:Nn \clint_chunk_define:nnn { o }

\cs_new_protected:Nn \clint_chunk_process:nn
 {
  \cs_set:Nn \__clint_chunk_process_do:nn { #2 }
  \seq_map_indexed_function:cN { l__clint_chunk_#1_seq } \__clint_chunk_process_do:nn
 }

\cs_new_protected:Nn \clint_chunk_process:nnn
 {
  \seq_clear:N \l__clint_chunk_temp_seq
  \cs_set:Nn \__clint_chunk_process_do:nn { #3 }
  \seq_map_indexed_inline:cn { l__clint_chunk_#1_seq }
   {
    \seq_put_right:Nn \l__clint_chunk_temp_seq { \__clint_chunk_process_do:nn { ##1 } { ##2 } }
   }
  \seq_use:Nn \l__clint_chunk_temp_seq { #2 }
 }

\ExplSyntaxOff

\begin{document}

% a couple of containers
\definechunkcontainer{myTeXknowledge}{not good enough}

\newcommand{\gbu}{The Good -- The Bad -- The Ugly}

\definechunkcontainer*{movie}[--]{\gbu}

% now let's test

\getchunk{myTeXknowledge} (expected: 3)

\getchunk[2]{myTeXknowledge} (expected: good)

\getchunk[3]{movie} (expected: The Ugly)

\processchunks{myTeXknowledge}{#1: #2\par}

\processchunks{movie}[/]{#2}

\begin{itemize}
\processchunks{movie}{\item[#1)] #2}
\end{itemize}

\begin{enumerate}
\processchunks{movie}{\item #2}
\end{enumerate}

\end{document}
