Reference

Class

RiTa

Name

tokens

Description
Return an array containing all unique alphabetical words (tokens) in the text.
Example

sentence = "A small one is like a big one.";
tokens = RiTa.tokens(sentence);

-> [ 'a', 'small', 'one', 'is', 'like', 'big' ]


sentence = "One had escaped, she'd thought.";
tokens = RiTa.tokens(sentence, { splitContractions: true });

-> [ 'one', 'had', 'escaped', 'she', 'thought' ]

Parameters
String: the input
Object (or Map in Java): options (optional), the relevant options for the function:

{String or Regex} options.regex:
A custom regex to use for tokenization

{boolean} options.splitContractions:
Convert contractions (e.g., "I'd" or "she'll") into multiple individual tokens

{boolean} options.includePunct:
Include punctuation tokens in the output

{boolean} options.caseSensitive:
Treat differently cased strings as separate tokens

{boolean} options.ignoreStopWords:
Ignore words like 'the', 'and', 'a', 'of', etc., as specified in RiTa.STOP_WORDS

{boolean} options.sort:
Return the results in sorted order
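
The way these options interact can be illustrated with a small stand-alone sketch. This is not RiTa's implementation: uniqueTokens, its naive regex tokenizer, and the tiny STOP_WORDS set are hypothetical stand-ins, and the regex and splitContractions options are left out. It only shows the behavior the descriptions above suggest: lowercasing unless caseSensitive is set, dropping punctuation unless includePunct is set, filtering stop words, and optional sorting.

```javascript
// Hypothetical sketch, NOT RiTa's code: approximates the documented options
// (includePunct, caseSensitive, ignoreStopWords, sort) with a naive tokenizer.
const STOP_WORDS = new Set(["the", "and", "a", "of", "is"]); // tiny stand-in for RiTa.STOP_WORDS

function uniqueTokens(text, options = {}) {
  const {
    includePunct = false,
    caseSensitive = false,
    ignoreStopWords = false,
    sort = false,
  } = options;

  // Naive split: a letter followed by letters/apostrophes,
  // or any other single non-space, non-letter character.
  const raw = text.match(/[A-Za-z][A-Za-z']*|[^\sA-Za-z]/g) || [];

  const seen = new Set(); // a Set keeps first-seen insertion order
  for (let tok of raw) {
    if (!caseSensitive) tok = tok.toLowerCase();
    const isWord = /^[A-Za-z][A-Za-z']*$/.test(tok);
    if (!isWord && !includePunct) continue; // drop punctuation by default
    if (ignoreStopWords && STOP_WORDS.has(tok.toLowerCase())) continue;
    seen.add(tok);
  }

  const result = [...seen];
  return sort ? result.sort() : result;
}
```

With the sentence "A small one is like a big one.", this sketch yields [ 'a', 'small', 'one', 'is', 'like', 'big' ] by default, appends '.' when includePunct is true, and returns [ 'a', 'big', 'is', 'like', 'one', 'small' ] when sort is true.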

Returns
String[]: the set of unique alphabetical tokens in the text
Syntax
RiTa.tokens(text);
RiTa.tokens(text, options);
Platform
Java / JavaScript