Reference

Class

RiTa

Name

tokenize

Description

Splits a string into tokens (words and punctuation) according to the Penn Treebank conventions.

Example

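let sentence = "The doctors treated dogs.";
// Default: tokenize according to the Penn Treebank conventions
let words = RiTa.tokenize(sentence);
// Custom: split the input on a user-supplied regular expression (here, whitespace)
words = RiTa.tokenize(sentence, { regex: "\\s" });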

Parameters
String: the input text
Object (or Map<String, Object> in Java): options (optional), the relevant options for the function:

{boolean} options.splitContractions:
Convert contractions (e.g., "I'd" or "she'll") into multiple individual tokens

{string or regex} options.regex:
Regular expression to use for splitting the input
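
To make the `regex` option more concrete, the following is a minimal sketch of what splitting on a caller-supplied pattern looks like. It is not RiTa's implementation: `splitOnPattern` is a hypothetical stand-in written in plain JavaScript, and it only assumes, as the parameter description states, that the pattern may be given either as a string or as a regular expression.

```javascript
// Hypothetical stand-in, NOT part of RiTa: illustrates splitting an
// input string on a caller-supplied pattern (string or RegExp).
function splitOnPattern(text, pattern) {
  // A string pattern is compiled to a RegExp; a RegExp is used as-is
  const re = typeof pattern === "string" ? new RegExp(pattern) : pattern;
  // Discard empty strings produced by matches at the start or end
  return text.split(re).filter((tok) => tok.length > 0);
}

const sentence = "The doctors treated dogs.";
const bySpace = splitOnPattern(sentence, "\\s");
console.log(bySpace); // [ 'The', 'doctors', 'treated', 'dogs.' ]
```

Note that in this sketch a whitespace pattern leaves the final period attached to "dogs.", whereas Penn Treebank-style tokenization generally treats punctuation as separate tokens.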

Returns
String[]: an array in which each element is a single token (generally a word or a single punctuation character)
Syntax
RiTa.tokenize(text);
RiTa.tokenize(text, options);
Platform
Java / JavaScript