Reference

Class RiTa

Name tokenize

Description Tokenizes a string (into words) according to the Penn Treebank conventions.
Example

sentence = "The doctors treated dogs.";
words = RiTa.tokenize(sentence);
arrayOfWords = RiTa.tokenize(sentence, "\\s");
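
The two calls above can yield different tokens for the same sentence. The sketch below illustrates that difference in plain JavaScript without calling RiTa. Both helper functions (splitOnPattern and splitWordsAndPunctuation) are hypothetical stand-ins written for this illustration, and the assumption that a supplied regex is used to split the input is inferred from the parameter description, not taken from RiTa's source.

```javascript
// Hypothetical stand-in, NOT RiTa's implementation: splits the text on a
// caller-supplied pattern, assuming that is how the optional regex is applied.
function splitOnPattern(text, pattern) {
  const re = pattern instanceof RegExp ? pattern : new RegExp(pattern);
  // Drop empty strings that can appear when delimiters are adjacent.
  return text.split(re).filter((token) => token.length > 0);
}

// Hypothetical stand-in for the default call: a rough approximation of one
// Penn Treebank convention, where punctuation becomes a separate token.
function splitWordsAndPunctuation(text) {
  return text.match(/\w+|[^\w\s]/g) || [];
}

const sentence = "The doctors treated dogs.";
const byWhitespace = splitOnPattern(sentence, "\\s");
const withPunctuation = splitWordsAndPunctuation(sentence);

console.log(byWhitespace);    // final period stays attached to "dogs."
console.log(withPunctuation); // final period is its own token
```

Under these assumptions, splitting on whitespace leaves the period attached to the last word, while the Treebank-style approximation separates it. The real RiTa.tokenize handles further cases (contractions, quotes, and so on) that this sketch does not attempt.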

Parameters
String text the input
String OR Regex (in JS) regex (optional) the pattern to be used for tokenization
Returns
String[] in which each element is a single token (or word)
Syntax
RiTa.tokenize(text);
RiTa.tokenize(text, regex);
Platform Java / JavaScript