Reference

Class

RiTa

Name

RiTa.tokenize

Description
Tokenizes a string into words according to Penn Treebank conventions.
See: ftp://ftp.cis.upenn.edu/pub/treebank/public_html/tokenization.html
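For orientation, a few of the well-known Penn Treebank conventions (punctuation split off as separate tokens, contractions split as in "do" + "n't", clitics such as "'s" detached) can be sketched as follows. This is a simplified illustration only and not RiTa's implementation; the function name ptbLikeTokenize is hypothetical, and the full conventions cover many more cases.

```javascript
// Hypothetical helper, NOT part of RiTa: a rough approximation of a few
// Penn Treebank tokenization conventions, for illustration only.
function ptbLikeTokenize(text) {
  return text
    .replace(/([,;:?!()"])/g, " $1 ")         // split off common punctuation
    .replace(/\.(\s*)$/, " .$1")              // split off a sentence-final period
    .replace(/n't\b/g, " n't")                // don't -> do n't
    .replace(/'(s|ll|re|ve|d|m)\b/g, " '$1")  // dog's -> dog 's, they'll -> they 'll
    .trim()
    .split(/\s+/)
    .filter(function (t) { return t.length > 0; }); // drop the empty token from "" input
}

console.log(ptbLikeTokenize("They don't stop, do they?"));
```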
Syntax
RiTa.tokenize(text);
RiTa.tokenize(text, regex);
Parameters
String text: the input string to tokenize
String OR Regex (in JS) regex: (optional) the pattern to be used for tokenization
Returns
String[] in which each element is a single token (or word)
Example

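sentence = "The doctors treated dogs";
// default: Penn Treebank tokenization
wordArray = RiTa.tokenize(sentence);
// optional second argument: tokenize with a custom pattern (here, whitespace)
wordArray = RiTa.tokenize(sentence, "\\s");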
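To make the effect of the optional pattern concrete without depending on the library, the sketch below splits a string on a pattern given either as a string or as a RegExp. It assumes the pattern acts as a delimiter between tokens; whether RiTa applies the regex in exactly this way is an assumption, and splitTokens is a hypothetical stand-in, not a RiTa function.

```javascript
// Hypothetical stand-in, NOT part of RiTa: treats the pattern as a
// delimiter and splits the text on it (assumed behavior).
function splitTokens(text, regex) {
  // Accept either a pattern string (e.g. "\\s") or a RegExp object.
  var pattern = typeof regex === "string" ? new RegExp(regex) : regex;
  return text
    .trim()
    .split(pattern)
    .filter(function (t) { return t.length > 0; }); // drop empties from repeated delimiters
}

var sentence = "The doctors treated dogs";
console.log(splitTokens(sentence, "\\s")); // pattern passed as a string
console.log(splitTokens(sentence, /\s+/)); // pattern passed as a RegExp
```

Filtering out empty strings keeps runs of consecutive delimiters from producing empty tokens.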

Platform
Java / JavaScript