Class |
RiTa
|
Name |
tokenize
|
Description |
Splits a string into tokens (words and punctuation marks) according to the Penn Treebank conventions.
|
Example |
sentence = "The doctors treated dogs.";
words = RiTa.tokenize(sentence);
words = RiTa.tokenize(sentence, { regex: "\\s" });
|
Parameters |
String | the input |
---|
Object (or Map<String, Object> in Java) | options (optional): settings that control tokenization:
{boolean} options.splitContractions: Split contractions (e.g., "I'd" or "she'll") into multiple individual tokens
{string or regex} options.regex: Regular expression to use for splitting the input
|
---|
|
Returns |
String[] | Array in which each element is a single token (generally a word or single punctuation character) |
---|
|
Related |
RiTa.tokens() RiTa.untokenize() |
Syntax |
RiTa.tokenize(text);
RiTa.tokenize(text, options);
|
Platform |
Java / JavaScript |
|
|
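To make the two options concrete, here is a minimal, self-contained sketch of a tokenizer with the same call shape. It is not RiTa's implementation: `simpleTokenize` is a hypothetical helper, its punctuation and contraction rules are simplified assumptions, and the tokens RiTa itself returns (particularly with `splitContractions`) may differ.

```javascript
// Hypothetical helper, NOT RiTa's implementation: a rough approximation of
// Penn Treebank-style tokenization, for illustration only.
function simpleTokenize(text, options = {}) {
  const { regex, splitContractions = false } = options;

  // If a custom pattern is supplied, split on it and drop empty strings.
  if (regex !== undefined) {
    const pattern = regex instanceof RegExp ? regex : new RegExp(regex);
    return text.split(pattern).filter((t) => t.length > 0);
  }

  let s = text.trim();

  if (splitContractions) {
    // Assumed rule set: detach common English clitics,
    // e.g. "don't" -> "do n't", "she'll" -> "she 'll".
    s = s.replace(/(\w)n't\b/gi, "$1 n't");
    s = s.replace(/(\w)'(ll|re|ve|d|s|m)\b/gi, "$1 '$2");
  }

  // Surround punctuation with spaces so each mark becomes its own token.
  s = s.replace(/([.,!?;:()"])/g, " $1 ");

  return s.split(/\s+/).filter((t) => t.length > 0);
}

const sentence = "The doctors treated dogs.";
console.log(simpleTokenize(sentence));
console.log(simpleTokenize(sentence, { regex: "\\s" }));
console.log(simpleTokenize("She'll go.", { splitContractions: true }));
```

In this sketch the default path separates the final period into its own token, while splitting on `"\\s"` leaves it attached to `dogs.`, which is the practical difference between the two calls in the Example above.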