Class Ferret::Analysis::WhiteSpaceTokenizer
In: ext/r_analysis.c
Parent: Ferret::Analysis::TokenStream

Summary

A WhiteSpaceTokenizer is a tokenizer that splits text at whitespace. Each maximal run of non-whitespace characters forms one token, so punctuation stays attached to the word it touches.

Example

  "Dave's résumé, at http://www.davebalmain.com/ 1234"
    => ["Dave's", "résumé,", "at", "http://www.davebalmain.com/", "1234"]
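
The splitting rule can be approximated in plain Ruby. This is only a sketch of the behavior described above, not Ferret's C implementation, and `whitespace_tokens` is a hypothetical helper name that does not exist in Ferret.

```ruby
# Hypothetical helper (not part of Ferret): every maximal run of
# non-whitespace characters becomes one token. Punctuation is kept.
def whitespace_tokens(text)
  text.scan(/\S+/)
end

p whitespace_tokens("Dave's résumé, at http://www.davebalmain.com/ 1234")
```

Because only whitespace separates tokens, the trailing comma in "résumé," and the trailing slash of the URL remain part of their tokens.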

Methods

new  

Public Class methods

Create a new WhiteSpaceTokenizer which optionally downcases tokens. Downcasing is done according to the current locale.

lower: set to false if you don't wish to downcase tokens
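
The effect of the lower option can be sketched in plain Ruby. This is an illustration under assumptions, not Ferret's code: `whitespace_tokens_with_lower` is a hypothetical helper, and Ruby's `String#downcase` applies Unicode case mapping rather than the process locale, so results for non-ASCII text may differ from Ferret's.

```ruby
# Hypothetical helper (not part of Ferret): split on whitespace, then
# downcase each token only when `lower` is true.
def whitespace_tokens_with_lower(text, lower)
  tokens = text.split # no-argument split breaks on runs of whitespace
  lower ? tokens.map(&:downcase) : tokens
end

text = "Dave's Resume AT Home"
p whitespace_tokens_with_lower(text, true)   # tokens are downcased
p whitespace_tokens_with_lower(text, false)  # original case is kept
```

Downcasing at tokenization time makes later matching case-insensitive, at the cost of losing the original casing in the token text.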
