TODO

Path: TODO
Last Update: Fri Dec 08 15:34:11 +0000 2023

  • C
    • IMPORTANT: fix file descriptor overflow. See tickets 341 and 343.
    • Add .. and ... range operators to the query parser. For example, [100 200] could be written as 100..200 or 100...201, as with Ruby ranges.
    • remove exception handling from C code. All errors to be handled by return values.
    • Move to SQLite’s locking model, so that Ferret works reliably in a multi-process environment.
    • Add optional logging. To be enabled at compilation time, perhaps?
    • Add support for changing zlib and bzlib compression parameters
    • Improve unit test coverage to 100%
    • Add benchmark suite
    • Add Rakefile for development purposes + task to publish gcov and benchmark results to ferret wiki
    • Index rebuilding of old versioned indexes.
    • Add a globally accessible, thread-safe symbol table. This will be very useful for storing field names, so that objects no longer need to strdup the field names and can store the symbol representative instead. This has been done, but it can be improved by using actual Symbol structs instead of plain char*.
    • Make threading optional at compile time
    • to_json should limit output to prevent memory overflow on large indexes. Perhaps we could use some type of buffered read for this.
    • Make BitVector run as fast as bitset from the C++ STL. See c/benchmark/bm_bitvector.c
    • Add a symbol table for field names. This will mean that we won’t need to worry about mallocing and freeing field names, which happens all over the place.
    • Divide the headers into public and private (the private headers to be stored in the src directory).
    • Group-by search, i.e. you should be able to pass a field to group search results by.
    • Auto-loading of documents during search, i.e. actual documents get returned instead of document numbers.
  • Ruby bindings
    • argument checking for every method. We need a new API for argument checking so that the arguments get checked at the start of each method that could cause a segfault.
    • improve memory management. It is way too complex at the moment. I also need to document how it works so that other developers understand what is going on.
    • Replace Data_Wrap_Struct with ferret alternative which handles rewrapping of structs automatically and also knows when to release a struct by using refcounting.
  • Ruby
    • integrate rcov
    • improve unit test coverage to 100%
  • Documentation.
    • generate Ruby binding documentation with a custom build template similar to jaxdoc (rubyforge.org/projects/jaxdoc)
    • all documentation should meet DOCUMENTATION_STANDARDS
    • documentation in C code to be generated by doxygen
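
The proposed .. and ... query operators could be mapped onto the existing inclusive range form roughly as follows. This is a minimal sketch for integer bounds only, not Ferret's parser: the IntRange struct and parse_int_range function are hypothetical names introduced for illustration, and the only semantics assumed are Ruby's (.. includes the upper bound, ... excludes it).

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type, not part of Ferret: an inclusive integer range,
 * equivalent to the query form [lower upper]. */
typedef struct {
    long lower;
    long upper;
} IntRange;

/* Hypothetical helper: parse "A..B" (upper bound included) or
 * "A...B" (upper bound excluded) into inclusive bounds.
 * Returns false if the text is not a well-formed range. */
static bool parse_int_range(const char *text, IntRange *out)
{
    char *end;
    long lower = strtol(text, &end, 10);
    if (end == text || strncmp(end, "..", 2) != 0) {
        return false;               /* no leading number, or no ".." */
    }

    const char *rest = end + 2;
    bool exclusive = (*rest == '.');
    if (exclusive) {
        rest++;                     /* consume the third dot of "..." */
    }

    long upper = strtol(rest, &end, 10);
    if (end == rest || *end != '\0') {
        return false;               /* no trailing number, or junk after it */
    }
    if (exclusive) {
        if (upper == LONG_MIN) {
            return false;           /* cannot step below the minimum */
        }
        upper--;                    /* A...B is the same as A..(B - 1) */
    }

    out->lower = lower;
    out->upper = upper;
    return true;
}
```

With this sketch, 100..200 and 100...201 both yield the bounds 100 and 200, matching [100 200]. Non-integer terms (strings, dates) would need a different rule for the exclusive form.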

Someday Maybe

  • apply for Google Summer of Code 2009
  • optimize read and write vint
    • test the following outside of ferret before implementing
    • perform a binary scan using bit-wise or to find out how many bytes need to be written
    • if the write/read will overflow the buffer, split it into two, refreshing the buffer in between
    • use Duff’s device to write bytes now that we know how many we need
  • add a super fast language based dictionary compression
  • add portable stacktrace function. Perhaps implement as an external library.
  • investigate unscored searching
  • user defined sorting
  • Fix highlighting to work for external fields
  • investigate faster string hashing method
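
The vint idea above (find the byte count first, then write without a per-byte loop test) might look like the sketch below. It assumes a Lucene-style vint layout: 7 payload bits per byte, low-order group first, high bit set on every byte except the last. The function names are hypothetical, the size check uses mask tests arranged as a binary search, and the writer is a plain fall-through switch in the spirit of Duff's device rather than the loop-interleaved original. Buffer refreshing is left out.

```c
#include <stdint.h>

/* Hypothetical helper: number of bytes needed to encode v as a vint.
 * Each mask covers the bits that would not fit in fewer bytes, so at
 * most three tests are needed. */
static int vint_size(uint32_t v)
{
    if (v & 0xFFFFC000u) {                  /* more than 14 bits: 3 to 5 bytes */
        if (v & 0xF0000000u) {
            return 5;                       /* more than 28 bits */
        }
        return (v & 0xFFE00000u) ? 4 : 3;   /* more than 21 bits? */
    }
    return (v & 0xFFFFFF80u) ? 2 : 1;       /* more than 7 bits? */
}

/* Hypothetical helper: write v into buf, which must have room for
 * 5 bytes, and return the number of bytes written. */
static int vint_write(uint8_t *buf, uint32_t v)
{
    int n = vint_size(v);
    uint8_t *p = buf;

    switch (n) {
    case 5: *p++ = (uint8_t)(v | 0x80); v >>= 7; /* fall through */
    case 4: *p++ = (uint8_t)(v | 0x80); v >>= 7; /* fall through */
    case 3: *p++ = (uint8_t)(v | 0x80); v >>= 7; /* fall through */
    case 2: *p++ = (uint8_t)(v | 0x80); v >>= 7; /* fall through */
    case 1: *p = (uint8_t)v;                     /* last byte, high bit clear */
    }
    return n;
}

/* Reference reader used to check the round trip: accumulates 7-bit
 * groups until a byte without the continuation bit is seen. */
static uint32_t vint_read(const uint8_t *buf, int *len)
{
    uint32_t v = 0;
    int shift = 0;
    int i = 0;
    uint8_t b;

    do {
        b = buf[i++];
        v |= (uint32_t)(b & 0x7F) << shift;
        shift += 7;
    } while (b & 0x80);

    *len = i;
    return v;
}
```

As the list says, this should be measured outside of Ferret before it replaces the existing code, since the gain over a simple loop depends on the compiler and the value distribution.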

Done

  • add rake install task
  • FIX :create parameter so that it only deletes the files owned by Ferret.
  • fix compression. Currently nothing is happening if you set a field to :compress. I guess we’ll just assume zlib is installed, as I think it has to be for Ruby to be installed.
  • add bzlib support
  • integrate gcov
  • add a field cache to IndexReader
  • setup email alerts for svn commits
  • Ranged, unordered searching, i.e. search through the index until you have the required number of documents and then break. This will require the ability to start searches from a particular doc-num. See searcher_search_unordered in the C code and Searcher#scan in Ruby.
  • improve unit test code. I’d like to implement some way to print out a stack trace when a test fails so that it is easy to find the source of the error.
  • catch segfaults and print stack trace so users can post helpful bug tickets. again, see the same links for adding stacktrace to unit tests.
  • Add string Sort descriptor
  • fix memory bug
  • add MultiReader interface
  • add lexicographical sort (byte sort)
  • Add highlighting
  • add field compression
  • Fix highlighting to work for compressed fields
  • Add Ferret::Index::Index
  • Fix:
    • Working Query: field1:value1 AND NOT field2:value2
    • Failing Query: field1:value1 AND ( NOT field2:value2 )
  • update benchmark suite to use getrusage
