Searchable Text Database

Open Source Options

  1. Full Text Search
    1. http://en.wikipedia.org/wiki/Full_text_search#Software
    2. http://www.mediawiki.org/wiki/Fulltext_search_engines
  2. Search Engines I Find Interesting
    1. Sphinx
    2. MySQL full-text search
    3. SQL Server full-text search
    4. Lucene, and Elasticsearch on top of Lucene
  3. Full-Text Search Comparisons
    1. http://full-text-search.findthebest.com/
    2. A very nice comparison http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/
    3. http://taschenorakel.de/mathias/2012/04/18/fulltext-search-benchmarks/
    4. http://www.dbbest.com/blog/lucene-vs-sql-server-fts/
    5. http://beerpla.net/2009/09/03/comparison-between-solr-and-sphinx-search-servers-solr-vs-sphinx-fight/
  4. Sphinx http://sphinxsearch.com/
    1. I have used Sphinx in a Ruby on Rails project. The setup was: run the Sphinx server in the background, install a gem to interact with it, declare in the model file which attributes to index, and search through the gem. To speed up indexing I used a delta index together with a delayed gem, which kept a local copy of each change. A cron job rebuilt the full index periodically (sometimes after days, sometimes after a week) and moved the delta changes into it. I found the Sphinx server easy to use once I got the hang of it. The delta index is normally much smaller than the full index and holds the most recent changes that have not yet been integrated into the full index. It exists to avoid updating the whole index on every change, since re-indexing can take a long time depending on the index size.
    2. http://en.wikipedia.org/wiki/Sphinx_(search_engine)
    3. Can be used stand-alone or with MySQL, MariaDB, and PostgreSQL, or with any ODBC-compliant DBMS through ODBC
    4. Sphinx latest release download http://sphinxsearch.com/downloads/release/
    5. Documentation 
    6. Integrates with many programming languages and is highly scalable.
    7. Has many features related to natural language processing, such as stopwords and tokenization.
    8. Note that the original contents of the fields are not stored in the Sphinx index. The text that you send to Sphinx gets processed, and a full-text index (a special data structure that enables quick searches for a keyword) gets built from that text. But the original text contents are then simply discarded. Sphinx assumes that you store those contents elsewhere anyway.
    9. There are multiple modes of searching which can be found
    10. http://stackoverflow.com/questions/737275/comparison-of-full-text-search-engine-lucene-sphinx-postgresql-mysql
  5. MySQL Full-Text Search
    1. Modes of search:
      1. A boolean search interprets the search string using the rules of a special query language.
      2. A natural language search interprets the search string as a phrase in natural human language.
      3. A query expansion search is a modification of a natural language search: the search runs once, then words from the most relevant rows are added to the search string and the search runs again.
  6. SQL Server Full-Text Search http://technet.microsoft.com/en-us/library/ms142571.aspx
    1. The beginning of the article gives an overview of full-text search, its functionality, its architecture, and the modes of searching.
    2. The most interesting section on this page is the list of related tasks at the end, which goes into more detail on how exactly to do the search. The most helpful article is the first one, on how to get started with full-text search http://technet.microsoft.com/en-us/library/ms142497.aspx
  7. http://lucene.apache.org/solr/ Apache Solr/Lucene
    1. REST API
    2. Standalone server
    3. Tutorial http://lucene.apache.org/solr/4_6_0/tutorial.html
  8. Interesting Project on top of Lucene http://www.elasticsearch.org/overview/
    1. Interesting because it supports real-time analytics and real-time search, is document oriented, exposes a RESTful API (like Solr), and provides full-text search
  9. BaseX http://basex.org/
    1. XML database with full-text search; queries are written in XPath/XQuery.
  10. DataparkSearch http://www.dataparksearch.org/ for search within a website, a group of websites, or an intranet
    1. Documentation http://www.dataparksearch.org/index.en.html
  11. ht://Dig http://www.htdig.org/
  12. Apache Lucy http://lucy.apache.org/
    1. Loose C port of Lucene (the Java search engine)
    2. Full-text search
  13. Lemur Project http://www.lemurproject.org/
  14. Search for Websites http://www.searchdaimon.com/
  15. http://swish-e.org/ Swish-e
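
The Sphinx notes above describe two ideas that are easier to see in code: the index keeps only keyword-to-document mappings (the original text is discarded), and a small delta index holds recent changes until they are merged into the full index. The following is a minimal Python sketch of both ideas, not Sphinx's actual implementation. The class names, the stopword list, and the rule that a delta entry hides the older main entry for the same document are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"a", "an", "the", "is", "of", "and"}


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS]


class InvertedIndex:
    """Maps each keyword to the set of document ids that contain it.

    Only ids are stored; the original text is not kept after tokenizing.
    """

    def __init__(self):
        self.postings = defaultdict(set)
        self.doc_ids = set()

    def add(self, doc_id, text):
        self.doc_ids.add(doc_id)
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, keyword):
        # Return a copy so callers cannot mutate the index.
        return set(self.postings.get(keyword.lower(), set()))


class MainPlusDelta:
    """A large main index plus a small delta index of recent changes."""

    def __init__(self):
        self.main = InvertedIndex()
        self.delta = InvertedIndex()

    def index_change(self, doc_id, text):
        # New or changed documents only touch the small delta index.
        self.delta.add(doc_id, text)

    def search(self, keyword):
        # Main entries for documents re-indexed in the delta are stale.
        main_hits = self.main.search(keyword) - self.delta.doc_ids
        return main_hits | self.delta.search(keyword)

    def merge(self):
        """Fold the delta into the main index (the periodic cron step)."""
        for ids in self.main.postings.values():
            ids -= self.delta.doc_ids  # drop stale postings in place
        for token, ids in self.delta.postings.items():
            self.main.postings[token] |= ids
        self.main.doc_ids |= self.delta.doc_ids
        self.delta = InvertedIndex()
```

In this sketch a caller would use `index_change` for every insert or update, `search` for lookups, and `merge` from a scheduled job. Because the index returns only document ids, the matching rows still have to be fetched from wherever the original text is stored.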