Full text search
Relational databases are very good at what they are designed to do, but poor at handling data that does not fit the relational model. Full text search is one such problematic area. Relational database systems provide only rudimentary support for text search: wildcard matching and primitive normalisation (e.g. ignoring case during search). For most RDBMSs such a search results in a full table scan, which is simply unacceptable for producing search results in a responsive manner.
Therefore, a more adequate solution is needed. In addition to performing full text search quickly, such a solution must be able to integrate with an existing relational database.
A full text search mechanism performs relevancy ranking of text against the specified search criteria. The indexed data is usually stored in an inverted index for fast retrieval. Basically, the search mechanism indexes the data, which in our case is stored in a relational database, and stores the result in a specialised data structure that is designed for very fast lookups corresponding to text search queries.
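To make the idea of an inverted index more concrete, here is a minimal sketch in plain Java. It is not how Lucene implements its index; the class name `InvertedIndex`, the tokenisation rule and the AND semantics are assumptions made purely for illustration, and no relevancy ranking is attempted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: maps each normalised term to the ids of the rows containing it. */
class InvertedIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();

    /** Splits the text into lower-cased terms and records the row id under each term. */
    void add(long rowId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(rowId);
        }
    }

    /** Returns the ids of rows that contain every term of the query (AND semantics). */
    Set<Long> search(String query) {
        Set<Long> result = null;
        for (String term : tokenize(query)) {
            Set<Long> ids = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(ids); // copy, so the postings are never mutated
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    /** Primitive normalisation: lower-case, then split on anything that is not a letter or digit. */
    private static String[] tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }
}
```

A lookup touches only the posting sets of the query terms rather than every row, which is what makes this kind of structure fast compared to a table scan.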
Naturally, there should be a background process that would perform such indexing of the data from a designated relational database. This means that there would always be a discrepancy (potentially insignificant) between the actually persisted data and the searchable data.
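The discrepancy between persisted and searchable data can be illustrated with a small sketch of one indexing pass. Everything here is hypothetical: the `Row` shape, the `modified` timestamp column and the watermark approach are assumptions for illustration, not a description of how any particular product tracks changes.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Sketch of a periodic indexing pass that picks up rows changed since the previous pass. */
class DeltaIndexer {
    /** Hypothetical row shape: id, searchable text and last-modified timestamp. */
    static final class Row {
        final long id;
        final String text;
        final Instant modified;

        Row(long id, String text, Instant modified) {
            this.id = id;
            this.text = text;
            this.modified = modified;
        }
    }

    // Watermark: the moment up to which changes have already been indexed.
    private Instant lastRun = Instant.EPOCH;

    /** Returns the rows modified after the previous pass and up to 'now', then advances the watermark. */
    List<Row> nextBatch(List<Row> table, Instant now) {
        List<Row> changed = new ArrayList<>();
        for (Row row : table) {
            if (row.modified.isAfter(lastRun) && !row.modified.isAfter(now)) {
                changed.add(row);
            }
        }
        lastRun = now;
        return changed;
    }
}
```

A row persisted just after a pass stays invisible to searches until the next pass, so the interval between passes (for example, one scheduled with a `ScheduledExecutorService`) bounds how stale the searchable data can be.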
These two concepts are clearly partitioned from a system design perspective, with a search mechanism complementing the relational data. For example, the inventory management is done via the relational database, but the inventory searching is done using a full text search mechanism. Due to RDBMS's ACID properties, actions such as creation of a purchase order or a new inventory entry are processed through the relational database while flexible searches are handled by some full text mechanism.
There are many really cool and advanced algorithms to perform text relevancy ranking. However, instead of implementing something from scratch, at least at the beginning, an existing, well-proven technology such as Lucene Search should be used as the basis for incorporating full text search into the platform.
Lucene is a set of Java libraries that implement a variety of search algorithms, all listed on the referenced site. There are already products that are built on top of Lucene and provide additional services, including integration with relational databases.
Apache Solr is one such product, based on the Lucene engine. From the indexing perspective, the communication link between Solr and the RDBMS can be handled in two ways: by Solr's DIH (Data Import Handler) delta-import feature together with a UNIX cron job that periodically invokes a Solr DIH URL to index any changes, or by pushing changes directly into Solr by POSTing updates.
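As a rough illustration of the push approach, the sketch below builds an HTTP POST against Solr's JSON update endpoint using the JDK's `java.net.http` API (Java 11+). The class name `SolrUpdateRequests`, the base URL, the core name `inventory` and the document fields are assumptions for illustration; the request is only constructed here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Sketch: building update requests for a Solr core. */
class SolrUpdateRequests {
    /**
     * Builds (but does not send) a POST that adds or replaces one document.
     * solrBaseUrl is assumed to look like http://localhost:8983/solr;
     * jsonDocument is a single JSON object, wrapped here into the array form
     * that the /update handler accepts for adding documents.
     */
    static HttpRequest addDocument(String solrBaseUrl, String core, String jsonDocument) {
        // commit=true makes the change searchable right away, at the cost of a commit per request.
        URI uri = URI.create(solrBaseUrl + "/" + core + "/update?commit=true");
        String body = "[" + jsonDocument + "]";
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }
}
```

The resulting request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, for instance right after the corresponding transaction has been committed in the relational database.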
For a better understanding of the difference between Lucene (the engine) and Solr (the car), this site provides a comprehensive discussion.
Lucene provides a comprehensive query language to specify searches. This includes wildcard searches, regular expressions, fuzzy searches and more.
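A few query strings in the syntax of Lucene's classic query parser give a feel for these features. They are collected in a small Java map only to keep the examples in one place; the class name `QueryExamples` and the field name `description` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Example query strings in Lucene's classic query parser syntax, keyed by feature. */
class QueryExamples {
    static Map<String, String> examples() {
        Map<String, String> q = new LinkedHashMap<>();
        q.put("single-character wildcard", "te?t");      // matches "test", "text"
        q.put("multi-character wildcard", "pump*");      // matches "pump", "pumps", "pumping"
        q.put("regular expression", "/[mb]oat/");        // matches "moat", "boat"
        q.put("fuzzy search", "roam~");                  // terms similar to "roam", e.g. "foam"
        q.put("fuzzy search, edit distance 1", "roam~1");
        q.put("fielded phrase", "description:\"hydraulic pump\""); // hypothetical field name
        return q;
    }
}
```

Such strings would normally be handed to a query parser, which turns them into query objects that are then executed against the index.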