More Postgres - Full Text Search
14 May 2015Hello! You may know from some of my previous posts that we use Postgres Full Text Search to power search functionality on one of our services.
Materialised Views
We've changed the way we maintain and update the search index due to performance issues with the materialised view based approach we took the first time round. Previously we'd denormalise the TSVECTOR
columns that are stored on various tables in the DB into a materialised view. The materialised view would then be refreshed at regular intervals to keep it fresh.
The benefits of this are that it requires very little SQL to set up and maintain, providing a workable solution at the start in a short amount of time. However as the DB grew in size and the materialised views grew in complexity and requirement we found the time taken to refresh was beyond acceptable.
Of particular importance is the blocking nature of a materialised view refresh: nothing can read from the view while it is being regenerated. A CONCURRENTLY
option was added in Postgres 9.4 that allows the view to be used while it regenerates. Unfortunately the resources required to compute the new index was still too much to be doing periodically while users are using the service.
Triggers
Ultimately we decided to turn our materialised view into a regular view, ensuring appropriate indices on the underlying tables for performance. This view is then queried as part of the initial data load and the result stored into a table that then acts as the search index. Triggers are assigned to the tables that underlie the view to update only the changed rows of the index on UPDATE
or INSERT
.
The downside to this is a lot of Postgres UDFs (or stored procedures) that have to be maintained and tested. However this is outweighed by the benefits of an "instantly" updating view and a far more performant DB in general due to less background reindexing.
From our experience using both approaches, we'd recommend strongly considering whether your use case is appropriate for materialised views. At first they seem a quick and simple way of computing and updating a result, but quickly become expensive and tricky to update without negatively affecting end-users.
An example of such a trigger, featuring the wonderful syntax of PLPGSQL:
As you can see, this code will only update the rows that have the same scene_id
as the record that has changed.
In general we've found Postgres' FTS capabilities to be quite sufficient for our use case. Hopefully these posts are helpful if you are trying to implement a search API using Postgres. Thanks for reading :)
Originally posted on MetaBroadcast's blog.