More Postgres - Full Text Search

Hello! You may know from some of my previous posts that we use Postgres Full Text Search to power search functionality on one of our services.

Materialised Views

We've changed the way we maintain and update the search index due to performance issues with the materialised view based approach we took the first time round. Previously we'd denormalise the TSVECTOR columns that are stored on various tables in the DB into a materialised view. The materialised view would then be refreshed at regular intervals to keep it fresh.

The benefits of this are that it requires very little SQL to set up and maintain, providing a workable solution at the start in a short amount of time. However as the DB grew in size and the materialised views grew in complexity and requirement we found the time taken to refresh was beyond acceptable.

Of particular importance is the blocking nature of a materialised view refresh: nothing can read from the view while it is being regenerated. A CONCURRENTLY option was added in Postgres 9.4 that allows the view to be used while it regenerates. Unfortunately the resources required to compute the new index was still too much to be doing periodically while users are using the service.

Triggers

Ultimately we decided to turn our materialised view into a regular view, ensuring appropriate indices on the underlying tables for performance. This view is then queried as part of the initial data load and the result stored into a table that then acts as the search index. Triggers are assigned to the tables that underlie the view to update only the changed rows of the index on UPDATE or INSERT.

The downside to this is a lot of Postgres UDFs (or stored procedures) that have to be maintained and tested. However this is outweighed by the benefits of an "instantly" updating view and a far more performant DB in general due to less background reindexing.

From our experience using both approaches, we'd recommend strongly considering whether your use case is appropriate for materialised views. At first they seem a quick and simple way of computing and updating a result, but quickly become expensive and tricky to update without negatively affecting end-users.

An example of such a trigger, featuring the wonderful syntax of PLPGSQL:

CREATE OR REPLACE FUNCTION updatescenesearchtextinvolvement()
  RETURNS TRIGGER AS
  $BODY$
  DECLARE
    updatesceneid INTEGER;
  BEGIN
    CASE TGOP
      WHEN 'DELETE'
      THEN
        updatesceneid := OLD.sceneid;
    ELSE
      updatesceneid := NEW.sceneid;
    END CASE;
    DELETE FROM scenesearchtext
    WHERE sceneid = updatesceneid;
    INSERT INTO scenesearchtext
      SELECT *
        FROM scenesearchview
       WHERE sceneid = updatescene_id;
    RETURN NULL;
  END; $BODY$
LANGUAGE plpgsql VOLATILE;

As you can see, this code will only update the rows that have the same scene_id as the record that has changed.

In general we've found Postgres' FTS capabilities to be quite sufficient for our use case. Hopefully these posts are helpful if you are trying to implement a search API using Postgres. Thanks for reading :)

Originally posted on MetaBroadcast's blog.