More Postgres - Full Text Search

Hello! You may know from some of my previous posts that we use Postgres Full Text Search to power search functionality on one of our services.

Materialised Views

We've changed the way we maintain and update the search index due to performance issues with the materialised view based approach we took the first time round. Previously we'd denormalise the TSVECTOR columns that are stored on various tables in the DB into a materialised view. The materialised view would then be refreshed at regular intervals to keep it fresh.

The benefits of this are that it requires very little SQL to set up and maintain, providing a workable solution at the start in a short amount of time. However as the DB grew in size and the materialised views grew in complexity and requirement we found the time taken to refresh was beyond acceptable.

Of particular importance is the blocking nature of a materialised view refresh: nothing can read from the view while it is being regenerated. A CONCURRENTLY option was added in Postgres 9.4 that allows the view to be used while it regenerates. Unfortunately the resources required to compute the new index was still too much to be doing periodically while users are using the service.

Triggers

Ultimately we decided to turn our materialised view into a regular view, ensuring appropriate indices on the underlying tables for performance. This view is then queried as part of the initial data load and the result stored into a table that then acts as the search index. Triggers are assigned to the tables that underlie the view to update only the changed rows of the index on UPDATE or INSERT.

The downside to this is a lot of Postgres UDFs (or stored procedures) that have to be maintained and tested. However this is outweighed by the benefits of an "instantly" updating view and a far more performant DB in general due to less background reindexing.

From our experience using both approaches, we'd recommend strongly considering whether your use case is appropriate for materialised views. At first they seem a quick and simple way of computing and updating a result, but quickly become expensive and tricky to update without negatively affecting end-users.

An example of such a trigger, featuring the wonderful syntax of PLPGSQL:

CREATE OR REPLACE FUNCTION updatescenesearchtextinvolvement()
  RETURNS TRIGGER AS
  $BODY$
  DECLARE
    updatesceneid INTEGER;
  BEGIN
    CASE TGOP
      WHEN 'DELETE'
      THEN
        updatesceneid := OLD.sceneid;
    ELSE
      updatesceneid := NEW.sceneid;
    END CASE;
    DELETE FROM scenesearchtext
    WHERE sceneid = updatesceneid;
    INSERT INTO scenesearchtext
      SELECT *
        FROM scenesearchview
       WHERE sceneid = updatescene_id;
    RETURN NULL;
  END; $BODY$
LANGUAGE plpgsql VOLATILE;

As you can see, this code will only update the rows that have the same scene_id as the record that has changed.

In general we've found Postgres' FTS capabilities to be quite sufficient for our use case. Hopefully these posts are helpful if you are trying to implement a search API using Postgres. Thanks for reading :)

Originally posted on MetaBroadcast's blog.

Java 8 Method References

The following is a fairly reasonable block of code that one might find in a project using Guava.

ImmutableList.copyOf(Iterables.limit(Iterables.filter(Iterables.transform(theIterable, new Function<Object, Object>() {
            @Override
            public Object apply(Object input) {
                return changeThing(input);
            }
        }), new Predicate<Object>() {
            @Override
            public boolean apply(Object input) {
                return testThing(input);
            }
        }), 5));

It's fairly hard to tell just from glancing at this code what it is actually doing. First it's transforming a group of Objects into different Objects, then it's filtering them based on the result of some predicate and then taking the first 5 results. Finally it forces the Iterables.doSomething()s to be evaluated by copying the result into an ImmutableList.

The main problem for this code's readability is those pesky anonymous classes for the Predicate and Function. The problem could be ameliorated by extracting them into constants or, if that's not possible static factory methods. However that is just moving the problem elsewhere. There is one line of code in each of those anonymous classes that we actually care about.

Lambdas and Stream can help things...

ImmutableList<Object> result = theIterable.stream()
                .map(s -> changeThing(s))
                .filter(s -> testThing(s))
                .limit(5)
                .collect(ImmutableCollectors.toList());

So that's turned 10 lines of code into 5. However s -> changeThing(s) this still feels like it could be less verbose, doesn't it? Stream.map() and filter() expects a Function and a Predicate to call respectively, which are both @FunctionalInterfaces and expose a single method. In which case, why are we telling Java how to use something if there is only one way to use it?

Method References

ImmutableList<Object> result = theIterable.stream()
                .map(this::changeThing)
                .filter(this::testThing)
                .limit(5)
                .collect(ImmutableCollectors.toList());

Rather than telling the Stream how to use the method, we simply say there exists a method in this class called changeThing and testThing, go use them.

Some more examples...

Iterables.transform(theIterable, this::changeThing);
Iterables.filter(theIterable, MyOtherClass::staticMethod);
Iterables.transform(theIterable, MyOtherClass::new);
Iterables.transform(theIterable, thingInstance::instanceMethod);
Multimaps.index(theIterable, keyMapper::getThing);

This is useful because it means we don't need to create an anonymous class just to invoke an existing method, we can simply reference it to use it as a lambda expression. In cases where you are using an existing project with Guava that has been upgraded to use Java 8 and you don't want to mix Iterables and Stream this lets you take advantage of method references to make the code a little less verbose.

Originally posted on MetaBroadcast's blog.