Recently, I was tasked with improving the performance of a CRM's search capabilities with SQL, that is, without going through the setup of ElasticSearch or Algolia.

The CRM's pagination was based Symfony's PagerFanta Bundle. It seems that optimizing this bundle for use with large table is a somewhat common issue.

In our case, the problematic queries took up to 15 seconds sometimes, for a simple paginated list of items. So, I dug into some SQL and got the query time down to 2 seconds. Here are some of the things I learned.

Use IN instead of = xxx OR

This one is simple enough. Using WHERE x IN (a, b, c) is faster than using WHERE x = a OR x = b OR x = c.

Symfony makes this simple with the query builder. Let's say we want to list users that are banned or suspended, we could do something like this:

$statuses = [
  1, // banned
  2, // suspended
];

$qb->andWhere($qb->expr()->in('user.status', $statuses));

In such a case, having an index on user.status helps too (see "Indexes" further down).

When joining, put ORDER BY in subquery

In SQL, at least in MariaDB, there is a performance issue affecting "late row lookups" and can be worked around by putting any ORDER BY clauses inside a subquery.

To me that sounded counter-intuitive at first, but the subquery does actually speed things up.

The idea is to change something like:

SELECT my_user.id, my_cars.id
FROM my_user
JOIN cars ON my_cars.user_id = my_user.id
ORDER BY my_user.id DESC

To something like:

SELECT DISTINCT my_user.id, my_cars.id
FROM
    (
        SELECT ordered_user.id
        FROM my_user ordered_user
        ORDER BY ordered_user.id DESC
        LIMIT 999999999
    ) ordered_user
JOIN my_user ON my_user.id = ordered_user.id
JOIN my_cars ON my_cars.user_id = my_user.id

The LIMIT 999999999 is required because MariaDB will only use ORDER BY in a subquery if a LIMIT is also provided.

Avoid prefix wildcards with LIKE

When running an SQL query with a prefix wildcard, indexes are not used.

Let's take this query as an example:

SELECT * FROM my_user WHERE email LIKE '%foobar@gmail.com%'

With this, even if the email column is indexed, the index will not be used. With a large table, this query will be slow.

Instead, what I've found is that more often than not, there are ways to query non prefix wildcards. For instance, in a CRM, it is often the case that a customer service representative will copy paste the email address of a customer. This means that emails are exact. And so you can run this query instead:

SELECT * FROM my_user WHERE email = 'foobar@gmail.com'

Or, if you still want speed and a bit of flexibility, you can do this:

SELECT * FROM my_user WHERE email LIKE 'foobar@gmail.com%'

This works especially well with data types such as phone numbers.

If a searched phone number is valid (which can be checked with Google's libphonenumber library), it probably OK to do a WHERE phone = xxx query. If, on the other hand, the phone number is incorrect, that may mean that it is incomplete, in which case you can revert to a slower but more accurate WHERE phone LIKE '%xxx%' query.

I've found that using data validation can be a good way to build efficient SQL queries on the fly and reduce query time in most cases, reverting to slow queries only when data validation leaves you unsure what to do.

Don't show page number, unless you need to

The CRM I was working on used PageFanta's Doctrine ORM adapter:

$pager = new Pagerfanta(new \Pagerfanta\Adapter\DoctrineORMAdapter($qb));

This Pagerfanta adapter counts the number of pages. This helps display something like this in your front-end:

pagination-1

This looks nice. Doesn't it?

As a developer, it makes us feel smart to display that kind of information. It seems like it has become kind of an automatism to display this data. We don't question it's usefulness.

But it is useful? In our case, and probably in lots of cases, it's mostly useless. What matters more is to display the right data on the first page.

And, fetching the number of pages from the database takes time. Getting rid of this completely was an acceptable solution in our case. This how we do that with PagerFanta:

$pager = new Pagerfanta(new \Pagerfanta\Adapter\CallbackAdapter(
    function() use($limit, $offset) {
        // approximation that allows always having one more page
        return $limit * 2 + $offset;
    },
    function() use($qb) {
        return $qb->getQuery()->getResult();
    }
));

Set indexes

We've discussed the fact that using indexes can speed up queries in some situations. In a Symfony website, adding indexes can be setup with the Index annotation on any entity's class:

use Doctrine\ORM\Mapping as ORM;

/**
 * @ORM\Table(name="my_user", indexes={
 *   @ORM\Index(name="idx_status", columns={"status"}),
 * })
 * @ORM\Entity(repositoryClass="AppBundle\Repository\UserRepository")
 */
class User
{
    /**
     * @var integer
     * 
     * @ORM\Column(type="integer")
     */
    protected $status;
}

You may also use composite indexes to speed up specific types of queries that use WHERE on multiple fields in a single query.