Memory management in long-lived Symfony commands

Memory management in long-lived Symfony commands

Here are some of the things I think are useful to keep memory usage under control in Symfony commands that process large databases.

Batching

Two naive approaches to looping over all contents of a database would be:

  • Select all data at once and migrate it in one run: the network is only accessed once but this can result in an “Out of memory” error because there is probably too much data for the server’s RAM;
  • Run the migration on each row, selecting rows one by one: memory usage is low because we only ever load one row into to memory at a time, but the network is accessed as many times as there are rows, which is very slow.

Using batching allows having a somewhat low and stable memory footprint while reducing network overhead. With batching, I select rows 100 by 100 (or 1000 by 100, or whatever fits your use case) and run the migration on each batch. It might look something like this in PHP:

$batchSize = 100;

// SQL's "OFFSET" tends to be slow on large tables. A faster approach is to select rows ordered by ID
// and to set a minimum ID with a "WHERE" clause. Something like:
//
//     SELECT * FROM user WHERE id > :minimumId ORDER BY id ASC
//
// This achieves the effect of an "OFFSET" but is significantly faster.
$minimumId = 0;

do {
    $batch = getBatchOfUsersOrderedById($batchSize);
    $batchCount = count($batch);
    $minimumId = $batch[$batchCount - 1]->getId();
    if ($batchCount > 0) {
        foreach ($batch as $user) {
            // ...
        }
    }
} while ($batchCount === $batchSize);

Putting the main loop’s body inside a separate method

PHP automatically clears memory of variables that go out of scope, somewhat like what C does with its stack variables. Any variable that is created in a function is cleared once the function ends. To allow PHP to clear more memory, I put the main loop’s content inside a separate method. It looks something like this:

$batchSize = 100;
$minimumId = 0;

do {
    list($batchCount, $minimumId) = $this->runOneBatch($batchSize, $minimumId);
} while ($batchCount === $batchSize);

And in a separate method named runOneBatch:

function runOneBatch(int $batchSize, int $minimumId): array {
    $batch = getBatchOfUsersOrderedById($batchSize);
    $batchCount = count($batch);
    $minimumId = $batch[$batchCount - 1]->getId();
    if ($batchCount > 0) {
        foreach ($batch as $user) {
            // ...
        }
    }
    return [$batchCount, $minimumId];
}

Marking unused variables as collectable by garbage collection

In PHP, variables are garbage collected, which means that I usually don’t have to manage the memory they use manually. Instead, PHP handles memory management automatically. This works well most of the time. But for long-running tasks, it has its limits. Because PHP does not always know that a variable has become useless, it keeps it in memory just in case. This increases memory usage.

To tell PHP that a variable has become useless, I set it to null and unset() it:

$users = loadLotsOfUsersFromMysql();

// ... do some stuff with the users ...

$users = null;
unset($users);

Triggering garbage collection

PHP’s garbage collector uses heuristics to figure out when the best time is to remove unused variables from memory. In long-running tasks, I find it useful to manually trigger a GC cycle:

gc_collect_cycles();

Clearing Doctrine’s entity manager

Internally, Doctrine, Symfony’s ORM, keeps a reference of each entity it has loaded into memory, unless I explicitly tell it to drop the reference. To keep memory usage down, I like to clear the entity manager, which will remove references to all entities currently known to Doctrine:

/** @var \Doctrine\ORM\EntityManagerInterface $em */
$em = $this->getContainer()->get('doctrine')->getManager();
$em->clear();

SQL logging

The SQL logger of Doctrine that is enabled in Symfony by default and can be a source of memory leaks during long-lived commands. I usually disable the SQL logger in configuration or with something like this:

/** @var \Doctrine\DBAL\Connection $connection */
$connection = $this->getContainer()->get('doctrine')->getManager()->getConnection();
$connection->getConfiguration()->setSQLLogger(null);