Memory management in long-lived Symfony commands
2018-04-22Here are some of the things I think are useful to keep memory usage under control in Symfony commands that process large databases.
Batching
Two naive approaches to looping over all contents of a database would be:
- Select all data at once and migrate it in one run: the network is only accessed once but this can result in an “Out of memory” error because there is probably too much data for the server’s RAM;
- Run the migration on each row, selecting rows one by one: memory usage is low because we only ever load one row into to memory at a time, but the network is accessed as many times as there are rows, which is very slow.
Using batching allows having a somewhat low and stable memory footprint while reducing network overhead. With batching, I select rows 100 by 100 (or 1000 by 100, or whatever fits your use case) and run the migration on each batch. It might look something like this in PHP:
$batchSize = 100;
// SQL's "OFFSET" tends to be slow on large tables. A faster approach is to select rows ordered by ID
// and to set a minimum ID with a "WHERE" clause. Something like:
//
// SELECT * FROM user WHERE id > :minimumId ORDER BY id ASC
//
// This achieves the effect of an "OFFSET" but is significantly faster.
$minimumId = 0;
do {
$batch = getBatchOfUsersOrderedById($batchSize);
$batchCount = count($batch);
$minimumId = $batch[$batchCount - 1]->getId();
if ($batchCount > 0) {
foreach ($batch as $user) {
// ...
}
}
} while ($batchCount === $batchSize);
Putting the main loop’s body inside a separate method
PHP automatically clears memory of variables that go out of scope, somewhat like what C does with its stack variables. Any variable that is created in a function is cleared once the function ends. To allow PHP to clear more memory, I put the main loop’s content inside a separate method. It looks something like this:
$batchSize = 100;
$minimumId = 0;
do {
list($batchCount, $minimumId) = $this->runOneBatch($batchSize, $minimumId);
} while ($batchCount === $batchSize);
And in a separate method named runOneBatch
:
function runOneBatch(int $batchSize, int $minimumId): array {
$batch = getBatchOfUsersOrderedById($batchSize);
$batchCount = count($batch);
$minimumId = $batch[$batchCount - 1]->getId();
if ($batchCount > 0) {
foreach ($batch as $user) {
// ...
}
}
return [$batchCount, $minimumId];
}
Marking unused variables as collectable by garbage collection
In PHP, variables are garbage collected, which means that I usually don’t have to manage the memory they use manually. Instead, PHP handles memory management automatically. This works well most of the time. But for long-running tasks, it has its limits. Because PHP does not always know that a variable has become useless, it keeps it in memory just in case. This increases memory usage.
To tell PHP that a variable has become useless, I set it to null
and unset()
it:
$users = loadLotsOfUsersFromMysql();
// ... do some stuff with the users ...
$users = null;
unset($users);
Triggering garbage collection
PHP’s garbage collector uses heuristics to figure out when the best time is to remove unused variables from memory. In long-running tasks, I find it useful to manually trigger a GC cycle:
gc_collect_cycles();
Clearing Doctrine’s entity manager
Internally, Doctrine, Symfony’s ORM, keeps a reference of each entity it has loaded into memory, unless I explicitly tell it to drop the reference. To keep memory usage down, I like to clear the entity manager, which will remove references to all entities currently known to Doctrine:
/** @var \Doctrine\ORM\EntityManagerInterface $em */
$em = $this->getContainer()->get('doctrine')->getManager();
$em->clear();
SQL logging
The SQL logger of Doctrine that is enabled in Symfony by default and can be a source of memory leaks during long-lived commands. I usually disable the SQL logger in configuration or with something like this:
/** @var \Doctrine\DBAL\Connection $connection */
$connection = $this->getContainer()->get('doctrine')->getManager()->getConnection();
$connection->getConfiguration()->setSQLLogger(null);