SimpleXML and PHP Allowed Memory Size Exhausted Fatal Error

Today, Daniel and I ran into one of the better PHP bugs of our careers so far. For a current project, Daniel tried to import 150MB of XML data into MySQL using FLOW3's Doctrine persistence layer. Importing just 1000 records from the XML ended in a nice PHP "Allowed memory size exhausted" fatal error. We should mention that PHP's memory limit was set to 1GB!

Since the importer mainly consists of SimpleXML objects and Doctrine persistence, Daniel tried to figure out which of the two was trying to allocate more memory than allowed. So he took only a few XML records (around 100) and imported them 10 times, which results in the same number of Doctrine objects. Since this worked fine, Daniel assumed that the problem had to be in SimpleXML.

Next, we added up the memory used by all the SimpleXML calls via memory_get_usage() and did the same for all the Doctrine methods. To our surprise, SimpleXML seemed to account for only 6MB of memory and Doctrine for around 5MB, which makes 11MB altogether.
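
A minimal sketch of this kind of bookkeeping, not our actual importer code: the helper name measure(), the bucket names and the tiny inline XML document are assumptions made up for illustration. It wraps each step in a closure and adds the difference in memory_get_usage() before and after the call to a per-component total.

```php
<?php
// Hypothetical helper: runs $step and adds the change in memory_get_usage()
// to a named bucket, so different components can be totalled separately.
function measure(array &$totals, string $bucket, callable $step)
{
    $before = memory_get_usage();
    $result = $step();
    $totals[$bucket] = ($totals[$bucket] ?? 0) + (memory_get_usage() - $before);
    return $result;
}

$totals = [];

// Bucket 1: parsing with SimpleXML (sample document is made up).
$xml = measure($totals, 'simplexml', function () {
    return simplexml_load_string(
        '<records><record id="1"/><record id="2"/></records>'
    );
});

// Bucket 2: stand-in for the persistence work (here: just collecting ids).
$ids = measure($totals, 'persistence', function () use ($xml) {
    $ids = [];
    foreach ($xml->record as $record) {
        $ids[] = (string) $record['id'];
    }
    return $ids;
});

foreach ($totals as $bucket => $bytes) {
    printf("%s: %d bytes\n", $bucket, $bytes);
}
```

Numbers obtained this way only show what each call leaves allocated once it returns, so a short-lived spike inside a call stays invisible.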

But even though the overall memory usage never exceeded 50MB, PHP crashed repeatedly after importing 430 records.

After thinking about it for a while, we tried to figure out what kind of process would kick in only after a certain amount of time rather than on every iteration. We thought about some Doctrine background processing, but that would not explain why the behavior did not occur when only 100 XML records were imported 10 times (which amounts to the same number of Doctrine objects and operations).

It took us quite a while to think of garbage collection. Our theory was that PHP's garbage collector was somehow not being run, so memory kept piling up. Our next step was to call the GC manually after each iteration. To our surprise, this led to a fatal error with exhausted memory after just one iteration.
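
Forcing a collection after every iteration can be sketched roughly like this, assuming gc_collect_cycles() is used for the manual call. The sample document and the loop body are placeholders, not the real import.

```php
<?php
gc_enable(); // make sure the cycle collector is switched on

// Made-up sample data standing in for the 150MB import file.
$records = simplexml_load_string(
    '<records><record/><record/><record/></records>'
);

$processed = 0;
foreach ($records->record as $record) {
    // ... map the record to an entity and persist it (omitted) ...
    $processed++;

    // Force a collection run; the return value is the number of
    // cycles that were collected.
    $collected = gc_collect_cycles();
    printf(
        "record %d: %d cycles collected, %d bytes in use\n",
        $processed,
        $collected,
        memory_get_usage()
    );
}
```

With a document this small the forced run is harmless; in our case it was this very call that pushed PHP over the memory limit.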

So what happened? Although the object structure created by SimpleXML only took about 6MB of RAM, there must be millions of objects for 150MB of XML data (at the very least, every node in the document gets its own node object in the structure). Nothing bad happened as long as the garbage collector did not try to figure out which objects were still alive and which were not, since doing so involves traversing the object graph. There are GC implementations that can do this with very little memory, but PHP does not seem to use one of them. In other words: trying to keep memory consumption low ended up crashing the PHP process.

So here comes the solution to the problem: we simply turned off garbage collection completely using gc_disable(). Right after that, everything went fine... So if you ever have the problem of SimpleXML consuming too much memory, this could be the solution for you.
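
A minimal sketch of how the workaround could be wrapped, assuming you only want the collector off while the import runs. The function name importWithoutGc() and the sample import closure are hypothetical. Note that gc_disable() only switches off the cycle collector; values whose reference count drops to zero are still freed as usual.

```php
<?php
// Hypothetical wrapper: runs $import with the cycle collector switched off
// and restores the previous setting afterwards, even if $import throws.
function importWithoutGc(callable $import)
{
    $wasEnabled = gc_enabled();
    gc_disable();
    try {
        return $import();
    } finally {
        if ($wasEnabled) {
            gc_enable();
        }
    }
}

$gcStateBefore = gc_enabled();

// Stand-in for the real import: count the records in a made-up document.
$count = importWithoutGc(function () {
    $xml = simplexml_load_string('<records><record/><record/></records>');
    $n = 0;
    foreach ($xml->record as $record) {
        $n++; // ... persist the record here ...
    }
    return $n;
});

printf("imported %d records\n", $count);
```

Restoring the old setting keeps the rest of a long-running script unaffected; for a one-off import script, a plain gc_disable() at the top does the same job.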


 
Content © Michael Knoll 2009-2012