Earlier today I happened to see the CVS application for David Strauss. It’s not that I watch the CVS applications list (though apparently I’m now qualified to approve CVS apps, it’s not on my to-do list for time reasons); the application got forwarded to another list because of a particularly interesting module. I recognized him immediately from http://thatotherpaper.com, a site I had been pointed to as an extremely well-done newspaper site.
Then he popped onto IRC with a question about CVS, and I ended up looking at the first of many modules he is contributing, a preemptive caching module. In the discussion, he pointed out that Drupal’s basic page cache brought a site he has worked on to its knees. To quote:
“cache cleanup was hogging MySQL for about 2 min at a time on flixya.com. And that's a dual-processor Core 2 Duo with 3GB of RAM”
The preemptive cache system is a wrapper around a function; when you call your function through it, it measures how the function performs and decides whether the result needs to be cached during cron runs. If your function runs sufficiently quickly, no caching is done. If it’s slow, the result is cached. You can do smart things with environments, and cache flushes are amortized so the entire cache table doesn’t get cleared at once. The downside is that you can get a significant lag on busy sites, but this may be a necessary tradeoff.
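To make the idea concrete, here is a rough sketch in Python of a wrapper that times a function and only caches it when it turns out to be slow. This is not the module's actual code or API (the real module is written for Drupal); the class name, the `slow_threshold` cutoff, and the `cron` method are all made up for illustration.

```python
import time
from typing import Any, Callable, Dict, List


class PreemptiveCache:
    """Hypothetical sketch: cache a function's result only if it proves slow."""

    def __init__(self, slow_threshold: float = 0.5,
                 clock: Callable[[], float] = time.perf_counter):
        self.slow_threshold = slow_threshold  # assumed cutoff, in seconds
        self.clock = clock                    # injectable so it can be tested
        self.funcs: Dict[str, Callable[[], Any]] = {}  # functions flagged slow
        self.results: Dict[str, Any] = {}               # their cached results

    def _timed(self, fn: Callable[[], Any]):
        start = self.clock()
        value = fn()
        return value, self.clock() - start

    def call(self, key: str, fn: Callable[[], Any]) -> Any:
        # Slow functions are served from the cache once they are flagged.
        if key in self.results:
            return self.results[key]
        value, elapsed = self._timed(fn)
        if elapsed >= self.slow_threshold:
            # Too slow to run while a user waits: cache it, refresh in cron.
            self.funcs[key] = fn
            self.results[key] = value
        return value

    def cron(self, max_refreshes: int = 1) -> List[str]:
        """Refresh a few entries per run instead of flushing everything."""
        refreshed = []
        for key in list(self.funcs)[:max_refreshes]:
            self.results[key], _ = self._timed(self.funcs[key])
            # Move the key to the back so the next run picks another entry.
            self.funcs[key] = self.funcs.pop(key)
            refreshed.append(key)
        return refreshed
```

Fast functions pass straight through on every call, while slow ones are only re-run from `cron`, a few at a time, which is roughly what "amortized" flushing means here.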
I’m extremely excited by this module, and by David’s other work. He has developed a whole suite of tools, and it looks like we’re going to get them.
David, congratulations on being the first new developer to come into #drupal to excite me sufficiently to blog about you. And a hearty welcome aboard. I think your contributions to Drupal have the potential to be the next Views, so to speak, if I may be so arrogant.


It's a small, small world...
How bizarre is it that when I went to look at thatotherpaper.com, the first article is about Asaf Ronin, who I know from doing improv.
The mind, she boggles.
a few nits
yes, it is great to have a new talented developer on board. welcome!
a couple of minor points about this article
- the preempt module is largely a dupe of http://drupal.org/project/resultcache. perhaps try to join efforts
- the particular example you gave of expensive cache clearing was solved in drupal5 by moving the page cache to its own table, which can be truncated very quickly. this was indeed a problem in 4.7
I pointed that out to him,
I pointed that out to him, and he then commented that rebuilding the entire cache is also very expensive, and causes a lot of slowdown. The site in question has 400K+ nodes (I think).
indeed
yes it is. but what is the solution beyond the caching options we have now? if we keep the cache longer, some pages don't reflect reality. i don't yet understand how these modules solve the problem without sacrificing accuracy.
It solves a number of the problems
Many responsiveness problems are created by generating content as the page loads. With complicated sites, page load times can skyrocket to 60 seconds for basic queries. With enough concurrent page loads, page execution times approach infinity.
Some of these queries are impossible to speed up; they ask inherently difficult questions. The only solution is running them when users aren't waiting for pages to load, during cron. Preempt solves this by migrating these tasks to cron job updates.
But while large sites may find basic queries taking extremely long, small sites may have no problem. Existing Drupal caches force the developer to make a decision: perform the operation as the page loads or write a hook_cron function that caches their items. Preempt solves this by measuring performance and making the decision on a dynamic basis. This prevents cron from overloading and spares modules from implementing their own cron/live performance tests and logic.
But even if modules begin writing hook_crons to update their caches, the proliferation of such hooks causes problems. All cron hooks run sequentially. Module A has no idea Module B spent 100 seconds updating its cache. Preempt solves this by managing the cache updates in a fair, shared environment that ensures that one module doesn't dominate updates.
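As a rough illustration of the "fair, shared environment" idea, the Python sketch below takes one update from each module in turn until a time budget for the cron run is used up. It is only a guess at the general approach, not Preempt's actual scheduler; the function name, the `budget` parameter, and the convention that each task returns its own cost in seconds are assumptions made for this example.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


def run_fair_updates(queues: Dict[str, Deque[Callable[[], float]]],
                     budget: float) -> List[str]:
    """Round-robin over modules, one task at a time, until the budget is spent.

    Each task returns its cost in seconds (a stand-in for measured time).
    Returns the module names in the order their tasks were run.
    """
    order: List[str] = []
    spent = 0.0
    # Only modules that actually have pending updates take part.
    pending = deque(name for name, q in queues.items() if q)
    while pending and spent < budget:
        name = pending.popleft()
        task = queues[name].popleft()
        spent += task()
        order.append(name)
        if queues[name]:
            # Go to the back of the line so other modules get a turn.
            pending.append(name)
    return order
```

With this arrangement, a module with a long backlog of slow updates cannot starve the others: whatever it has left over simply waits for the next cron run.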
The Other Paper raises the bar for Drupal!
As I've been haphazardly trying to create an alt-weekly/public broadcasting style site - Lawrence.com (Ellington/Django CMS) was always the gold standard. David and the Four Kitchens folks have now created something as good (or better) on Drupal (and made great use of views and panels).
As the Drupal Dojo has been anxious to work on real-world/good-for-Drupal projects (set to work on the 'Drupal 5 Themer Pack'), I'm hoping we'll be able to make a contribution towards building a kick-ass newspaper-style distribution. It's great to have a few more Ninjas to learn from!
Welcome David - and thanks!