Currently the Drupal pager system is an interesting look into some of the past decisions in Drupal design. The system is actually a very powerful paging system that can handle multiple pagers on the same page that act completely independently. However, it is also a kind of messy piece of code that stores a lot of its data in very obscure globals in formats that don't make sense. It is difficult to document and even more difficult to understand from the outside. The fact that the pager works basically flawlessly for what it does guarantees that it will see no improvements in its current form, because the system is difficult enough to understand that few developers are going to bother looking at it, even though there is a definite demand for some more functionality out of it.

The current design of the pager system is based on the poorly named 'element ID', which is really the pager ID. This ID is a number that indicates which pager to use. 99.9% of the pagers in Drupal have a pager ID of 0, which makes sense because most pagers are the only one on their page.

The pager ID corresponds exactly to which item you will get from $_GET['page'] if you explode it on a comma. To put that in more human terms, if you have a page with this URL: http://www.example.com/somepage?page=x,y,z then 'x' corresponds to element ID 0, 'y' corresponds to element ID 1, and 'z' corresponds to element ID 2.
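To make that mapping concrete, here is a minimal standalone sketch of reading one element's page number from the raw `page` value. The function name `example_pager_page_for()` is hypothetical and not part of Drupal; it only mirrors the explode-on-comma behavior described above.

```php
<?php
/**
 * Hypothetical helper (not part of Drupal): return the zero-based page number
 * for one pager element from a raw ?page= value such as "x,y,z".
 */
function example_pager_page_for($page_param, $element) {
  // Position N in the comma-separated list belongs to element ID N.
  $pages = explode(',', (string) $page_param);
  // A missing or empty slot means that pager is on its first page (0).
  if (!isset($pages[$element]) || $pages[$element] === '') {
    return 0;
  }
  // Negative or non-numeric values fall back to page 0.
  return max(0, (int) $pages[$element]);
}

// For ?page=3,,7: element 0 is on page 3, element 1 on page 0, element 2 on page 7.
echo example_pager_page_for('3,,7', 0), ' ', example_pager_page_for('3,,7', 1), ' ', example_pager_page_for('3,,7', 2), "\n";
```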

This leads to a very important consideration about pager IDs: always use the lowest number possible. Using a pager ID of 25 will result in a URL like this: http://www.example.com/somepage?page=,,,,,,,,,,,,,,,,,,,,,,,,x where 'x' is the page number for that element. While this guarantees you probably won't conflict with some other pager, it sure makes that URL awfully ugly.

There's nothing actually wrong with this, except the documentation doesn't do a good job of making it explicit what this means.
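Going the other direction shows where all those commas come from. This is again a hypothetical standalone helper, `example_pager_build_param()`, not Drupal's own URL code: every element ID below the highest one in use still needs a slot, so a lone pager with ID 25 drags 25 commas into the URL.

```php
<?php
/**
 * Hypothetical helper (not part of Drupal): build a ?page= value from a map of
 * element ID => zero-based page number. Slots for unused IDs stay empty.
 */
function example_pager_build_param(array $pages) {
  if (empty($pages)) {
    return '';
  }
  // Every ID below the highest one still needs a slot, hence the padding.
  $slots = array_fill(0, max(array_keys($pages)) + 1, '');
  foreach ($pages as $id => $page) {
    // Page 0 is the default, so it can be left as an empty slot.
    $slots[$id] = $page > 0 ? (string) $page : '';
  }
  return implode(',', $slots);
}

echo example_pager_build_param(array(0 => 2)), "\n";   // 2
echo example_pager_build_param(array(25 => 4)), "\n";  // 25 commas, then 4
```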

Now, where all this starts to get ugly is in how this information is used and transmitted. When you call the pager_query() function, you send it a limit (the number of items per page) and a pager ID. It does the necessary magic to determine the page number, records the paging state, and adds the proper goo to the query. This is great, unless you ever need to access that data for any other reason. Then suddenly it's not so great.

This data is actually stored in a global named 'pager_page_array'. It's also stored in the globals 'pager_total' and 'pager_total_items'. Just looking at those variable names, it's not clear which variable holds what. As it happens, $pager_page_array stores the current page for a given pager (assuming it's been referenced somewhere), $pager_total stores the total number of pages, and $pager_total_items stores the total number of items. $limit is not actually stored anywhere.

With the data scattered around, and some other concerns about whether or not the data is even paid attention to (pager_query() just rewrites $pager_page_array when it runs, which can lead to a weird situation where one pager can overwrite what another pager set if the page number was out of bounds), the system can't really be modified from the outside. But there are several reasons one might want to do this:

  1. Users have expressed the desire to have URL based paging. For example, http://www.example.com/somepath/1 instead of http://www.example.com/somepath?page=1
  2. If you want an offset, it can't be done. For example, I may want a pager that skips the first result entirely. This is not uncommon on sites that want to highlight the very first article specially and still want a pager.
  3. If you want to use the pager system without pager_query().
  4. If you want page 1 to actually be ?page=1. Right now, ?page=1 is actually page 2, because the first page is ?page=0.
  5. If you want to limit the number of pages that appear even if there are really more.
  6. Theming just one particular pager differently from others.
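Items 2, 4 and 5 mostly come down to a little arithmetic once the pager's data lives in one place. A minimal sketch, assuming hypothetical `$offset`, `$first_page` and `$max_pages` settings that the current pager does not have:

```php
<?php
/**
 * Hypothetical: total pages once an offset and a page cap are applied.
 * $offset rows are skipped entirely; $max_pages of 0 means "no cap".
 */
function example_pager_total_pages($total_items, $limit, $offset = 0, $max_pages = 0) {
  $pageable = max(0, $total_items - $offset);
  $total_pages = (int) ceil($pageable / $limit);
  if ($max_pages > 0) {
    $total_pages = min($total_pages, $max_pages);
  }
  return $total_pages;
}

/**
 * Hypothetical: first row to fetch for a page number taken from the URL.
 * $first_page is 1 when ?page=1 should mean the first page, or 0 for the
 * current zero-based behavior.
 */
function example_pager_first_row($url_page, $limit, $offset = 0, $first_page = 0) {
  $zero_based = max(0, (int) $url_page - $first_page);
  return $offset + $zero_based * $limit;
}

// 101 articles, 10 per page, first article skipped: 10 pages.
echo example_pager_total_pages(101, 10, 1), "\n";  // 10
// With ?page=1 meaning the first page, ?page=2 starts at row 1 + 10 = 11.
echo example_pager_first_row(2, 10, 1, 1), "\n";   // 11
```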

I'm sure there are other things I've forgotten, but as the maintainer of Views I've seen requests for all of these things. None of them are possible unless you do the work yourself. In Views, I actually don't use pager_query() and instead load the data manually, which allows Views to do 2 and 3, but it doesn't help with 1, 4 and 5.

I propose that the pager system be modernized. I don't personally have the time to write this, but some enterprising developer could spend a few hours writing it and try to get a patch through. (Really, getting the patch through might be more effort than writing it.)

First, there should be no globals

The use of globals is something Drupal has been moving away from, so this could support that effort. Globals are difficult to maintain, difficult to document, and make namespace conflicts entirely too easy. Pagers could be stored in a static array with an accessor and a mutator (getter and setter), much like many other systems currently in Drupal.

Second, the data should be organized

All of the data should be stored together. We can make good use of an object here, even if we don't use methods. Objects are nicely documentable:

<?php
/**
 * Information about a single pager.
 */
class pager {
  /**
   * The ID of the pager, which corresponds to the position in $_GET['page'].
   */
  var $id = NULL;

  /**
   * The number of items per page this pager is configured to use.
   */
  var $items_per_page = NULL;

  /**
   * The total number of pages in this pager.
   */
  var $total_pages = NULL;

  /**
   * The total number of items in this pager.
   */
  var $total_items = NULL;

  /**
   * The current page of this pager.
   */
  var $current_page = 0;
}
?>

The above class represents the minimum required to support the existing functionality. In addition, we could add other features to this that don't currently exist: 'offset', 'max_pages', 'first_page', an identifier so that pagers for different uses can be differentiated easily, etc. We could then easily access all of our pagers like this:

<?php
/**
 * Return the information for a pager.
 *
 * @param $pager_id
 *   The ID of the pager to return. If not set, an array of all pagers will be
 *   returned.
 */
function &pager_get($pager_id = NULL) {
  static $pagers = array();

  // If no argument is set, return a reference to the entire array.
  // (The & on the function name makes this a reference; "return &$x" is invalid.)
  if (!isset($pager_id)) {
    return $pagers;
  }

  // A little safety here could prevent some weirdness later.
  $pager_id = (int) $pager_id;

  if (!isset($pagers[$pager_id])) {
    $pagers[$pager_id] = new pager;
    $pagers[$pager_id]->id = $pager_id;
  }
  return $pagers[$pager_id];
}
?>

This function then knows about all of the pagers, and it returns references to the objects so that they may be modified. Those with an OO background may actually be horrified, because in reality this is just globals by another name, and there is very little actual protection. The pager object has no constructor, setters, or getters, which would make the OO devs happier. But since Drupal does not have a strong OO base, I am not going to suggest these things, as they may simply be seen as things that get in the way even though they do provide a level of safety.
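The "globals by another name" point is easy to see in a trimmed-down, standalone copy of the registry. The names `example_pager` and `example_pager_get()` are hypothetical stand-ins; note that in PHP 5 and later objects are passed around as handles, so even a caller that does not bind the return value by reference ends up editing the one shared object.

```php
<?php
// Hypothetical stand-in for the pager class above.
class example_pager {
  public $id = NULL;
  public $current_page = 0;
}

// Hypothetical stand-in for pager_get(): a static registry keyed by pager ID.
function example_pager_get($pager_id) {
  static $pagers = array();
  $pager_id = (int) $pager_id;
  if (!isset($pagers[$pager_id])) {
    $pagers[$pager_id] = new example_pager();
    $pagers[$pager_id]->id = $pager_id;
  }
  // Returned by value, but the value is an object handle.
  return $pagers[$pager_id];
}

$a = example_pager_get(0);
$a->current_page = 3;
// Any other caller now sees the change: nothing protects the shared state.
$b = example_pager_get(0);
echo $b->current_page, "\n";  // 3
```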

Make it easy to get and set common pager data

<?php
function &pager_create($pager_id, $limit, $total_items = NULL) {
  $pager = pager_get($pager_id);
  $pager->items_per_page = $limit;
  if (isset($total_items)) {
    $pager->total_items = $total_items;
    $pager->total_pages = (int) ceil($total_items / $pager->items_per_page);
    // Make sure our current page is not out of range.
    if ($pager->current_page >= $pager->total_pages) {
      // Remember that zero counting means that the highest page # is total
      // pages - 1. The max() keeps an empty result set on page 0, not -1.
      $pager->current_page = max(0, $pager->total_pages - 1);
    }
  }
  return $pager;
}

function pager_set_current_page($pager_id, $page) {
  $pager = pager_get($pager_id);
  // The max() ensures that asking for page -1 doesn't do anything.
  $pager->current_page = max(0, (int) $page);
}
?>

There could certainly be more helpers, but the above make the API easy to use.

Automatically create pager objects in hook_init

We really shouldn't ever have to look in $_GET['page'] more than once, and we especially shouldn't have to worry about one pager overwriting another pager's data. Luckily, that's what hook_init is for:

<?php
function system_init() {
  if (!empty($_GET['page'])) {
    $pagers = array_filter(explode(',', $_GET['page']));
    foreach ($pagers as $id => $current_page) {
      pager_set_current_page($id, $current_page);
    }
  }
}
?>

The above function doesn't bother to set any pagers that have a current page of 0, since that's the default and it would serve no purpose to set that information for pagers that may not even be used.
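The detail that makes this work is that array_filter() preserves array keys, so after the zero and empty slots are dropped, each surviving key is still the pager ID. A standalone illustration (the sample value '0,3,,2' is made up):

```php
<?php
// array_filter() without a callback drops falsy values ('0' and ''),
// but keeps the original keys, so key N is still pager ID N.
$pagers = array_filter(explode(',', '0,3,,2'));
// Only pagers 1 and 3 remain, still under their own IDs.
var_export($pagers);
```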

Simplifying output

Right now, outputting the pager data is a somewhat hard-to-use call to theme('pager'). But with this system, we could simply do this:

<?php
/**
 * Render the pager for the given pager ID.
 *
 * @param $pager_id
 *   The pager ID to use, which corresponds to the position in $_GET['page'].
 *   If unspecified, this will assume a pager ID of 0.
 */
function pager_render($pager_id = 0) {
  $pager = pager_get($pager_id);
  if ($pager->total_pages) {
    return theme('pager', $pager);
  }
}
?>

The implementation of theme('pager') is left as an exercise to the reader, or whoever decides to try and submit this as a patch.
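As a starting point for whoever takes this on, here is a hedged sketch of one piece of that work: choosing which page numbers to link. The function name and the centering rule are assumptions (the existing pager theme code has its own logic for this); the point is only that with current_page and total_pages in one object, the calculation stays short.

```php
<?php
/**
 * Hypothetical: pick a window of at most $quantity zero-based page numbers,
 * centered on the current page where possible.
 */
function example_pager_window($current_page, $total_pages, $quantity = 9) {
  if ($total_pages <= 0) {
    return array();
  }
  $quantity = min($quantity, $total_pages);
  // Start roughly half a window before the current page...
  $first = $current_page - (int) floor(($quantity - 1) / 2);
  // ...but keep the whole window inside 0 .. $total_pages - 1.
  $first = max(0, min($first, $total_pages - $quantity));
  return range($first, $first + $quantity - 1);
}

// Current page 0 of 20: links for pages 0..8 (shown to users as 1..9).
echo implode(' ', example_pager_window(0, 20)), "\n";
// Current page 15 of 20: the window is pushed back to pages 11..19.
echo implode(' ', example_pager_window(15, 20)), "\n";
```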

Tying it all together

When we put this all together, the first obvious benefit is that pager_query() gets a nice, easy-to-read interaction with this (code adapted from D6; D7 will need additional code elsewhere):

<?php
function pager_query($query, $limit = 10, $element = 0, $count_query = NULL) {
  // Substitute in query arguments.
  $args = func_get_args();
  $args = array_slice($args, 4);
  // Alternative syntax for '...'
  if (isset($args[0]) && is_array($args[0])) {
    $args = $args[0];
  }

  // Construct a count query if none was given.
  if (!isset($count_query)) {
    $count_query = preg_replace(array('/SELECT.*?FROM /As', '/ORDER BY .*/'), array('SELECT COUNT(*) FROM ', ''), $query);
  }

  $pager = pager_create($element, $limit, db_result(db_query($count_query, $args)));
  return db_query_range($query, $args, $pager->current_page * $limit, $limit);
}
?>

And since pager creation is now much easier, non-query based pagers can be used more readily, such as the ones the Paging module needs.
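To illustrate, here is a hedged standalone sketch of a pager over an in-memory list rather than a query, roughly what a module splitting one long node into pages has to do. `example_pager_slice()` is hypothetical and returns a plain array instead of a pager object so it can run outside Drupal; it reuses the same limit, clamping, and current-page arithmetic as pager_create().

```php
<?php
/**
 * Hypothetical: page through an in-memory list instead of a database query.
 */
function example_pager_slice(array $items, $limit, $current_page) {
  $total_pages = (int) ceil(count($items) / $limit);
  // Clamp like pager_create(): never past the last page, never below 0.
  $current_page = max(0, min((int) $current_page, $total_pages - 1));
  return array(
    'current_page' => $current_page,
    'total_pages' => $total_pages,
    'items' => array_slice($items, $current_page * $limit, $limit),
  );
}

$result = example_pager_slice(range(1, 23), 10, 5);
// Page 5 is out of range, so this falls back to the last page (2): items 21..23.
echo $result['current_page'], ': ', implode(',', $result['items']), "\n";
```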

What improvements could come after this?

This system then has several good openings for improvement, many of them mentioned above. Adding an offset would be nice, as would opening the door for some particular pagers to use their own URL and manually set pager data from it. But one item I mentioned deserves more explanation: named pagers.

If a pager identifier were added, this could create named pagers so that themers could target individual pagers. For example, take this pager from node_page_default():

<?php
  $result = pager_query(db_rewrite_sql('SELECT n.nid, n.sticky, n.created FROM {node} n WHERE n.promote = 1 AND n.status = 1 ORDER BY n.sticky DESC, n.created DESC'), variable_get('default_nodes_main', 10));

  // ... code snipped ...

  $output .= theme('pager', NULL, variable_get('default_nodes_main', 10));
?>

It could instead look like this (assuming pager_query() were changed to accept a pager name):

<?php
  $result = pager_query(db_rewrite_sql('SELECT n.nid, n.sticky, n.created FROM {node} n WHERE n.promote = 1 AND n.status = 1 ORDER BY n.sticky DESC, n.created DESC'), 'front_page', variable_get('default_nodes_main', 10));

  // ... code snipped ...

  $output .= pager_render();
?>

And the new pager_render() gets a little twist:

<?php
  return theme(array('pager_' . $pager->name, 'pager'), $pager);
?>

For those not familiar with this new feature of the Drupal 6 theming system: when theme() is given an array of hooks, it uses the first one that is registered. Assuming the theme function is registered properly, that means one could implement:

<?php
  // The suffix must match the pager's name, 'front_page'.
  function THEMENAME_pager_front_page($pager) {
    // ...
  }
?>

And that would theme only that particular pager. Modules could use this too, as long as they register the hook properly in the theme registry.
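The fallback behavior can be simulated outside Drupal. In this hedged sketch, `example_theme()` and its `example_theme_*` implementations are hypothetical, and function_exists() stands in for the theme registry lookup: each hook is tried in order and the first one with an implementation wins, so only the pager named 'front_page' gets the override.

```php
<?php
/**
 * Hypothetical: call the first hook in $hooks that has an implementation.
 * function_exists() stands in for Drupal's theme registry here.
 */
function example_theme(array $hooks, $pager_name) {
  foreach ($hooks as $hook) {
    $function = 'example_theme_' . $hook;
    if (function_exists($function)) {
      return $function($pager_name);
    }
  }
  return '';
}

// Generic implementation used by every pager.
function example_theme_pager($pager_name) {
  return "generic pager ($pager_name)";
}

// Override that applies only to the pager named 'front_page'.
function example_theme_pager_front_page($pager_name) {
  return 'front page pager';
}

foreach (array('front_page', 'tracker') as $name) {
  echo example_theme(array('pager_' . $name, 'pager'), $name), "\n";
}
```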

Conclusion

Just writing this, I feel like I've written a lot of the code necessary to do this, though a lot of the complexity of the paging system is also in the rather intricate work done in the theme. However, much of the difficulty in working with the pager theming has to do with the obscurity of the data. By using a single $pager object with obvious names, it should be much easier to write a readable theme for the pager that is more readily modified, perhaps even templatized if the performance penalty of doing so seems worth it. I do hope someone decides to take this up; I'd certainly be willing to provide a code review.

Comments

Wow, what timing. I was just commenting on the pager system here: http://drupal.org/node/33809

Making the entire pager system OO rather than just the query part makes a fair bit of sense. Right now, for non-query pagers I have a bit of code I just copy around and modify, because it seems to work and I don't actually understand how. :-)

As Earl notes the potential for named pagers and separate theming per pager is slick, too. Overlay that with multiple pager engines (default, mini, sliding, variable size, etc.) and you have a lot of potential flexibility.

This particular design doesn't technically make it OO; it just follows OO principles while still using procedural mechanisms, which is usually the easiest way to get OO ideas into Drupal. Though it is only a small syntactic step to true OO, there are some nice bits here in that you don't always need the pager object; the pager ID is just fine. Given PHP 5, you can do things like pager_get($pager_id)->set_current_page() if you go the OO route, but a lot of people have an aversion to that syntax.

When you have millions of nodes, the query "SELECT COUNT(*) FROM node WHERE published=1 AND status=1" is a KILLER. We need a pager option that doesn't count the whole set, doesn't tell you how many results there are, and doesn't let you jump to the last page. Instead, this pager would just let you advance one page at a time. See Twitter; that's how they do it.
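A common way to build such a pager is to ask for one row more than the page size: if that extra row comes back, there is a next page, and no COUNT(*) is ever run. A hedged sketch, with a hypothetical `example_pager_no_count()` and a closure standing in for the ranged query (something like LIMIT/OFFSET):

```php
<?php
/**
 * Hypothetical: a "next/previous only" pager that never counts the full set.
 * $fetch_rows($from, $count) stands in for a ranged database query.
 */
function example_pager_no_count($fetch_rows, $limit, $current_page) {
  // Fetch one extra row; it is only used to detect whether a next page exists.
  $rows = $fetch_rows($current_page * $limit, $limit + 1);
  return array(
    'rows' => array_slice($rows, 0, $limit),
    'has_previous' => $current_page > 0,
    'has_next' => count($rows) > $limit,
  );
}

$data = range(1, 25);
$fetch = function ($from, $count) use ($data) {
  return array_slice($data, $from, $count);
};

$page = example_pager_no_count($fetch, 10, 1);
echo implode(',', $page['rows']), ' next=', var_export($page['has_next'], TRUE), "\n";
```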

Check out "Efficient Pagination Using MySQL" by a guy from Yahoo presented at a recent Percona (Mysql performance) conference - http://www.percona.com/ppc2009/PPC2009_mysql_pagination.pdf

Really great view on this, I hope something can be done about this pager_query business.

Nice. Hopefully one day we will be able to reverse the pager numbers, so the first page actually holds the first posts of the site.
This would make sites more like archives or indexes, so going to page 162 today shows the same content as when you come back to it a year later.

I'm currently building a poetry site for education, with all the material stored in two content types, Pages and Books. The navigation uses taxonomy to define and call the sections and sub-sections (based on educational level).

At node listing level, I'd like to see a list of Pages and 'parent' Books, but instead, the taxonomy query lists _all_ Book pages as well as Pages, using the pager - listing 5 pages.

Not to be outdone, I override taxonomy_render_nodes() to exclude 'child' Book pages. And now I get maybe two items followed by a pager listing 5 pages - if I'm lucky. Unlucky, and I'll get a blank page and the aforesaid pager. All because pager is running the taxonomy query before I can get to it to filter the results.

OK - I'll change the pager limit to something big like 100. How? That's when I discovered this article. It looks like the only way to modify the pager is to hide its navigation (which is no solution), or to hack the core.

Oh, and I can't use Views, because if I use a filter to exclude by Book depth, I exclude all Pages too.

Have I missed something completely obvious here or am I completely fubared?

Great. How can I make the page navigation output clean URLs? My site http://drupalparlk.com/ has a robots.txt that bans all URLs containing ? from being indexed by search engines.
