Cursor based pagination

Traditional pagination existed for many years. But it’s not useful for platforms with huge amount of user generated content. Luckily, there is an alternative solution for such systems.

Traditional pagination

Before we jump into the solution, let’s look at traditional pagination and it’s drawbacks. Let’s say we have a database table for forum posts that looks like this:

IDTitleContentLastActiveDate
1Some postSome content2022-11-26T14:00:00

In this case, LastActiveDate—indicates the date when the post was last active (i.e., a comment was published).

We want to display a front page with all the posts sorted by LastActiveDate. Since we don’t know how much data we can have, we decide on an arbitrary page size of 10. And then, in order to display the posts on a particular page, we will use the following query:

SELECT * FROM posts
ORDER BY LastActiveDate
LIMIT ${pageSize}
OFFSET ${(currentPage - 1) * pageSize}

The math is very simple. Given a specific page, say 2, we will select all items starting from offset (2 - 1) * 10 = 10, limiting the selection to 10 elements, and the result will be all the items in group of [10,20).

Usually, with traditional pagination, we also need to have another query in order to generate the visual pagination itself. And for that we need to know the total amount of pages we have, which can be done by:

const totalItems = await sql("SELECT count(ID) FROM posts") as number;
const totalPages = Math.ceil(totalItems / pageSize);

Drawbacks

This pagination style, however, has one major drawback. Let’s say I’ve navigated to page 1 of the forum and then decided to go make some tea. After coming back and seeing that no posts on the first page are of interest to me, I navigate to the second page and find out that it includes 2 posts from the first page.

This happens because during the time I was making and drinking my tea, some older posts that were not among the 10 posts on the first page — were updated, hence changing their LastActiveDate to be the most recent. Upon navigating to the second page, I now see 10 more posts, but because of the shift in order, 2 posts from the first 10 — were pushed into the second 10 on the second page.

Usually, on platforms with low update rates, like a small forum — this drawback is not that critical. Occasionally users will see content on the next page, which appeared on the previous page — but the frequency during which it happens — is low. We can increase the page size to say 15, 20, 25, or even 30 posts per page — and since forums often have activity concentrated in few popular or new posts on the first page, the chance of seeing the same data on next pages — is small, and we can live with that.

However, for high throughput platforms like Twitter, or machine-to-machine streaming services — this case is unacceptable. Image navigating Twitter only to see the same content over and over again, because the rate at which content is created on Twitter is rapid, and thus content from the previous page is always pushed down to the next page.

In order to solve this problem, let me introduce — Cursor based pagination.

Cursor based pagination

Let’s say we want our forum pagination to never display data from previous pages. When a user navigates to the second page — she will always see data that suppose to be on that page. This can be achieved with cursor based pagination.

How cursor based pagination works

In cursor based pagination, we do not have logical pages. Instead, each page returns the data that suppose to be displayed on that page as well as a token to retrieve the next page. The token is usually opaque, and for the end user it has no meaning — but our system knows to read it and generate the next page based on it.

This brings me to the first drawback of cursor based pagination:

Attention

Cursor based pagination is forward. It does not support going back (i.e. navigating from second page back to first, or from third back to second); nor it supports skipping pages (i.e. going to page 5 from page 2).

On a platform like forum, this might not be very useful. Forums are typically meant to be browsed, so users always can jump to a specific page, navigate backwards and forwards, or event jump to the last page. However, — on platforms with infinite scrolling navigation, like feeds, this solution is perfect.

Cursor based pagination example

Let’s say our product manager came up with the terrible idea of turning our 90s looking forum with traditional pagination, to 2000s looking forum with infinite scroll feed. We’ve tried to warn him that this is a bad idea, but he insisted that users wanted it. And so let’s turn our forums’ traditional pagination — to cursor based instead.

First, we need to select the cursor. Since forum posts are often sorted in chronological order from newest to oldest post, it makes sense to use the LastActiveDate as a cursor. Hence, the retrieval of a particular pages’ data can be done like this:

async function getPosts(cursor: null | Date) {
	const posts = await table("posts").select("*").orderBy("LastActiveDate", "DESC").limit(PAGE_SIZE).where(q => {
		if(cursor) {
			q.where("LastActiveDate", "<", cursor);
		}
	});

	return {posts, nextPageCursor: posts.length ? posts[posts.length - 1]["LastActiveDate"] : null}
}

Disclaimer: The query builder is a pseudocode, but can be achieved with tools like KnexJS

For every call, we fetch the first PAGE_SIZE posts and provide a cursor to the next page, which is equal to the LastActiveDate of the last post in our current batch (or null if no posts exist). If the cursor is null — we start from the first page.

Since our posts are always sorted in chronologically descending order, our pagination becomes consistent.

Let’s try to do a visualization. Let’s imagine we have the following database:

IDTitle….LastActiveDate
1Title 12022-01-10T00:00:00
2Title 22022-01-09T00:00:00
3Title 32022-01-08T00:00:00
4Title 42022-01-07T00:00:00
5Title 52022-01-06T00:00:00
Disclaimer

I’ve intentionally sorted the posts by LastActiveDate to easier visualize the process. Obviously, the data does not need to be sorted in advance, since SQLs SORT BY LastActiveDate DESC handles the sorting.

For the sake of simplicity, let’s set our page size to 2. Upon first execution, we will get posts 1 and 2 and our nextPageCursor will be equal to 2022-01-09T00:00:00 (which is the LastActiveDate of the second post).

As we would like to retrieve the next page, we will provide the nextPageCursor to our function, and since the query looks at posts older than 2022-01-09T00:00:00 we will get two more posts — IDs 3 and 4.

Now — let's come back to our tea scenario. Imagine I’m looking at the forum's first page with posts 1 and 2, and before clicking next, I go to drink tea. During that time, someone posts a new comment in post ID 3 — thus altering its LastActiveDate to be today (let’s say today is 2022-01-11T00:00:00). The table will now look like this:

IDTitle….LastActiveDate
3Title 32022-01-11T00:00:00
1Title 12022-01-10T00:00:00
2Title 22022-01-09T00:00:00
4Title 42022-01-07T00:00:00
5Title 52022-01-06T00:00:00

When I come back to the computer, I click next page. In our traditional navigation — my action would have returned posts with IDs 2 and 4, thus “leaking” the second post from the first page to the second.

Cursor based navigation, with the cursor of 2022-01-09T00:00:00, would however return posts with IDs 4 and 5 as expected. Notice how post ID 3 is skipped and will never be seen by me, unless I start over from page 1. It’s important to point out that this flaw is not a problem of the cursor based pagination — but rather a problem with the way we’ve implemented cursor based pagination. Different use cases — have different implementations. It is possible to implement cursor base pagination which takes into account newer content as well. It, however, makes little sense to do so in a context of a forum, since in forums — going forward in pagination — means seeing older content.

As I’ve said before, we do lose some functionality — namely: going backwards (from page 2 to page 1 for example) and skipping pages (from page 2 to page 7 for example). So this approach is not useable for browsing content in a paginated manner, but it is a good approach to implement infinite scrolling without encountering duplicate content.

Bonus—don’t use dates and or timestamps as cursor

I could have ended the article here, but I’m pretty sure if I did so — I’d get criticized for using date as a cursor. And this will be a well deserved criticism.

You see — the entire point of cursor based navigation — is to create a consistent stream of data without repetitions. Yes — you give away some features, but in exchange you get fresh data for each page. As I’ve said — some systems might benefit from it.

However, — by using dates as cursors — you run into the same problem. You can encounter content repetition. Let’s assume we have a high traffic forum with a very active community. Let’s look at a subset of such forums posts table:

IDTitle….LastActiveDate
675Title 12022-01-10T12:31:22
123Title 22022-01-10T12:30:48
534Title 32022-01-10T12:30:23
301Title 42022-01-10T12:30:23
231Title 52022-01-10T12:30:23
945Title 62022-01-10T12:30:17

Disclaimer: as earlier, the data is sorted by LastActiveDate for demonstration purposes only.

For the sake of example, let’s say our page size is 2. And since we are looking at a subset of data, let’s say we hold a nextPageCursor of 2022-01-10T12:31:22. What will be the result of the invocation of getPosts("2022-01-10T12:31:22");? If you’ve read through the article and understood it well, you would be correct to answer: posts 123 and 534. And what will be the nextPageCursor value for that invocation? The correct answer is 2022-01-10T12:30:23.

Now, here’s the problem. Let’s say you want to navigate to the next page using nextPageCursor = '2022-01-10T12:30:23'. What would be the result? That’s true — post ID 945. Since the query we’ve written earlier, translates to:

SELECT * FROM posts
ORDER BY LastActiveDate DESC
WHERE LastActiveDate < '2022-01-10T12:30:23'
LIMIT 2

We are skipping posts ID 301 and ID 231. The hasty among you might just run and replace < to <=, but this introduces another issue. Now, instead of getting post ID 945 you get posts ID 534 and ID 301. So not only you are getting post ID 534 again, your nextPageCursor is set to 2022-01-10T12:30:23 so you're essentially stuck in an endless loop (unless someone decides to update one of the posts).

Dates, although provide chronological order, are not providers of uniqueness. In a system with high rate of content, you might end up with collisions. And I know what you are thinking about: I’ll just add milliseconds! While milliseconds might help you to prevent this issue in human generated content (although there is still a chance of collision, depending on the traffic and the nature of the system) — they are not a solution for machine generated content in systems with high rate of content generation. There are other solutions, like content versioning, to provide uniqueness of the cursor.

Also keep in mind that cursors do not have to be bound to one specific dimension. I once designed a system where the cursor was a JSON that held multiple vectors for data retrieval. This JSON was then base64 encoded and was used as a cursor for navigation. So you could combine multiple fields, and any other data you would want, inside the cursor. As long as you can pass the cursor in the URL and your system can decode it — it’s a viable solution.

Share this article

Published by

Dmitry Kudryavtsev

Dmitry Kudryavtsev

Senior Software Engineer / Tech Lead / Consultant