
Handling Large Datasets with Pagination

When working with large datasets in Neo4j, returning all data at once can be slow, consume excessive memory, and overwhelm clients. Pagination solves this by returning data in smaller, manageable chunks.

Why Pagination?

Consider the following tool that lists all movies:

typescript
server.registerTool("listAllMovies", {
  description: "List ALL movies in the database",
}, async () => {
  const { records } = await driver.executeQuery(
    "MATCH (m:Movie) RETURN m.title AS title ORDER BY m.title",
    {},
    { database }
  );
 
  // What if there are 100,000 movies?
  const movies = records.map(r => r.get("title"));
  return {
    content: [{ type: "text", text: movies.join("\n") }],
  };
});

This approach breaks down with large datasets. It loads every movie into memory, serializes a massive string, and sends it all at once to the client. Pagination lets you fetch and return data in smaller pages instead.

Understanding Cursor-Based Pagination

MCP uses cursor-based pagination. Each response carries a cursor, an opaque string that marks a position in the dataset, and the client passes that cursor back to request the next page.

How it works:

  1. Client requests the first page (no cursor)
  2. Server returns the first batch plus a cursor to the next page
  3. Client requests the next page using the cursor
  4. Server returns the next batch plus a new cursor
  5. Process repeats until no cursor is returned (end of data)
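
The steps above can be sketched from the client's side. This is a minimal illustration rather than real MCP client code: `Page`, `FetchPage`, `collectAll`, and `fakeFetchPage` are hypothetical names, and `fetchPage` stands in for whatever call your client makes to request one page.

```typescript
// Hypothetical page shape: a batch of items plus the cursor for the next page.
// nextCursor is null when there is no more data.
interface Page<T> {
  items: T[];
  nextCursor: string | null;
}

// Hypothetical "request one page" function; cursor is undefined for the first request.
type FetchPage<T> = (cursor?: string) => Promise<Page<T>>;

// Follow cursors until the server stops returning one.
async function collectAll<T>(fetchPage: FetchPage<T>): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page: Page<T> = await fetchPage(cursor);
    all.push(...page.items);
    cursor = page.nextCursor ?? undefined;
  } while (cursor !== undefined);
  return all;
}

// In-memory stand-in for a server that holds five titles and serves two per page.
const titles = ["A", "B", "C", "D", "E"];
const fakeFetchPage: FetchPage<string> = async (cursor) => {
  const start = cursor === undefined ? 0 : parseInt(cursor, 10);
  const next = start + 2;
  return {
    items: titles.slice(start, next),
    nextCursor: next < titles.length ? String(next) : null,
  };
};
```

Calling `collectAll(fakeFetchPage)` makes three requests (cursors: none, "2", "4") and stops when the third response comes back without a cursor. In practice a client would usually process each page as it arrives instead of accumulating everything, since collecting all pages reintroduces the memory problem pagination is meant to avoid.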

Implementing Pagination in Neo4j

To implement pagination in a Cypher query, use Neo4j's SKIP and LIMIT clauses.

The following query returns the first 100 movies:

cypher:First 100 movies
MATCH (m:Movie)
RETURN m.title
ORDER BY m.title
SKIP 0 LIMIT 100  // First page (0-99)

The following query skips the first 100 movies and returns the next 100 movies:

cypher
MATCH (m:Movie)
RETURN m.title
ORDER BY m.title
SKIP 100 LIMIT 100  // Second page (100-199)

In this lesson, the cursor is simply the skip value encoded as a string. For example, the cursor "100" means "skip the first 100 rows".
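
As a rough sketch of that mapping between cursors and skip values, the helpers below convert in both directions. The names `encodeCursor`, `decodeCursor`, and `nextCursorAfter` are hypothetical and are not part of Neo4j or MCP; treating a missing cursor as the first page is an assumption of this sketch.

```typescript
// Hypothetical helper: the cursor is the SKIP offset written as a decimal string.
function encodeCursor(skip: number): string {
  return String(skip);
}

// Hypothetical helper: returns the SKIP offset for a cursor.
// Anything that is not a string of digits is rejected.
function decodeCursor(cursor: string | undefined): number {
  if (cursor === undefined) return 0; // assumption: no cursor means the first page
  if (!/^\d+$/.test(cursor)) {
    throw new Error(`Invalid cursor: ${cursor}`);
  }
  return parseInt(cursor, 10);
}

// Cursor for the page that follows the one starting at `skip`.
function nextCursorAfter(skip: number, pageSize: number): string {
  return encodeCursor(skip + pageSize);
}
```

The digit check matters because `parseInt` on its own is lenient: `parseInt("12abc", 10)` yields 12 and `parseInt("abc", 10)` yields `NaN`, neither of which should reach a SKIP clause.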

Paginated Tool Implementation

You can implement pagination as a tool with cursor and pageSize parameters:

typescript
import { z } from "zod";
import neo4j from "neo4j-driver"; // needed for neo4j.int() below
 
server.registerTool("listMoviesPaginated", {
  description: "List movies with pagination support",
  inputSchema: {
    cursor: z.string().default("0").describe("Pagination cursor (skip value as string, default '0')"),
    pageSize: z.number().int().positive().default(50).describe("Number of movies per page (positive integer, default 50)"),
  },
}, async ({ cursor, pageSize }) => {
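  // Reject cursors that are not non-negative integers before using them in the query
  if (!/^\d+$/.test(cursor)) {
    return {
      content: [{ type: "text", text: `Invalid cursor: ${cursor}` }],
      isError: true,
    };
  }

  // Convert cursor to skip value
  const skip = parseInt(cursor, 10);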
 
  console.error(`Fetching movies ${skip} to ${skip + pageSize}...`);
 
  // Query with SKIP and LIMIT
  const { records } = await driver.executeQuery(
    `
    MATCH (m:Movie)
    RETURN m.title AS title, m.released AS released
    ORDER BY m.title
    SKIP $skip
    LIMIT $limit
    `,
    { skip: neo4j.int(skip), limit: neo4j.int(pageSize) },
    { database }
  );
 
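  // Convert Neo4j Integer values to plain numbers so they serialize cleanly as JSON
  const movies = records.map(record => {
    const released = record.get("released");
    return {
      title: record.get("title"),
      released: neo4j.isInt(released) ? released.toNumber() : released,
    };
  });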
 
  // Calculate next cursor
  // If we got a full page, there might be more data
  const nextCursor = movies.length === pageSize
    ? String(skip + pageSize)
    : null;
 
  console.error(`Returned ${movies.length} movies`);
 
  return {
    content: [{
      type: "text",
      text: JSON.stringify({
        movies,
        nextCursor,
        currentPage: Math.floor(skip / pageSize),
        pageSize,
      }, null, 2),
    }],
  };
});

The key elements of this implementation:

  1. Cursor parsing - The cursor string is converted to a numeric skip value using parseInt()
  2. SKIP and LIMIT - The Cypher query uses these clauses to fetch only the requested page
  3. Next cursor calculation - If the result set is full (equal to pageSize), there may be more data. The next cursor is the current skip value plus the page size
  4. Structured response - The response includes both the data and pagination metadata
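
One consequence of the full-page check in point 3: when the number of movies is an exact multiple of the page size, the last real page still returns a cursor, and the client makes one extra request that comes back empty. A common alternative is to fetch one row more than the page size and use the extra row only as a signal that more data exists. The sketch below shows the idea against an in-memory array rather than Neo4j; `PageResult` and `paginate` are hypothetical names. In the Cypher version this would correspond to passing `pageSize + 1` as `$limit`.

```typescript
interface PageResult<T> {
  items: T[];
  nextCursor: string | null;
}

// Hypothetical helper: `rows` stands in for the full, ordered query result.
function paginate<T>(rows: T[], skip: number, pageSize: number): PageResult<T> {
  // Fetch one row more than requested (the equivalent of LIMIT pageSize + 1).
  const fetched = rows.slice(skip, skip + pageSize + 1);

  // The extra row is present only if data exists beyond this page.
  const hasMore = fetched.length > pageSize;

  return {
    items: fetched.slice(0, pageSize), // drop the look-ahead row
    nextCursor: hasMore ? String(skip + pageSize) : null,
  };
}
```

With four rows and a page size of two, the second page now reports `nextCursor: null` directly, so the client does not need a third, empty request.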

Best Practices for Pagination

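  1. Consistent ordering - Always use ORDER BY, and sort on a unique property (or add one as a tie-breaker) so that rows with equal sort values cannot shift between pages
  2. Reasonable page sizes - Default to 20-50 items per page for good user experience, and enforce an upper limit so a client cannot request an unbounded page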
  3. Include metadata - Return page number, total pages (if known), and hasMore flag
  4. Handle invalid cursors - Validate cursor values and handle errors gracefully
  5. Optimize queries - Use indexes on properties used in ORDER BY and WHERE clauses
  6. Consider total counts - For some UIs, include a total count, keeping in mind that it requires an extra count query on top of the page query
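
To illustrate the metadata and total-count points, here is one possible shape for the pagination metadata. `PageMetadata` and `buildMetadata` are hypothetical names, and the total count is assumed to come from a separate query such as `MATCH (m:Movie) RETURN count(m) AS total`, which is why it is optional here.

```typescript
interface PageMetadata {
  page: number; // zero-based page index
  pageSize: number;
  hasMore: boolean;
  totalCount?: number; // present only when a count was supplied
  totalPages?: number;
}

// Hypothetical helper: `returned` is the number of rows in the current page.
function buildMetadata(
  skip: number,
  pageSize: number,
  returned: number,
  totalCount?: number
): PageMetadata {
  const meta: PageMetadata = {
    page: Math.floor(skip / pageSize),
    pageSize,
    // With a known total the answer is exact; without one, fall back to
    // the "full page means there may be more" heuristic.
    hasMore:
      totalCount !== undefined
        ? skip + returned < totalCount
        : returned === pageSize,
  };
  if (totalCount !== undefined) {
    meta.totalCount = totalCount;
    meta.totalPages = Math.ceil(totalCount / pageSize);
  }
  return meta;
}
```

For 120 movies and a page size of 50, `buildMetadata(100, 50, 20, 120)` describes the last page: page index 2, three pages in total, and `hasMore` set to false.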