Scrolling searches

The Data Lake Scroll API lets you retrieve very large result sets (hundreds of thousands or millions of documents) reliably and in batches.

A normal search is optimized for interactive queries and returns only the first page of results. The Scroll API, by contrast:

  • Creates a point-in-time snapshot of the matching results (as of the initial search).

  • Returns results in fixed-size pages (e.g., 1,000 docs per call).

  • Gives you a _scroll_id to request the next page repeatedly until no hits remain.

Use scrolling when you need to export, backfill, or reprocess many documents. For interactive UI queries, prefer normal search (or PIT + search_after).

Steps to query an index and iterate results

Step A — Run the initial search and start a scroll context

  1. Set a scroll keep-alive (e.g., 1m = one minute).

  2. Choose a page size (size) that fits your memory/network (1,000–5,000 is common).

  3. Sort by "_doc" for efficient scrolling.

POST /datalakeapi/index_name/_search?scroll=1m
{
  "size": 1000,
  "sort": ["_doc"],
  "query": { "match_all": {} }
}

Response:

  • Save the returned _scroll_id.

  • Process the first page’s hits.

TIP

If you only need specific fields, add "_source": ["fieldA","fieldB", ...] to the request body to reduce the response payload.

Step B — Request the next page using the last _scroll_id

  • Always send the latest _scroll_id you received.

  • Keep the same scroll keep-alive in each call (it refreshes the timeout).
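The continuation request is not shown above; a sketch of its common shape follows, assuming the API mirrors the standard scroll endpoint under the same /datalakeapi base path used in Step A (the placeholder scroll ID stands in for the real value you saved):

POST /datalakeapi/_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<the _scroll_id from the previous response>"
}

Note that the index name is not repeated here; the scroll ID already identifies the search context on the server.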

Response:

  • Process this page of events.

  • Replace your stored _scroll_id with the new one from this response.

  • Repeat Step B until events is an empty array [] (no more results).

When you finish (or on error), clear the server-side scroll context(s):
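A sketch of the cleanup call, again assuming the API follows the standard clear-scroll endpoint under the /datalakeapi base path:

DELETE /datalakeapi/_search/scroll
{
  "scroll_id": ["<your latest _scroll_id>"]
}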

You can also pass multiple IDs in the array if you tracked more than one.
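Putting Steps A and B together, the client-side loop can be sketched as follows in Python. The fetch_page callable is a hypothetical stand-in for whatever HTTP client you use to call the API; the part that matters is the loop logic: always carry the newest _scroll_id forward, and stop when events comes back empty.

```python
def scroll_all(fetch_page):
    """Yield every matching document from a scrolling search.

    fetch_page(scroll_id) is a caller-supplied function that returns the
    parsed response as a dict with "_scroll_id" and "events" keys.
    Calling it with scroll_id=None means: run the initial search (Step A);
    any other value means: continue the scroll (Step B).
    """
    scroll_id = None
    while True:
        resp = fetch_page(scroll_id)
        scroll_id = resp["_scroll_id"]  # always replace with the newest ID
        events = resp["events"]
        if not events:                  # empty page: no more results
            break
        yield from events
    # Remember to clear the scroll context (DELETE) with the final
    # scroll_id once the loop finishes or if an error interrupts it.
```

Wrapping the loop in a generator keeps memory flat: each page is handed to the caller before the next one is fetched, which is the point of scrolling large result sets.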
