Elasticsearch Parallel Pagination with Kafka

GET /_search
{
  "from": 5,
  "size": 20,
  "query": {
    "match": {
      "user.id": "jay"
    }
  }
}

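As a rough illustration of how "from" and "size" interact in the request above, the following sketch computes the offset for a given page and checks it against Elasticsearch's default result window of 10,000 (the default of the `index.max_result_window` setting). The class and method names are hypothetical and are not part of any Elasticsearch client.

```java
// Hypothetical helper for illustration only; not an Elasticsearch API.
final class OffsetPaging {
    // Default value of the index.max_result_window setting.
    static final int MAX_RESULT_WINDOW = 10_000;

    /** Returns the "from" offset for a zero-based page index. */
    static int fromOffset(int pageIndex, int pageSize) {
        return Math.multiplyExact(pageIndex, pageSize);
    }

    /** True when from + size stays inside the default result window. */
    static boolean fitsInWindow(int from, int size) {
        return (long) from + size <= MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        int size = 20;
        for (int page : new int[] {0, 1, 499, 500}) {
            int from = fromOffset(page, size);
            System.out.printf("page=%d from=%d allowed=%b%n",
                    page, from, fitsInWindow(from, size));
        }
    }
}
```

With a page size of 20, page 499 is the last one that fits in the default window; deeper pages need a different approach, such as the sorted requests that follow.
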
GET /_search
{
  "size": 10000,
  "query": {
    "match": {
      "user.id": "jay"
    }
  },
  "sort": [
    {"@timestamp": "asc"}
  ]
}

{
  "took" : ...,
  "timed_out" : false,
  "_shards" : ...,
  "hits" : {
    "total" : ...,
    "max_score" : null,
    "hits" : [
      ...
      {
        "_source" : ...,
        "sort" : [
          4098435132000
        ]
      }
    ]
  }
}

GET /_search
{
  "size": 10000,
  "query": {
    "match": {
      "user.id": "jay"
    }
  },
  "sort": [
    {"@timestamp": "asc"}
  ],
  "search_after": [
    4098435132000
  ]
}

  • Your application may die or run out of memory partway through these calls.
  • After a failure, there is no obvious way to know where to resume.
  • Fetching the pages one after another may be too slow for your use case.
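The sequential "search_after" loop that these drawbacks refer to can be sketched roughly as follows. This is a minimal in-memory simulation, not the author's implementation: `fetchPage` is a hypothetical stand-in for the real search call, and `fakeIndex` pretends to be an index sorted by `@timestamp` in ascending order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Illustrative sketch; all names here are hypothetical.
final class SearchAfterLoop {
    /**
     * Pages through all results. fetchPage takes the last sort value seen
     * (null on the first call) and a page size, and returns the sort values
     * of the next hits in ascending order.
     */
    static List<Long> readAll(BiFunction<Long, Integer, List<Long>> fetchPage, int pageSize) {
        List<Long> seen = new ArrayList<>();
        Long searchAfter = null; // held only in memory: lost if the process dies
        while (true) {
            List<Long> page = fetchPage.apply(searchAfter, pageSize);
            if (page.isEmpty()) {
                break; // an empty page means there is nothing left to read
            }
            seen.addAll(page);
            // The sort value of the last hit becomes the next "search_after".
            searchAfter = page.get(page.size() - 1);
        }
        return seen;
    }

    /** In-memory stand-in for an index sorted by @timestamp ascending. */
    static BiFunction<Long, Integer, List<Long>> fakeIndex(List<Long> sortedTimestamps) {
        return (after, size) -> {
            List<Long> page = new ArrayList<>();
            for (Long ts : sortedTimestamps) {
                if (after != null && ts <= after) {
                    continue; // skip everything up to and including the cursor
                }
                if (page.size() == size) {
                    break;
                }
                page.add(ts);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        List<Long> index = List.of(10L, 20L, 30L, 40L, 50L);
        System.out.println(readAll(fakeIndex(index), 2)); // prints [10, 20, 30, 40, 50]
    }
}
```

Each call depends on the result of the previous one, so the loop cannot be parallelized, and the cursor lives only in the running process. That is the weakness the list above describes.
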

Solution: Using “Slice”

First, make one call to Elasticsearch to get the document count (1M in this case). Then create a Kafka producer that puts the slice IDs (0–99, since the request below uses "max": 100) into your Kafka topic, and set the topic's partition count to the number of consumer applications. Let’s say you have 10 consumer applications.
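The arithmetic behind this plan can be sketched as below. It assumes the slice count is derived from the document count divided by the page size (1M / 10,000 = 100) and that slice IDs are spread over partitions round-robin; both are assumptions made for illustration, and a real Kafka producer may choose partitions differently. The sketch uses only the JDK and does not call the Kafka client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch; class and method names are hypothetical.
final class SlicePlan {
    /** Number of slices needed so each holds roughly pageSize documents. */
    static int sliceCount(long totalDocs, int pageSize) {
        return (int) ((totalDocs + pageSize - 1) / pageSize); // ceiling division
    }

    /** Assigns slice IDs 0..sliceCount-1 to partitions round-robin. */
    static Map<Integer, List<Integer>> assign(int sliceCount, int partitions) {
        Map<Integer, List<Integer>> plan = new LinkedHashMap<>();
        for (int p = 0; p < partitions; p++) {
            plan.put(p, new ArrayList<>());
        }
        for (int id = 0; id < sliceCount; id++) {
            plan.get(id % partitions).add(id);
        }
        return plan;
    }

    public static void main(String[] args) {
        int slices = sliceCount(1_000_000L, 10_000); // 100 slices, IDs 0..99
        Map<Integer, List<Integer>> plan = assign(slices, 10);
        System.out.println("slices=" + slices);
        System.out.println("partition 0 -> " + plan.get(0));
    }
}
```

With 100 slices and 10 partitions, each consumer ends up with 10 slice IDs to process. Because the IDs sit in a Kafka topic, a consumer that crashes can pick up its unprocessed IDs again after a restart.
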

GET /_search?scroll=1m
{
  "slice": {
    "id": 0,
    "max": 100
  },
  "size": 10000,
  "sort": [
    {"@timestamp": "asc"}
  ]
}
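A consumer that receives a slice ID from the topic could build the request body above along these lines. This is a minimal sketch using plain string formatting; the class and method names are hypothetical, and a real application would more likely use an Elasticsearch client's request builders.

```java
import java.util.Locale;

// Illustrative sketch; not part of any Elasticsearch client library.
final class SlicedScrollRequest {
    /** Builds the JSON body of a sliced scroll request for one slice ID. */
    static String body(int sliceId, int maxSlices, int size) {
        if (sliceId < 0 || sliceId >= maxSlices) {
            // Valid slice IDs run from 0 to maxSlices - 1.
            throw new IllegalArgumentException(
                    "slice id " + sliceId + " is outside 0.." + (maxSlices - 1));
        }
        return String.format(Locale.ROOT,
                "{\"slice\":{\"id\":%d,\"max\":%d},\"size\":%d,"
                        + "\"sort\":[{\"@timestamp\":\"asc\"}]}",
                sliceId, maxSlices, size);
    }

    public static void main(String[] args) {
        // Body to send with: GET /_search?scroll=1m
        System.out.println(body(0, 100, 10_000));
    }
}
```

Each consumer sends a body like this for its own slice ID and then follows that slice's scroll independently of the others.
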

Implementation with Java/Spring Cloud

Let's run Elasticsearch and Kafka with docker-compose by running the following commands:

# Elasticsearch
docker-compose -f docker-compose-es.yml up -d
# Kafka
docker-compose -f docker-compose-kafka.yml up -d
