My understanding of using ElasticSearch as cache

Rajat Nigam
FAUN — Developer Community 🐾
4 min readJul 8, 2020

--

Photo by Michael Dziedzic on Unsplash

Introduction

  • Elasticsearch is another NoSQL technology.
  • It is a distributed datastore.
  • Allow you to store, search, and analyze data.
  • It is indexation of data on top of Apache Lucene, provides a full-text search engine written in Java.

Which side of the CAP theorem?

Write Operations

  • At the time of sufficiently bad network partition, ElasticSearch’s compare and set operation will take over consistency over availability.
  • Try to preserve the logic that acknowledged writes should not fail, but unacknowledged writes may fail.

Read Operations

  • Read operations offer stronger availability than consistency.
  • A search may sometime return older results than failing.

Scaling

  • Vertical: By increasing the resources of the node. eg RAM, Disk space, etc.
  • Horizontal: Adding the node on the fly will increase High Availability and Resiliency.
  • As ElasticSearch performs auto-replication of data, in the case of node failure other nodes are there to serve requests.

Internal Architecture

  • ElasticSearch has a schemaless engine.
  • Documents are stored in JSON format.
  • Partitioning is done in the form of Shards.
  • A Shard is a Lucene index and the smallest unit of scale.
  • ElasticSearch index is a logical namespace that regroups the collection of shards.
  • Whenever a request comes to ElasticSearch it gets routed to the appropriate shard.

Shards

  • Shards are of two types: Primary and Replica

Internal working of Primary Shard

  • At the very start, there is only 1 Primary Shard.
  • As time increases, read/write throughput also increases.
  • 1 Primary Shard is not enough.
  • Shards cannot scale on the fly, But nodes can.
  • Requirement: Need a bigger share, 2 more Primary Shards to re-index all the data.
  • Solution: Adding a new node will increase cluster capacity.
  • As soon as the new nodes get added to the cluster, ElasticSearch copies shard over the network on the new node.

Internal working of Replica Shard

  • Replica shards only get created at failure.
  • If Primary shard dies, then Replica shard becomes primary shard.

KeyNote

  • An increase in replicas will not increase performance
  • Increasing nodes will increase performance.

Nodes

Master Nodes

  • Master Nodes hold no data, indexes, or search requests.
  • Perform cluster management.
  • Ensure cluster stability.
  • Recommended 3 Master Nodes (minimum), if 1 fails others will maintain redundancy.

Data Nodes

  • Holds data, indexes, and serves search requests.

Client Nodes

  • Ensure load balancing.
  • Sometimes perform part of the data node. For eg. scattering the requests over the nodes and gathering the responses.

Monitoring of ElastiSearch

Marvel

  • Plugin for ElasticSearch monitoring.
  • Throws metrics on Kibana.
  • Type of Info: Node info, indexes, shards, and CPU used.

Search Request Example

Search all documents under the yahoo(example) index.

curl -XGET localhost:9200/yahoo/_search -d '
{
"query": {
"match_all": {}
}
}
'

Use Cases

Using NoSQL as a cache in SQL architecture.

  • The search features in CouchBase are not really handy. That discourages using CouchBase alone.
  • Handling map-reduce functions are not easy if it is the case of simple aggregation.
  • CouchBase ElasticSearch Plug-in: Replicates all data using XDCR (Cross Data Center Replication), from CouchBase to ElasticSearch.

Why not ElasticSearch only?

  • ElasticSearch has the ability to index the entire object.
  • ElasticSearch has the ability to perform simple to complex search queries through search API.
  • As per best practices, the data should be stored in the document database instead of datastore

References

Subscribe to FAUN topics and get your weekly curated email of the must-read tech stories, news, and tutorials 🗞️

Follow us on Twitter 🐦 and Facebook 👥 and Instagram 📷 and join our Facebook and Linkedin Groups 💬

If this post was helpful, please click the clap 👏 button below a few times to show your support for the author! ⬇

--

--

I'm a lifetime student of software engineering. Professionally I work in the capacity of Individual Contributor, Trainer, Lead Engineer and an Architect.