My understanding of using ElasticSearch as cache

Published in

FAUN — Developer Community 🐾

4 min readJul 8, 2020

--

Photo by Michael Dziedzic on Unsplash

Introduction

Elasticsearch is another NoSQL technology.
It is a distributed datastore.
Allow you to store, search, and analyze data.
It is indexation of data on top of Apache Lucene, provides a full-text search engine written in Java.

Which side of the CAP theorem?

Write Operations

At the time of sufficiently bad network partition, ElasticSearch’s compare and set operation will take over consistency over availability.
Try to preserve the logic that acknowledged writes should not fail, but unacknowledged writes may fail.

Read Operations

Read operations offer stronger availability than consistency.
A search may sometime return older results than failing.

Scaling

Vertical: By increasing the resources of the node. eg RAM, Disk space, etc.
Horizontal: Adding the node on the fly will increase High Availability and Resiliency.
As ElasticSearch performs auto-replication of data, in the case of node failure other nodes are there to serve requests.

Internal Architecture

ElasticSearch has a schemaless engine.
Documents are stored in JSON format.
Partitioning is done in the form of Shards.
A Shard is a Lucene index and the smallest unit of scale.
ElasticSearch index is a logical namespace that regroups the collection of shards.
Whenever a request comes to ElasticSearch it gets routed to the appropriate shard.

Shards

Shards are of two types: Primary and Replica

Internal working of Primary Shard

At the very start, there is only 1 Primary Shard.
As time increases, read/write throughput also increases.
1 Primary Shard is not enough.
Shards cannot scale on the fly, But nodes can.
Requirement: Need a bigger share, 2 more Primary Shards to re-index all the data.
Solution: Adding a new node will increase cluster capacity.
As soon as the new nodes get added to the cluster, ElasticSearch copies shard over the network on the new node.

Internal working of Replica Shard

Replica shards only get created at failure.
If Primary shard dies, then Replica shard becomes primary shard.

KeyNote

An increase in replicas will not increase performance
Increasing nodes will increase performance.

Nodes

Master Nodes

Master Nodes hold no data, indexes, or search requests.
Perform cluster management.
Ensure cluster stability.
Recommended 3 Master Nodes (minimum), if 1 fails others will maintain redundancy.

Data Nodes

Holds data, indexes, and serves search requests.

Client Nodes

Ensure load balancing.
Sometimes perform part of the data node. For eg. scattering the requests over the nodes and gathering the responses.

Monitoring of ElastiSearch

Marvel

Plugin for ElasticSearch monitoring.
Throws metrics on Kibana.
Type of Info: Node info, indexes, shards, and CPU used.

Search Request Example

Search all documents under the yahoo(example) index.

curl -XGET localhost:9200/yahoo/_search -d '
    {
       "query": {
           "match_all": {}
       }
    }
   '

Use Cases

Using NoSQL as a cache in SQL architecture.

The search features in CouchBase are not really handy. That discourages using CouchBase alone.
Handling map-reduce functions are not easy if it is the case of simple aggregation.
CouchBase ElasticSearch Plug-in: Replicates all data using XDCR (Cross Data Center Replication), from CouchBase to ElasticSearch.

Why not ElasticSearch only?

ElasticSearch has the ability to index the entire object.
ElasticSearch has the ability to perform simple to complex search queries through search API.
As per best practices, the data should be stored in the document database instead of datastore

References

Scalable Big Data Architecture by Bahaaldine Azarmi.
https://discuss.elastic.co/t/which-side-of-cap-theorem-elasticsearch-satisfy/177810

Subscribe to FAUN topics and get your weekly curated email of the must-read tech stories, news, and tutorials 🗞️

Follow us on Twitter 🐦 and Facebook 👥 and Instagram 📷 and join our Facebook and Linkedin Groups 💬

If this post was helpful, please click the clap 👏 button below a few times to show your support for the author! ⬇

Rajat Nigam

Written by Rajat Nigam

Writer for

FAUN — Developer Community 🐾

I'm a lifetime student of software engineering. Professionally I work in the capacity of Individual Contributor, Trainer, Lead Engineer and an Architect.

Help
Status
About
Careers
Blog
Privacy
Terms
Text to speech
Teams