My understanding of using ElasticSearch as cache
4 min read · Jul 8, 2020
Introduction
- Elasticsearch is another NoSQL technology.
- It is a distributed datastore.
- It allows you to store, search, and analyze data.
- It indexes data on top of Apache Lucene, a full-text search library written in Java.
Which side of the CAP theorem?
Write Operations
- During a sufficiently bad network partition, ElasticSearch’s compare-and-set operations favor consistency over availability.
- It tries to preserve the guarantee that acknowledged writes are not lost, while unacknowledged writes may fail.
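To make "compare and set" concrete, here is a minimal in-memory sketch in Python. It is a toy, not ElasticSearch's implementation: the `VersionedStore` class and its `expected_version` parameter are hypothetical names I made up for illustration. The idea is simply that a write is accepted only if the caller has seen the latest version of the document.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedStore:
    """Toy in-memory store illustrating compare-and-set (hypothetical, not ES code)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, document)

    def get(self, doc_id):
        """Return the (version, document) pair stored under doc_id."""
        return self._docs[doc_id]

    def put(self, doc_id, document, expected_version=None):
        """Write the document; reject the write if expected_version is stale."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            # Consistency wins: the write fails instead of overwriting newer data.
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, document)
        return new_version
```

A writer that read version 1 and tries to update after someone else has already written version 2 gets a `VersionConflict` and has to re-read before retrying.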
Read Operations
- Read operations favor availability over consistency.
- A search may sometimes return stale results rather than fail.
Scaling
- Vertical: increasing the resources of a node, e.g. RAM, disk space, etc.
- Horizontal: adding nodes on the fly, which improves availability and resiliency.
- Because ElasticSearch replicates data automatically, other nodes can keep serving requests when a node fails.
Internal Architecture
- ElasticSearch has a schemaless engine.
- Documents are stored in JSON format.
- Partitioning is done in the form of Shards.
- A Shard is a Lucene index and the smallest unit of scale.
- An ElasticSearch index is a logical namespace that groups a collection of shards.
- Whenever a request comes to ElasticSearch it gets routed to the appropriate shard.
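The routing step can be sketched in a few lines of Python. This follows the simplified formula from the ElasticSearch documentation, `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID. The function name `pick_shard` is my own, and `zlib.crc32` is only a stand-in so the example runs with the standard library; ElasticSearch itself uses a Murmur3 hash.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document ID) to a primary shard number.

    zlib.crc32 is a stand-in hash for illustration only; ElasticSearch
    uses Murmur3 internally, so real shard numbers will differ.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # The same ID always lands on the same shard while the shard count is fixed.
    for doc_id in ["user-1", "user-2", "user-3"]:
        print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

Because the number of primary shards is part of the formula, changing it would send existing document IDs to different shards. That is why the primary shard count is fixed when the index is created.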
Shards
- Shards are of two types: Primary and Replica
Internal working of Primary Shard
- At the very start, there is only 1 Primary Shard.
- Over time, the read/write load grows.
- 1 Primary Shard is no longer enough.
- The number of Primary Shards cannot be changed on the fly, but nodes can be added.
- Requirement: more capacity. Adding 2 more Primary Shards would mean re-indexing all the data.
- Solution: adding a new node increases the cluster capacity.
- As soon as new nodes join the cluster, ElasticSearch copies shards over the network to them.
Internal working of Replica Shard
- Replica shards are copies of Primary shards, kept mainly for failover.
- If a Primary shard dies, a Replica shard is promoted to Primary.
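The difference between the two shard types also shows up in the index settings. The sketch below builds the JSON bodies in Python: `number_of_shards` and `number_of_replicas` are real ElasticSearch index settings, while the helper function names and the example values are my own assumptions.

```python
import json


def create_index_body(primary_shards: int, replicas: int) -> str:
    """Body for `PUT /<index>`: the primary shard count is set once, at creation."""
    return json.dumps(
        {
            "settings": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    )


def update_replicas_body(replicas: int) -> str:
    """Body for `PUT /<index>/_settings`: replicas can be changed on a live index."""
    return json.dumps({"index": {"number_of_replicas": replicas}})


if __name__ == "__main__":
    print(create_index_body(primary_shards=3, replicas=1))
    print(update_replicas_body(replicas=2))
```

Sending the first body to `PUT /yahoo` would create the example index with 3 primaries and 1 replica of each. The second body, sent to `PUT /yahoo/_settings`, changes only the replica count.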
Key Note
- Adding replicas on the same number of nodes will not increase performance, because each shard gets a smaller share of its node's resources.
- Adding nodes will increase performance.
Nodes
Master Nodes
- Master Nodes hold no data or indexes and serve no search requests.
- They perform cluster management.
- They ensure cluster stability.
- At least 3 Master Nodes are recommended: if 1 fails, the others keep the cluster running.
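The reason for 3 rather than 2 is the majority (quorum) rule. The small Python sketch below uses the `(N / 2) + 1` formula that the ElasticSearch documentation gives for `discovery.zen.minimum_master_nodes` in versions before 7.0 (from 7.0 on the cluster manages this itself). The function names are my own.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest number of master-eligible nodes that forms a majority."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes: int) -> int:
    """How many master-eligible nodes can fail while a majority still exists."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} master nodes: quorum={quorum(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

With 2 master nodes the quorum is still 2, so losing either one blocks master election. 3 is the smallest count that survives a single failure.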
Data Nodes
- Hold data and indexes, and serve search requests.
Client Nodes
- Ensure load balancing.
- Sometimes do part of a data node's work, e.g. scattering a request over the nodes and gathering the responses.
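The scatter/gather idea can be sketched as a toy in Python. Everything here is an assumption made for illustration: shards are plain dictionaries, the score is a naive term count, and the function names are hypothetical. Real ElasticSearch scoring and coordination are far more involved.

```python
import heapq


def search_shard(shard_docs: dict, term: str, size: int) -> list:
    """Score the documents of one shard by term count; return its top hits."""
    hits = []
    for doc_id, text in shard_docs.items():
        score = text.lower().split().count(term.lower())
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(size, hits)


def scatter_gather(shards: list, term: str, size: int) -> list:
    """Send the query to every shard, then merge the partial results."""
    # Scatter: each shard answers with its own local top hits.
    partial_results = [search_shard(shard, term, size) for shard in shards]
    # Gather: merge all local hits and keep the global top `size`.
    merged = heapq.nlargest(
        size, (hit for hits in partial_results for hit in hits)
    )
    return [doc_id for _score, doc_id in merged]


if __name__ == "__main__":
    shards = [
        {"a": "cache cache cache", "b": "cache"},
        {"c": "cache cache", "d": "search"},
    ]
    print(scatter_gather(shards, "cache", size=2))
```

Each shard only returns its own best hits, so the coordinating node merges a handful of small lists instead of every matching document.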
Monitoring of ElasticSearch
Marvel
- Plugin for ElasticSearch monitoring.
- Displays metrics in Kibana.
- Type of info: node info, indexes, shards, and CPU usage.
Search Request Example
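Search all documents under the yahoo (example) index. ElasticSearch 6.0 and later require the Content-Type header on requests with a body.
curl -XGET 'localhost:9200/yahoo/_search' -H 'Content-Type: application/json' -d '
{
  "query": {
    "match_all": {}
  }
}
'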
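For completeness, here is roughly the same request from Python using only the standard library. This is a sketch under assumptions: the helper names `build_search_request` and `search_all` are mine, the host defaults to the `localhost:9200` used above, and it uses POST, which the `_search` endpoint accepts as well as GET.

```python
import json
import urllib.request


def build_search_request(
    index: str, host: str = "http://localhost:9200"
) -> urllib.request.Request:
    """Build a match_all search request for the given index (nothing is sent yet)."""
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # _search accepts both GET and POST
    )


def search_all(index: str, host: str = "http://localhost:9200") -> dict:
    """Send the request and return the parsed JSON response.

    Needs a reachable ElasticSearch node; raises urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(build_search_request(index, host)) as response:
        return json.load(response)
```

Calling `search_all("yahoo")` against a running node returns the response as a dictionary, with the matching documents under `hits`.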
Use Cases
Using NoSQL as a cache in SQL architecture.
- The search features in CouchBase are not really handy, which discourages using CouchBase alone.
- Writing map-reduce functions is cumbersome when all you need is a simple aggregation.
- CouchBase ElasticSearch Plug-in: replicates all data from CouchBase to ElasticSearch using XDCR (Cross Data Center Replication).
Why not ElasticSearch only?
- ElasticSearch has the ability to index the entire object.
- ElasticSearch has the ability to perform simple to complex search queries through search API.
- Even so, best practice is to keep the data in the document database (CouchBase) rather than rely on ElasticSearch as the primary datastore.
References
- Scalable Big Data Architecture by Bahaaldine Azarmi.
- https://discuss.elastic.co/t/which-side-of-cap-theorem-elasticsearch-satisfy/177810