Course details
In order to scale out to large datasets and large clusters, Kudu splits tables into smaller units called tablets. This splitting can be configured on a per-table basis to be based on hashing, range partitioning, or a combination thereof. This allows the operator to easily trade off between parallelism for analytic workloads and high concurrency for more online ones.
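As a rough illustration of how a row might be routed to a tablet under combined hash and range partitioning, here is a minimal sketch. The function names, the `host` and `timestamp` columns, and the use of CRC32 as the hash are all assumptions made for the example; this is not Kudu's actual hash function or API.

```python
import bisect
import zlib
from typing import Sequence, Tuple


def hash_bucket(key: str, num_buckets: int) -> int:
    """Map a key to one of num_buckets buckets (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def range_partition(value: int, split_points: Sequence[int]) -> int:
    """Return the index of the range that holds value.

    split_points must be sorted. Partition 0 covers values below the first
    split point, and partition i covers [split_points[i-1], split_points[i]).
    """
    return bisect.bisect_right(split_points, value)


def tablet_for(host: str, timestamp: int, num_buckets: int,
               split_points: Sequence[int]) -> Tuple[int, int]:
    """Combine both schemes: hash on host, range on timestamp (hypothetical columns)."""
    return (hash_bucket(host, num_buckets),
            range_partition(timestamp, split_points))
```

In this sketch, hashing spreads rows for different hosts across buckets so that concurrent writes do not pile onto one tablet, while the range component keeps rows from the same time window together so that a scan over a time interval touches only a few tablets.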
To keep your data safe and available, Kudu uses the Raft consensus algorithm to replicate all operations for a given tablet. Raft, like Paxos, ensures that every write is persisted by a majority of replicas (at least two nodes when a tablet has three replicas) before the client request is acknowledged, so an acknowledged write is not lost when a single machine fails. When machines do fail, replicas reconfigure themselves within a few seconds to maintain high system availability.
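The majority rule can be sketched in a few lines. This is a simplified model rather than Kudu's or Raft's implementation: the function names are hypothetical, and the latency list is assumed to include the leader's own local persist time alongside the follower acknowledgements.

```python
from typing import Sequence


def majority(num_replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return num_replicas // 2 + 1


def is_committed(acks: int, num_replicas: int) -> bool:
    """A write can be acknowledged once a majority has persisted it."""
    return acks >= majority(num_replicas)


def commit_latency(persist_latencies_ms: Sequence[float]) -> float:
    """Time at which a majority has persisted the write.

    With k = majority(n), this is the k-th smallest persist latency,
    so the slowest replicas do not affect it.
    """
    needed = majority(len(persist_latencies_ms))
    return sorted(persist_latencies_ms)[needed - 1]
```

For example, with three replicas whose persist times are 2 ms, 3 ms, and 250 ms, `commit_latency([2, 3, 250])` returns 3: the write is acknowledged as soon as the two fastest replicas have it, without waiting for the straggler.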
Because a write needs acknowledgement from only a majority of replicas, a slow replica does not delay it, so majority consensus keeps tail latencies low even when some nodes are stressed by concurrent workloads such as Spark jobs or heavy Impala queries. Unlike eventually consistent systems, Raft consensus ensures that all replicas agree on the state of the data, and by combining logical and physical clocks, Kudu can offer strict snapshot consistency to clients that require it.
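To give a feel for what combining logical and physical clocks can look like, the following is a minimal sketch of a generic hybrid logical clock. It is an assumption-laden illustration, not Kudu's own clock implementation: the class name, method names, and the `(physical, logical)` tuple format are all hypothetical.

```python
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (physical component, logical counter)


class HybridClock:
    """Generic hybrid logical clock (illustrative, not Kudu's implementation)."""

    def __init__(self, physical_now: Callable[[], int]):
        self._now = physical_now  # injected physical clock, e.g. microseconds
        self._l = 0               # highest physical time seen so far
        self._c = 0               # logical counter to order events within _l

    def tick(self) -> Timestamp:
        """Timestamp a local event."""
        pt = self._now()
        if pt > self._l:
            self._l, self._c = pt, 0
        else:
            # Physical clock has not advanced past _l: bump the counter.
            self._c += 1
        return (self._l, self._c)

    def observe(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp received from another node."""
        pt = self._now()
        rl, rc = remote
        new_l = max(self._l, rl, pt)
        if new_l == self._l and new_l == rl:
            self._c = max(self._c, rc) + 1
        elif new_l == self._l:
            self._c += 1
        elif new_l == rl:
            self._c = rc + 1
        else:
            self._c = 0
        self._l = new_l
        return (self._l, self._c)
```

Timestamps produced this way compare as ordinary tuples, stay close to wall-clock time, and still increase across causally related events even when one machine's physical clock lags another's. That ordering is the kind of property a snapshot read at a given timestamp depends on.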
Updated on 22 March, 2018