In the first two blog posts (part 1 and part 2) we gave a couple of pointers about the settings which can be found in the driver and which you can leverage to improve the performance of application to Cassandra communication.Most of the times, this is enough to improve the performance significantly. However, we had a really tight SLA which made us go the extra mile. We needed to provide guarantees that 99.999% of read requests will perform below a certain threshold while we had both Cassandra cluster and application deployed on AWS infrastructure. When working with percentile requirements defined like this, even the smallest glitch can hurt your SLA. This was the main reason why we needed to work hard both on the Cassandra cluster side and the application side.
In order to provide full context, we need to explain the use case better. We have a heavy write and low latency read use case. The performance of our writes is not that important. Consistency is not an issue, we can write with consistency level ONE to get response fast. Reads need to be fast, our SLA says that we are aiming for five nines (99.999%) to be under 100ms and four nines (99.99%) to be under 50ms. The whole infrastructure is deployed to AWS as multi-DC deployment. We are using EBS volumes to have persistence across reboots and snapshotting possibility. This unlocks the ease of maintenance and operation but it comes with a price, that is, EBS volumes have disk latencies which can hurt your performance.
Separation of read and write session
As explained previously our write latency is not that important, while it is really important to satisfy the SLA constraint regarding reads. We needed to solve the problem of EBS volume disk latencies so the first thing we looked at was load balancing. Since we have multi DC as the base load balancing strategy we use theDCAwareRoundRobinPolicy policy. If we wrap it inTokenAwarePolicy, disk latency on certain nodes will still hurt us. We can land on a node which is holding replica but that node might have disk latency and all requests served from that node will have this latency added to request. We needed a way to avoid the nodes which had latency and we needed the best configuration for the application to figure that out quickly. We have an application that uses the same cluster session to read and write at the same time. Our SLA has clear requirements for reads only, and it would be much harder to tune that cluster session since reads and writes have different patterns and different requirements.
We decided to separate the sessions and run with one session for reads and one for writes. This resulted in some more connections to cluster but the benefits and flexibility that it brought compensated the amount of connections. We also put everything in place up front so we could monitor the connections, and this measurement proved that cluster could handle it. Now that we have a clear separation between reads and writes we can choose the best load balancing for both sessions. Also we can tune reads without thinking of their impact to writes. If you decide to go this way, it is of utmost importance to monitor the resources that both sessions are competing for, such as connections, thread pools etc.
Tuning reads or how to have the lowest impact of EBS disk latency
We have already mentioned the problem of disk latency. We have a few options here how to solve it: speculative executions with low threshold (request will timeout after threshold and go to the next node if the queried node has disk latency) and latency-aware load balancing policy (it keeps the HDRHistogram of node performance and can be tuned to avoid slow nodes). We wanted to try both and compare. First we tried speculative executions, which sounded clear, we set a threshold which fell into our SLA (30ms), we provided the maximum number of attempts (3 since we had 3 replicas) and we let it run. We calculated that if we had 2 timeouts we would still receive read in 90ms which fell under our SLA (99.999% reads to be under 100ms).
However, the price of having that many requests on cluster was something we could not handle with our cluster size. Thread pool queues had a lot of pending requests, monitoring on cluster side showed that limits for native transport could not handle this many requests. So we decided to try latency-aware policy.
When you browse the Internet for latency-aware policy, there are a lot of resources mentioning that it is a good idea but hard to tune. It is even said that the benefits you get in comparison with token aware policy are not worth the effort. However it seemed like the perfect solution to our problem. So we decided to give it a try. The initial run was with default parameters. Basically it would measure latency over time and make a decision based on the best performing node. We had 3 replicas and we wrappedTokenAwarePolicy in LatencyAwarePolicy which effectively meant that, out of 3 replicas, load balancing would choose fastest one. Most of the time it was the replica from the same AZ as the application which was doing the query but not necessarily. We noticed an increase in performance really fast. Most of the time we were meeting SLA except when we hit a node which did not have performance issues before, but just started to have issues. The good news is that latency-aware lets you tune sensitivity so you can detect the slow nodes faster. We tuned 3 parameters; first we lowered down the exclusion threshold from 2 to 1.2 (this was effectively telling load balancing to exclude nodes which were twice as bad as fast performing nodes but we lowered this to 1.2 since we wanted to have smaller margins and exclude faster, so after the change we were excluding nodes which were 20% worse than the fastest performing one). The second thing we changed was scale, where the default was 100ms which meant that fresh latencies inside 100ms would be more significant than measurements outside of 100ms from now. We wanted to exclude nodes faster and we needed a lower number here, so that fresh measurements could be more significant than older ones. We decided to lower this to 25ms. As the last thing, we wanted to change the retry period. There was a risk of excluding a lot of nodes with this aggressive tuning so we wanted to return the nodes faster. The default was 10 seconds and we lowered that by half and setting this to 5 seconds meant that each penalized node would be retried after 5 seconds again.
With those settings we got exactly what we wanted, read session had load balancing policy which escaped nodes fast after latency on EBS volume started to hit and it made the best effort to return nodes in query list as soon as the issues were over.
latencyAwarePolicyOptions: exclusionThreshold: 1.2 scaleInMilliseconds: 25 retryPeriodInSeconds: 5 updateRateInMilliseconds: 100 minimumMeasurements: 50 LatencyAwarePolicy.builder(new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder()).build()))
Fail early with socket timeout
When we think of reads and what is viable to our use case, we can conclude that those are reads in accordance with SLA. We do not have the value of reads stuck for a long time, creating load on cluster, and which are outside of our SLA. Client application communicating with our application has firmly set the threshold and all requests above that threshold get discarded. That’s why we wanted to do the same, discard requests above a certain threshold and ease up the load on cluster. Socket timeout option on the driver provided us that exact option. The details are explained intuning datastax driver blog part 1 but the main thing is that you set a timeout value in milliseconds for the driver and all queries above that threshold will timeout. It is up to you which retry policy you will choose and which makes the most sense for your use case. We have set timeout to 100ms so we can timeout all the queries above that value.
With this tuning walkthrough, we wanted to emphasize the importance of understanding your use case. There is no silver bullet, no guide how to tune DataStax java driver for Apache Cassandra. There are some better practices for sure but you must truly understand your use case to make some decisions. For us it made sense to discard queries above 100ms since we had already lost money but maybe in some other use case we would add retry policy which would give read a couple of more attempts before giving up. An important giveaway from this blog post would be exactly that, make sure you fully understand your use case and the reason for it before you start making changes to the driver and Cassandra settings.