Articles tagged nosql performance comparison

The views expressed in this blog are strictly personal, and do not necessarily represent the views of Evident Software.

By Bill Nigh

What special challenges are involved in pursuing useful NoSQL DB performance comparisons? They include single-point solutions, the lack of a common query language or API to integrate metrics from different tiers into a common view, the heterogeneous nature of the technology, hard-to-predict lifecycles for objects such as nodes and caches, and others outlined here.

One approach to NoSQL DB performance comparison is to do time-on-time comparisons of single or multiple metrics, with charts representing different monitored resources. Don Jeffery puts it this way: “Depending on what your objective is when you are benchmarking, one of the things that you might do is to take a quiescent system, one that’s spun up but not incurring any load, and introduce some load to it over a period of time. For example, in Coherence, you establish the cluster, introduce, say, eight nodes, maybe eight JVMs over four loads, with caches introduced, but no clients.”

“You would monitor this quiescent system and inspect certain key metrics and expect them to be relatively flat. You’d then introduce loads and keep track of the times and loads introduced.” (Since we store history as time series data in Cassandra, getting this information for potentially long stretches of time is not an issue.)

Don continues: “If you keep track of the time and the load, then you can introduce time-on-time comparisons where you look at a number of key metrics. So now I have a benchmark; what I might want to do is vary that load in a certain way over a different time period, and perhaps compare against the initial benchmark and the previous set of measurements, to see if I can establish any kind of a trend. For example, I’ve attempted to increase the GET burden against the cache by 50%; does that translate into an equivalent translation of certain key metrics, or is there no linear relationship?”
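The comparison Don describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample data and the idea of pairing samples by seconds elapsed since the start of each run are ours, not part of ClearStone. It aligns a baseline run with a later run and asks whether a 50% increase in load produced a proportional change in a metric.

```python
from statistics import mean

def align_runs(baseline, trial):
    """Pair samples from two runs by seconds elapsed since each run's start.

    Each run is a list of (timestamp, value) tuples. Only offsets present
    in both runs are kept. Returns (offset, baseline_value, trial_value).
    """
    base_start = baseline[0][0]
    trial_start = trial[0][0]
    base_by_offset = {t - base_start: v for t, v in baseline}
    return [
        (t - trial_start, base_by_offset[t - trial_start], v)
        for t, v in trial
        if (t - trial_start) in base_by_offset
    ]

def scaling_ratio(baseline, trial, load_increase):
    """Relative change in the metric divided by the relative change in load.

    A result near 1.0 suggests the metric moved roughly in proportion
    to the load; values far from 1.0 suggest a non-linear relationship.
    """
    pairs = align_runs(baseline, trial)
    base_mean = mean(b for _, b, _ in pairs)
    trial_mean = mean(v for _, _, v in pairs)
    metric_increase = (trial_mean - base_mean) / base_mean
    return metric_increase / load_increase

# Hypothetical samples: (epoch seconds, average GET latency in ms).
baseline = [(1000, 2.0), (1060, 2.0), (1120, 2.0)]
trial = [(5000, 3.0), (5060, 3.0), (5120, 3.0)]

# The trial run carried 50% more GET load than the baseline.
ratio = scaling_ratio(baseline, trial, load_increase=0.5)
print(ratio)
```

In a real test the two runs would rarely sample at identical offsets, so some bucketing or interpolation would be needed before pairing the values.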

We are about to introduce time-on-time measurements, and thereby support this type of analysis, with some of the visualizations we are planning for future releases of ClearStone. Multiple perspectives in the ClearStone real-time dashboard, shared and customized, already support presenting charts from various resources in a free-form, easy-to-create manner.

Performance comparisons of different NoSQL DB technologies may be possible, Don said, if they are similar, say, two data grid technologies, with a deliberate and documented use of load and benchmarks. “You might look at response times, for example, or GETs, or look at cache hits.” This could enable a useful technology choice in situations where you know what kind, what size and what elasticity you anticipate in your data.

“Take the example of ehcache performance. The first consideration is to define what the key metrics are. One way to do that even with our system today is to establish reasonable thresholds and to use our thresholding policy tools to set those up and capture them and create events. That way we see if we violate any of those thresholds. We start with a quiescent system and introduce a load over an hour; during that time, ECS is running, so our collectors are doing their job. Line charts we assemble are being annotated with the events we’ve defined at the points of transition to the threshold.”
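As a rough illustration of the thresholding idea, the Python sketch below turns a series of samples into events at the points where a metric crosses a threshold. The function name, the event labels and the sample values are hypothetical and do not reflect how ClearStone's policy tools are actually implemented.

```python
def threshold_events(samples, threshold):
    """Return (timestamp, kind) events at each threshold transition.

    samples is a list of (timestamp, value) tuples in time order.
    An event is emitted only when the metric changes state, not for
    every sample that happens to be above the threshold.
    """
    events = []
    above = False
    for ts, value in samples:
        if value > threshold and not above:
            events.append((ts, "violated"))
            above = True
        elif value <= threshold and above:
            events.append((ts, "cleared"))
            above = False
    return events

# Hypothetical samples: (seconds since test start, cache miss count).
samples = [(0, 10), (60, 12), (120, 55), (180, 70), (240, 40), (300, 65)]
events = threshold_events(samples, threshold=50)
print(events)
```

Events like these are what would annotate a line chart at the points of transition, so a reader can see where in the load ramp each threshold was first violated.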

“We can do this with our product today, but not as conveniently as with 5.0, where we could use a single chart. We can do another test run and vary the load. For a data caching product, for example, we introduce additional caches or perhaps retune the network or cluster and start it up again with larger or smaller Java heap sizes or any number of other parameters we want to investigate; then we look at the behavior; then we run another load test; or perhaps same load, but vary the heap sizes. What we can do is to create a perspective in the real time dashboard; create two charts and compare them by using a metric that correlates the two time periods. What we hope to do soon is to have this comparison appear in the same chart. While we can do a better job of integrating the visualization, we can support those use cases today with two separate charts that are visually aligned in such a way as to allow comparison.”

Learn more about our performance monitoring solution for Java, NoSQL and web servers.

By Bill Nigh

SQL Performance Monitoring has a well-understood model underlying some very mature monitoring and management tools. The RDBMS standard, of which SQL is a part, has a number of strengths that have earned it its central place in so many enterprises: features such as triggers, views, stored procedures, and declarative referential integrity. The salient RDBMS feature for SQL performance monitoring, however, is the presence, in deference to the RDBMS standard, of a large set of system tables, variously called the CATalog in Oracle, or the SYS-prefixed family of system tables in Sybase and MS SQL Server. The Relational Model, promulgated and championed by E. F. Codd, specified that a compliant RDBMS expose its internals to reporting using the same language and data model that is used for actual data. For this reason, even a bare-bones command line interface to a relational database can, through SQL queries, harvest useful system data. Thank Codd.
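To make the point concrete, here is a small Python sketch that uses SQLite purely as a convenient stand-in (it is not one of the products named above). It reads the database's own catalog, the sqlite_master table, with ordinary SQL. The table and index names are made up for the example.

```python
import sqlite3

# An in-memory database with one hypothetical table and one index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

# The catalog is queried with the same language used for ordinary data.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY type, name"
).fetchall()
for row in catalog:
    print(row)

# System questions such as "how many indexes does this table have?"
# become plain SELECT statements against the catalog.
index_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master "
    "WHERE type = 'index' AND tbl_name = 'orders'"
).fetchone()[0]
print(index_count)

conn.close()
```

Oracle, Sybase and MS SQL Server expose far richer catalogs than this, but the principle is the same: the monitoring tool needs nothing more than a SQL connection.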

The performance monitoring area for NoSQL DBs differs in two major respects:

  • There are many products that come under that ‘elastic’ label; the Wikipedia article on NoSQL lists twelve types in its taxonomy (!)
  • As the name suggests, NoSQL has no SQL: no powerful and, more importantly, standard language that can be used to query the system for performance metrics or other useful information, of the kind SQL provides about the resources found within an RDBMS, such as table size, number of indexes, percentage full of logical devices, and so forth.

NoSQL performance monitoring has to rely on tools that use an API, when one is present. Some NoSQL products have associated query languages, but, again, there is no common, well-known, and well-documented query language for metrics, which puts a burden on organizations that have to monitor multiple NoSQL implementations.

This is where we come in.

Learn more about our performance monitoring solution for NoSQL, web apps and web servers: http://www.evidentsoftware.com