NoSQL DB Logging and Reporting Challenges

By Bill Nigh

NoSQL DB logging and reporting are challenging for a number of reasons:

  1. NoSQL is relatively new, so there are few experienced practitioners. The term is now generally understood to cover a wide variety of architectures; because the products differ so much, one definition of ‘NoSQL expert’ might be simply ‘one who has product knowledge of one or more products’.
  2. Unlike legacy RDBMS technology, NoSQL DB logging and reporting cannot rely on an ANSI-standard query capability shared by application and system data. The RDBMS model exposes excellent system metrics through virtual ‘system’ tables that can be queried with the rich SQL language; NoSQL has no equivalent.
  3. Single-point solutions are the most likely case, which leads to a proliferation of monitoring and reporting consoles.
  4. Entities participating in a cluster typically keep separate logs on separate hosts in separate contexts, which makes log correlation difficult. For example, cluster rebalancing, as found with Coherence, introduces load onto other nodes in the cluster, creating a ‘pathological’ situation and a flurry of log entries that are not easy to interpret.
  5. Collection methods can impact performance; an agent co-resident in server memory with a NoSQL (or any) app will impinge on the performance of the very process you are trying to monitor.
  6. NoSQL is often part of a much larger system; e.g. Hadoop was a key part, but only part, of IBM Watson. NoSQL logging and reporting tools, however, are typically siloed.
  7. The technologies mesh together in synergies that are hard to foresee and interact in ways that may not be anticipated. A typical use case spans multiple NoSQL technologies, yet the performance implications of all the combinations of NoSQL running in the enterprise have not yet been documented.
  8. A holistic view of NoSQL performance monitoring data is lacking, as each technology is associated with different metrics and (typically) different point solutions.
  9. There is so much data that one has to apply heuristics to pick out the valuable information and avoid ‘drinking from a fire hose’.
  10. Reporting such as that found in JConsole does not persist report data for long enough timeframes, so seeing trends and doing capacity planning are difficult.
  11. The elastic nature of the environment means that as resources are deployed in response to increased demand, nodes will be coming and going, with life cycles that are not easy to predict. Logging and reporting tools need to account for the fact that a node may not report within a given time frame: does that mean it is just offline, or does it have a problem?
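
To make the last point concrete, here is a minimal Python sketch of one way a reporting tool might tell a node that was deliberately retired apart from one that has simply gone quiet. It is an illustration only, not any product's method: the NodeTracker class, the 30-second report interval, the three-interval grace period, and the idea that the provisioning layer announces planned shutdowns are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeTracker:
    """Hypothetical tracker: last report time per node plus announced departures."""

    report_interval: float = 30.0  # assumed seconds between reports
    grace_intervals: int = 3       # assumed number of intervals tolerated before suspicion
    last_seen: dict = field(default_factory=dict)
    retired: set = field(default_factory=set)

    def record_report(self, node, now):
        """Note that a node reported at time `now` (seconds)."""
        self.last_seen[node] = now
        self.retired.discard(node)  # a node that reports again is back in service

    def record_planned_shutdown(self, node):
        """Note that the provisioning layer removed this node on purpose."""
        self.retired.add(node)

    def status(self, node, now):
        """Classify a node as retired, unknown, healthy, late, or suspect."""
        if node in self.retired:
            return "retired"
        if node not in self.last_seen:
            return "unknown"
        silence = now - self.last_seen[node]
        if silence <= self.report_interval:
            return "healthy"
        if silence <= self.report_interval * self.grace_intervals:
            return "late"
        return "suspect"
```

With this approach a silent node is only flagged as ‘suspect’ when nobody announced its departure, which is the distinction the question above is asking for. The thresholds would need tuning for any real cluster.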

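The log correlation problem in point 4 can also be sketched in a few lines of Python. The example below interleaves per-host logs into a single timeline so that a rebalancing event on one node can be read next to its effects on the others. The log format (an ISO-8601 timestamp, a space, then the message) and the function names are assumptions made for illustration, and the sketch assumes each host's log is already in time order.

```python
import heapq
from datetime import datetime


def parse_line(host, line):
    """Split an assumed '<ISO timestamp> <message>' line into a sortable tuple."""
    stamp, _, message = line.partition(" ")
    return (datetime.fromisoformat(stamp), host, message)


def parsed_stream(host, lines):
    """Yield parsed entries for one host, preserving that host's order."""
    for line in lines:
        yield parse_line(host, line)


def merge_logs(logs_by_host):
    """Interleave already-sorted per-host logs into one timeline."""
    streams = [parsed_stream(host, lines) for host, lines in logs_by_host.items()]
    # heapq.merge lazily merges sorted inputs; tuples compare by timestamp first.
    return list(heapq.merge(*streams))
```

A merged timeline is only as trustworthy as the clocks on the hosts that produced it, so clock skew between nodes remains a practical limit on this kind of correlation.
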
Bernd Harzog, writing recently in The Virtualization Practice, has furnished a very useful view of the state of the art and state of the industry here. It is well worth the time to read it (and I am proud to point out that he mentions Evident Software as an example of a company that is well positioned to make contributions toward addressing these challenges).

If you are interested in this subject, you may also want to read this post on challenges and opportunities in NoSQL DB logging and reporting.

Learn more about our performance monitoring solution for Java, NoSQL and web servers

Categories: General
Date: February 27th, 2011