The rise of NoSQL has been meteoric. A year or two ago, NoSQL was an esoteric topic—an unconventional technology of interest primarily to a small community of software architects working on the very largest social networking applications. Today, nearly everyone building a cloud- or Web-based application has heard of NoSQL DBs. A growing number of companies are investing heavily in NoSQL development, and job openings for programmers with NoSQL experience have skyrocketed. In a remarkably short time, NoSQL has gone from being an architectural curiosity to a mainstream component in enterprise application architectures.
Now that enterprise IT organizations are launching NoSQL applications, they are discovering new gaps in their Application Performance Management (APM) toolset. Traditional APM tools are blind to data cache activity, so they’re incapable of comprehensively monitoring NoSQL applications. IT organizations find themselves without any means of visualizing interactions between the NoSQL DB tier and other tiers of the application stack (such as the operating system tier, the application server tier, the Web server tier, and the database tier). Engineers have difficulty optimizing something they can’t see. When problems occur and root-cause analysis is needed, this lack of visibility can lead to finger pointing among owners of the various application tiers.
To optimize NoSQL applications, enterprises need a new generation of APM tools. Developers and operations engineers need to see not only what’s happening in the NoSQL data tier; they also need to see how the NoSQL tier is interacting with other tiers in the application. In addition, the next generation of APM tools needs to be both enterprise- and cloud-capable, managing resources both inside and outside the corporate firewall.
Like application architectures themselves, APM solutions must expand to account for NoSQL.
What is NoSQL?
Initially, NoSQL meant “No SQL needed.” Early adopters of this style of data caching made bold predictions that NoSQL would ultimately replace the relational database in the application stack. Over the past six to nine months, saner heads have prevailed, and today NoSQL is usually interpreted to mean “Not Only SQL.” (There are some excellent sites covering the history of NoSQL. Start at http://blog.knuthaugen.no/2010/03/a-brief-history-of-nosql.html, and then review http://nosql-database.org/links.html for an overview of NoSQL’s history.)
After all, relational databases with SQL syntaxes are very useful things. Most enterprise applications can benefit greatly from a transactional, ACID-compliant data store, such as a traditional SQL database. (ACID stands for atomicity, consistency, isolation, and durability: four properties that make a database transaction reliable.) At the same time, architects and developers can build elasticity and boost performance in both new and existing applications by using “in memory” technologies for information that needs only BASE (Basically Available, Soft State, Eventually Consistent) functionality. These technologies allow for virtually unlimited scaling of applications, which is critical for new cloud- or Web-based applications. For most enterprise development teams, it’s not a question of SQL or NoSQL, but rather a question of where to deploy each technology to optimize an application’s performance, scalability, and reliability.
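As a rough illustration of what “Eventually Consistent” means in practice, the following toy sketch acknowledges a write on one copy of the data and applies it to the other copies later. The class and method names are hypothetical and do not correspond to any particular NoSQL product; real systems propagate writes over a network and handle conflicts and failures.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy key/value store with one primary and lazily updated replicas."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # writes not yet applied to the replicas

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key, replica=0):
        # Reads are served from a replica and may return stale data.
        return self.replicas[replica].get(key)

    def propagate(self):
        # Apply queued writes to every replica, in the order they arrived.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value


store = EventuallyConsistentStore()
store.write("cart:42", ["book"])
stale = store.read("cart:42")   # the replica has not seen the write yet
store.propagate()
fresh = store.read("cart:42")   # the replicas have now converged
```

An ACID-compliant store would not let a reader observe the stale value; a BASE store accepts that window in exchange for availability and easier scaling.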
Hybrid solutions that combine some BASE and ACID properties in a single solution have already started to emerge. Facebook, for example, uses Memcached and Apache Cassandra (the NoSQL solution it originally created), but it also continues to use MySQL databases, which it tunes to achieve some very impressive results. (See http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html. For an overview of Facebook’s data architecture, see http://www.developer.com/features/article.php/3894566/Inside-Facebooks-Open-Source-Infrastructure.htm.) In addition, there are different “types” of NoSQL data schemas, from name/value pair to document-based storage. Our favorite definition comes from http://www.nosql-database.org, which defines NoSQL as “Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.” If you peruse the list of NoSQL types on that site, you’ll see it includes technologies such as Data Caching Technologies and Grid Technologies, in addition to technologies that fit the “classic” NoSQL definition.
The Rise of NoSQL
The rapid adoption of NoSQL technologies could not have occurred without open source software and open source software distribution. The NoSQL movement began when some of the largest commercial Web organizations (specifically, Amazon, Facebook, and Google) developed solutions for distributing (“sharding”) and displaying information in a scalable way for millions of users. These commercial organizations then generously contributed these data-management solutions to the open source community for anyone to use.
For example, Facebook developed its own data caching technology to manage the data problems associated with millions of users accessing inboxes. Facebook later open sourced this technology and donated it to the Apache Foundation. The technology was later released as Apache Cassandra.
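The idea of distributing (“sharding”) data across nodes can be sketched in a few lines. The example below is a minimal illustration only: it assigns each key to a node by hashing the key, and the node names and the shard_for helper are hypothetical.

```python
import hashlib


def shard_for(key, nodes):
    """Pick a node for a key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


nodes = ["cache-a", "cache-b", "cache-c"]  # hypothetical node names
keys = ("user:1", "user:2", "user:3", "user:4")
placement = {key: shard_for(key, nodes) for key in keys}
```

Simple modulo placement like this moves most keys whenever a node is added or removed. Production systems commonly use consistent hashing instead, which limits how much data has to move when the cluster changes size.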
The donations of technologies like Cassandra set off a flurry of activity within the development community. Around the world, developers began experimenting with these technologies, testing them, comparing them, and enhancing them. Over a dozen companies have now been funded to support and expand the capabilities around the different NoSQL technologies. And while early critics predicted that few organizations other than social networks would need BASE data caches, today companies in fields such as financial services, telecom, and e-commerce are investing heavily in NoSQL DBs.
There are three reasons for the rise of NoSQL. First, “Internet-scale” social media applications forced developers to find ways of managing the unprecedented amounts of data these applications require. After all, when you have a user community of over 500 million users, as Facebook does, or you serve 18 million visitors per month, as Digg does, you need to solve data-management problems on a vast scale.
Second, in the past decade, consumer technologies (social media networks, micro-blogging, and mobile video) have blazed trails for business technologies to follow. As consumers and business users come to embrace social media sites and social media-style interfaces, businesses have begun building social-media-scale applications themselves. When they do, they find they need to solve the same data-management problems that “non-business” sites like Facebook did.
Finally, the data solutions developed by Facebook and other social media sites are useful for many vast data management problems. Atomic physicists, clothing retailers, banks, and healthcare communication networks can all benefit from the scalable data management capabilities of NoSQL. (For example, scientists at CERN are using CouchDB. See: http://www.readwriteweb.com/enterprise/2010/08/lhc-couchdb.php.)
The main driver for NoSQL has been the realization that while ACID properties are certainly important (and not going away), many applications simply do not need ACID properties. These applications can be written or retooled to manage data in a simpler, more elastic way with NoSQL.
Hence, the evolution of NoSQL into a new tier in the application stack, as represented below:
The Need for Application Performance Management for NoSQL
As with any new development paradigm, management and monitoring challenges abound in the NoSQL space. Front and center: there’s a lack of performance management tools for NoSQL technologies.
The lack of NoSQL management tools is similar to the lack of tools that affected Java Application Servers when they first came to market in the mid-1990s. While it seems quaint now to imagine Java applications without robust monitoring and management capabilities, Java applications lacked such capabilities for several years. Early adopters of enterprise Java applications had no way to monitor and view aspects of application internals. A new generation of monitoring tools had to be developed to meet the evolving needs of enterprise applications.
Fast forward to 2009. In the middle of that year, Evident Software introduced a monitoring and management solution, Evident ClearStone, to address the challenges of managing Data Caching Platforms (DCP), particularly Oracle Coherence. These platforms have become essential infrastructure for many large financial services and e-commerce applications, such as the Shopzilla/BizRate comparison-shopping and rating site, which provides comparison-shopping data to over 40 million users each month and answers over 10,000 queries per second.
Using ClearStone, data center engineers and DevOps teams like the team at Shopzilla can monitor and optimize applications that use data-caching platforms to rapidly manage vast amounts of data. The visibility and analytical insight provided by ClearStone are invaluable. As Phil Dixon, Shopzilla’s CIO, told an audience at the 2009 O’Reilly Velocity Conference: “Through Evident ClearStone, Shopzilla can measure and predict trends across thousands of data points in both our production and pre-production staging environments to the point where we may not consider software releases successful unless ClearStone agrees.”
By mid-2010, there was enough noise in the market to justify expanding our ClearStone solution to support NoSQL. In September 2010, support for Cassandra and Memcached was added. This new functionality included the ability to collect metrics from the NoSQL tier, aggregate statistics and results, and present them using ClearStone’s rich visual interface. In addition, work has begun to focus on correlation: the ability to pull metrics and events from multiple tiers of the application development stack and correlate results to help determine the root cause of application issues and performance problems.

Evident ClearStone reporting the I/O status of Memcached nodes along with memory statistics for the nodes' slabs.
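To make the collect-and-aggregate step more concrete, here is a minimal sketch of how raw Memcached statistics could be turned into per-node and cluster-wide hit ratios. It is not ClearStone's implementation, which this paper does not describe. It assumes the text reply format of Memcached's stats command (lines of the form STAT name value, terminated by END); the node names and numbers are made up, and the parse_stats and hit_ratio helpers are hypothetical.

```python
def parse_stats(raw):
    """Parse the text reply to Memcached's `stats` command into a dict."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def hit_ratio(stats):
    """Fraction of get requests that were served from the cache."""
    hits = int(stats["get_hits"])
    misses = int(stats["get_misses"])
    total = hits + misses
    return hits / total if total else 0.0


# Sample replies with made-up numbers, keyed by hypothetical node names.
replies = {
    "cache-a": "STAT get_hits 900\r\nSTAT get_misses 100\r\nEND\r\n",
    "cache-b": "STAT get_hits 300\r\nSTAT get_misses 300\r\nEND\r\n",
}

parsed = {node: parse_stats(raw) for node, raw in replies.items()}
per_node = {node: hit_ratio(stats) for node, stats in parsed.items()}

# Aggregate across the cluster by summing the raw counters first.
total_hits = sum(int(s["get_hits"]) for s in parsed.values())
total_misses = sum(int(s["get_misses"]) for s in parsed.values())
cluster_ratio = total_hits / (total_hits + total_misses)
```

Summing the raw counters before dividing weights each node by its traffic, which averaging the per-node ratios would not do.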
The correlation capability cannot be overlooked. As applications continue to grow broader and more diverse (particularly with the explosion of cloud-based applications), it becomes harder to form a single view of an application. An application can live on several different hosts and virtual machines. A problem at the NoSQL tier, for example, may manifest itself at the system level, through network bottlenecks, or within the application itself. These correlations need to be determined and reported. A holistic view of a running application therefore requires not only collecting NoSQL information but also analyzing it intelligently alongside the rest of the application stack.
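One simple way to look for relationships between tiers is to compare time-aligned metric series. The sketch below computes a Pearson correlation coefficient between an application-tier metric and two candidate metrics from other tiers. It is only an illustration under stated assumptions: the metric names and sample values are made up, and this paper does not say which analysis technique ClearStone uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


# Made-up samples taken at the same six points in time.
app_response_ms = [120, 125, 118, 410, 460, 130]   # application tier
candidates = {
    "cache_miss_rate": [0.05, 0.06, 0.05, 0.30, 0.35, 0.07],  # NoSQL tier
    "cpu_percent": [40, 42, 41, 43, 39, 41],                   # OS tier
}

scores = {name: pearson(series, app_response_ms)
          for name, series in candidates.items()}
# Rank candidate metrics by how strongly they move with response time.
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
```

A strong correlation only indicates where to look first; it does not prove that one tier caused the problem in another.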


