Articles tagged Neo4j

The views expressed in this blog are strictly personal, and do not necessarily represent the views of Evident Software.

By John Bennett

GIGAOM STRUCTURE CONFERENCE AND NEW YORK, NY, March 23, 2011 — Evident Software, the leading provider of Application Performance Management (APM) for NOSQL, Java, and more, today announced that it has formed strategic partnerships with four advanced technology companies: Terracotta, EsperTech, Neo Technology, and Cirrus Technologies. Terracotta has selected Evident Software as a development partner for projects that include the Quartz Job Scheduler. Evident’s new software release, ClearStone 5.0, features the Esper Complex Event Processor from EsperTech and the Neo4j graph database from Neo Technology.

ClearStone 5.0 also features a PostgreSQL Management Pack developed by Cirrus Technologies for monitoring and managing PostgreSQL databases. Evident announced the new ClearStone release today at the GigaOm Structure Big Data Conference in New York City.

Terracotta and Evident Software

Terracotta is the leading provider of snap-in performance and scale for enterprise Java applications. The company selected Evident as a development partner.

Initially, Terracotta and Evident are working together on Quartz Job Scheduler. Quartz is a full-featured, open source job scheduling service that can be integrated with, or used alongside virtually any Java EE or Java SE application—from the smallest stand-alone application to the largest e-commerce system. Quartz can be used to create simple or complex schedules for executing tens, hundreds, or even tens of thousands of jobs.

“Evident Software has a proven track record of delivering visually rich monitoring and management solutions for enterprise applications, so they were an obvious choice for us when we were looking for a strategic development partner,” said Mike Allen, Head of Product Management for Terracotta.

EsperTech and the Esper Complex Event Processing Engine

Evident Software re-architected its ClearStone APM platform in the ClearStone 5.0 release, but kept one key component from earlier versions of the software: the Esper Complex Event Processing (CEP) Engine. ClearStone 5.0 uses the Esper CEP Engine to aggregate data from a potentially ever-changing collection of resources, perform computations on metrics, build queries dynamically, and monitor resources for threshold violations.

“Esper has provided essential capabilities to our ClearStone platform for several years now,” said Scott Barnett, CEO of Evident Software. “We began using the free, open source version of Esper in 2007 and soon after licensed their supported commercial version. We’ve always been thrilled with the company’s technology and support. We look forward to continuing our partnership with Esper as we extend the ClearStone platform to cover additional enterprise application components.”

“We are pleased that Evident Software is seeing Esper as a key enabler to its value offering. EsperTech will continue to advance the state-of-art in CEP applied to the domain of APM and to other domains,” said Thomas Bernhardt, CTO of EsperTech.

Neo Technology and the Neo4j Graph Database

Once data moves through the Esper Engine in ClearStone 5.0, ClearStone writes it to a Neo4j graph database, which serves as an inventory database for all resources being monitored (e.g., processes, hosts, nodes, and clusters). ClearStone uses Neo4j to dynamically discover and map relationships among resources (e.g., a node running on a host), and to maintain a timeline of events related to each resource.

“Because the relationships among resources may be in flux and cannot be known ahead of time, we needed a flexible data store such as a graph database for this function,” said Barnett. “Neo4j turned out to be ideal for storing metadata for the resources and mapping relationships and correlated events.”

“ClearStone 5.0 demonstrates the type of high-performance, data-rich solutions that can be built on top of Neo4j,” said Emil Eifrem, CEO of Neo Technology. “We’re pleased that Evident chose Neo4j for their state-of-the-art APM platform.”

Cirrus Technologies and the PostgreSQL Management Pack

Jim Mlodgenski, CEO of Cirrus Technologies and former Chief Architect at EnterpriseDB, used ClearStone 5.0’s new Open Data Interface (ODI) to create a ClearStone Management Pack that enables ClearStone 5.0 to collect and correlate metrics and events from PostgreSQL databases. ClearStone correlates these metrics and events with those from other application components, such as MongoDB, a Cassandra data cache, or a JBoss application server, giving developers and data center operations staff a multi-tier view of a PostgreSQL application.

“I was impressed by the architecture and open interface of the ClearStone 5.0 Platform,” said Mlodgenski. “Working from Evident’s ODI documentation, I was able to quickly create a Management Pack that collects PostgreSQL metrics and events through ClearStone’s RESTful interface. Now developers, DBAs, and others can take advantage of the rich visual interface of ClearStone to monitor applications built on PostgreSQL. Instead of relying on home-grown tools, PostgreSQL users can leverage existing open source scripts and take advantage of a much richer monitoring solution that shows not just PostgreSQL metrics but also interactions between PostgreSQL and other tiers in the application stack.”

The PostgreSQL Management Pack is available as an extension to the ClearStone APM Platform.
ClearStone is available for download on the Evident Software Web site, www.evidentsoftware.com.

About Evident Software, Inc.

Evident Software delivers the first comprehensive application performance management solution for NOSQL, Java applications and more. The company’s ClearStone platform enables developers and operations personnel to monitor, manage, and optimize business-critical and Internet-scale applications. Evident’s solutions are installed in the financial, SaaS/cloud, e-commerce, government and IT services industries. The company is based in Newark, N.J. with research and development facilities in Asbury Park, N.J.

###

Evident ClearStone is a registered trademark of Evident Software. All other trade names are the property of their respective owners.

For media inquiries, please contact:

John Bennett (for Evident Software)
john@bennettstrategy.com
+1-510-495-6590

By Bill Nigh

What does a traditional RDBMS programmer or architect need to understand to be productive with NoSQL (Not-only SQL) technologies and DCPs (data caching platforms)?

I asked this question of our development team.

Here’s their list of things to know:

  1. Understand how ACID compares with BASE (Basically Available, Soft-state, Eventually Consistent)
  2. Understand persistence vs non-persistence, i.e., some NoSQL technologies are entirely in-memory data stores
  3. Recognize there are entirely different data models from traditional normalized tabular formats: columnar (Cassandra) vs key/value (Memcached) vs document-oriented (CouchDB) vs graph-oriented (Neo4j)
  4. Be ready to deal with no standard interface like JDBC/ODBC or standardized query language like SQL; every NoSQL tool has a different interface
  5. Architects: rewire your brain to the fact that web-scale/large-scale NoSQL systems are distributed across dozens to hundreds of servers and networks as opposed to a shared database system
  6. Get used to the possibly uncomfortable realization that you won’t know where data lives (most of the time)
  7. Get used to the fact that data may not always be consistent; ‘eventually consistent’ is one of the key elements of the BASE model (I see this latency issue all the time in Twitter, in ‘Followers’ list)
  8. Get used to the fact that data may not always be available
  9. Understand that some solutions are partition-tolerant and some are not

These attributes vary from one system to another. It’s as important to understand the differences among NoSQL technologies as it is important to understand how they differ from a traditional RDBMS.
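To make the "eventually consistent" item concrete, here is a minimal sketch in plain Python. It is not modeled on any particular NoSQL product: the `Replica` and `EventuallyConsistentStore` classes, the two-replica layout, and the `followers:alice` key are all made up for illustration. It only shows why a read can briefly return stale data and then catch up.

```python
from collections import deque


class Replica:
    """A single copy of the data; holds whatever writes have reached it so far."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class EventuallyConsistentStore:
    """Toy two-replica store (hypothetical): writes land on the primary at once
    and reach the secondary only when replicate() is called."""

    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self.pending = deque()  # writes not yet applied to the secondary

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply queued writes in order; afterwards both replicas agree.
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary.data[key] = value


store = EventuallyConsistentStore()
store.write("followers:alice", 10)
stale = store.secondary.read("followers:alice")   # replica is still behind
store.replicate()
fresh = store.secondary.read("followers:alice")   # replicas have converged
```

A reader who hits the secondary before `replicate()` runs sees nothing for the key; the same read afterwards sees the write. That window is the latency described in item 7.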

Here is a pretty good list of the many NoSQL products, from a respected member of the community, Alex Popescu.

Learn more about our performance monitoring solution for Java, NoSQL and web servers.

By Bill Nigh

NoSQL DB logging and reporting are challenges, for reasons discussed in this post. I recently spoke with Evident Software veteran Don Jeffery (@drjmun on Twitter) about those challenges and how Evident ClearStone (ECS) addresses NoSQL DB logging and reporting.

Metrics Collection

ECS collection (using JMX and ODI, our RESTful API over HTTP) creates Neo4j graph nodes from harvested and derived data in the form of resource metrics, resources, relationships and events. A unique identifier is generated for each Neo4j graph node, regardless of type. This identifier can be used to retrieve information from Apache Cassandra 0.7, which we employ as a time-series database, thus supporting current and historical performance monitoring visualizations within customizable perspectives. This product design allows ECS to chart performance metrics and display events such as threshold violations. (Our CTO, Ivan Ho, wrote up our use of Neo4j and Cassandra in a blog post that got good play on DZone.)
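The role of that identifier can be sketched with two in-memory dictionaries standing in for the two stores. This is an illustration only, not ClearStone code and not the Neo4j or Cassandra API: the function names, the use of a UUID, and the sample metric are assumptions.

```python
import uuid
from collections import defaultdict

inventory = {}                    # stand-in for the graph store: node id -> properties
time_series = defaultdict(list)   # stand-in for the time-series store: node id -> samples


def create_node(node_type, **properties):
    """Create a node of any type and return its generated unique identifier."""
    node_id = str(uuid.uuid4())   # assumption: any unique id scheme would do
    inventory[node_id] = {"type": node_type, **properties}
    return node_id


def record_sample(node_id, timestamp, value):
    """Store one measurement under the node's identifier."""
    time_series[node_id].append((timestamp, value))


def samples_between(node_id, start, end):
    """Return samples for one node with start <= timestamp < end, oldest first."""
    return sorted((t, v) for t, v in time_series[node_id] if start <= t < end)


cpu_id = create_node("metric", name="cpu.utilization", host="web-01")
record_sample(cpu_id, 120, 0.91)
record_sample(cpu_id, 60, 0.42)
record_sample(cpu_id, 180, 0.77)
recent = samples_between(cpu_id, 60, 180)
```

The point of the design is that the graph side never holds the bulk data. It holds the identifier, and a chart asks the time-series side for a time range under that identifier.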

In ClearStone, a Neo4j node represents an instance of any entity we choose to track. A node for a host, say, may carry attributes such as an IP address, or data from which heuristics let us estimate how many processors the hardware has. Other resources, possibly from other technologies, may also have information that helps us confirm that new host entity, create an instance of it, and populate it opportunistically as more information comes in. The free-form nature of nodes stored in Neo4j makes this an easy capability to support.
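A rough sketch of what "populate it opportunistically" could look like, using a plain dictionary in place of a graph node. The `observe_host` helper, the choice of IP address as the identifying attribute, and the rule that the first reported value wins are assumptions made for the example, not a description of ClearStone's actual merge logic.

```python
hosts = {}   # keyed by IP address, assumed here to be what identifies a host


def observe_host(ip, **attributes):
    """Create the host on first sight, then fill in only attributes not yet known."""
    host = hosts.setdefault(ip, {"ip": ip})
    for name, value in attributes.items():
        if value is not None:
            host.setdefault(name, value)   # keep an earlier value if one exists
    return host


observe_host("10.0.0.5", os="Linux")             # report from one collector
observe_host("10.0.0.5", cpu_count=8, os=None)   # a later report adds detail
host = observe_host("10.0.0.5", os="Solaris")    # conflicting value is ignored here
```

Because the node has no fixed schema, a new attribute from a new technology needs no migration. It is simply set the first time someone reports it.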

Any time there are events associated with a resource, we keep a timeline of such events married to a snapshot of the associated resource(s) in the inventory at the time of the event occurrence.
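As a small illustration of an event "married to a snapshot", the sketch below copies the resource at the moment the event is recorded, so later changes to the live resource do not rewrite history. The dictionaries, field names, and timestamps are hypothetical.

```python
import copy

resources = {"host-1": {"type": "host", "ip": "10.0.0.5", "cpu_count": 4}}
timeline = []  # events in arrival order


def record_event(resource_id, timestamp, message):
    """Append an event together with a frozen copy of the resource as it is now."""
    timeline.append({
        "resource_id": resource_id,
        "timestamp": timestamp,
        "message": message,
        "snapshot": copy.deepcopy(resources[resource_id]),
    })


record_event("host-1", 1000, "CPU threshold violated")
resources["host-1"]["cpu_count"] = 8      # the live resource changes later
record_event("host-1", 2000, "CPU back to normal")
```

The first event still shows the host with 4 CPUs even though the inventory now says 8, which is what lets someone compare "then" with "now".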

Challenges

In the realm of NoSQL logging and reporting, consider the problems involved in monitoring a dynamic distributed environment with tools that are usually specific to a single technology. As Don put it, “what we need to understand is that the virtual and physical resources these products run on often overlap. At the very least there are server farms and networks that are shared”. NoSQL logging and reporting tools need to be able to identify patterns and relationships. They need an “elastic cross-technology solution that gets information on how [the technologies] impinge on one another in a common fabric”.

Another issue: in an environment where nodes are coming and going, a monitoring tool has to keep track of which nodes are current and which are not. As Don said “If we don’t get a report from a node, does that mean it’s just offline, or it has a problem? We also have to know at any given time in the history of our collection what nodes are available. Sampling a number of times helps you get a picture”. Sufficient samples over time can help ascertain whether a node’s state fluctuates a lot, with the caveat that maybe one can never be completely certain of even that. Maybe one rule could be that if a node is always ‘on’, and we get no reports on it for a [fill in the time frame], then we can conclude it has a problem; I think you see the challenge here.
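The rule of thumb in that last sentence can be written down as a small function. This is a sketch under stated assumptions, not how ECS decides: the `classify_node` name, the three status labels, and the `expected_interval` and `missed_allowed` parameters are invented here, and the time frame itself is left as something the caller must supply.

```python
def classify_node(report_times, now, expected_interval, missed_allowed=3):
    """Classify a node from the timestamps (in seconds) of the reports it has sent.

    Returns "unknown" if it has never reported, "current" if its last report is
    recent enough, and "suspect" if it has been silent for more than
    missed_allowed expected intervals.
    """
    if not report_times:
        return "unknown"
    silence = now - max(report_times)
    if silence <= missed_allowed * expected_interval:
        return "current"
    return "suspect"


reports = [0, 30, 60, 90]   # hypothetical node that reports every 30 seconds
status_soon = classify_node(reports, now=150, expected_interval=30)   # silent for 60 s
status_late = classify_node(reports, now=300, expected_interval=30)   # silent for 210 s
```

Even this toy version shows the difficulty: "suspect" cannot distinguish a node that was deliberately taken offline from one that failed, and the right threshold depends on how steadily the node has reported in the past.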

Opportunities

Don understands the challenge and opportunity of NoSQL logging and reporting well; he says that “keeping the best snapshot” of a monitored resource is what we are striving to do with ECS, trying to “identify principal players” that a customer installation consists of, be they caches, nodes in a cluster, whatever they may be, in what is called our inventory. We can’t simply rely on current state, nor rely on history, but rather a combination of the two; “so that’s some of the stuff we’ve been looking at. If we can usefully compare what is in inventory now to what was there in the past, we’ll discover things we hadn’t even thought about, such as usage patterns and virtual and physical resources in that environment. I’m not sure we’ll be initially able to assess causality, but I think we can establish a footprint and allow the user to be able to explore and draw conclusions; I think we can give them the basis for that information.”

“It’s challenging to pinpoint cause and effect. For example, it’s difficult to determine that your publisher success rate is low because your CPU is maxed out; maybe your CPU is maxed out because you are attempting to do so much publishing that you’ve saturated that machine, leading to a low success rate; we can at least give them hints. We can also begin, with ECS 5.0, to give them projections, maybe presented graphically and in a number of visual perspectives; maybe an incident matrix.”

Regardless of what we deliver, Don says that “we want to give them something navigable, so they can begin to see where things are ‘lighting up’, then move distances away. So if a Cassandra cache was causing problems I could inspect the host. Oh, and now that I’m at that host, I can see there’s some other technology on there that’s beginning to have a lot of events.” Maybe this second technology is the actual root cause of the problem that first surfaced in the original monitored resource, a technology that is possibly in a different tier, of a different NoSQL or caching technology, or maybe a servlet… you get the picture.

Don says “Being able to provide those hints and that navigation becomes even more important as the size and scale of these systems becomes such an issue that it becomes really difficult to monitor and manage the new environment without some event information or other heuristics that we’ve applied, a view that limits the scope of what they’re seeing to a space that we believe is related to problems that can explore causality. So, that’s one of the things we want to introduce in 5.0, some interesting visualizations, to help them navigate around.”

Weaving powerful semantics among tiers and domains based on an ever-growing and better-understood inventory of resources will provide a platform for discovering interrelationships whose understanding can well serve both root cause analysis and enforcement of SLAs. This is the future of Evident ClearStone, as well as its present.

Learn more about our performance monitoring solution for Java, NoSQL and web servers.

By Ivan Ho

As an Application Performance Management tool, Evident ClearStone collects and receives real-time data from disparate applications and systems in a distributed application environment. 90% of the data that ClearStone manages is performance data from the application processes and systems. The other 10% relates to the inventory and events of the managed environment.

ClearStone 5 required an elastic data store that could handle all the scalability, performance, persistence, and flexibility requirements demanded of this class of product. The demands placed on this elastic data store ruled out the use of a traditional RDBMS backend. The solution had to offer options for load balancing and high availability; being able to distribute and replicate data was essential. It was obvious that a NoSQL DB would come into play.

For example, in a single application environment with multiple servers, ClearStone would monitor system-level metrics (CPU, disk, network, I/O), application container performance, Java platform measurements, a distributed Cassandra cluster or other NoSQL clusters, and a database like PostgreSQL. The schemas of the performance data vary significantly from one type of resource to another, so we also needed a solution that would provide flexibility in dynamic schema creation and updates while keeping the system running 24×7. At the time this was being developed, no single data caching, database, or NoSQL solution met all of these requirements. Ultimately, the Evident architects decided on using two NoSQL solutions, Apache Cassandra and Neo4j.

Cassandra is implemented as a time-series data store for storing all the real-time data and historical data. Our implementation uses Apache Cassandra 0.7 with the Hector client APIs for Cassandra. With Cassandra 0.7, we can dynamically create and evolve column families for storing all the performance data. The performance data is normalized by metric. We have also partitioned our column families based on the granularity of the data sets.
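One way to picture column families partitioned by granularity is sketched below with nested dictionaries. It does not use Hector or any Cassandra API, and the layout is a guess at the general idea rather than ClearStone's schema: the three granularity names, the bucket widths, the row-per-metric layout, and the `heap.used` metric are all assumptions.

```python
from collections import defaultdict

GRANULARITIES = {"raw": 1, "minute": 60, "hour": 3600}   # bucket width in seconds

# One dict per granularity plays the role of a column family:
# row key = metric name, column name = bucket start time, value = (sum, count).
families = {name: defaultdict(dict) for name in GRANULARITIES}


def store_sample(metric, timestamp, value):
    """Fold one sample into every granularity's bucket for that metric."""
    for name, width in GRANULARITIES.items():
        bucket = timestamp - (timestamp % width)
        total, count = families[name][metric].get(bucket, (0.0, 0))
        families[name][metric][bucket] = (total + value, count + 1)


def average(family, metric, bucket):
    """Average of all samples that fell into one bucket."""
    total, count = families[family][metric][bucket]
    return total / count


for ts, value in [(0, 10.0), (30, 20.0), (90, 60.0)]:
    store_sample("heap.used", ts, value)

minute_avg = average("minute", "heap.used", 0)    # samples at t=0 and t=30
hour_avg = average("hour", "heap.used", 0)        # all three samples
```

Keeping each granularity separate means a chart covering a month can read a handful of hourly buckets instead of scanning every raw sample.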

Neo4j is implemented as an inventory database used for maintaining all the managed resources (i.e. processes, hosts, clusters, etc.) of the application environment. It is used to store the current state of all the resources, relationships among the resources, and events correlated to these resources. Anytime there are events associated with a resource, we keep a timeline of such events married to a snapshot of the associated resource(s) in the inventory at the time of the event occurrence. We felt the use of a graph database like Neo4j was ideal for storing metadata for the resources and mapping relationships and correlated events.

Beyond the performance and scalability benefits, we found that there were also some fringe benefits to each of these products. Both of these NoSQL systems can either be embedded into ClearStone or run as isolated clusters managed by ECS across multiple servers. Lastly, both products are cloud-friendly, enabling ClearStone to be deployed in both traditional enterprise environments and cloud environments.

If you would like to learn more about our experience with either one of these NoSQL solutions, please come visit our Support Section and post your questions.

If you would like to try ClearStone, please join our Beta Program here.

By Bill Nigh

One of the major design decisions going into ClearStone 5.0 was the selection of Neo4j, an open source graph DB, as the data store for the most recently collected data.

Neo4j allows designers to store entities as nodes and to define relationships as edges between those nodes. Neo4j has a reference node, which serves as the starting point for navigation through the structures. Structures are not limited to hierarchical topology. Arbitrary relationships can be defined and discovered.

Neo4j allows ClearStone to store events in association with entities that we consider resources, and more. A resource is any entity that we collect information on that can be qualified by properties. Neo4j allows ClearStone to capture more sophisticated correlations. ClearStone users can then ‘walk around’ and find second- and third-order relationships. Neo4j lets us express the relationships in a very convenient way. The DB supports a number of query backends. The one we’re using is Apache Lucene (a Java library for building search applications).

Neo4j is good for us not only because of relationships that we already expect in the data, such as when a threshold is violated, but also because we can apply other relationships, which could be generated through heuristics, to the data.

Relationships are not only free-form, but a relationship can itself have properties, thus allowing more sophisticated forms of correlation. “We’re just beginning” to work in this area, one of our developers pointed out. Look for even more advanced, multi-dimensional monitoring capabilities in future releases of ClearStone.
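To show what a relationship with its own properties buys you, here is a tiny property graph in plain Python. It is a sketch only and does not use the Neo4j API: the `Graph` class, the `CORRELATED_WITH` relationship type, and the `source` and `confidence` properties are invented for the example.

```python
class Graph:
    """Minimal property graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}            # node id -> properties
        self.relationships = []    # (start id, type, end id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, start, rel_type, end, **properties):
        self.relationships.append((start, rel_type, end, properties))

    def outgoing(self, start, rel_type=None, **where):
        """Yield (end id, properties) for relationships leaving `start` that
        match the optional type and every given property value."""
        for s, t, e, props in self.relationships:
            if s != start or (rel_type and t != rel_type):
                continue
            if all(props.get(k) == v for k, v in where.items()):
                yield e, props


g = Graph()
g.add_node("event-17", kind="threshold_violation")
g.add_node("jvm-1", type="process")
g.add_node("host-1", type="host")
g.relate("event-17", "CORRELATED_WITH", "jvm-1", source="observed")
g.relate("event-17", "CORRELATED_WITH", "host-1", source="heuristic", confidence=0.6)

heuristic_links = [end for end, _ in g.outgoing("event-17", source="heuristic")]
```

Because the property lives on the relationship rather than on either node, the same event can be linked to one resource by direct observation and to another by a heuristic, and a query can tell the two apart.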

The bottom line: By using Neo4j, ClearStone 5.0 is able to correlate application events with a very rich set of metrics.

One implication is the ability to traverse from one resource to another within the UI. At present, one can already see charts from various parts of the multi-tier enterprise ecology, but being able to ‘follow your nose’ for this type of analysis will echo the natural curiosity of an interested troubleshooter.

Ivan Ho, our CTO, gave an example of one of the use cases that has been part of the design of ClearStone: Suppose you’re interested in a particular process and you want to know where it’s running. When you inspect the host, you might want to see what other resources the host depends on—a database for example—or what other processes are also running on the host, or you might want to assess the host’s health overall.

Using ClearStone’s ability to traverse data (stored in Neo4j), you could traverse the processes running on the host, and make an assessment as to the cause of the host’s poor health. This type of rich inspection requires the ability to explore different visualizations of relationships, and Neo4j makes this type of exploration eminently feasible.
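The "start at a process, look at its host, then see what else is there" walk can be sketched as a breadth-first traversal over a small adjacency list. This is plain Python rather than Neo4j's traversal API, and the resource names and relationship types (`RUNS_ON`, `HOSTS`, `DEPENDS_ON`) are hypothetical.

```python
from collections import deque

# Adjacency list: resource -> list of (relationship type, neighbour).
# Relationships are stored in both directions so a walk can go either way.
edges = {
    "jvm-1":       [("RUNS_ON", "host-1")],
    "cassandra-1": [("RUNS_ON", "host-1")],
    "host-1":      [("HOSTS", "jvm-1"), ("HOSTS", "cassandra-1"),
                    ("DEPENDS_ON", "postgres-1")],
    "postgres-1":  [],
}


def neighbours_within(start, max_depth):
    """Breadth-first walk returning {resource: distance} for everything
    reachable from `start` in at most max_depth hops (start excluded)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_depth:
            continue   # do not expand beyond the requested depth
        for _rel, neighbour in edges.get(current, []):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


first_order = neighbours_within("jvm-1", 1)
second_order = neighbours_within("jvm-1", 2)
```

One hop from the process gives only its host. Two hops bring in the other process on that host and the database the host depends on, which is the second-order view a troubleshooter would want next.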

Interested in ClearStone 5.0? Sign up for the Beta program here.