Evident ClearStone® Live (ECSL) is an application performance management platform for distributed applications, including applications that use distributed caching and NoSQL systems. ECSL can monitor and manage multiple tiers across an application stack. While monitoring the various application tiers (web, application server, caching, data storage, messaging, grid, system level, etc.), the product performs real-time aggregation, analysis, and correlation across disparate data sets to provide a consolidated view of the application environment. ECSL allows users to set up real-time monitoring policies for any metric collected or produced by the system.
Users can visualize performance data and events through the Real-Time Dashboard, for example:
- Application activity (e.g., # of orders pending, total orders booked)
- Application performance (e.g., web service response times, processing times)
- Capacity (e.g., available sessions, available worker threads, cache capacity, queue depth)
- JVM performance (e.g., CPU, heap utilization, garbage collection)
- Grid health, usage, and performance
Out of the box, the product supports remote JMX monitoring. Both standard and custom applications that are JMX-enabled can be monitored by ECSL. Customers can install pre-built ECSL Management Packs for standard applications (e.g., application servers, caching products, grid products) or customize their own.
Java Management Extensions (JMX) provides a standard interface for managing and monitoring Java-based applications, systems, devices, and service-oriented infrastructures. Most commercial and open source Java applications and middleware platforms can be monitored and managed using JMX. Middleware technologies expose performance metrics, operational controls, configuration, and events via a JMX interface. Home-grown systems and web applications can also be instrumented to provide monitoring and management information via JMX.
The instrumentation involves creating Managed Beans (MBeans) that provide an interface for accessing the attributes and operations of a managed resource. MBeans are small Java classes that a developer implements within the application. Once the application registers its MBeans with the integrated MBean Server, the JMX framework makes the resource available for monitoring and management.
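A minimal sketch of such instrumentation, using only the standard `javax.management` API. The `OrderStats` bean, its attributes, and the `com.example.orders` ObjectName are hypothetical, chosen to mirror the order metrics mentioned earlier. Note that the bean only becomes visible to JMX clients after it is registered with an MBean Server.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class OrderInstrumentation {

    // Standard MBean convention: the public management interface is named
    // <ImplementationClass>MBean. Getters declared here become read-only
    // attributes; other methods declared here become operations.
    public interface OrderStatsMBean {
        long getPendingOrders();
        long getBookedOrders();
        void reset();
    }

    // Hypothetical business metrics, updated by application code.
    public static class OrderStats implements OrderStatsMBean {
        private final AtomicLong pending = new AtomicLong();
        private final AtomicLong booked = new AtomicLong();

        public void orderReceived() {
            pending.incrementAndGet();
        }

        public void orderBooked() {
            pending.decrementAndGet();
            booked.incrementAndGet();
        }

        @Override public long getPendingOrders() { return pending.get(); }
        @Override public long getBookedOrders() { return booked.get(); }
        @Override public void reset() { pending.set(0); booked.set(0); }
    }

    // Registers the bean with the JVM's platform MBean server so any JMX client can see it.
    public static ObjectName register(OrderStats stats, String objectName) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(objectName);
        server.registerMBean(stats, name);
        return name;
    }

    public static void main(String[] args) throws Exception {
        OrderStats stats = new OrderStats();
        ObjectName name = register(stats, "com.example.orders:type=OrderStats");
        stats.orderReceived();
        stats.orderReceived();
        stats.orderBooked();

        // Read the attributes back through the MBean server, as a JMX tool would.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        System.out.println("PendingOrders=" + server.getAttribute(name, "PendingOrders")); // PendingOrders=1
        System.out.println("BookedOrders=" + server.getAttribute(name, "BookedOrders"));   // BookedOrders=1
    }
}
```

The naming convention (`OrderStats` implements `OrderStatsMBean`) is what lets the MBean Server derive the `PendingOrders` and `BookedOrders` attributes from the getters; `orderReceived()` and `orderBooked()` are not in the interface, so they stay internal to the application.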
The MBean Server itself is extremely lightweight and integrated into the Java Virtual Machine (JVM). By default, every JVM is instrumented with Platform MBeans, which expose many statistics and operations for monitoring and controlling the JVM. The server can be managed and configured remotely using many types of tools (JConsole, MC4J, other open source products, custom JMX monitors, or commercial products like ECSL).
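The Platform MBeans can also be read in-process through `java.lang.management.ManagementFactory`. The sketch below derives the kind of JVM metrics listed earlier (heap utilization, garbage collection counts, processor count); remote tools such as JConsole read the same data through ObjectNames like `java.lang:type=Memory`. The class and helper names are illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class PlatformMetrics {

    // Heap used as a percentage of the maximum heap, or -1 if no maximum is defined.
    public static double heapUtilizationPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max > 0 ? 100.0 * heap.getUsed() / max : -1;
    }

    // Total collections across all garbage collectors; -1 means "undefined" and is skipped.
    public static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.printf("heap utilization: %.1f%%%n", heapUtilizationPercent());
        System.out.println("GC collections so far: " + totalGcCount());
        System.out.println("available processors: "
                + ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors());
    }
}
```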
Since Java provides standard and open facilities for monitoring and managing an application, all enterprise Java developers should take advantage of JMX and instrument their applications with MBeans. Developers can track business-level metrics, application activity statistics, performance metrics, application state, etc. in their MBeans. This removes the need to build custom tools or proprietary interfaces to monitor or control internal application resources. Developers, architects, testers, and IT operations can then use generic third-party management tools to monitor and manage the application through a standard interface.
ECSL is designed for high-performance monitoring of multiple systems. Its underlying staged event-driven architecture is built for low-latency data processing and scale. Most of the core server components are extensible, including management, event handling, and custom business logic. Information is delivered through a rich internet application front end based on Adobe Flex. The front end enables users to create custom dashboards with aggregated views of business activity, application activity, and infrastructure utilization. The ECSL platform consists of the following:
- The Collection Framework consists of the Evident Collector and the optional Evident JMX Proxy Servlet. A single Collector can be configured to collect data from multiple distributed Java processes (JMX endpoints). The Collector supports JMX Remote, as well as HTTP collection through the optional Evident JMX Proxy Servlet for web-based Java applications. The Collection Framework also provides Log Monitoring via Log4j appenders. Depending on the system managed, the collection frequency can vary from 5 seconds to 300 seconds. Collectors can be deployed remotely, close to the source of the instrumentation.
Note: Although the majority of the data collection is performed via JMX for Java applications, the ECSL Collector is highly extensible to support other interfaces. Each management pack will include additional configurations or interface extensions that enable the ECSL Collector to utilize alternative interfaces.
- The Collector(s) stream the collected raw data and events into the Operational Cache ("OpCache"). The OpCache is a low-latency, elastic, distributed in-memory cache that retains all the real-time data and events within ECSL.
- When the Collectors stream new data into the OpCache, the Pipeline Engine consumes it immediately. The Pipeline Engine is responsible for enriching, aggregating, correlating, and analyzing the real-time data, and it uses a Complex Event Processing engine to perform this analysis. As a result of the analysis, the engine emits alerts and triggers external actions to automate, control, or configure the infrastructure or application. Through Event Handlers, customers can supply custom code that subscribes to event data and performs custom external operations. The processing of each data set is entirely configuration-driven, based on the information models and monitoring policies configured by users. As data is updated, aggregated, and removed, the Pipeline Engine stores the results in the OpCache for presentation.
- Once the post-processed data is stored, the Data Services framework delivers real-time updates to the user interfaces. If configured, the Data Services tier persists data to any JDBC-compliant database, the file system, or the ECSL Data Warehouse for historical analysis.
- The real-time monitoring user interface is known as the ECSL Real-Time Dashboard ("RTDashboard"). This Adobe Flex based user interface enables users to access pre-built dashboards or create new visualizations and dashboards. Evident Software may provide specialized real-time dashboards for specific products (e.g., Oracle Coherence). For historical data, customers use the ECSL ChronoGraph Builder and Viewer to build and view customizable historical visualizations.
- The Collector, Pipeline Engine, Data Services, and RTDashboard depend on configuration metadata known as an Information Model. An information model consists of a data dictionary that defines the structure of the data for collection, processing, storage, and presentation, together with the data transformation logic. Users can create new information models for other JMX data sets using the Information Model Builder.
The following architecture diagram illustrates how these components interact with each other.
JMX Data Collection
Managing real-time Java systems via Java Management Extensions (JMX) requires the target Java applications to be instrumented with Managed Beans (MBeans). The MBeans expose a standard interface for accessing the attributes and operations of a managed resource. Once these MBeans are registered, the JMX framework automatically makes the resource available for monitoring or management via the integrated MBean Server. Most commercial and open source Java products expose important application statistics, configuration, and management operations via JMX. Custom-built Java applications can provide business-level and application performance measurements through MBeans.
To collect JMX data with ECSL, customers deploy at least one Evident JMX Collector. A single Evident JMX Collector can be configured to collect from multiple JVMs. For best performance, Collectors should be deployed in the same network segment as the monitored environment. Each Collector depends on one or more data dictionaries that define what data to collect from the JVMs, and on a Collection Configuration file that defines which JVMs to collect from and how.
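As an illustration of the data-dictionary idea, the sketch below models one entry as an ObjectName pattern plus a list of attributes, and resolves it with the standard `queryNames` call. The `RecordType` shape and field names are assumptions for illustration, not ECSL's actual dictionary format, and the Collection Configuration side (which JVMs to contact) is left out. The local platform MBean server stands in for a monitored JVM.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class DictionaryDrivenCollection {

    // Hypothetical data-dictionary entry: which MBeans (by ObjectName pattern)
    // and which of their attributes make up one record type.
    public record RecordType(String name, String objectNamePattern, List<String> attributes) {}

    // Resolves the pattern against one JVM and reads the listed attributes from every match.
    public static List<Map<String, Object>> collect(MBeanServerConnection conn, RecordType type)
            throws Exception {
        List<Map<String, Object>> records = new ArrayList<>();
        for (ObjectName name : conn.queryNames(new ObjectName(type.objectNamePattern()), null)) {
            Map<String, Object> record = new LinkedHashMap<>();
            record.put("recordType", type.name());
            record.put("mbean", name.getCanonicalName());
            for (String attribute : type.attributes()) {
                record.put(attribute, conn.getAttribute(name, attribute));
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) throws Exception {
        RecordType gc = new RecordType("JvmGarbageCollector",
                "java.lang:type=GarbageCollector,*",
                List.of("CollectionCount", "CollectionTime"));
        // The local platform MBean server stands in for a connection to a monitored JVM.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        for (Map<String, Object> record : collect(conn, gc)) {
            System.out.println(record);
        }
    }
}
```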
ECSL uses a few strategies for collecting JMX data from remote JVMs:
- JMX Remote: the standard JMX Remote interface using the Remote Method Invocation (RMI) connector defined by the JMX Remote API. Any JVM with remote JMX enabled supports this protocol. It does not require any additional software on the monitored application; it only requires configuring a unique JMX management port on the JVM.
- JMX Local: for Java systems that can publish their MBeans to an MBeanServer hosted in another process. The Evident JMX Collector can host this MBeanServer, in which case the Collector acts as a local JMX client and accesses the MBean registry directly, without performing JMX queries over the network.
- JMX over HTTP: alternatively, JMX data can be collected over HTTP with the EvidentJMXProxyServlet, a specialized web application (“servlet”) provided by Evident Software. It can be deployed only into J2EE application containers, where it serves as a remote proxy that communicates over HTTP. Deploying the servlet is optional, but recommended for optimal JMX query performance and lower network overhead.
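The sketch below contrasts the JMX Remote and JMX Local modes using only the standard JMX Remote API. On the monitored JVM, the JDK's built-in agent is typically enabled with system properties such as `-Dcom.sun.management.jmxremote.port=9999`, with `com.sun.management.jmxremote.authenticate` and `com.sun.management.jmxremote.ssl` left enabled outside test environments. The host name, port, and class name are placeholders, and the platform MBean server stands in for the MBeanServer a Collector would host locally; it does not show how ECSL itself hosts that server.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxAccessModes {

    // Standard RMI connector URL served by the JDK's built-in JMX agent.
    public static JMXServiceURL serviceUrl(String host, int port) throws IOException {
        return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    // Collection logic written against MBeanServerConnection works in both modes:
    // a remote connector connection, or an in-process MBeanServer (a subinterface).
    public static long threadCount(MBeanServerConnection conn) throws Exception {
        Object value = conn.getAttribute(new ObjectName("java.lang:type=Threading"), "ThreadCount");
        return ((Number) value).longValue();
    }

    // JMX Remote: open an RMI connection, read one attribute, and close the connection.
    public static long remoteThreadCount(String host, int port) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            return threadCount(connector.getMBeanServerConnection());
        }
    }

    public static void main(String[] args) throws Exception {
        // JMX Local: query this process's MBean registry directly, with no network hop.
        System.out.println("local threads: " + threadCount(ManagementFactory.getPlatformMBeanServer()));

        // JMX Remote: only attempted when a real host and port are supplied.
        if (args.length == 2) {
            System.out.println("remote threads: "
                    + remoteThreadCount(args[0], Integer.parseInt(args[1])));
        } else {
            System.out.println("remote URL would be: " + serviceUrl("app-host", 9999));
        }
    }
}
```

Writing the collection logic against `MBeanServerConnection` keeps it independent of the access mode, since `MBeanServer` extends that interface.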
The Collectors perform data collection at configurable recurring intervals (e.g., 15, 30, 60, or 120 seconds). The appropriate collection frequency depends on the volume of data, the number of JVMs, and the amount of storage in the Evident ClearStone OpCache. The collected results are normalized and compressed for delivery to the ECSL OpCache.
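A minimal sketch of interval-based collection, assuming a hypothetical `Sample` record, a pipe-delimited normalized form, and GZIP as the compression; ECSL's actual wire format is not described here. The 5 to 300 second bounds mirror the collection-frequency range given for the Collection Framework.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.zip.GZIPOutputStream;

public class PollingCollector {

    // One collected value; this field layout is a hypothetical normalized form.
    public record Sample(long timestampMillis, String source, String metric, double value) {}

    // Flattens samples into a simple delimited text form (hypothetical format).
    public static String normalize(List<Sample> samples) {
        StringBuilder out = new StringBuilder();
        for (Sample s : samples) {
            out.append(s.timestampMillis()).append('|').append(s.source()).append('|')
               .append(s.metric()).append('|').append(s.value()).append('\n');
        }
        return out.toString();
    }

    // GZIP-compresses a normalized batch before it is shipped to the cache.
    public static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Polls at a fixed rate; rejects intervals outside the 5-300 second range.
    public static ScheduledExecutorService start(int intervalSeconds, Supplier<List<Sample>> poll) {
        if (intervalSeconds < 5 || intervalSeconds > 300) {
            throw new IllegalArgumentException("interval must be 5-300 seconds");
        }
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                byte[] batch = compress(normalize(poll.get()));
                System.out.println("shipping " + batch.length + " compressed bytes");
            } catch (IOException | RuntimeException e) {
                // Catching here keeps one failed poll from cancelling the periodic task.
                e.printStackTrace();
            }
        }, 0, intervalSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService scheduler = start(5, () -> List.of(
                new Sample(System.currentTimeMillis(), "jvm-1", "HeapUsedMB", 512.0)));
        Thread.sleep(1_000); // let the first poll (initial delay 0) run
        scheduler.shutdown(); // cancels the periodic task
    }
}
```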
Real-time Data Storage
The ECSL OpCache is a separate service that manages the embedded distributed cache in the product. It stores the raw and processed data throughout the system. ECSL automatically manages the elasticity and data retention properties of the OpCache across multiple ECSL Management Servers. The OpCache can be configured to retain data for a fixed number of hours (e.g., 24 hours), constrained by the physical memory available across one or more ECSL Management Servers. The achievable retention period therefore depends heavily on the data volume and collection frequency of the monitored environment.
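To make the dependence on volume and frequency concrete, the arithmetic below estimates OpCache memory for a hypothetical environment. The 100-bytes-per-sample figure is an assumption for illustration only; actual per-sample overhead depends on the cache implementation and record structure.

```java
public class RetentionEstimate {

    // Samples per hour: every JVM reports every metric once per interval.
    public static long samplesPerHour(int jvms, int metricsPerJvm, int intervalSeconds) {
        return (long) jvms * metricsPerJvm * (3600 / intervalSeconds);
    }

    // Memory needed to hold `hours` of samples at an assumed per-sample cost.
    public static long bytesForRetention(long samplesPerHour, int hours, int bytesPerSample) {
        return samplesPerHour * hours * bytesPerSample;
    }

    // Longest whole-hour retention that fits in a given memory budget.
    public static long maxRetentionHours(long budgetBytes, long samplesPerHour, int bytesPerSample) {
        return budgetBytes / (samplesPerHour * bytesPerSample);
    }

    public static void main(String[] args) {
        // Hypothetical environment: 50 JVMs, 200 metrics each, 15-second collection.
        long perHour = samplesPerHour(50, 200, 15);        // 2400000
        long bytes = bytesForRetention(perHour, 24, 100);  // 5760000000
        System.out.println(perHour + " samples/hour, " + bytes / 1_000_000 + " MB for 24 hours");
        System.out.println("hours that fit in 4 GB: "
                + maxRetentionHours(4_000_000_000L, perHour, 100)); // 16
    }
}
```

At these assumed numbers, 24 hours of retention needs about 5.76 GB, while a 4 GB budget holds 16 full hours. Collecting every 30 seconds instead halves the sample rate and roughly doubles the retention that fits in the same memory (33 hours).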
Pipeline Engine
The Pipeline Engine is responsible for transforming raw data into information and analytics that end users can consume. The engine is entirely data- and event-driven to support real-time data feeds. It embeds a Complex Event Processing (CEP) engine called “Esper”, which enables the product to perform simple and complex data analytics and transformations. A Pipeline Server can process multiple real-time data feeds concurrently. The pipeline logic is driven by a set of rules and configurations in each Information Model.
As the Collectors publish new data or events into the ECSL OpCache, the Pipeline Server starts new pipeline threads to process the various record types from the OpCache concurrently. Each record type undergoes a series of transformations based on the business rules defined in the Information Model; each such workflow is known as a pipeline. The simplest pipeline configuration is a straight pass-through of the real-time data without any significant transformation. In that case, the Pipeline Server only stores the data in the ECSL OpCache for presentation.
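The idea of a pipeline as an ordered series of stages applied to each record can be sketched with plain Java function composition. This is neither Esper nor ECSL's configuration-driven engine; the record fields, the `orders-cluster` tag, and the stage names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class PipelineSketch {

    // A record is a bag of named fields; each stage transforms one record into another.
    public static Map<String, Object> run(List<UnaryOperator<Map<String, Object>>> stages,
                                          Map<String, Object> record) {
        Map<String, Object> current = record;
        for (UnaryOperator<Map<String, Object>> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> raw = Map.of("host", "app-1", "heapUsedBytes", 536_870_912L);

        // Pass-through pipeline: no stages, so the record is stored as-is.
        System.out.println(run(List.of(), raw));

        // Two-stage pipeline: enrich with a cluster tag, then scale bytes to megabytes.
        UnaryOperator<Map<String, Object>> enrich = r -> {
            Map<String, Object> out = new HashMap<>(r);
            out.put("cluster", "orders-cluster");
            return out;
        };
        UnaryOperator<Map<String, Object>> scale = r -> {
            Map<String, Object> out = new HashMap<>(r);
            out.put("heapUsedMB", ((Long) r.get("heapUsedBytes")) / (1024.0 * 1024.0));
            return out;
        };
        System.out.println(run(List.of(enrich, scale), raw));
    }
}
```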
Other pipeline configurations have multiple stages of processing for a record type. As the Pipeline Server goes through each stage, it can produce new record types for consumption or feed them back into the Pipeline Server for additional upstream processing. Additional stages can compute new records from mathematical computations on the metrics or from string transformations. Basic metric calculations include:
- Scale metric values
- Compute metrics across multiple records, fields, and data sets.
- Calculate deltas, sum, average, min, max, median, etc.
- Calculate a rate/throughput value
- Custom mathematical expressions for computing utilization and/or capacity
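A few of the basic calculations listed above, written as small Java functions. The counter-reset rule in `delta` (treating a decrease as a restart from zero) is an assumption of this sketch, as are the sample values.

```java
public class MetricMath {

    // Delta between two samples of a monotonically increasing counter.
    // Assumption: a decrease means the counter was reset (e.g., a JVM restart).
    public static long delta(long previous, long current) {
        return current >= previous ? current - previous : current;
    }

    // Rate per second over the interval between two counter samples.
    public static double ratePerSecond(long previous, long current, long elapsedMillis) {
        return elapsedMillis > 0 ? delta(previous, current) * 1000.0 / elapsedMillis : 0.0;
    }

    // Scale a raw value, e.g., bytes to megabytes.
    public static double scale(double value, double factor) {
        return value * factor;
    }

    // Utilization as a percentage of capacity.
    public static double utilizationPercent(double used, double capacity) {
        return capacity > 0 ? 100.0 * used / capacity : 0.0;
    }

    public static void main(String[] args) {
        // Hypothetical samples of a request counter taken 15 seconds apart.
        System.out.println(ratePerSecond(12_000, 12_450, 15_000));   // 30.0
        System.out.println(scale(268_435_456, 1.0 / (1024 * 1024))); // 256.0
        System.out.println(utilizationPercent(150, 200));            // 75.0
    }
}
```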
The pipelines can be instructed to perform real-time aggregations and grouping of raw or derived data. This is especially useful when users want to obtain holistic performance and capacity views across a clustered environment, such as an application deployed across a cluster of application servers, grid infrastructures, web services, etc.
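Cluster-level aggregation amounts to grouping per-node records by a cluster key and summarizing each group. The sketch uses `Collectors.groupingBy` with `summarizingDouble`; the `NodeStats` record and its session metric are hypothetical.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ClusterAggregation {

    // One per-JVM record; the field names are hypothetical.
    public record NodeStats(String cluster, String node, int activeSessions) {}

    // Groups node records by cluster and summarizes sessions (count, sum, min, average, max).
    public static Map<String, DoubleSummaryStatistics> byCluster(List<NodeStats> nodes) {
        return nodes.stream().collect(Collectors.groupingBy(
                NodeStats::cluster, TreeMap::new,
                Collectors.summarizingDouble(NodeStats::activeSessions)));
    }

    public static void main(String[] args) {
        List<NodeStats> nodes = List.of(
                new NodeStats("web", "web-1", 120),
                new NodeStats("web", "web-2", 80),
                new NodeStats("grid", "grid-1", 15));
        byCluster(nodes).forEach((cluster, s) -> System.out.printf(
                "%s: total=%.0f avg=%.1f max=%.0f%n", cluster, s.getSum(), s.getAverage(), s.getMax()));
    }
}
```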
Customers can define and configure threshold policies on any collected or derived metric. As the Pipeline Engine processes data, it evaluates the data to determine whether any thresholds were breached; if so, it generates threshold violation events. These violations can trigger notifications (e.g., SMTP, SNMP), log events to a file, or trigger external actions.
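A minimal sketch of threshold evaluation, assuming a hypothetical `Policy` record with an above/below operator and a severity. Each processed record is checked against the policies, and every breach produces a `Violation` event that a real deployment would route to notifications or external actions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ThresholdPolicies {

    public enum Operator { ABOVE, BELOW }

    // Hypothetical policy: raise a violation when a metric crosses a limit.
    public record Policy(String metric, Operator operator, double limit, String severity) {
        boolean breachedBy(double value) {
            return operator == Operator.ABOVE ? value > limit : value < limit;
        }
    }

    // A threshold violation event produced by evaluation.
    public record Violation(String source, String metric, double value, Policy policy) {}

    // Evaluates one processed record against every policy whose metric it contains.
    public static List<Violation> evaluate(String source, Map<String, Double> metrics,
                                           List<Policy> policies) {
        List<Violation> violations = new ArrayList<>();
        for (Policy p : policies) {
            Double value = metrics.get(p.metric());
            if (value != null && p.breachedBy(value)) {
                violations.add(new Violation(source, p.metric(), value, p));
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<Policy> policies = List.of(
                new Policy("heapUtilizationPct", Operator.ABOVE, 85.0, "WARNING"),
                new Policy("availableThreads", Operator.BELOW, 5.0, "CRITICAL"));
        Map<String, Double> record = Map.of("heapUtilizationPct", 91.5, "availableThreads", 12.0);
        for (Violation v : evaluate("app-1", record, policies)) {
            // A real deployment would route this to SMTP/SNMP notifications or external actions.
            System.out.println(v.policy().severity() + ": " + v.metric() + "=" + v.value()
                    + " on " + v.source()); // WARNING: heapUtilizationPct=91.5 on app-1
        }
    }
}
```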
The pipeline logic and external actions can be extended through Pipeline Plugins and Event Handlers. Pipeline Plugins are primarily used for executing custom data-processing code. The Event Handler interface enables users to subscribe to events within the product, which can be used for customized notifications, configuration changes, remediation, logging, custom metric calculation, etc.
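The subscription model behind Event Handlers can be sketched as a small publish/subscribe bus keyed by event type. The `Event` record, `EventHandler` interface, and `EventBus` class here are hypothetical and do not reflect ECSL's actual plugin API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class EventHandlerSketch {

    // Hypothetical event shape and handler contract.
    public record Event(String type, String source, String message) {}

    public interface EventHandler {
        void onEvent(Event event);
    }

    // Routes each published event to the handlers subscribed to its type.
    public static class EventBus {
        private final Map<String, List<EventHandler>> handlers = new ConcurrentHashMap<>();

        public void subscribe(String eventType, EventHandler handler) {
            handlers.computeIfAbsent(eventType, t -> new CopyOnWriteArrayList<>()).add(handler);
        }

        public void publish(Event event) {
            for (EventHandler h : handlers.getOrDefault(event.type(), List.of())) {
                try {
                    h.onEvent(event);
                } catch (RuntimeException e) {
                    // A failing handler should not block delivery to the others.
                    System.err.println("handler failed: " + e.getMessage());
                }
            }
        }
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        List<Event> audit = new ArrayList<>();
        bus.subscribe("ThresholdViolation", audit::add); // custom logging
        bus.subscribe("ThresholdViolation", e -> System.out.println("notify ops: " + e.message()));
        bus.publish(new Event("ThresholdViolation", "app-1", "heap above 85%"));
        bus.publish(new Event("NodeJoined", "grid-3", "no subscribers, ignored"));
        System.out.println("audited events: " + audit.size()); // audited events: 1
    }
}
```

Isolating each handler call in its own `try` block means one misbehaving custom handler does not prevent the others from receiving the event.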
Real-Time Data Services
As new results are delivered to the ECSL OpCache from the ECSL Pipeline Server, the Data Services component publishes updates to the user interfaces (e.g., Real-Time Dashboard, Operational Console, ChronoGraph Builder/Viewer). The Data Services component serves as a bridge between the user interfaces and the ECSL OpCache. All the Real-Time Dashboards communicate directly with the Data Services component to retrieve data, configuration, and events.
Furthermore, customers have the option to store data persistently in an external relational database or on disk. For ChronoGraphs, the product stores summarized data in an Oracle database. The real-time data is summarized into 5-minute, hourly, and daily records to enable historical analysis over hours, days, weeks, and months. The data warehouse can store data for many months, and the data retention policies are configurable.
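Summarizing real-time samples into fixed windows can be sketched as bucketing each timestamp to the start of its window and computing statistics per bucket. The `Sample` record and the epoch-aligned windows are assumptions; ECSL's actual warehouse schema and summarization rules are not described here.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Rollups {

    public static final long FIVE_MINUTES = 5 * 60 * 1000L;
    public static final long ONE_HOUR = 60 * 60 * 1000L;

    public record Sample(long timestampMillis, double value) {}

    // Buckets samples into fixed, epoch-aligned windows and summarizes each window.
    public static Map<Long, DoubleSummaryStatistics> rollup(List<Sample> samples, long windowMillis) {
        Function<Sample, Long> windowStart =
                s -> s.timestampMillis() - Math.floorMod(s.timestampMillis(), windowMillis);
        return samples.stream().collect(Collectors.groupingBy(
                windowStart, TreeMap::new, Collectors.summarizingDouble(Sample::value)));
    }

    public static void main(String[] args) {
        // Hypothetical 15-second samples spanning 10 minutes.
        List<Sample> samples = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            samples.add(new Sample(i * 15_000L, 100 + i));
        }
        rollup(samples, FIVE_MINUTES).forEach((start, s) -> System.out.printf(
                "window %d: avg=%.1f min=%.0f max=%.0f n=%d%n",
                start, s.getAverage(), s.getMin(), s.getMax(), s.getCount()));
        // The same call with ONE_HOUR produces hourly records.
        System.out.println("hourly windows: " + rollup(samples, ONE_HOUR).size()); // hourly windows: 1
    }
}
```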
User Interfaces
The Real-Time Dashboard is the generic real-time user interface of ECSL. This application runs in web browsers configured with an Adobe Flash Player plugin. This user interface enables users to construct customized real-time visualizations and ad hoc dashboards to suit their needs. Users can create ‘perspectives’ that can be tailored to monitoring one or more real-time components within the infrastructure. A perspective can be saved and shared with other users. Each management pack may include additional pre-built perspectives for visualizing the monitored systems.
Users can tune the real-time data sets by setting the appropriate grouping, filtering, and scoping of the data. Grouping enables users to perform “group by” functions on the record types. Data can be filtered by attribute values (e.g., application name, process type). Scoping can significantly reduce the number of unique record instances, which helps remove “noise” while performing granular analysis and troubleshooting.
To complement the generic Real-Time Dashboard, the ChronoGraph Builder and Viewer are used for historical analytics. ECSL’s ChronoGraph feature includes standard historical “reports” for common Java systems (e.g., middleware systems, grids, JVMs). The ChronoGraph Builder is used to design new historical reports without knowledge of SQL or the data warehouse schema. This enables users to rapidly construct new historical analytics tailored to their environment.
Users who want to monitor additional MBeans from other Java processes can define a new Information Model using the Information Model Builder. The Builder guides users through building the structured data model and application logic for monitoring new MBeans.
Management Framework
ECSL supports management operations for distributed Java applications through the Management Framework. ECSL includes specialized and generic management plugins. These management plugins consist of annotated POJOs deployed within the Management Plugin Server. The annotated methods of the POJOs are exposed as MBean operations through JMX, which allows any ECSL component to remotely execute an MBean operation backed by these methods.
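The pattern of exposing annotated POJO methods as MBean operations can be sketched with a `DynamicMBean` that reflects over the annotated methods. The `@ManagedOperation` annotation, the `CachePlugin` class, and the ObjectName are hypothetical; ECSL's own annotations and Management Plugin Server are not shown.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.management.ManagementFactory;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.MBeanException;
import javax.management.MBeanInfo;
import javax.management.MBeanOperationInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.ReflectionException;

public class AnnotatedPluginDemo {

    // Hypothetical marker for plugin methods that should become MBean operations.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface ManagedOperation {
        String value() default "";
    }

    // Hypothetical management plugin: a plain object with one annotated method.
    public static class CachePlugin {
        @ManagedOperation("Clears the named cache")
        public String clearCache(String cacheName) {
            return "cleared " + cacheName;
        }

        public void internalHelper() { } // not annotated, so not exposed
    }

    // Wraps any POJO and exposes only its annotated public methods as JMX operations.
    public static class AnnotatedPojoMBean implements DynamicMBean {
        private final Object target;
        private final List<Method> operations = new ArrayList<>();

        public AnnotatedPojoMBean(Object target) {
            this.target = target;
            for (Method m : target.getClass().getMethods()) {
                if (m.isAnnotationPresent(ManagedOperation.class)) operations.add(m);
            }
        }

        @Override
        public MBeanInfo getMBeanInfo() {
            MBeanOperationInfo[] ops = operations.stream()
                    .map(m -> new MBeanOperationInfo(m.getAnnotation(ManagedOperation.class).value(), m))
                    .toArray(MBeanOperationInfo[]::new);
            return new MBeanInfo(target.getClass().getName(), "Annotated plugin", null, null, ops, null);
        }

        @Override
        public Object invoke(String name, Object[] params, String[] signature)
                throws MBeanException, ReflectionException {
            int argCount = params == null ? 0 : params.length;
            for (Method m : operations) {
                if (m.getName().equals(name) && m.getParameterCount() == argCount) {
                    try {
                        return m.invoke(target, params);
                    } catch (ReflectiveOperationException e) {
                        throw new ReflectionException(e);
                    }
                }
            }
            throw new ReflectionException(new NoSuchMethodException(name));
        }

        // Operations only: this sketch exposes no attributes.
        @Override public Object getAttribute(String n) throws AttributeNotFoundException {
            throw new AttributeNotFoundException(n);
        }
        @Override public void setAttribute(Attribute a) throws AttributeNotFoundException {
            throw new AttributeNotFoundException(a.getName());
        }
        @Override public AttributeList getAttributes(String[] names) { return new AttributeList(); }
        @Override public AttributeList setAttributes(AttributeList list) { return new AttributeList(); }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.example.plugins:type=CachePlugin");
        server.registerMBean(new AnnotatedPojoMBean(new CachePlugin()), name);
        Object result = server.invoke(name, "clearCache",
                new Object[] {"orders"}, new String[] {String.class.getName()});
        System.out.println(result); // cleared orders
    }
}
```

Because any JMX client can call `invoke` on the registered name, the same operation is reachable remotely over any of the JMX connectors described earlier.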
For Oracle Coherence, the product enables users to manage the data grid, browse cache data, read/write data, back up/restore caches, start/stop nodes, load/clear data, execute cluster-wide thread dumps, etc. Customers can make their pre-existing startup scripts available via a plugin to manage the environment. Furthermore, ECSL provides generic operations such as remote script execution via SSH and JMX MBean operations.
Users can extend the product with additional custom operations by leveraging existing scripts, code, etc., all without employing any proprietary APIs. This capability empowers users (developers, testers, operations) to incorporate common operational procedures into the product while monitoring the distributed environment. To learn how a management plugin is developed, see the sample management plugin project provided with the product installation.
Summary
In summary, ECSL’s core features are:
- Real-time metering of heterogeneous distributed systems (e.g., JVMs, application servers, custom web applications, distributed caches, grids).
- The platform is highly extensible to support non-standard Java systems and custom applications, all configurable by end users.
- Consolidation and aggregation of:
  - Middleware performance and utilization metrics (e.g., concurrent HTTP sessions, thread pools)
  - Custom application load and activity (e.g., web service response times)
  - Business activity (e.g., # of pending orders, # of orders executed)
  - Caching capacity
  - Storage utilization
- Display correlated business activity against the real-time infrastructure usage.
- Apply real-time performance and capacity SLAs across clusters.
- Provide the ability to manage distributed Java applications by using an extensible management framework.
- Provide the ability to trigger notifications and actions based on SLA violations or events.
- Ability to trend the real-time performance and activity with highly interactive and rich visualizations.
- Customizable real-time dashboards that can be tailored for different users.
- Historical analytics for capacity planning, usage patterns, and performance and activity comparison.