
The views expressed in this blog are strictly personal, and do not necessarily represent the views of Evident Software.

By Bill Nigh

Part of what a public cloud delivers is freedom from concern about performance. Properly defined SLAs and indemnification mean peace of mind for administrators (granting how relative the term ‘peace of mind’ can be in the IT world :) ).

The quality and extent of cloud server monitoring vary from one public cloud provider to the next; some may offer third party tools, or tools from the provider, but you get what you get.

Cloud performance monitoring possibilities and options change with the move to private cloud technology, where the technology behind such giants as Amazon, Rackspace, GoGrid and others is licensed for use within corporate data centers. Optimal configuration of all the elements of private cloud monitoring may be tricky; documentation may be outdated and sketchy; functionality may be black-boxed and constrained to such an extent that you may decide it is not worth the candle to license and use it.

A cloud server is, ultimately, a physical device with processor and memory constraints and characteristics. Utilities such as Unix ‘sar’ for CPU monitoring, which Wikipedia describes as a “Solaris-derived system monitor command”, can instrument physical resources in pursuit of complete and holistic stats, perhaps eventually married to heuristics for better cloud server management.

It is also important to consider that the private cloud infrastructure is not written on a tabula rasa. As one of our engineers, John Clark, mentioned in a recent conversation, there will also be numerous technologies already running inside the firewall, and probably several monitoring tools. “Eyeball correlation” is the term given to the practice, or rather the need, of looking at a number of consoles, logs and graphs to see the whole story and trying to discern patterns across non-correlated, heterogeneous visualizations with incompatible scaling or visualization metaphors.

Learn more about our performance monitoring solution for NoSQL, web apps and web servers: http://www.evidentsoftware.com

By Ivan Ho

The Problem

Customers and partners have requested a monitoring interface that allows them to deliver or collect third party or proprietary application or system data and events to Evident ClearStone. These data feeds may complement the existing Evident ClearStone Management Packs, adding tremendous value by consolidating and correlating data across disparate applications and systems. The interface had to be simple for developers and system administrators to use, with minimal dependencies and minimal knowledge of ClearStone internals.

The expectations for any third party data delivered to ClearStone are the following:

  1. Support data from thousands of resources with tens of thousands of metrics with data granularity ranging from 5 seconds to hourly and beyond.
  2. Support for data delivery via cloud friendly protocols (i.e. HTTP)
  3. Consolidation of data into a high performance scalable data store without requiring RDBMS. Data must be resilient, partitioned, and replicated.
  4. Aggregation / roll-up of the data over time.
  5. Allow administrators to set SLAs and fine-grain alerting on any metric for any resource.
  6. Allow users to quickly monitor the data using any of the built-in visualization features.
  7. Allow users to visualize performance trends in real-time or historically.
  8. Provide the ability for enhancing correlations by events, metrics, resources, and time.

The Solution

With the release of Evident ClearStone 5, the product has been opened up to support any third party data. This is accomplished via Evident ClearStone’s new REST-Open Data Interface (REST-ODI). This feature allows developers to use their programming language of preference (Java, Ruby, Python, Perl, PHP, etc.) to publish time-series data and events to Evident ClearStone. System, network, and database administrators can use the same interface, with minimal scripting effort, to publish performance data and events from legacy or proprietary system tools, applications, scripts, and logs. Partners can also use REST-ODI to submit performance metrics and events.

What we’ve done with the REST-ODI feature is the following:

  1. Provided an open external interface for third party applications to publish (push-only) arbitrary performance data/statistics for monitoring within ClearStone.
  2. The supported data formats consist of tabular (text delimited) and key-value formats.
  3. The external interface is “API-less”. The only requirement is to conform to a particular XML format for delivering data model/schema metadata and data to ClearStone.
  4. Data is delivered via HTTP.
  5. Upon receipt of data, ClearStone will perform basic pass-through processing (i.e. storage, consolidation, aggregation, threshold evaluation, notification, and visualization).

Use Case

Let’s have a look at a use case based on using the sar command for system monitoring of Linux/UNIX systems. This is one of the first tools administrators reach for when investigating server or application performance problems. The sar command gathers system activity information such as CPU utilization, memory paging, network I/O, process creation activity, block device activity, interrupts, etc.

So, how do we take the following sar output for CPU utilization and send it to Evident ClearStone?

[evident@eng-x64-20 ivan]$ sar -P ALL 60 1
Linux 2.6.9-67.ELsmp (eng-x64-20.evidentsoftware.com) 02/14/2011 _x86_64_ (4 CPU)

08:37:43 AM CPU %user %nice %system %iowait %steal %idle
08:38:43 AM all 52.45 0.00 0.15 0.03 0.00 47.37
08:38:43 AM 0 5.60 0.00 0.33 0.05 0.00 94.02
08:38:43 AM 1 4.20 0.00 0.27 0.05 0.00 95.48
08:38:43 AM 2 100.00 0.00 0.00 0.00 0.00 0.00
08:38:43 AM 3 99.98 0.00 0.02 0.00 0.00 0.00

Average: CPU %user %nice %system %iowait %steal %idle
Average: all 52.45 0.00 0.15 0.03 0.00 47.37
Average: 0 5.60 0.00 0.33 0.05 0.00 94.02
Average: 1 4.20 0.00 0.27 0.05 0.00 95.48
Average: 2 100.00 0.00 0.00 0.00 0.00 0.00
Average: 3 99.98 0.00 0.02 0.00 0.00 0.00

Based on this output we need to come up with a “schema” that will help Evident ClearStone process this content. Then we need a script that will automate the delivery of this data on a recurring basis. Here’s what we can visually analyze with this output:

  • The format is tabular (delimited by multiple contiguous space characters).
  • The fields we’re interested in are: time, CPU #, %user, %nice, %system, %iowait, %steal, and %idle.
  • The rows we’re interested in are the ones with CPU 0 thru 3.
  • We also need to know about the hostname/resource itself.
  • The observation interval is 60 seconds.

The following steps illustrate an abbreviated process for constructing the schema, publishing it, scripting the monitoring, and publishing the data to Evident ClearStone.

Step 1. Construct the schema to describe the data

The schema for this data is defined in the following XML. For details on how to create this, please consult this article on the support site.


<?xml version="1.0" encoding="UTF-8"?>
<model format="tabular" category="Linux" type="SAR_CPU">
  <key>${Host} CPU#${CPU}</key>
  <columns>
    <column name="Timestamp" description="record timestamp" type="epoch" timestamp="true" format="sec" dynamic="${TIMESTAMP}"/>
    <column name="Host" description="Refers to the origin of this data" type="string" dynamic="${RESOURCE}" grouping="true"/>
    <column position="2" name="CPU" description="All CPUs" type="string" grouping="true"/>
    <column position="3" name="user" display_label="User%" description="Percentage of CPU utilization that occurred while executing at the user level" type="float" units="%" metric="true"/>
    <column position="4" name="nice" display_label="Nice%" description="Percentage of CPU utilization that occurred while executing at the user level with nice priority" type="float" units="%" metric="true"/>
    <column position="5" name="system" display_label="System%" description="Percentage of CPU utilization that occurred while executing at the system level (kernel)" type="float" units="%" metric="true"/>
    <column position="6" name="iowait" display_label="I/O Wait%" description="Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request" type="float" units="%" metric="true"/>
    <column position="7" name="steal" display_label="Steal%" description="Percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor" type="float" units="%" metric="true"/>
    <column position="8" name="idle" display_label="Idle%" description="Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request" type="float" units="%" metric="true"/>
  </columns>
</model>

You’ll notice the following in the XML:

  • The schema is named “SAR_CPU” and belongs to a category of data called “Linux”. The Linux category is used to group a set of schemas together.
  • All records require some unique identification, therefore the value of the key is based on the combination of the Host value and the CPU #.
  • In total, there will be nine fields for each SAR_CPU record.
  • The first field is Timestamp. We are not using the timestamp (08:38:43 AM) that appears in the first 2 columns of the sar output. That time value is missing a date reference. Therefore, we expect the actual timestamp of the sar collection to be provided via a dynamic ${TIMESTAMP} property (see later).
  • The second field is the Host. This is also not extracted from the sar output; it is supplied as the dynamic property ${RESOURCE} (see later). This field has grouping set to true to indicate that it is a key field.
  • The third field is the CPU #. This is also a key field, thus grouping is set to true.
  • The fourth field is the first metric field, for user%. The name used to identify this field is “user”. Names are restricted to alphanumeric characters; therefore the display label for the user interface is set to “User%”. The type is “float” for floating point values, and metric is set to true to indicate that this field is a metric.
  • The last five fields are similar to the user% field.
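The position attributes in the schema appear to be zero-based indexes into each row once it is split on runs of spaces. That reading is inferred from the sample data rather than taken from ClearStone documentation, so treat it as an assumption. A small sketch that prints each field of one sample row with such an index (show_positions is a hypothetical name):

```shell
#!/bin/sh
# Print every whitespace-separated field of a row with a zero-based index,
# for comparison with the position="N" attributes in the schema.
# Assumption: positions count from 0; this is inferred from the sample data.
show_positions() {
    awk '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i - 1, $i }'
}

# One data row from the sample sar output
echo '08:38:43 AM 2 100.00 0.00 0.00 0.00 0.00 0.00' | show_positions
```

Under this reading, index 2 holds the CPU number and index 8 the %idle value, which lines up with position="2" and position="8" in the schema, while indexes 0 and 1 (the time and AM/PM) are left unmapped.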

Step 2. Publish the schema

The next step is to publish the schema (sar_cpu.xml) to Evident ClearStone. One option is to use the curl command. The curl tool is used to transfer data from or to a server, using one of the supported protocols (HTTP, HTTPS, FTP, FTPS, GOPHER, DICT, TELNET, LDAP or FILE). The command is designed to work without user interaction on most operating systems.

Here’s an example of how to post data via HTTP:


cat sar_cpu.xml | curl -X POST -H 'Content-type: text/xml' --data-binary @- http://<clearstone_server>:8080/ecsserver/odi/model

Other options for publishing the schema are described here.
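When the publish step runs from a script, it may help to detect a failed request. The sketch below wraps the same curl call in a function and adds curl’s --fail option so that an HTTP error status produces a non-zero exit code. The helper names odi_url and publish_model are hypothetical, and how the ClearStone server reports errors is not described here, so reliance on HTTP status codes is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers; only the URL layout is taken from the article.
odi_url() {
    # $1 = ClearStone server host, $2 = "model" or "data"
    printf 'http://%s:8080/ecsserver/odi/%s\n' "$1" "$2"
}

publish_model() {
    # $1 = ClearStone server host, $2 = schema file
    # --fail makes curl exit non-zero when the server answers with an HTTP error
    curl --silent --show-error --fail -X POST \
        -H 'Content-type: text/xml' \
        --data-binary @"$2" "$(odi_url "$1" model)" \
        || { echo "publishing $2 failed" >&2; return 1; }
}

# Usage (the server name is a placeholder):
#   publish_model clearstone.example.com sar_cpu.xml
```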

Step 3. Create script

Let’s turn our attention to formatting and encapsulating the sar output in XML. Here’s a simplified shell script that produces the XML content required for ODI. The script constructs the required <data> element and replaces the variables (i.e. $RESOURCE_NAME, $CATEGORY, $TYPE_MODEL, $COLLECTION_TIME, $INTERVAL) with values set in the script. The output of sar -P ALL $INTERVAL 1 | grep -v -E 'Average|all' is wrapped within a CDATA segment as-is.


#!/bin/sh
# Wrap one interval of per-CPU sar output in the XML envelope required for ODI.

CATEGORY="Linux"
TYPE_MODEL="SAR_CPU"
RESOURCE_NAME=$(hostname -f)
COLLECTION_TIME=$(date +%s)   # seconds since epoch
INTERVAL=60                   # observation interval in seconds

echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
echo "<data resource=\"$RESOURCE_NAME\" category=\"$CATEGORY\" type=\"$TYPE_MODEL\" delimiter=\" \" timestamp=\"$COLLECTION_TIME\" time_format=\"epoch_sec\" startrow=\"5\" obs_interval=\"$INTERVAL\" collapse_delimiter=\"true\">"
echo "<property attribute=\"Environment\" value=\"QA\"/>"
echo "<property attribute=\"Host\" value=\"\${RESOURCE}\"/>"
echo "<![CDATA["
# Drop the "Average:" summary rows and the "all" aggregate row
sar -P ALL "$INTERVAL" 1 | grep -v -E 'Average|all'
echo "]]>"
echo "</data>"

The results will look like:

<?xml version="1.0" encoding="UTF-8"?>
<data resource="eng-x64-20.evidentsoftware.com" category="Linux" type="SAR_CPU" delimiter=" " timestamp="1295976723" time_format="epoch_sec" startrow="5" obs_interval="60" collapse_delimiter="true">
<property attribute="Environment" value="QA"/>
<property attribute="Host" value="${RESOURCE}"/>
<![CDATA[
Linux 2.6.9-67.ELsmp (eng-x64-20.evidentsoftware.com) 02/14/2011 _x86_64_ (4 CPU)

08:37:43 AM CPU %user %nice %system %iowait %steal %idle
08:38:43 AM 0 5.60 0.00 0.33 0.05 0.00 94.02
08:38:43 AM 1 4.20 0.00 0.27 0.05 0.00 95.48
08:38:43 AM 2 100.00 0.00 0.00 0.00 0.00 0.00
08:38:43 AM 3 99.98 0.00 0.02 0.00 0.00 0.00
]]>
</data>

You’ll notice the following in this output:

  • The referenced schema is SAR_CPU in the Linux category (category="Linux" type="SAR_CPU").
  • The delimiter for this content is a contiguous set of space characters (delimiter=" " collapse_delimiter="true").
  • The specified resource is based on hostname detection. This is referenced by the Host data property.
  • For demonstration, another static data property for Environment is also included (<property attribute="Environment" value="QA"/>).
  • The timestamp is set for the entire dataset based on the current system time. The format is in seconds since epoch (timestamp="1295976723" time_format="epoch_sec").
  • Even though we’re passing in all the sar output within the CDATA segment, this data will be parsed based on the schema defined above.
  • This demo script only collects a single interval of sar CPU metrics. Additional scripting is required to fully automate this.
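As a rough idea of the extra scripting the last point refers to, the collection can be repeated in a loop. The sketch below is not the complete implementation; COLLECT_CMD, PUBLISH_CMD and run_cycles are hypothetical names, and the defaults assume the script above is saved as ./sar_cpu.sh.

```shell
#!/bin/sh
# Run a collect-and-publish cycle repeatedly.
# COLLECT_CMD and PUBLISH_CMD are hypothetical variables for this sketch.
COLLECT_CMD=${COLLECT_CMD:-./sar_cpu.sh}
PUBLISH_CMD=${PUBLISH_CMD:-cat}   # replace with the curl command used for publishing

run_cycles() {
    # $1 = number of cycles to run; 0 means run until interrupted
    n=0
    while [ "$1" -eq 0 ] || [ "$n" -lt "$1" ]; do
        # sar blocks for the whole observation interval, so a successful
        # cycle needs no extra sleep; pause briefly after a failure only.
        $COLLECT_CMD | $PUBLISH_CMD || { echo "cycle $n failed" >&2; sleep 5; }
        n=$((n + 1))
    done
}

# Usage: run_cycles 0
```

Scheduling the single-interval script from cron is another option; the loop simply keeps everything in one process.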

Step 4. Publish the results

The last step is to deliver the XML content to the Evident ClearStone server. Here’s an example of how to use curl to publish the sar data to Evident ClearStone.


./sar_cpu.sh | curl -X POST -H 'Content-type: text/xml' --data-binary @- http://<clearstone_server>:8080/ecsserver/odi/data

For the complete implementation of system monitoring with sar, please refer to the Linux: Integrating SAR with Evident ClearStone article. This article contains the schemas and automated scripts for sar monitoring.