A while back we released a Memcached performance monitoring management pack that collects statistics from Memcached instances. The management pack gathers every stat you would see by issuing the “stats” command over a telnet session. By piping metrics from multiple Memcached instances into a single collector, ECS lets users create “logical” clusters of Memcached instances to monitor. This means that if your Memcached client hashes keys across 4 Memcached instances, you would configure ECS to collect stats from each of the 4 instances. Once the individual instance metric data is in ECS, it is also aggregated to a “cluster” level, so one can monitor the health of the Memcached stack as a whole and not just the instances individually.
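As a rough illustration of what collecting and rolling up these stats involves, the sketch below reads the reply to the “stats” command over a plain socket, parses the "STAT name value" lines, and sums a few counters across instances. It is not how ECS itself is implemented; the function names, the host list, and the choice of which counters to sum are assumptions made for the example.

```python
import socket

# Counters that make sense to add up across instances (an illustrative choice).
SUMMABLE = ("bytes", "limit_maxbytes", "curr_items",
            "get_hits", "get_misses", "evictions")


def fetch_stats_raw(host, port=11211, timeout=2.0):
    """Send the text-protocol "stats" command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats\r\n")
        reply = b""
        # The server terminates the stats listing with an "END" line.
        while not reply.endswith(b"END\r\n"):
            data = conn.recv(4096)
            if not data:
                break
            reply += data
    return reply.decode("ascii", errors="replace")


def parse_stats(raw):
    """Turn 'STAT name value' lines into a dict of name -> string value."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2].strip()
    return stats


def aggregate_cluster(per_instance):
    """Sum the selected counters over a list of per-instance stat dicts."""
    return {name: sum(int(s.get(name, 0)) for s in per_instance)
            for name in SUMMABLE}
```

With four instances, one would call parse_stats(fetch_stats_raw(host)) for each host (the host names are whatever your client is configured with) and pass the four dicts to aggregate_cluster to get a cluster-level view.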
For a developer, monitoring Memcached data in real time via ECS’s Real-Time Dashboard can be an ideal way to test different load scenarios on a set of Memcached instances and see how the instances react. This has helped one group fine-tune their compression methods for items being stored in a Memcached instance. For an operations team, however, it would be inefficient to employ someone to stare at the real-time feed of metric data and look for problems. Therefore, for a Memcached stack that is in production, ECS allows administrators to set up threshold detection on any metric that flows through its pipeline. This gives the operations team instant notification of problem areas and a way to keep track of KPIs.
For example, a common issue that arises with Memcached is knowing when to bring up more instances. Most teams need to bring up more instances when the free memory of a Memcached instance or group of instances is running out. With ECS, an administrator can configure a threshold to watch the “% Memory Free” metric (at either the cluster or the instance level). When the threshold is hit, emails are sent to notify the operations team. Knowing that memory is running low, the operations team can then start a new instance of Memcached to ensure that premature evictions do not occur and that requests continue to be served from the cache. Requests that miss the cache could mean increased response times for your website. Another common issue is a high number of missed keys. If a threshold is set and a notification arrives that there is a high number of missed keys, the operations (or development) team can be proactive and get immediate insight into the problem, rather than waiting for someone to complain that response times are slow somewhere higher up the stack or that data is missing from the cache.
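To make the two thresholds above concrete, here is a minimal sketch that derives a free-memory percentage and a miss ratio from raw Memcached stats and reports which limits are crossed. The formulas are one reasonable reading of the counters (limit_maxbytes, bytes, get_hits, get_misses), not necessarily how ECS computes “% Memory Free”, and the function names and default limits are hypothetical.

```python
def percent_memory_free(stats):
    """Share of the configured memory limit not used for item storage."""
    limit = int(stats["limit_maxbytes"])
    used = int(stats["bytes"])
    if limit <= 0:
        return 0.0
    return 100.0 * (limit - used) / limit


def miss_ratio(stats):
    """Fraction of get requests that missed.

    get_hits and get_misses are cumulative since the instance started, so a
    production check would more likely compare deltas between two samples.
    """
    hits = int(stats["get_hits"])
    misses = int(stats["get_misses"])
    total = hits + misses
    return misses / total if total else 0.0


def check_thresholds(stats, min_free_pct=10.0, max_miss_ratio=0.5):
    """Return a list of alert messages; the default limits are made up."""
    alerts = []
    free = percent_memory_free(stats)
    if free < min_free_pct:
        alerts.append("memory free %.1f%% is below %.1f%%" % (free, min_free_pct))
    ratio = miss_ratio(stats)
    if ratio > max_miss_ratio:
        alerts.append("miss ratio %.2f is above %.2f" % (ratio, max_miss_ratio))
    return alerts
```

The same functions work on a single instance's stats or on counters summed across a cluster, which mirrors the instance-level and cluster-level thresholds described above. Delivering the alerts by email is left out of the sketch.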
Having this type of visibility and responding to these issues before they become bigger problems is paramount to running a healthy cluster of Memcached instances. The Management Pack for Memcached helps users of Memcached be proactive about their environment. With the Real-Time Dashboard and threshold configuration, both the ops and dev teams get immediate visibility into problem areas and can quickly work out how to solve them.
Lastly, since Membase is built on top of Memcached, the Management Pack for Memcached can also be used to monitor a Membase environment. We are currently working on an extension to the Memcached management pack that will not only monitor the underlying Memcached instance but also collect information from Membase’s Management REST API (a RESTful service provided by Membase). Through the REST API, ECS will be able to collect information on Membase-specific architecture (pools, buckets, etc.), allowing for further understanding of the health of a Membase cluster.

An example of a threshold configuration that recognizes low cluster memory within a Memcached environment.

An example of a threshold configuration that recognizes when a specific key is being missed too often within a Memcached environment.
Read more about our solution for Memcached stats, monitoring and optimization