Articles tagged hadoop

The views expressed in this blog are strictly personal, and do not necessarily represent the views of Evident Software.

By Bill Nigh

I recently was invited to a webinar on Hadoop and NoSQL by Impetus, a self-styled “Big Data Services” company. Let me start off with compliments to the Impetus team for their very professional delivery. It was evident that care had been taken in preparation, the timing and pacing were spot on, and all speakers handled with aplomb the several handoffs that were interspersed with poll questions in the presentation. Sanjay Sharma, Technical Architect, and Gaurav Nigam, Module Lead, were the main speakers.

A video of the webinar will be available in a few days, but for now, some preliminary thoughts, and what I as a novice to Hadoop (and just one step removed from that status as a NoSQL guy) thought were highlights.

The moderator started by describing Hadoop and NoSQL as ‘two game-changing technologies’. Hadoop is a framework, a set of MapReduce APIs on top of Java. As such, not being a radically new technology, Hadoop is not difficult for a developer to learn. Hadoop works in batch mode, something that Impetus speakers emphasized, as it can impact how some activities, such as unit testing, are to be approached. One tip: MapReduce jobs should be unit tested.
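The unit-testing tip is easier to picture with a small sketch. The webinar showed no code that I can reproduce here, so what follows is only an illustration in plain Python rather than the Hadoop Java API, and every name in it (`map_words`, `reduce_counts`, `run_local`) is hypothetical. The idea it assumes is that map and reduce steps written as pure functions can be tested in memory, without waiting on a batch run against a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all counts emitted for one word."""
    return word, sum(counts)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """In-memory stand-in for the shuffle phase (an assumption, for tests only)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for word, count in map_words(line):
            grouped[word].append(count)
    return dict(reduce_counts(w, c) for w, c in grouped.items())
```

With this shape, `run_local(["big data", "Big deal"])` gives `{"big": 2, "data": 1, "deal": 1}`, and each step can also be checked on its own before the job is ever submitted in batch mode.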

The ease of transitioning to Hadoop was to crop up at several other points, good news for harried IT shops with ‘performance pressure’; repurposing existing business logic, for example, was cited as one easy migration vector. Of course, some learning is required, and that gets into non-tech areas, such as the cost of doing this. Identifying which projects lend themselves to the new game changers can be a challenge, and that is one place where Impetus comes in; Impetus offers services such as a deployment toolkit for Hadoop.

One question that Impetus suggested project owners ask themselves: is the app compute intensive, leading more toward a choice of, say, Erlang, or data intensive, where Hadoop enters the arena?

NoSQL, as most readers of the Evident blogs probably know, is an appropriately ‘elastic’ label stretched over a number of products/projects that fall into four categories: ColumnFamily, Graph, Key-Value (such as Memcached), and Document. Our ClearStone v5.0 product uses, for example, the Neo4j graph NoSQL product, and ColumnFamily champ Cassandra. The NoSQL world is characterized by high availability and amazing scalability, with interesting tradeoffs, such as ‘eventually consistent’ data, as the technology is not transactional.

One point emphasized: for most shops, the traditional development approach can be used for the Hadoop/NoSQL life cycle. As long as stakeholders understand MapReduce, there should be a smooth transition.

Impetus recommended verification with a Proof of Concept (PoC), and offered free PoCs to a ‘select few’, based on submissions from attendees about their Big Data needs, a nice move, methinks.

Impetus claims “a strong focus and established thought leadership in the area of Big Data analytics and high performance computing” and offers a well-tested Global Delivery Model to help you evaluate and implement solutions tailored to your specific technical and business context.

By Scott Barnett

Last month, we launched Evident ClearStone 4.5, which includes NoSQL logging and NoSQL reporting features. This software release marks an important milestone in the evolution of Evident Software. Over the summer, we made a decision to aggressively go after the NoSQL DB market, expanding our previous support for compute-grid technologies such as DataSynapse and application-grid technologies such as Oracle Coherence by offering NoSQL reporting, NoSQL logging, management and performance monitoring.

Why make this change of course? There were several reasons:

  1. NoSQL might not still be called “NoSQL” in a few years, but it absolutely will be an important technology for enterprise applications. Think back to how the Java Application Server came of age in the mid/late 1990s. That technology required several iterations to become the Java Application Server. The market took a few years to coalesce and turn into something that the broad industry could understand, market, and build around. Today NoSQL is going through a similar evolution. We’re just starting to see forecasts of the size of the NoSQL market. We suspect it won’t be called NoSQL a year from now (some other people seem to agree) – as technologies such as Hadoop, Data Caching Platforms such as Coherence, GemFire, Terracotta and hybrid in-memory databases such as VoltDB all vie for developer mind-share. Whatever it’s called, this is the “new” tier in the application stack, and it’s going to need focused and dedicated capabilities from a management/monitoring perspective, including NoSQL logging and NoSQL reporting. Here’s a great database (no pun intended) of systems that fall into the NoSQL realm.
  2. Correlating metrics and events between the NoSQL tier and the other existing tiers in the application (and system) stack will be key capabilities for monitoring and managing NoSQL applications. Each tier cannot continue to have its own NoSQL logging and monitoring capabilities – monitoring needs to be integrated, so enterprises can get a holistic view of their applications. This is a hard problem to solve. It’s also a valuable problem to solve. We are already solving this problem for the caching technologies I listed above. Now we want to continue extending this capability across the different tiers of the application stack.
  3. Visualization is the key to success in APM. When you are gathering so many metrics/events in real time, it’s a challenge to determine what is really important to DevOps. We’ve been told we’ve done a great job of figuring this one out – our user interface is intuitive, attractive, and meaningful. Making sense of all that data is hard to do. Without good visualization, you have lots of great data with no insight. You need insight to make good decisions.
  4. Our goal is to support every NoSQL system out there. Meeting this goal requires a change of strategy – so you will see us open up our platform so that people can build and deploy their own “Management Packs.” We currently have Management Packs for DataSynapse GridServer, Oracle Coherence, Apache Cassandra, Memcached (with Membase coming very soon), WebLogic Server, and JBoss. We are working on many more, but we want to move even faster. So you will see a Management Pack framework that allows developers to build their own Management Packs (we can help you too!). It is not hard to do this, and we will roll out a developer site shortly for people to share/collaborate/contribute. We will start by contributing our own management packs to the site.

So, 4.5 is the next step in our evolution and a hearty step forward in our embrace of all things NoSQL as the latest, greatest participant in the application stack. From our conversations with customers and prospects over the past few months, we know many of you agree with our vision of NoSQL reporting, monitoring and management. We look forward to working with you on this initiative in the months and years ahead. We are very interested in your thoughts, ideas, and suggestions on how to continue this process, so please share your ideas with us!

By Scott Barnett

Several of the Evident team members had an opportunity to attend Hadoop World 2010 yesterday in NYC. The event was very well attended – reports have it that attendance went from 400 last year to over 900 this year. It’s hard for me to compare since I wasn’t at last year’s event, but I can report that this year’s event had excellent speakers and material, and the day flew by.

Anyone questioning the use of Hadoop in production environments would have been well served to hear some of this year’s talks. The Evident team covered practically every talk between us – I spent most of my time in the Grand Ballroom listening to customer case studies around Hadoop. There were the expected players (Twitter, AOL, eBay, Yahoo) but also some perhaps unexpected guests (Bank of America, Chicago Mercantile Exchange, GE, HP, Orbitz). From our biased perspective, it was great to hear that most of the users were looking for better ways to manage/monitor their Hadoop grids (as well as the overall infrastructure). Exactly what we wanted to hear!

While a handful of the talks were disappointing, most of them had very useful information and relevant statistics about the benefits and implementation of Hadoop. It isn’t all about massive scale – rather, the ability to create a simple elastic grid is a great reason to get started by itself – the fact that Hadoop can scale linearly is gravy. Tim O’Reilly had an interesting (if a bit meandering) keynote regarding the consequences of living in a world of data. Mike Olson from Cloudera kicked off the event and did a good job praising the Hadoop community and bashing Oracle. And the Cloudera team overall was quite good at keeping things flowing and making their presence known. The vendor kiosks were packed during the breaks, and I met several interesting folks throughout the day, one of whom was downloading ClearStone while we were talking – thanks!

I hope we can come back next year, perhaps as one of the sponsors of the event. It was exciting and full of energy.

By Scott Barnett

I attended the Boston Big Data Summit last week, which was extremely well attended and an excellent session. Moderated by Fred Holahan (who announced his new position as VP, Marketing at VoltDB, congrats!), the panel discussion included folks from 10gen (the “MongoDB guys”), Cloudera (the “Hadoop guys”), Infobright and VoltDB. (As an aside, it seems that Cloudera has done a great job branding their company around Hadoop, but 10gen seems to want to sit behind the Mongo brand?)

Anyway, the topic was real-world problems that each of the vendors’ solutions can address. Each vendor got 10 minutes to talk about a use case that was relevant to their solution. Without going into the details of each use case, what struck me was how similar the excitement of this space is to what happened in the mid-90s with Java Application Servers.

Questions from the audience focused on scalability and “hardening” of each solution – as well as limitations of each solution as it pertained to certain use cases. Each of the vendors tried to defer to the others with regard to specific use cases for specific products – I’m not sure how long that will last, as there is significant overlap of the solutions – while each one does focus on a specific area, it will be impossible for them not to explore expansion into other capabilities.

There was also a discussion regarding management/monitoring of applications written with these tools. Both Cloudera and VoltDB mentioned that they are releasing management functionality in the next releases of their respective products – this is a good sign, as no serious development shop is going to put applications in production without basic management, so it indicates that these vendors are getting these types of questions from their community. There was also an excellent comment from the audience regarding DevOps – that their operations team has a detailed checklist that development is required to fill out before their applications can be put in production – and part of the checklist is to identify what tools are needed to manage and monitor the application, and how those tools can interoperate with the “standards” already in place. This resonated with what we are seeing at Evident – where it is the developers and architects who are the initial users of ClearStone, and they then recommend production use of our tool to Operations once the applications are ready to go live. It also confirmed our belief that we need to continue to make ClearStone easier and more useful for developers during their build/deploy process.

Overall the session was excellent. Since it was my first Big Data summit and I didn’t know the “rules”, I recommend they publish the following guidelines (assuming they are all standard):

  • Dinner is served and it’s really good, so don’t snack beforehand :-)
  • Networking is primarily done before the event, and it ends later than stated, so if you have someplace else to be, get there a little early

By Scott Barnett

We’re pleased to bring you a new website chock full of great information. Evident ClearStone has evolved over the past 2 years and we’re excited for what the upcoming years will bring us. We’ve gone from management, monitoring and analytics around grid, to caching technologies, and now a broader scope that includes Java and big data.
