I was recently invited to a webinar on Hadoop and NoSQL by Impetus, a self-styled “Big Data Services” company. Let me start with compliments to the Impetus team for their very professional delivery. It was evident that care had been taken in preparation, the timing and pacing were spot on, and all the speakers handled with aplomb the several handoffs, which were interspersed with poll questions. Sanjay Sharma, Technical Architect, and Gaurav Nigam, Module Lead, were the main speakers.
A video of the webinar will be available in a few days. For now, here are some preliminary thoughts, and what I, as a novice to Hadoop (and just one step removed from that status as a NoSQL guy), thought were the highlights.
The moderator started by describing Hadoop and NoSQL as ‘two game-changing technologies’. Hadoop is a framework, a set of MapReduce APIs on top of Java. Because it is not a radically new technology, Hadoop is not difficult for a developer to learn. Hadoop works in batch mode, something the Impetus speakers emphasized, as it affects how some activities, such as unit testing, should be approached. One tip: MapReduce jobs (‘mrjobs’) should be unit tested.
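To make the unit-testing tip concrete, here is a minimal sketch, assuming nothing beyond the Python standard library. It is not Hadoop's Java API and not what Impetus presented; the function names `map_words`, `reduce_counts`, and `run_local` are hypothetical. It models the classic word-count job as a pure map function, a pure reduce function, and an in-memory "shuffle" that groups values by key, so each step can be checked without a cluster or a batch run.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step (hypothetical): emit (word, 1) for every whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step (hypothetical): sum all partial counts emitted for one key."""
    return word, sum(counts)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate map -> shuffle (group values by key) -> reduce, all in memory."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            groups[key].append(value)
    return dict(reduce_counts(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    # Unit-test each step in isolation; no cluster or batch submission needed.
    assert list(map_words("Big data big")) == [("big", 1), ("data", 1), ("big", 1)]
    assert reduce_counts("big", [1, 1]) == ("big", 2)
    assert run_local(["big data", "Big deal"]) == {"big": 2, "data": 1, "deal": 1}
    print("all MapReduce unit tests passed")
```

Keeping the map and reduce logic as side-effect-free functions is what makes this kind of fast, local testing possible; in a real Hadoop project the same idea applies to the Mapper and Reducer classes, with the slow batch run on the cluster reserved for integration testing.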
The ease of transitioning to Hadoop cropped up at several other points, which is good news for harried IT shops under ‘performance pressure’; repurposing existing business logic, for example, was cited as one easy migration vector. Of course, some learning is required, and that raises non-technical questions, such as the cost of doing so. Identifying which projects lend themselves to the new game changers can be a challenge, and that is one place where Impetus comes in; the company offers services such as a deployment toolkit for Hadoop.
One question Impetus suggested project owners ask themselves: is the app compute intensive, which points toward a choice such as Erlang, or data intensive, which is where Hadoop enters the arena?
NoSQL, as most readers of the Evident blogs probably know, is an appropriately ‘elastic’ label stretched over a number of products and projects that fall into four categories: ColumnFamily, Graph, Key-Value (such as Memcached), and Document. Our ClearStone v5.0 product, for example, uses the Neo4j graph database and ColumnFamily champ Cassandra. The NoSQL world is characterized by high availability and amazing scalability, with interesting tradeoffs such as ‘eventually consistent’ data, since these stores generally do not provide full transactional guarantees.
One point emphasized: for most shops, the traditional development approach can be used for the Hadoop/NoSQL life cycle. As long as stakeholders understand MapReduce, the transition should be smooth.
Impetus recommended verification with a Proof of Concept (PoC), and offered free PoCs to a ‘select few’, based on submissions from attendees about their Big Data needs; a nice move, methinks.
Impetus claims “a strong focus and established thought leadership in the area of Big Data analytics and high performance computing” and offers a well-tested Global Delivery Model to help you evaluate and implement solutions tailored to your specific technical and business context.
