Wednesday, May 6, 2015

Hadoop Meetup on the sidelines of Strata Hadoop Conference - Part 1

Since I could not make it to the main conference (Strata Hadoop London 2015), the evening meet-ups were my consolation prize, a way to stay in touch as much as possible.

In my view, these conferences and events help one keep up with recent developments in the space. They mostly showcase what's being done with a given technology, what's coming up (future developments, innovations) and people's experiences with the technology, both good (the famous savings use cases) and bad (challenges faced in achieving production readiness, if ever).

Last evening, on day 1 of the conference, I ended up attending one of the meet-ups. It was particularly useful for me, for a couple of reasons.

There was a talk on the new execution engine for Hive, i.e. Hive running on Spark. Always keen on the internal workings of a complex piece of software (or hardware, for that matter), I was very happy to hear about it directly from the person responsible for much of the development on Hive. I have an audio recording of the whole talk, though I am hopeful that the conference organizers will put the video up on their website anyway.

Phill's talk about his experiences getting Hadoop on its feet, and how they orchestrated Hadoop as a PaaS within BT (they seem to call it HaaS there), was insightful. It showed me two things. First, architects always have to "find the funding" for innovations and new tech to be brought into the organization :) Second, security on Hadoop is "doable", as his use case proved. There are reliable tools and solutions that can help achieve enterprise-level security for a Hadoop cluster.

Another interesting talk was Dato's. Dato is a machine learning/modelling tool which claims to be faster than many others, allows the data to be consumed in place (like Hadoop) and supports HDFS integration. I am sure to follow up on Dato with the organization. For me this is one of the key problems of the future: when there is too much data, the modelling algorithm has to be able to consume its training sets in place, since it's just not practical to move terabytes or petabytes of data to where the program is. IBM BigR is doing something similar as well.

Finally, another interesting talk was from the Big Data Boards team. They talked about how they are building cluster hardware for hosting small Hadoop clusters. It is an interesting proposition: your own Hadoop cluster running on a desk in a corner of your office, with no need to go to the likes of AWS to host it. They say that many universities and similar institutions are already using the clusters they made for real-life experiments. A very interesting space for me.
