Thursday, June 14, 2012

Hadoop/Hive with Tableau


It was in 2010 that I first got a taste of Hadoop/Hive. Back then I was using Hadoop 0.20 for a proof of concept for a customer who wanted to see whether Hadoop could be a solution for their problems.

Since then, I have been reading up on and following the changes in the Hadoop world, and tweaking things here and there on my home installation. Today, I tried running Hive on top of Hadoop (without HBase; Hive with HBase will be the next experiment) to see how I could get the two playing nicely together.

The experience was awesome as usual, and it reinforces my belief that the Hadoop ecosystem has a huge role to play in the computing industry of tomorrow.

The analytical potential of the volumes of data managed by Hadoop-style systems keeps growing, hence the interest from many instant BI players in providing access to the data behind Hadoop.

One such player is the instant dashboard tool Tableau. They have announced that Tableau 7 will be able to read data directly from Hive environments.

In real life it was a bit of a challenge, but what's the fun if there is no challenge? In a nutshell, it does work, no doubt. However, the configuration and administration it requires can be tricky.

1. You have to install the Hive driver (available from Tableau's website: http://www.tableausoftware.com/support/drivers).

2. You have to launch Hive in a particular way, as a service (hive --service hiveserver). Also, Hive on a pseudo-cluster allows only one connection at a time, since the metadata store on Derby is single-user. As a result, while Tableau is connected, nothing else can access Hive, not even the command-line interface.

3. Remember that each addition or change to the data set in the Tableau interface triggers a map-reduce job on the Hive cluster/server, and that Hadoop/Hive are not really meant to be fast, responsive systems. Therefore, expect long delays in fulfilling your drag-and-drop requests.

4. There might/will be additional trouble aggregating certain types of data, since the data types in Hive might not be additive in the way the front end expects them to be.
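
Before pointing Tableau at Hive, it can help to confirm that the Hive service from step 2 is actually listening. The following is only a minimal sketch in Python, not part of the Tableau or Hive tooling: the function name hive_server_reachable is made up for illustration, and the host and port (localhost, 10000) are assumptions about a default local setup that you should adjust to your own. A successful TCP connection only shows that something is listening on that port, not that Hive is healthy or free to accept a new session.

```python
import socket


def hive_server_reachable(host="localhost", port=10000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    The default host and port are assumptions about a local Hive service;
    change them to match your installation.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling hive_server_reachable() after starting the service returns True if the port accepts connections and False otherwise, which is a quick way to rule out a service that never started before debugging the driver setup.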

All in all, it wins me over with the ease of use it provides for accessing the data behind the Hadoop environment. However, faster ways to achieve the same result already exist.
