Friday, April 30, 2010

Implementing Cartesian Product in Informatica Mapping

Unlike Pentaho, Informatica doesn't provide a ready-made transformation for implementing a Cartesian product in a mapping. Most of us would agree, though, that it's not often that we go for Cartesian product joins. [The instinct is generally to do whatever it takes to avoid a Cartesian product, because it's usually a performance killer.]

However, when your requirements do call for one, there is no direct way to do it in the Informatica Joiner transformation. One option is to do it on the database side, by overriding the Source Qualifier SQL statement and building the Cartesian product there.


However, I have seen that some designers don't like to override SQL statements. In such cases you'd have to implement it inside the mapping itself. Here's a workaround for achieving that. Here goes -

  1. Read both the sources using their own source qualifiers, normally.
  2. For both of them, put in an Expression Transformation after the source qualifier.
  3. In both Expression transformations, create an output port with a constant value. For example, call it dummy1 for stream 1 and assign it the value -1. Similarly, create a port in the second pipeline, let's call it dummy2, and assign it the same value, -1.
  4. Now create a joiner transformation. Link ports [including the one that we created with a constant value] to the joiner from both the expressions.
  5. In the join condition, choose to compare the dummy columns. 
  6. The rest of the joiner configuration would have to be like any other joiner. Nothing specific.
You might want to keep the smaller source as the master in the joiner, since it would save on the caching.
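
To see why the constant-key trick works, here is a small Python sketch of the same idea. It is not Informatica code; the helper names (add_dummy, join_on) and the sample rows are made up for illustration. Every row carries the same dummy value, so an ordinary equi-join matches every row with every row.

```python
from itertools import product


def add_dummy(rows, name, value=-1):
    """Mimic the Expression transformation: add a constant-valued port to each row."""
    return [{**row, name: value} for row in rows]


def join_on(master, detail, master_key, detail_key):
    """Mimic an equi-join Joiner: keep every pair whose key values match."""
    return [
        {**m, **d}
        for d in detail
        for m in master
        if m[master_key] == d[detail_key]
    ]


# Hypothetical sample sources.
colours = [{"colour": "red"}, {"colour": "blue"}]
sizes = [{"size": "S"}, {"size": "M"}, {"size": "L"}]

joined = join_on(
    add_dummy(colours, "dummy1"),  # smaller source as the master
    add_dummy(sizes, "dummy2"),
    "dummy1",
    "dummy2",
)

pairs = sorted((row["colour"], row["size"]) for row in joined)
expected = sorted(product(["red", "blue"], ["S", "M", "L"]))
print(len(joined), pairs == expected)
```

With 2 rows on one side and 3 on the other, the join returns all 6 combinations, which is exactly the Cartesian product.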

Before implementing the above solution, be sure to go back and check whether your application actually requires a Cartesian product!

Thursday, April 29, 2010

Hard vs Soft Parse in Oracle...

Was reading on asktom, came across this response from Tom, that seemed so direct and clear...

So, shamelessly copying.. :)



[From AskTom]
Parsing

This is the first step in the processing of any statement in Oracle. Parsing is the act of breaking the submitted statement down into its component parts - determining what type of statement it is (query, DML, DDL) and performing various checks on it.

The parsing process performs two main functions:

o Syntax Check: is the statement a valid one? Does it make sense given the SQL grammar documented in the SQL Reference Manual? Does it follow all of the rules for SQL?

o Semantic Analysis: Going beyond the syntax - is the statement valid in light of the objects in the database (do the tables and columns referenced exist)? Do you have access to the objects - are the proper privileges in place? Are there ambiguities in the statement?

For example, if there are two tables T1 and T2 and both have a column X, the query "select X from T1, T2 where ..." is ambiguous; we don't know which table to get X from. And so on.

So, you can think of parsing as basically a two step process, that of a syntax check to check the validity of the statement and that of a semantic check - to ensure the statement can execute properly. The difference between the two types of checks is hard for you to see - Oracle does not come back and say "it failed the syntax check", rather it returns the statement with an error code and message. So for example, this statement fails with a syntax error:

SQL> select from where 2;
select from where 2
*
ERROR at line 1:
ORA-00936: missing expression

While this statement failed with a semantic error - if the table NOT_A_TABLE existed and we had permission to access it, this statement would have succeeded:

SQL> select * from not_a_table;
select * from not_a_table
*
ERROR at line 1:
ORA-00942: table or view does not exist

That is the only way to really tell the difference between a semantic and syntactic error - if the statement COULD have executed given the proper objects and privileges, you had a semantic error, otherwise if the statement could not execute under any circumstances, you have a syntax error. Regardless - Oracle will not execute the statement for you!

The next step in the parse operation is to see if the statement we are currently parsing has already in fact been processed by some other session. If it has - we may be in luck here, we can skip the next two steps in the process, that of optimization and row source generation. If we can skip these next two steps in the process, we have done what is known as a Soft Parse - a shorter process to getting our query going. If we cannot, if we must do all of the steps, we are performing what is known as a Hard Parse - we must parse, optimize, generate the plan for the query. This distinction is very important. When developing our applications we want a very high percentage of our queries to be Soft Parsed - to be able to skip the optimize/generate phases - as they are very CPU intensive as well as a point of contention (serialization). If we have to Hard Parse a large percentage of our queries, our system will function slowly and in some cases - not at all.

The way this sharing of SQL in Oracle is accomplished is via the shared pool, a piece of memory in the SGA maintained by Oracle. After Oracle parses the query and it passes the syntax and semantic checks - it will look in the shared pool component of the SGA to see if that same exact query has already been processed by another session.

Since it has performed the semantic check it has already figured out:

o Exactly what tables are involved
o That we have access to the tables (the proper privileges are there)

And so on. Now, it can look at all of the queries in the shared pool that have already been parsed/optimized and generated to see if the work has already been done.


[Raghav]
As has been discussed here, soft parsing of queries makes our application faster, since Oracle doesn't have to do everything all over again (from a parsing point of view). Bind variables help in this case.

For example, a query like the following -

select a,b from tbl1 where c = 2333;

would be parsed and stored under a different identifier than

select a,b from tbl1 where c = 58763;

whereas essentially they are the same query, the only difference being the literal value in the where condition.
In this case, the second query would go for a hard parse, instead of the soft parse we were hoping for. So, what can be done to tell Oracle that these are actually the same query and there is no need to do the hard parse again? Well, you can do that by using a bind variable, that is, by replacing the only differing part of the query with a placeholder whose value is supplied at run time.

Look at the following query -

select a,b from tbl1 where c = :bb;

Now, this query can be used for both the examples above. At run time (after parsing), the variable :bb is bound to the respective literal value, and you have your answer. This gains a lot of performance, since the second run goes for a soft parse instead of a hard parse; every subsequent run after the first one re-uses the parsing information collected the first time.

So, simple recommendation, use bind variables wherever you see a query being reused with different literal values, and use run time substitution of the literals.
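
Here is a rough Python sketch of the effect. It is only a toy model: ToySharedPool is a made-up class that caches statements by their exact text, which is a big simplification of what Oracle's shared pool does, and SQLite (via Python's sqlite3 module) stands in for Oracle just to show a named bind variable being executed. The table contents are invented.

```python
import sqlite3


class ToySharedPool:
    """Toy stand-in for a shared pool: parsed statements keyed by exact SQL text."""

    def __init__(self):
        self.cache = set()
        self.hard = 0
        self.soft = 0

    def parse(self, sql):
        if sql in self.cache:
            self.soft += 1  # statement text seen before: soft parse
        else:
            self.cache.add(sql)
            self.hard += 1  # new statement text: hard parse


conn = sqlite3.connect(":memory:")
conn.execute("create table tbl1 (a, b, c)")
conn.executemany(
    "insert into tbl1 values (?, ?, ?)",
    [(1, "x", 2333), (2, "y", 58763)],
)

# Literals: every distinct value produces a distinct statement text.
literal_pool = ToySharedPool()
for value in (2333, 58763):
    literal_pool.parse(f"select a,b from tbl1 where c = {value}")

# Bind variable: one statement text, the value is supplied at execution time.
bind_pool = ToySharedPool()
bind_sql = "select a,b from tbl1 where c = :bb"
results = []
for value in (2333, 58763):
    bind_pool.parse(bind_sql)
    results.append(conn.execute(bind_sql, {"bb": value}).fetchall())

print(literal_pool.hard, literal_pool.soft)  # 2 hard parses, 0 soft
print(bind_pool.hard, bind_pool.soft)        # 1 hard parse, 1 soft
```

The literal version is hard parsed twice, while the bind version is hard parsed once and then reused.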

Wednesday, April 28, 2010

AWS Application Demo

  • Amazon started with S3 (Simple Storage Service) - purely storage service.
  • For computing purposes, Amazon started EC2.
  • EC2 allows hosting your application on virtual servers operated by Amazon, widely known as AWS.
  • For sizing purposes, 1 compute unit at Amazon EC2 is roughly equivalent to a 1.2 GHz Xeon server.
  • All computing resources are virtualized, none is physical. No physical details are ever published. However, users have to manage their own file system. Based on need/request, disk space is mounted and made available to the server. Beyond that, it has to be managed by the application.
  • There can be three types of instances, 
    • Small
    • Medium
    • Large instances

Behavior on instances
  • like a normal web server
  • has a public web address

  • Java Software available to convert command line instructions into SOAP wrapped API calls to AWS
  • Need to download API tools from AWS
  • Key pairs are tied to regions
    • The benefit of this is that the servers are replicated across data centers located physically separately


Autoscaling service
  • Allows scaling of servers based on need
  • Takes about 60-90 seconds to scale up based on the need (e.g. load on the server). If configured appropriately, the system will replicate the application code and launch new server instances within seconds to handle the extra load.
  • Handles scaling up as well as scaling down. The extra/unused instances are automatically shut down and released from the application deployment when there is no load that requires them.
  • Very useful in cases of spikes, especially high peaks. New feature launches on sites, or a sudden outbreak of news etc., cause spikes in server loads. At such times, AWS works well to scale up the required computing power. The application owners are saved from buying extra servers for that 1 day load... :)
  • It's possible to say how many servers to add.
  • Such scaling up/down can be configured. For example, it can be specified that new instances are requested when the current system's load exceeds 85%, and that the system scales down when usage goes below 20%.
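
The threshold rule from the last bullet can be sketched in a few lines of Python. This is not the AWS Auto Scaling API, just the decision logic; the function name, the step size and the minimum/maximum limits are my own assumptions, and only the 85% and 20% thresholds come from the talk.

```python
def desired_instances(current, load_pct, scale_up_at=85.0, scale_down_at=20.0,
                      step=1, minimum=1, maximum=10):
    """Return how many instances we would want after looking at the current load.

    Hypothetical helper: step, minimum and maximum are illustrative defaults.
    """
    if load_pct > scale_up_at:
        # Load too high: add instances, but never beyond the maximum.
        return min(current + step, maximum)
    if load_pct < scale_down_at:
        # Load low: release instances, but keep at least the minimum.
        return max(current - step, minimum)
    # In between the two thresholds: leave things alone.
    return current


print(desired_instances(2, 90))  # above 85% -> scale up
print(desired_instances(2, 50))  # between thresholds -> unchanged
print(desired_instances(2, 10))  # below 20% -> scale down
```

Keeping a gap between the two thresholds stops the system from flapping up and down when the load hovers around a single value.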



OSs supported
  • Linux
  • Windows
  • OpenSolaris

Buzzwords
  • AMI = Amazon Machine Images
  • EBS = Elastic Block Store
  • Elastic IP = static IPs assigned to instances
  • Amazon EC2 = Amazon Elastic Compute Cloud

Monday, April 26, 2010

Web Application Debugging Tools

A Talk by Matthew McCullough


Some tools he talked about and demonstrated

1. tcpdump
      Captures TCP/IP packets from network traffic
      Available as an open source project at http://www.tcpdump.org/
      Allows you to save the captured packet information into a file on disk, thereby allowing offline analysis.

tcpdump -i en1 -s0 -n -A

2. Wireshark - visual interpretation of tcpdump captured data
       Once you capture packet information using tcpdump, this tool gives you a better, more readable view of it. Again an open source project, available from http://www.wireshark.org/

3. netstat
      An all-time favourite on *nix flavours. Every system admin's first choice. Shows port information.
      Available by default on almost all OSs. Sometimes implementations differ in syntax, or in switch names/usage.

4. curl - Another all-time favorite.
      Allows initiating an HTTP request from the command line, which saves a lot of time since no program or skeleton needs to be created for testing HTTP calls.

5. JMeter - designed to load test functional behavior and measure performance

6.  soapUI => mocking a server.
       Available from http://www.soapui.org/

7. JavaScript
      Firebug - for Firefox - JavaScript functions/elements are easily visible and manageable. Debugging on the fly is allowed too. Available as an add-on for Firefox.
      Firebug - Firebug implementation for Safari.

8. Firefinder - finds items in the DOM. Makes it easy to locate an item in the DOM of a given web page.

More details coming on the following -

9.  Visual Event

10. FireFocus

11. JASH - CLI for testing JavaScript


buzzwords
- BPF - Berkeley Packet Filter

Saturday, April 24, 2010

workshop day GIDS 2010 - 3 - Functional Programming

Well, for the third session of the day, my choice was clear: the cloud computing discussion with Simone, from AWS. I had already had a taste of Google's cloud solution (App Engine), and now I wanted to see the Amazon flavour. Most unfortunately, the internet link gave way and Simone's presentation plans went awry.

The guy was quite angry and it was so visible, so very visible. He tried to talk people through it, but plain talk, talk, talk never settled in. I saw people literally dozing off, and I guess he realized that too, since on more than one occasion he asked for the hall lights to be kept switched on. A bit of a bad show on the part of Saltmarch/Airtel...

As a result, midway through the session, I left, and thought of joining the Flex-PHP workshop from the Adobe guys. However, somehow, I ended up entering the hall where Venkat was talking about functional programming.

And did I regret that decision? No chance. I guess I have never enjoyed a programming language discussion so much, probably because I had never before had the chance to listen to such a learned speaker, among so many keen and interested students (yeah, blame my college, I know, I do that myself.. all the time :)

It was a packed house, and a fantastic atmosphere. For one thing, Venkat knew what he was talking about, and for another, the discussion took me back about 15 years to my college days. I had a strong feeling: why didn't I have an educational experience as nice as this? The teacher, the students, the atmosphere, the whole place made for a wonderful experience. Though I had no clue what Scala was before I entered the place, it never felt like a problem.

The way he talked about issues, the benefits, the arguments he presented, and I think more importantly, the way he engaged the audience in discussions, it was just wonderful. You might think that I am overreacting, but I don't care, I just loved this guy, this session.

He talked about programming paradigms, how procedural languages were king one day, and how today they are talked about as if they were ancient history. And similarly, for today's generation OOP is becoming more of a thing of the past, and new paradigms are taking over.

The way he propounded functional programming, the very idea of not allowing mutability, the way of having to think of a solution without it.. amazing...

At more than one point, people in the audience tried to contest the idea/need of another language when we already have so many, and the need to learn another one. There were thoughts on the toughness of one programming language vs another, and the way he answered.. fantastic... His point in this case was, it's as good as a Karnataka person going to Punjab and saying, "These people are crazy, their language is so tough, I can't even read that". And it was a good laugh..

At more than one point in time, he compared different programming languages, their features and pros and cons of each of them. The kind of knowledge that he had, to be able to compare them, was amazing...

I have to say, without any doubt, that this session was my best experience at GIDS 2010. I loved this, probably due to my personal interest in the theory of computer science as such, but also because there probably wasn't any other speaker with that kind of personality and flair for talking.

Hats off to Saltmarch for inviting people like him too...

workshop day gids 2010 - 2 - paypal x

It went even better afterwards. PayPal has opened the doors to its payment methodology to developers. They call it PayPal X. They have published a complete SDK which allows developers to write independent programs that integrate PayPal as a payment gateway in their e-commerce applications.

Their idea is to popularize PayPal and use its simplicity and capability to spread its business reach. The whole PayPal team was present at the workshop and the atmosphere was pretty nice. Khurram Khan started off by talking a bit about the background of financial transactions; then Rangarajan took over and talked in detail.

Frankly, the session was interactive, perhaps because people are motivated by money, the chances of earning it, and the basic idea of linking real e-commerce to their websites with such low effort and simplicity. My personal interest in that area is purely out of curiosity, in the sense that I am more keen to know what's going on than to get down to the code details and start implementing it.

One thing that stood out for me is that PayPal proposes to review the developer's code before actually allowing it to move to production (live), but has no control thereafter. So, in essence, someone who intends to do something funny (read: wrong/illegal etc...) might pass the initial test by presenting a genuine case, and then once he's live, he can go back and change the code. I feel that by allowing updates like this, PayPal is losing a bit of control over how people use its API.

I would probably have built some kind of a dynamic filter or a pair matching kind of mechanism (or a checksum for that matter) which would change the moment the developer changes his code on the production site. Every call to the PayPal API should check this checksum/authentication token and should go through only if this check also holds good. Well, it's just a thought, and PayPal probably has its own reasons not to enforce any such check, but if I were Khurram, I would probably start like that, and perhaps remove it later or something like that.
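
Just to make the checksum thought concrete, here is a small Python sketch. None of this is part of the PayPal SDK or API; the function names (code_fingerprint, call_allowed) and the file names are invented for illustration, and the sketch assumes the reviewer recorded a fingerprint of the code at approval time.

```python
import hashlib
import hmac


def code_fingerprint(source_files):
    """Hash file names and contents (name -> bytes) in a stable order."""
    digest = hashlib.sha256()
    for name in sorted(source_files):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(source_files[name])
        digest.update(b"\0")
    return digest.hexdigest()


def call_allowed(approved_fingerprint, deployed_files):
    """Allow the API call only if the deployed code still matches the reviewed code."""
    return hmac.compare_digest(approved_fingerprint, code_fingerprint(deployed_files))


# Hypothetical code as it looked when it was reviewed.
reviewed = {"checkout.py": b"charge(customer, amount)\n"}
approved = code_fingerprint(reviewed)

# The same code after someone edits it on the production site.
tampered = {"checkout.py": b"charge(customer, amount * 2)\n"}

print(call_allowed(approved, reviewed))  # unchanged code
print(call_allowed(approved, tampered))  # changed code
```

Of course, a check like this only helps if the fingerprint is computed somewhere the site owner cannot fake it, which may be one of the reasons such a scheme is hard to enforce in practice.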

When I posed this to Khurram, he said that the responsibility for doing anything illegal or wrong lies with the developer or the site owner anyway, so PayPal doesn't really want to get in their way. They would be apprehended anyway, sooner or later. As much as I agree with his argument, I still think that PayPal could play a role and possibly stop "wrong" things from happening to some extent; after all, it's happening through their infrastructure, although the ownership lies somewhere else.

Other than this particular point, I think the SDK is pretty nice. They support Java, no special downloads are required to start developing besides the SDK of course, and the API supports almost all kinds of operations.

I especially liked their idea of generalizing PayPal-style services when it comes to trust building. During one discussion, I don't remember which one, a PayPal employee said, "I am sure that when it comes to trusting someone over internet, you can trust paypal a lot more with your financial information than a relatively unknown website that you are using for the first time." That I agree with, and perhaps the Indian jinx of not using the web for payments can be broken with a trusted guy on the net, PayPal.

You trust one party, PayPal, and the rest is handled by PayPal. As far as this statement goes, there actually is a greater risk: what if PayPal goes rogue? It would then have all my information, all of it... well, then you would have to trust someone.. right ?? or not ???

The workshop day @GIDS 2010 - I - cloud computing using Google App engine

Well, a developer day can't get better than that... And whatever issues I mentioned regarding the scheduling of the summit vanished into thin air.

I had the choice to go attend the Cloud Computing/hosting workshop with Matthew McCullough. It was based on Java, which I don't care about anyway. Not that I have any kind of disrespect for the language, but I strongly think that we ought to think beyond languages.

He perfectly showcased how easy it is to build and host your own web application on the Google App Engine cloud platform. It's hard to imagine how easy it has become, with the Eclipse plugin for Google App Engine and Web Toolkit, to develop an application for the cloud and deploy it. With those tools installed and configured (a surprisingly easy task anyway), it's a button click to deploy your application to the cloud.

I am not a big fan of Java, or of huge complex configuration files for that matter, but the way this App Engine thing comes pre-configured reminds me of Ruby on Rails. At least to start with. It's surprisingly easy to start developing with GWT (Google Web Toolkit) and Google App Engine.

I tried to replicate what he was doing/showing, and, remember, I am no big shot in Java (the last professional Java work I did was about 8 years back), it was surprisingly easy. Just a few clicks and it was done. I have to say, it was infectious; I felt like delving more into Java and going back to experimenting with it...

Thursday, April 22, 2010

Is that a workable idea ?

Actually, we are waiting for a new server to be installed, and any new physical server installation takes some time, we all know that. The project would request it, there would be financials, and then the order would be approved internally. It would be placed with the vendor by purchasing, the vendor would ship it, and then a few days/weeks later, technicians would come over to install the stuff.

Then the local admins will get to work and configure it to be used by the respective teams. Another few days/weeks...

Just wondering: any given enterprise has hundreds if not thousands of computers, of all classes: laptops, desktop development PCs, server class PCs and others...

How about taking off 10% of all these PCs' computing power to create a cloud computing environment (of course within the company environment only)? An application can be installed which will make sure that a given percentage of computing power goes only to a particular resource.

A lot like how BOINC works. You install a client/manager app on your PC and then you receive a piece of computation for your share of computing power.

That kind of computing environment can easily replace an average server's computing power.
The storage can easily be hooked up to a SAN.

I'd encourage a discussion on this... see what others think around the idea...

in reference to: Distributed computing - Wikipedia, the free encyclopedia (view on Google Sidewiki)

Fantastic Effort by SaltMarch

Well, I have to say it like that. The one day that I attended at GIDS 2010 was amazing. Probably I am reacting like that since it's my first such summit; nonetheless, the quality of the speakers and the content they delivered was top of the ladder.

Truly amazing.

To get people of such caliber onto one platform is a job in itself, which Saltmarch have done to perfection. I chose to be there for .Web day, and the kind of knowledge that was flowing around, the technologies being talked about and the manner in which they were addressed were simply great.

We know some things as simple buzzwords, but we ought to realize that the buzzwords actually have a lot more behind them. The way Simone (from AWS) put it, "Everybody in the conference hall would know what is Cloud computing, but I am sure that when I ask you what is it, you'd all come back with different definitions, and perhaps they are all correct as well". That statement sort of sums up the whole experience: we all know, to some extent or with varying degrees of perception, what something is about, but unless we hear it from the source, from the horse's mouth as they say, it's always a bit removed from reality.

I like the way the speakers and sessions were put together. Though a few of my favorite sessions were canceled (blame that volcano in Iceland for canceling all the flights), I still enjoyed a lot of it.

Personally, being interested in Ajax, I came a lot closer to experimenting with it, and perhaps using it in a professional environment as well. Thanks to the people at the Adobe booth there, I have a chance to try out the Adobe Flex Builder IDE, and check out its potential first hand.

Tomorrow, there is a workshop on PHP and Flex delivering RIA, and I can't wait to be there :)


However, as is the custom for people like me, to comment and find fault with something or other, after all we are humans, we are bound to make mistakes and other humans are bound to find them and report them publicly, like on this blog... :)

I believe the scheduling of some of the sessions could have been better. For example, I missed out on the PayPal X presentation, because I wanted to attend Marty Hall's session on the evaluation of Ajax/JavaScript libraries too. Now I'll have to content myself with the video of that presentation, hoping that it will be made available. :)

I like the scheduling of the workshop day a lot better. It gives people like me options to attend sessions on different technologies ...

Looking forward to another exciting day of fun filled experiments that enhance know how...

in reference to: Great Indian Developer Summit :: Conference, Expo, and Awards on Java, .NET, Rich Web :: Saltmarch Media Summits (view on Google Sidewiki)

Wednesday, April 21, 2010

Web Changing the face of the world : Ramesh

Ramesh, a senior technologist/scientist from Adobe, talked about what Web 2.0 has done to change the world and how, especially the way information is treated/used/processed etc...

Here are some of his points -

Real Time Web

  • kind of content that is coming around
  • no more old docs
  • social networking
  • live collaboration
experiencing the web - how the web is experienced, the way it's accessed, used etc.
  • Some tools are coming up in the VR world, which allows
  • augmented reality -> web apps capturing reality and using the inputs
proliferating devices
  • Already more web access from non PC devices
  • More devices are coming through which access web differently
democratization of data
  • Some govts have already started putting their data in public domain.
  • Lots of application development opportunities based on that data.
  • Cloud computing also coming around to provide processing power and applications for the data.
Buzzwords
  • ActionScript
  • ECMAScript
  • Flex

Quite interesting talk :) Learned person, Ramesh.

Marty Hall : Why Ajax : Using Ajax in your web apps

Why WebApps - Ajax
  • Asynchronous JavaScript and XML => the expansion doesn't hold good anymore
  • allows the browser to make use of the time while the main page has gone to fetch the result from your app
  • Add/modify content on the webpage at runtime
  • trends -> more jobs on Ajax than any other technology - PHP, VB, ASP etc...
Ajaxifying an Application
  • libraries - jQuery, Ext JS, Dojo, YUI, Prototype, Google Closure, MooTools
  • Browsers allow calls only to the server the original code came from, because of the browser's same-origin security policy.
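
The restriction in the last bullet is usually called the same-origin policy: two URLs count as the same origin only when scheme, host and port all match. A rough Python sketch of that comparison follows. It is a simplification of what browsers actually do, and the helper names and example URLs are made up.

```python
from urllib.parse import urlsplit

# Default ports assumed when a URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Reduce a URL to its (scheme, host, port) triple."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def same_origin(page_url, request_url):
    """True when an Ajax call from page_url to request_url stays on the same origin."""
    return origin(page_url) == origin(request_url)


print(same_origin("http://example.com/app", "http://example.com:80/data"))
print(same_origin("http://example.com/app", "https://example.com/data"))
print(same_origin("http://example.com/app", "http://api.example.com/data"))
```

Only the first pair counts as the same origin; a different scheme or a different host name is enough to make a plain Ajax call cross-origin.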

  • Situation 2 - Hybrid
  • Libraries to use for new apps - JSF 2.0, Struts 2.0, Spring MVC 3.
  • JSF is better because
  • integrated in main code - jQuery needs you to write JavaScript
  • no JavaScript to write - a simple tag

  • Situation 3 - Hardcore Ajax
  • Google Web Toolkit
  • Use Java everywhere
  • Write Java code at the front end -> compiled to JavaScript at build time
  • Write Java code at the back end -> provides the facility of communicating in terms of Java objects
Buzzwords
  • JBoss Seam <-> JSF 2.0

Tuesday, April 20, 2010

@GIDS 2010 Bangalore

I am going to be there at the "Great Indian Developer Summit" in Bangalore April 20th to 23rd 2010.

It's hosted at the lush green IISc campus, in the J N Tata Auditorium. Various pieces of info are available from the organizer website -

www.saltmarch.com

http://www.developermarch.com/developersummit/


It looks like an amazing place to be, full of tech discussions, and hopefully knowledge too...