Wednesday, December 1, 2010
Rejecting records in Fact table loads - Informatica
Wednesday, November 17, 2010
Setting a useful command prompt in Unix
* \[ : begin a sequence of non-printing characters, which could be used to embed a terminal control sequence into the prompt
* \\ : a backslash
* \] : end a sequence of non-printing characters
* \a : an ASCII bell character (07)
* \@ : the current time in 12-hour am/pm format
* \d : the date in "Weekday Month Date" format (e.g., "Tue May 26")
* \D{format} : the format is passed to strftime(3) and the result is inserted into the prompt string; an empty format results in a locale-specific time representation. The braces are required
* \e : an ASCII escape character (033)
* \H : the hostname
* \h : the hostname up to the first '.'
* \j : the number of jobs currently managed by the shell
* \l : the basename of the shell’s terminal device name
* \n : newline
* \nnn : the character corresponding to the octal number nnn
* \r : carriage return
* \T : the current time in 12-hour HH:MM:SS format
* \t : the current time in 24-hour HH:MM:SS format
* \u : the username of the current user
* \s : the name of the shell, the basename of $0 (the portion following the final slash)
* \v : the version of bash (e.g., 2.00)
* \W : the basename of the current working directory
* \w : the current working directory
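As an example, putting something like PS1='\u@\h:\w\$ ' in your ~/.bashrc (a minimal sketch) gives a prompt of the form user@hostname:/current/dir$, built from the \u, \h and \w escapes listed above.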
Monday, November 15, 2010
scripts and hash bang ( #! )
Sunday, October 31, 2010
A Note to New Consultants
- The consultant can function as a specialist or expert. In this role he must be more knowledgeable than the client. This implies a very narrow field of specialization; otherwise the client, with his greater continuity of experience, would be equally expert.
The consultant can function as a counselor or advisor on the process of decision making. This implies an expertise of a special kind, that of the psychotherapist. This is merely a particular kind of expertise in a particular field.
The most typical role for a consultant is that of auxiliary staff. This does not preclude any of the other roles mentioned before, but it does require a quite different emphasis.
All companies have staff capabilities of their own. Some of this staff is very good. Yet no company can afford to have standby staff adequate for any and all problems. This is why there is an opportunity for consultants. They fill the staff role that cannot be filled internally.
By definition this means that consultants are most useful on the unusual, the non-recurring, the unfamiliar problem. Outside consultants are also most useful where the problem is poorly defined and politically sensitive, but the correct decision is extremely important. Outside consultants get the tough, the important and the sensitive problems.
The natural function of a consultant is to reduce anxiety and uncertainty. Those are the conditions under which anxiety and uncertainty are greatest and where consultants are most likely to be hired.
Problem Definition - If this point of view is our starting point, then problem definition becomes extremely important.
- If the problem is incorrectly defined, then even its complete solution may not satisfy the client's perceived needs.
- If the problem is improperly defined, it may be beyond our ability to solve.
- Problem definition is a major test of professional ability. Outside consultants can frequently define problems in a more satisfactory fashion than internal staff, primarily because they are unencumbered with the historical perspective of the client and the resulting "house" definition.
A consultant's problem definition is the end of the assignment if the problem is not researchable. If the problem is not researchable, then the consultant is either a specialist-expert or a psychotherapist. Neither of these roles is suitable for the use of the resources of an organization such as The Boston Consulting Group.
A researchable problem is usually a problem that should be dealt with by a group approach. Data gathering and analysis requires differing skills and different levels of experience that can best be provided by a group. The insights into complex problems are usually best developed by verbal discussion and testing of alternate hypotheses.
Good research is far more than the application of intellect and common sense. It must start with a set of hypotheses to be explored. Otherwise, the mass of available data is chaotic and cannot be referenced to anything. Such starting hypotheses are often rejected and new ones substituted. This, however, does not change the process sequence of hypothesize / data gathering / analysis / validation / rehypothesize.
Great skill in interviewing and listening is required to do this. Our client starts his own analysis from some hypothesis or concept. We must understand this thoroughly and be able to play it back to him in detail or he does not feel that we understand the situation. Furthermore, we must be sure that we do not exclude any relevant data that may be volunteered. Yet we must formulate our own hypothesis.
Finally, we must be able to take our client through the steps required for him to translate his own perspective into the perspective we achieve as a result of our research. This requires a high order of personal empathy as well as developed teaching skills.
- The end result of a successful consulting assignment is not a single product. It is a new insight on the part of the client. It is also a commitment to take the required action to implement the new insights. Equally important, it is an acute awareness of the new problems and opportunities that are revealed by the new insights.
We fail if we do not get the client to act on his new insights. The client must implement the insights or we have failed. It is our professional responsibility to see that there is implementation, whether we do it or the client does it.
Much of the performance of a consultant depends upon the development of concepts that extend beyond the client's perception of the world. This is not expertise and specialization. It is the exact opposite. It is an appreciation of how a wide variety of interacting factors are related. This appreciation must be more than an awareness. It must be an ability to quantify the interaction sufficiently to predict the consequences of altering the relationships.
Consultants have a unique opportunity to develop concepts since they are exposed to a wide range of situations in which they deal with relationships instead of techniques. This mastery of concepts is probably the most essential characteristic for true professional excellence.
A successful consultant is first of all a perceptive and sensitive analyst. He must be in order to define a complex problem in the client's terms with inadequate data. This requires highly developed interpersonal intuitions even before the analysis begins.
His analytical thinking must be rigorous and logical, or he will commit himself to the undoable or the unuseful assignment. Whatever his other strengths, he must be the effective and respected organizer of group activities which are both complex and difficult to coordinate. Failure in this is to fall into the restricted role of the specialist.
[raghav] This is the first time I have read that a specialist role can be restrictive, and honestly, when you think about it again, it does hold up as a correct statement, especially in the wider world of other opportunities, and especially for a management consultant.
In defining the problem, the effective consultant must have the courage and the initiative to state his convictions and press the client for acceptance and resolution of the problem as defined. The client expects the consultant to have the strength of his convictions if he is to be dependent upon him. Consultants who are unskilled at this are often liked and respected but employed only as counselors, not as true management consultants.
The successful professional inevitably must be both self-disciplined and rigorous in his data gathering as well as highly cooperative as a member of a case team.
The continuing client relationship requires a sustained and highly developed empathy with the client representative. Inability to do this is disqualifying for the more significant roles in management consulting.
- Identifies his client's significant problems;
- Persuades his client to act on the problems by researching them;
- Organizes a diversified task force of his own firm and coordinates its activity;
- Fully utilizes the insights and staff work available in his client's organization;
- Uses the full conceptual power of his own project team;
- Successfully transmits his findings to the client and sees that they are implemented;
- Identifies the succeeding problems and maintains the client relationship;
- Fully satisfies the client expectations that he raised;
- Does all these things within a framework of the time and cost constraints imposed by himself or the client.
Thursday, September 30, 2010
my experiments with solr :)
I came across hadoop when I was looking for a new solution for one of our in-house projects. The need was quite clear; the solution, however, had to be dramatically different.
The one statement we received from business was, "We need an exceptionally fast search interface". And for that fast interface to search upon, they had more than a hundred million rows' worth of data in a popular RDBMS.
So, when I sat down to think about how to build a fast search application, the first thing that came to my mind was Google. Whenever we talk about speed or performance of web sites, Google is invariably the first name that comes up.
Further, Google has the plus point that there is always some activity at the back end to generate the page or results that we see; it's never static content. And then, another point: Google has a few trillion pieces of information to store/index/search, whereas our system was going to have a significantly lower volume of data to manage. So, going by that, Google looked like a very good benchmark for this fast search application.
Then I started to look for "how Google generates that kind of performance". There are quite a few pages on the web talking about just that, but probably none of them has the definitive/authoritative view on Google's technology, or for that matter an insider's view on how it actually does what it does so fast.
Some pages pointed towards their storage technology, some talked about their indexing technology, some about their access to huge volumes of high performance hardware and what not...
Some of them turned out to be genuinely interesting for me; one was the indexing technology. There has to be a decent indexing mechanism that the crawlers feed and the search algorithms hit. The storage efficiency is probably the next thing to come into play: how fast can they access the corresponding item?
Another observation of mine is that the search results (the page listing page titles and snippets) come back really fast, mostly in less than 0.25 seconds, but clicking through the links does take some time. So, I think it has to be their indexing methodology that plays the bigger role.
With that in mind, I set about finding what could do similar things, and how much of Google's behaviour it could simulate/implement.
Then I found the Hadoop project at apache (http://hadoop.apache.org/), which to a large extent reflects the way a Google kind of system would work. It provides distributed computing (hadoop core), a bigTable kind of database (hbase), a map/reduce layer, and more. Reading into it further, I figured out that this system is nice for a batch-processing kind of mechanism, but not for our need of real-time search.
Then I found solr (http://lucene.apache.org/solr/), a full text search engine under Apache Lucene. It is written in java, exposes an xml-based interface for indexing, and is a genuinely fast search engine. It provides many features that we normally wish for in more commercial applications, and being from apache, I would like to think of it as more reliable and stable than many others.
When we set about doing a Proof of Concept with it, I figured out a few things –
• It supports only one schema; think of it as a single rdbms table, only one. So, basically, you have to denormalize all your content to fit into this one flat structure.
• All interaction with the server happens over http, be it through the standard methods or through REST-like interfaces (see the sample query after this list).
• It allows you to load data in varying formats: through xml documents, through delimited files, and through db interactions as well.
• It has support for clustering as well: either you host it on top of something like hadoop, or you configure it within solr itself.
• It supports things like expression and function based searches
• It supports faceting
• Extensive caching and “partitioning” features.
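To give a flavour of the http interface mentioned above, a typical query against a local solr instance looks something like this (the field names are made up for illustration; 8983 is solr's default port):
http://localhost:8983/solr/select?q=customer_name:smith&rows=10&facet=true&facet.field=state
The response comes back as xml by default, or as json if you ask for it with wt=json.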
Besides the other features, the kind of performance it gave without any specific tuning effort made me think of it as a viable solution.
In a nutshell, I loaded around 50 million rows on an “old” Pentium-D powered desktop box with 3 GB RAM, running ubuntu 10.04 server edition (64 bit), with two local hard disks configured under a logical volume manager.
The loading performance was not all that great, though it's not that bad either. I was able to load a few million rows (a file sized about 6 GB) in about 45 minutes, when the file was on the same file system.
In return, it gave me query times in the range of 2-4 seconds for the first run of a query. For subsequent re-runs of the same query (within a span of an hour or so), it came back in approximately 1-2 milliseconds. I would like to think that's pretty great performance given the kind of hardware I was running on, and the kind of tuning effort I put in (basically none – zero, I just ran the default configuration).
Given that, I won't say I have found the equivalent or replacement of Google's search for our system, but we should be doing pretty well with this.
Although more testing and experimentation is required to judge solr properly, the initial tests look pretty good... pretty much in line with the experiences of others who are using it.
Sunday, September 12, 2010
Business & Open Source - How both can benefit
I have always felt that for open source projects/products to become commercially viable for a business enterprise, the enterprise has to step up and spend some resources on them to get the actual value out of them.
In other words, if an organization wants to use an open source product which has an equivalent, competitive commercial product available in the market, it should be open enough to have its own in-house people who can take ownership of the installation. The organization shouldn't rely completely on the support available from community forums and such.
I have seen more than one manager complain about the lack of support for open source products. Had there been a proper support system behind every open source product, we'd see a lot more stories similar to mysql's or pentaho's model.
What I would like to see, perhaps, is organizations becoming mature enough in their adoption of open source products. By that I mean having an open vision, having people who understand, like and own the product, and at the same time tweaking and tuning the product to suit the organization's business needs.
In the process, the organization should contribute back to the product's development cycle. This could happen in many ways: bug fixes, contribution of new features, employees answering questions on community forums and such. To use the terminology of peer-to-peer sharing, leechers alone don't help a torrent; people need to seed it as well. In the same way, unless organizations contribute to an open source product, they stand to become mere leechers.
Only when we have a decent balance of organizations using and contributing to open source products will we see the ecosystem flourish...
Thursday, September 9, 2010
Tips for brainstorming...
Interesting read, from both positive and negative viewpoints -
1. Use brainstorming to combine and extend ideas, not just to harvest ideas.
2. Don't bother if people live in fear.
3. Do individual brainstorming before and after group sessions.
4. Brainstorming sessions are worthless unless they are woven with other work practices.
5. Brainstorming requires skill and experience both to do and, especially, to facilitate.
6. A good brainstorming session is competitive—in the right way.
7. Use brainstorming sessions for more than just generating good ideas.
8. Follow the rules, or don't call it a brainstorm.
Read more here - http://www.businessweek.com/
"8. Follow the rules, or don't call it a brainstorm."
- Eight Tips for Better Brainstorming (view on Google Sidewiki)
Wednesday, September 8, 2010
Big help...
SELECT table_schema, table_name,
       (data_length+index_length)/1024/1024 AS total_mb,
       (data_length)/1024/1024 AS data_mb,
       (index_length)/1024/1024 AS index_mb,
       CURDATE() AS today
FROM   information_schema.tables
WHERE  table_schema='mySchemaName'
ORDER  BY total_mb DESC;
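A minimal variation of the same idea, rolled up per schema rather than per table (the columns used are standard information_schema ones):
SELECT table_schema,
       SUM(data_length+index_length)/1024/1024 AS total_mb
FROM   information_schema.tables
GROUP  BY table_schema
ORDER  BY total_mb DESC;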
Thanks Ron...
in reference to: Calculating your database size | MySQL Expert | MySQL Performance | MySQL Consulting
Wednesday, August 25, 2010
I also feel like saying, 1984...
1. an iphone that has been hacked to work outside the contract with which it was sold, read "jailbroken"
2. an iphone that is perhaps being used by someone other than the person who registered the first heartbeat or facial recognition info..
Apple intends to capture the phone's location using GPS/other tech and perhaps control the device remotely if they feel it's being used in an "unauthorized" way..
I agree with people who are reminded of 1984 after reading about Apple's intentions... ha, time does come back... George Orwell, were you right after all??
in reference to: Apple to make iPhone theft-proof - Hardware - Infotech - The Economic Times
Monday, August 2, 2010
Country General Mood using Tweets
http://www.iq.harvard.edu/
I quote - (with all credit where it's due, none to me...)
A group of researchers from Northeastern and Harvard universities have gathered enough data from Twitter to give us all a snapshot of how U.S. residents feel throughout a typical day or week.
Not only did they analyze the sentiments we collectively expressed in 300 million tweets over three years against a scholarly word list, these researchers also mashed up that data with information from the U.S. Census Bureau, the Google Maps API and more. What they ended up with was a fascinating visualization showing the pulse of our nation, our very moods as they fluctuate over time.
The researchers have put this information into density-preserving cartograms, maps that take the volume of tweets into account when representing land area. In other words, in areas where there are more tweets, those spots on the map will appear larger than they do in real life.
An apparently public-domain result of the analysis is available here -
http://cdn.mashable.com/wp-
Wednesday, July 28, 2010
Oracle Count(1) vs Count(*)
Well, it has probably been an everlasting discussion: which one of these to use, count(1) or count(*)?
I guess this article by Thomas Kyte clarified the situation long, long ago (well, for the IT industry 2005 is long ago anyway, especially given the speed at which we are moving).
Essentially, what askTom says is that count(*) is better than count(1), since count(1) gets translated to count(*) internally anyway. I wonder, then, why anyone would want to use count(1) at all.
With count(1) there is at least one extra step involved in getting to the actual result. And there is another subtlety: count(1) has to evaluate an expression as well, effectively "count(1) where 1 is not null". Though it is a tautology, it has to be evaluated nonetheless.
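Just to make the comparison concrete, a minimal sketch (the table name is made up; both statements count every row and, per the askTom explanation, end up doing the same work):
SELECT COUNT(*) FROM orders;
SELECT COUNT(1) FROM orders;
-- counting a specific nullable column is the genuinely different case,
-- since NULLs are skipped and the result can be smaller:
SELECT COUNT(ship_date) FROM orders;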
Further, there was some misconception about how the result is returned: whether it is read from the data dictionary, some view or table, or something like that. I don't think so. The result is calculated at run time, when the query is executed, and Oracle actually goes ahead and counts the records in the table.
Should set the record straight...
Friday, July 23, 2010
Developing a Rails application using an existing database
Initially we needed basic CRUD screens for some tables. Being lazy (I am really proud of that), I set out to find whether there is a solution that generates the forms (read: views) for the existing tables/models.
I had already managed to generate the models/schema.rb using another gem, called magic_model. Read more about that here.
Then google helped me find another gem called scaffold_form_generator, which generates the necessary views/forms for a given model. However, it needs some improvements (I think); perhaps I will contribute something (if I figure out enough about how to do that).
Well, for the moment, I am struggling with the handling of the special-meaning columns that are missing from the legacy tables. Will continue writing on this...
Wednesday, July 21, 2010
Making rails talk to SQL Server Express Edition
Make an ODBC connection pointing to the db server; preferably create a SQL Server authentication user.
Install the following gems
- activerecord-odbc-adapter (2.0)
- activerecord-sqlserver-adapter (2.3.8)
- dbd-odbc (0.2.5)
- dbi (0.4.5)
- deprecated (2.0.1)
Further, I copied odbc.so and odbc_utf8.so files from http://www.ch-werner.de/
My database.yml file looks like this -
development:
adapter: sqlserver
mode: ODBC
dsn:
username: myUserName
password: myPassword
Friday, July 16, 2010
winscp vs cuteftp
I recently had the experience of transferring relatively large files from a windows box to a linux one.
I had some files ranging from 2.1 GB to 4.x GB. The company traditionally uses cuteftp to transfer (read: ftp) files across servers, so I started with that anyway.
However, all my transfer attempts were failing. After transferring 2 GB of data, cuteftp would end the transfer and give out some error message or other. I figured this was happening only with the larger files. And each such transfer consumed an hour or more before it failed.
Google couldn't help a lot, so finally I got down to winscp. And it worked out so well. Not only did the transfers not fail, they finished within minutes, instead of the hours and hours spent by cuteftp.
Perhaps it's some setting somewhere in cuteftp that I couldn't find, but the whole experience has left me more inclined towards the open source community.
Thanks a ton guys .. :)
Wednesday, July 14, 2010
Trying to become a Data warehousing architect... !!!
- Describe advantages of the CIF architecture versus the bus architecture with conformed dimensions. Which would fit best in our environment given [some parameters they give you] and why
- Describe snow-flaking
- Describe fact-less fact tables.
- Draw a star schema of our business
- Describe common optimization techniques applied at the data model level
- How do you handle data rejects in a warehouse architecture?
- Describe common techniques for loading from the staging area to the warehouse when you only have a small window.
- How do you load type 1 dimensions
- How do you load type 2 dimensions, and how would you load it given our [insert business particularity]
- How would you model unbalanced hierarchies
- How would you model cyclic relations
- What major elements would you include in an audit model?
- How would you implement traceability?
Thursday, May 13, 2010
Parallel Querying using Oracle
Perhaps a long awaited feature, this one is clearly the one that I like most.
Using pre-built packages, it's now possible to break a single (relatively heavy) operation into multiple small operations which run in parallel. The benefits are obvious and can't be ignored.
I am yet to ascertain the usability of such a feature with tools like informatica or pentaho, but I am quite sure a lot can be achieved, especially in the direction of updates to huge tables in data warehouses.
And looking at the implementation, it's genuinely simple and straightforward.
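If the package in question is DBMS_PARALLEL_EXECUTE (my guess; the table and the update below are made up for illustration), a minimal sketch looks like this - the table is carved into rowid chunks and the same statement is run against each chunk in parallel:
BEGIN
  DBMS_PARALLEL_EXECUTE.CREATE_TASK('upd_big_fact');
  -- carve the (hypothetical) table into chunks of roughly 100k rows each
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID(
      task_name   => 'upd_big_fact',
      table_owner => USER,
      table_name  => 'BIG_FACT',
      by_row      => TRUE,
      chunk_size  => 100000);
  -- run the same statement against every chunk, four chunks at a time
  DBMS_PARALLEL_EXECUTE.RUN_TASK(
      task_name      => 'upd_big_fact',
      sql_stmt       => 'UPDATE big_fact SET load_status = ''PROCESSED''
                         WHERE rowid BETWEEN :start_id AND :end_id',
      language_flag  => DBMS_SQL.NATIVE,
      parallel_level => 4);
  DBMS_PARALLEL_EXECUTE.DROP_TASK('upd_big_fact');
END;
/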
Would be nice to see more applications utilizing the benefits of such features..
Read more on Oracle magazine, article written by Steven Feuerstein...
Thursday, May 6, 2010
Ubuntu 10.04 is here.... Upgrade today...
It's only been a few hours since I have had the pleasure of upgrading to the latest version of Ubuntu.
And without any doubt, or any detailed analysis/investigation, I can say this much for sure -
- The upgrade process was super easy. No hassles whatsoever: you just need a decent internet connection, you confirm the suggested decision a few times, and done...
- The new system looks cleaner, much much cleaner. It's neat, to some extent beautiful. It's not about the background or themes or anything like that, but the overall look and feel is genuinely cool. Perhaps it's to do with one thing I have done on my part: my system font is "Lucida Grande" all across, and for some reason the shapes of this font look far better than any other I have encountered.
- The boot time has come down a good notch.
Well, I guess this should be a good starting point for my review on the new version, lets see how it comes across further.
Friday, April 30, 2010
Implementing Cartesian Product in Informatica Mapping
When your requirements genuinely need a cartesian product, there is no direct way to get it from the informatica joiner transformation. One option is to do it on the db side, by overriding your source qualifier sql statement and building the cross join in there (there is a small sketch of that further down this post).
However, I have seen that some designers don't like to override sql statements; in such cases you'd have to implement it inside the mapping itself. Here's a workaround for achieving that -
- Read both the sources using their own source qualifiers, normally.
- For both of them, put in an Expression Transformation after the source
- In both the expression transformations, create an output port with a constant value. For example, call it dummy1 in stream 1 and assign it the value -1. Similarly, create a port in the second pipeline, let's call it dummy2, and assign it the value -1 as well.
- Now create a joiner transformation. Link the ports [including the ones we created with a constant value] from both expressions to the joiner.
- In the join condition, compare the two dummy ports (dummy1 = dummy2). Since both always hold -1, every row from one pipeline matches every row from the other, which is exactly the cartesian product.
- The rest of the joiner configuration would have to be like any other joiner. Nothing specific.
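For comparison, the database-side alternative mentioned at the top is simply a join with no condition in the source qualifier override; a minimal sketch with made-up table names:
SELECT c.customer_id,
       c.customer_name,
       d.calendar_date
FROM   customers c
CROSS JOIN calendar_dates d
Every row of customers gets paired with every row of calendar_dates, so expect the row count to be the product of the two tables' row counts.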
Before implementing the above solution, be sure to go back and check whether your application actually requires a cartesian product !!!!
Thursday, April 29, 2010
Hard vs Soft Parse in Oracle...
So, shamelessly copying.. :)
[From AskTom]
Parsing
This is the first step in the processing of any statement in Oracle. Parsing is the act of breaking the submitted statement down into its component parts - determining what type of statement it is (query, DML, DDL) and performing various checks on it.
The parsing process performs two main functions:
o Syntax Check: is the statement a valid one? Does it make sense given the SQL grammar documented in the SQL Reference Manual? Does it follow all of the rules for SQL?
o Semantic Analysis: going beyond the syntax - is the statement valid in light of the objects in the database (do the tables and columns referenced exist)? Do you have access to the objects - are the proper privileges in place? Are there ambiguities in the statement?
For example, if there are two tables T1 and T2 and both have a column X, the query "select X from T1, T2 where ..." is ambiguous - we don't know which table to get X from. And so on.
So, you can think of parsing as basically a two-step process: a syntax check to verify the validity of the statement, and a semantic check to ensure the statement can execute properly. The difference between the two types of checks is hard for you to see - Oracle does not come back and say "it failed the syntax check"; rather, it returns the statement with an error code and message. So, for example, a statement along the lines of "select from t where" fails with a syntax error, while a statement like "select * from not_a_table" fails with a semantic error - if the table NOT_A_TABLE existed and we had permission to access it, that statement would have succeeded.
That is the only way to really tell the difference between a semantic and a syntactic error - if the statement COULD have executed given the proper objects and privileges, you had a semantic error; if the statement could not execute under any circumstances, you have a syntax error. Regardless - Oracle will not execute the statement for you!
The next step in the parse operation is to see if the statement we are currently parsing has already been processed by some other session. If it has, we may be in luck: we can skip the next two steps in the process, that of optimization and row source generation. If we can skip these next two steps, we have done what is known as a Soft Parse - a shorter process to getting our query going. If we cannot, if we must do all of the steps, we are performing what is known as a Hard Parse - we must parse, optimize and generate the plan for the query. This distinction is very important. When developing our applications we want a very high percentage of our queries to be Soft Parsed - to be able to skip the optimize/generate phases - as they are very CPU intensive as well as a point of contention (serialization). If we have to Hard Parse a large percentage of our queries, our system will function slowly and in some cases not at all.
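[raghav] A small illustration of why this matters: the classic way to get soft parses is to keep the statement text identical across calls by using bind variables (table and column names below are made up):
-- each literal value produces a brand new statement text, so each one is hard parsed
SELECT * FROM orders WHERE order_id = 101;
SELECT * FROM orders WHERE order_id = 102;
-- one shared statement text with different bind values: soft parsed after the first call
SELECT * FROM orders WHERE order_id = :order_id;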
The way this sharing of SQL in Oracle is accomplished is via the shared pool, a piece of memory in the SGA maintained by Oracle. After Oracle parses the query and it passes the syntax and semantic checks, it will look in the shared pool component of the SGA to see if that same exact query has already been processed by another session.
Since it has performed the semantic check it has already figured out:
o Exactly what tables are involved
o That we have access to the tables (the proper privileges are there)
And so on. Now, it can look at all of the queries in the shared pool that have already been parsed/optimized and generated to see if the work has already been done.
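[raghav] A quick way to watch this sharing happen is to query the shared pool yourself; v$sql is the standard view, and the LIKE filter here is just for illustration:
SELECT sql_text, parse_calls, executions
FROM   v$sql
WHERE  sql_text LIKE 'SELECT * FROM orders%';
When the same statement text is reused, the existing cursor's executions count keeps climbing instead of a new row appearing for every call.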
Wednesday, April 28, 2010
AWS Application Demo
- Amazon started with S3 (Simple Storage Service) - purely storage service.
- For Computing purposes, Amazon started EC2
- EC2 allows hosting your application on virtual servers operated by Amazon, widely known as AWS.
- For calculating the availability as a resource, 1 compute unit at Amazon EC2 is roughly equivalent to a 1.2 GHz Xeon server.
- All computing resources are virtualized, none is physical. No physical details are ever published. However, the user has to manage his own file system. Based on need/request, disk space is mounted and is made available to the server. Further, it has to be managed by the application.
- There can be three types of instances,
- Small
- Medium
- Large instances
Behavior on instances
- like a normal web server
- has a public web address
- Java Software available to convert command line instructions into SOAP wrapped API calls to AWS
- Need to download API tools from AWS
- Key pairs are tied to regions
- The benefit of this is that the servers are replicated across data centers located physically separately
Autoscaling service
- Allows scaling of servers based on need
- Consumes about 60-90 seconds to scale up based on the need (e.g. load on the server). If configured the appropriate way, the system will replicate the application code and launch new server instances within seconds to handle the extra load.
- Handles scaling up as well as scaling down. Automatically, the extra/unused instances are shut down and released from the application deployment when there is no load to require their service.
- Very useful in cases of spikes, especially high peaks. New feature launches on sites, or a sudden outbreak of news etc., cause spikes in server loads. At such times, AWS works perfectly to scale up the required computing power. The application owners are saved from buying extra servers for that 1-day load... :)
- It's possible to specify how many servers to add
- Such scaling up/down can be configured. For example, it can be specified that new instances be requested when the current system's load exceeds 85%, and that the deployment scale down when usage goes below 20%
OSs supported
- Linux
- Windows
- openSolaris
Buzzwords
- AMI = Amazon Machine Images
- EBS = Elastic Block Store
- Elastic IP = static ips assigned to instances
- Amazon EC2 = Amazon Elastic Computing Cloud
Monday, April 26, 2010
Web Application Debugging Tools
Some tools he talked about and demonstrated
1. tcpdump
Captures tcp-ip packets during traffic
Available as an open source project at http://www.tcpdump.org/
Allows you to save the captured packet information into a file on disk, thereby allowing offline analysis.
tcpdump -i en1 -s0 -n -A
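(A note on the switches above: -i en1 selects the capture interface - use whatever interface your machine has - -s0 captures full packets instead of truncating them, -n skips DNS lookups, and -A prints the payload as ASCII.)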
2. WireShark - visual interpretation of tcpdump captured data
Once you capture packet information using tcpdump, you can get a better view of it, more readable using this tool. Again an open source project, available from http://www.wireshark.org/
3. netstat
An all-time favourite across *nix flavors. Every system admin's first choice. Shows port and connection information.
Available by default with almost all ports of all OSs. Sometimes differences are found across implementations in syntax or in switch names/usage.
4. curl - Another all time favorite.
Allows calling/initiating http requests from the command line; saves a lot of time since no program or skeleton needs to be written just for testing http calls.
5. jMeter - designed to load test functional behavior and measure performance
6. soapUI => mocking a server.
Available from http://www.soapui.org/
7. JavaScript
FireBug - for firefox - Javascript functions/elements are easily visible and manageable. Debugging on the fly is allowed too. Available as an addon for firefox.
Firebug Lite - a Firebug implementation for browsers other than firefox, such as Safari.
FireFinder - finds items in the DOM. Lets you easily locate an element in the DOM of a given web page.
More details coming on the following -
9. Visual Event
10. FireFocus
11. JASH - a CLI for testing javascript
buzzwords
- bpf - berkeley packet filter
Saturday, April 24, 2010
workshop day GIDS 2010 - 3 - Functional Programming
The guy was quite angry and it was so visible, so very visible. He tried to talk people through it, plain talk talk talk, and it never settled in. I saw people literally dozing off, and I guess he realized that too, since on more than one occasion he asked for the hall lights to be kept switched on. A bit of a bad thing on the part of Saltmarch/Airtel...
As a result, midway through the session I left, and thought of joining the Flex-php workshop from the adobe guys. However, somehow, I ended up entering the hall where Venkat was talking about functional programming.
And did I regret that decision? No chance. I guess I have never enjoyed a programming language discussion so much, probably because I never got the chance to speak to such a learned speaker and so many keen and interested students (yeah, blame my college, I know, I do that myself.. all the time :)
It was a packed house, and a fantastic atmosphere. For one thing, Venkat knew what he was talking about, and for another, I found that the discussion took me back about 15 years to my college days. I had a strong feeling: why didn't I have an educational experience as nice as this? The teacher, the students, the atmosphere, the whole place was a wonderful experience. Though I had no clue what scala was before I entered the place, it never felt like a problem.
The way he talked about the issues, the benefits, the arguments he presented, and I think more importantly, the way he engaged the audience in discussions, it was just wonderful. You might think that I am overreacting, but I don't care, I just loved this guy, this session.
He talked about programming paradigms: the way procedural languages were king one day, and the way they are looked at today as if ancient history were being discussed. And similarly, for today's generation OOP is becoming more of a thing of the past, and new paradigms are coming over.
The way he propounded functional programming, the very idea of not allowing mutability, the way of having to think of a solution without it.. amazing...
At more than one point, people in the audience tried to contest the idea/need of yet another language, when we already have so many, and the need of learning another one. There were thoughts on the toughness of one programming language versus another, and the way he answered... fantastic... His point in this case was that it's as good as a Karnataka person going to Punjab and saying, "These people are crazy, their language is so tough, I can't even read it". And it was a good laugh..
At more than one point in time, he compared different programming languages, their features and pros and cons of each of them. The kind of knowledge that he had, to be able to compare them, was amazing...
I have to say, without any doubt, that this session was my best experience at GIDS 2010. I loved it, probably due to my personal interest in the theory of computer science as such, but also because there probably wasn't any other speaker with that kind of personality and flair for talking.
Hats off to Saltmarch for inviting people like him too...
workshop day gids 2010 - 2 - paypal x
Their idea is to popularize paypal and use its simplicity and capability in spreading its business reach. The whole team of PayPal was present at the workshop and the atmosphere was pretty nice. Khurram Khan started off by talking a bit on the background of financial transactions, further, Rangarajan took over and talked in detail.
Frankly, the session was interactive, perhaps because people are motivated by money, the chance of earning it, and the basic idea of linking real e-commerce to their websites with so little effort. My personal interest in that area is purely out of curiosity, in the sense that I am more keen to know what's going on than to get down to the code details and start implementing it.
One thing that stood out for me is that paypal proposes to review the developer's code before allowing it to move to production (live), but has no control thereafter. So, in essence, someone who intends to do something funny (read: wrong/illegal etc...) might pass the initial test by presenting a genuine case, and then once he's live, he can go back and change the code. I feel that by allowing updates like this, paypal is losing a bit of control over how people use its API.
I would probably have built some kind of dynamic filter or pair-matching mechanism (or a checksum for that matter) which would change the moment the developer changes his code on the production site. Every call to the paypal API would check this checksum/authentication token and go through only if this check also holds good. Well, it's just a thought, and probably paypal has its own reasons not to enforce any such check, but if I were Khurram, I would probably start like that, and perhaps remove it later or something like that.
When I posed this to Khurram, he said that the responsibility for doing anything illegal or wrong lies with the developer or the site owner anyway, so paypal doesn't really want to get in their way. They would be apprehended sooner or later anyway. As much as I agree with his argument, I still think that paypal could play a role and possibly stop "wrong" things from happening to some extent; after all, it is happening through their infrastructure, even though the ownership lies somewhere else.
Other than this particular point, I think the SDK is pretty nice and ok. They allow Java, no special downloads etc. are required to start developing (besides the SDK of course), and the API supports almost all kinds of operations.
I especially liked their idea of generalizing paypal-like services when it comes to trust building. During one discussion, I don't remember which one exactly, a paypal employee said, "I am sure that when it comes to trusting someone over the internet, you can trust paypal a lot more with your financial information than a relatively unknown website that you are using for the first time." That I agree with, and perhaps the Indian jinx of not using the web for payments can be broken with one trusted guy on the net, paypal.
You trust one party, paypal, and the rest is handled by paypal. As far as this statement goes, there actually is a greater risk: what if paypal goes rogue? It would then have all my information, all of it... well, then you would have to trust someone.. right?? or not???
The workshop day @GIDS 2010 - I - cloud computing using Google App engine
I had the choice of attending the Cloud Computing/hosting workshop with Matthew McCullough. It was based on java, which I don't care about anyway. Not that I have any kind of disrespect for the language, but I strongly think that we ought to think beyond languages.
He perfectly showcased how easy it is to build and host your own web application on the google app engine cloud platform. It's hard to imagine how easy it has become, with the eclipse plugin for google app engine and the web toolkit, to develop an application for the cloud and deploy it. With those tools installed and configured (a surprisingly easy task anyway), it's a button click to deploy your application to the cloud.
I am not a big fan of java, or of huge complex configuration files for that matter, but the way this app engine thing comes pre-configured reminds me of ruby on rails, at least to start with. It's surprisingly easy to start developing with GWT (Google Web Toolkit) and Google App Engine.
I tried to replicate what he was doing/showing, and, remember, I am no big shot in java (the last professional java work I did was about 8 years back), it was surprisingly easy. Just a few clicks and it was done. I have to say, it was infectious; I felt like delving more into java and going back to experimenting with it...
Thursday, April 22, 2010
Is that a workable idea ?
Actually, we are waiting for a new server to be installed and any new physical server installation takes some time, we all know that. The project would request it, there would be financials and then the order would be approved internally.
It would be placed with the vendor by purchasing, the vendor will ship it, and then a few days/weeks after, technicians will come over to install the stuff.
Then the local admins will get to work and configure it to be used by the respective teams. Another few days/weeks...
Just wondering: any given enterprise has hundreds if not thousands of computers, of all classes - laptops, desktop development PCs, server-class PCs and others...
How about taking 10% of all these PCs' computing power to create a cloud computing environment (of course, within the company environment only)? An application can be installed which makes sure that the given percentage of computing power goes only to that particular purpose.
A lot like how BOINC works: you install a client/manager app on your PC and then you receive a piece of computation matching your share of computing power.
That kind of computing environment can easily replace an average server's computing power.
The storage can easily be hooked up to a SAN.
I'd encourage a discussion on this... see what others think around the idea...
Fantastic Effort by SaltMarch
Well, I have to say it like that: the one day that I attended at GIDS 2010 was amazing. Probably I am reacting like this since it's my first such summit; nonetheless, the quality of the speakers and the content they delivered was top of the ladder.
Truly amazing.
To get people of such caliber onto one platform is a job in itself, which saltmarch has done to perfection. I chose to be there for the .Web day, and the kind of knowledge that was flowing around, the technologies being talked about and the manner in which they were addressed - simply great.
We know some things as simple buzzwords, but we ought to realize that the buzzwords actually have a lot more behind them. The way Simone (from AWS) put it: "Everybody in the conference hall would know what Cloud computing is, but I am sure that when I ask you what it is, you'd all come back with different definitions, and perhaps they are all correct as well". That statement sort of sums up the whole experience: we all know to some extent, or with varying degrees of perception, what something is about, but unless we hear it from the source - as they say, from the horse's mouth - it's always a bit removed from the reality.
I liked the way the speakers and sessions were put together; though a few of my favorite sessions were canceled (blame that volcano in Iceland for canceling all the flights), I still enjoyed a lot of it.
Personally, being interested in Ajax, I came a lot closer to experimenting with it, and perhaps to using it in a professional environment as well. Thanks to the people at the Adobe booth there, I had a chance to try out the Adobe Flex Builder IDE and check out its potential first hand.
Tomorrow there is a workshop on php and Flex delivering RIAs, and I can't wait to be there :)
However, as is the custom for people like me, I have to comment and find fault with something or other; after all, we are humans, we are bound to make mistakes, and other humans are bound to find them and report them publicly, like on this blog... :)
I believe the scheduling of some of the sessions could have been better. For example, I missed out on the PayPal X presentation because I wanted to attend Marty Hall's session on evaluating Ajax/JavaScript libraries too. Now I'll have to content myself with the video of that presentation, hoping that it will be made available. :)
I like the scheduling of the workshop day a lot better. It gives people like me the option to attend sessions on different technologies...
Looking forward to another exciting day of fun-filled experiments that enhance know-how...
Wednesday, April 21, 2010
Web Changing the face of the world : Ramesh
Here are some of his points -
Real Time Web
- kind of content that is coming around
- no more old docs
- social networking
- live collaboration
- Some tools are coming up in the VR world, which allow
- augmented reality -> web apps capturing reality and using the inputs
- Already more web access from non PC devices
- More devices are coming through which access web differently
- Some govts have already started putting their data in public domain.
- Lots of application development opportunities based on that data.
- Cloud computing also coming around to provide processing power and applications for the data.
- ActionScript
- ECMAScript
- Flex
Quite interesting talk :) Learned person, Ramesh.
Marty Hall : Why Ajax : Using Ajax in ur web apps
- Asynchronous JavaScript and XML => the expansion doesn't hold good anymore
- allows the browser to utilize the time while the request is away fetching results, instead of blocking the main page
- Add/modify content on the webpage at runtime
- trends -> more jobs on Ajax than any other technology - php, vb, asp etc...
- libraries - jQuery, ext-js, dojo, yui, prototype, google closure, mootools
- Browsers allow calls only to the server from which the original code came, due to security (same-origin) restrictions.
- Situation 2 - Hybrid
- Libraries to use for new apps - JSF 2.0, Struts 2.0, Spring MVC 3.
- JSF is better because
- integrated in the main code - jquery needs you to write javascript
- no javascript to write - a simple tag
- Situation 3 - Hardcore Ajax
- Google web toolkit
- Use java everywhere
- Write java code at the front end -> it gets compiled to javascript at build time
- Write java code at the back end -> provides the facility of communicating in terms of java objects
- Jboss Seam <-> JSF 2.0
Tuesday, April 20, 2010
@GIDS 2010 Bangalore
It's hosted at the lush green IISc campus, in the J N Tata Auditorium. Various pieces of info are available from the organizers' websites -
www.saltmarch.com
http://www.developermarch.com/developersummit/
It looks like an amazing place to be, full of tech discussions, and hopefully knowledge too...
Wednesday, March 31, 2010
Apple and India...
Well, this is sad, particularly for someone in India, who avoids getting cracked iPhones...
Only yesterday Apple and Airtel announced the availability of the iPhone 3GS in India, and today I get to read this: that Apple is releasing new/better hardware for the iPhone this coming summer.
So, basically, India as an iPhone market is going to lag behind the rest of the world by at least one model this time around as well.
Sometimes I don't understand Apple's marketing strategy, especially in the Indian context. As an economy, India is probably among the largest of the emerging markets, closely behind China. That alone should mark it as a "lucrative market" for any company that aims for growth.
This is especially interesting when we look at the number of iPhones already used in India with cracked OSs.
It doesn't help a company's revenues if the product is consumed in a market where the company doesn't officially sell it.
Also, the number of iPhones used in India, should give Apple an indication about the marketability of their product in this market. Personally, I think as a product from Apple's stable, iPhone has a lot of glamour linked to it.
And therefore, huge possibilities in India for Apple.
The new iPhone's rumoured specs look stunning: 5 MP camera, better display resolution, more processing power, possible multitasking, a video-calling camera and a few other things. After reading that, my plans to order an iPhone 3GS have taken a back seat. I want to see this product come to market, and further, to see when it comes to India.
Apple seems to be listening to customers.. the rumoured improvements are possibly the exact list of things someone might call the downsides of the iPhone...
Good luck Apple, and India, hoping for a better collaboration this time around...
Thursday, March 18, 2010
Explain plan of Infobright
We all know that Infobright is based on MySql. However, when I started with Infobright, I was equally unaware of both.
Today I got to know about Infobright's explain plan collection technique, and realized how different it is from Oracle's.
A typical explain plan from oracle talks about the path it follows to retrieve the data, and reading such a plan more or less makes common sense.
On the other hand, an explain plan from Infobright looks nothing like Oracle's plan. It's a set of cryptic rows put together. An example of such a plan -
2010-03-18 01:31:01 [4] T:-1 = TABLE_ALIAS(T:0,"myTableName")
T:-2 = TMP_TABLE(T:-1)
VC:-2.0 = CREATE_VC(T:-2,PHYS_COL(T:-1,
A:-1 = T:-2.ADD_COLUMN(VC:-2.0,AVG,"
A:-2 = T:-2.ADD_COLUMN(VC:-2.0,MAX,"
A:-3 = T:-2.ADD_COLUMN(VC:-2.0,MIN,"
A:-4 = T:-2.ADD_COLUMN(,COUNT,"count(
VC:-2.1 = CREATE_VC(T:-2,PHYS_COL(T:-1,
A:-5 = T:-2.ADD_COLUMN(VC:-2.1,GROUP_
VC:-2.2 = CREATE_VC(T:-2,PHYS_COL(T:-1,
A:-6 = T:-2.ADD_COLUMN(VC:-2.2,GROUP_
VC:-2.3 = CREATE_VC(T:-2,PHYS_COL(T:-1,
A:-7 = T:-2.ADD_COLUMN(VC:-2.3,GROUP_
VC:-2.4 = CREATE_VC(T:-2,PHYS_COL(T:-1,
A:-8 = T:-2.ADD_COLUMN(VC:-2.4,GROUP_
VC:-2.5 = CREATE_VC(T:-2,PHYS_COL(T:-1,
A:-9 = T:-2.ADD_COLUMN(VC:-2.5,GROUP_
VC:-2.6 = CREATE_VC(T:-2,PHYS_COL(T:-1,
A:-10 = T:-2.ADD_COLUMN(VC:-2.6,GROUP_
VC:-2.7 = CREATE_VC(T:-2,PHYS_COL(T:-2,
T:-2.ADD_ORDER(VC:-2.7,DESC)
RESULT(T:-2)
And, guess what, to generate a plan you have to change a setting in the initialization parameters file and then bounce the db. I wonder if there have ever been instances when someone had to look into a production database's performance and therefore needed to look at a plan right away.
Being as new a system as it is, I can't really complain, since such marks of maturity will slowly creep in. The focus right now is probably elsewhere: building the functionality, and probably performance.
Friday, February 19, 2010
Discovered BOINC 5 years ago, joined today :)
It was George, a colleague at Axa Tech in Switzerland, who first introduced me to this idea. I was thrilled alright, but didn't join right away.
Today I did.
It's a nice feeling, and a feeling of giving back something to science.
http://en.wikipedia.org/wiki/
http://boinc.berkeley.edu/
I will not go into the details of how it works and all that. However, I have to say: for whatever time your laptop/workstation remains idle, you can donate the processing power to a better cause. An absolutely fantastic idea, genuinely noble.
And then, look at the stats: people like me are contributing processing power to the extent of 4 pFLOPS, as against the fastest supercomputer system at a mere 1.79 pFLOPS.
As I said, feels great. :)
I hope to inspire others to join in...
Tuesday, February 16, 2010
Connecting to an Oracle Server...which way is faster...
However, in a recent situation, I had to connect to a remote server from my client system and was wondering whether I should connect using a tnsnames.ora entry or use the basic mechanism of supplying the complete connect string.
I thought that the tnsnames.ora would incur some latency in connections. On the other hand, connecting directly without using any such files is like doing a telnet to the server on a given port.
Also, I have seen that if you attempt to connect using the complete connect descriptor taken from the tnsnames.ora file, that works too.
Since I was not clear myself, I did post this question on oracle forums. And a gentleman, Billy Verreynne took out some time to help explain the concept. Many thanks to him.
As it stands, the basic methodology of connection is the same in both cases. However, if you happen to have a huge tnsnames.ora file, there is some overhead in parsing the file to find the connect string for the service name you are connecting to. But that only makes a noticeable difference if your client connections run into the hundreds per second; basically, the client would have to parse the huge tnsnames file a few hundred times per second before you'd see any degradation.
Billy also suggested that switching the server mode from dedicated to shared would help.
So, my conclusion: it's the same whether you connect through tnsnames or through the full connect string.
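For reference, the two forms look roughly like this (host, port, service name and username are made up):
MYDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = mydb.example.com))
  )
With that entry in tnsnames.ora you connect with user@MYDB as the connect identifier; without it, you pass the whole descriptor (or the easy-connect form user@dbhost.example.com:1521/mydb.example.com) as the connect string.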
Thursday, January 21, 2010
availability of native drivers !!!
The experience till now is mixed: in some scenarios the data loading is very good, as much as 40k rows per second, whereas through other channels it's a poorer 500 rows per second. When I go with their built-in loader, it's lightning fast, but when I try from Pentaho or Informatica, it's measly.
Apparently, the drivers and the compatibility of the third-party tools do play a role in attaining that performance.
The fact that they don't have any native driver published is a huge bottleneck. Informatica doesn't even have a native connector for MySql, the better-known cousin of InfoBright (being open source, the core engine of InfoBright is based on MySql itself).
One thing came out of this experience for sure: my liking for open source got better and better. During the hunt for the reason behind the slow performance from Pentaho, I even managed to get the source code of the plugin (transformation) that pentaho uses for Infobright. It's pure java, and it was a very powerful and awakening feeling to see the code of the thing. I felt as if I had the power, the choice, to make a difference, to make it better. :)
I am currently exploring other options; one of them involves creating a dump file (read: csv) of the data and then launching the command line loader to load it into the target db. I don't like it, but let's see if there is any better (read: faster) way around...