"Why is this application so slow?"
Every Software Developer (or Engineer, whichever term you prefer) building
large-scale software has come face to face with the question: "Why
is it so slow?" This question could not be more vague. What does
"slow" mean? Slow in comparison to what?
This entry is about performance tuning and how to solve this elusive
"slow" problem.
Nowadays, performance tuning is even harder to implement in our convoluted
N-tier systems. We build N-tier applications, which run on more than
two application servers, to better serve the users of our applications.
The abstraction that takes place to serve one single report into a user's
web browser is, in my opinion, a work of art.
The ability to expand and load-balance these servers is amazing. We
can prepare for increases in user load by splitting the resources
across multiple servers in different tiers (all without touching our
base code).
Needless to say, that extensibility comes with a price measured in complexity.
Complexity grows in every aspect of the building process:
- Complexity of the Information System
- Complexity of the Software Architecture
- Complexity of measuring performance
Following is an example where performance tuning paid off.
There is a famous saying among developers: "Build it so that it works
first, and optimize later." Following this advice will get you out
of trouble (almost) every time.
As I mentioned, our applications have grown in complexity, and just making
the whole system work is a challenge in itself; performance tuning
is a different type of challenge. What do we need to optimize? Our systems
are N-tier systems, and there is definitely something that can be fine-tuned
in each of those Ns.
For example, this is a typical system for Business Intelligence analysis:
an N-tier system generating reports to be distributed to thousands of users
via a thin client (web browser).
That sentence alone abstracts the whole architecture behind the
solution, which could be comprised of:
- Oracle 9i database system
- Brio On Demand Server
- Brio Authentication Server
- IIS Web Server
- PlumTree portal Application
- Brio Intelligence Report builder
- IE web browser
- Brio Intelligence Plugin
- VBS scripting language
So, you are probably asking "All that to generate one report?"
Indeed, all that to generate one report. Of course, these reports are
vital to any business, and this one report needs to be distributed to
tens of thousands of people. Sometimes hourly, daily, weekly, monthly,
etc. And all these users view the same report at almost the same time.
It is pretty common to have a set of reports ready to view at the
beginning of each week (usually Monday mornings). As a tactical business
head honcho, you are relying on the delivery of these reports so your
co-workers make you more money. If one of these reports fails to generate,
the bottom line suffers. So, yes, all that infrastructure is needed
to serve "one" report.
Note that the power of an Information System is not in how many more
machines and software components we can throw into the mix. The power
is in the questions that are being answered by that "one"
report.
So, now that the whole infrastructure is in place to serve the "one"
report, one or two stakeholders say: "The system is great and
everything, but why is it so slow?"
If you are trying to solve this "slow" problem, first you
need to understand what "slow" really means.
Assume the system is brand new, so the user has no concept of slow or
fast. The user has only attention span gaps. To a normal computer user,
3 seconds waiting for results is long enough. More than that, and our
user is lost in la-la land. There are studies supporting the statement
I just made; this is nothing new. Jakob
Nielsen has done some studies about it.
In our particular case, slow means downloading a report through the
internal portal (100 Mbps network) taking over 3 minutes. Obviously,
this won't do in the current setting: thousands of users viewing and
interacting with the same report (it was a Brio Scorecard application).
Note that in a different situation, 3 minutes would have been acceptable
- for example, downloading an image from an outer-space satellite,
3 minutes is not long at all.
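To make "slow" measurable, time it. Here is a minimal sketch in Java
that establishes exactly that kind of baseline; the portal URL and report
name are made up for illustration:

    import java.io.InputStream;
    import java.net.URL;

    public class ReportTimer {
        public static void main(String[] args) throws Exception {
            // Hypothetical report URL - substitute your own portal link
            URL report = new URL("http://portal.example.com/reports/scorecard.bqy");
            long start = System.currentTimeMillis();
            long bytes = 0;
            try (InputStream in = report.openStream()) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    bytes += n; // count bytes, discard content
                }
            }
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(bytes + " bytes in " + elapsed + " ms");
        }
    }

Run it a few times at different hours and you have a baseline number to
argue about, instead of a feeling.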
Back to our "slow" report. The report in question was 9 MB
in size. It was designed and implemented to work first, and the darn
report kept growing and growing. There was no mistake made in the design
or the implementation. The system was built to work first, but now it
was time to optimize - Here is where I came into the picture for this
particular issue.
Ok, so what do we need to optimize?
As explained above, there are many components in the equation (remember
all those N tiers). However, optimizing one variable will not solve
the whole problem. For example, it is not viable to throw in a faster
web server. We can't suggest changing the delivery pipes to a fiber-optic
channel. Nor can we tell the user to click on the "view
report" button, go for lunch, and come back 15 minutes later.
First of all, changes of infrastructure are very expensive, and if you
(as a software solution provider) suggest this route after some consideration,
you are missing a big chunk of the existing picture (albeit there are
times when the only way to improve "perceived performance"
is through a change of delivery infrastructure, i.e. faster servers,
bigger pipes, or faster client machines). It is very unrealistic to
change the work habits of workers. Software is more easily adapted to
the needs of the user. Hence, there must be something else that can be done.
I had a specific problem, and I had many components to look at. I spent
between 5 and 10 days looking at the current solution and understanding
what each component did in the current environment - tools are tools
and are used differently - and I did know what a web server does :)
Anyway, I didn't design the system; I was a newcomer, and I had to spend
the time to understand what others had done with those tools. Let me
tell you, if you are a technical manager who pays consultants to do this
type of work, this is money and time well spent. To fix something, one
must understand what is being fixed, and the only way to wrap your mind
around a problem is by looking at it and tinkering with it.
I put the proposal for the solution together. I proposed to leave
all components alone, pre-generate all the reports, and publish them
to the web portal for user consumption. Nothing new here; this is batch
computing.
My solution called for a Java application reusing the existing databases
to automate the generation of all reports. Minimum maintenance: if
an employee was added or removed, my solution would pick up the change at
run time. We would schedule the application to run monthly (via the preferred
method: cron, Windows scheduler, Brio Scheduler, etc.), and all reports
would be pre-generated.
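For illustration, here is a rough sketch of what that proposed batch
generator might have looked like. The connection string, table name, and
the generateReport() stub are all hypothetical; the point is that the
employee list drives the batch at run time:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class BatchReportJob {
        public static void main(String[] args) throws Exception {
            // Hypothetical Oracle connection details
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@dbhost:1521:prod", "report_user", "secret");
                 Statement stmt = conn.createStatement();
                 // Reading the employee list at run time means staffing
                 // changes are picked up on the next scheduled run
                 ResultSet rs = stmt.executeQuery(
                     "SELECT employee_id FROM employees")) {
                while (rs.next()) {
                    generateReport(rs.getString("employee_id"));
                }
            }
        }

        // Stub for the per-employee generate-and-publish step
        static void generateReport(String employeeId) {
            System.out.println("Generating report for " + employeeId);
        }
    }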
That was the proposal, and the solution was accepted.
Of course, I spent the first week understanding their environment, and
an "enhanced" proposal popped into my head: don't use a Java
application; use VBScript and ActiveX to instantiate the executable
generating the reports, and have a master file (a .BQY file) generate
all reports from within itself.
What was the impact?
I had less design and implementation to do. No Java application was
required. I was able to use all existing technology to solve the problem,
which, by the way, was delivering a 9 MB file to thousands of users who,
potentially, could have been looking at the same report at the same
time.
So, pre-generating the report was clever, but now all that performance
gain in the users' eyes had to be lost somewhere else. I was reminded
of the law of conservation of energy: energy cannot be created or destroyed,
only transformed.
In our case, it meant that the processing of this report was to be done
monthly in some hidden and dark computer room in a well-secured computing
facility. After making the optimization work, it took around 8 hours
to generate X reports (sorry, no details - but the number
was in the thousands). It is not a long time if you think about it,
but the stakeholder, once again, asked the question: "Why is it
so slow?"
It is typical that after one optimization step, which solved the issue
of serving the original 9 MB report in 3 minutes (each report was served
in under 15 seconds after I was done), we had more optimization to
do.
The question now becomes one of economics and the primordial query: "Is
it good enough for our purposes?"
I'm of the point of view that anything and everything is possible. It's
all a matter of how much money you have and how much time you have to
spend. Obviously, these two resources are in short supply in some cases,
so an executive decision has to be made: do we go through more optimization
steps, or not?
Since the environment was composed of many components and one of them
was an RDBMS (Oracle 9i), my consulting services were out of
the picture there. I'm well versed in DB matters and query optimization
issues; however, there was a full-time DBA to handle that situation. The
next optimization step is, of course, to look at the report generation:
analyze the DB queries at each processing instance and investigate whether
more gains can be made by optimizing the raw SQL. In this case, there was
one such instance, and a minor change to the processing had to be made. Each
report could now be generated in 5 seconds. That is not bad at all;
considering that these reports get generated monthly, spending 5 hours
per month is a non-issue.
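The general pattern behind that step is worth sketching: time every query
the generation issues, and let the slow one identify itself before
rewriting any SQL. A minimal Java helper (the method name and usage are
illustrative, not from the actual system):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class QueryTimer {
        // Times one query and reports rows fetched versus elapsed time
        static void timeQuery(Connection conn, String sql) throws Exception {
            long start = System.nanoTime();
            try (PreparedStatement ps = conn.prepareStatement(sql);
                 ResultSet rs = ps.executeQuery()) {
                int rows = 0;
                while (rs.next()) {
                    rows++;
                }
                long ms = (System.nanoTime() - start) / 1000000;
                System.out.println(sql + " -> " + rows + " rows in " + ms + " ms");
            }
        }
    }

Wrap each processing instance with something like this, and the query
worth handing to the DBA stands out on its own.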
Of course, if the number of reports to be generated grows into the millions,
the current solution could still be used with minor configuration changes,
i.e. distributing the load among different machines. The open-ended architecture
of the optimization solution lends itself to distributed computing.
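That configuration change could be as simple as partitioning the report
list so each machine takes every Nth report. A sketch of the idea, with
made-up report IDs and machine count:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class ReportSharder {
        public static void main(String[] args) {
            int machines = 4; // hypothetical number of worker boxes
            List<String> reportIds = Arrays.asList("r1", "r2", "r3", "r4", "r5");

            // Modulo partitioning: machine k handles report i when i % machines == k
            for (int k = 0; k < machines; k++) {
                List<String> shard = new ArrayList<>();
                for (int i = 0; i < reportIds.size(); i++) {
                    if (i % machines == k) {
                        shard.add(reportIds.get(i));
                    }
                }
                System.out.println("machine " + k + " -> " + shard);
            }
        }
    }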
As I mentioned, there could be many more optimization steps; however, this
was the end of this cycle. The solution was cost-effective, reuse
of existing components was maximized, the impact on concurrent development
efforts was nonexistent, and, most importantly, my solution solved the
problem of serving the "one" report to thousands of internal
users.
In summary,
- Optimizing is good whenever there is a "working" solution
- There is no point in optimizing development code - Wait until the
components are "production" quality and (sometimes) deployed
into a real environment
- Code with performance in mind. I.e. Do not create more objects than
you need (see the sketch after this list). Use the most efficient
algorithm you can design - Sometimes, the most elegant algorithm is
more complex than it needs to be and doesn't yield optimal performance
- First understand your domain (i.e. data inputs) and then decide
on your methods, if you are thinking of changing working code
- Set an attainable and measurable goal: "Serve a 1 MB file in
10 to 15 seconds after the user clicks the link." Don't use:
"Sort array A of n elements in better than O(n log n) time"
- no comparison-based sort can do that
- Never make performance promises you can't fulfil
- Last, but not least, every situation is different. Your performance
improvement methodology will differ from case to case. I.e. Sometimes
fine-tuning an application server is enough. Sometimes the source code
needs to be opened and one particular algorithm revisited or changed
entirely. Sometimes, the only way to increase performance is by
putting new hardware in place, etc., etc.
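As an example of the "do not create more objects than you need"
point above: in Java, concatenating strings inside a loop allocates a
brand-new String on every pass, while a single StringBuilder reuses one
buffer. A small sketch:

    public class ObjectChurn {
        public static void main(String[] args) {
            // Wasteful: each += creates a new String object
            String slow = "";
            for (int i = 0; i < 10000; i++) {
                slow += i;
            }

            // Better: one StringBuilder, one final String
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10000; i++) {
                sb.append(i);
            }
            String fast = sb.toString();

            System.out.println(slow.length() + " == " + fast.length());
        }
    }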
This is by no means a complete guide on how to tackle optimization problems.
There are volumes of work on the performance topic in some
web site or library near you. Google, for example, had this
to say.
Finally, solving performance issues is fun. I, for one, don't cringe when
I hear a stakeholder ask the question: "Why is it so slow?"
I look them directly in the eyes and ask back: "What do you mean by slow?
Relative to what baseline is the current solution slow? Aha! If you have
no baseline, let's create one, measure our results, and performance-tune
this baby - There are many steps to optimization. Let's start with
...blah...blah...blah..." I think you get the picture.