Saturday, November 05, 2005

Performance testing and disaster recovery

When looking at business continuity testing and disaster recovery testing, probably the last thing you think about is performance. The system’s performance is good, or at least adequate, in your main office, so why would it be any different in a backup or disaster recovery site? Most organisations that have a business continuity plan test it in some way, but do they test not only that their systems can run on the equipment at the disaster site but also that they run successfully, i.e. accurately and with the required performance?

The machines and systems at your home site have been tuned to provide the necessary performance. Running your systems on a different machine could well produce different performance, even if you think the specification of the machines is the same. Different makes of machine, even if quoted as having the same power, can produce different results depending on other internal factors such as the chipset, the type and speed of memory, and numerous other components that can differ between machines.

1 Central machines/servers/mainframes

When looking at central machines, it may not be adequate to use a backup machine of the same specification as the original home machine. I once worked for a bank that had an arrangement to use a mainframe at a disaster site if required. This was fine for online activities, but when it came to batch work we hit a problem. The machine was not a hot or even a warm standby, so the operating system had to be loaded first, followed by the applications. All this took time: from the moment of a disaster to getting the backup machine up and running, we were looking at a minimum of 3 hours. That meant we had a backlog of at least 3 hours of batch processing to clear on top of the normal workload. When we realised this during disaster recovery and performance testing, we ensured that our backup mainframe was a more powerful model than the machine at our home site.
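The reason a same-size backup machine struggles can be shown with some simple arithmetic. The sketch below assumes, as a simplification, that batch work keeps arriving at a steady rate (expressed as a fraction of the home machine's capacity) during both the outage and the catch-up period, and that a backup machine's capacity scales linearly with a single "speed factor" relative to the home machine. The function names and the 12-hour catch-up window in the usage lines are hypothetical; only the 3-hour outage comes from the example above.

```python
import math


def catch_up_hours(outage_hours: float, utilisation: float, speed_factor: float) -> float:
    """Hours after restart needed to clear the batch backlog (simplified model).

    outage_hours: time from the disaster until the backup machine is usable.
    utilisation: batch work arriving per hour, as a fraction of home-machine capacity.
    speed_factor: backup machine capacity relative to the home machine (1.0 = same).
    Returns math.inf when the backlog never shrinks.
    """
    backlog = outage_hours * utilisation      # queued work, in home-machine hours
    drain_rate = speed_factor - utilisation   # net backlog reduction per hour
    if drain_rate <= 0:
        return math.inf
    return backlog / drain_rate


def required_speed_factor(outage_hours: float, utilisation: float, catch_up_window: float) -> float:
    """Smallest speed factor that clears the backlog within catch_up_window hours."""
    # Solve outage * u / (s - u) = window for s.
    return utilisation * (1 + outage_hours / catch_up_window)


# A fully loaded home machine (utilisation 1.0) and a 3-hour outage:
print(catch_up_hours(3, 1.0, 1.0))          # inf: an identical machine never catches up
print(catch_up_hours(3, 1.0, 1.25))         # 12.0 hours with a machine 25% more powerful
print(required_speed_factor(3, 1.0, 12))    # 1.25
```

Under this model, spare capacity at the home site changes the answer considerably (at 80% utilisation an identical machine clears the same backlog in about 12 hours), which is why the figures need to come from performance testing rather than assumption.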

2 Network

It is usually assumed that there is a network at the disaster site. How fast is that network? Does it have all the same characteristics as the home network, i.e. routers, firewalls, etc.? Does it connect to everything you need in the same way as the home site does? Any differences may affect the performance of your systems.

3 Desktop

If the business side of your organisation has to relocate, what desktop machines will be available? Quite often, as companies acquire new and more powerful machines, they move their old machines into the disaster recovery site. These old machines may have slower processors or less memory than the new equipment, and may not be powerful enough to run the new applications now being used at the home site.

Not only must the hardware be capable of running the applications, but the desktop machines must also be able to run the necessary system software, such as the correct operating system (and version) and office products. Old machines may still have old operating systems on them, and you may find, for example, that you are using functions in the latest version of Excel that are not available in the version installed on the backup machines.

4 Testing all systems

Whatever machines are to be made available at a disaster site, they need to be tested to ensure they can run the necessary applications in the required time. If you have a contract with a third-party supplier to provide machines in the event of a disaster, it is essential that these machines are tested regularly to ensure they still provide adequate performance. I have seen situations where a company has been paying for a contract to supply equipment, but the systems at the home site have since changed and the disaster recovery equipment is no longer suitable.

5 Automated testing

When carrying out application testing at the home site, many organisations use automated test tools both to record the tests being run and to monitor the performance of the systems. These tools can also be used to record and monitor the systems running on the machines at the disaster recovery site, so that the results can be compared with those obtained at the home site.
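Once the same tests have been run at both sites, the comparison itself can be automated. The sketch below is not a feature of any particular test tool; it assumes the response times (in seconds) have been exported from whatever tool is used into a simple mapping of transaction name to timings. It uses the nearest-rank 90th percentile and flags any transaction that is more than 20% slower at the disaster recovery site, or that was not run there at all. The percentile, the 20% tolerance, the function names and the transaction names are all hypothetical choices for illustration.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of timings."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def flag_slow_transactions(
    home: dict[str, list[float]],
    dr: dict[str, list[float]],
    tolerance: float = 0.2,
    pct: float = 90,
) -> list[str]:
    """Transactions missing at the DR site, or slower there than at home by more than tolerance."""
    flagged = []
    for name, home_samples in home.items():
        dr_samples = dr.get(name)
        if not dr_samples:
            flagged.append(name)  # never run at the DR site: a gap in the test itself
            continue
        if percentile(dr_samples, pct) > percentile(home_samples, pct) * (1 + tolerance):
            flagged.append(name)
    return sorted(flagged)


# Hypothetical timings exported from the test tool at each site.
home = {
    "login": [0.4, 0.5, 0.5, 0.6],
    "month_end_batch": [3600.0, 3700.0],
    "statement_print": [2.0, 2.1],
}
dr = {
    "login": [0.5, 0.5, 0.6, 0.7],
    "month_end_batch": [5200.0, 5400.0],
}
print(flag_slow_transactions(home, dr))  # ['month_end_batch', 'statement_print']
```

Treating a transaction that was never run at the disaster site as a failure is deliberate: an untested application is exactly the situation described in the previous section.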

In summary, all business continuity and disaster recovery plans require comprehensive testing, and that testing must include an element of performance testing. The faster the systems recover following a disaster, the better the chances of the business surviving.