Saturday, February 04, 2006

Disaster recovery testing

Disaster recovery, business continuity and resilience are often confused.

To many companies, the term business continuity relates to the ability of the organisation to continue functioning following a major problem. Disaster recovery usually refers to the IT side of business continuity: the ability to cope with a major problem affecting IT systems, covering both the data centre and the desktop. Resilience (also used mainly as an IT term) tends to refer to the ability to recover from a problem on the main site of the company, for example by ensuring that server machines have dual power supplies.

This means that resilience can form a part of disaster recovery, which in turn forms the IT part of business continuity. Every company needs a combination of these three areas to be able to survive whatever is thrown at it.

Resilience in IT terms includes:

  • Dual power supplies for critical computers
  • Uninterruptible Power Supplies (UPS) for critical systems and network equipment
  • RAID arrays, including mirrored disks (RAID 1), to preserve important data
  • Clustered servers

Resilience goes further than just ensuring that IT equipment is protected (to a degree). It can also be applied to wider areas of the company, for example by using more than one telephone company, more than one electricity supplier and multiple ISPs (Internet Service Providers). Each of these actions reduces (but does not eliminate) risk to the business.
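
To see why a second supplier reduces risk without eliminating it, here is a rough sketch of the arithmetic. It assumes that each supplier fails independently of the others, which is optimistic (two ISPs may share the same cable duct), and the 99% availability figure is purely illustrative. The function name is my own.

```python
def combined_availability(availabilities):
    """Chance that at least one supplier is working, assuming
    the suppliers fail independently of each other."""
    all_down = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        # Probability that this supplier AND all previous ones are down
        all_down *= (1.0 - a)
    return 1.0 - all_down


# Illustrative figure only: each ISP is assumed to be up 99% of the time
single = combined_availability([0.99])
dual = combined_availability([0.99, 0.99])

print(f"one ISP:  {single:.4%}")
print(f"two ISPs: {dual:.4%}")
```

Under these assumptions the second ISP improves the figure, but it never reaches 100%, and any shared point of failure makes the real number worse than the calculation suggests.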

All the work put into resilience, disaster recovery and business continuity is wasted if it is not tested properly. Testing must be based on the risks to the business, as these are the only risks that really matter. Although the items above concentrate on the IT aspects of an organisation, it is the business side that matters; even so, few businesses today could continue to exist without their computers and systems.
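
One simple way to base testing on business risk is to score each scenario and test the highest scores first. The sketch below is only one possible approach: the 1 to 5 scales, the likelihood times impact score, and the example scenarios and numbers are all assumptions for illustration, not figures from a real assessment.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (frequent)
    impact: int      # hypothetical scale: 1 (minor) to 5 (threatens the business)

    @property
    def score(self) -> int:
        # Simple likelihood x impact score; other weightings are possible
        return self.likelihood * self.impact


def prioritise(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Made-up scenarios and numbers, for illustration only
risks = [
    Risk("Loss of main data centre", likelihood=1, impact=5),
    Risk("Server power supply failure", likelihood=4, impact=2),
    Risk("ISP outage", likelihood=3, impact=3),
]

for r in prioritise(risks):
    print(f"{r.score:>2}  {r.name}")
```

The scores themselves matter less than the discussion with the business that produces them; the ranking simply decides which tests to schedule first.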