Backup and Recovery Solutions
for the HP 3000
Typically we think only in
terms of backup. We develop backup plans,
we monitor backup times and resources used,
we use metrics to reduce backup costs and
resource usage. We often do not even consider
why we are always backing up systems: recovery.
In our attempts to be realistic, we focus
and gain expertise on backing up systems
efficiently or reducing the cost of backup.
We focus on making efficient use of backup
As long as we have a means of recovery,
we think we are safe. We do not question
whether restoring from backup alone can
meet our business-critical recovery requirements.
We do not realize that we may have inadvertently
compromised the ability to recover. We have
not investigated methods other than backup
that we can use to minimize the business
losses incurred when recovery is needed.
We have not fully assessed the time needed
to get our business-critical applications
back up and operational.
Up until five years ago, recovery of a
full system from a backup tape provided
acceptable recovery times. Systems were
still quite small and tape backup and recovery
took only one to two hours for full system
recovery. Tremendous changes over the last
five years have forced us to think differently
about recovery and backup.
What Is Your Cost of Downtime?
Thinking differently means thinking in terms
of cost of downtime per application. The
dollar amount exposes which application
sets are important to your business processes
and affects important investment decisions
regarding both high availability and recovery
and backup solutions. The amount includes
the cost of unplanned vs. planned downtime,
unavailability during peak usage hours vs.
off-hours, and the effects on profits when
an application is unavailable. Not all downtime
is equal. The costs of a system interrupt
caused by an OS failure, application abort,
or even a planned daily backup are different
from those of a longer term outage caused
by a disk failure that requires a reload
or a disaster that requires a data center
In determining cost of downtime for an
application, consider both direct and indirect
costs. Direct costs include idle employment
and manufacturing expenses, delayed business
processes, and direct profit losses and
penalties. We are usually good at identifying
the direct costs of downtime. We often do
not, however, recognize and account for
indirect costs, like negative impressions
on potential investors/partners/users and
damage to branch or agency communication.
In a recent survey of 150 HP 3000 users
in different industries, 45 percent had
no idea of their downtime cost. Of the 55
percent who did estimate their downtime,
most had not considered all the direct and
indirect costs, with the result that their
estimates were low.
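A rough way to make the estimate concrete is to add up the direct and indirect hourly costs for one application and weight the result by when the outage occurs. The sketch below is only an illustration: every dollar figure, the `order_entry` application, and the peak-hour multiplier are hypothetical, not survey data.

```python
from dataclasses import dataclass


@dataclass
class DowntimeCost:
    """Hourly downtime cost for one application (all figures hypothetical)."""
    idle_staff: float          # direct: idle employees
    lost_profit: float         # direct: profit losses and penalties
    delayed_processes: float   # direct: delayed business processes
    indirect: float            # indirect: reputation, partner and branch impact

    def per_hour(self) -> float:
        return (self.idle_staff + self.lost_profit
                + self.delayed_processes + self.indirect)


def outage_cost(cost: DowntimeCost, hours: float, peak: bool,
                peak_multiplier: float = 2.0) -> float:
    """Cost of one outage; peak-hour outages are weighted more heavily."""
    weight = peak_multiplier if peak else 1.0
    return cost.per_hour() * hours * weight


# Hypothetical application and figures, for illustration only.
order_entry = DowntimeCost(idle_staff=1200.0, lost_profit=5000.0,
                           delayed_processes=800.0, indirect=1000.0)
print(outage_cost(order_entry, hours=4, peak=True))     # long reload at peak
print(outage_cost(order_entry, hours=0.5, peak=False))  # short off-hours interrupt
```

Leaving the `indirect` field at zero shows how much an estimate based on direct costs alone would understate the total.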
Data Storage Trends
For today's IT organizations, server-based
storage requirements are growing at a rate
of 50 percent or even 100 percent per year.
This dramatic change in data storage capacity
requirements, and the need to manage storage
better, has occurred for several reasons.
In terms of size and storage capacity, system
capabilities have grown at dramatic rates.
Efforts to reduce costs, such as system
consolidations and distributed computing
environments (in which data storage for
many clients is centralized in server systems)
have caused the measurement unit of data
storage capacity to change from megabytes
Processing power has increased dramatically,
allowing applications to grow in complexity
and size. Disk storage is more affordable
and provides greater storage capacity per
device. Furthermore, incorporation of new
storage-intensive data types, such as imaging,
voice, and video, within many commercial/business
applications over the next few years will
cause data storage capacities to increase
The growth in storage capacity requirements
results in pressure to find backup/recovery
solutions for large volumes of data. Increasingly,
Hewlett-Packard is being asked for backup/recovery
solutions for terabytes of data to high-end
The demand for continuous accessibility
to huge amounts of data, very often in the
terabyte range, is caused by the increased
use of the Internet and online services,
such as CompuServe and America Online, as
well as by the emergence of new applications,
such as imaging and multimedia. In the past,
mainframes were the primary storage devices
and data accessibility was a fairly simple
process. The only limitation was disk size.
Today, the answers to storage problems cannot
be provided simply by installing bigger
disks on a central server.
As users reengineer their businesses, many
are choosing to migrate off the mainframe
through downsizing. Mission-critical applications
are moving to client-server computing environments
consolidated across LANs and WANs. Huge
amounts of company-sensitive data, which
used to be located in the data center and
under central control, are now distributed
and available in the network. Today, in
many businesses, the amount of distributed
data has surpassed the amount of data in
the data center.
Companies must view storage management
as integral to their network solutions.
In addition to the challenge of managing
storage on distributed systems, IT managers
must deal with another issue: the amount
of data is outstripping the network's capacity
to handle it efficiently. For example, a
company might need to back up 100 gigabytes
of data in an hour. As the storage staff
looks for solutions, they see processor
performance improving faster than disk performance
(I/O), and both disk and processors outstripping
the installed network infrastructure (bandwidth).
The amount of data being moved from system
to system, across a network, or pumped to
backup devices is increasing. It is essential
to develop new ways of transferring and
storing large amounts of data without downtime.
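To see why bandwidth becomes the bottleneck, it helps to work out how long 100 gigabytes takes to cross a network link. The sketch below assumes decimal gigabytes and ignores protocol overhead by default; the two link speeds are illustrative assumptions, not measurements of any particular installation.

```python
def transfer_hours(data_gb: float, link_mbit_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Hours to move data_gb gigabytes over a link of the given speed."""
    bits = data_gb * 1e9 * 8            # decimal gigabytes to bits
    seconds = bits / (link_mbit_per_s * 1e6 * efficiency)
    return seconds / 3600


# Assumed link speeds, chosen only to show the order of magnitude.
for name, speed in [("10 Mbit/s LAN", 10), ("100 Mbit/s LAN", 100)]:
    print(f"{name}: {transfer_hours(100, speed):.1f} hours for 100 GB")
```

Even with a perfectly efficient 100 Mbit/s link, the transfer takes more than two hours, so a one-hour target cannot be met over a single link of that speed.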
Users are demanding that solutions minimize
downtime or inaccessibility to critical
data. They are looking for high availability
features in almost every solution, from
networks to backup/recovery solutions. Storage
management software needs to provide much
greater data availability and reliability
in a much more complex environment than
in the past. Today, store and restore of
data alone often cannot meet business-critical
application/data recovery requirements.
We need to look for other forms of recovery.
IT managers are being asked to reduce costs
of operations. Data storage management strategies
must be tuned to require minimal operator
intervention and make use of less expensive
storage devices. For this reason, users
are drawn to unattended backup solutions
and automated tape libraries. Many IT managers
are trying to lower costs by eliminating
backup altogether at branch offices and/or
Backup Schedules and Plans
Very early in a solution deployment, IT
managers must establish a backup/recovery
policy that provides the appropriate level
of data integrity and availability. The
backup/recovery policy must ensure that
critical data can be completely and quickly
recovered from a backup, even in the event
of a disaster.
Backup is the number one cause of planned
application downtime. Ascertain the length of
your backup window. How much planned downtime
can your business afford for backup? If
you cannot tolerate any planned downtime
and have chosen an online backup solution,
when are your periods of low application/system
usage and how long do they last? Your backup
window affects other decisions, such as
the speed and number of your backup devices,
and configuration policies, such as the
number of parallel stores and backup schedules.
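The relationship between the backup window and the number of devices can be sketched with simple arithmetic. The example below uses the native transfer rates quoted later in this article (5.4 GB/hour for DLT, 1.8 GB/hour for DDS-2 DAT); the 60 GB data volume and 4-hour window are hypothetical, and the calculation assumes the devices run in parallel at their full native rate.

```python
import math


def devices_needed(data_gb: float, window_hours: float,
                   device_gb_per_hour: float) -> int:
    """Smallest number of parallel backup devices that fits the window."""
    required_rate = data_gb / window_hours
    return math.ceil(required_rate / device_gb_per_hour)


# Hypothetical workload: 60 GB to store in a 4-hour window.
print(devices_needed(60, 4, 5.4))   # DLT at its native rate
print(devices_needed(60, 4, 1.8))   # DDS-2 DAT at its native rate
```

Compression and interleaving change the effective rate, so treat the result as a starting point for sizing rather than a guarantee.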
If application availability allows, the
standard full/partial backup schedule is
a good one. An alternative may be to rotate
full and partial backups of major applications.
Each major application is backed up fully
on a different day, and other applications
are partially backed up in between other
You should have policies to minimize the
amount of data in each backup stream. Your
current backup probably includes reference
or archival data. By removing the reference
and archival data from the active data,
backup times can be reduced significantly.
Put your backup on a diet. Don't back up
data that doesn't need to be recovered or
that can certainly be recovered by other means. For example,
system data is recovered with SLT and FOS
tape, and STDLIST spoolfiles are unnecessary.
You can often recover nonproduction utilities
from other systems. Continually monitor
and remove files that are no longer needed
and archive them on less expensive media.
You can free up significant amounts of current
online data storage capacity through automated
data management, such as file compression,
trimming, and purging.
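One way to picture such a policy is as a filter applied to the file list before it reaches the store process. The sketch below is an assumption-laden illustration: the category labels, the exclusion policy, and the MPE-style file names are all hypothetical, and a real site would derive them from its own application sets.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    category: str   # hypothetical labels: active, reference, archive, spool, system
    size_mb: float


# Categories this sketch leaves out of the daily backup stream (assumed policy).
EXCLUDED = {"reference", "archive", "spool", "system"}


def daily_backup_set(files: list[FileEntry]) -> list[FileEntry]:
    """Keep only files whose category is not excluded."""
    return [f for f in files if f.category not in EXCLUDED]


# Hypothetical file names and sizes.
files = [
    FileEntry("ORDERS.DATA.PROD", "active", 900.0),
    FileEntry("HIST1995.DATA.PROD", "archive", 2400.0),
    FileEntry("PRICELST.REF.PROD", "reference", 300.0),
    FileEntry("O1234.OUT.HPSPOOL", "spool", 5.0),
]
selected = daily_backup_set(files)
saved = sum(f.size_mb for f in files) - sum(f.size_mb for f in selected)
print([f.name for f in selected], f"{saved:.0f} MB removed from the stream")
```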
Recovery Plans and Methods
Your recovery plan should include specific
detailed recovery strategies and procedures
for all scenarios from which you will need
to recover, including, but not limited to,
system aborts, application aborts, disk
faults, power outage, network interruption,
system component faults, user and operator
errors, and disasters.
The first priority should be to get business-critical
applications available with minimal business
loss. An application's cost of downtime
drives this priority as well as your calculation
of the amount of downtime you can tolerate.
Ask yourself this question: Do I have the
appropriate recovery strategy for each application?
If not, look at ways of reducing the risk
and minimizing recovery time (for example,
Mirrored Disk/iX, Fast/Wide arrays, SharePlex),
reducing downtime due to backup (7x24 True-Online
Backup), using faster recovery/backup devices
(for example, DDS-3 or DLT), optimizing
backup/recovery configurations (for example,
user volume application sets, massive parallel
A recovery plan should be tested with the
operations staff that will implement the
recovery in a real disruption. Perform dry
runs of the recovery plan with different
scenarios. Test, review, and update your
plan regularly. A good recovery plan is
only good so long as nothing changes.
Select recovery methods or combinations
of methods that will decrease downtime costs
enough to warrant the cost of implementing
the methods. Several recovery methods are
available, providing different levels of
high availability: full system recovery,
application set recovery from backup, application
volume set recovery with disk arrays and
mirrored volume sets (Mirrored Disk/iX),
and application recovery using shadowing
Full System Recovery
If you keep all data (system and application)
on the system volume set, a disk or system
component fault will require recovery of
all data--system and application. This involves
a full system reload. All applications will
be unavailable, and the recovery time is
the longest of all methods: four to eight
hours. (See Figure 1.) This recovery method
is the least costly to implement, if
your environment can tolerate this level
Application Set Recovery
You can significantly decrease recovery
time just by partitioning the disk subsystem
into user volume sets. With this strategy,
the operator stores all accounts by volume
sets. If a drive fails within the volume
set, the operator recovers only the files
on the affected user volume set--not those
on the entire system. Users accessing other
volume sets are not affected. Recovery of
the entire system is required only if a
disk failure occurs on the system volume
set. (See Figure 2.)
The user volume set recovery method reduces
recovery time significantly while increasing
the fault tolerance of your critical applications.
When making this segmentation, be sure to
separate reference data in order to avoid
conflict with critical application recovery.
Also back up reference data only when there
are changes. This recovery method is relatively
inexpensive to implement and reduces recovery
time significantly, generally to 1 or 2
See the sidebar for tips on application
High Availability Disk Arrays
High availability disk arrays tolerate an
outright failure of any single disk mechanism
within the device without losing data or
interrupting the host system. Although redundancy
of the I/O channel, cable, and power is
not provided with a high availability disk
array as it is with Mirrored Disk/iX, disk
arrays do reduce the risk that it will be
necessary to do a recovery from a backup.
We recommend that the system volume set
on business-critical systems be protected
with disk arrays.
Mission-Critical Environment Using Mirrored
Disk failure of a mirrored disk does not
make systems or applications unavailable.
With Mirrored Disk/iX, when a disk fault
is detected, the mirror of the volume set
takes over as though no error occurred.
The reactivation of a failed disk can often
occur without taking the system down. (See
Figure 3.) Mirrored Disk/iX provides full
redundancy of the I/O card in the system,
data cable, disk drive, and the power into
the disk drive, but it does not protect against
system component failure, nor can you mirror
the system volume set. The cost to implement
this recovery environment includes the purchase
of Mirrored Disk/iX (approximately $1,500
to $26,300), plus additional disks for mirroring.
The recovery time, however, is minimal,
taking only about 40 seconds to activate
the mirrored volume.
OLTP: Mission-Critical Environment Using
In this high availability segment, applications
and data are replicated in real-time on
separate servers. In the event of a node
failure, another system takes over applications
running on the failed node. Recovery is
available for any system component or disk
failure. Providing full redundancy for protected
applications provides the best protection
for very important business-critical applications.
The cost to implement this solution includes
purchase of SharePlex/iX (approximately
$14,000 to $125,000, depending on system
size and bundle), as well as access to an
alternate system. The approximate recovery
time for protected applications is five
to ten minutes. (See Figure 4.)
If you use multiple recovery methods (like
Mirrored Disk/iX and SharePlex/iX), recovery
from backup is less likely to be needed. A biweekly
or monthly full backup of applications, with a daily
backup of only DB logs and changed non-DB files, may
be sufficient. This still maintains a recovery
path if the higher-level recovery methods fail.
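Whether a method decreases downtime costs enough to warrant its price can be roughed out by comparing expected yearly downtime loss plus the purchase cost spread over a few years. The sketch below uses the worst-case recovery times and upper list prices quoted above; the hourly downtime costs, the failure rate, and the three-year write-off are hypothetical. It also simplifies heavily: it leaves out extra disks and the alternate system, and it treats every failure as one that each method can recover from.

```python
from dataclasses import dataclass


@dataclass
class RecoveryMethod:
    name: str
    recovery_hours: float   # worst-case figure quoted in this article
    software_cost: float    # upper list price quoted here; 0 where none is given


METHODS = [
    RecoveryMethod("Full system recovery", 8.0, 0.0),
    RecoveryMethod("Mirrored Disk/iX", 40 / 3600, 26_300.0),
    RecoveryMethod("SharePlex/iX", 10 / 60, 125_000.0),
]


def yearly_cost(method: RecoveryMethod, cost_per_hour: float,
                failures_per_year: float, years: int = 3) -> float:
    """Expected downtime loss per year plus the purchase spread over `years`."""
    downtime = method.recovery_hours * failures_per_year * cost_per_hour
    return downtime + method.software_cost / years


def cheapest(cost_per_hour: float, failures_per_year: float) -> str:
    """Name of the method with the lowest yearly cost under this model."""
    return min(METHODS,
               key=lambda m: yearly_cost(m, cost_per_hour, failures_per_year)).name


# Hypothetical downtime costs: a low-value and a high-value application.
print(cheapest(cost_per_hour=500.0, failures_per_year=1.0))
print(cheapest(cost_per_hour=20_000.0, failures_per_year=1.0))
```

Because the model ignores coverage, it never favors SharePlex/iX over mirroring. In practice, SharePlex/iX also covers system component failures that mirroring does not, which is a reason to extend the model before relying on it.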
Backup for Recovery and System Availability
Strategies for backing up data range from
nightly backups at small shops to
enterprise-wide backups of heterogeneous
clients and servers. For small shops, there
are few problems. However, as companies
grow and have more and more data to back
up, it becomes increasingly difficult to
complete backups in the course of a night.
Many companies are forced to adopt night
shifts to change tapes. Or they look for
other ways to complete backups within an
allotted window. One typical way is to break
up the store process by running massive
parallel stores on separate user volume
sets. Additionally, faster, larger capacity
devices can be used, such as DDS-3 or DLT.
Some companies have had great success using
DAT autochangers from third-party vendors.
The use of autochangers can help to eliminate
tape-changing delays and the need for attended
backups. To completely eliminate downtime
due to backup, some users have gone to online
backup. This allows users and jobs to continue
modifying databases and files while the
backup occurs. Both TurboSTORE/iX 7x24 True-Online
and a utility from ORBiT Software allow
To improve your data availability, consider
the following environments.
Using user volume sets decreases downtime
for backups. Not all users are restricted
from system access during backup. Only the
users of the volume set being backed up
are affected, and only for a relatively
short time. Users accessing data on other
volume sets can still access the system
during the backup period. (See Figure 5.)
In a mirroring environment, the applications
are mission critical, requiring rapid backup
and restore functionality. We do not recommend
splitting mirrors during a backup since
the user would be exposed during this time.
Instead, with the use of 7x24 True-Online,
users can continue to keep their mirrors
and perform backups without any application
Even in a SharePlex environment, you may need
to recover your system from tape media,
so the use of SharePlex does not eliminate
the need for backup. In a SharePlex/iX environment,
perform a 24x7 backup on the less critical
shadow system. This will generally eliminate
the overhead associated with online backup.
Although many HP 3000 users have initiated
projects to develop an enterprise-wide backup/recovery
strategy, the single greatest limiting factor
to a totally centralized backup architecture
is network bandwidth. With large volumes
of distributed data, totally centralized
backup architectures are unfeasible. Additionally,
full volume restores resulting, for example,
from a disk head crash, can require significant
time and tie up the network during business
hours. Most users end up with a multitiered
architecture with local tape and limited
centralized backup. These circumstances
favor backup/recovery vendors supporting
With 100VGI, and in the future Fibre Channel,
large backups have become more feasible.
We are seeing requests today from users
who want to perform networked, rather than
local, backups. Higher speeds make it possible
for backup data from the clients on the
network to be moved to a single server which
then stores the data to local tape drives.
Selecting Appropriate Software Solutions
Software products from Hewlett-Packard include
Store/iX, TurboSTORE/iX, and TurboSTORE/iX
7x24 True-Online. (See Figure 6.) Third-party
software solutions include ORBiT's Backup/3000,
Unison's RoadRunner, and Legato's Networker.
Store/iX. STORE is an excellent choice
for small shops that do not require online
backup or for environments in which delays
cannot be afforded. STORE, included with
the OS, offers basic functionality, but
is limited in its file selection capabilities.
In most cases, users must use other tools
to generate precise file lists at the cost
of backup time. In addition, STORE is unable
to specify more global parameters (for example,
full database but partial files). Neither
STORE nor TurboSTORE offers optional tape
librarian utilities to track which file
ended up on which tape. Sophisticated media
managers are also not available.
TurboSTORE/iX. TurboSTORE/iX products provide
high-performance backup solutions, including
powerful parallel backup and recovery, data
interleaving, data compression, and online
To take full advantage of TurboSTORE/iX
products, review the tips in the sidebar.
TurboSTORE/iX 7x24 True-Online. As businesses
move closer toward continuous operations,
IT managers find a growing need for solutions
that can meet the demands of a 7 days per
week, 24 hours per day environment. TurboSTORE/iX
7x24 True-Online Backup was specifically
designed for 7x24 environments by providing
backup of selected data without requiring
application downtime or user logoff. In
addition, True-Online provides the same powerful
backup capabilities of previous versions
Selecting the Appropriate Hardware Configuration
Selecting the appropriate hardware configuration
involves more than merely selecting the
appropriate backup device. With MPE/iX 5.0
and greater, you can have up to 32 backup
devices on your system. By using multiple
devices in parallel, you can increase your
data throughput. To get the maximum throughput:
Spread disk devices and backup devices
across multiple device adapter cards for
maximum backup performance, especially if
total backup speeds of over 12 GB/hour are
needed. For maximum backup performance,
the system configuration should contain
no more than four disks per card.
SCSI cards cannot sustain more than 10 GB/hour
data transfer across the bus. Backup devices
can share the SCSI bus, up to a total of
10 GB/hour. Therefore, for maximum performance,
make sure no more than two DDS-3, two DLT,
four 7980S, or two 7980SX share a SCSI bus.
Where TurboSTORE is used to compress data,
use the INTERleave option.
You can avoid the impact of reel switch
by using TurboSTORE/iX Sequential device
or Sequential device pool functionality.
Use the INTERleave feature when storing
data from more than three disks.
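The per-bus and per-card limits in these tips can be turned into a quick count of the adapter cards a configuration needs. The sketch below takes the limits from the tips above; it assumes one SCSI bus per card and that device types are not mixed on a bus, and the example configuration is hypothetical.

```python
import math

# Per-bus limits stated in the tips above.
MAX_PER_SCSI_BUS = {"DDS-3": 2, "DLT": 2, "7980S": 4, "7980SX": 2}
MAX_DISKS_PER_CARD = 4


def cards_needed(disks: int, backup_devices: dict[str, int]) -> dict[str, int]:
    """Minimum adapter cards, keeping each device type on its own buses."""
    tape_cards = 0
    for kind, count in backup_devices.items():
        tape_cards += math.ceil(count / MAX_PER_SCSI_BUS[kind])
    return {"disk_cards": math.ceil(disks / MAX_DISKS_PER_CARD),
            "tape_cards": tape_cards}


# Hypothetical configuration: 10 disks, three DLT drives, one DDS-3 drive.
print(cards_needed(disks=10, backup_devices={"DLT": 3, "DDS-3": 1}))
```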
The numerous new tape technologies in the
marketplace can be grouped into three categories:
low end, midrange, and high end.
At the low end are DDS and 8 mm devices.
These were developed as spin-offs of consumer/entertainment
applications and have had great success
penetrating the computer data markets. However,
larger users are becoming impatient with
their inherent problems with reliability,
throughput, and capacity.
The midrange includes DLT-4000 and Mammoth.
Mammoth is a future technology from Exabyte
based on 8-mm technology. As development
schedules have continued to slip, it appears
that the market has moved to DLT and the
opportunity for Mammoth has sharply diminished.
DLT-4000 is one of a portfolio of products
developed specifically for computer data
storage. It offers high reliability and
solid throughput and capacity.
DLT-4000 also penetrates high-end tape technologies.
Also in this category are the StorageTek
See Figure 7.
HP 3000 Solutions
When selecting a hardware device, keep in
mind that HP solutions are based on user
requirements for amount of data to be stored
and time available in which to store it.
For users with small datasets and longer
backup windows, DDS may be a very appropriate
solution. At the midrange and high end,
DLT-4000 mechanisms provide backup for users
with large amounts of data and limited backup
windows. Throughput and capacity allow multiple
DLT-4000 mechanisms to meet the needs of
users with large volumes of data and aggressive
HP offers the latest DDS-3 tape drives.
The new DDS-3 format has a native mode capacity
of 12 GB. With data compression, users can
typically store 24 GB on a single tape.
DDS DAT is an industry-standard high capacity
device. Its compact media is easily stored
in a fireproof safe.
The HP 3000 servers support automated digital
linear tape (DLT) mechanisms. DLT-4000 provides
greater native cartridge capacity (20 GB)
than DDS (12 GB), enabling fast, unattended
backup of large quantities of data within
the brief windows available in today's high-end,
mission-critical environments. DLT native
transfer rate (5.4 GB/hour) is three times
faster than DDS-2 DAT (1.8 GB/hour). Furthermore,
large-capacity DLT boasts superior drive
See Figure 8.
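The capacity figures translate directly into cartridge counts, which matter for unattended backup. The sketch below uses the native capacities quoted in this section (12 GB for DDS-3, 20 GB for DLT-4000). The 100 GB data volume is hypothetical, and the 2:1 compression ratio matches the DDS-3 figure given above but is only an assumption for DLT-4000.

```python
import math

# Native cartridge capacities (GB) quoted in this section.
NATIVE_GB = {"DDS-3": 12, "DLT-4000": 20}


def cartridges_needed(data_gb: float, media: str,
                      compression_ratio: float = 1.0) -> int:
    """Cartridges needed for data_gb at the given compression ratio."""
    effective_gb = NATIVE_GB[media] * compression_ratio
    return math.ceil(data_gb / effective_gb)


# Hypothetical 100 GB backup, native and with an assumed 2:1 compression.
for media in NATIVE_GB:
    print(media,
          cartridges_needed(100, media),
          cartridges_needed(100, media, 2.0))
```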
Future Requirements for Massive Backup/Restore
As HP designs and implements solutions for
the different environments we have discussed,
we can't lose sight of some important trends
in the storage environment of our users.
Faster network capabilities will make centralized
backup more feasible. As HP 3000 systems
coexist with heterogeneous systems, users
will want to back up multiple heterogeneous
clients and servers to a single centralized
server. Fibre Channel will become the dominant
peripheral interface over the next three
to five years, providing fast interconnect
as well as the availability of faster disks,
arrays, and tapes. Data capacity requirements
will continue to grow, requiring solutions
to back up and restore terabytes of data.
Cost pressures continue to emphasize solutions
that support operatorless environments.
More and more users will want storage management
functionality that allows online backups
and simple, sophisticated media management.
Of user requirements, the highest priority
is given to high availability solutions. Users
will continue to demand solutions that minimize,
if not eliminate, inaccessibility to critical
data. Storage management solutions need
to provide much greater availability and
reliability in a much more complex environment.
Tips for Application Set Recovery
Tips for the System Volume Set
To ensure maximum fault tolerance as well
as to reduce recovery and backup times,
make the system volume set as small as possible.
It is very important not to allow permanent
user data on the system volume set. With
all user data restricted from the system
volume set, full backups of the system volume
set need be done only when new software,
configuration changes, etc., are added to
the system. If the system volume set is
small, make a combined SLT storeset of the
system volume set using SYSGEN. For a larger
system volume set, back up the volume set
with multiple storesets for faster recovery.
Tips for the User Volume Set
Minimize the number of applications per
volume set. If practical, aim for one application
per volume set. Create a general user volume
set for all nonapplication-specific user
files and don't put them on the system volume
set. For your business-critical applications,
use mirrored disks within a High Availability
Storage System (HASS) enclosure. Do not
split the mirrored volumes during backup,
but instead use 7x24 True-Online. Rotate
full backups of the user volume sets and
use the DIRECTORY option when doing volume
set backups. When identifying application
sets, use the following guidelines:
System code that is recovered with a system
load tape (SLT) created at install/update
Other system utilities and third-party tools
that have not been modified.
Major applications (which are the reason
the system is needed). Some may be critical
to your business processes, but some may
Extraneous data that does not require recovery.
Development data (source, tests, etc.).
Client system data (client data storage,
client backups).
Use multiple parallel storesets to gain
throughput and performance of backup. You
also gain better performance during recovery.
Use MAXTAPEBUF for larger I/O blocks; this
improves performance when using fast backup
Use device hardware compression, when available,
to minimize CPU overhead.
Use software compression if hardware compression
is not available for the device.
If reading from more than three disks, use
the INTERleave option.
Use the DIRECTORY option on all major backups.
Use an integrated database backup so the
database and other nondatabase application
set data can be backed up together online.
Store backup media in a safe place (a data
vault or fireproof safe). Do not leave your
business-critical recovery data lying around
Verify your backup. This can be done easily
on another system (with MPE 5.0 or later)
to reduce overhead on your main system.
When performing online backup, choose a time
with the lowest activity on the system to
minimize CPU overhead during busy system