A Typical Customer Brief
Develop a small, elegant, extremely robust, inexpensive, easily supportable and fully automated solution.
Avoid the initial purchase price, the annual software maintenance fees and unexpected/unwanted vendor changes, and retain total control of the software.
Also, when external problems occur, the solution must "self-right" after short- or medium-term interruptions and provide automated notifications of long-term interruptions.
Constraints:
- CapEx budget: $0 (really! Now that's a constraint!)
- OpEx budget: tiny
Yes, it is possible!
Some Recent Customer Solutions
- A suite of shell scripts to back up an SAP ERP system in the primary DC, which has no local backup facility, to a remote DR DC over a modest network link. This is fully automated and sends notification emails in the event of a problem (Linux/MaxDB). For this customer, CPU cycles are cheap and network bandwidth is expensive, so the design trades more of the former for less of the latter. Other objectives are to get the backups off site ASAP and to cope with network interruptions by transferring many smaller files rather than one huge file.
  - splitcompress: Asynchronously splits and compresses "large" (~130 GiB, fairly small by SAP standards) full and incremental SAP ERP DB backup files into compressed chunks, N chunks at a time, using as many of the available processors as desired. Checksums are calculated at every stage.
  - offsite_dirs: Copies the compressed files and the checksums generated for them off site as soon as they are complete.
  - offsite_files: Copies all the other files required to restore an SAP ERP system (including DB transaction logs) off site shortly after they are complete.
  - verifycksum: Verifies that all the files have arrived at the DR site intact. All the compressed chunks are uncompressed and concatenated, and the result is checksummed and compared with the original checksum from the primary site.
- EDI bureau interface: A shell script that polls the EDI bureau at specified times to upload and download business documents. Because the bureau's systems are quite slow, the script forks N "threads" (sub-shells) that upload/download simultaneously, achieving the required throughput. Rogue suppliers occasionally generated up to hundreds of duplicate orders, a major problem for the business, so the script also detects and quarantines duplicate orders so that they never reach the ERP system.
- Third-party logistics (3PL) interface: A shell script to upload/download business documents of various types (e.g., IDoc, XML and CSV). Depending on the document type, various actions are required, such as changing line terminators, creating a file containing a list of files (for ERP to read and process), triggering an SAP event, sending emails, etc.
- A suite of shell scripts to keep Oracle DR DBs (for SAP ERP systems) in warm standby (Linux/Oracle). As above, one of the design constraints was a fairly modest network link, so a very high level of compression was required. A further benefit is that weeks or even months of compressed Oracle offline redo logs can be stored off site at the DR/test DC, which is very handy for recovering test and sandbox systems.
  Oracle's Data Guard (DG) was not used because:
  (1) DG was too new when we started,
  (2) we have control! (e.g., the choice of compression utility, the priority at which it runs, etc.),
  (3) DG only replicates Oracle, and there are other files that must be replicated too,
  (4) no Oracle changes are required on either the primary or the standby DB, and
  (5) these scripts are extremely robust and reliable.
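For the splitcompress script described above, the core technique (cut a large backup into fixed-size slices, compress at most N slices concurrently and checksum every stage) might look like the following POSIX shell sketch. It is hypothetical, not the customer's script: the function name, the `BASE.NNNNN.gz` chunk naming, the chunk size in MiB and the choice of gzip are assumptions (lzip or xz would trade more CPU for a better ratio, in line with the cheap-CPU, expensive-bandwidth objective). Each background sub-shell reads only its own slice with `dd`, so no uncompressed chunk files are written to disk.

```shell
#!/bin/sh
# Hypothetical sketch of the splitcompress idea (not the original script).
# Usage: splitcompress SRC OUTDIR CHUNK_MIB JOBS
# Assumptions: gzip as the compressor, chunks named BASE.NNNNN.gz,
# GNU coreutils sha256sum available.
splitcompress() {
    src=$1; outdir=$2; chunk_mib=$3; jobs=$4
    base=$(basename "$src")
    mkdir -p "$outdir" || return 1

    # Stage 1: checksum of the whole original file (hex digest only).
    sha256sum < "$src" | awk '{print $1}' > "$outdir/$base.sha256" || return 1

    size=$(wc -c < "$src" | tr -d ' ')
    chunk_bytes=$((chunk_mib * 1048576))
    nchunks=$(( (size + chunk_bytes - 1) / chunk_bytes ))

    i=0; running=0
    while [ "$i" -lt "$nchunks" ]; do
        part=$(printf '%s.%05d.gz' "$base" "$i")
        # Stage 2: a background sub-shell reads one slice of SRC, compresses
        # it and records the checksum of the compressed chunk.
        (
            dd if="$src" bs=1048576 skip=$((i * chunk_mib)) count="$chunk_mib" 2>/dev/null |
                gzip -9 > "$outdir/$part" &&
                (cd "$outdir" && sha256sum "$part" > "$part.sha256")
        ) &
        i=$((i + 1)); running=$((running + 1))
        # Throttle: with JOBS chunks in flight, wait for the batch to finish.
        if [ "$running" -ge "$jobs" ]; then
            wait; running=0
        fi
    done
    wait
}
```

For example, `splitcompress /backup/full.bak /outbox 1024 4` (hypothetical paths) would produce 1 GiB slices, compressing four at a time. Waiting for a whole batch keeps the sketch POSIX; in bash, `wait -n` could keep exactly N jobs busy instead.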
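The verifycksum check described above (uncompress and concatenate all the chunks, checksum the result and compare it with the checksum from the primary site) could be sketched like this. The naming convention (`BASE.sha256` holding the original's hex digest, `BASE.NNNNN.gz` chunks, one `sha256sum`-format file per chunk) and the function name are assumptions for illustration. The per-chunk step pinpoints a damaged transfer; the whole-file step also catches a chunk that is missing entirely.

```shell
#!/bin/sh
# Hypothetical sketch of the verifycksum idea (not the original script).
# Usage: verifycksum DIR BASE
# Assumes DIR holds BASE.sha256 (hex digest of the original file),
# chunks BASE.00000.gz, BASE.00001.gz, ... and a NAME.sha256 per chunk.
verifycksum() {
    dir=$1; base=$2
    expected=$(cat "$dir/$base.sha256") || return 1

    # Step 1: check that each compressed chunk arrived intact.
    found=0
    for sumfile in "$dir/$base".[0-9][0-9][0-9][0-9][0-9].gz.sha256; do
        [ -e "$sumfile" ] || continue
        found=1
        (cd "$dir" && sha256sum -c --quiet "${sumfile##*/}") || return 1
    done
    [ "$found" -eq 1 ] || { echo "no chunks found for $base" >&2; return 1; }

    # Step 2: decompress and concatenate the chunks in sequence order
    # (pathname expansion sorts them), then checksum the reassembled stream.
    actual=$(cat "$dir/$base".[0-9][0-9][0-9][0-9][0-9].gz | gzip -dc | sha256sum | awk '{print $1}')

    # Step 3: compare with the checksum calculated at the primary site.
    if [ "$actual" = "$expected" ]; then
        echo "OK: $base"
    else
        echo "MISMATCH: $base" >&2
        return 1
    fi
}
```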
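The EDI bureau interface's approach of forking N "threads" (sub-shells) that transfer documents simultaneously can be illustrated with the sketch below. It assumes a hypothetical `CMD` that transfers a single document (for example, a wrapper around an sftp or curl call). The round-robin split of the file list is one simple way to ensure no two workers handle the same document, and is not necessarily how the original script divides the work.

```shell
#!/bin/sh
# Hypothetical sketch of the EDI script's "N threads" idea.
# Usage: parallel_transfer N CMD FILE...
# CMD is a placeholder for whatever uploads or downloads one document.
parallel_transfer() {
    n=$1; cmd=$2; shift 2
    k=0
    while [ "$k" -lt "$n" ]; do
        # Worker k handles files k, k+N, k+2N, ... so workers never overlap.
        (
            i=0
            for f in "$@"; do
                if [ $((i % n)) -eq "$k" ]; then
                    # $cmd is deliberately unquoted so it may carry arguments.
                    $cmd "$f" || echo "transfer failed: $f" >&2
                fi
                i=$((i + 1))
            done
        ) &
        k=$((k + 1))
    done
    # Block until every worker sub-shell has finished.
    wait
}
```

For example, `parallel_transfer 4 upload_one /edi/out/*.xml`, where `upload_one` is a hypothetical per-document upload function and the path is illustrative.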
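The duplicate-order quarantine in the EDI bureau interface can be approximated as follows. This sketch assumes that a duplicate is a byte-identical order file and keeps a persistent record of the SHA-256 digest of every order already passed on, so duplicates are caught within a batch and across polls. Rogue duplicates that differ only in, say, a timestamp would need a comparison on business keys (such as supplier and order number) instead. The function and file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of duplicate-order quarantine.
# Usage: quarantine_duplicates INBOX QUARANTINE SEEN_DB
# Assumption: a duplicate is a byte-identical order file; SEEN_DB keeps one
# SHA-256 digest per line for every order already passed on to ERP.
quarantine_duplicates() {
    inbox=$1; quarantine=$2; seen=$3
    mkdir -p "$quarantine" || return 1
    touch "$seen" || return 1
    for f in "$inbox"/*; do
        [ -f "$f" ] || continue
        sum=$(sha256sum < "$f" | awk '{print $1}')
        if grep -qxF "$sum" "$seen"; then
            # Seen before (in this batch or an earlier one): keep it away from ERP.
            mv "$f" "$quarantine/" || return 1
        else
            echo "$sum" >> "$seen"
        fi
    done
}
```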
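Two of the 3PL interface's document-type actions, changing line terminators and creating a file that lists the files for ERP to process, might be sketched as below. The `*.csv` filter, the function name and the write-to-temporary-then-`mv` publication of the list are assumptions for illustration; a rename within one filesystem is atomic, so ERP never reads a half-written list.

```shell
#!/bin/sh
# Hypothetical sketch of two 3PL post-download actions.
# Usage: prepare_for_erp INDIR LISTFILE
prepare_for_erp() {
    indir=$1; list=$2
    : > "$list.tmp" || return 1
    for f in "$indir"/*.csv; do
        [ -f "$f" ] || continue
        # Change CRLF line terminators to LF (only a trailing CR is removed).
        awk '{ sub(/\r$/, ""); print }' "$f" > "$f.tmp" && mv "$f.tmp" "$f" || return 1
        echo "$f" >> "$list.tmp"
    done
    # Publish the list in one rename so ERP never sees a partial list.
    mv "$list.tmp" "$list"
}
```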
Notes:
- PKI is used for authentication and encryption.
- Automated secure communication uses passphrase-protected keys loaded into ssh-agent.
- Most of the shell scripts are initiated by cron or SAP batch jobs.
- The scripts make extensive use of utilities such as rsync, lzip, xz, awk, sed, checksum utilities (cksum, sha256sum, etc.) and, of course, findfiles (developed by YOSJ Staff)!
- "Obvious" solutions such as SAN replication were ruled out because of the additional initial and ongoing costs, i.e., the SAN software and the much higher-bandwidth network link it would require between the DCs.
- CapEx can inexplicably be an order of magnitude more difficult to obtain than OpEx, even when it's the better option!