Why does it take so long to do a backup?
What is the difference between copying a file and backing it up?
We are often asked why our software takes “so long” to do backups – longer than simply copying files. The answer is simple and complicated.
Many factors affect the speed of backups. Some are in your control and some are not. The simple answer is that RBackup does far more than copy files, and that is why it takes longer. It doesn’t copy files; it backs them up. There is a vast difference.
Backing up files correctly is a complex process that RBS has perfected by performing many millions of backups and millions of restores since our company was started in 1987. It is definitely NOT a simple copy process.
Backup is a critical process that must be dependable, reliable, and perfect. Since we are also sending files offsite over a public network, it must also be secure and private. Since offsite backups are governed by so many security and privacy regulations, it must also comply with all of them.
Here is the complete process RBackup uses to back up files. Each process requires some time. Some processes are executed for each file, some for each batch of files, and some for the entire backup session.
Initialize TCP Connection – Open a connection with the Network Interface Adapter.
Log into RBS Server – Contact the RBS Server. This involves sending a message to the RBS Server and looking for a valid response. If the IP address of the Endpoint is listed in the Server’s firewall as “deny” the Server will not respond.
Authenticate Endpoint – The RBS Server identifies itself and then sends the Endpoint a unique encrypted Session Authentication Token. The Endpoint decrypts the token, transforms it using a proprietary algorithm, encrypts it, and sends it back to the Server with some proprietary cargo. The Server receives and validates the token and performs various functions based on the associated cargo. The Endpoint will not be allowed to continue if it does not authenticate.
Assign a Data Port and IP Address – The Server assigns the Endpoint a unique data port and IP address to use for file transmission and for some commands. The Endpoint receives the Server’s assignment and opens a second connection to the Server on the assigned IP address and data port. There is a further authentication on the data port to authenticate the Endpoint.
Begin All Files Process – The Endpoint begins its main process loop to back up all files. This establishes a process start point in case the process is interrupted or aborted before completion. At this point, and until the End All Files Process, the CPU speed and disk speed play a larger part in determining how fast a backup proceeds.
Select Files and Objects – The selection phase of the backup process can take a long time depending on the number of files selected for backup, the method you use for File Selection Criteria, disk speed, network speed, whether you are using AutoSelect or manual file selection, and whether any file is locked by a local application.
Unlike simple copy processes, RBackup can back up locked files. Some applications lock files exclusively, which prohibits any other application from accessing them, even for backup. But since a backup process, unlike a copy process, must be 100% reliable, RBackup must back them up regardless.
So, RBackup first checks the Windows operating system to see if it supports Microsoft’s Volume Shadow Copy Service (VSS). If it does, RBackup switches on its VSS driver. If it does not, RBackup switches on its legacy open files driver.
If VSS is ON, RBackup first checks to see if a file is listed in Windows’ Locked Files list. If it is, RBackup takes a snapshot of the locked file using VSS. While it takes a little time, this is a relatively fast process. Depending on the file size and time, it might require some disk space.
If VSS is OFF, RBackup attempts to open each file for reading. If it fails, RBackup uses its legacy open file system driver to snapshot the file. This is slower than using VSS.
If the file is not locked, RBackup locks it, then opens the file for reading and begins working with it.
File Selection Criteria
Archive Bit selection is the fastest because RBackup only has to examine the archive bit of each file. It scans the file selections and examines only the archive bit of each file that matches your selections.
FastPick is next. RBackup examines the date and time of each file or object selected and compares them to the date and time of the last backup. Files that have dates and times newer than that of the last backup are backed up.
Date/Time takes the longest. For each file that matches your file selections, RBackup has to examine the date and time of the file and compare it to the date and time of the last time each specific file was backed up. This requires RBackup to look up each file in its catalog, examine each file on the disk, and compare dates.
If you use AutoSelect rather than selecting files manually, RBackup might examine all files on the hard drive and all files on mapped drives, depending on how you have defined the AutoSelect function. Because of this, AutoSelect is much slower than selecting files manually.
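To make the difference between the selection methods more concrete, here is a minimal Python sketch of a FastPick-style check: one timestamp for the whole backup set, compared against each file's modification time. This is only an illustration, not RBackup's actual code. The function name select_changed_since and the demo files are hypothetical.

```python
import os
import tempfile
import time


def select_changed_since(paths, last_backup_time):
    """Return the paths whose modification time is newer than the last backup.

    Hypothetical helper: a single timestamp for the whole backup set is
    compared with each file, so no per-file catalog lookup is needed.
    """
    selected = []
    for path in paths:
        if os.path.getmtime(path) > last_backup_time:
            selected.append(path)
    return selected


def demo():
    """Create two files, one older and one newer than the 'last backup'."""
    with tempfile.TemporaryDirectory() as folder:
        old_file = os.path.join(folder, "old.txt")
        new_file = os.path.join(folder, "new.txt")
        for name in (old_file, new_file):
            with open(name, "w") as handle:
                handle.write("data")
        last_backup = time.time()
        # Set the modification times one hour before and one hour after the backup.
        os.utime(old_file, (last_backup - 3600, last_backup - 3600))
        os.utime(new_file, (last_backup + 3600, last_backup + 3600))
        chosen = select_changed_since([old_file, new_file], last_backup)
        return [os.path.basename(path) for path in chosen]
```

The Date/Time method would differ in that each file is compared with its own last-backup time, looked up in the catalog, which is why it takes longer.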
Extract Changes – Based on the information RBackup gets from the file selection process, it extracts changes from the file system depending on the selected Backup Method and File Type.
Incremental / Differential – Back up only files that have changed since the last backup. This is the quickest.
Full – Back up the entire file, regardless of the File Selection Criteria.
BitBackup – Back up only the parts of the files that have changed since the last backup. This requires RBackup to compare the current file to the previous version of the file. It takes the longest amount of time and uses the most drive space and CPU time.
File Backup – If the file selected is a simple file like a word processing document or spreadsheet, RBackup can process it quickly.
Database Backup – If the file selected is a database like Exchange, SQL Server, Active Directory, or SharePoint, RBackup will switch in one of its built-in agents to extract the data changed since the last backup. This allows RBackup to back up only the Objects that have changed, like Mailboxes, Records, and Directory Objects. It is the slowest form of backup by File Type.
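As a rough illustration of why backing up only the changed parts of a file costs extra CPU time, the Python sketch below compares two versions of a file block by block. How BitBackup really finds changes is not described here, so the fixed-size blocks, the SHA-256 hashes, and the function names are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of the data (assumed technique)."""
    return [
        hashlib.sha256(data[start:start + block_size]).hexdigest()
        for start in range(0, len(data), block_size)
    ]


def changed_blocks(previous, current, block_size=BLOCK_SIZE):
    """Return the indexes of blocks in 'current' that differ from 'previous'."""
    old = block_hashes(previous, block_size)
    new = block_hashes(current, block_size)
    changed = []
    for index, digest in enumerate(new):
        # A block is changed if it is new or its hash no longer matches.
        if index >= len(old) or old[index] != digest:
            changed.append(index)
    return changed
```

Every block of both versions has to be read and hashed before anything is sent, which is where the extra time goes even when only a small part of the file has changed.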
Calculate File/Object Signature – RBackup must calculate a pre-compression, pre-encryption digital signature for each file or object it backs up. This allows the Endpoint and Server to verify the authenticity, origin, and content of each file securely before and after transmitting it to the Server without the need to view the file’s contents.
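The idea of a content signature can be sketched in a few lines of Python. The algorithm RBackup uses is not named in this article, so SHA-256 here is only a stand-in, and content_signature is a hypothetical name. A true digital signature would also involve a key. This sketch shows only the content digest part.

```python
import hashlib


def content_signature(data: bytes) -> str:
    """Return a hex digest of the raw, uncompressed, unencrypted content.

    SHA-256 is an assumption for illustration only.
    """
    return hashlib.sha256(data).hexdigest()
```

Two parties holding the same digest can confirm that a file is unchanged by comparing digests, without either one having to look at the file's contents.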
Compress – Files and Objects are then compressed using one of five built-in lossless compression algorithms. The algorithm is chosen for each file based on the file’s contents and encoding method.
Encrypt – Compressed files and objects are encrypted using the selected encryption method for the current backup set. Encryption methods have different speeds depending on their algorithm and key length. Generally, the longer the key, the slower the encryption; however, the various algorithms available also have different speeds. Blowfish is the quickest of the high-security encryption methods.
Close the File – RBackup instructs the Windows file system that it is now finished with the file, and authorizes other applications to access it.
Verify Locally – The file or object is then verified to be sure that the encryption and compression process did not alter the file’s contents.
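A local verification step of this kind can be sketched as a round trip: prepare the file, reverse the preparation, and confirm that the result matches the original signature. The Python example below uses zlib for compression and SHA-256 for the signature, both of which are assumptions. Encryption is left out because the Python standard library has no cipher, and prepare_and_verify is a hypothetical name.

```python
import hashlib
import zlib


def prepare_and_verify(original: bytes):
    """Compress the data, then prove the compressed copy restores exactly."""
    signature = hashlib.sha256(original).hexdigest()
    packed = zlib.compress(original, 9)
    # Reverse the preparation and compare against the original signature.
    restored = zlib.decompress(packed)
    if hashlib.sha256(restored).hexdigest() != signature:
        raise ValueError("prepared copy does not match the original file")
    return packed, signature
```

Every file is effectively processed twice in this step, once forward and once backward, which is part of the reason a backup takes longer than a copy.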
Digitally Sign – The file is then digitally signed using the previously calculated signature. The signature is appended to the backup copy. This will be used by the Server and the Endpoint later to authenticate the file’s contents without the need to read the file. This ensures that no viruses or Trojans get attached to the backup copy after transmission to the Server and that the file has not been altered since it was signed. On restore, the file is guaranteed to be 100% identical to its original version.
Alias Filename – To comply with worldwide data security and privacy regulations the filename is removed from the file and replaced with a unique identifier that indexes each file to its metadata in the Catalog.
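Filename aliasing can be pictured as a simple lookup table. In the Python sketch below, the stored file gets a random identifier and the real name is kept only in a local catalog. The use of UUIDs, the dictionary catalog, and the name alias_filename are assumptions for illustration, not a description of RBackup's internal format.

```python
import uuid


def alias_filename(original_name, catalog):
    """Replace a filename with a random identifier and record the mapping.

    'catalog' is a plain dictionary standing in for the local Catalog.
    """
    alias = uuid.uuid4().hex
    catalog[alias] = {"original_name": original_name}
    return alias
```

Anyone looking at the stored copy sees only the identifier. The original name can be recovered only by someone who holds the catalog.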
Store in Cache – The backup file, now secure, is stored in the local cache where it will stay until it has been verified as properly stored on the Server. Then it will be erased.
Index the File – The secure file structure of RBackup indexes filenames to folder names, and breaks up the data into manageable chunks that Windows can handle most efficiently. This requires indexing files to folder names in the server’s data store, the file’s metadata, a signature of its encryption key, encryption method, group type, backup set, compression type, and storage name.
Assign Directory Names – The file is assigned to a directory on the RBS Server, with no more than 5,000 files per directory (unless set differently).
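One simple way to keep directories under a fixed size is to number the files and divide by the limit. The Python sketch below shows that arithmetic with the default of 5,000 files per directory. The directory naming scheme and the function name directory_for are hypothetical. The article does not say how the Server actually names its directories.

```python
FILES_PER_DIRECTORY = 5000  # default limit mentioned in the text


def directory_for(file_sequence_number, files_per_directory=FILES_PER_DIRECTORY):
    """Map a zero-based file sequence number to a directory name.

    The 'dir000000' naming pattern is an assumption for illustration.
    """
    bucket = file_sequence_number // files_per_directory
    return f"dir{bucket:06d}"
```

With the default limit, files 0 through 4,999 land in the first directory and file 5,000 starts the second one.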
Catalog – All the file’s metadata, including its index, directory name, and original filename are stored in the local Catalog. This is a database maintained by the RBackup Endpoint for quick lookup on restore.
Transmit – Each backup file, now renamed, compressed, encrypted, signed and secure, is then transmitted to the Server. Depending on the settings of the software, RBackup might transmit several files at a time. File transmission time is determined primarily by Internet speed, but is also affected by the number of ports assigned to each simultaneous connection (at the Server) and other factors in the control of the Service Provider, and by the Bandwidth Throttling setting for each Endpoint. Bandwidth Throttling can be changed remotely at the Server for each individual Endpoint.
Wait for Server to Acknowledge – The Endpoint waits for the Server to acknowledge that it has received the file. Backups of previously prepared files may continue in other threads. The Endpoint software hardly ever really “waits” for the server.
Server Verify – The Server stores the file in its Data Store. The Server then performs a local validation based on the file’s metadata and its signature.
Acknowledge – If the Server determines that the file is properly stored it sends the Endpoint some metadata and a proprietary token that identifies the file it has just received. The Endpoint calculates the token and authenticates it, then notifies the Server if its validation is correct. If it is not, the transmission process restarts.
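The acknowledge-and-restart behavior can be sketched as a retry loop. In the Python example below, the token is modeled as a SHA-256 digest of the payload, and the loop gives up after three attempts. Both of those details are assumptions, since the real token is proprietary and no retry limit is stated here. FlakyServer is a fake used only for the demonstration.

```python
import hashlib


def expected_token(payload: bytes) -> str:
    # Hypothetical stand-in for the proprietary token: a plain SHA-256 digest.
    return hashlib.sha256(payload).hexdigest()


def transmit_until_acknowledged(send, payload: bytes, max_attempts: int = 3) -> int:
    """Send the payload until the returned token validates.

    Returns the number of attempts used. 'send' is any callable that takes
    the payload and returns the server's token.
    """
    for attempt in range(1, max_attempts + 1):
        server_token = send(payload)
        if server_token == expected_token(payload):
            return attempt
        # Validation failed, so the transmission restarts.
    raise RuntimeError("file was not acknowledged after %d attempts" % max_attempts)


class FlakyServer:
    """Fake server that returns a bad token on its first reply."""

    def __init__(self):
        self.calls = 0

    def receive(self, payload: bytes) -> str:
        self.calls += 1
        if self.calls == 1:
            return "bad-token"
        return hashlib.sha256(payload).hexdigest()
```

Calling transmit_until_acknowledged(FlakyServer().receive, b"backup data") sends the file twice: the first reply fails validation, and the second succeeds.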
Erase from Cache – After acknowledgement the Endpoint erases the backup file from its local cache.
End All Files Process – After all files are processed, the main process loop is closed and the server is signaled.
Validate Batch – The Endpoint and Server exchange information about the batch to validate that all files were received and stored in their original forms, unchanged.
Store Key Escrow Files – If Key Escrow is turned on, the Endpoint transmits and validates its Escrow files.
Perform Delete and File Move Functions – The Endpoint searches its catalog and compares the current date/time with the dates and times of all backup sets and all previously stored files and objects to determine if it needs to prune files from the server. It sends the Server encrypted instructions to delete or move backup files depending on the file retention protocol previously defined for each backup set.
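The pruning decision comes down to comparing dates in the catalog against a retention period. The Python sketch below assumes a simple rule, "delete anything stored more than N days ago," which is only one possible retention protocol. The catalog layout and the name files_to_prune are hypothetical.

```python
from datetime import datetime, timedelta


def files_to_prune(catalog, now, retention_days):
    """Return identifiers of stored files older than the retention period.

    'catalog' maps a file identifier to the date and time it was stored.
    """
    cutoff = now - timedelta(days=retention_days)
    return sorted(
        identifier
        for identifier, stored_at in catalog.items()
        if stored_at < cutoff
    )
```

Because every catalog entry has to be examined, this step takes longer as the number of stored files and backup sets grows.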
Update Local Catalog – After the Server verifies that the Delete/Move function is done, the Endpoint updates its local catalog with any changes induced by the process.
Validate Local Catalog Synched with Server – If this process is turned ON, the Endpoint and Server verify that the local Catalog and the Server’s Data Store are in sync. If not, the local Catalog is updated and messages logged for the Service Provider.
Compare Catalog – RBackup compares the last copy of the catalog it stored on the Server with its current copy to extract changes.
Extract Changes from Catalog – If the catalog has changed, the changes are extracted.
Prepare Catalog for Backup – The changes are prepared for backup exactly the same way files and objects are prepared. They are compressed, encrypted, and signed.
Store Catalog – The catalog changes are sent to the Server and stored.
Verify Catalog – The Endpoint and Server exchange metadata and signatures through secure tokens to validate the catalog changes were properly stored on the Server.
End of Backup Processes – At the end of the backup both the RBackup server and the Endpoint do some housekeeping tasks like flushing cache, updating databases, writing logs, and closing and releasing ports. In addition, the Endpoint might run any command files that have been defined by the Service Provider.
Other things that might affect speed:
Throttling – RBackup has a function that throttles bandwidth. It is set to medium by default, but it can be turned way down or way up. This makes a huge difference in the speed of transmission, but not in the speed of file preparation.
Priority – RBackup can be set to use High, Medium, or Low priority. This affects how much CPU time the backup process takes from other applications. It is set to Medium by default.
CPU speed – The speed of the CPU affects the file preparation process, starting with the Begin All Files Process phase.
Internet Speed – Of course Internet speed plays a large part in transmission speed. Sending files UP your Internet connection (backing them up) is usually far slower than downloading files (restoring them). Most Internet Service Providers specify their speeds by advertising only the download speed, the fastest one. The remote-backup.com website has several articles and calculators to help predict transmission speeds.
File Size – Big files take a long time to back up. That’s just simple physics.
Number of files – RBackup needs about 1 second per file to handle overhead tasks like validating signatures and verifying that files have been correctly stored on the Server, and this can add a lot to the backup time. For example, if you are backing up 100,000 files, that’s 100,000 seconds, or about 1,667 minutes, or nearly 28 hours – just in overhead time for file validation, not including the time needed to prepare and transmit the files.
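The per-file overhead is easy to estimate for your own file counts. The short Python sketch below simply multiplies the number of files by the roughly 1 second per file mentioned above. The function name overhead_estimate is hypothetical, and the real overhead will vary with your hardware and connection.

```python
def overhead_estimate(file_count, seconds_per_file=1.0):
    """Return the estimated overhead as (seconds, minutes, hours)."""
    seconds = file_count * seconds_per_file
    return seconds, seconds / 60, seconds / 3600


seconds, minutes, hours = overhead_estimate(100_000)
print(f"{seconds:,.0f} seconds = {minutes:,.0f} minutes = {hours:.1f} hours")
```

For 100,000 files this works out to about 27.8 hours of overhead before any preparation or transmission time is counted.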
If you made it through this article to the end, I hope you are convinced that backing up files is vastly different from copying files, and that doing proper backups is worth the time it takes.