Use Rsync for Daily, Weekly and Full Monthly Backups

Today we will use rsync to make daily and weekly incremental backups, and then a full compressed, archived backup once a month. We will then use cron to automate the process. Let’s face it, we humans get lazy sometimes, and most backup systems lose their effectiveness if they are not completely automated.

What is rsync?

 
Rsync is a sweet utility for syncing files and folders. It is often used for incremental backups because it can detect which files in a folder have been added or changed. By default it does this by comparing file size and modification time, but it can be told to use a more precise (but slower) method that compares checksums such as md5 hashes. However, checksumming every file to detect changes is usually not required.
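To make the difference between the two detection methods concrete, here is a small sketch. It assumes rsync is installed and works entirely in a scratch directory; the file names and the fixed timestamp are made up for the demo. It changes a file's content while keeping its size and modification time the same, which is exactly the case the default check misses.

```shell
#!/bin/sh
# Scratch directories so the demo touches nothing real (names are illustrative).
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"

# Create a file with a fixed modification time and sync it once.
printf 'aaaa\n' > "$work/src/file.txt"
touch -t 202001010000 "$work/src/file.txt"
rsync -a "$work/src/" "$work/dst/"

# Change the content, but keep the same size and the same modification time.
printf 'bbbb\n' > "$work/src/file.txt"
touch -t 202001010000 "$work/src/file.txt"

# Default check (size + modification time): the change goes unnoticed.
rsync -a "$work/src/" "$work/dst/"
quick=$(cat "$work/dst/file.txt")

# --checksum (-c) compares file contents, so the change is picked up.
rsync -a --checksum "$work/src/" "$work/dst/"
full=$(cat "$work/dst/file.txt")

echo "after default check: $quick"
echo "after --checksum:    $full"
```

In everyday use a file that changes almost always gets a new timestamp too, which is why the default check is normally good enough.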

Syncing vs Full Backup

 
Before getting into the details, I think it is worth explaining the difference between full backups and incremental backups. I like to think of incremental backups as syncing: you are making two sets of data match. For example, if the source contains one extra file, the incremental backup adds only that one file to the backup instead of re-copying all the others. This is useful for maintaining frequent backups without the added bandwidth or processing overhead.
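Here is a rough illustration of that "only the extra file" behavior. It assumes rsync is installed; the scratch directory and file names are placeholders. The -i (--itemize-changes) flag makes rsync list each item it actually updates.

```shell
#!/bin/sh
# Scratch directories stand in for a real source and backup (illustrative only).
work=$(mktemp -d)
mkdir -p "$work/src" "$work/backup"
echo "first" > "$work/src/old.txt"

# First run: everything is new, so everything is copied.
rsync -a "$work/src/" "$work/backup/"

# Add one file, then sync again and record what rsync reports.
echo "second" > "$work/src/new.txt"
changes=$(rsync -ai "$work/src/" "$work/backup/")

# The list should mention new.txt (and the directory itself), but not old.txt.
echo "$changes"
```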

Installing rsync for Debian / Ubuntu

 
If rsync is not installed on your Debian / Ubuntu system you can install it with your preferred method or use apt-get:

# apt-get install rsync

That’s it! Did you expect more?
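If you are not sure whether rsync is already there, a quick check along these lines can tell you. This is only a sketch; the rsync_status variable name is just for illustration.

```shell
#!/bin/sh
# Report whether rsync is on the PATH and, if so, which version.
if command -v rsync >/dev/null 2>&1; then
    rsync_status=$(rsync --version | head -n 1)
else
    rsync_status="rsync is not installed"
fi
echo "$rsync_status"
```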

Installing rsync for FreeBSD

 
On FreeBSD it is common to build applications from the ports tree rather than install binary packages, so we will install it with the following commands:

# cd /usr/ports/net/rsync
# make install clean

If you are prompted with build options, the defaults are fine.

Syncing Two Folders for Daily Backup

 
For our daily backup we will use the incremental method since it will run very frequently. We also don’t want to waste disk space, write operations, and CPU cycles by doing a full backup every day.

To sync one folder to another we use the following command:

$ rsync -av /path/to/source /home/mark/rsync/daily

The ‘a’ flag tells rsync to go into archive mode, which copies folders recursively and preserves permissions, timestamps, and ownership where it can.
The ‘v’ option is just verbose (show details).

Now this should copy all of the files from one folder to the other. If the files already exist in the backup and have not been modified since the last time you ran this command, rsync will simply skip those files. So basically when you run this command it will back up only the files that have changed since the day before.
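One detail worth knowing about the command above is how rsync treats a trailing slash on the source path. The sketch below assumes rsync is installed and uses scratch directories with made-up names to show the difference.

```shell
#!/bin/sh
# Scratch directories only; nothing outside $work is touched.
work=$(mktemp -d)
mkdir -p "$work/source" "$work/with_dir" "$work/contents_only"
echo "data" > "$work/source/file.txt"

# No trailing slash: the directory itself is copied,
# so the file lands in with_dir/source/file.txt
rsync -a "$work/source" "$work/with_dir"

# Trailing slash: only the contents are copied,
# so the file lands in contents_only/file.txt
rsync -a "$work/source/" "$work/contents_only"

find "$work/with_dir" "$work/contents_only" -type f
```

The commands in this post use the form without the trailing slash, so the backup folder ends up containing a copy of the source folder itself.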

Adding the --delete flag. By default rsync does not delete files in the backup if they were deleted in the source. This is rsync’s way of protecting you from deleting an entire directory by accident. To force rsync to delete files we do this:

$ rsync -av --delete /path/to/source /home/mark/rsync/daily

This will ensure that we are not backing up deleted files.

Warning: When using the --delete flag be sure to check your command twice. If you swap the source and the destination you will sync your data with an empty folder. You will be left with two empty folders!
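A cautious way to check a --delete command before running it for real is rsync's -n (--dry-run) flag, which reports what would happen without changing anything. The following sketch assumes rsync is installed and uses scratch directories with illustrative file names.

```shell
#!/bin/sh
# Scratch directories only; nothing outside $work is touched.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/backup"
echo "keep" > "$work/src/keep.txt"
echo "stale" > "$work/backup/stale.txt"   # exists only in the backup

# -n makes this a dry run: rsync prints its plan but copies and deletes nothing.
preview=$(rsync -avn --delete "$work/src/" "$work/backup/")

# Lines starting with "deleting" are the files that would be removed.
echo "$preview" | grep '^deleting'
```

Once the list of deletions looks right, run the same command again without the n.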

Weekly Sync

 
For our weekly sync we will just sync with the latest daily folder.

$ rsync -av --delete /home/mark/rsync/daily /home/mark/rsync/weekly

We run this command once a week to maintain our weekly incremental backup. Now if we accidentally deleted something last Tuesday and just noticed it on Friday we will have a backup.

Full Monthly Backup

 
Since we are going to keep full monthly backups and they won’t be accessed frequently we can compress them with bzip2.

tar -cvjf /home/mark/rsync/monthly/monthly.tar.bz2 /home/mark/rsync/daily/

Now since we are archiving our full backups monthly we want to be sure not to overwrite an existing monthly backup. We will do this by naming each one with the date. Instead of the command above, use this one to add the date to each filename.

tar -cvjf /home/mark/rsync/monthly/monthly_$(date +%Y%m%d).tar.bz2 /home/mark/rsync/daily/

Now you should have all the commands to set up rotating daily and weekly backups plus a full monthly backup. The only thing left is to execute those commands every day, every week, and once a month.
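Before trusting a monthly archive it is worth listing its contents once. The sketch below builds a dated archive from a scratch copy of a daily folder and reads it back with tar -tjf. It assumes tar and bzip2 are installed; the directory layout and file names are placeholders, and the -C option (change into a directory before archiving) is my own choice to keep the stored paths short.

```shell
#!/bin/sh
# Scratch layout standing in for /home/mark/rsync (illustrative only).
work=$(mktemp -d)
mkdir -p "$work/daily" "$work/monthly"
echo "report" > "$work/daily/report.txt"

# Date-stamped archive name, same pattern as the command above.
archive="$work/monthly/monthly_$(date +%Y%m%d).tar.bz2"

# -C "$work" stores the files as daily/... instead of full absolute paths.
tar -cjf "$archive" -C "$work" daily

# -t lists the archive contents without extracting anything.
listing=$(tar -tjf "$archive")
echo "$listing"
```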

Automate the Process with Cron

 
Usually we don’t want to have to remember to type in a command daily, weekly, and monthly, so we will automate it with cron.

$ crontab -e

Now add the following lines of code:

01 17 * * * rsync -av --delete /path/to/source /home/mark/rsync/daily
00 18 * * 5 rsync -av --delete /home/mark/rsync/daily /home/mark/rsync/weekly
00 6 1 * * tar -cvjf /home/mark/rsync/monthly/monthly_$(date +\%Y\%m\%d).tar.bz2 /home/mark/rsync/daily/

Note that cron treats an unescaped % as a newline, so the date format is written with \% here.

This example cron setup will back up daily at 5:01PM.
Back up every Friday at 6:00PM.
Do the full backup on the first of each month at 6:00AM.

For more information on cron, see Learning Cron by Example.

Now you will need to tailor this to your own usage patterns or those of your users. You should also allow enough time for the daily backup to finish before doing the weekly. In this example on Fridays I allowed 59 minutes for the daily backup to finish. If you are worried about the two syncs running into each other, you can schedule your daily backup in the morning and your Friday weekly backup at night.
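Another way to keep the two syncs from overlapping, not the setup used in this post, is to run them back to back from one small wrapper script and call that script from a single cron line. The sketch below is only an illustration: run_backups, src, and backup_root are hypothetical names, rsync is assumed to be installed, and scratch directories stand in for the real paths.

```shell
#!/bin/sh
# Placeholders: replace these scratch directories with your real paths.
src=$(mktemp -d)
backup_root=$(mktemp -d)
mkdir -p "$backup_root/daily" "$backup_root/weekly"
echo "sample" > "$src/sample.txt"   # demo data only

# Run the daily sync, and on Fridays run the weekly sync right after it,
# so the weekly never starts before the daily has finished.
run_backups() {
    day_of_week=$1   # 1 = Monday ... 7 = Sunday, as printed by `date +%u`
    rsync -a --delete "$src/" "$backup_root/daily/" || return 1
    if [ "$day_of_week" -eq 5 ]; then
        rsync -a --delete "$backup_root/daily/" "$backup_root/weekly/" || return 1
    fi
}

run_backups "$(date +%u)"
```

The weekly step is skipped entirely if the daily sync fails, which avoids copying a half-finished daily folder into the weekly one.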

If you want to find out how long a backup currently takes, add the ‘time’ command to the beginning of each command.

Tell Cron to be Quiet

 
Cron by default sends emails with the output of the command. If you don’t want to get emails you can redirect the output of the cron commands to /dev/null. Keep in mind that this also hides error messages. Add this to the end of each cron line:

> /dev/null 2>&1




14 Responses to "Use Rsync for Daily, Weekly and Full Monthly Backups"
  1. Creating Useful Bash Aliases on April 23rd, 2008

    [...] could make a system backup script and whenever you feel like you need to make a quick unscheduled backup you could [...]

  2. My Blog is Now 1 Year Old on July 8th, 2008

    [...] Use Rsync for Daily, Weekly and Full Monthly Backups [...]

  3. hypotheses on July 20th, 2008

    Thank you, a great tutorial. This gives a very clear answer and solution for what I want.

  4. Oblivon on January 28th, 2009

    Nice write-up.

    One comment. When adding the --delete flag or trying out any new rsync command for that matter, it’s a great idea to use the -n “dry run” flag first. It’ll show you exactly what it’s going to do. If you’ve got things twisted around you’ll know.

    Be especially careful if using --delete with the -x ‘one filesystem’ flag. It may not behave as you expect.

    If I’m working on a large number of files/directories, I like to save the dry run output to a text file and then grep that for ‘deleting’ to see what’s going to get the axe.

  5. Mark Sanborn on January 28th, 2009

    Yup,

    You hit the nail right on the head. Be VERY careful with the --delete flag.

  6. Powerhouse Programs of Linux on April 29th, 2009

    [...] For a more detailed guide on how to actually put rsync to good use be sure to check out, Use Rsync for Daily, Weekly and Full Monthly Backups [...]

  7. Charles on May 31st, 2009

    Newbie question. If you are using Cron to schedule, what happens if your computer reboots? Do you have to rerun the code?

  8. Mark Sanborn on May 31st, 2009

    No, once it is in cron it is in there forever; however, if you restart during the time a scheduled cron is to take place it will obviously not run.

  9. Nicolás Schubert on June 3rd, 2009

    I’ve been using these scripts for some days now. They work great. Thanks a lot.
    I have to declare my ignorance on these issues but I’ve been bothered about rsync flags for a while now.
    Looking at the man page there are some interesting flags like -o for owner or -t for time that would be good to keep on a backup.
    Is it OK to use them?

  10. [...] Use Rsync for Daily, Weekly and Full Monthly Backups [...]

  11. Using Rsync on August 5th, 2009

    [...] Most backup senarios can be done with rsync and cron. For an example on how to create a daily weekly backup scheme check out, Use Rsync for Daily, Weekly and Full Monthly Backups. [...]

  12. Kyle on August 6th, 2009

    I think you would be better off keeping a weekly type backup every day of the week.

    Sync each day from /source to something like /backup/monday.

    This way if a user creates a file on Monday, deletes it on Tuesday and asks for it to be restored on Wednesday it can be done. The above method will have deleted it from the daily (due to the --delete flag) and it would not yet be in the weekly backup because it hasn’t run for that week yet.

    You can then create a tar archive of all the folders (monday to friday) at the end of the week. This enables you to restore a file from any day of that week for as many weeks as you decide to archive (typically you would keep 2 weekly archives before overwriting them but 1 is acceptable).

  13. Mark Sanborn on August 7th, 2009

    The setup in the post has proven to be enough for our needs. We don’t have the bandwidth to keep a daily for each day in the week. Thanks for the comment though, maybe some of my users would prefer your method.

  14. Kyle on August 9th, 2009

    Yes, the bandwidth has to be taken into account, especially if you’re with one of those ISPs that count uploads. I was thinking of a removable device of some sort.

    By the way, you mentioned that rsync can check a file by md5 hashes. Does this mean if the md5 hash of the file is changed it will just recopy the entire file? I read somewhere that rsync can inspect data within a file and only change the section of that file that has changed. Is that true?
