Patrick Baselier – 26 March 2014

Creating a backup of a (PostgreSQL) database in your Rails application should be easy. Well, so should playing Eddie Van Halen’s Eruption. Nevertheless, it took me about a day and a half to sort this out and get it up and running. So I figured, why not share it with the world? It might be helpful to someone.

Creating a location, credentials and permission on Amazon

The bucket

Our backup will be uploaded to an Amazon S3 bucket named my-app.backups. This bucket needs to be created first, of course; for that, just follow the AWS documentation.

Lifecycle

It’s OK for backups older than 30 days to be removed (at least in our case). Instead of managing this programmatically, you can define a rule for your bucket that takes care of cleaning up old backups. In the Amazon S3 console, select the bucket, choose Properties and open Lifecycle. There you add a rule to have files of a specific age cleaned up for you.
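If you would rather script this rule than click through the console, the sketch below builds a lifecycle configuration document in Ruby and prints it as JSON. It is only an illustration: the helper name lifecycle_configuration and the rule ID are made up, and the assumption is that you feed the resulting JSON to a tool that accepts S3 lifecycle configurations (such as the AWS CLI).

```ruby
require 'json'

# Hypothetical helper (not part of our setup): describes a rule that expires
# objects after the given number of days. An empty prefix matches every object.
def lifecycle_configuration(days:, prefix: '')
  {
    'Rules' => [
      {
        'ID'         => "expire-backups-after-#{days}-days", # made-up rule name
        'Filter'     => { 'Prefix' => prefix },
        'Status'     => 'Enabled',
        'Expiration' => { 'Days' => days }
      }
    ]
  }
end

puts JSON.pretty_generate(lifecycle_configuration(days: 30))
```

The console does exactly the same thing for you, so treat this as an alternative, not a requirement.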

IAM

We create a separate user, system.my-app.backup, for performing the backup. We also define a group, Backup, which the user belongs to, and give this group access to the bucket. Creating a user and a group is very straightforward with the IAM Management Console, but keep the following in mind:

{
  "Version": "2012-10-17", 
  "Statement": [
    {
      "Sid": ...,
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::my-app.backups/*"
      ]
    }
  ]
}

The Amazon stuff is all set up. Before we switch to our code, a quick word about where to store credentials.

Location of credentials

We like to keep credentials out of our codebase. Instead we define environment variables and use these in our application code (see below). The credentials for our Amazon user are stored in /etc/environment:

export S3_BACKUP_KEY=*******  
export S3_BACKUP_SECRET=*******

Maybe there is a better place to define environment variables, but this works for us.
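One thing that can bite you is a variable that is silently missing, for example in a cron environment. A minimal sketch of a guard, assuming the two variable names above; the helper s3_backup_credentials is hypothetical and not part of our actual setup:

```ruby
# Hypothetical helper: reads the S3 credentials from the environment and
# raises a KeyError right away if one of them is not set.
# `env` defaults to ENV, but any Hash-like object works (handy for testing).
def s3_backup_credentials(env = ENV)
  {
    aws_access_key_id:     env.fetch('S3_BACKUP_KEY'),
    aws_secret_access_key: env.fetch('S3_BACKUP_SECRET')
  }
end
```

Calling s3_backup_credentials without arguments reads the real environment; a missing variable then fails loudly instead of ending up as an empty credential.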

Coding the backup

We use SettingsLogic for maintaining application settings, such as user credentials, that may change per environment, but other solutions would work as well. SettingsLogic uses config/application.yml and has a key for every environment. Since we can also use ERB in YAML, we can define credentials for S3 that use the key and secret we defined earlier as environment variables:

backup:
  bucket: 'my-app.backups'
  connection_settings:
    aws_access_key_id: <%= ENV['S3_BACKUP_KEY'] %>
    aws_secret_access_key: <%= ENV['S3_BACKUP_SECRET'] %>
    region: 'eu-west-1'

As you can see, we also defined the other S3 settings here, to keep them all in one place. When you start a Rails console and enter Settings.backup.connection_settings.aws_access_key_id, you will see the key you defined in your environment file.

The backup gem provides an easy-to-use interface for handling backups, and version 4 suits our needs. Although backup advises NOT to include the gem in the Gemfile, we do want to store the configuration for the backup in the Rails application. How can we run backup outside of our application environment and still use this configuration and Rails-specific values? This is how we did it.

In the :db_backup model you define things like the database name, user, password and credentials for S3. We have already defined these in database.yml and application.yml, so we want to reuse them. Include the following lines at the beginning of the model:

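require 'yaml'
require 'erb'

# Fall back to development when RAILS_ENV is not set
rails_env       = ENV['RAILS_ENV'] || 'development'

# Reuse the database settings of the current environment
database_yml    = File.expand_path('../../database.yml', __FILE__)
db_config       = YAML.load_file(database_yml)[rails_env]

# application.yml contains ERB, so render it before parsing the YAML
application_yml = File.expand_path('../../application.yml', __FILE__)
app_config      = YAML.load(ERB.new(File.read(application_yml)).result)[rails_env]['backup']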
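To see the ERB-then-YAML step in isolation, here is a small self-contained sketch. The YAML is inlined instead of read from config/application.yml, the values are made up, and load_backup_settings is a hypothetical helper; it uses fetch so that a missing environment or backup key raises instead of quietly returning nil.

```ruby
require 'yaml'
require 'erb'

# Inline stand-in for config/application.yml (values are made up)
APPLICATION_YML = <<~YML
  production:
    backup:
      bucket: 'my-app.backups'
      connection_settings:
        aws_access_key_id: <%= ENV['S3_BACKUP_KEY'] %>
        region: 'eu-west-1'
YML

# Hypothetical helper: render the ERB first, then parse the result as YAML
# and pick the backup settings of the given environment.
def load_backup_settings(source, rails_env)
  rendered = ERB.new(source).result # ENV values are substituted here
  YAML.load(rendered).fetch(rails_env).fetch('backup')
end

settings = load_backup_settings(APPLICATION_YML, 'production')
puts settings['bucket']
```

If S3_BACKUP_KEY is not set, the rendered YAML simply has an empty value there, which YAML parses as nil.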

Now db_config and app_config can be used to configure our :db_backup model:

Model.new(:db_backup, 'Database backup for my app') do
  database PostgreSQL do |db|
    db.name               = db_config['database']
    db.username           = db_config['username']
    db.password           = db_config['password'].to_s # nil not allowed
    db.host               = "localhost"
  end

  store_with S3 do |s3|
    s3.access_key_id     = app_config['connection_settings']['aws_access_key_id']
    s3.secret_access_key = app_config['connection_settings']['aws_secret_access_key']
    s3.region            = app_config['connection_settings']['region']
    s3.bucket            = app_config['bucket']
    s3.path              = "/#{rails_env}"
    s3.fog_options = {
      path_style: true
    }
  end

  compress_with Gzip
end

This configuration should be very straightforward, except for the path_style key: set this to true if your bucket name contains dots.
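Why the dots matter: with virtual-hosted addressing the bucket name becomes part of the hostname, and a dotted bucket name no longer matches Amazon's wildcard SSL certificate. With path-style addressing the bucket moves into the path. The sketch below only illustrates the two URL shapes; the helper names are made up and the hostname format is an assumption, so don't read it as what fog does internally.

```ruby
# Illustrative only: the two ways an S3 object can be addressed.
# The "s3.<region>.amazonaws.com" hostname format is an assumption.
def s3_object_url(bucket, key, region:, path_style:)
  host = "s3.#{region}.amazonaws.com"
  if path_style
    "https://#{host}/#{bucket}/#{key}"  # bucket in the path
  else
    "https://#{bucket}.#{host}/#{key}"  # bucket in the hostname
  end
end

# Hypothetical rule of thumb from the text: dots in the name => path style
def path_style_needed?(bucket)
  bucket.include?('.')
end

bucket = 'my-app.backups'
puts s3_object_url(bucket, 'production/db_backup.tar',
                   region: 'eu-west-1',
                   path_style: path_style_needed?(bucket))
```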

Perform the backup

To perform a backup, use the following command (replace :environment and :path/:to/:app):

RAILS_ENV=:environment backup perform -t db_backup -c :path/:to/:app/config/backup.rb

The catch here is not to cd into the application directory first when you’re using rvm (or another version manager), since the backup command is not available then.

backup will upload a file to Amazon S3 and put it in the my-app.backups bucket at :environment/:model/:timestamp/db_backup.tar.
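To make that layout concrete, here is a small sketch that builds such an object key. The helper backup_object_key is hypothetical, and the timestamp format (year.month.day.hour.minute.second, in UTC) is an assumption on my part; check an actual upload in your bucket before relying on it.

```ruby
# Hypothetical helper mirroring the :environment/:model/:timestamp layout.
# The timestamp format is an assumption, not taken from the backup gem's docs.
def backup_object_key(environment, trigger, time)
  timestamp = time.utc.strftime('%Y.%m.%d.%H.%M.%S')
  File.join(environment, trigger.to_s, timestamp, "#{trigger}.tar")
end

puts backup_object_key('production', :db_backup, Time.utc(2014, 3, 26, 22, 4, 0))
# prints production/db_backup/2014.03.26.22.04.00/db_backup.tar
```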

Schedule the backup

With whenever you can define cron jobs in Ruby. A few caveats here:

In config/schedule.rb we can define a custom job_type that uses whenever’s variables. This job_type is only called for specific environments, so the cron job is only defined there:

job_type :backup, "source ~/.rvm/scripts/rvm && RAILS_ENV=:environment backup perform -t :task -c :path/config/backup.rb"

case @environment
when 'production', 'staging'
  every 1.day, at: '10:04pm' do
    backup 'db_backup'
  end
end

db_backup is the name of our model. You can omit source ~/.rvm/scripts/rvm && if you don’t use rvm. Here :path, :environment and :task are whenever variables, so leave them as is.
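If you wonder what happens to those variables: they are placeholders that get replaced when the crontab is generated. The sketch below is a simplified imitation of that substitution, not whenever's actual implementation; render_job is a made-up helper and the path is an example value.

```ruby
# Simplified imitation of placeholder substitution in a job template.
# Unknown placeholders are left untouched.
def render_job(template, values)
  template.gsub(/:\w+/) do |placeholder|
    key = placeholder[1..-1].to_sym # strip the leading colon
    values.fetch(key, placeholder).to_s
  end
end

template = 'RAILS_ENV=:environment backup perform -t :task -c :path/config/backup.rb'
puts render_job(template,
                environment: 'production',
                task:        'db_backup',
                path:        '/var/www/my-app/current') # example path
```

Running whenever without arguments in your application directory prints the real crontab entries, which is the best way to check the final command.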

Backup on deployment

Since we already have this setup, why not use it to back up the database when we deploy a new release? What we need to keep in mind is that the backup gem needs to be installed system-wide, so it’s a good thing to have this taken care of by a Capistrano task as well. We use Capistrano 2, but I’m sure most of the code applies to v3. In config/deploy.rb we define a task my-app:backup:install that installs the backup gem only if needed and then performs a backup. We call this task after deploy:create_symlink, so that current_path points to the folder of the release you’re currently deploying.

namespace :'my-app' do # quoted, since :my-app is not a valid symbol literal
  namespace :backup do
    task :install do
      run "if ! [ $(gem list backup -i) == true ]; then gem install backup --no-ri --no-rdoc ; fi && RAILS_ENV=#{rails_env} backup perform -t db_backup -c #{File.join(current_path, 'config/backup.rb')}"
    end
  end
end

after "deploy:create_symlink", "my-app:backup:install"

Restore the backup

Restoring a backup is (currently) done by hand. Of course this can be automated, but here’s our process. The backup creates a plain-text SQL script file, so we can restore it using psql. psql will ask for the user’s password. You can find this in database.yml.

cd :path/:to/:app/current
sudo apachectl stop
bundle exec rake db:drop db:create
tar -xvf /tmp/db_backup.tar -C /tmp && gunzip -c /tmp/db_backup/databases/PostgreSQL.sql.gz | psql -U :user :database
sudo apachectl start

Again, replace :path/:to/:app, :user and :database with the proper values. :user is the user defined in database.yml for the specified environment. And that’s it. I hope you’ll never need it, so you can spend more time on expanding Eruption, but if you do, may your backup be restored within seconds.

Patrick Baselier

I’m a professional Ruby on Rails and front-end developer, and an unprofessional (that is: not professionally… yet) Ember developer, from The Netherlands. I love sharing knowledge, and one day I hope to be more than a novice guitar player.