Using multiple databases in Rails

We recently embarked on a research project to extract some functionality in our codebase to a separate database. Our motivation was to investigate scalability options for our back-end—we've previously written about our Postgres database, and as we continue to scale the business our need for database performance grows accordingly.

To properly test out new database back-ends we needed to connect up some representative workloads in our Rails application, which led us to investigate the feasibility of adding new databases to our existing environments. This isn't entirely novel—we already have code that does this in our Rails application—but the twist was that we wanted to entirely isolate it from being able to write to our primary database. We could still read from the replica if required, but any writes had to be directed exclusively to our new database.

(This post focuses on the setup for multiple databases, but we'll talk more about what platforms we've been experimenting with in a future post).

Multiple databases in Rails

As a bit of background if you're not familiar with Rails—the core framework has good support for multiple databases, as detailed in this guide, supporting any number of database connections. When setting up your databases the core config is managed in the database.yml file—here's a basic example of a primary database with a read-replica, and a secondary database for additional content:

production:
 # Both the primary and replica connect to the same database,
 # but with different roles.
 primary:
   host: primarydb.vpc.local
   adapter: postgresql
   username: foo
   password: bar
   database: blog
 replica:
   host: replicadb.vpc.local
   adapter: postgresql
   username: foo
   password: bar
   database: blog
   replica: true
 # This connects to a different database. This might store information
 # on records we don't always need to access.
 secondary:
   host: commentdb.vpc.local
   adapter: postgresql
   username: bar
   password: baz
   database: comment_metadata

The primary configuration will be used by default per Rails naming conventions. If we want to direct queries to our replica to offload some read traffic we can use the appropriate helpers to configure the appropriate database for the writing and reading roles:

# Set the base ApplicationRecord to use the replica when reading.
class ApplicationRecord < ActiveRecord::Base
 self.abstract_class = true

 connects_to database: { writing: :primary, reading: :replica }
end

# When code should connect to the replica, set the role with connected_to.
ActiveRecord::Base.connected_to(role: :reading, prevent_writes: true) do
 # code to use the replica goes here
end

Switching between actual databases is configured in a similar manner, we just swap out our primary and replica connections for the secondary database:

# Abstract base class for models.
class SecondaryRecord < ApplicationRecord
 self.abstract_class = true

 connects_to database: { writing: :secondary, reading: :secondary }
end

# Concrete model class for a blog post.
class BlogPost < SecondaryRecord
 # Model-specific code goes here.
end

This is the approach we took for our research project, although as you'll see there were some other considerations we had to make to isolate ourselves entirely from the primary database.

You can also do manual connection switching with ActiveRecord::Base.establish_connection() or ActiveRecord::ConnectionPool.with_connection, but we found this was more confusing than managing the connection config on the models, and could also force us to establish a connection to the primary we didn't want.

Behind the scenes, connections are handled by the connection handler—you can access this via ActiveRecord::Base.connection_handler. This has knowledge of all of the connection pools and connections (via ActiveRecord::Base.connection_handler.connection_pools), of which the active connection is exposed via ActiveRecord::Base.connection.

ActiveRecord::Base.connectionwill be the connection that is implicitly used for all database operations unless an alternative is specified, including helpers like opening connections with ActiveRecord::Base.transaction. This becomes important to understand when you're managing multiple connections, as you want to ensure that you open transactions against the correct database to avoid cross-database queries or rogue connections.

Our setup

Our setup is very similar to the above example, with a primary database, read-replica, and a separate secondary database we want to provide access to.

Historically all of our pods have had access to both the primary and replica connections. For the purpose of our research project we wanted to prevent the possibility of accidental writes to the primary, so we disabled the read/write primary connection leaving only the connections to the replica and the secondary database. This meant that we had to be acutely aware of how connections were established, since any connections to the primary would fail.

Thankfully the Ruby test framework that we use (RSpec) lets you create custom matchers to use in your tests, and Rails emits a variety of notification sent by Rails when it executes a database query. By checking which connection the queries are executed against, we can then assert whether our code has connected to the expected databases:

# `expected_dbs` should be a Hash of the form:
# {
#   primary: [:writing, :reading],
#   replica: [:reading],
# }
RSpec::Matchers.define :exactly_query_databases do |expected_dbs|
 match do |block|
   @expected_dbs = expected_dbs.transform_values(&:to_set).with_indifferent_access
   @actual_dbs = Hash.new { |h, k| h[k] = Set.new }.with_indifferent_access

   ActiveSupport::Notifications.
     subscribe("sql.active_record") do |_name, _start, _finish, _id, payload|
       pool = payload.fetch(:connection).pool

       next if pool.is_a?(ActiveRecord::ConnectionAdapters::NullPool)

       name = pool.db_config.name
       role = pool.role

       @actual_dbs[name] << role
     end

   block.call

   @actual_dbs == @expected_dbs
 end

 failure_message do |_block|
   "expected to query exactly #{@expected_dbs}, but queried #{@actual_dbs}"
 end

 supports_block_expectations
end

This then lets us add simple assertions to our specs like:

it "doesn't query the primary database" do
 expect { some_query }.to exactly_query_databases({ secondary: [:writing] })
end

Learnings

This research project taught us that we needed to change a few things in our application code to avoid accidentally creating read/write connections to the primary.

Use the right connection

The first learning and refactor in our code was that, now that we used different connections depending on the model we're querying, we should use the methods on the appropriate model to use the correct database connection. For our SecondaryRecord this would mean that instead of:

ActiveRecord::Base.transaction(requires_new: true) do
 # code involving records in the secondary database
end

We should instead do:

SecondaryRecord.transaction(requires_new: true) do
 # code involving records in the secondary database
end

With the connects_to configuration above, this seamlessly swaps the connection as appropriate for the environment we're using. We implemented our connects_to block conditionally like this:

class SecondaryRecord < ApplicationRecord
 self.abstract_class = true

 if Helper.test_environment?
   connects_to database: { writing: :secondary, reading: :secondary }
 end
end

That way we could manage the connections on an infrastructure level with the appropriate pod configuration, and didn't have to rely on branching application logic.

Stub code where needed

We have some code that may be called in our isolated branches which will rely on information which only exists in the primary database, such as feature flags. We use Flipper to manage our feature-flags, and store the configuration in our primary database.

We took a conditional approach depending on whether we wanted these feature-flags to be enabled or toggle-able. In the case where we could make them permanently enabled in our test environment, we could simply check:

if Helper.test_environment? || Flipper.enabled?(:feature_flag)
 # do the thing
end

Otherwise we could check the value of the feature-flag against the replica, rather than using the primary connection:

feature_flag = ActiveRecord::Base.connected_to(role: :reading, prevent_writes: true) do
 Flipper.enabled?(:feature_flag)
end

if feature_flag
 # do the thing
end

For other cases where we had specific edge-cases where we had tables to check on the primary we didn't want to replicate to the secondary, we'd directly stub the code to return a sensible response:

def has_extra_configuration?
 return false if Helper.test_environment?
 # otherwise call the real code and connect to the primary
end

Statesman

Once of the issues we ran into wasn't specifically with our Rails application, but with one of our gems. Statesman is a gem for defining and using state machines that we wrote and maintain, and it reflected the same single-database mentality that we saw in our application.

Thankfully, because of its sensible construction and Rails' connection model, it was fairly simple to refactor the code to use contextual information about the connection from the parent or the state transition model instead of relying on ActiveRecord::Base. This meant that, as with our other refactors, simply modifying the model to use the correct connection would cascade down into any Statesman code it called, requiring no further application changes.

Once we'd tested these changes out and confirmed that they worked in our application we also published them as a new Statesman version—if you'd like to take advantage simply update your version of Statesman to 12.0.0 or later.

The end result

After some refactoring of our application and updates to our models, we successfully isolated our test pods from the primary database. By taking a pragmatic, test-driven approach we were able to identify all connection violations locally and haven't seen any rogue connections on live pods (touch wood).

The multi-database functionality provided by Rails made this a fairly painless process, and most of the complexity came in tidying up our application and supporting code. If you haven't given multi-database functionality a try in Rails, give it a go.