We recently embarked on a research project to extract some functionality in our codebase to a separate database. Our motivation was to investigate scalability options for our back-end—we've previously written about our Postgres database, and as we continue to scale the business our need for database performance grows accordingly.
To properly test out new database back-ends we needed to connect up some representative workloads in our Rails application, which led us to investigate the feasibility of adding new databases to our existing environments. This isn't entirely novel—we already have code that does this in our Rails application—but the twist was that we wanted to entirely isolate it from being able to write to our primary database. We could still read from the replica if required, but any writes had to be directed exclusively to our new database.
(This post focuses on the setup for multiple databases, but we'll talk more about what platforms we've been experimenting with in a future post).
Multiple databases in Rails
As a bit of background if you're not familiar with Rails—the core framework has good support for multiple databases, as detailed in this guide, supporting any number of database connections. When setting up your databases the core config is managed in the database.yml file—here's a basic example of a primary database with a read-replica, and a secondary database for additional content:
production:
# Both the primary and replica connect to the same database,
# but with different roles.
primary:
host: primarydb.vpc.local
adapter: postgresql
username: foo
password: bar
database: blog
replica:
host: replicadb.vpc.local
adapter: postgresql
username: foo
password: bar
database: blog
replica: true
# This connects to a different database. This might store information
# on records we don't always need to access.
secondary:
host: commentdb.vpc.local
adapter: postgresql
username: bar
password: baz
database: comment_metadata
The primary configuration will be used by default per Rails naming conventions. If we want to direct queries to our replica to offload some read traffic we can use the appropriate helpers to configure the appropriate database for the writing and reading roles:
# Set the base ApplicationRecord to use the replica when reading.
class ApplicationRecord < ActiveRecord::Base
self.abstract_class = true
connects_to database: { writing: :primary, reading: :replica }
end
# When code should connect to the replica, set the role with connected_to.
ActiveRecord::Base.connected_to(role: :reading, prevent_writes: true) do
# code to use the replica goes here
end
Switching between actual databases is configured in a similar manner, we just swap out our primary and replica connections for the secondary database:
# Abstract base class for models.
class SecondaryRecord < ApplicationRecord
self.abstract_class = true
connects_to database: { writing: :secondary, reading: :secondary }
end
# Concrete model class for a blog post.
class BlogPost < SecondaryRecord
# Model-specific code goes here.
end
This is the approach we took for our research project, although as you'll see there were some other considerations we had to make to isolate ourselves entirely from the primary database.
You can also do manual connection switching with ActiveRecord::Base.establish_connection()
or ActiveRecord::ConnectionPool.with_connection
, but we found this was more confusing than managing the connection config on the models, and could also force us to establish a connection to the primary we didn't want.
Behind the scenes, connections are handled by the connection handler—you can access this via ActiveRecord::Base.connection_handler
. This has knowledge of all of the connection pools and connections (via ActiveRecord::Base.connection_handler.connection_pools
), of which the active connection is exposed via ActiveRecord::Base.connection
.
ActiveRecord::Base.connection
will be the connection that is implicitly used for all database operations unless an alternative is specified, including helpers like opening connections with ActiveRecord::Base.transaction
. This becomes important to understand when you're managing multiple connections, as you want to ensure that you open transactions against the correct database to avoid cross-database queries or rogue connections.
Our setup
Our setup is very similar to the above example, with a primary database, read-replica, and a separate secondary database we want to provide access to.
Historically all of our pods have had access to both the primary and replica connections. For the purpose of our research project we wanted to prevent the possibility of accidental writes to the primary, so we disabled the read/write primary connection leaving only the connections to the replica and the secondary database. This meant that we had to be acutely aware of how connections were established, since any connections to the primary would fail.
Thankfully the Ruby test framework that we use (RSpec) lets you create custom matchers to use in your tests, and Rails emits a variety of notification sent by Rails when it executes a database query. By checking which connection the queries are executed against, we can then assert whether our code has connected to the expected databases:
# `expected_dbs` should be a Hash of the form:
# {
# primary: [:writing, :reading],
# replica: [:reading],
# }
RSpec::Matchers.define :exactly_query_databases do |expected_dbs|
match do |block|
@expected_dbs = expected_dbs.transform_values(&:to_set).with_indifferent_access
@actual_dbs = Hash.new { |h, k| h[k] = Set.new }.with_indifferent_access
ActiveSupport::Notifications.
subscribe("sql.active_record") do |_name, _start, _finish, _id, payload|
pool = payload.fetch(:connection).pool
next if pool.is_a?(ActiveRecord::ConnectionAdapters::NullPool)
name = pool.db_config.name
role = pool.role
@actual_dbs[name] << role
end
block.call
@actual_dbs == @expected_dbs
end
failure_message do |_block|
"expected to query exactly #{@expected_dbs}, but queried #{@actual_dbs}"
end
supports_block_expectations
end
This then lets us add simple assertions to our specs like:
it "doesn't query the primary database" do
expect { some_query }.to exactly_query_databases({ secondary: [:writing] })
end
Learnings
This research project taught us that we needed to change a few things in our application code to avoid accidentally creating read/write connections to the primary.
Use the right connection
The first learning and refactor in our code was that, now that we used different connections depending on the model we're querying, we should use the methods on the appropriate model to use the correct database connection. For our SecondaryRecord this would mean that instead of:
ActiveRecord::Base.transaction(requires_new: true) do
# code involving records in the secondary database
end
We should instead do:
SecondaryRecord.transaction(requires_new: true) do
# code involving records in the secondary database
end
With the connects_to
configuration above, this seamlessly swaps the connection as appropriate for the environment we're using. We implemented our connects_to
block conditionally like this:
class SecondaryRecord < ApplicationRecord
self.abstract_class = true
if Helper.test_environment?
connects_to database: { writing: :secondary, reading: :secondary }
end
end
That way we could manage the connections on an infrastructure level with the appropriate pod configuration, and didn't have to rely on branching application logic.
Stub code where needed
We have some code that may be called in our isolated branches which will rely on information which only exists in the primary database, such as feature flags. We use Flipper to manage our feature-flags, and store the configuration in our primary database.
We took a conditional approach depending on whether we wanted these feature-flags to be enabled or toggle-able. In the case where we could make them permanently enabled in our test environment, we could simply check:
if Helper.test_environment? || Flipper.enabled?(:feature_flag)
# do the thing
end
Otherwise we could check the value of the feature-flag against the replica, rather than using the primary connection:
feature_flag = ActiveRecord::Base.connected_to(role: :reading, prevent_writes: true) do
Flipper.enabled?(:feature_flag)
end
if feature_flag
# do the thing
end
For other cases where we had specific edge-cases where we had tables to check on the primary we didn't want to replicate to the secondary, we'd directly stub the code to return a sensible response:
def has_extra_configuration?
return false if Helper.test_environment?
# otherwise call the real code and connect to the primary
end
Statesman
Once of the issues we ran into wasn't specifically with our Rails application, but with one of our gems. Statesman is a gem for defining and using state machines that we wrote and maintain, and it reflected the same single-database mentality that we saw in our application.
Thankfully, because of its sensible construction and Rails' connection model, it was fairly simple to refactor the code to use contextual information about the connection from the parent or the state transition model instead of relying on ActiveRecord::Base
. This meant that, as with our other refactors, simply modifying the model to use the correct connection would cascade down into any Statesman code it called, requiring no further application changes.
Once we'd tested these changes out and confirmed that they worked in our application we also published them as a new Statesman version—if you'd like to take advantage simply update your version of Statesman to 12.0.0 or later.
The end result
After some refactoring of our application and updates to our models, we successfully isolated our test pods from the primary database. By taking a pragmatic, test-driven approach we were able to identify all connection violations locally and haven't seen any rogue connections on live pods (touch wood).
The multi-database functionality provided by Rails made this a fairly painless process, and most of the complexity came in tidying up our application and supporting code. If you haven't given multi-database functionality a try in Rails, give it a go.