You're reading the Ruby/Rails performance newsletter by Speedshop.

Looking for an audit of the perf of your Rails app? I've partnered with Ombu Labs to do just that.

Do you speak Japanese? Subscribe to the Japanese edition of this newsletter.
I just published a blog post about why Gusto's system tests were unusually slow. It's a story about profiling and how Rails reloads code. Give it a read!

A common Rails tale: memory usage "spikes", but no one knows why. Is it a leak? Probably not, but it might be this.

I'm often asked to improve the memory usage of a Rails application. The first thing I do is look at the memory usage graph and figure out which category this app is in:
  1. Memory usage slowly increases over time, in a logarithm-like curve approaching an asymptote.
  2. Memory usage grows rapidly and infrequently, in "spikes" that last about 1 second or so. Memory usage may or may not go back down afterward.
  3. Memory usage increases at a linear rate until the process is killed. Memory growth continues linearly even if the process has been alive for 24 hours or more.

Category 3 is usually an indication of a true memory leak, usually caused by a rogue C native extension. Category 1 is usually an indication of either a very old and mature application which just creates lots of objects, or an app which is running multi-threaded but not using the jemalloc allocator.

Category 2 is what I want to talk about today. These apps have "spiky" or "cliff-like" increases in memory usage.
These memory usage spikes are almost always caused by iterating over large collections and keeping lots of objects in memory at one time.
 

Not using find_each


Sometimes this is quite easy to spot, if you've simply loaded a large collection all at once:

SomeMassiveCollection.all.each

That code will load every single record of SomeMassiveCollection's table into memory before passing each row to the each block.

What we want is to reduce the memory usage at any one particular point in time and spread that memory usage out over time. Instead of holding 1 million objects in memory at once, we want to hold 100,000 objects in memory, free them, and then repeat 10 times.

This is, basically, streaming.

Rails has you covered here, with the ActiveRecord::Batches API.

find_each is a drop-in replacement for all.each. That's why there's a RuboCop rule (Rails/FindEach) for it!
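To see why batching bounds memory, here's a plain-Ruby sketch of the idea behind find_each (the method name and batch size here are illustrative, not ActiveRecord's implementation): process the collection in fixed-size slices so that only one slice's worth of objects is reachable at any moment.

```ruby
# A plain-Ruby sketch of the batching/streaming idea: visit every
# element, but never hold more than one batch_size-sized slice at once.
def each_in_batches(ids, batch_size: 1_000)
  ids.each_slice(batch_size) do |batch|
    batch.each { |id| yield id }
    # once this block returns, `batch` becomes garbage-collectable
  end
end

seen = []
each_in_batches((1..10).to_a, batch_size: 4) { |id| seen << id }
seen # every id is visited, but at most 4 were ever held in a batch
```

In real code, `SomeMassiveCollection.find_each { |record| ... }` does this for you, fetching 1,000 rows per query by default.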
 

ActiveRecord Relation "cache"


Another unusual source of holding lots of objects in memory at once comes from code like this:

user = User.find_by(...)
user.posts.each do |post|
  post.comments.find_or_initialize_by(...)
end


What happens here is that each iteration of the loop creates an object which is added to the internal @records array of that post's comments association.

So, the object structure looks something like this:

user
  posts (Array)
    comments (Array)

So, as long as we are iterating through this loop, every single comment you find or initialize remains in-scope, and cannot be garbage collected. If you do this with enough posts... that's a lot of memory usage.

Instead, you're better off using something like:

Comment.find_or_initialize_by(post: post, ...)

That way, the record is not added to any internal arrays, and the comment will be garbage collectable after each iteration of the loop.
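Here's a hypothetical miniature of the two call shapes (FakeAssociation and FakeModel are stand-ins I made up for illustration, not real ActiveRecord classes). The association-style finder keeps a reference to every record it builds; the class-level finder does not.

```ruby
# Stand-in for an association proxy: it caches every record it
# builds in an internal array, like ActiveRecord's @records.
class FakeAssociation
  attr_reader :records

  def initialize
    @records = []
  end

  def find_or_initialize_by(attrs)
    record = attrs.dup
    @records << record # retained for as long as the parent is in scope
    record
  end
end

# Stand-in for a class-level finder: it returns the record and
# keeps no reference to it, so GC can reclaim it after each iteration.
class FakeModel
  def self.find_or_initialize_by(attrs)
    attrs.dup
  end
end

assoc = FakeAssociation.new
1_000.times { |i| assoc.find_or_initialize_by(id: i) }
assoc.records.size # all 1,000 records are still pinned in memory
```

Same API shape, very different retention behavior: the association keeps everything alive until the parent goes out of scope, while the class-level call keeps nothing.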
 

Beware ActiveRecord QueryCache


Every time you trigger a SQL query with ActiveRecord, it checks to see if that query has already been run and the result cached.

This is the ActiveRecord::QueryCache. When you look at your Rails logs and see lines like:

[CACHE] User Find (0.0ms)

... that's a cache hit. This cache has two important characteristics:
  1. It is cleared after each request (i.e. it is a thread-local)
  2. It has no size limit.
This has the effect of making big requests or jobs even more expensive than they normally would be. N+1s now store their result in a Ruby object which will not be cleared until the end of the request. The worst case scenario would be to make 1000s of SQL queries, each missing the query cache and storing a new result but never reading it.

The solution? Well, don't make so many N+1s, of course. Easier said than done.
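To make the failure mode concrete, here's a toy model of the query cache's two key properties (an illustration, not ActiveRecord's real implementation): it memoizes results by SQL string, and it has no size limit until it is explicitly cleared.

```ruby
# Toy model of an unbounded, per-request query cache.
class ToyQueryCache
  def initialize
    @cache = {}
  end

  def fetch(sql)
    @cache[sql] ||= yield # miss: run the query and store the result
  end

  def size
    @cache.size
  end

  def clear
    @cache.clear # in Rails, this happens at the end of the request
  end
end

cache = ToyQueryCache.new
# 1,000 distinct N+1 queries: each one misses, stores a result set,
# and never reads it again before "the request" ends.
1_000.times { |i| cache.fetch("SELECT * FROM posts WHERE user_id = #{i}") { Array.new(10) } }
cache.size # 1,000 result sets held until clear is called
```

If a long-running job genuinely can't avoid making lots of queries, wrapping it in ActiveRecord::Base.uncached do ... end disables the query cache for that block, so results aren't accumulated.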
 

What other "caches" exist?


What other "caches" of this type might exist in your app? Do you have any custom "request stores" or other request-based caches which might be accidentally turned "on" in environments that don't have requests, like Rake tasks? Or maybe people are storing things in the "request store" that are much larger than you thought?

I hope you've been enjoying the weekly newsletter pace lately. Unfortunately, I'll be taking the next two weeks off: next week for Railsconf, and then the next week for Japan's Golden Week holiday. See you in the middle of May :)

-Nate
You can share this email with this permalink: https://mailchi.mp/railsspeed/two-common-sources-of-memory-leaks-in-rails-apps?e=[UNIQID]

Copyright © 2023 Nate Berkopec, All rights reserved.

