Generally, "performance" for a web application is how fast your app serves pages. But there are different ways to measure "fast":
Many apps will have many pages with no particular performance issues on their own, but a few pages with grave performance issues. For some sites, the most complex and worst-performing pages are the most-used, like a user's timeline. For others, the worst offenders might be infrequently used but high-impact pages; for instance, checkout or account signup on an e-commerce site.
Scalability is the ability of your app to use additional resources to handle load, and the overall ability to handle load without failing. For instance, if your app uses a lot of webserver CPU time, you can usually get bigger webservers (scale vertically) or add more webservers under a load balancer (scale horizontally). Even if your app is slow, it can still handle heavy load if you throw enough resources at it.
On the other hand, if your app is taxing a resource that you can no longer increase (for instance, a single database that is as large as you can make it, since you can't usually just add multiple databases to an application and integrate them seamlessly) then you have a scalability problem.
The first thing to remember about performance optimization is when not to do it.
To optimize effectively, you must start with clean, expressive, concise code that fully implements its features.
This does not mean you have to finish your entire app before you work on performance. Performance problems are often localized to specific features. As long as your specific problem features are implemented and clean, you can work on making them faster.
Sometimes, code is slow because it is actually wrong, which means you can improve the correctness of your code and speed it up at the same time. And sometimes, code is slow because it is too complex, which means that you can improve the quality of your code and speed it up at the same time. But if your app is slow even after your code is correct and beautiful, then optimization may make it less correct and less beautiful.
The more you try to optimize, the more complex the code becomes. Fast is a feature and adding features to code increases complexity. So, don't implement fast, the feature, until you know you need it.
Since performance optimization often increases code complexity, you should approach slowness like you would approach any other problem: identify it, reproduce it, try to fix it, and roll back your changes if your fix doesn't work. If you take a "shotgun" approach to fixing performance problems, you may introduce a lot more complexity than you need to fix the problem (if you fix it at all).
Try to find out what the problem is exactly, and benchmark it. Benchmarking is very hard because real-world speed of a feature depends on a lot of details about your application's environment, so finding performance issues can be very frustrating. Still, you have to find some way to benchmark the problem, or else you won't know where to start or when you're finished. Fortunately, the bigger the problem is, the easier it is to reproduce and isolate.
When the problem happens within the request/response cycle, before the browser has received the response, it's a backend issue.
Backend performance issues are often scalability issues, because performance problems on the backend eat up limited resources. Every visitor to a website brings their own frontend resources (i.e. a computer or mobile device with a browser), but they all use the same backend resources.
Web applications generally run on time scales that make code execution speed and CPU usage irrelevant. In other disciplines such as games (which may need to, for instance, update the state of every object in memory 60 times per second) CPU usage is critically important, but web apps tend to have bigger problems.
Ruby, as an interpreted, high-level language with flexible typing, is extremely slow compared to other languages (often by factors of 10, 100, or even 1,000). Despite that, code execution speed is usually still small potatoes compared to properly slow actions like database queries or API calls.
Also, web server processes are usually scalable both vertically and horizontally, so when you do have execution speed issues, as long as they're not exponential, you may be able to solve them by increasing system resources.
Still, if you use the wrong algorithm to process data, you can run into code execution speed issues, and sometimes they scale much worse than linearly with the size of your data set: clumsy XML parsing, repeatedly searching large data sets in memory without sorting or indexing them first, and so on.
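As a small, self-contained illustration (the data here is made up, not part of the example app below): repeatedly scanning an Array inside a loop costs roughly O(n * m), while building a Set once makes each lookup effectively constant time.
require "set"
require "benchmark"
haystack = (1..50_000).map { |n| "user-#{n}" }
needles = (1..5_000).map { |n| "user-#{n * 10}" }
Benchmark.bm(12) do |x|
  # linear search inside a loop: every needle rescans the whole array
  x.report("array scan:") { needles.count { |n| haystack.include?(n) } }
  # build a Set once; each membership test is then a hash lookup
  x.report("set lookup:") do
    lookup = haystack.to_set
    needles.count { |n| lookup.include?(n) }
  end
end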
Backend issues can be diagnosed with load testing (for example with ab, or more comprehensively with cloud tools and interaction scripts), with query logging (ActiveRecord::Base.logger = Logger.new(STDOUT)), and with EXPLAIN ANALYZE and the database process list (locally or through Heroku pg_extra tools).
Let's set up a database to go over some backend troubleshooting techniques.
This database is stored in memory, which may make a performance tuning demonstration a little awkward since memory operations will be extremely fast. In a real-world web app where the database is in the same data center, but on a different machine from the web server, every individual database query will have a measurable round-trip time (RTT), plus the time it takes to serve the actual request. Additionally, if the database has to hit the disk to serve the request, that can add tens of milliseconds. These costs can quickly add up.
require "active_record"
require "sqlite3"
require "benchmark"
ActiveRecord::Base.establish_connection(
  adapter: 'sqlite3',
  database: ':memory:'
)
ActiveRecord::Base.logger = Logger.new(STDOUT) # ideally this would output inside the notebook,
# but instead it comes out in the terminal
#We'll define a few simple tables that will illustrate core concepts here.
#Apparently, Rails table definitions do not customarily use foreign keys to express relationships
ActiveRecord::Migration.class_eval do
  create_table :posts do |t|
    t.string :title
    t.text :body
    t.integer :user_id
  end
  create_table :users do |t|
    t.string :username
    t.string :email
  end
  create_table :comments do |t|
    t.integer :post_id
    t.integer :user_id
    t.text :text
  end
end
class User < ActiveRecord::Base
  has_many :posts
  has_many :comments
end
class Post < ActiveRecord::Base
  has_many :comments
  belongs_to :user
end
class Comment < ActiveRecord::Base
  belongs_to :post
  belongs_to :user
end
#Now populate the user and post tables. Since this is a performance demonstration, we'll make a lot of records.
#This population process itself is a huge performance issue, and will fire over 100000 INSERT requests.
#If it weren't a local, in-memory database it would take a very long time. Even as it is, it takes several minutes.
#Making this process fast will require bulk inserts -- a standard feature on some ORMs, but not ActiveRecord
#as of yet. So, probably a lot of raw SQL would be required.
(1..100).each do |n|
  User.create username: "User #{n}", email: "#{n}@example.com"
end
User.all.each do |user|
  (1..100).each do |n|
    Post.create user: user, title: "#{user.username}'s post number #{n}", body: "Lorem ipsum, #{user.username}: #{n}"
  end
end
Post.all.each do |post|
  (1..10).each do |n|
    Comment.create post: post, user_id: n, text: "User #{n}'s comment to post #{post.id}"
  end
end
Comment.count
-- create_table(:posts) -> 0.0052s
-- create_table(:users) -> 0.0008s
-- create_table(:comments) -> 0.0006s
100000
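As a sketch of the bulk-insert workaround mentioned in the comments above (the batch size and row values here are arbitrary; newer ActiveRecord versions also ship insert_all, which does this for you):
conn = ActiveRecord::Base.connection
rows = (1..1000).map do |n|
  # build "(post_id, user_id, 'text')" tuples; quote guards against SQL injection
  "(#{n % 100 + 1}, #{n % 10 + 1}, #{conn.quote("Bulk comment #{n}")})"
end
rows.each_slice(500) do |slice|
  # one INSERT per 500 rows instead of one INSERT per row
  conn.execute "INSERT INTO comments (post_id, user_id, text) VALUES #{slice.join(', ')}"
end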
Here's an example of a common issue in Rails -- the "N+1 query" problem -- where you have one database read to get a list of objects, but then N more reads to get specifics about each of those objects.
Benchmark.measure do |x|
  Post.all.each do |post|
    post.user.username # we're not doing anything with this in this example, but the data will still be read
  end
end
6.350000 0.170000 6.520000 ( 6.582814)
Even on an in-memory database, this takes a substantial amount of time to run.
The problem here is that Post.all fetches the list of posts, but peeking at post.user.username additionally fetches the user for each post -- one by one. This example is contrived, but in Rails applications, similar problems can be hard to spot because the list-fetching will probably happen in the controller, but the N individual fetches may happen in the view.
Whenever possible, avoid accessing the database in the view and instead access it in the controller and pass the data into the view. That will make performance troubleshooting more straightforward, as your database calls will be in the same part of your code. The view layer is for presentation, not for reaching out and fetching data from external resources.
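A sketch of that split, using hypothetical controller and view names: the controller does all of the eager loading, and the view only formats what it was handed.
class PostsController < ApplicationController
  def index
    # every database call for this page lives here, in one place
    @posts = Post.includes(:user).limit(50)
  end
end
# app/views/posts/index.html.erb then just reads what it was given:
# <% @posts.each do |post| %>
#   <li><%= post.title %> -- <%= post.user.username %></li>
# <% end %>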
In this example, "N+1" equals 10001. Because our example here has such fast (local, in-memory) database access, a large number is necessary to show the problem. In a web application it will be a smaller number, because the initial list you fetch will probably be paginated or limited in some way. However, even if N+1 is more like 51 (and frequently it's really something like 4N+1, because if you have one database call inside a for... each loop, you might have others), in a real web application those calls will have a significant performance penalty because of the round trip time to the production database.
Benchmark.measure do |x|
  Post.all.includes(:user).each do |post|
    post.user.username # we're not doing anything with this in this example, but the data will still be read
  end
end
0.590000 0.000000 0.590000 ( 0.594021)
Here's the log from the database:
Post Load (34.5ms) SELECT "posts".* FROM "posts"
User Load (0.4ms) SELECT "users".* FROM "users" WHERE "users"."id" IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)
N+1=10001 is now down to 2. One to select all the posts, and one to select all of the users referenced by the posts.
This is actually one more query than strictly necessary -- you could offload some of this work to the database and combine these two queries into one with a join. However, ActiveRecord idiomatically prefers this style of includes and multiple queries over joins, unlike many other ORMs. You can still do joins in ActiveRecord, but since you often need to specify SQL fragments, they are usually more trouble than they're worth just to eliminate a single database call. You may need joins to solve more advanced problems, though.
Whether a join would actually be faster in this case depends on the round trip time among other factors. But the difference in this case will be negligible.
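For reference, a sketch of the single-query alternatives (assuming a reasonably recent ActiveRecord): eager_load generates one LEFT OUTER JOIN for you, while a manual join needs SQL fragments in select.
# one query, generated by ActiveRecord:
Post.eager_load(:user).each do |post|
  post.user.username
end
# one query, by hand; the aliased column becomes a method on each record:
Post.joins(:user).select("posts.*, users.username AS author_name").each do |post|
  post.author_name
end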
You may notice that the elapsed time as measured by Benchmark is much higher than the elapsed time shown in the database logs. The database is very fast in this example, but a lot of time is spent in Ruby code instantiating objects, only to throw them away. Normally you won't deal with 10,000 objects in a request/response cycle, so this is not as big a problem as it might seem.
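If a page really only needs a couple of values from many rows, one way around that instantiation cost (a side note, not part of the example above) is pluck, which returns plain arrays instead of model objects:
# one query, and no Post or User objects are built at all
Post.joins(:user).pluck("posts.title", "users.username")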
When you have a lot of records in your database, it can become very expensive to fetch specific rows or order tables in certain ways. For instance:
#this is extremely fast
Benchmark.measure do |x|
  Comment.find(50)
  #Comment Load (0.3ms) SELECT "comments".* FROM "comments" WHERE "comments"."id" = ? LIMIT 1 [["id", 50]]
end
0.000000 0.000000 0.000000 ( 0.001440)
#this is much slower
Benchmark.measure do |x|
  Comment.find_by text: "User 50's comment to post 7841"
  #Comment Load (14.1ms) SELECT "comments".* FROM "comments" WHERE
  #"comments"."text" = 'User 50''s comment to post 7841' LIMIT 1
end
0.020000 0.000000 0.020000 ( 0.015858)
#this is slower still
Benchmark.measure do |x|
  Comment.order(:text).last
  #Comment Load (62.8ms) SELECT "comments".* FROM "comments" ORDER BY "comments"."text" DESC LIMIT 1
end
0.060000 0.000000 0.060000 ( 0.058175)
Even the two "slow" examples may seem fast (14 and 62 ms respectively), but the problem scales with the size of the table, and a real-world database is likely to be slower unless the entire data set is loaded into memory (as opposed to only being available on disk). It does not take very much activity for a website that collects user activity to acquire tables big enough to matter here.
Setting the third example aside, look at the SQL for the first and second examples. They are almost identical! Both are SELECT queries with a single WHERE clause returning a single row. Yet the second one takes much longer. The reason is that the first query runs against an indexed column and the second does not. To find the correct row in the first example, the database does a binary search on an index; to find the row in the second example, it does a "table scan", an exhaustive search of every row. That's still pretty fast here, since everything is in memory, but the indexed query scales as log(n) whereas the table scan scales linearly with n.
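One way to see which strategy the database chose is to ask for the query plan (assuming an ActiveRecord version with explain on relations; the output text below is approximate):
puts Comment.where(text: "User 50's comment to post 7841").explain
# on SQLite, roughly "SCAN TABLE comments" before the index exists,
# and "SEARCH TABLE comments USING INDEX ..." once it does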
The id column is indexed because all PRIMARY KEY columns are indexed. More generally, all unique columns (PRIMARY KEY columns being unique) are necessarily indexed, because otherwise the database would need to do a full table scan on each INSERT to check whether the insertion violates uniqueness.
We can make the database index the text column, too:
ActiveRecord::Migration.class_eval do
  add_index('comments', 'text')
end
#(194.7ms) CREATE INDEX "index_comments_on_text" ON "comments" ("text")
#Same query, but much faster
Benchmark.measure do |x|
  Comment.find_by text: "User 50's comment to post 7841"
  #Comment Load (0.2ms) SELECT "comments".* FROM "comments"
  #WHERE "comments"."text" = 'User 50''s comment to post 7841' LIMIT 1
end
#An even bigger improvement here
Benchmark.measure do |x|
  Comment.order(:text).last
  #Comment Load (0.2ms) SELECT "comments".* FROM "comments" ORDER BY "comments"."text" DESC LIMIT 1
end
-- add_index("comments", "text") -> 0.1864s
0.000000 0.000000 0.000000 ( 0.000491)
All columns that you regularly search, on tables that you expect to get big, should probably be indexed.
There is a cost to each index, however: every time you insert or update data in the table, all of the indexes on the table must be updated. This is akin to writing and filing a different index card in every card catalogue in your library every time you file a new book. The process of filing the book becomes very cumbersome, but as a result you can search by a number of different methods to find the book later.
There is no point in indexing columns on tables that you do not expect to ever have more than a few hundred rows. The cost of consulting the index is potentially larger than the cost of checking every row, in which case the database will simply ignore the index at read time.
Sometimes you need information that you can only obtain in an irreducibly slow way. This is a common but complex problem.
For instance, perhaps you want to display a list of users ordered by how many comments they have made. You could do it like this:
Benchmark.measure do |x|
  User.select("users.id, users.username, COUNT(comments.id) AS comment_count").
    joins(:comments).order("comment_count DESC").limit(5).each do |u|
    #User Load (33.9ms) SELECT users.id, users.username, COUNT(comments.id) AS comment_count FROM "users"
    #INNER JOIN "comments" ON "comments"."user_id" = "users"."id" ORDER BY comment_count DESC LIMIT 5
    puts u.username
  end
end # only one username prints below: without .group("users.id"), SQLite collapses the aggregate into a single row
User 10
0.040000 0.000000 0.040000 ( 0.034703)
For each user in the table, the database has to individually count the comment rows associated with that user, and then sort the users in memory by their comment count. This didn't take very long in wall-clock time, but the algorithm is inefficient: as the number of users and comments grows, the process slows.
The best solution here is probably to cache the comment count. The fully normalized comment count is stored in the comments table -- it is exactly the number of comment rows for each user. But because it is so slow to compute that on every request, you can cache (denormalize) the data by putting it elsewhere, such as a new integer column on the users table.
The problem is that once you put that data elsewhere, it exists in two places, and only one is canonical (the comments table). Any other place you put the comment count is only a reflection of the true value, and you have to keep it updated whenever you change the canonical data store (again, the comments table). If the two get out of sync, the application could produce inaccurate results.
This is the cache invalidation problem. In this case, the problem is not too bad -- we just update the cache every time we update the comments table. But even that is harder than it sounds. For instance, it's straightforward to increment the cache, let's call it User.comment_count, every time you insert a row. But you also have to do so when you delete a row, and when you edit a row and change its user_id field. And if you decide later that you want a separate comment count ranking that only counts comments over 100 characters long, that's an easy tweak to the fully normalized query above, but it would require a totally new cache with new code to maintain it.
In this specific, very common case, ActiveRecord has a feature called counter_cache that does the dirty work automatically. You can even index the counter_cache column for faster sorts. However, even slightly more complicated caching cases will require a lot of thoughtful code and a lot of careful testing.
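Here is a minimal sketch of what that setup might look like for this schema (comments_count is the column name ActiveRecord expects by default; the backfill step is needed because existing rows start at zero):
ActiveRecord::Migration.class_eval do
  add_column :users, :comments_count, :integer, default: 0
  add_index :users, :comments_count
end
User.reset_column_information # pick up the new column on the already-loaded model
class Comment < ActiveRecord::Base
  belongs_to :post
  belongs_to :user, counter_cache: true # keeps users.comments_count in sync on create/destroy
end
User.find_each { |u| User.reset_counters(u.id, :comments) } # one-time backfill of existing rows
User.order(comments_count: :desc).limit(5) # the ranking is now a simple indexed sort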
This section is necessarily incomplete -- we could write an entire talk on cache invalidation alone. Suffice it to say that caching, say, the results of a slow API query that may change without warning is much harder, and requires some painful tradeoffs between speed and data consistency.
If the problem happens after the client has received the HTTP response -- for instance, while the browser is still rendering the page -- it's a frontend issue.
Frontend issues can be diagnosed with browser dev tools. Issues that are reproducible on all browsers are usually straightforward to diagnose, but some issues only happen on certain platforms, certain browsers, certain mobile devices or resource-constrained clients of any type.
As with all frontend issues, you may eventually have to accept that performance will be poor in certain client circumstances. For instance, modern websites perform poorly when downloaded over a 2G cellular modem, and trying to solve that problem is only worthwhile if mid-'00s mobile devices are truly a target platform for your app.