How to test GraphQL performance?
Quickly identify potential DoS, Complexity, N+1 issues, and more.
Many developers tend to steer clear of GraphQL APIs due to concerns about performance. But does GraphQL generally perform worse than REST? Or is this concern primarily rooted in developers lacking the appropriate tools to evaluate GraphQL performance effectively?
In this blog post, let's dive into GraphQL performance. We'll explore the factors that can influence GraphQL's efficiency and discuss strategies that can help developers manage performance issues when working with GraphQL APIs. By the end of this article, you should understand how you can automate testing to address performance concerns and get the most out of GraphQL for your projects.
What makes GraphQL APIs perform poorly?
We found that 80% of all GraphQL endpoints scanned by Escape since its inception are vulnerable to Denial of Service (DoS) and Complexity attacks. That's a lot of APIs with performance problems!
Because GraphQL is a query language that accepts user-supplied queries, an attacker can craft a request that takes a lot of server resources to process. What these requests have in common is that they rely on little-known features of GraphQL.
DoS attacks are made easier by a GraphQL feature named aliasing. With GraphQL aliases, an attacker can call a resolver many times in the same query or mutation, bypassing HTTP-based rate limiting. Read more about this vulnerability and its remediations.
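To see why aliasing slips past HTTP-based rate limiting, it helps to look at what such a request looks like. The sketch below builds a single query document that calls the same field several times under different aliases. The helper name `build_aliased_query` and the `user(id: 1) { email }` field are hypothetical and only serve the illustration; use this kind of payload against your own endpoints only.

```python
def build_aliased_query(field: str, copies: int) -> str:
    """Repeat one field under distinct aliases inside a single query document.

    The server receives one HTTP request but resolves `field` `copies` times.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Each selection gets its own alias (a0, a1, ...) so the document is valid.
    selections = "\n".join(f"  a{i}: {field}" for i in range(copies))
    return "query {\n" + selections + "\n}"


# Hypothetical field; replace it with one that exists in your own schema.
query = build_aliased_query("user(id: 1) { email }", 3)
print(query)
```

A rate limiter that counts HTTP requests sees one request here, while the resolver behind the field runs three times.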
Another common cause of performance issues is the ability of clients to nest GraphQL queries and even make them deeply recursive. That happens when your graph contains self-referencing objects (e.g., a user has friends, who also have friends, who also have friends, etc.). Attackers can abuse this feature to cause CPU or memory exhaustion. Learn how it works and how you can protect your API here.
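One rough way to reason about nesting is to measure how deep a query goes before it is executed. The sketch below is a deliberately naive illustration: it counts curly-brace nesting in the query text, so braces inside arguments or string values would also be counted, and a real server would inspect the parsed query instead. The `friends` field and both function names are hypothetical.

```python
def max_selection_depth(query: str) -> int:
    """Return the deepest level of curly-brace nesting found in the query text."""
    depth = 0
    deepest = 0
    for char in query:
        if char == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif char == "}":
            depth -= 1
    return deepest


def nested_friends_query(levels: int) -> str:
    """Build a query that follows a self-referencing `friends` field `levels` times."""
    body = "name"
    for _ in range(levels):
        body = f"friends {{ {body} }}"
    return f"query {{ user {{ {body} }} }}"


shallow = nested_friends_query(1)
deep = nested_friends_query(50)
print(max_selection_depth(shallow), max_selection_depth(deep))
```

Comparing the measured depth against a limit that suits your schema is one way to reject recursive queries early; the right limit depends on your application.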
Performance issues may also result from the infamous "N+1 problem" in GraphQL. For instance, imagine we want to fetch the names of every user's applications. The endpoint will fetch all users first, then resolve each user's applications sequentially to get their names. We therefore have 1 request for the users plus N requests for the applications, one for each user.
query {
  users {
    id
    email
    applications {
      name
    }
  }
}
If the underlying storage of the endpoint is a relational database, logs will look like this:
-- Fetch all users
SELECT * FROM users;
-- Fetch applications for all users, sequentially
SELECT name FROM applications WHERE userId = 1;
SELECT name FROM applications WHERE userId = 2;
SELECT name FROM applications WHERE userId = 3;
SELECT name FROM applications WHERE userId = 4;
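The query log above can be reproduced, and compared with a batched alternative, in a few lines. The sketch below uses an in-memory SQLite database with hypothetical sample data and counts the statements each approach issues. Batching the lookups into one `IN (...)` query is a common mitigation, not something this article prescribes, and the class and function names are made up for the example.

```python
import sqlite3


class CountingDb:
    """Thin wrapper around SQLite that counts the statements run through it."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.count = 0

    def run(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def seed(db: CountingDb) -> None:
    # Hypothetical sample data; setup statements are not counted.
    db.conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE applications (name TEXT, userId INTEGER);
        INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
        INSERT INTO applications VALUES ('alpha', 1), ('beta', 1), ('gamma', 3);
        """
    )


def names_naive(db: CountingDb) -> dict:
    """N+1 pattern: one query for the users, then one query per user."""
    result = {}
    for (user_id,) in db.run("SELECT id FROM users ORDER BY id"):
        rows = db.run(
            "SELECT name FROM applications WHERE userId = ? ORDER BY name", (user_id,)
        )
        result[user_id] = [name for (name,) in rows]
    return result


def names_batched(db: CountingDb) -> dict:
    """Batched pattern: one query for the users, one for all their applications."""
    user_ids = [user_id for (user_id,) in db.run("SELECT id FROM users ORDER BY id")]
    placeholders = ",".join("?" for _ in user_ids)
    rows = db.run(
        f"SELECT userId, name FROM applications WHERE userId IN ({placeholders}) ORDER BY name",
        user_ids,
    )
    grouped = {user_id: [] for user_id in user_ids}
    for user_id, name in rows:
        grouped[user_id].append(name)
    return grouped


naive_db, batched_db = CountingDb(), CountingDb()
seed(naive_db)
seed(batched_db)
naive = names_naive(naive_db)
batched = names_batched(batched_db)
print(naive_db.count, batched_db.count)  # prints "4 2"
```

With three users the naive version issues four statements and the batched version two; the gap grows linearly with the number of users.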
Last but not least, performance problems may also result from debug queries. As a developer, if you have a huge query that allows you to fetch most of your database, be sure to have at least two types of environments (dev and production), with debug queries disabled in production. We wrote two articles about best practices for the production environment: GraphQL Error Handling Best Practices for Security and When GraphQL Errors become a Security Issue.
Now, let's see how you can find those vulnerabilities.
How does automated GraphQL testing work?
During the scan, Escape's scanner sends a custom header (X-Escape-Request-ID) with each request sent to your API endpoint. This lets us know when each request was sent and when its response was received. We call this request tracing. The scanner keeps track of all response times and puts the concerning requests in a queue.
After the scan, we synchronize the queue to ensure the flagged requests outline performance problems in your app.
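The general idea of request tracing can be sketched in a few lines. This is not Escape's implementation: only the `X-Escape-Request-ID` header name comes from the description above. The `traced_send` function, the `send` callable, and the 4-second default for `slow_after` are assumptions made for illustration.

```python
import time
import uuid
from collections import deque


def traced_send(send, query, flagged, slow_after=4.0, clock=time.perf_counter):
    """Send one query with a unique request ID and record it if it is slow.

    `send` is any callable taking (query, headers); it stands in for a real
    HTTP client. `flagged` is the queue of concerning requests.
    """
    request_id = str(uuid.uuid4())
    headers = {"X-Escape-Request-ID": request_id}
    started = clock()
    send(query, headers)
    elapsed = clock() - started
    if elapsed > slow_after:
        # Keep enough context to replay and investigate the request later.
        flagged.append((request_id, query, elapsed))
    return elapsed


def stub_send(query, headers):
    """Placeholder transport that returns immediately (no network involved)."""
    return None


queue = deque()
traced_send(stub_send, "query { users { id } }", queue)
print(len(queue))
```

Passing the clock and the transport as parameters keeps the timing logic easy to test without a live endpoint.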
Let's take a look at the Escape platform for browsing performance issues found during a scan.
How can you improve your GraphQL performance?
Here is the performance issues view of a scan run with Escape's GraphQL Security Scanner, which enables you to detect and fix your performance problems. Escape lists poorly performing queries and assigns each one a severity based on its response time. A response time above 4 seconds is flagged as medium severity and above 10 seconds as high.
You can read our documentation about DoS and complexity to learn more about security timeouts.
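The severity thresholds described above translate into a small function. This is a sketch of the stated rule only: the function name is hypothetical, and returning `None` for responses at or below 4 seconds is an assumption, since the article does not say how faster responses are labeled.

```python
from typing import Optional


def severity(response_seconds: float) -> Optional[str]:
    """Map a response time to the severity levels described in the article."""
    if response_seconds > 10:
        return "high"
    if response_seconds > 4:
        return "medium"
    # Assumption: responses at or below 4 seconds are not flagged.
    return None


print(severity(5))  # prints "medium"
```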
You can get more details by clicking on a performance vulnerability. For example, in the screenshot above, our scanner found that the debug query took 5 seconds to process, flagging the issue as medium. As a developer, you can see how to reproduce the request by copying the query and trying it yourself.
Want to secure your GraphQL application now? Try Escape for free.