File Inclusion and Directory Traversal in GraphQL

Back at the office, you try to connect to one of your application's servers. But for some reason, your account is no longer granted access.

Anyway. You have a backup so you can restore it to a previous state in a minute. Still, you have to figure out what happened so it doesn't happen again.

From the activity logs, it looks like you changed your user's password on this machine yourself but from an unusual address.

Of course, you did not...

That means someone had access to your credentials for this machine.

Lucky guess? Probably not.

You follow all the password management best practices.

What if the flaw came directly from your directory application using GraphQL?

What happened?

Reading the logs of GraphQL calls, you see one function that was called with unexpected parameters:

getContactPicture("../../../etc/shadow")

This function is supposed to return an image from a filename:

getContactPicture("filename.png")

Here, the file whose content has been requested is not located in the contact picture folder, but in a directory directly at the root of the server: /etc/shadow.

This is a particularly sensitive file as it contains the hashes of user passwords. However, the worker allowing GraphQL to run having root rights, the content of this file was accessible via the function call.

Other sensitive folders/files could have been exposed, such as ~/.ssh , /etc/passwd , /etc/group/, ...

What allows this vulnerability?

This vulnerability is a file inclusion. It can happen when filepaths or folders are requested by the user.

In that case, the user can provide the path of his choice, including paths that can give access to other locations of the server that runs the application server (GraphQL in our case).

There are two types of paths: absolute and relative.

The absolute path is a complete path, from the root of the file system to a folder or a file (eg. home/user/files/myFile.txt).

The relative path is a path from one location in the file system to another. It starts from the current working directory and allows you to move around the file system, either down the tree or up, and this is where the flaw may lie.

Indeed, ./ corresponds to the current folder and ../ corresponds to the parent directory.

For example, the path ../../myFolder/randomFile.txt will have the effect of moving up two levels in the file tree and then fetching the file randomFile.txt in the sub-folder myFolder. And if it is a sensitive file to which the worker has read access, it will be that file that is returned, hence the potential flaw.

Many implementations of functions that rely on user input to retrieve requested files often simply determine the path of the file to return from expressions like:

fileToReturn = './myStorageFolder/' + userRequestedFile

Using the user input directly without any validation leaves room for an attack of this type.

What should we do then?


If you want to catch file inclusion and directory traversal vulnerabilities as well as 100+ other GraphQL security vulnerabilities before it's too late, checkout Escape. Run hundreds of security scans in your CI/CD 🚀


Remediation

Avoid using parameters entered directly by the user

The easiest solution is to avoid bringing into play user input directly. The more you minimize the uncertainty from a code point of view, the more robust you will be.

Let's go back to the previous example, with the function:

getContactPicture("file/path/to/my/image.png")

It can easily be modified to directly give a user ID for example (easy to check) and no longer require the use of a path:

getContactPicture(userId)

Once this new argument has been received, all that is left to do is to retrieve the associated photo from its path, which can for example be stored in one of the contact's attributes.

However, some use cases require the retrieval of a file path from the user and some others are not conducive to refactoring of this type.

Setting up a file/folder name whitelist system

If you need to maintain a function using a path passed as a parameter, you can set up validations on what is entered by the user.

A common practice is to whitelist certain folders, and certain types of extensions, thus excluding all others. This is not very flexible when changes are made but is quite effective if you have a very specific need, with extensions or paths known in advance. If it is too difficult for you to set up a whitelist, the reverse operation - which consists in banning files according to a blacklist this time - also works.

Direct cleaning of user-supplied paths is also possible but much riskier as it requires taking into account all possible attack variants without preventing the user from making legitimate requests. Here is an example of a non-exhaustive list of paths to block.

Compartmentalising your data and setting up middleware

Another option is to implement middleware. it can take the form of an interface to the (potentially external) file system on which the data users may request s stored. If the attached data storage is dedicated to this purpose only and does not contain sensitive data, the risk is limited, even if a user manages to bypass the limitations that this middleware can put in place.

In addition, adding an interface makes it easier to change the type of storage (on-premise to cloud, change of cloud provider, etc.)

Going further

Limit access of the GraphQL worker to what is strictly necessary

By restricting as much as possible the files and folders to which the GraphQL worker has access, you reduce the range of files potentially exposed by an attack. Such a worker probably doesn't need root access and will never need to consult a good part of the folders on your server!

Take advantage of virtualisation

With virtualisation, it is possible to have several virtual machines completely isolated from each other. The GraphQL worker can therefore be isolated on its own virtual machine, allowing it access only to the elements absolutely necessary for its proper execution.

It is therefore not recommended to run a GraphQL client to serve files on the same machine that also stores sensitive internal data.

Any entry point, even if it is apparently well secured, can always turn into an entry point for malicious users.

In other words, only use user-provided paths if you have no other alternatives. Thinking about how your functions work should be done when you design your GraphQL model but also throughout the application's life. Sometimes it is better to change part of the business logic, rather than expose yourself to the risk of data being compromised!

Reference