DevSecOps 101 Part 2: Detecting Insecure Source Code

In this tutorial, we will learn how to detect and fix vulnerable Python code using Semgrep.

Tristan Kalos

Oct 19, 2021 • 4 min read

Are you concerned about the security of your software applications? Today, protecting your code against potential vulnerabilities is crucial.

Welcome to our guide on how to detect insecure source code. In this tutorial, we look at the techniques and tools for identifying and mitigating security risks in your codebase. More specifically, we will learn how to detect and fix vulnerable Python code using Semgrep.

This article is part of a series about integrating security tooling into the development process. You can find the rest of the articles here:

This tutorial will be based on the repository resulting from Part 1, so be sure to follow it first if you want to reproduce the steps below.

Why detecting insecure source code is important

Detecting insecure source code is crucial for several reasons, especially in the context of "shift left" security practices.

Firstly, it helps prevent potential security breaches and data leaks (just consider recent application security case studies) that can have severe consequences for both users and organizations, including financial losses and reputational damage.

Secondly, identifying vulnerabilities early in the development process allows for timely remediation, reducing the cost and effort associated with fixing issues later in the software lifecycle.

Thirdly, compliance with industry regulations and standards often requires thorough security assessments, making robust code analysis indispensable.

Ultimately, by proactively detecting and addressing insecure code, developers can enhance the overall resilience and trustworthiness of their software applications.

What is Semgrep and why choose it to secure code?

Analyzing source code to find security vulnerabilities, known as Static Application Security Testing (SAST), has been part of the enterprise software development process for years.

But the tools used to do it were expensive, slow, and hard to master. Until recently, the only open-source tools with a decent developer experience were linters such as pylint, eslint, or their equivalents in other languages.

Thus, only big corporations were able to test the security of their source code, leaving solo developers and tiny teams to rely on testing by hand, or worse: faith.

But that era has come to an end with the release of an exciting tool: Semgrep.

Semgrep is, as its name suggests, like grep, but for source code. It allows developers to automatically find patterns in their source code while taking into account semantics like variable renaming. You can find an example of Semgrep finding XSS in Django code here.
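
To make the difference between text search and semantic search concrete, here is a minimal sketch using Python's standard ast module. It is not how Semgrep is implemented; the sample source and the function names (SOURCE, grep_md5, find_md5_calls) are hypothetical and only illustrate why renamed imports defeat a plain text search.

```python
import ast
from typing import List

# Hypothetical sample code: hashlib.md5 is used twice, under two different names.
SOURCE = '''
import hashlib as h
from hashlib import md5 as digest

def weak(password):
    return h.md5(password).hexdigest()

def also_weak(password):
    return digest(password).hexdigest()

# md5 appears here only in a comment
'''


def grep_md5(source: str) -> List[int]:
    """Plain text search: line numbers of every line containing 'md5'."""
    return [n for n, line in enumerate(source.splitlines(), 1) if "md5" in line]


def find_md5_calls(source: str) -> List[int]:
    """Syntax-aware search: line numbers of calls that resolve to hashlib.md5."""
    tree = ast.parse(source)
    module_aliases = set()  # local names bound to the hashlib module
    func_aliases = set()    # local names bound directly to hashlib.md5
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "hashlib":
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == "hashlib":
            for alias in node.names:
                if alias.name == "md5":
                    func_aliases.add(alias.asname or alias.name)

    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in func_aliases:
            hits.append(node.lineno)  # e.g. digest(password)
        elif (isinstance(func, ast.Attribute) and func.attr == "md5"
              and isinstance(func.value, ast.Name)
              and func.value.id in module_aliases):
            hits.append(node.lineno)  # e.g. h.md5(password)
    return sorted(hits)
```

On this sample, grep_md5(SOURCE) returns [3, 6, 11]: it flags an import and a comment, and misses the call on line 9 because the function was renamed. find_md5_calls(SOURCE) returns [6, 9], the two real calls. Semgrep rules give you this kind of matching without writing the tree traversal yourself.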

Even better, Semgrep supports a lot of languages, and the Semgrep community has already written plenty of rulesets to detect bad practices and security flaws in them.

The goal of this tutorial is to deploy Semgrep on our vulnerable Python app to detect vulnerable code. And guess what? It only takes a few minutes!

Detecting insecure code patterns

Let's go into our dvpwa repository and activate the virtualenv:

cd <your_path_to_dvpwa>/dvpwa
source .venv/bin/activate

Then install Semgrep using pip:

pip install semgrep

And then run it:

semgrep --config "p/ci" --exclude .venv --error

You might ask yourself "What the hell did I just write?", so let's briefly explain the options we used here:

--config "p/ci" means "use the community-written security rules intended for running in a CI environment"
--exclude .venv means "do not search for vulnerable source code in the .venv folder" (otherwise it would return hundreds of alerts!)
--error means "return a non-zero exit code if alerts are found", which is useful for making the CI fail when insecure patterns are detected
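
If you would rather drive the same scan from a Python script, the sketch below wraps the command with subprocess and adds Semgrep's --json flag to get machine-readable output. The helper names (build_command, summarize, scan) are hypothetical, and the JSON field names used here (results, check_id, path, start.line) reflect my understanding of Semgrep's JSON report, so check them against the version you have installed.

```python
import json
import subprocess
from typing import List


def build_command(config: str = "p/ci", exclude: str = ".venv") -> List[str]:
    """Same options as the command above, plus --json for parseable output."""
    return ["semgrep", "--config", config, "--exclude", exclude, "--error", "--json"]


def summarize(report: dict) -> List[str]:
    """Turn a parsed report into 'path:line: rule' strings.

    Assumption: findings live under 'results' with 'path', 'start.line'
    and 'check_id' keys. Missing keys are shown as '?' instead of failing.
    """
    lines = []
    for finding in report.get("results", []):
        path = finding.get("path", "?")
        line = finding.get("start", {}).get("line", "?")
        rule = finding.get("check_id", "?")
        lines.append(f"{path}:{line}: {rule}")
    return lines


def scan() -> int:
    """Run Semgrep in the current directory and return its exit code."""
    completed = subprocess.run(build_command(), capture_output=True, text=True)
    report = json.loads(completed.stdout or "{}")
    for line in summarize(report):
        print(line)
    # With --error, a non-zero code means findings (or a failure to run).
    return completed.returncode
```

Calling raise SystemExit(scan()) at the end of a script propagates Semgrep's exit code, so the script fails in exactly the cases where the plain command would.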

You should then see the following output:

Of course! dvpwa uses the MD5 algorithm to hash passwords, which is known to be insecure! Semgrep even gives us advice on how to solve the problem.
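
As an illustration of what a fix could look like, here is a minimal sketch that replaces a bare MD5 digest with a salted, deliberately slow key-derivation function from the standard library. This is one possible remediation, not necessarily the one Semgrep suggests, and it is not dvpwa's actual code: the function names and the ITERATIONS value are assumptions for the example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumption: pick an iteration count that matches current guidance and your hardware.
ITERATIONS = 600_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a digest with PBKDF2-HMAC-SHA256 and return (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every new password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Both the salt and the digest need to be stored alongside the user record. Unlike MD5, the per-user salt prevents precomputed lookup tables, and the iteration count makes each guess expensive for an attacker.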

Adding Semgrep to the CI/CD

Now that we have discovered we were using vulnerable code, why not add Semgrep to our CI/CD pipeline to keep it from happening again?

Let's improve our GitHub Action from Part 1 to also use Semgrep.

Open .github/workflows/main.yaml and add the following job:

  code_analysis:
    runs-on: ubuntu-latest
    name: Analyse code for security flaws
    steps:
      - uses: actions/checkout@v2
      - name: Code Security Analysis
        run: pip3 install semgrep && semgrep --config "p/ci" --error
        shell: bash

Your main.yaml file should look like this:

on: [push]

jobs:
  dependency_analysis:
    runs-on: ubuntu-latest
    name: Test dependencies for security flaws
    steps:
      - uses: actions/checkout@v2
      - name: Dependency Security
        run: pip3 install safety && safety check
        shell: bash
  code_analysis:
    runs-on: ubuntu-latest
    name: Analyse code for security flaws
    steps:
      - uses: actions/checkout@v2
      - name: Code Security Analysis
        run: pip3 install semgrep && semgrep --config "p/ci" --error
        shell: bash

Now, let's push our changes to the remote repository:

git add .github/workflows/main.yaml
git commit -m "Add static analysis security testing."
git push origin master

This should create an action named "Analyse code for security flaws" in your GitHub Actions panel.

Of course, this action fails because dvpwa contains insecure code!

Conclusion

In only a few steps, we installed a tool that scans all our Python code to find insecure patterns, gives us recommendations on how to solve them, and integrates seamlessly into our CI/CD.

But the power of Semgrep goes far beyond that: with it, you can write custom tests (as you can do with Escape to test your app's API security), automate refactoring, and enforce complex coding patterns. For more details, check out their documentation.

In the next tutorial, we will have a look at dynamic analysis, aka programs that interact with your running app to find security flaws.

💡 Want to know more about DevSecOps? Check out the following articles:

DevOps vs DevSecOps: exploring the key differences and how to make the shift
Top 7 DevSecOps Best Practices
How to automate and secure deployment within GitLab CI with Syft and Grype
9 GraphQL Security Best Practices to learn how to build safe GraphQL APIs