DevSecOps 101 Part 2: Detecting Insecure Source Code
In this tutorial, we will learn how to detect and fix vulnerable Python code using Semgrep.
Are you concerned about the security of your software applications? Today, protecting your code against potential vulnerabilities is crucial.
Welcome to our guide on how to detect insecure source code. In this tutorial, we look at the techniques and tools used to identify and mitigate security risks within your codebase; more specifically, we will learn how to detect and fix vulnerable Python code using Semgrep.
This article is part of a series about integrating security tooling into the development process. You can find the rest of the articles here:
- Part 1: Detecting Insecure Dependencies (SCA)
- Part 3: Scanning Live Web Applications with Nuclei scanner
- Part 4: Scanning Docker Images With Trivy
This tutorial will be based on the repository resulting from Part 1, so be sure to follow it first if you want to reproduce the steps below.
Why detecting insecure source code is important
Detecting insecure source code is crucial for several reasons, especially in the context of "shift left" security practices.
Firstly, it helps prevent potential security breaches and data leaks (just consider recent application security case studies) that can have severe consequences for both users and organizations, including financial losses and reputational damage.
Secondly, identifying vulnerabilities early in the development process allows for timely remediation, reducing the cost and effort associated with fixing issues later in the software lifecycle.
Thirdly, compliance with industry regulations and standards often requires thorough security assessments, making robust code analysis indispensable.
Ultimately, by proactively detecting and addressing insecure code, developers can enhance the overall resilience and trustworthiness of their software applications.
What is Semgrep and why choose it to secure code?
Analyzing source code to find security vulnerabilities, namely Static Application Security Testing, has been part of the enterprise software development process for years.
But the tools used to do it were expensive, slow, and hard to master. Until recently, the only open-source tools with a decent developer experience were linters like pylint, eslint, or their equivalents in other languages.
As a result, only big corporations were able to test the security of their source code, leaving solo developers and tiny teams to rely on testing by hand, or worse: faith.
But this time has come to an end with the release of an exciting tool: Semgrep.
Semgrep is, as its name suggests, like grep, but for source code. It allows developers to automatically find patterns in their source code while taking into account semantics like variable renaming. You can find an example of Semgrep finding XSS in Django code here.
Even better, Semgrep supports a lot of languages, and the Semgrep community has already written plenty of rulesets to detect bad practices and security flaws for them.
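To make "like grep, but taking semantics into account" a bit more concrete, here is a toy illustration written with Python's standard ast module. It is not how Semgrep is implemented, and the names SAMPLE, text_search and alias_aware_search are made up for this sketch. It only shows why a plain text search misses a call once the module has been imported under another name, while a search that understands the code still finds it.

```python
import ast

# Hypothetical snippet to scan: hashlib is imported under the alias `h`.
SAMPLE = '''
import hashlib as h

def fingerprint(data):
    return h.md5(data).hexdigest()
'''


def text_search(source, needle="hashlib.md5"):
    """Plain substring search, as grep would do; returns matching line numbers."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if needle in line
    ]


def alias_aware_search(source, module="hashlib", attr="md5"):
    """Find calls to module.attr(...) even when the module was imported under an alias."""
    tree = ast.parse(source)

    # Collect every local name bound to the module, e.g. `import hashlib as h`.
    local_names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == module:
                    local_names.add(alias.asname or alias.name)

    # Report the line of every call of the form <local_name>.<attr>(...).
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == attr
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id in local_names
        ):
            hits.append(node.lineno)
    return hits


print(text_search(SAMPLE))         # the text search finds nothing
print(alias_aware_search(SAMPLE))  # the alias-aware search finds the call
```

The text search looks for the literal string hashlib.md5, which never appears in the sample, while the second function first works out which local names refer to hashlib and then looks for calls on those names.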
The goal of this tutorial is to deploy Semgrep on our vulnerable Python app to detect vulnerable code. And guess what? It only takes a few minutes!
Detecting insecure code patterns
Let's go into our dvpwa repository and activate the virtualenv:
cd <your_path_to_dvpwa>/dvpwa
source .venv/bin/activate
Then install Semgrep using pip:
pip install semgrep
And run it:
semgrep --config "p/ci" --exclude .venv --error
You might ask yourself "What the hell did I just write?", so let's briefly explain the options we used here:
- --config "p/ci" means "use the community-written security rules for running in a CI environment"
- --exclude .venv means "do not search for vulnerable source code in the .venv folder" (otherwise it would return hundreds of alerts!)
- --error means "return a non-zero exit code if alerts are found", which is useful for making the CI fail if insecure patterns are detected
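If you would rather drive the scan from a script than type the command by hand, the three options above can be assembled as follows. This is only a sketch: the helper names semgrep_command and scan_passes are hypothetical, and it assumes semgrep is installed and available on your PATH.

```python
import subprocess


def semgrep_command(config="p/ci", excludes=(".venv",), fail_on_findings=True):
    """Build the argument list for the semgrep call used in this tutorial."""
    argv = ["semgrep", "--config", config]
    for path in excludes:
        # One --exclude flag per path we do not want scanned.
        argv += ["--exclude", path]
    if fail_on_findings:
        # --error makes semgrep exit with a non-zero status when alerts are found.
        argv.append("--error")
    return argv


def scan_passes(argv):
    """Run the scan; return True only when the process exits with status 0."""
    return subprocess.run(argv).returncode == 0
```

Calling scan_passes(semgrep_command()) from the repository root runs the same command as above. With --error set, it returns False as soon as Semgrep reports an alert, which is the behaviour a CI job relies on to fail the build.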
You then should see the following output:
Of course! dvpwa uses the md5 algorithm to hash passwords, which is known for being insecure! Semgrep even gives us advice on how to solve the problem.
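As for the fix itself, here is a minimal sketch of what replacing md5 for password storage could look like, using only the Python standard library. It is not dvpwa's actual code and not necessarily the exact change Semgrep suggests: the function names hash_password and verify_password, the 16-byte salt and the iteration count are all assumptions made for illustration, and the iteration count in particular should be tuned to your hardware and current guidance.

```python
import hashlib
import hmac
import os

# Assumption for this sketch: adjust to your hardware and current recommendations.
ITERATIONS = 200_000


def hash_password(password, salt=None):
    """Derive a salted PBKDF2-HMAC-SHA256 digest; returns (salt, digest)."""
    if salt is None:
        # A fresh random salt per password prevents precomputed-table attacks.
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Both the salt and the digest need to be stored alongside the user record so the password can be verified later. Unlike a single md5 call, a key derivation function such as PBKDF2 is deliberately slow, which makes brute-forcing leaked hashes much more expensive.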
Adding Semgrep to the CI/CD
Now that we have discovered we were using vulnerable code, what about putting Semgrep inside our CI/CD to avoid ever doing that again?
Let's improve our GitHub Actions workflow from Part 1 to also use Semgrep.
Open .github/workflows/main.yaml and add the following job:
  code_analysis:
    runs-on: ubuntu-latest
    name: Analyse code for security flaws
    steps:
      - uses: actions/checkout@v2
      - name: Code Security Analysis
        run: pip3 install semgrep && semgrep --config "p/ci" --error
        shell: bash
Your main.yaml file should look like this:
on: [push]
jobs:
  dependency_analysis:
    runs-on: ubuntu-latest
    name: Test dependencies for security flaws
    steps:
      - uses: actions/checkout@v2
      - name: Dependency Security
        run: pip3 install safety && safety check
        shell: bash
  code_analysis:
    runs-on: ubuntu-latest
    name: Analyse code for security flaws
    steps:
      - uses: actions/checkout@v2
      - name: Code Security Analysis
        run: pip3 install semgrep && semgrep --config "p/ci" --error
        shell: bash
Now, let's push our changes to the remote repository:
git add .github/workflows/main.yaml
git commit -m "Add static analysis security testing."
git push origin master
This should create an action named "Analyse code for security flaws" in your GitHub Actions panel.
Of course, this action fails because dvpwa contains insecure code!
Conclusion
In only a few steps, we installed a tool that scans all our Python code to find insecure patterns, gives us recommendations on how to solve them, and integrates seamlessly into our CI/CD.
But the power of Semgrep goes far beyond that: with it, you can write custom tests (as you can do with Escape to test your app's API security), automate refactoring, and enforce complex coding patterns. For more details, check out their documentation.
In the next tutorial, we will have a look at dynamic analysis, aka programs that interact with your running app to find security flaws.