Automation

Objectives: By the end of this topic, you will be able to…

  • Create scripts in Bash or Python to automate security tasks
  • Integrate Kali tools into functional scripts
  • Read, process, and filter results from other tools
  • Use scripting to support reconnaissance, analysis, and defense

Scripting as a force multiplier

Security professionals who script gain a qualitative advantage, not just a quantitative one. The difference is not merely “faster”: it is the ability to reason at a higher level of abstraction. A penetration tester who can script can test 10,000 hosts in the time it takes to test 10 manually, but more importantly, they can iterate on hypotheses rapidly: “Are there any hosts on this subnet responding to port 8080 with this specific HTTP header?” becomes a ten-line script rather than a morning’s work.

The security tool ecosystem is built on composability. Most command-line security tools follow the Unix philosophy: do one thing well, read from stdin, write to stdout. Your scripts are the glue that turns individual tools into multi-stage pipelines: nmap → grep → python enrichment → VirusTotal → report, workflows that would take hours to replicate manually.

This composability also means the gap between “running tools” and “building tools” is smaller in security than in most other domains. When you understand what a tool is doing at the network level, you can replicate it, extend it, or detect it. That dual perspective (building and defending) is what makes security scripting genuinely interesting for programmers.


How network scanners actually work

Before writing scripts that wrap nmap, it is worth understanding what nmap is doing at the socket level. This knowledge helps you interpret results correctly, write better wrappers, and understand why certain scans require root privileges.

TCP connection mechanics

A TCP connection follows a three-way handshake: the client sends a SYN, the server replies with SYN-ACK if the port is open, and the client completes with ACK. A closed port responds with RST immediately. A filtered port (blocked by a firewall) usually does not respond at all, so the scanner infers the state from the timeout; some firewalls instead answer with an ICMP unreachable error.

Client          Server
  |--- SYN ------->|   Port open:    server replies SYN-ACK
  |<-- SYN-ACK ----|
  |--- ACK ------->|

  |--- SYN ------->|   Port closed:  server replies RST immediately
  |<-- RST --------|

  |--- SYN ------->|   Port filtered: no response, scanner times out
  (timeout)

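When run with root privileges, nmap defaults to a SYN scan (-sS): it sends a SYN, reads the response, and then sends RST instead of completing the handshake. Because no full connection is established, the target application never sees a connection and so cannot log it; only packet-level monitoring such as a firewall or IDS can record the probe. This requires root privileges because nmap has to craft raw packets itself instead of going through the kernel's TCP stack, and operating systems restrict raw sockets to privileged users. Without root, nmap falls back to a TCP connect scan (-sT).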
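The three outcomes in the diagram can also be observed from an unprivileged connect scan, because the operating system reports each one differently. A minimal sketch follows; the function name classify_port is illustrative, and treating every other OSError as "filtered" is a simplifying assumption.

```python
import socket


def classify_port(host: str, port: int, timeout: float = 1.0) -> str:
    """Return 'open', 'closed', or 'filtered' for one TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect((host, port))
            return "open"        # SYN-ACK received, handshake completed
        except ConnectionRefusedError:
            return "closed"      # RST received
        except socket.timeout:
            return "filtered"    # no reply before the timeout
        except OSError:
            return "filtered"    # assumption: e.g. an ICMP unreachable surfaced as an error
```

Unlike a SYN scan, this completes the handshake on open ports, so the target service can see and log the connection.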

Building a minimal port scanner

With Python’s socket module, you can build a full TCP connect scanner (which completes the handshake, and can therefore be logged by the target service):

import socket
 
def scan_port(host: str, port: int, timeout: float = 1.0) -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect((host, port))
            return True
        except (socket.timeout, ConnectionRefusedError, OSError):
            return False
 
host = "192.168.1.1"
open_ports = [p for p in range(1, 1025) if scan_port(host, p)]
print(f"Open ports: {open_ports}")

This works, but scanning sequentially with a 1-second timeout means 1,024 ports take up to about 17 minutes in the worst case (1,024 × 1 s). Understanding this limitation leads directly to concurrency.


Concurrency: from slow to fast

The single biggest performance gain in network scripting comes from concurrency. Network I/O is dominated by waiting: while your script waits for a TCP response, it could be sending dozens of other probes simultaneously. There are two practical concurrency models in Python for this use case.

Threading for I/O-bound scanning

Python’s threading module is appropriate for I/O-bound tasks. The Global Interpreter Lock (GIL) prevents true CPU parallelism, but since network scanning spends most time waiting for responses (not computing), threading provides dramatic speedups. The concurrent.futures.ThreadPoolExecutor makes this ergonomic:

from concurrent.futures import ThreadPoolExecutor
import socket
 
def scan_port(args: tuple[str, int]) -> int | None:
    host, port = args
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        try:
            s.connect((host, port))
            return port
        except (socket.timeout, ConnectionRefusedError, OSError):
            return None
 
host = "192.168.1.1"
with ThreadPoolExecutor(max_workers=500) as executor:
    results = executor.map(scan_port, [(host, p) for p in range(1, 10001)])
 
open_ports = sorted(p for p in results if p is not None)
print(f"Open ports: {open_ports}")

With 500 workers, the worst case for the 10,000-port scan drops from about 83 minutes sequentially (10,000 × 0.5 s) to roughly 10 seconds (20 batches × 0.5 s), plus thread overhead.
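The worst-case figures in this section come from simple arithmetic, which the sketch below makes explicit. It assumes every probe runs into the timeout and ignores scheduling overhead; the function name worst_case_seconds is illustrative.

```python
import math


def worst_case_seconds(ports: int, timeout: float, workers: int = 1) -> float:
    """Upper bound on scan time if every probe runs into the timeout."""
    batches = math.ceil(ports / workers)  # probes each worker must run one after another
    return batches * timeout


print(worst_case_seconds(1024, 1.0))          # sequential scanner: 1024.0 s, about 17 minutes
print(worst_case_seconds(10_000, 0.5))        # sequential, 0.5 s timeout: 5000.0 s, about 83 minutes
print(worst_case_seconds(10_000, 0.5, 500))   # 500 workers: 10.0 s
```

Real scans finish faster than this bound whenever ports answer quickly with SYN-ACK or RST; only filtered ports cost the full timeout.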

asyncio for high-concurrency scanning

asyncio is more efficient than threading at very high concurrency because it uses a single thread with a cooperative event loop rather than OS threads, each of which carries memory and scheduling overhead. This matters when scanning thousands of hosts:

import asyncio
 
async def scan_port(host: str, port: int) -> int | None:
    try:
        _, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout=0.5
        )
        writer.close()
        await writer.wait_closed()
        return port
    except (asyncio.TimeoutError, ConnectionRefusedError, OSError):
        return None
 
async def scan_host(host: str, ports: range) -> list[int]:
    tasks = [scan_port(host, p) for p in ports]
    results = await asyncio.gather(*tasks)
    return sorted(p for p in results if p is not None)
 
open_ports = asyncio.run(scan_host("192.168.1.1", range(1, 10001)))

Rate limiting: stealth and target health

Blasting 10,000 concurrent connections is fast but noisy. Intrusion detection systems (IDS) flag high-rate connection attempts trivially: a sudden spike of SYNs from one source is one of the oldest detection signatures in existence. Aggressive scanning can also crash fragile embedded devices (routers, printers, SCADA systems). An asyncio.Semaphore caps concurrency without changing the overall structure:

async def scan_host_limited(host: str, ports: range, max_concurrent: int = 100) -> list[int]:
    semaphore = asyncio.Semaphore(max_concurrent)
 
    async def limited_scan(port: int) -> int | None:
        async with semaphore:
            await asyncio.sleep(0)  # yield to event loop between acquisitions
            return await scan_port(host, port)
 
    results = await asyncio.gather(*[limited_scan(p) for p in ports])
    return sorted(p for p in results if p is not None)

Adding random delays between probes (asyncio.sleep(random.uniform(0, 0.1)), with the random module imported) makes the timing less regular. Regularity is one of the signals defenders use to recognize automated tools: no human opens connections that fast or at such even intervals.


Parsing structured security data

Security tools produce output in many formats. Reading it with your eyes is the lowest-leverage approach. Parsing it programmatically turns raw output into data you can filter, sort, join with other sources, and feed into reports.

nmap XML output

nmap’s -oX flag produces structured XML that is far easier to parse than its human-readable output format. The -oX - variant writes XML to stdout, which you capture directly:

import subprocess
import xml.etree.ElementTree as ET
 
def nmap_scan(target: str) -> dict[str, list[int]]:
    result = subprocess.run(
        ["nmap", "-p-", "--open", "-oX", "-", target],
        capture_output=True, text=True, check=True
    )
    root = ET.fromstring(result.stdout)
    hosts = {}
    for host in root.findall("host"):
        addr = host.find("address").get("addr")
        ports = [
            int(p.get("portid"))
            for p in host.findall(".//port")
            if p.find("state").get("state") == "open"
        ]
        if ports:
            hosts[addr] = ports
    return hosts

This pattern (drive a tool via subprocess, then parse its structured output) applies to dozens of security tools: ffuf has a JSON output mode, nuclei can emit JSONL, and masscan writes grepable or XML output.
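For tools that emit JSON Lines, the parsing side of the pattern is only a few lines. The sketch below is tool-agnostic: the sample records and their field names (host, severity) are hypothetical, since real field names depend on the tool and its version.

```python
import json


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines output: one JSON object per non-empty line."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Fail loudly: a truncated line usually means the tool was interrupted
            raise ValueError(f"line {lineno} is not valid JSON: {exc}") from exc
    return records


# Hypothetical findings, used only to show the filtering step
sample = (
    '{"host": "10.0.0.5", "severity": "high"}\n'
    '{"host": "10.0.0.7", "severity": "info"}\n'
)
high = [r["host"] for r in parse_jsonl(sample) if r.get("severity") == "high"]
print(high)
```

In a real script, the text argument would be result.stdout from subprocess.run, as in nmap_scan above.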

Log mining with regular expressions

Web server and authentication logs follow predictable formats. Regular expressions extract structured fields from lines of plain text, turning log files into queryable datasets:

import re
from collections import Counter
from pathlib import Path
 
LOG_PATTERN = re.compile(
    # The path is matched lazily (.+?) so raw requests that contain spaces still parse
    r'(?P<ip>\S+) \S+ \S+ \[.+?\] "(?P<method>\w+) (?P<path>.+?) \S+" '
    r'(?P<status>\d+) (?P<size>\S+)'
)

ATTACK_PATTERNS = re.compile(
    r"union.*select|insert.*into|drop\s+table"  # SQL injection
    r"|\.\./|\.\.\\"                             # path traversal: ../ or ..\
    r"|<script"                                  # XSS
    r"|cmd=|exec=|shell=",                       # command injection
    re.IGNORECASE,
)
 
def analyze_access_log(path: str) -> dict:
    ip_counts = Counter()
    status_counts = Counter()
    suspicious = []
 
    for line in Path(path).open():
        m = LOG_PATTERN.match(line)
        if not m:
            continue
        ip, status, req_path = m.group("ip"), m.group("status"), m.group("path")
        ip_counts[ip] += 1
        status_counts[status] += 1
        if ATTACK_PATTERNS.search(req_path):
            suspicious.append({"ip": ip, "path": req_path, "status": status})
 
    return {
        "top_ips": ip_counts.most_common(10),
        "status_distribution": dict(status_counts),
        "suspicious_requests": suspicious,
    }

The for line in Path(path).open() pattern streams the file line by line; it processes multi-gigabyte logs without loading them into memory.

Detecting brute-force in authentication logs

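import re
from collections import defaultdict
from pathlib import Path

# sshd logs "invalid user <name>" when the account does not exist; count those too
AUTH_FAIL = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d+\.\d+\.\d+\.\d+)"
)

def detect_brute_force(log_path: str, threshold: int = 10) -> list[str]:
    attempts: dict[str, int] = defaultdict(int)
    for line in Path(log_path).open():
        if m := AUTH_FAIL.search(line):
            attempts[m.group(1)] += 1
    # Report every source IP that reached the failure threshold
    return [ip for ip, count in attempts.items() if count >= threshold]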

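This is the core idea behind fail2ban, which is written in Python. fail2ban additionally counts failures only within a time window (findtime) and bans offending addresses through the firewall for a configurable period (bantime).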
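Counting over the whole file cannot tell a burst of failures from the same number spread across a month. A time-windowed variant could be sketched as follows. It takes already-parsed (timestamp, ip) pairs, because classic syslog timestamps omit the year and need separate handling; the function name detect_bursts and its parameters are illustrative.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def detect_bursts(
    events: list[tuple[datetime, str]], window: timedelta, threshold: int
) -> set[str]:
    """Return IPs with at least `threshold` failures inside any `window`-long span."""
    recent: dict[str, deque] = defaultdict(deque)
    flagged: set[str] = set()
    for ts, ip in sorted(events):
        q = recent[ip]
        q.append(ts)
        # Drop failures that have fallen out of the window ending at ts
        while ts - q[0] > window:
            q.popleft()
        if len(q) >= threshold:
            flagged.add(ip)
    return flagged
```

Each deque holds only the failures still inside the window, so memory stays proportional to the number of active sources, not to the size of the log.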


Threat intelligence APIs

Public threat intelligence services expose APIs that let you enrich raw findings with context: whether an IP is known malicious, whether a file hash appears in malware databases, whether a domain is associated with phishing campaigns.

VirusTotal file hash lookup

import os
import hashlib
import requests
from pathlib import Path
 
VT_API_KEY = os.environ["VIRUSTOTAL_API_KEY"]  # Never hardcode API keys
 
def check_file(filepath: str) -> dict:
    sha256 = hashlib.sha256(Path(filepath).read_bytes()).hexdigest()
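    resp = requests.get(
        f"https://www.virustotal.com/api/v3/files/{sha256}",
        headers={"x-apikey": VT_API_KEY},
        timeout=10,  # never wait forever on a remote API
    )
    if resp.status_code == 404:
        return {"hash": sha256, "known": False}
    resp.raise_for_status()  # bad key, rate limit, or server error: fail loudly
    stats = resp.json()["data"]["attributes"]["last_analysis_stats"]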
    return {
        "hash": sha256,
        "known": True,
        "malicious": stats["malicious"],
        "suspicious": stats["suspicious"],
        "total_engines": sum(stats.values()),
    }
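Path(filepath).read_bytes() loads the whole file into memory, which is fine for small samples but not for disk images or large archives. A chunked alternative, sketched here with an illustrative function name and chunk size, yields the same digest:

```python
import hashlib
from pathlib import Path


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so memory use stays flat."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```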

Shodan IP enrichment

Shodan is a search engine for internet-exposed services. Its API returns what services are running on a given IP and what CVEs are associated with them, without sending a single packet to the target:

import os
import shodan
 
api = shodan.Shodan(os.environ["SHODAN_API_KEY"])
 
def enrich_ip(ip: str) -> dict:
    try:
        host = api.host(ip)
        return {
            "ip": ip,
            "org": host.get("org"),
            "country": host.get("country_name"),
            "ports": host.get("ports", []),
            "cves": list(host.get("vulns", [])),
            "hostnames": host.get("hostnames", []),
        }
    except shodan.APIError as e:
        return {"ip": ip, "error": str(e)}

Building a scan-and-enrich pipeline

These pieces compose naturally into a multi-stage pipeline:

import time
 
def scan_and_enrich(target: str) -> list[dict]:
    scan_results = nmap_scan(target)         # {ip: [ports]}
    enriched = []
    for ip, ports in scan_results.items():
        entry = {"ip": ip, "open_ports": ports}
        entry.update(enrich_ip(ip))
        enriched.append(entry)
        time.sleep(1)                         # Respect Shodan's rate limit
    return enriched

API rate limits are not optional. VirusTotal’s free tier allows 4 requests per minute; Shodan allows 1 per second. Exceeding these limits gets your requests rejected and can get your key blocked. In production pipelines, use a token-bucket rate limiter and cache results locally to avoid redundant API calls.
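A token bucket is short enough to write by hand. The sketch below is one possible shape, not a reference implementation: the class name, its parameters, and the injectable clock and sleep functions (there to make it testable) are all assumptions of this example.

```python
import time


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int = 1,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            now = self.clock()
            # Refill in proportion to elapsed time, never above capacity
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            self.sleep((1 - self.tokens) / self.rate)


vt_limiter = TokenBucket(rate=4 / 60, capacity=4)  # 4 requests per minute
shodan_limiter = TokenBucket(rate=1.0)             # 1 request per second
```

Calling shodan_limiter.acquire() immediately before each api.host(ip) call would replace the fixed time.sleep(1) in scan_and_enrich and only waits when a request would actually exceed the limit.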


Statistical anomaly detection

Beyond signature-based detection (looking for known-bad patterns), statistical anomaly detection identifies deviations from a baseline without knowing in advance what “malicious” looks like. Many modern SIEMs layer this kind of baselining on top of their rule-based correlation.

The 3-sigma rule (empirical rule) states that for normally distributed data, ~99.7% of observations fall within three standard deviations of the mean. Anything outside that window is statistically unusual and warrants investigation:

import re
import statistics
from collections import Counter
from pathlib import Path
 
def detect_traffic_spikes(log_path: str) -> list[str]:
    hourly = Counter()
    for line in Path(log_path).open():
        if m := re.search(r'\[(\d{2}/\w+/\d{4}):(\d{2}):', line):
            hourly[f"{m.group(1)}-{m.group(2)}"] += 1
 
    counts = list(hourly.values())
    if len(counts) < 2:
        return []
    mean = statistics.mean(counts)
    stdev = statistics.stdev(counts)
    threshold = mean + 3 * stdev
 
    return [
        f"{hour}: {count} requests ({(count - mean) / stdev:.1f}σ above mean)"
        for hour, count in hourly.items()
        if count > threshold
    ]

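The useful property of this approach is that the threshold adapts to each server's own baseline: it is relative to that server's mean and variability, not a fixed absolute number. An extra 500 requests in an hour is likely within normal fluctuation for a server that usually handles about 10,000, but it is a large anomaly for one that usually handles 100.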
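One caveat is worth working through with numbers: a spike inflates the very mean and standard deviation it is measured against, which caps how large its z-score can get. The sketch below uses made-up hourly counts and an illustrative helper name, z_scores.

```python
import math
import statistics


def z_scores(counts: list[int]) -> list[float]:
    """Distance of each count from the mean, in sample standard deviations."""
    mean = statistics.mean(counts)
    stdev = statistics.stdev(counts)
    return [(c - mean) / stdev for c in counts]


# 23 quiet hours and one spike, similar in shape to a single day of hourly counts
day = [100] * 23 + [950]
spike_z = z_scores(day)[-1]
print(f"{spike_z:.2f}")                    # about 4.69

# A single outlier among n samples can never exceed (n - 1) / sqrt(n)
n = len(day)
print(f"{(n - 1) / math.sqrt(n):.2f}")     # also about 4.69: the spike sits at the ceiling

# With only 10 samples the ceiling is below 3, so mean + 3 * stdev can never fire
print(f"{(10 - 1) / math.sqrt(10):.2f}")   # about 2.85
```

In practice this means the detector needs enough buckets (at least 11 for a single spike to cross 3σ), and ideally a baseline computed from data that excludes the period being tested.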


OPSEC for scripts

Scripts that interact with sensitive systems have operational security requirements that general-purpose programming often ignores.

Credential management

Never hardcode credentials: not in the script, and not in a config file committed to version control. Hardcoded secrets in git repositories are a common and catastrophic vulnerability: automated scanners continuously crawl public repositories on GitHub and GitLab for API keys, and leaked secrets are often abused within minutes of exposure.

The safe patterns, in order of preference:

  1. Environment variables: load with os.environ["KEY"]. The script fails loudly (raises KeyError) if the variable is missing, which is correct: better to crash than to run without credentials and fail silently.
  2. .env files with python-dotenv: never commit .env to git. Add it to .gitignore before you create it.
  3. OS keyring: the keyring library stores secrets in GNOME Keyring, macOS Keychain, or Windows Credential Manager. No plaintext file sits in the project directory, so there is nothing to commit by accident.
  4. Secrets managers: HashiCorp Vault, AWS Secrets Manager, etc. for production systems.

import os
from dotenv import load_dotenv
 
load_dotenv()  # Reads .env if present; by default does not override already-set env vars
 
VT_KEY = os.environ["VIRUSTOTAL_API_KEY"]   # Fails loudly if missing
SHODAN_KEY = os.environ["SHODAN_API_KEY"]
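When a script needs several secrets, it can be friendlier to report every missing variable at once instead of failing on the first KeyError. A small sketch, with require_env as a hypothetical helper name:

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, or exit listing every missing one."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        # SystemExit with a message prints it to stderr and exits with status 1
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}
```

A script would call require_env("VIRUSTOTAL_API_KEY", "SHODAN_API_KEY") once at startup, before any scanning begins.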

Audit logging

Security scripts should log what they do, not just what they find. In a penetration test engagement, you may need to prove what your script did if a system becomes unavailable during the test. Timestamped logs are your defense:

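import logging
from datetime import datetime
 
# Example values; in a real script these come from argparse and the engagement paperwork
target = "192.168.1.1"
engagement_id = "ENG-0000"  # hypothetical identifier
 
logging.basicConfig(
    filename=f"scan_{datetime.now():%Y%m%d_%H%M%S}.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
 
logging.info("Starting scan of %s (authorized by: %s)", target, engagement_id)
open_ports: list[int] = []  # ... perform the scan and fill this in ...
logging.info("Scan complete: %d open ports found on %s", len(open_ports), target)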

Logging is also essential for debugging: a script that runs for 30 minutes and produces no output gives you nothing to work with when it fails.
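Logging what the script does extends naturally to external commands. The wrapper below is a sketch under assumptions: the name run_logged, the "audit" logger name, and the log message wording are all illustrative.

```python
import logging
import subprocess
import sys

log = logging.getLogger("audit")


def run_logged(cmd: list[str], timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run an external command and record what was run and how it ended."""
    log.info("RUN %s", " ".join(cmd))
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Missing binary or hung process: record it, then let the caller decide
        log.error("FAILED %s: %s", cmd[0], exc)
        raise
    log.info("EXIT %d from %s", result.returncode, cmd[0])
    return result


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    out = run_logged([sys.executable, "-c", "print('hello')"])
    print(out.stdout.strip())
```

Passing the command as a list (never a shell string) also avoids shell injection when targets come from user input.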


Best practices

Scope and authorization first. Scanning networks you do not have permission to scan is illegal in many jurisdictions, regardless of intent. Verify that your target is within the authorized scope before executing any automated tool. Many pentest agreements specify exactly which IP ranges and time windows are authorized.

Fail loudly. Scripts that swallow errors silently are dangerous in security contexts. An error during scanning means your results may be incomplete or wrong, and incomplete results reported as complete are worse than no results. Use check=True with subprocess, re-raise exceptions you cannot handle, and exit with a non-zero code on failure.

Parameterize everything. Hardcoded IPs, port ranges, and file paths make scripts single-use and brittle. Use argparse from the start:

import argparse
 
parser = argparse.ArgumentParser(description="Network scanner with enrichment")
parser.add_argument("target", help="IP address or CIDR range to scan")
parser.add_argument("-p", "--ports", default="1-1024", help="Port range (default: 1-1024)")
parser.add_argument("-o", "--output", default="results.json", help="Output file")
parser.add_argument("--rate", type=int, default=100, help="Max concurrent connections")
parser.add_argument("--enrich", action="store_true", help="Enrich results via Shodan")
args = parser.parse_args()

Minimal privilege. TCP connect scans work without root; SYN scans require it. Understand what privileges your script actually needs and run with the minimum. Never instruct users to sudo your script unless the specific operation requires it.
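A script can check its own privileges and pick the scan type accordingly instead of demanding sudo. A minimal sketch, where choose_scan_flag is a hypothetical helper and -sS and -sT are nmap's SYN and TCP connect scan flags:

```python
import os


def choose_scan_flag(is_root: bool) -> str:
    """Pick nmap's SYN scan when privileged, otherwise the TCP connect scan."""
    return "-sS" if is_root else "-sT"


# os.geteuid exists on Unix-like systems only; treat other platforms as unprivileged
is_root = hasattr(os, "geteuid") and os.geteuid() == 0
print(choose_scan_flag(is_root))
```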

Idempotency. Design scripts to be safe to re-run. Write output files with timestamped names. Check whether work is already done before repeating it. A scan that ran overnight and crashed at 95% should be able to resume from where it stopped, not restart from zero.
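One simple way to get resumability is a checkpoint file that records each finished target. The sketch below assumes results are JSON-serializable; the function name run_resumable and the checkpoint layout are illustrative choices.

```python
import json
from pathlib import Path
from typing import Callable


def run_resumable(
    targets: list[str], work: Callable[[str], dict], checkpoint: Path
) -> dict[str, dict]:
    """Process each target once, saving progress after every completed item."""
    done: dict[str, dict] = {}
    if checkpoint.exists():
        done = json.loads(checkpoint.read_text())
    for target in targets:
        if target in done:
            continue  # finished in an earlier run
        done[target] = work(target)
        # Write to a temp file, then rename, so a crash never leaves a half-written checkpoint
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(done))
        tmp.replace(checkpoint)
    return done
```

Re-running the script with the same checkpoint path skips everything already recorded and continues with the remaining targets.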


Hands-on lab

Requirements: Kali Linux, Python 3.10+, nmap, whois, dig, curl

The lab is structured as a pipeline: each part produces output that the next part consumes. By the end you will have a complete, multi-stage reconnaissance and analysis tool.


Part 1: From sequential to concurrent scanner

Start with this working sequential scanner:

import socket
import time
 
def scan_port(host: str, port: int, timeout: float = 1.0) -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect((host, port))
            return True
        except (socket.timeout, ConnectionRefusedError, OSError):
            return False
 
if __name__ == "__main__":
    host = "127.0.0.1"
    start = time.perf_counter()
    open_ports = [p for p in range(1, 1025) if scan_port(host, p)]
    elapsed = time.perf_counter() - start
    print(f"Open ports: {open_ports} ({elapsed:.1f}s)")

A. Run this against 127.0.0.1 (localhost) and record the elapsed time. Then rewrite it using ThreadPoolExecutor. Test max_workers values of 50, 200, and 500 and record the elapsed time for each. Plot or tabulate the results.

B. Rewrite again using asyncio + asyncio.Semaphore to cap concurrency at a configurable limit. Measure its performance at the same concurrency levels and compare against the threaded version.

C. Add a proper CLI to whichever version you prefer, using argparse:

Flag        Description                              Default
target      IP address to scan                       (required)
--ports     Port range, e.g. 1-1024 or 22,80,443     1-1024
--rate      Max concurrent connections               200
--timeout   Per-port timeout in seconds              0.5
--output    JSON output file                         stdout

Parse the --ports argument to support both comma-separated lists (22,80,443) and ranges (1-1024).

D. Output results as JSON to the path given by --output:

{
  "target": "127.0.0.1",
  "scan_time_seconds": 2.3,
  "timestamp": "2026-05-04T14:32:00",
  "open_ports": [22, 80, 443]
}

Question

At very high concurrency (e.g. --rate 2000), you may observe false negatives (open ports reported as closed). Explain the mechanism behind this. Why does this mean “the scanner did not detect it” is not the same claim as “the port is closed”? What does this imply about how you should interpret scan results from any tool, including nmap?


Part 2: Structured output and enrichment

A. Run nmap against the lab network with service version detection and XML output:

nmap -sV --open -oX scan.xml 192.168.1.0/24

Before writing any code, open scan.xml and read through the structure. Identify how hosts, ports, states, service names, and version strings are organized in the XML hierarchy.

B. Write parse_scan.py that reads scan.xml and produces a structured Python object (dict) for each live host:

{
    "ip": "192.168.1.10",
    "hostname": "gateway.local",
    "open_ports": [
        {"port": 22, "service": "ssh", "version": "OpenSSH 8.9"},
        {"port": 80, "service": "http", "version": "Apache httpd 2.4.54"}
    ]
}

Use xml.etree.ElementTree — do not use a third-party nmap library.

C. For each host with port 22 open, run ssh-keyscan via subprocess and parse the output to extract the key type (e.g. ecdsa-sha2-nistp256, ssh-ed25519). Add a "ssh_host_key_type" field to that host’s dict. Handle the case where ssh-keyscan times out or the host does not respond.

D. Write the enriched list of hosts to hosts.json. Your script should accept --input (XML file) and --output (JSON file) arguments.

Question

Service version detection (-sV) works by sending probe packets and matching responses against a signature database (nmap-service-probes). This makes the scan significantly slower and generates more network traffic than a plain port scan. From a defender’s perspective: why is the version string in a service banner valuable intelligence for an attacker? What is the security-relevant difference between Apache httpd 2.4.54 and a server that returns no version string at all?


Part 3: Log analysis and anomaly detection

This part analyzes log files. Either use the real logs on your Kali system (/var/log/auth.log, if sshd is running) or generate synthetic ones with the scripts below, which write auth.log and access.log to the current directory.

Generate a synthetic auth log for testing:

python3 - <<'EOF'
import random, datetime
 
ips = ["10.0.0.1", "10.0.0.2", "185.220.101.5", "192.168.1.50", "45.33.32.156"]
users = ["root", "admin", "ubuntu", "daniel"]
now = datetime.datetime.now()
 
with open("auth.log", "w") as f:
    for _ in range(500):
        ip = random.choices(ips, weights=[2, 2, 40, 1, 30])[0]
        user = random.choice(users)
        ts = (now - datetime.timedelta(seconds=random.randint(0, 86400))).strftime("%b %d %H:%M:%S")
        f.write(f"{ts} kali sshd[1234]: Failed password for {user} from {ip} port {random.randint(40000,60000)} ssh2\n")
    # Add some successful logins
    for _ in range(20):
        ts = (now - datetime.timedelta(seconds=random.randint(0, 86400))).strftime("%b %d %H:%M:%S")
        f.write(f"{ts} kali sshd[1234]: Accepted publickey for daniel from 192.168.1.1 port {random.randint(40000,60000)} ssh2\n")
EOF

A. Write auth_analysis.py that reads auth.log and produces:

  • A list of IPs that have more than 10 failed login attempts, sorted descending by attempt count
  • The list of user accounts being targeted (which usernames are attackers guessing?)
  • The ratio of failed to successful logins overall

B. Generate a synthetic web access log (or use a real one if available):

python3 - <<'EOF'
import random, datetime
 
ips = ["10.0.0.1", "185.220.101.5", "45.33.32.156", "66.249.66.1", "192.168.1.50"]
normal_paths = ["/", "/index.html", "/about", "/contact", "/static/main.css"]
attack_paths = [
    "/?id=1' UNION SELECT 1,2,3--",
    "/admin/../../../etc/passwd",
    "/search?q=<script>alert(1)</script>",
    "/wp-admin/",
    "/cgi-bin/test.cgi?cmd=id",
]
now = datetime.datetime.now()
 
with open("access.log", "w") as f:
    for hour in range(24):
        count = random.randint(80, 120)
        if hour == 3:  # Anomalous spike
            count = 950
        for _ in range(count):
            ip = random.choices(ips, weights=[30, 5, 5, 10, 3])[0]
            path = random.choices(normal_paths + attack_paths, weights=[20]*5 + [1]*5)[0]
            status = 200 if path in normal_paths else random.choice([200, 403, 500])
            ts = (now.replace(hour=hour, minute=random.randint(0,59))).strftime("%d/%b/%Y:%H:%M:%S +0000")
            f.write(f'{ip} - - [{ts}] "GET {path} HTTP/1.1" {status} {random.randint(200,5000)}\n')
EOF

C. Write log_analysis.py that reads access.log and produces:

  • All requests matching SQL injection, path traversal, XSS, or command injection patterns (use the regex from the theory section as a starting point, but extend it)
  • Top 5 IPs by request volume
  • HTTP status code distribution

D. Implement the 3-sigma anomaly detector on hourly request counts. Your output should identify anomalous hours and report their z-score:

[ANOMALY] 03:00 — 950 requests (z=4.2σ, threshold=3.0σ)

E. Combine the outputs of A–D into a single report.md file with sections for each finding category.

Question

The 3-sigma rule assumes data is approximately normally distributed. Web server traffic often has strong daily periodicity (high during business hours, low overnight). How does this periodicity affect the validity of a single global baseline? Describe a modified approach that would produce fewer false positives on a server with predictable daily traffic cycles.


Part 4: Integrated reconnaissance tool

Build recon.py — a single-file command-line tool that performs multi-stage reconnaissance on a domain or IP address and produces a structured report.

Requirements:

  1. CLI interface via argparse:

    • target: domain name or IP address (required)
    • --mode: domain or ip, auto-detected if omitted
    • --output: directory for output files (default: ./recon_<target>_<timestamp>/)
    • --verbose: print progress to stderr as the tool runs

  2. Audit log: every action the tool takes must be logged to audit.log inside the output directory. Each entry records the timestamp, what command was run, and what it returned (success/error). This is non-negotiable.

  3. Domain mode, when target is a domain name:

    • whois: extract registrar, registration date, expiry date, registrant organization
    • dig: query and save A, MX, NS, and TXT records
    • curl -I: extract HTTP response headers (server, X-Powered-By, Content-Security-Policy, Strict-Transport-Security)
    • Each step runs independently; a failure in one must not stop the others

  4. IP mode, when target is an IP address:

    • nmap -sV --open -oX: scan common ports (use the --top-ports 100 flag to keep it fast), parse the XML output as in Part 2
    • Reverse DNS lookup via dig -x
    • whois on the IP to extract the owning organization and country

  5. Structured output: write results.json with all findings in a single dict keyed by tool name.

  6. Markdown report: generate report.md from results.json. The report must include: a summary table of findings, open ports (if IP mode), DNS records (if domain mode), and any notable security headers that are missing (CSP, HSTS, X-Frame-Options).

Grading criteria:

Criterion                                                 Weight
Audit log present and accurate                            20%
Each step fails independently without crashing            20%
results.json contains structured data (not raw text)      25%
report.md is readable and complete                        20%
Argparse works correctly for both modes                   15%

Question

Your tool performs active reconnaissance: it sends packets to the target. Shodan performs passive reconnaissance: it has already scanned the internet, and you query its database without touching the target at all. From the perspective of an attacker, what are the operational differences between these two approaches? From the perspective of a defender with network monitoring in place, which approach is harder to detect, and why? In what scenarios would each be more appropriate?


Submission

A link to a public GitHub repository with the following structure:

├── scanner.py          # Part 1: concurrent port scanner
├── parse_scan.py       # Part 2: nmap XML parser and enricher
├── auth_analysis.py    # Part 3: auth log analysis
├── log_analysis.py     # Part 3: web log analysis + anomaly detection
├── recon.py            # Part 4: integrated tool
├── sample_output/      # One run of recon.py on a target you are authorized to scan
│   ├── results.json
│   ├── report.md
│   └── audit.log
└── README.md           # Setup instructions and per-script explanation

The README.md must include: Python version and dependencies (pip install command), how to run each script with an example command, and a brief explanation of what each script does and why the design choices you made are appropriate.


Key concepts

Term                      Definition
SYN scan                  Port scan that sends SYN and reads the response without completing the TCP handshake; requires root privileges
TCP three-way handshake   SYN → SYN-ACK → ACK sequence that establishes a TCP connection; the basis of port state detection
Concurrency               Executing multiple tasks that overlap in time; essential for performant network scanning
asyncio                   Python’s built-in async I/O framework; uses a single-threaded event loop for high-concurrency network tasks
ThreadPoolExecutor        Python’s managed thread pool for I/O-bound parallel tasks
Rate limiting             Capping the speed of automated requests to avoid detection and avoid harming target systems
IDS                       Intrusion Detection System; network monitoring tool that flags anomalous traffic patterns like port scans
Threat intelligence       Structured data about known malicious actors, IPs, domains, and file hashes from external sources
Shodan                    Search engine for internet-exposed services; provides passive reconnaissance without touching the target
VirusTotal                Multi-engine malware analysis service accessible via API for file hash and URL reputation checks
3-sigma rule              Statistical rule stating ~99.7% of normally distributed data falls within 3 standard deviations of the mean; used to detect anomalies
Audit log                 Timestamped record of actions a script performed; required in professional penetration testing engagements
subprocess                Python module for spawning and interacting with external processes and capturing their output
argparse                  Python module for building CLI argument parsers; makes scripts configurable and reusable
OPSEC                     Operational Security; practices that protect sensitive information and avoid exposing credentials or methods
