Cryptography
Objectives: By the end of this topic, you will be able to…
- Apply encryption and decryption techniques with available tools
- Verify file integrity using hashes
- Understand differences between symmetric and asymmetric encryption
- Use public and private keys in a practical and secure manner
What is cryptography?
Cryptography is the discipline that studies techniques to protect information, ensuring its confidentiality, integrity, authenticity, and non-repudiation, even when transmitted over insecure channels.
Through mathematical algorithms, cryptography allows data to be encrypted (making it unreadable to unauthorized parties), integrity to be verified (detecting any alteration in transit), identities to be authenticated, and documents or messages to be digitally signed. It is a fundamental pillar of modern cybersecurity, used in HTTPS, encrypted emails, digital signatures, cryptocurrencies, VPNs, and secure storage.
Classical cryptography (Caesar, Vigenère)
Classical methods serve as a foundation for understanding substitution, transposition, and keys.
The Caesar Cipher replaces each letter with another shifted a fixed number of positions — with a shift of 3, A becomes D and B becomes E. Its simplicity is also its weakness: there are only 25 possible shifts, making it trivial to break by brute force.
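Here is a minimal Python sketch of the Caesar cipher and the brute-force attack described above. The function name `caesar` is illustrative, and the sketch assumes plain ASCII letters; anything else passes through unchanged.

```python
import string

def caesar(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` positions (negative to decrypt); keep everything else."""
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        elif ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)  # spaces, digits, punctuation pass through unchanged
    return "".join(out)

ciphertext = caesar("Attack at dawn", 3)
print(ciphertext)  # Dwwdfn dw gdzq

# Brute force: only 25 non-trivial shifts exist, so print them all and look for readable text
for shift in range(1, 26):
    print(f"{shift:2d}: {caesar(ciphertext, -shift)}")
```

Only one of the 25 printed candidates reads as English, and that candidate reveals both the plaintext and the shift.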
The Vigenère Cipher improves on Caesar by using a keyword. Each key letter sets the shift for one position of the plaintext, and the key repeats cyclically, so the same plaintext letter can map to different ciphertext letters. It is more resistant to brute force, but remains vulnerable to frequency analysis when the ciphertext is long enough relative to the key.
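The sketch below shows one way to implement Vigenère in Python. It assumes the common convention that the key advances only on letters, while spaces and punctuation pass through unchanged. `vigenere` and its `decrypt` flag are illustrative names. The LEMON example is the classic textbook one.

```python
import string

def shift_letter(ch: str, k: int) -> str:
    """Shift one ASCII letter by k positions, preserving its case."""
    base = ord("A") if ch.isupper() else ord("a")
    return chr((ord(ch) - base + k) % 26 + base)

def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    # Key letters become shifts A=0, B=1, ..., Z=25; non-letters in the key are ignored
    shifts = [ord(c) - ord("A") for c in key.upper() if c in string.ascii_uppercase]
    if not shifts:
        raise ValueError("key must contain at least one letter")
    out, i = [], 0
    for ch in text:
        if ch in string.ascii_letters:
            k = shifts[i % len(shifts)]
            out.append(shift_letter(ch, -k if decrypt else k))
            i += 1  # the key advances only on letters
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)

print(vigenere("ATTACKATDAWN", "LEMON"))                # LXFOPVEFRNHR
print(vigenere("LXFOPVEFRNHR", "LEMON", decrypt=True))  # ATTACKATDAWN
```

With a one-letter key, the function reduces to the Caesar cipher, which makes a quick sanity check.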
Both methods illustrate two important principles: confusion (making the relationship between plaintext and ciphertext difficult to see) and the role of the key as the element that controls encryption and decryption.
Symmetric encryption (AES)
Symmetric encryption uses the same secret key to encrypt and decrypt information. It is fast and efficient for large volumes of data.
AES (Advanced Encryption Standard) is a block cipher that processes data in 128-bit blocks and supports key sizes of 128, 192, or 256 bits. It is the modern standard that replaced DES. It applies 10, 12, or 14 rounds (depending on the key size) of byte substitution, row shifting, column mixing, and round-key addition.
Common modes of operation:
- ECB (Electronic Codebook) is not recommended. Each block is encrypted independently, so identical plaintext blocks produce identical ciphertext blocks and reveal patterns.
- CBC (Cipher Block Chaining) XORs each plaintext block with the previous ciphertext block before encrypting it. The first block is XORed with a random initialization vector (IV), so encrypting the same plaintext twice yields different ciphertext.
- GCM (Galois/Counter Mode) goes further and provides authenticated encryption: confidentiality plus integrity and authenticity checking in a single pass.
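The sketch below makes the ECB/CBC difference concrete without a third-party AES library. It swaps in a toy keyed byte-substitution function as the "block cipher". That toy is an assumption for illustration only and is trivially breakable; only the chaining logic of the two modes is meant to be accurate. The sketch also skips padding by using a plaintext that is an exact multiple of 16 bytes.

```python
import os
import random

BLOCK = 16  # bytes, the same block size AES uses

def make_toy_cipher(key: bytes):
    """Return (encrypt_block, decrypt_block) for a TOY keyed byte substitution. NOT secure."""
    rng = random.Random(key)  # illustration only: never derive real keys like this
    sbox = list(range(256))
    rng.shuffle(sbox)
    inverse = [0] * 256
    for i, v in enumerate(sbox):
        inverse[v] = i

    def encrypt_block(block: bytes) -> bytes:
        return bytes(sbox[b] for b in block)

    def decrypt_block(block: bytes) -> bytes:
        return bytes(inverse[b] for b in block)

    return encrypt_block, decrypt_block

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ecb_encrypt(enc, data: bytes) -> bytes:
    # Every block is encrypted independently: equal plaintext blocks give equal ciphertext blocks
    return b"".join(enc(block) for block in split_blocks(data))

def cbc_encrypt(enc, data: bytes, iv: bytes) -> bytes:
    # Each plaintext block is XORed with the previous ciphertext block (the IV for the first one)
    prev, out = iv, []
    for block in split_blocks(data):
        prev = enc(xor(block, prev))
        out.append(prev)
    return iv + b"".join(out)  # the IV is not secret and is stored in front of the ciphertext

def cbc_decrypt(dec, ciphertext: bytes) -> bytes:
    prev, out = ciphertext[:BLOCK], []
    for block in split_blocks(ciphertext[BLOCK:]):
        out.append(xor(dec(block), prev))
        prev = block
    return b"".join(out)

enc, dec = make_toy_cipher(b"demo key")
plaintext = b"SECRET: hunter2\n" * 4  # four identical 16-byte blocks, so no padding is needed
ecb_ct = ecb_encrypt(enc, plaintext)
cbc_ct = cbc_encrypt(enc, plaintext, os.urandom(BLOCK))
print("ECB distinct blocks:", len(set(split_blocks(ecb_ct))))          # 1
print("CBC distinct blocks:", len(set(split_blocks(cbc_ct[BLOCK:]))))  # 4 (with overwhelming probability)
print("CBC round trip OK:", cbc_decrypt(dec, cbc_ct) == plaintext)     # True
```

ECB leaks the fact that all four blocks are equal even though the toy cipher itself is unchanged. That is why the mode, not just the cipher, matters.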
Typical uses: file encryption, secure communications (VPN, HTTPS), storage of sensitive data.
Asymmetric cryptography (RSA)
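Asymmetric cryptography employs a key pair: a public key, which can be shared freely and is used for encryption, and a private key, which is kept secret and used for decryption. Its security rests on mathematical problems believed to be hard, such as factoring large integers.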
RSA (Rivest-Shamir-Adleman):
- Widely used asymmetric algorithm
- Security based on the difficulty of factoring the product of two large prime numbers
- Enables encryption, decryption, and digital signing
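Basic operation: RSA picks two large primes p and q, sets n = p × q, chooses a public exponent e, and computes the private exponent d as the inverse of e modulo (p − 1)(q − 1). The public key (e, n) is used for encryption and the private key (d, n) for decryption. Encryption computes C = M^e mod n, and decryption recovers the original message as M = C^d mod n.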
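The following is a worked example of textbook RSA with deliberately tiny primes (p = 61, q = 53, a classic teaching example). It omits the padding that real RSA requires and only shows the arithmetic. `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# Textbook RSA with tiny primes: arithmetic demo only, no padding, NOT secure
p, q = 61, 53
n = p * q                 # public modulus: 3233
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent: 2753 (Python 3.8+)

m = 65                    # message encoded as an integer smaller than n
c = pow(m, e, n)          # encryption: C = M^e mod n
recovered = pow(c, d, n)  # decryption: M = C^d mod n
print(n, d, c, recovered) # 3233 2753 2790 65
```

Anyone can encrypt with (17, 3233). Only someone who knows d, which requires knowing p and q, can decrypt.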
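Characteristics: much slower than symmetric algorithms, so it is not used to encrypt large files directly. Instead, it encrypts a randomly generated symmetric key (hybrid encryption, as in OpenPGP/GPG and in the RSA key exchange of TLS 1.2 and earlier).
Common uses: establishing secure connections (SSL/TLS), secure key exchange, digital signatures, and authentication.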
Hash functions (MD5, SHA-1, SHA-256)
A hash function takes an input of any length and produces a fixed-length output (hash or digest), representing the “fingerprint” of the original content.
A good hash function has four key properties:
- Deterministic: the same input always produces the same hash.
- Fast to compute: it can be applied efficiently to large files or high-frequency operations.
- Collision resistant: it is computationally infeasible to find two different inputs that produce the same digest.
- Preimage resistant: given a digest, it is computationally infeasible to find any input that produces it, so the original message cannot be recovered from the hash alone.
The digest length is one of the most visible differences between algorithms. Running the same input through all three shows the output space growing with each generation:
```python
import hashlib
data = b"hello"
print(hashlib.md5(data).hexdigest())     # 5d41402abc4b2a76b9719d911017c592
print(hashlib.sha1(data).hexdigest())    # aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
print(hashlib.sha256(data).hexdigest())  # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```
MD5 produces 32 hex characters (128 bits), SHA-1 produces 40 (160 bits), and SHA-256 produces 64 (256 bits). The larger the output space, the harder it is to find two inputs that collide.
Common algorithms:
| Algorithm | Output size | Status |
|---|---|---|
| MD5 | 128 bits | Obsolete; collisions can be generated easily |
| SHA-1 | 160 bits | Deprecated; practical collisions demonstrated in 2017 |
| SHA-256 | 256 bits | Currently secure, widely used |
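Typical uses: file integrity verification, digital signatures, and password storage. For passwords, a fast hash alone is not enough. Use a unique salt per password and a deliberately slow password-hashing function such as bcrypt, scrypt, or Argon2.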
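Here is a minimal sketch of salted password storage using `hashlib.scrypt` from the Python standard library. It is available when Python is built against OpenSSL 1.1 or later. The cost parameters below are illustrative assumptions, not a recommendation. bcrypt and Argon2 need third-party packages and are not shown.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters (assumption: tune n/r/p for your hardware and threat model)
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}

def hash_password(password: str):
    """Return (salt, digest); store both, never the plaintext password."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)  # constant-time comparison

salt1, digest1 = hash_password("hunter2")
salt2, digest2 = hash_password("hunter2")
print(digest1 != digest2)                         # True: same password, different salts
print(verify_password("hunter2", salt1, digest1)) # True
print(verify_password("Hunter2", salt1, digest1)) # False
```

Because each user gets a different salt, identical passwords produce different stored digests, and precomputed lookup tables become useless.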
Applications
Integrity verification
When a file is created or released, a hash is computed and published alongside it. Any subsequent change to the file — even a single flipped bit — produces a completely different hash, immediately revealing tampering. This mechanism is widely used to verify software downloads, ISO images, and backups.
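Here is a small Python sketch of the same workflow. It hashes a file in chunks, keeps the digest as the "published" value, then flips one bit and checks again. The helper names `sha256_file` and `matches_published` are hypothetical.

```python
import hashlib
import os
import tempfile

def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files (ISOs, backups) need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    return sha256_file(path) == published_hex.strip().lower()

# Simulate a release: write a file and record its "published" digest
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"release-1.0 contents\n")
    path = tmp.name
published = sha256_file(path)

before = matches_published(path, published)
with open(path, "r+b") as f:  # flip the lowest bit of the first byte
    first = f.read(1)[0]
    f.seek(0)
    f.write(bytes([first ^ 1]))
after = matches_published(path, published)
os.remove(path)

print(before, after)  # True False
```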
Digital signature
A digital signature combines asymmetric cryptography and hash functions. The sender hashes the document and signs that hash with their private key, producing the signature. In textbook RSA, signing amounts to "encrypting" the hash with the private key. The receiver uses the sender's public key to check the signature against a hash they recompute independently from the received document. If the two match, the content was not altered and it was signed with the claimed sender's private key.
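The following toy sketch signs the SHA-256 hash of a message with textbook RSA and verifies it with the public exponent. The key is built from two well-known Mersenne primes (2^61 − 1 and 2^89 − 1) purely to keep the example short. A real signature scheme uses large random primes and padding such as RSA-PSS. `pow(E, -1, ...)` needs Python 3.8 or later.

```python
import hashlib

# Toy key from two well-known Mersenne primes (public knowledge, so this key is NOT secret)
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

def hash_to_int(message: bytes) -> int:
    """SHA-256 digest of the message, reduced into the range [0, N)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    # Raise the hash to the private exponent: only the private-key holder can do this
    return pow(hash_to_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    # Anyone with the public key (E, N) can undo the exponentiation and compare hashes
    return pow(signature, E, N) == hash_to_int(message)

document = b"Transfer 100 EUR to Alice"
signature = sign(document)
print(verify(document, signature))                      # True
print(verify(b"Transfer 900 EUR to Alice", signature))  # False
```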
Basic obfuscation
Lightweight encryption or hashing is sometimes used to hide sensitive strings in binaries, scripts, or logs — making it harder for automated tools or casual inspection to recognize credentials, keys, or command patterns. This technique is common in malware and in CTF challenges, and recognizing it is an important skill in malware analysis.
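Single-byte XOR is one lightweight scheme often seen in malware samples and CTF challenges. The sketch below hides a string, then recovers it the way an analyst might: by trying all 256 keys and keeping candidates that contain a known marker. The function names are hypothetical and the key 0x5A is arbitrary.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

def brute_force(data: bytes, marker: bytes) -> list:
    """Try all 256 single-byte keys and keep (key, plaintext) pairs containing `marker`."""
    hits = []
    for key in range(256):
        candidate = xor_bytes(data, key)
        if marker in candidate:
            hits.append((key, candidate))
    return hits

# A sample might embed the credential like this (0x5A is an arbitrary key)
hidden = xor_bytes(b"password=hunter2", 0x5A)
print(hidden)  # no readable "password" string left for strings/grep to find

for key, plaintext in brute_force(hidden, b"password"):
    print(hex(key), plaintext)  # 0x5a b'password=hunter2'
```

With only 256 possible keys, this kind of obfuscation stops casual `strings`/`grep` inspection but not a deliberate analyst.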
Hands-on lab
Requirements: Kali Linux, `openssl`, `gpg`, `sha256sum`, Python 3
Part 1: Data integrity with hash functions
- Create a file with known content:
```bash
printf "This is a confidential document.\nAuthor: $(whoami)\nDate: $(date)\n" > document.txt
cat document.txt
```
- Compute the SHA-256 hash and save it to a checksum file. This is how software distributors ship verified downloads:
```bash
sha256sum document.txt
sha256sum document.txt > document.txt.sha256
cat document.txt.sha256
```
- Verify integrity using the checksum file:
```bash
sha256sum -c document.txt.sha256
```
- Compare the three most common hash algorithms side by side. Note the different digest lengths:
```bash
md5sum document.txt
sha1sum document.txt
sha256sum document.txt
```
The outputs are 32 hex characters (128 bits) for MD5, 40 for SHA-1, and 64 for SHA-256. Longer digests mean a larger output space, making collisions exponentially harder to find.
- Demonstrate the avalanche effect: change a single character and recompute all three:
```bash
sed 's/confidential/Confidential/' document.txt > document_modified.txt
md5sum document.txt document_modified.txt
sha1sum document.txt document_modified.txt
sha256sum document.txt document_modified.txt
```
? How significant was the change in the hash after modifying just one character? What property of hash functions does this demonstrate?
- Simulate a tampered download: append a line to the original file, then run checksum verification again:
```bash
echo "Injected malicious line." >> document.txt
sha256sum -c document.txt.sha256
```
Record the exact error message. This is the detection mechanism that package managers like `apt` use to catch corrupted or tampered packages.
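The sketch below quantifies the avalanche effect by counting how many bits differ between the digests of the original and modified sentence. It hashes only the first line of `document.txt`, not the whole file, so its digests will not match the `*sum` output. For a good hash function, roughly half of the bits are expected to flip.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions where two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"This is a confidential document.\n"
modified = b"This is a Confidential document.\n"

for name in ("md5", "sha1", "sha256"):
    d1 = hashlib.new(name, original).digest()
    d2 = hashlib.new(name, modified).digest()
    total = len(d1) * 8
    diff = bit_difference(d1, d2)
    print(f"{name:>6}: {diff}/{total} bits differ ({diff / total:.0%})")
```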
Part 2: Symmetric encryption with AES and openssl
- Create a file with highly repetitive, structured content. Each line is exactly 16 bytes (one AES block), which makes the difference between CBC and ECB visible later:
```bash
python3 -c "print('SECRET: hunter2\n' * 20)" > secret.txt
cat secret.txt
```
- Encrypt with AES-256-CBC. The `-pbkdf2` flag uses a modern key derivation function (PBKDF2) to derive the encryption key from the password:
```bash
openssl enc -aes-256-cbc -salt -pbkdf2 -in secret.txt -out secret_cbc.enc
```
- Inspect the encrypted file. It should be unreadable binary. Look at the raw bytes:
```bash
file secret_cbc.enc
xxd secret_cbc.enc | head -8
```
- Encrypt the same file a second time with the same password and compare the two outputs:
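```bash
openssl enc -aes-256-cbc -salt -pbkdf2 -in secret.txt -out secret_cbc2.enc
sha256sum secret_cbc.enc secret_cbc2.enc
```
Are the two CBC-encrypted files identical? Why or why not? With a password, `openssl enc` generates a random 8-byte salt, stores it after the `Salted__` header at the start of the file, and derives both the key and the IV from the password and that salt. What role do the salt and the resulting IV play in each encryption run?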
- Decrypt the first file and verify it is a perfect copy of the original:
```bash
openssl enc -aes-256-cbc -d -pbkdf2 -in secret_cbc.enc -out secret_decrypted.txt
sha256sum secret.txt secret_decrypted.txt
diff secret.txt secret_decrypted.txt
```
- Try decrypting with the wrong password and observe the result.
- Now encrypt the same file using ECB mode and run the same experiment:
```bash
openssl enc -aes-256-ecb -salt -pbkdf2 -in secret.txt -out secret_ecb.enc
openssl enc -aes-256-ecb -salt -pbkdf2 -in secret.txt -out secret_ecb2.enc
sha256sum secret_ecb.enc secret_ecb2.enc
```
- Use `xxd` to examine both outputs side by side and look for repeating 16-byte block patterns:
```bash
xxd secret_cbc.enc | head -20
xxd secret_ecb.enc | head -20
```
? What behavioral difference did you observe when switching from CBC to ECB mode? Why is ECB considered insecure for encrypting structured or repetitive data?
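One way to check the `xxd` observation programmatically is to count repeated 16-byte ciphertext blocks. The sketch assumes the `openssl enc` file layout, in which a password-encrypted file starts with the 8-byte `Salted__` magic followed by an 8-byte salt. The script name in the usage line below is hypothetical.

```python
import sys
from collections import Counter

BLOCK = 16  # AES block size in bytes

def count_repeats(data: bytes) -> int:
    """Count ciphertext blocks that duplicate an earlier block."""
    if data.startswith(b"Salted__"):
        data = data[16:]  # skip the 8-byte magic and the 8-byte salt
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    return sum(count - 1 for count in Counter(blocks).values() if count > 1)

def repeated_blocks(path: str) -> int:
    with open(path, "rb") as f:
        return count_repeats(f.read())

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {repeated_blocks(path)} repeated block(s)")
```

Usage: `python3 count_repeats.py secret_cbc.enc secret_ecb.enc`. ECB output of repetitive plaintext should report repeats, and CBC output should report none.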
Part 3: Asymmetric encryption with GPG
- Generate a key pair. When prompted, choose RSA and RSA with a 4096-bit key size:
```bash
gpg --full-generate-key
```
- List your keyring to confirm the key was created. Record the key ID and fingerprint:
```bash
gpg --list-keys
gpg --fingerprint "your name"
```
- Export your public key to share with your partner:
```bash
gpg --export -a "your name" > yourname.pub
cat yourname.pub
```
- Import your partner’s public key into your keyring:
```bash
gpg --import partnername.pub
gpg --list-keys
```
- Verify the imported fingerprint by reading it out loud to your partner or comparing screens directly, not through the same channel used to share the file. This step prevents a man-in-the-middle attack where someone swaps the public key in transit:
```bash
gpg --fingerprint "partner's name"
```
- Create a message, encrypt it for your partner, and sign it with your private key so they can confirm it came from you:
```bash
echo "This is a secret, authenticated message." > message.txt
gpg -se -r "partner's name" message.txt
```
This produces `message.txt.gpg`, encrypted with your partner’s public key and signed with your private key.
- Decrypt and verify the signature of the message your partner sent you:
```bash
gpg -d message.txt.gpg
```
GPG automatically verifies the signature and reports whether it is valid. A `Good signature` message means the content was not tampered with and came from the expected sender.
? What would happen if someone intercepted the encrypted message but did not have the recipient’s private key? What property of asymmetric encryption ensures the message remains confidential?
Part 4: Classical cryptography — Vigenère
Step 1: Implement the cipher
Write `vigenere.py` in Python with two functions, `encrypt(plaintext, key)` and `decrypt(ciphertext, key)`, plus a main block that reads the mode, text, and key from command-line arguments.
Algorithm:
- Normalize the key to uppercase and ignore its non-alphabetic characters; treat letters in the text case-insensitively but keep their original case in the output
- For each alphabetic character in the text, shift it by the corresponding key character’s value (A=0, B=1, ..., Z=25), cycling through the key; advance to the next key character only after an alphabetic character
- Pass non-alphabetic characters through unchanged (spaces, punctuation stay in place)
Expected behavior:
```console
$ python3 vigenere.py encrypt "Hello, World!" KEY
Rijvs, Uyvjn!
$ python3 vigenere.py decrypt "Rijvs, Uyvjn!" KEY
Hello, World!
```
Verify that decrypt(encrypt(text, key), key) returns the original text for at least 3 different keys and messages of varying length.
Step 2: Encrypt a long text for your partner
Choose a paragraph of at least 300 alphabetic characters of English text (a Wikipedia introduction, a news article excerpt, etc.). Choose a key of 4–8 letters that you keep secret from your partner.
```bash
python3 vigenere.py encrypt "$(cat plaintext.txt)" YOURKEY > ciphertext.txt
```
Exchange `ciphertext.txt` with your partner, but not the key.
Step 3: Implement the cracker
Write crack_vigenere.py to break your partner’s ciphertext. The attack works in two stages.
Stage 1 — Key length estimation with the Index of Coincidence (IoC)
The IoC measures how unevenly distributed the letters in a text are. For a string with N letters and letter counts f_i (for each of the 26 letters):
IoC = Σ f_i × (f_i − 1) / (N × (N − 1))
English plaintext has IoC ≈ 0.065. Random or well-mixed ciphertext has IoC ≈ 0.038. The key insight: if you split a Vigenère ciphertext into k groups by stride (group 0 = positions 0, k, 2k, …; group 1 = positions 1, k+1, 2k+1, …; etc.), and k matches the true key length, each group becomes a simple Caesar cipher — and its IoC will be close to 0.065.
For each candidate key length `k` from 1 to 20:
- Split the ciphertext (letters only) into `k` groups by stride
- Compute the IoC of each group using the formula above
- Average the IoCs across all `k` groups
- Print the candidate length and its average IoC
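The length whose average IoC is highest and closest to 0.065 is your best guess for the key length. Multiples of the true key length also score high, because each of their groups is still a single Caesar shift, so prefer the smallest candidate whose average IoC is close to 0.065.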
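Here is a sketch of Stage 1 in Python, following the formula and stride grouping above. The function names are illustrative, not a required interface for `crack_vigenere.py`.

```python
import string

def letters_only(text: str) -> str:
    """Uppercase the text and keep only A-Z."""
    return "".join(c for c in text.upper() if c in string.ascii_uppercase)

def index_of_coincidence(s: str) -> float:
    """IoC = sum of f_i * (f_i - 1) over the 26 letters, divided by N * (N - 1)."""
    n = len(s)
    if n < 2:
        return 0.0  # undefined for fewer than two letters; treat as 0
    counts = [s.count(letter) for letter in string.ascii_uppercase]
    return sum(f * (f - 1) for f in counts) / (n * (n - 1))

def average_ioc(ciphertext: str, k: int) -> float:
    """Split the letters into k stride groups and average their IoCs."""
    letters = letters_only(ciphertext)
    groups = [letters[i::k] for i in range(k)]  # group i = positions i, i+k, i+2k, ...
    return sum(index_of_coincidence(g) for g in groups) / k

def print_key_length_table(ciphertext: str, max_len: int = 20) -> None:
    for k in range(1, max_len + 1):
        print(f"{k:2d}  {average_ioc(ciphertext, k):.4f}")
```

A typical call is `print_key_length_table(open("ciphertext.txt").read())`, after which you read down the table for the candidate lengths near 0.065.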
Stage 2 — Recover each key letter with frequency analysis
Once you have the key length k:
- Split the ciphertext into `k` groups using the same stride method
- For each group, count how often each of the 26 letters appears
- Find the most frequent letter in the group. Assume it decrypts to E, the most common letter in English. The key letter for that position is then `(index_of_most_frequent − 4) % 26`, where A=0, B=1, …, E=4, …
- Build the full candidate key from all `k` recovered letters
Decrypt the ciphertext with your candidate key using vigenere.py. If the output is readable English, you have broken the cipher. If a few words look wrong, try swapping one key letter at a time with the second or third most frequent letter in that group — short texts do not always have E as the most frequent letter in every column.
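A matching sketch of Stage 2 follows. Besides the single best guess, it returns the top few candidate key letters per column, which supports the swap-one-letter correction described above. The names are again illustrative.

```python
import string
from collections import Counter

def letters_only(text: str) -> str:
    """Uppercase the text and keep only A-Z."""
    return "".join(c for c in text.upper() if c in string.ascii_uppercase)

def column_candidates(ciphertext: str, k: int, top: int = 3) -> list:
    """For each of the k columns, return up to `top` key letters, best guess first."""
    letters = letters_only(ciphertext)
    candidates = []
    for i in range(k):
        column = letters[i::k]  # positions i, i+k, i+2k, ...
        ranked = [letter for letter, _ in Counter(column).most_common(top)]
        # Assume each frequent ciphertext letter decrypts to E (index 4)
        candidates.append([chr((ord(c) - ord("A") - 4) % 26 + ord("A")) for c in ranked])
    return candidates

def guess_key(ciphertext: str, k: int) -> str:
    """Best guess: the key letter implied by the most frequent letter in each column."""
    return "".join(col[0] if col else "A" for col in column_candidates(ciphertext, k))
```

For example, `guess_key(open("ciphertext.txt").read(), 5)` gives a first candidate for a 5-letter key. `column_candidates` then shows which alternative letters to try in a column whose decryption looks wrong.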
? At what key length does the IoC method start to require much longer ciphertexts to work reliably? What is the theoretical upper limit of Vigenère security, and what cipher design eventually solved it?
Submission
Compressed folder containing:
- Screenshots of each major step: `sha256sum -c` verification, tampered-file detection, `xxd` inspection, AES encrypt/decrypt, GPG encrypt/decrypt
- `document.txt`, `document_modified.txt`, and the `document.txt.sha256` checksum file
- `secret_cbc.enc`, `secret_ecb.enc`, and `secret_decrypted.txt` with the `sha256sum` comparison output
- `yourname.pub`, `message.txt.gpg`, and the decryption output showing the signature verification
- `vigenere.py` and `crack_vigenere.py` with usage examples and sample output
- `plaintext.txt`, `ciphertext.txt` (the one you sent your partner), and the decrypted result of your partner’s ciphertext
- Short document (1–2 pages) explaining: what the avalanche effect showed, why the CBC outputs differed between runs, what the `Good signature` message in GPG means, and what key length your cracker recovered and how many columns needed a second-guess correction
Key concepts
| Term | Definition |
|---|---|
| AES | Standard symmetric encryption algorithm with 128-bit blocks |
| Symmetric encryption | System that uses the same key to encrypt and decrypt |
| Asymmetric encryption | System that uses a key pair: public and private |
| RSA | Asymmetric algorithm whose security rests on the difficulty of factoring the product of two large primes |
| SHA-256 | 256-bit hash function, currently secure and widely used |
| Hash | Function that converts data into a fixed-length string |
| GPG | Free implementation of OpenPGP for encryption and digital signatures |