OSINT
Objectives: By the end of this topic, you will be able to…
- Collect publicly available information useful for a security assessment
- Use OSINT tools included in Kali Linux
- Recognize the legal and ethical limits of using these tools
What is OSINT
OSINT (Open Source Intelligence) refers to the process of collecting and analyzing publicly available information to obtain useful intelligence.
OSINT draws from accessible sources: search engines like Google, Bing, and Yandex; social media platforms such as LinkedIn, Facebook, and Twitter; public records including government databases and WHOIS registries; metadata embedded in documents and images; forums and blogs such as Reddit, and paste sites like Pastebin; and exposed domains and web services. It does not require invasive techniques, and it is generally legal as long as collection respects applicable privacy laws and the sources' terms of service. Security professionals use it defensively in audits and threat hunting and offensively in authorized red team engagements, while criminals apply the same techniques to reconnoiter their victims.
Importance of OSINT in cybersecurity
OSINT is the core of reconnaissance, the first phase of the attack chain (Kill Chain).
Passive vs active reconnaissance
Passive reconnaissance collects information without directly interacting with the target, making it stealthier and harder to detect. Active reconnaissance involves direct interaction with the target’s infrastructure or services (for example, port scanning), which carries a higher risk of triggering alerts. Passive techniques carry a low detection risk yet can yield a large volume of information, often enough to build a detailed profile of a person or organization before any direct contact is made.
Attackers leverage OSINT to select vulnerable targets and craft convincing social engineering lures, while security teams use the same techniques to identify unintentional public exposures — shadow IT, data leaks, or poor staff practices that an adversary could exploit first.
OSINT tools in Kali Linux
| Tool | Description |
|---|---|
| theHarvester | Collects email addresses, usernames, hosts, and subdomains from public sources |
| Maltego CE | Visualizes relationships between entities such as people, domains, and IPs |
| whois | Queries WHOIS registries for domain registration data (registrar, dates, and owner contacts when not redacted) |
| dig | Queries DNS records for a domain |
| dnsrecon | Automates DNS information gathering |
| ExifTool | Extracts metadata from images and documents |
| Google Dorks | Technique rather than a packaged tool: advanced search engine operators used to find exposed sensitive data |
Although these tools automate processes, human analysis remains key for interpreting results.
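Google Dorks are plain query strings, so they are easy to assemble programmatically. The following is a minimal Python sketch, assuming the commonly documented Google operators `site:`, `filetype:`, `intitle:`, `inurl:`, and the `-` exclusion prefix; the helper names `build_dork` and `search_url` are hypothetical. It only builds the query and a search URL for manual review in a browser; it does not scrape results, which many search engines' terms of service prohibit.

```python
from urllib.parse import quote_plus


def build_dork(site=None, filetype=None, intitle=None, inurl=None,
               terms=(), exclude=()):
    """Compose a search query from common Google operators (hypothetical helper)."""
    parts = []
    if site:
        parts.append(f"site:{site}")          # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # e.g. pdf, docx, xlsx
    if intitle:
        parts.append(f'intitle:"{intitle}"')  # phrase must appear in the page title
    if inurl:
        parts.append(f"inurl:{inurl}")        # word must appear in the URL
    # Quote multi-word terms so they are searched as exact phrases
    parts.extend(f'"{t}"' if " " in t else t for t in terms)
    parts.extend(f"-{e}" for e in exclude)    # leading minus excludes a word
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL for manual review in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)
```

For example, `build_dork(site="example.com", filetype="pdf")` produces `site:example.com filetype:pdf`, a query for PDFs indexed on that domain, which are typical candidates for metadata analysis.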
OSINT techniques by target type
Gathering on people
Goal: obtain data to identify, locate, or profile a person.
Information sought: social media profiles, email addresses, resumes, photographs with metadata, forum participation.
Common techniques include advanced searches using Google Dorks; Hunter.io to find email addresses associated with an organization's domain; HaveIBeenPwned to check whether those addresses appear in known data breaches; analysis of image metadata with tools like ExifTool; and reverse image searches on Google or Yandex to trace a photograph back to its origin.
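HaveIBeenPwned's breached-account search for email addresses requires an API key, but its companion Pwned Passwords service exposes a free range endpoint (`https://api.pwnedpasswords.com/range/{prefix}`) built on k-anonymity: only the first five hex characters of the password's SHA-1 hash are sent, the service returns every matching suffix as `SUFFIX:COUNT` lines, and the final comparison happens locally. The Python sketch below shows that flow; the function names and the User-Agent string are hypothetical.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_sha1(password):
    """Return (first 5, remaining 35) uppercase hex chars of the SHA-1 hash."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix, body):
    """Find our suffix in a range response of 'SUFFIX:COUNT' lines; 0 if absent."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0


def pwned_count(password, timeout=10.0):
    """Query the range endpoint; only the 5-character prefix is sent."""
    prefix, suffix = split_sha1(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "osint-lab-sketch"},  # hypothetical UA string
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    # The comparison against the full hash happens locally
    return count_in_range(suffix, body)
```

Calling `pwned_count("password")` makes a single HTTPS request in which only `5BAA6` leaves the machine; the design lets a defender check exposure without disclosing the secret being checked.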
Gathering on infrastructure
Goal: understand an organization’s digital infrastructure and its public exposure.
Typical data: WHOIS information, DNS and subdomains, IP addresses, public emails, exposed services.
For infrastructure, analysts run WHOIS and DNS queries with dig and dnsrecon, enumerate email addresses and subdomains with theHarvester, map relationships visually in Maltego, and search for exposed services and devices using Shodan or Censys.
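WHOIS itself is a simple text protocol (RFC 3912): the client opens a TCP connection to port 43, sends the query terminated by CRLF, and reads until the server closes the connection. The Python sketch below, with hypothetical helper names, queries `whois.iana.org` first and follows the `refer:` line it usually returns to the registry's own WHOIS server. Field names vary between registries, so the parser simply collects every `key: value` line.

```python
import socket


def whois_query(query, server="whois.iana.org", port=43, timeout=10.0):
    """Send one WHOIS query (RFC 3912) and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # Queries are a single line terminated by CRLF; IDNs must be punycode first
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when finished
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Collect 'key: value' lines into {lowercased key: [values]}."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("%", "#", ">>>")):  # comments and banners
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields


def lookup(domain):
    """Ask IANA first, then follow its 'refer:' line to the registry server."""
    first = whois_query(domain)
    refer = parse_whois(first).get("refer")
    return whois_query(domain, server=refer[0]) if refer else first
```

Some registry responses in turn name a registrar's WHOIS server (for example in a `Registrar WHOIS Server:` line); following that second hop is left out of this sketch, and servers may rate-limit or redact contact fields.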
Ethical and legal considerations
Although OSINT is based on public sources, not everything accessible is legal to use. Posting something on social media does not imply consent for automated collection, and many platforms explicitly prohibit scraping in their terms of service. In professional audits, written authorization from the client must exist before gathering information about their environment. Throughout any investigation, the privacy and security of third parties must be respected — gathering information for harmful purposes crosses both legal and ethical lines.
In professional cybersecurity, ethics and legality are as important as technical knowledge.
Hands-on lab
Requirements: Kali Linux, internet access, Maltego CE
Part 1: Passive reconnaissance on a domain
Each pair will receive an assigned domain or identity.
- Use `whois`, `nslookup`, and `dig` to profile the domain
- Run `theHarvester` to search for emails, hosts, and social media:
  `theHarvester -d example.com -b google,bing`
- Optional: use `crt.sh`, `hunter.io`, `amass`, or web OSINT tools
? What email addresses, subdomains, or hosts did `theHarvester` surface? Were any results unexpected or particularly sensitive? How could an attacker leverage this information against the target?
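Of the optional sources, crt.sh is a useful passive one: it indexes Certificate Transparency logs, so subdomains that ever received a publicly trusted TLS certificate tend to appear there without any packet reaching the target. A Python sketch, assuming crt.sh's JSON output (`output=json`) is a list of objects whose `name_value` field holds newline-separated hostnames; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def crtsh_url(domain):
    """Build the crt.sh query; '%.' is crt.sh's wildcard for any subdomain."""
    params = {"q": f"%.{domain}", "output": "json"}
    return "https://crt.sh/?" + urllib.parse.urlencode(params)


def subdomains_from_crtsh(payload, domain):
    """Extract unique hostnames under `domain` from a crt.sh JSON payload."""
    domain = domain.lower().rstrip(".")
    found = set()
    for entry in json.loads(payload):
        # One certificate can cover several names, one per line
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):  # wildcard certs: keep the base name
                name = name[2:]
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)


def fetch_crtsh(domain, timeout=30.0):
    """Download and parse crt.sh results (network access required)."""
    request = urllib.request.Request(
        crtsh_url(domain), headers={"User-Agent": "osint-lab-sketch"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return subdomains_from_crtsh(response.read().decode("utf-8"), domain)
```

Comparing `fetch_crtsh("example.com")` with the hosts `theHarvester` reported is a quick way to spot subdomains one source missed.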
Part 2: Metadata analysis
Files (PDFs, images, DOCX) with embedded metadata will be provided.
- Analyze with `exiftool` and online tools
- Identify authorship, timestamps, software, location
- Extract coordinates and plot them on a map
? What does the metadata reveal about the document’s origin and history? Could any of these details — author name, software version, or GPS coordinates — be used to profile or target a specific individual?
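For the coordinate step, ExifTool by default typically prints GPS positions in degrees, minutes, and seconds followed by a hemisphere letter, while most map services expect signed decimal degrees. The Python sketch below assumes that default format (for example `40 deg 26' 46.30" N`) and converts it, making south and west negative; the OpenStreetMap `mlat`/`mlon` link places a marker at the result. The helper names are hypothetical.

```python
import re

# Assumes ExifTool's default human-readable GPS format, e.g. 40 deg 26' 46.30" N
DMS_PATTERN = re.compile(
    r"""(\d+(?:\.\d+)?) deg (\d+(?:\.\d+)?)' (\d+(?:\.\d+)?)" ([NSEW])"""
)


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


def parse_exiftool_coord(text):
    """Parse one ExifTool GPS value such as 73 deg 58' 30.00" W."""
    match = DMS_PATTERN.search(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    d, m, s, ref = match.groups()
    return dms_to_decimal(float(d), float(m), float(s), ref)


def osm_link(lat, lon, zoom=15):
    """Build an OpenStreetMap URL with a marker at (lat, lon)."""
    return (f"https://www.openstreetmap.org/?mlat={lat:.6f}&mlon={lon:.6f}"
            f"#map={zoom}/{lat:.6f}/{lon:.6f}")
```

Pasting the GPS Latitude and GPS Longitude values from `exiftool` output into `parse_exiftool_coord` and passing the results to `osm_link` gives a link that can be opened directly and included in the report.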
Part 3: Visualization in Maltego CE
- Create entities and use basic transformations
- Identify non-obvious connections
? What relationships did Maltego surface that you would not have found through manual searching? What security risk do these connections represent for the organization?
- Export the graph image for inclusion in the report
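The following Python sketch does not use Maltego; it is a hypothetical illustration of the link analysis behind "non-obvious connections". Two entities with no direct link become related when they share a neighbor, such as two domains resolving to the same IP or registered with the same email address. Edges are assumed to be undirected `(entity, entity)` pairs, and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_neighbors(edges):
    """Map each unlinked pair of entities to the entities they both connect to."""
    graph = defaultdict(set)
    for a, b in edges:  # relationships are treated as undirected
        graph[a].add(b)
        graph[b].add(a)
    links = defaultdict(set)
    for hub, neighbors in graph.items():
        # Every pair of neighbors of the same hub is indirectly related through it
        for x, y in combinations(sorted(neighbors), 2):
            if y not in graph[x]:  # keep only pairs with no direct edge
                links[(x, y)].add(hub)
    return dict(links)
```

With edges linking `shop.example.com` and `blog.example.org` to the same IP, the function reports that pair together with the IP that connects them; pairs sharing several hubs are the strongest leads to verify manually.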
Submission
Report per pair (max. 3 pages plus images):
- Techniques and tools used
- Main findings per section
- Maltego visualization
- Critical analysis of risks, ethics, and privacy
- Brief personal reflection (one per student)
Additional resources: OSINT Cheatsheet | Report Template
Key concepts
| Term | Definition |
|---|---|
| OSINT | Process of collecting and analyzing publicly available information |
| theHarvester | OSINT tool that collects emails, subdomains, and hosts |
| Google Dorks | Advanced search techniques for finding exposed information |
| Kill Chain | Model describing the phases of a cyberattack, starting with reconnaissance |
| Sniffing | Capture and inspection of traffic passing over a network |