Advanced Pentest & Offensive

OSINT — passive information gathering: Shodan, LinkedIn, WHOIS, Google dorks

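OSINT (Open Source Intelligence) is the collection of information from publicly available sources. In pentesting, it is the lowest-risk phase: most of the work queries third parties (registries, search engines, certificate logs), so you learn about the target without sending traffic to its infrastructure. A few techniques in this phase, such as a zone transfer attempt, do query the target's own servers and are no longer strictly passive.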

WHOIS and DNS records

# Domain information
whois example.com

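# DNS record lookups
host -t mx example.com                  # mail servers
dig example.com ANY                     # often refused or minimized (RFC 8482); query A, NS, TXT individually if so
dig axfr @ns1.example.com example.com   # attempt zone transfer (active: queries the target's name server)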

# Subdomain enumeration from third-party data sources (CT logs, DNS datasets, search engines)
subfinder -d example.com
amass enum -d example.com

Relevant data: registrant, emails, name servers, associated IPs, expiration dates.
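
The fields listed above can be pulled out of raw whois output with a few lines of Python. This is a minimal sketch: WHOIS output is not standardized, so the labels below (Registrar, Registry Expiry Date, Name Server) are assumptions that match common gTLD output, and the sample record, including the registrar name, is invented for illustration.

```python
import re

# Labels to look for. These are assumptions: registries use different
# wording, so extend this mapping for the TLDs you work with.
LINE_FIELDS = {
    "registrar": "Registrar",
    "expires": "Registry Expiry Date",
    "name_servers": "Name Server",
}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def parse_whois(text: str) -> dict:
    """Extract a few recon-relevant fields from raw WHOIS text."""
    result = {key: [] for key in LINE_FIELDS}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps stay intact.
        label, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue
        for key, wanted in LINE_FIELDS.items():
            if label.strip().lower() == wanted.lower() and value not in result[key]:
                result[key].append(value)
    # Host names and emails are case-insensitive: normalize and deduplicate.
    result["name_servers"] = sorted({ns.lower() for ns in result["name_servers"]})
    result["emails"] = sorted({e.lower() for e in EMAIL_RE.findall(text)})
    return result


# Invented sample record (hypothetical registrar and contact address).
SAMPLE = """\
Domain Name: EXAMPLE.COM
Registrar: Hypothetical Registrar, Inc.
Registry Expiry Date: 2026-08-13T04:00:00Z
Registrar Abuse Contact Email: abuse@registrar.example
Name Server: NS1.EXAMPLE.COM
Name Server: NS2.EXAMPLE.COM
"""

if __name__ == "__main__":
    for field, values in parse_whois(SAMPLE).items():
        print(field, values)
```

In practice the text would come from saving the output of whois example.com to a file and passing its contents to parse_whois.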

TLS certificates — crt.sh

TLS certificates issued by public CAs are recorded in Certificate Transparency logs, and the names they cover often reveal subdomains:

# Via browser
https://crt.sh/?q=%25.example.com

# Via curl (JSON); -r prints raw strings, and one entry may hold several names
curl -s "https://crt.sh/?q=%25.example.com&output=json" \
  | jq -r '.[].name_value' | sort -u

Typical output:
  admin.example.com
  api.example.com
  staging.example.com
  vpn.example.com

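When the crt.sh JSON feeds later steps, it helps to normalize it first: one name_value field can hold several names separated by newlines, wildcard entries appear as *.something, and case varies. The sketch below assumes the JSON has already been saved locally. The sample data is invented, and the function name is hypothetical.

```python
import json


def names_from_crtsh(raw_json: str, domain: str) -> list[str]:
    """Return the unique, in-scope host names found in crt.sh JSON output."""
    names = set()
    for entry in json.loads(raw_json):
        # name_value may contain several names separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # keep the parent of a wildcard entry
            # Scope check: keep the domain itself and its subdomains only.
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


# Invented sample mimicking the shape of crt.sh output (other keys omitted).
SAMPLE = json.dumps([
    {"name_value": "api.example.com\nadmin.example.com"},
    {"name_value": "*.staging.example.com"},
    {"name_value": "API.example.com"},
    {"name_value": "example.com.evil.test"},
])

if __name__ == "__main__":
    for host in names_from_crtsh(SAMPLE, "example.com"):
        print(host)
```

The scope check matters because a substring search for the domain also matches look-alike names such as example.com.evil.test, which belong to someone else.
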
Shodan — internet-exposed assets

Shodan continuously scans public IP space and indexes the service banners it finds, so querying it sends no traffic to the target:

Useful search filters:
  org:"Example Corp"              → IPs in netblocks registered to the organization
  hostname:example.com            → hosts under the domain with running services
  ssl.cert.subject.cn:example.com → filter by TLS certificate common name
  port:22 org:"Example Corp"      → exposed SSH
  http.title:"GitLab"             → public GitLab instances
  product:"Apache httpd" version:"2.4.49"  → specific vulnerable version (CVE-2021-41773)

CLI (after a one-time: shodan init YOUR_API_KEY):
  shodan search --fields ip_str,port,org "hostname:example.com"
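
Filters like the ones above combine with a space, and multi-word values need quotes. The following sketch composes a query string and then groups results by IP. The double-underscore keyword convention is an invention of this example, and the match records are made-up data that only imitate the ip_str and port fields requested in the CLI call.

```python
from collections import defaultdict


def build_query(**filters) -> str:
    """Join Shodan filters into one query string.

    Double underscores in a keyword become dots (a convention of this
    sketch only), so ssl__cert__subject__cn maps to ssl.cert.subject.cn.
    """
    parts = []
    for name, value in filters.items():
        text = str(value)
        if " " in text:
            text = f'"{text}"'  # quote multi-word values
        parts.append(f"{name.replace('__', '.')}:{text}")
    return " ".join(parts)


def ports_by_ip(matches: list[dict]) -> dict:
    """Group exposed ports per IP from a list of result records."""
    grouped = defaultdict(set)
    for match in matches:
        grouped[match["ip_str"]].add(match["port"])
    return {ip: sorted(ports) for ip, ports in grouped.items()}


# Made-up records using documentation IP ranges.
SAMPLE_MATCHES = [
    {"ip_str": "192.0.2.10", "port": 443, "org": "Example Corp"},
    {"ip_str": "192.0.2.10", "port": 22, "org": "Example Corp"},
    {"ip_str": "198.51.100.7", "port": 8080, "org": "Example Corp"},
]

if __name__ == "__main__":
    print(build_query(port=22, org="Example Corp"))
    print(ports_by_ip(SAMPLE_MATCHES))
```

With the official shodan Python package, the records would typically come from the "matches" list returned by a search call. Here they are hard-coded so the sketch runs without an API key.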

Google Dorks

Advanced search operators to find sensitive files and pages:

Configuration files:
  site:example.com filetype:env
  site:example.com filetype:xml "password"
  site:example.com filetype:sql

Admin panels:
  site:example.com inurl:admin
  site:example.com intitle:"phpMyAdmin"
  site:example.com inurl:wp-admin

Exposed information:
  site:example.com "Index of /"
  site:example.com ext:log
  site:example.com "DB_PASSWORD"

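Cached and older versions:
  cache:example.com/admin                       (operator retired by Google in 2024; kept for reference)
  https://web.archive.org/web/*/example.com/*   → Wayback Machine snapshots of the site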
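
The dork lists in this section differ only in the domain, so they are easy to generate per engagement. A minimal sketch follows, with the templates copied from the lists above. The function names are hypothetical, and the search URL format is the ordinary public one, not an API.

```python
from urllib.parse import quote_plus

# Templates taken from the lists in this section; {domain} is filled in later.
DORK_TEMPLATES = {
    "configuration files": [
        "site:{domain} filetype:env",
        'site:{domain} filetype:xml "password"',
        "site:{domain} filetype:sql",
    ],
    "admin panels": [
        "site:{domain} inurl:admin",
        'site:{domain} intitle:"phpMyAdmin"',
        "site:{domain} inurl:wp-admin",
    ],
    "exposed information": [
        'site:{domain} "Index of /"',
        "site:{domain} ext:log",
        'site:{domain} "DB_PASSWORD"',
    ],
}


def build_dorks(domain: str) -> dict:
    """Fill every template with the target domain, grouped by category."""
    return {
        category: [template.format(domain=domain) for template in templates]
        for category, templates in DORK_TEMPLATES.items()
    }


def search_url(query: str) -> str:
    """URL-encode a dork so it can be opened in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    for category, dorks in build_dorks("example.com").items():
        print(category)
        for dork in dorks:
            print("  " + search_url(dork))
```

Opening the URLs by hand is deliberate here: automated querying of search engines tends to trigger rate limiting and CAPTCHAs quickly.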

LinkedIn and human sources

What to look for:
  - Technology stack (job postings reveal versions in use)
    Example: "Node.js 14 + Kubernetes 1.21 + AWS RDS position"
  - IT staff names → social engineering targets
  - Vendors and partners → supply chain attack surface
  - Recently laid-off employees → potential insider risk

Tools:
  theHarvester -d example.com -b linkedin
  hunter.io → corporate emails by domain
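
Job postings are free text, so a small extractor helps turn them into a technology inventory. The sketch below assumes a hand-maintained watchlist of product names (hypothetical, extend as needed) and simply looks for a version number right after each name.

```python
import re

# Hypothetical watchlist: product names worth tracking for this target.
WATCHLIST = ["Node.js", "Kubernetes", "PostgreSQL", "Java"]


def extract_versions(text: str, names=WATCHLIST) -> dict:
    """Map each watched product to the version mentioned right after it."""
    found = {}
    for name in names:
        # Product name, whitespace, optional "v", then a dotted version.
        pattern = re.escape(name) + r"\s+v?(\d+(?:\.\d+)*)"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found[name] = match.group(1)
    return found


if __name__ == "__main__":
    posting = "Node.js 14 + Kubernetes 1.21 + AWS RDS position"
    print(extract_versions(posting))
```

Versions found this way are leads, not facts: a posting may describe a migration target or an outdated stack, so confirm them against other sources before relying on them.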

Pastebin and leak sites

Search for leaked credentials:
  site:pastebin.com "example.com"
  site:github.com "example.com" "password"

Specialized services:
  haveibeenpwned.com → emails found in known breaches
  dehashed.com       → search across multiple dumps
  intelx.io          → pastes, dark web, emails

Metadata analysis

Public documents (PDF, DOCX, XLSX) contain metadata:

exiftool document.pdf

Reveals:
  Author: john.smith
  Creator: Microsoft Word 2016
  Producer: GPL Ghostscript 9.18
  ModifyDate: 2024:03:15 14:22:10
  Company: Example Corp

→ internal usernames, software versions, working hours
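
Across many documents, the same three findings (usernames, software versions, working hours) can be collected automatically. This sketch parses "Tag: value" lines in the shape shown above, which is roughly what exiftool -s prints. The function names are hypothetical, and only the tags from the sample are handled.

```python
from datetime import datetime


def parse_exiftool(text: str) -> dict:
    """Turn 'Tag: value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only: date values contain colons too.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def recon_findings(fields: dict) -> dict:
    """Keep the parts of the metadata that matter for reconnaissance."""
    findings = {}
    if "Author" in fields:
        findings["username"] = fields["Author"]
    software = [fields[tag] for tag in ("Creator", "Producer") if tag in fields]
    if software:
        findings["software"] = software
    if "ModifyDate" in fields:
        # ExifTool prints dates as YYYY:MM:DD HH:MM:SS, optionally followed
        # by a time zone, so only the first 19 characters are parsed.
        stamp = datetime.strptime(fields["ModifyDate"][:19], "%Y:%m:%d %H:%M:%S")
        findings["edit_hour"] = stamp.hour
    return findings


SAMPLE = """\
Author: john.smith
Creator: Microsoft Word 2016
Producer: GPL Ghostscript 9.18
ModifyDate: 2024:03:15 14:22:10
Company: Example Corp
"""

if __name__ == "__main__":
    print(recon_findings(parse_exiftool(SAMPLE)))
```

A single edit hour says little. Collected over dozens of documents, the distribution of hours gives a rough picture of the organization's working day.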

Automation with recon-ng and theHarvester

# Source names change between theHarvester versions; list the supported ones with: theHarvester -h
theHarvester -d example.com -b google,bing,linkedin,shodan -l 200

recon-ng:
  marketplace install all
  modules load recon/domains-hosts/hackertarget
  options set SOURCE example.com
  run
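
Each tool returns its own host list, with overlaps and inconsistent formatting. A small merge step, sketched here under the assumption that every tool's hosts have already been exported as a plain list of names, keeps one entry per host and records which sources reported it.

```python
def merge_hosts(results_by_source: dict) -> dict:
    """Merge host lists from several tools into host -> sorted source names."""
    merged = {}
    for source, hosts in results_by_source.items():
        for host in hosts:
            # Normalize case, surrounding whitespace and a trailing root dot.
            host = host.strip().lower().rstrip(".")
            if not host:
                continue
            merged.setdefault(host, set()).add(source)
    return {host: sorted(sources) for host, sources in sorted(merged.items())}


if __name__ == "__main__":
    # Invented sample data for illustration.
    results = {
        "theHarvester": ["api.example.com", "VPN.example.com."],
        "recon-ng": ["api.example.com", ""],
    }
    for host, sources in merge_hosts(results).items():
        print(host, sources)
```

Hosts confirmed by several independent sources are usually the safest starting points, while single-source entries deserve a second look before they go into scope.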

Solid OSINT saves hours of active scanning — and sometimes reveals the entry vector before any active tool does.