
🏗️ Platform Architecture

DNS Science is built on a distributed, scalable architecture designed for real-time domain intelligence. Our platform processes millions of DNS records daily through a network of specialized daemons and verification systems.

Infrastructure Overview

1.1M+ Domains Tracked
17+ Active Services
16 Celery Workers
$239M+ Domains Valued

Technology Stack

Python 3.11+, Flask, Celery, PostgreSQL (RDS), Redis, Apache + WSGI, AWS EC2, DNSPython, Systemd, S3

Architecture Layers

🌐 Presentation Layer

Flask-based web application serving dynamic content with real-time updates. Built with responsive design principles and progressive enhancement.

🔧 Application Layer

RESTful API endpoints handling domain lookups, dark web monitoring, RDAP queries, and comprehensive DNS analysis.

⚙️ Service Layer

22 specialized daemons running continuously to discover, enrich, and monitor domain data from multiple sources.

💾 Data Layer

PostgreSQL database with Redis caching for high-performance queries. Optimized indexes and materialized views for analytics.
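
For illustration, the cache-aside pattern behind those high-performance reads can be sketched in a few lines of Python. The key scheme, TTL, DSN, and table shape here are assumptions for the example, not our production values.

# Cache-aside read: try Redis first, fall back to PostgreSQL (illustrative)
import json

import psycopg2
import redis

cache = redis.Redis(host="localhost", port=6379, db=0)
CACHE_TTL = 300  # hypothetical hot-data TTL in seconds

def get_domain(domain_name):
    key = f"domain:{domain_name}"  # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: skip the database entirely

    conn = psycopg2.connect("dbname=dnsscience")  # placeholder DSN
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT name, registrar, created_at::text FROM domains WHERE name = %s",
            (domain_name,),
        )
        row = cur.fetchone()

    if row is None:
        return None
    record = {"name": row[0], "registrar": row[1], "created_at": row[2]}
    cache.setex(key, CACHE_TTL, json.dumps(record))  # prime the cache for next time
    return record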

Daemon Architecture

Our platform runs 22 specialized daemons, each focused on a specific aspect of domain intelligence:

Data Discovery & Enrichment

domain_discovery_daemon.py: Discovers domains from Certificate Transparency logs and public DNS datasets. Currently monitoring 1.1M+ domains.
enrichment_daemon.py: Enriches domain records with WHOIS, geolocation, and metadata.
rdap_daemon.py: Queries RDAP servers for authoritative registration data.
gtld_daemon.py: Monitors generic top-level domain (gTLD) zones and changes.

Security & Threat Intelligence

ssl_monitord.py: Monitors SSL/TLS certificates for expiration and anomalies.
threat_intel_daemon.py: Integrates with threat intelligence feeds (AbuseIPDB, IPInfo).
reputationd.py: Calculates domain reputation scores based on multiple signals.
arpad_daemon.py: ARPAD (Advanced Reputation & Phishing Analysis Daemon) detects malicious domains.

Network Security Integration

suricata_integration_daemon.py: Integrates with Suricata IDS for network threat detection.
zeek_integration_daemon.py: Processes Zeek network monitoring logs for DNS analysis.

Specialized Monitoring

web3d.py: Monitors blockchain DNS systems (ENS, Handshake, Namecoin).
geoipd.py: Performs real-time geolocation lookups for IP addresses.
emaild.py: Handles email validation and deliverability scoring.
recordtyped.py: Analyzes DNS record types and configurations.

User Management

auto_renewal_daemon.py: Handles automatic subscription renewals and billing.
domain_expiry_daemon.py: Monitors domain expirations and sends alerts.
email_scheduler_daemon.py: Schedules and sends email notifications and reports.

Client Network DNS Monitoring

Our advanced DNS monitoring lets us debug DNS problems inside clients' internal networks, track traffic trends, analyze attack data, and much more:

Internal DNS Debugging: Monitor recursive DNS queries, resolution failures, and misconfigurations in client networks. Identify split-horizon DNS issues and internal resolution problems.
Traffic Pattern Analysis: Track DNS query volumes, identify unusual traffic spikes, and detect anomalies that may indicate DDoS attacks or DNS tunneling (see the sketch after this list).
Attack Detection & Analysis: Real-time detection of DNS-based attacks, including cache poisoning attempts, amplification attacks, and DNS exfiltration, with comprehensive attack data logging and forensics.
Performance Monitoring: Monitor DNS resolution times, identify slow resolvers, track query success rates, and analyze propagation delays across geographic regions.
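
The traffic-pattern analysis above reduces to comparing current query volume against a recent baseline. Here is a minimal sketch of that idea; the window size, warm-up length, and sigma threshold are arbitrary illustration values, not our production tuning.

# Rolling-baseline spike detector for DNS query volumes (illustrative)
from collections import deque
from statistics import mean, stdev

class QueryRateMonitor:
    def __init__(self, window=60, threshold_sigma=3.0):
        self.window = deque(maxlen=window)  # e.g. last 60 per-minute counts
        self.threshold_sigma = threshold_sigma

    def observe(self, queries_per_minute):
        """Return True if this sample spikes above the rolling baseline."""
        anomaly = False
        if len(self.window) >= 10:  # wait for some history before judging
            baseline = mean(self.window)
            spread = stdev(self.window) or 1.0  # avoid a zero threshold
            anomaly = queries_per_minute > baseline + self.threshold_sigma * spread
        self.window.append(queries_per_minute)
        return anomaly

A production detector would also account for daily seasonality and legitimate traffic growth; this sketch captures only the core threshold test.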

Data Flow & Processing Pipeline

  1. Discovery: Certificate Transparency logs, zone files, passive DNS (CT discovery sketched after this list)
  2. Validation: DNS resolution, RDAP lookup, WHOIS query
  3. Enrichment: GeoIP, SSL cert, threat intel, reputation scoring
  4. Storage: PostgreSQL with Redis caching layer
  5. API Exposure: RESTful endpoints with rate limiting
  6. Real-time Updates: WebSocket connections for live data
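
As a concrete taste of stage 1, Certificate Transparency discovery can be demonstrated against crt.sh's public JSON interface. This is a hedged sketch: crt.sh is a public service we do not control, and it is not necessarily the CT feed our daemons consume.

# Discover hostnames seen in CT logs for an apex domain via crt.sh (illustrative)
import requests

def discover_subdomains(apex):
    resp = requests.get(
        "https://crt.sh/",
        params={"q": f"%.{apex}", "output": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    names = set()
    for entry in resp.json():
        # name_value can hold several newline-separated hostnames
        for name in entry.get("name_value", "").splitlines():
            names.add(name.lower().lstrip("*."))
    return sorted(names)

print(discover_subdomains("example.com")[:10])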

Verification Process

Every domain in our database goes through a multi-stage verification process:

  1. DNS Resolution Verification - Confirm the domain resolves via authoritative nameservers (see the sketch after this list)
  2. RDAP Validation - Query regional internet registries for registration data
  3. WHOIS Cross-Reference - Compare RDAP data with WHOIS records for consistency
  4. Certificate Transparency - Check for SSL certificates in CT logs
  5. Historical Analysis - Compare against historical DNS data for anomalies
  6. Reputation Scoring - Calculate score based on age, registrar, hosting, and threat intel
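
Stage 1 maps directly onto DNSPython, which is already in our stack. A minimal resolution check might look like the sketch below; it uses the system resolver for brevity, whereas a stricter check would query the domain's authoritative nameservers directly.

# Stage 1 sketch: confirm the domain resolves to at least one A or AAAA record
import dns.resolver

def verify_resolution(domain):
    for rtype in ("A", "AAAA"):
        try:
            if dns.resolver.resolve(domain, rtype):
                return True
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer,
                dns.resolver.NoNameservers, dns.resolver.LifetimeTimeout):
            continue  # try the next record type
    return False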

Dark Web Monitoring Architecture

Our dark web monitoring system provides passive intelligence on Tor hidden services, I2P networks, and blockchain DNS:

Tor Network Monitoring

Tracking 1,160 active Tor exit nodes, updated hourly from Tor Project APIs, alongside a database of known .onion addresses mapped to clearnet domains.
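
The exit-node feed is publicly downloadable, so the hourly refresh can be sketched as below; the Tor Project's bulk exit list endpoint is real, but our daemon's actual source and parsing may differ.

# Fetch the current Tor exit-node IP set from the Tor Project (illustrative)
import requests

def fetch_tor_exit_nodes():
    resp = requests.get("https://check.torproject.org/torbulkexitlist", timeout=30)
    resp.raise_for_status()
    # One IP per line; skip blanks and comment lines
    return {line.strip() for line in resp.text.splitlines()
            if line.strip() and not line.startswith("#")}

exit_nodes = fetch_tor_exit_nodes()
print(f"{len(exit_nodes)} active exit nodes")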

Blockchain DNS

Monitoring ENS (Ethereum Name Service), Handshake (.hns), and Namecoin (.bit) domains for alternative DNS registrations.

Certificate Analysis

Analyzing SSL certificates for hidden services, identifying anomalies and self-signed certs indicative of dark web infrastructure.

Passive Intelligence

100% passive monitoring - no active crawling. All data from public sources, CT logs, and community-verified mappings.

Database Schema

Dark web monitoring utilizes 10 specialized database tables.

Distributed Processing Architecture

Processing millions of domains required a fundamental shift from sequential to parallel processing. Here's how we scaled our infrastructure.

The Scaling Journey

1,600 Valuations/hr (Before)
30,000+ Valuations/hr (After)
12 days Backlog Clear (Before)
~15 hours Backlog Clear (After)

Problem: Sequential Processing Bottleneck

Our original domain valuation daemon processed domains sequentially:

# Original approach - single-threaded loop
import time

while True:
    domains = get_domains_needing_valuation(batch_size=100)
    for domain in domains:
        # 3-4 DB queries per domain, all sequential
        age_data = fetch_rdap_data(domain)
        ssl_data = fetch_ssl_cert(domain)
        email_data = fetch_email_records(domain)

        # Calculate and save one valuation at a time
        valuation = calculate_value(domain, age_data, ssl_data, email_data)
        save_to_database(valuation)

    time.sleep(60)  # Wait before next batch

Issues encountered at scale: at roughly 1,600 valuations per hour, the full backlog needed about 12 days to clear; one slow RDAP or SSL lookup stalled the entire batch; and a valuation that failed mid-batch was silently dropped with no retry.

Solution: Celery-Based Parallel Processing

We migrated to a distributed task queue architecture using Celery and Redis:

Task Distribution

Celery Beat schedules batches of 1,000 domains every 2 minutes. Each domain becomes an independent task that can be processed by any available worker.

Parallel Workers

16 concurrent Celery workers process valuations simultaneously. Each worker handles one domain at a time with automatic retry on failure.

Redis Queue

Redis serves as the message broker, queuing tasks and distributing them to workers. Provides persistence and visibility into queue depth.

Auto-Retry

Failed tasks automatically retry up to 3 times with exponential backoff. No more silently dropped valuations.

# New approach - Celery distributed tasks
from celery import Celery, group

app = Celery("celery_config", broker="redis://localhost:6379/0")  # broker URL illustrative

@app.task(bind=True, max_retries=3)
def value_domain(self, domain_id, domain_name):
    try:
        # Same valuation logic, but runs in parallel across workers
        valuation = calculate_and_save_valuation(domain_id, domain_name)
        return {'domain': domain_name, 'value': valuation}
    except Exception as e:
        # Retry up to 3 times; countdown doubles each attempt for backoff
        self.retry(exc=e, countdown=60 * 2 ** self.request.retries)

@app.task
def queue_valuation_batch(batch_size=1000):
    domains = get_domains_needing_valuation(batch_size)
    # Queue all domains as independent parallel tasks
    tasks = group(value_domain.s(d.id, d.name) for d in domains)
    tasks.apply_async()  # Fan out to all available workers
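
The two-minute cadence comes from Celery Beat. Continuing from the block above, a minimal schedule entry might look like this; the task path shown is an assumption that matches the -A celery_config flag in the service unit below.

# Hypothetical Celery Beat entry for the 2-minute batch cadence
app.conf.beat_schedule = {
    "queue-valuation-batch": {
        "task": "celery_config.queue_valuation_batch",  # assumed task path
        "schedule": 120.0,  # seconds: one batch every 2 minutes
        "kwargs": {"batch_size": 1000},
    },
}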

Why This Design?

Service Configuration

# /etc/systemd/system/dnsscience-celery-valuation.service
[Service]
# -c 16 starts 16 concurrent worker processes on the "valuation" queue;
# %% escapes systemd's specifier syntax so Celery sees a literal %h (hostname)
ExecStart=/usr/local/bin/celery -A celery_config worker \
    -Q valuation \
    -c 16 \
    -n valuation@%%h \
    --loglevel=INFO
Restart=always
RestartSec=10

Self-Healing Infrastructure

Operating 17+ background services requires automated monitoring and recovery. Manual intervention doesn't scale.

The Problem

During development, we encountered several recurring issues: daemons crashing silently and staying down, data pipelines going stale with no new records and no alert, and services failing to come back after instance reboots or AWS maintenance.

Multi-Layer Health Monitoring

Process Health (Every 15 min)

Checks if all enabled services are running. Auto-restarts crashed daemons. Logs restart events to system journal.

Data Freshness Checks

Monitors table timestamps. If no new valuations in 60 min, restarts valuation daemon. If no new domains in 60 min, restarts discovery.

Ingestion Rate Monitoring

Tracks records per hour. Alerts and restarts if below thresholds (e.g., <50 domains/hr or <100 valuations/hr).

Boot Recovery

Systemd service ensures all daemons start on instance reboot. No manual intervention required after AWS maintenance.

Health Monitor Implementation

#!/bin/bash
# /usr/local/bin/dnsscience-health-monitor.sh (runs via cron every 15 min)

# Check data freshness - restart the owning service if the table has gone stale
check_data_freshness() {
    local table=$1
    local service=$2
    local max_minutes=$3

    # -t -A strips psql headers and padding; FLOOR yields an integer for [ -gt ]
    LAST_UPDATE=$(psql -t -A -c "SELECT FLOOR(EXTRACT(EPOCH FROM
        (NOW() - MAX(created_at)))/60) FROM $table;")

    if [ "$LAST_UPDATE" -gt "$max_minutes" ]; then
        logger "STALE DATA: $table - restarting $service"
        systemctl restart "$service"
    fi
}

# Domain discovery - should have new domains every hour
check_data_freshness "discovered_domains" "domain-discovery.service" 60

# Valuations - should value domains every hour
check_data_freshness "domain_valuations" "dnsscience-domain-valuation.service" 60

Deployment Automation

Single command deploys all services, daemons, and configuration to production:

./deploy_all_services.sh

# Syncs to S3, deploys to instance, enables services, restarts everything
# No more "forgot to deploy" issues

Scalability & Performance

Horizontal Scaling

Our architecture is designed to scale horizontally across multiple dimensions:

Web Tier

Auto Scaling Group with load balancer. Currently running t3.medium instances with capacity to scale to t3.xlarge.

Database Tier

RDS PostgreSQL with read replicas. Multi-AZ deployment for high availability.

Cache Tier

Redis ElastiCache (cache.t3.small) with 1.5GB memory for hot data.

Daemon Distribution

Daemons run independently and can be distributed across multiple worker instances.

Performance Optimizations

Current Performance Metrics

134ms Avg. Page Load
~50ms API Response
99.9% Uptime SLA
10K+ Queries/Hour

Security Architecture

Defense in Depth

Compliance

Our architecture is designed with compliance in mind.

API Architecture

RESTful Design

Our API follows REST principles with predictable endpoints:

GET  /api/stats/live          # Real-time platform statistics
GET  /api/darkweb/stats       # Dark web monitoring stats
GET  /api/darkweb/onion/:domain  # Check for .onion alternatives
POST /api/lookup              # Domain intelligence lookup
GET  /api/rdap/:domain        # RDAP registration data
GET  /api/whois/:domain       # WHOIS information
POST /api/bulk-lookup         # Batch domain analysis
POST /api/scan                # Domain scan (Simple/Advanced/Expert modes)
GET  /api/ip/:ip/scan         # IP scan (Simple/Advanced/Expert modes)
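
A quick client-side example against two of these endpoints; the base URL and authentication header below are assumptions for illustration, not documented values.

# Calling the lookup and RDAP endpoints with Python requests (illustrative)
import requests

BASE = "https://dnsscience.io/api"        # assumed base URL
HEADERS = {"X-API-Key": "your-api-key"}   # assumed auth scheme

# POST /api/lookup - domain intelligence lookup
resp = requests.post(f"{BASE}/lookup", json={"domain": "example.com"},
                     headers=HEADERS, timeout=30)
resp.raise_for_status()
print(resp.json())

# GET /api/rdap/:domain - RDAP registration data
resp = requests.get(f"{BASE}/rdap/example.com", headers=HEADERS, timeout=30)
print(resp.json())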

Expert Mode Scanning

Our platform offers three scanning modes with progressively granular control:

Simple Mode

Quick scans with automatic checks for DNS, SSL, and basic security indicators. Perfect for rapid assessments.

Advanced Mode

Comprehensive scans including DNSSEC validation, threat intelligence feeds, and enhanced security checks.

Expert Mode

Fully customizable scans with granular control over intelligence sources, DNS resolvers, and data collection methods. Choose exactly which checks to run.

Expert Mode Options

Domain Scans: Customize DNS analysis (records, DNSSEC, propagation), security checks (SSL, certificate transparency), email security (SPF, DKIM, DMARC), and threat intelligence sources.

IP Scans: Configure geolocation providers (IPInfo, MaxMind, BGP, RIPEstat), security sources (AbuseIPDB, RBL, threat feeds), and advanced analysis (Cloudflare detection, reverse DNS, WHOIS lookups).

# Example: Expert Mode Domain Scan
POST /api/scan
{
  "domain": "example.com",
  "expert": true,
  "options": {
    "dns": ["records", "dnssec", "propagation"],
    "security": ["ssl", "ssl-chain", "cert-transparency"],
    "email": ["spf", "dkim", "dmarc", "mx-health"],
    "intel": ["whois", "reputation", "threat"]
  }
}

# Example: Expert Mode IP Scan
GET /api/ip/8.8.8.8/scan?expert=true&options={"geo":["ipinfo","maxmind"]}
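
Because the GET variant carries JSON in a query parameter, clients should URL-encode it; letting requests build the query string handles that automatically (base URL assumed, as above).

# URL-encoding the Expert Mode options parameter (illustrative)
import json
import requests

options = {"geo": ["ipinfo", "maxmind"]}
resp = requests.get(
    "https://dnsscience.io/api/ip/8.8.8.8/scan",  # assumed base URL
    params={"expert": "true", "options": json.dumps(options)},
    timeout=30,
)
print(resp.json())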

Rate Limiting Tiers

Free Tier: 15,000 requests/day (anonymous & registered users)
Essentials ($29/mo): 80,000 requests/day (small teams)
Professional ($99/mo): 135,000 requests/day (security teams)
Commercial ($299/mo): 375,000 requests/day (enterprises)
Research ($199/mo): 275,000 requests/day (academic institutions)
Enterprise (Custom): unlimited requests (custom integrations)
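
We don't publish our limiter internals, but a daily quota like the tiers above is commonly enforced with a per-key counter in Redis. A minimal fixed-window sketch under that assumption:

# Hypothetical fixed-window daily quota check backed by Redis
import datetime

import redis

r = redis.Redis()
DAILY_LIMITS = {"free": 15_000, "essentials": 80_000, "professional": 135_000,
                "commercial": 375_000, "research": 275_000}

def allow_request(api_key, tier):
    today = datetime.date.today().isoformat()
    key = f"quota:{api_key}:{today}"  # hypothetical key scheme
    count = r.incr(key)               # atomic increment per request
    if count == 1:
        r.expire(key, 86_400)         # window expires after a day
    return count <= DAILY_LIMITS[tier]

A fixed window allows a brief burst at the day boundary; a sliding-window or token-bucket design smooths that out at the cost of extra bookkeeping.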

Future Architecture Enhancements

Planned Advanced Features

GraphQL API

Timeline: 3-4 weeks

Cost: $0-10/month

Flexible querying interface for complex use cases. Query exactly the data you need with a single request. Perfect for advanced integrations and custom dashboards.

Tech: Graphene-Python, Apollo Server, GraphQL subscriptions

Real-time WebSockets

Timeline: 2-3 weeks

Cost: $50-150/month

Live domain monitoring feeds with instant notifications. Stream CT log discoveries, SSL certificate changes, and DNS updates in real-time.

Tech: Socket.IO, Redis Pub/Sub, AWS API Gateway WebSocket

Machine Learning

Timeline: 6-8 weeks

Cost: $50-200/month

Predictive analytics for domain reputation scoring. Anomaly detection, phishing prediction, and automated threat classification using TensorFlow and scikit-learn.

Tech: TensorFlow, scikit-learn, AWS SageMaker

AI Features

Timeline: 4-6 weeks

Cost: $200-800/month

Natural language queries, automated report generation, and intelligent domain recommendations. Powered by large language models and vector embeddings.

Tech: OpenAI GPT-4, Claude API, Pinecone vector DB

Infrastructure Enhancements

Note: Detailed implementation plans, technical architecture diagrams, and cost breakdowns are available in our internal documentation. These features are being prioritized based on user feedback and enterprise requirements.

This architecture documentation is continuously updated as we enhance our platform.

Last updated: November 2025