DNS troubleshooting tool
  • Go 99.5%
  • Makefile 0.5%
Find a file
Tero Halla-aho 82a16a5031 Add host-level UDP packet-loss diagnostics
Adds an internal/hostnet package that reads kernel UDP counters from
world-readable /proc on Linux — no sudo required — so 'resolvr' can
tell whether the host it runs on is silently dropping UDP datagrams to
socket receive-buffer overflow. That is the invisible cause of 'random'
DNS timeouts when something else on the box (chatty multicast listeners,
etc.) is flooding UDP and starving the resolver's sockets.

Wired into both 'health' and 'random':

  * health captures before/after snapshots around its probe and prints
    a Local host UDP line. The verdict gains two new branches: a
    definitive 'this host dropped N datagrams' when RcvbufErrors > 0,
    and a 'looks like packet loss' shape heuristic when there are
    timeouts but successful queries are fast (p95 < 250 ms) — packet
    loss leaves a different fingerprint from a slow resolver.

  * random prints the same Local host UDP footer under its resolver
    breakdown.

Linux-only via build tags, with a graceful no-op stub on macOS and
other platforms, so the tool stays portable: run it on the client to
narrow the problem, then move to the resolver host where host checks
light up. Opt out with --no-host-checks.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-30 20:54:35 +03:00
cmd Add host-level UDP packet-loss diagnostics 2026-05-30 20:54:35 +03:00
internal Add host-level UDP packet-loss diagnostics 2026-05-30 20:54:35 +03:00
.gitignore Rename to resolvr; relicense as MIT; add SPDX headers 2026-05-30 20:54:35 +03:00
go.mod Rename to resolvr; relicense as MIT; add SPDX headers 2026-05-30 20:54:35 +03:00
go.sum Initial commit with first working version 2026-05-25 16:56:58 +03:00
LICENSE Rename to resolvr; relicense as MIT; add SPDX headers 2026-05-30 20:54:35 +03:00
main.go Rename to resolvr; relicense as MIT; add SPDX headers 2026-05-30 20:54:35 +03:00
Makefile Rename to resolvr; relicense as MIT; add SPDX headers 2026-05-30 20:54:35 +03:00
README.md Rename to resolvr; relicense as MIT; add SPDX headers 2026-05-30 20:54:35 +03:00

resolvr

A DNS troubleshooting CLI for diagnosing resolver reliability, email infrastructure, and domain health. Written in Go.

Distinguishing features:

  • Host-level UDP packet-loss detection. When run on a Linux resolver host, resolvr reads kernel UDP counters from world-readable /proc (no sudo) and tells you whether the host is silently dropping datagrams — the kind of fault that looks like random DNS timeouts but is invisible to a pure query test.
  • Fair multi-resolver comparison. --warm-cache and --interleave flags eliminate the cold-cache tax that otherwise unfairly penalises whichever resolver is queried first (especially when one resolver forwards to another, e.g. bind → Pi-hole).
  • Plain-language verdicts that distinguish packet loss from slow resolver from upstream problem, so you know where to look next.
resolvr health                            # is my DNS broken?
resolvr random --real --count 20 --timeout 2s --retry 2 --answers --crosscheck --parallel 5
resolvr random --warm-cache --interleave  # fair multi-resolver comparison
resolvr stress --random --duration 5m
resolvr analyze example.com
resolvr compare example.com

Contents


Installation

Requires Go 1.22 or later.

git clone https://github.com/thallaa/resolvr
cd resolvr
make build          # produces bin/resolvr
# or
go build -o bin/resolvr .

Move the binary somewhere on your $PATH:

cp bin/resolvr /usr/local/bin/

Global flags

These flags apply to every subcommand and must be placed before the subcommand name.

Flag Default Description
--resolver <addr> system DNS resolver(s) to use. Repeatable. system reads from /etc/resolv.conf. IP addresses without a port default to :53. Examples: 8.8.8.8, 1.1.1.1:53, 192.168.0.1.
--timeout <duration> 5s Per-query timeout. Accepts Go duration strings: 500ms, 2s, 1m.
--json false Emit machine-readable JSON to stdout. Suppresses progress bars and ANSI colour.
--no-color false Disable ANSI colour in terminal output.
--verbose false Print raw DNS response messages.
--no-host-checks false Skip reading local-host UDP packet-loss counters. The default (on) reads world-readable /proc/net/snmp and /proc/net/udp on Linux — no sudo required — and is a silent no-op on macOS and other platforms.

Examples:

# Use Cloudflare instead of the system resolver
resolvr --resolver 1.1.1.1 analyze example.com

# Use two resolvers (stress command will round-robin between them)
resolvr --resolver 8.8.8.8 --resolver 1.1.1.1 stress --random --duration 2m

# Tighten the per-query timeout
resolvr --timeout 1s random --real --count 50

Commands

health

The fastest first step when something feels wrong. Queries a sample of real popular domains against your local resolver and a clean reference resolver simultaneously, then gives a plain-language verdict.

resolvr health

Flags

Flag Default Description
--count <n> 15 Number of domains to test, drawn from the embedded real-domain list.
--reference <addr> 8.8.8.8:53 Clean reference resolver to compare against.

Output

A two-row table showing, for each resolver: success count, timeouts, errors, p50/p95/p99 latency, and a colour-coded score (green ≥95%, yellow ≥80%, red <80%). Followed by a verdict:

Verdict Meaning
✓ Local resolver appears healthy ≥95% success on both — nothing obvious to investigate.
✗ This HOST dropped N UDP datagrams Strongest signal. The host itself overflowed a socket receive buffer during the run — packets are being lost below the resolver. This is not a DNS-config problem; a process is flooding UDP or a buffer is too small. Only fires when run on a Linux resolver host.
⚠ Local resolver looks like it is LOSING PACKETS Queries timed out but the ones that succeeded came back fast (p95 < 250 ms). That shape points to dropped datagrams, not a slow resolver. Run this on the resolver host so the host UDP check can confirm where the loss is.
⚠ Local resolver is DEGRADED Local failure rate is meaningfully worse than reference — fault is at your router or ISP upstream.
✗ Both resolvers are struggling Both are failing — likely an internet connectivity problem, not DNS-specific.
⚠ Elevated failure rate Local is below 95% but not dramatically worse than reference — run random --crosscheck for detail.

Local host UDP diagnostic

When run on a Linux host (typically the resolver itself), health snapshots kernel UDP counters before and after the test and reports the delta as a single line:

Local host UDP:  ✓ no receive-buffer drops during this run (308 datagrams)

If any datagrams were dropped to socket receive-buffer overflow during the run, it instead shows:

Local host UDP:  ✗ 473 datagrams dropped to receive-buffer overflow (5.0% of 9469)
                 → socket(s) on port 8153 dropped 244
                 This is host-level packet loss, not a DNS misconfiguration.

That second form is the smoking gun for the class of fault where some other process on the box is flooding UDP (multicast listeners are classic offenders) and starving the resolver's sockets. Investigate with ss -uapm (look at the d drop column) and nstat -az; <rerun>; nstat | grep -i Udp. No sudo is required; the counters come from world-readable /proc. On macOS and other non-Linux platforms the section is silently omitted.

Examples

# Quick health check using defaults (system resolver vs 8.8.8.8)
resolvr health

# Test against Cloudflare instead of Google as the reference
resolvr health --reference 1.1.1.1

# Use more domains for a more statistically reliable result
resolvr health --count 30

# JSON for scripting
resolvr health --json | jq '{local: .local.success_rate, ref: .reference.success_rate}'

random

Fire DNS queries to a batch of domains and collect latency, rcode, and retry statistics. Useful for exposing intermittent resolver failures, measuring cache hit rates, and detecting Pi-hole blocking.

resolvr random [flags]

Domain source

Two mutually exclusive modes control which domains are queried.

Hex mode (default): generates random hex labels guaranteed not to exist, e.g. b7c60a8fc5a787c8.com. Every response should be NXDOMAIN; anything else indicates DNS hijacking or a captive portal.

Real mode (--real): picks from an embedded list of ~270 popular real-world domains. Every response should be NOERROR; failures indicate resolver problems.

Flags

Flag Default Description
--count <n> 20 Number of domains to query.
--type <type> A DNS record type to query (A, AAAA, MX, TXT, etc.).
--real false Use real popular domains from the embedded list instead of hex-generated ones.
--domain-list <file> Path to a plain-text file with one domain per line. Implies --real.
--tld <tld> com TLD used when generating hex domains.
--label-len <n> 16 Character length of the hex label for generated domains.
--retry <n> 1 Maximum number of retries on timeout or error (0 = no retry). The table shows how many attempts each domain needed; retried successes are marked . A domain that fails once then succeeds on retry is a strong signal of an intermittent resolver problem.
--answers false Add an Answers column to the table showing resolved IPs or record values.
--pihole false Enable Pi-hole blocking detection. Implies --real. Queries each domain against both the local resolver and a clean reference resolver, then classifies the result. See Pi-hole detection below.
--reference-resolver <addr> 8.8.8.8:53 Clean reference resolver for Pi-hole comparisons.
--crosscheck false After the primary run, re-query every timed-out or failed domain against a reference resolver to determine whether the failure is at your local resolver or globally. See Crosscheck below.
--crosscheck-resolver <addr> 8.8.8.8:53 Reference resolver used for crosscheck queries.
--parallel <n> 1 Number of queries to fire concurrently. The default (1) is sequential and preserves the exact order of results in the output. Higher values stress the resolver simultaneously — useful for reproducing intermittent failures that only appear under load. Combines with --retry: each concurrent slot may make up to retry+1 attempts independently.
--warm-cache false Pre-query every domain against every resolver once (results discarded) before the measured run, so all resolvers start with warm caches. See Fair multi-resolver comparison.
--interleave false Query all resolvers per-domain in a shuffled order, instead of finishing one resolver before starting the next. Distributes cold-cache lookups evenly. See Fair multi-resolver comparison.

Output columns

Column Description
Domain The queried domain name.
Resolver The resolver address that was used.
RCode DNS response code (NOERROR, NXDOMAIN, TIMEOUT, ERROR, …).
Latency Round-trip time for the final successful attempt.
Tries Number of attempts made (shown when any domain was retried).
Answers Resolved IPs or record values (shown with --answers).
Status OK, NXDOMAIN (expected), TIMEOUT ×N, ERROR, coloured accordingly. Retried successes show OK ↺N.

A latency summary is printed below the table for successful queries: min, mean, p50, p95, p99, max. The p95 and p99 values are highlighted in yellow (>200 ms) or red (>500 ms).

Pi-hole detection

When --pihole is set, each domain is queried against both the local resolver (which routes through Pi-hole) and the reference resolver simultaneously. The tool detects three Pi-hole blocking modes:

Status Meaning
OK Local and reference answers agree — domain is not blocked.
BLOCKED (null) Local resolver returned 0.0.0.0 or :: — null-blocking mode.
BLOCKED (NXDOMAIN) Local returned NXDOMAIN, reference returned NOERROR — NXDOMAIN-blocking mode.
BLOCKED (redirect) Local returned a private/LAN IP, reference returned a public IP — redirect-blocking mode.
unknown Query errors or ambiguous results.

Fair multi-resolver comparison

When you give random more than one --resolver, by default it queries them in order: every domain against resolver 1, then every domain against resolver 2, and so on. That's simple but unfair: whichever resolver runs first pays the cold-cache tax (slow upstream lookups, transient SERVFAILs), while later resolvers benefit from the cache it just warmed. This is especially distorting when one resolver forwards to another (e.g. bind → Pi-hole), because every cold lookup the forwarder makes warms Pi-hole's cache before Pi-hole is measured directly.

Two flags fix this and can be combined:

  • --warm-cache — run a discarded pre-pass that queries every domain against every resolver once. The measured run then starts with all caches warm.
  • --interleave — for each domain, query all resolvers in a shuffled order before moving on, so cold lookups are spread evenly across resolvers instead of all landing on the primary.

Use --warm-cache --interleave together for the cleanest apples-to-apples comparison; use --interleave alone if you want to see real cold-cache behaviour but distributed fairly.

Local host UDP diagnostic

The multi-resolver breakdown is followed by the same Local host UDP: line described under health. It surfaces socket receive-buffer overflow on the host running resolvr, which is the easy-to-miss cause of "random" DNS timeouts when something else on the box (chatty multicast listeners, etc.) is flooding UDP. No sudo required; Linux only.

Crosscheck

When --crosscheck is set, any domain that timed out or errored in the primary run is re-queried against the crosscheck resolver (default 8.8.8.8:53). A second progress bar tracks this phase, and a CROSSCHECK section is appended to the output.

The verdict line at the end tells you where the problem lies:

→ All 4 timeouts resolved on 8.8.8.8:53 — the problem is your local resolver
→ 2/4 timed-out domains resolved on 8.8.8.8:53 (partial local resolver failure)
→ None of the timed-out domains resolved on 8.8.8.8:53 either — domains may be globally unreachable

Examples

# 20 random hex queries (default) — NXDOMAIN is the expected success
resolvr random

# 50 real popular domains, 2-second timeout, 2 retries, show IPs
resolvr random --real --count 50 --timeout 2s --retry 2 --answers

# Same run but also crosscheck every timeout against 8.8.8.8
resolvr random --real --count 50 --timeout 2s --retry 2 --answers --crosscheck

# Stress the router with 10 simultaneous queries to reproduce load-dependent failures
resolvr random --real --count 50 --timeout 2s --retry 2 --parallel 10 --answers

# Check for Pi-hole blocking using 1.1.1.1 as reference
resolvr random --pihole --reference-resolver 1.1.1.1 --count 30

# Query your own domain list
resolvr random --domain-list ./my-domains.txt --answers --retry 3

# Test AAAA records only
resolvr random --real --type AAAA --count 20 --answers

# JSON output for scripting
resolvr random --real --count 100 --json > results.json

stress

Run a sustained DNS query loop for a fixed duration and collect reliability statistics: success rate, latency percentiles (p50/p95/p99), timeout count, and per-resolver breakdowns. Designed to expose flaky behaviour that only appears under sustained load.

resolvr stress --random --duration 5m
resolvr stress --domain example.com --duration 10m --interval 500ms

Exactly one of --random or --domain is required.

Flags

Flag Default Description
--random false Generate a fresh random hex domain for every query. Bypasses all caching — every query requires a full recursive lookup from the resolver.
--domain <domain> Repeatedly query a specific domain. Likely to be served from cache after the first hit; useful for measuring resolver cache consistency.
--duration <duration> 1m How long to run the test.
--interval <duration> 1s Time between queries. Set to a lower value (e.g. 100ms) to apply more load; set higher to run a low-frequency long-term soak test.
--type <type> A DNS record type to query.

When multiple resolvers are configured with --resolver, the stress test round-robins across them and produces a per-resolver breakdown in the summary.

Output

The live progress bar shows running totals:

Queries: 142 | Success: 97% | Timeouts: 4  ████████████████░░░░  (142/300)

The final summary includes:

  • Total / Successes / NXDOMAIN / Failures / Timeouts counts
  • Success rate (highlighted red if below 95%)
  • Latency table: min, mean, p50, p95, p99, max
  • Per-resolver breakdown when more than one resolver is in use

Examples

# 5-minute soak test with random domains (no caching)
resolvr stress --random --duration 5m

# 10-minute test of a specific domain, querying every 500ms
resolvr stress --domain example.com --duration 10m --interval 500ms

# Compare two resolvers over 2 minutes
resolvr --resolver 192.168.0.1 --resolver 8.8.8.8 stress --random --duration 2m

# JSON output for later analysis
resolvr stress --random --duration 5m --json > stress.json

analyze

Perform a comprehensive inspection of a single domain. All checks run concurrently where possible; a live progress bar tracks each step.

resolvr analyze <domain>

Checks performed

Check What it examines
DNS records A, AAAA, MX, NS, TXT, SOA, CNAME — all queried in parallel.
SPF Fetches and parses the SPF TXT record. Validates syntax, counts DNS lookups against the RFC 7208 limit of 10, and reports the all qualifier (-all = strict, ~all = soft fail, +all = insecure).
DKIM Probes a list of common selectors (default, google, selector1, selector2, k1, mail, dkim, smtp, email, …). Reports which selectors have a published key and the key type (RSA/Ed25519).
DMARC Fetches _dmarc.<domain>, parses p=, pct=, aspf=, adkim= tags, and warns about common misconfigurations.
DNSSEC Checks for RRSIG records (signed), DNSKEY presence, DS record at the parent, and validates the signature.
TLS Opens a TLS connection to port 443, inspects the leaf certificate (subject, issuer, SANs, expiry), and validates the chain. Expiry warnings at ≤30 days; critical at ≤14 days.
PTR / FCrDNS Performs reverse DNS (PTR) lookups for every A and AAAA address, then forward-confirms that the PTR name resolves back to the original IP (FCrDNS).
SMTP Probes ports 25, 465, and 587 on every MX host and reports open/closed status and the banner.

Flags

Flag Default Description
--skip-smtp false Skip SMTP port probing. Useful when the MX hosts are behind a firewall or when a quick check is needed.
--skip-tls false Skip TLS certificate inspection.
--skip-dnssec false Skip DNSSEC validation.
--dkim-selectors <list> (built-in defaults) Comma-separated list of DKIM selectors to probe, e.g. google,selector1,myselector.

Examples

# Full analysis of a domain
resolvr analyze example.com

# Skip slow network checks for a quick look
resolvr analyze example.com --skip-smtp --skip-tls

# Probe specific DKIM selectors
resolvr analyze example.com --dkim-selectors google,selector1,protonmail

# Use a specific resolver (e.g. to check propagation from a particular nameserver)
resolvr --resolver 1.1.1.1 analyze example.com

# Machine-readable output
resolvr analyze example.com --json | jq '.spf.all_qualifier'

compare

Send the same DNS query to multiple resolvers concurrently and compare their answers. Detects mismatches that could indicate split-brain DNS, stale caches, or incomplete propagation.

resolvr compare <domain>

Each resolver is queried --count times; latencies are averaged over those samples.

Flags

Flag Default Description
--count <n> 5 Number of queries per resolver. Results are averaged.
--type <type> A DNS record type to query.
--resolvers <list> 8.8.8.8:53,1.1.1.1:53,system Comma-separated list of resolvers to compare. Overrides the global --resolver flag. system expands to the addresses in /etc/resolv.conf.

Note: The global --resolver flag takes precedence over --resolvers if explicitly set on the command line.

Output

The results table shows, for each resolver:

Column Description
Resolver IP:port of the resolver.
Answers Resolved record values.
Avg RTT Average round-trip time across the --count queries.
Min / Max Fastest and slowest individual queries.
Errors Count of queries that returned an error.
RCode Response code from the last query.

If resolvers return different answers, a mismatch warning is printed and each resolver's answer set is listed.

Examples

# Compare across Google, Cloudflare, and the system resolver
resolvr compare example.com

# Check if a newly published record has propagated to specific resolvers
resolvr compare example.com --resolvers 8.8.8.8,1.1.1.1,9.9.9.9

# Compare MX records
resolvr compare example.com --type MX

# Increase sample count for more stable latency averages
resolvr compare example.com --count 20

# Query a specific resolver plus public ones
resolvr compare example.com --resolvers 192.168.0.1,8.8.8.8,1.1.1.1

# JSON output
resolvr compare example.com --json | jq '.mismatch'

Output formats

Terminal (default)

Coloured tables rendered with ANSI escape codes. Latency values are colour-coded: green (fast), yellow (>200 ms), red (>500 ms). Use --no-color to disable colour for plain-text output.

JSON (--json)

All commands support --json, which writes a single JSON object or array to stdout and suppresses the progress bar, colour, and headers. The schema matches the internal Go types:

Command JSON root type
health HealthReport
random []RandomResult
random --pihole []PiholeResult
stress Summary
analyze DomainReport
compare CompareReport

Examples:

# Extract timeout rate from a stress run
resolvr stress --random --duration 2m --json \
  | jq '{timeouts: .timeouts, total: .total, rate: (.timeouts/.total)}'

# List all timed-out domains from a random run
resolvr random --real --count 100 --timeout 2s --json \
  | jq '[.[] | select(.result.timed_out) | .result.domain]'

# Check DMARC policy
resolvr analyze example.com --json | jq '.dmarc.policy'

# Find resolvers that returned different answers
resolvr compare example.com --json | jq 'select(.mismatch) | .notes'

Common workflows

Diagnose intermittent DNS failures on a home or office network

# 1. Quick health check — instantly tells you if it's your resolver or the internet
resolvr health

# 2. Reproduce the problem under load — 20 queries, 5 at a time, tight timeout
resolvr random --real --count 20 --timeout 2s --retry 2 --parallel 5 --answers

# 3. Crosscheck failures to confirm the fault is at your local resolver
resolvr random --real --count 20 --timeout 2s --retry 2 --parallel 5 --answers --crosscheck

# 4. Sustained soak test to measure the baseline failure rate over time
resolvr stress --random --duration 10m

Track down "random" timeouts on your own resolver host

When you can SSH to the box that runs your resolver (router, Pi-hole host, bind host, etc.), run resolvr there. The Local host UDP line surfaces socket receive-buffer overflow — packets the kernel silently dropped before the resolver ever saw them — which is the classic invisible cause of intermittent DNS timeouts on busy boxes.

# On the resolver host: compare bind, Pi-hole and an external reference, with fair caches
resolvr random --real --count 50 \
  --resolver 192.168.0.1:53 --resolver 127.0.0.1:8153 --resolver 8.8.8.8:53 \
  --warm-cache --interleave

If the Local host UDP line reports drops, find the offender and the affected socket(s):

ss -uapm | sort                # look at the 'd' (drops) field; high values name the noisy app
nstat -az; sleep 30; nstat | grep -i Udp   # confirm drops are happening live

Common culprits on home/office routers include UPnP/SSDP listeners binding one socket per network interface, mDNS, and other multicast-heavy services. Constraining those, or denying their multicast on irrelevant interfaces (docker bridges, WAN), typically fixes the DNS symptom.

Check email infrastructure before going live

resolvr analyze yourdomain.com

This single command checks SPF, DKIM, DMARC, MX reachability (SMTP ports), TLS, DNSSEC, and reverse DNS in one pass.

Verify DNS propagation after a record change

# Compare your domain across Google, Cloudflare, Quad9, and your local resolver
resolvr compare yourdomain.com \
  --resolvers 8.8.8.8,1.1.1.1,9.9.9.9,system

# Poll until all resolvers agree (simple shell loop)
until resolvr compare yourdomain.com --json | jq -e '.mismatch == false' > /dev/null; do
  echo "Still propagating…"; sleep 30
done
echo "Propagation complete"

Audit Pi-hole for over-blocking

# Query 50 real domains — see which are being blocked and in which mode
resolvr random --pihole --count 50 --reference-resolver 1.1.1.1

Continuous monitoring / scripting

# Run every hour and alert if success rate drops below threshold
resolvr stress --random --duration 1m --json \
  | jq -e '.success_rate >= 0.98' || echo "ALERT: DNS reliability degraded"

License

resolvr is released under the MIT License.

Copyright © 2026 Tero Halla-aho.

You're free to use, modify, and redistribute this software — including in commercial work — provided you keep the copyright notice and the MIT permission notice from LICENSE in your copy. If you build something on top of resolvr, a link back to this repository is appreciated but not required beyond that attribution.