GHOST TAX
DETECTION METHODOLOGY
This page explains what the engine observes, what it infers, what it estimates, and how confidence is formed. We publish our methodology, our limitations, and our boundaries.
EXECUTIVE SUMMARY
60s: free scan, no system access
85/100: maximum confidence score (never overclaimed)
$490: Full Detection Protocol
Ghost Tax detects financial exposure in SaaS, Cloud, and AI spending using public signals and optional declared inputs. No system access required. The free scan delivers a structured preview in 60 seconds. The paid protocol ($490) adds deep enrichment, stakeholder memos, and vendor negotiation playbooks — delivered in 48 hours.
PIPELINE ARCHITECTURE
The analysis runs as 21 deterministic phases in strict sequence. Each phase emits its result as one JSON object per line (newline-delimited JSON, NDJSON) over HTTP, so the UI renders incrementally as evidence accumulates. The executive snapshot streams last. Reordering is not permitted.
01 enrichment
02 context
03 exposure
04 lossVelocity
05 costOfDelay
06 diagnosis
07 causalGraph
08 proofEngine
09 proof
10 marketMemory
11 peerComparison
12 driftMonitor
13 correctionMomentum
14 scenarios
15 counterfactual
16 decisionFriction
17 decisionPressure
18 negotiation
19 confidenceModel
20 decisionPack
21 executiveSnapshot
Source: lib/analysis.ts (~2100 lines) — server-only, no client-side AI. Streaming endpoint: app/api/intel/route.ts (maxDuration=60).
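The fixed ordering and the NDJSON framing can be sketched as follows. The phase names match the published pipeline, but the `frame` helper and its payload shape are illustrative assumptions, not the actual lib/analysis.ts implementation.

```typescript
// Illustrative sketch only: phase names come from the published pipeline;
// the framing helper is an assumption, not the real server code.
const PHASES = [
  "enrichment", "context", "exposure", "lossVelocity", "costOfDelay",
  "diagnosis", "causalGraph", "proofEngine", "proof", "marketMemory",
  "peerComparison", "driftMonitor", "correctionMomentum", "scenarios",
  "counterfactual", "decisionFriction", "decisionPressure", "negotiation",
  "confidenceModel", "decisionPack", "executiveSnapshot",
] as const;

type Phase = (typeof PHASES)[number];

// NDJSON framing: one complete JSON object per line, so the client can
// parse and render each phase result as soon as its line arrives.
function frame(phase: Phase, payload: Record<string, unknown>): string {
  return JSON.stringify({ phase, ...payload }) + "\n";
}
```

Because each line is a self-contained JSON object, the client never has to buffer the full response before rendering the first phase.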
01 — WHAT THE SYSTEM OBSERVES
The system collects observable data from two channels: public web enrichment (via Exa neural search) and user-declared inputs.
PUBLIC ENRICHMENT
USER-DECLARED INPUTS
Observed signals carry the highest evidence weight. When the system can directly verify a technology mention from multiple public sources, that signal is classified as "observed."
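The tier rule above can be sketched as a small classifier. Only the multi-source "observed" branch comes from the methodology text; the single-source and zero-source branches here are illustrative assumptions.

```typescript
// Sketch of the evidence-tier rule. Only "two or more independent public
// sources => observed" is stated in the methodology; the other branches
// are assumptions for illustration.
type EvidenceTier = "observed" | "inferred" | "estimated";

function tierForTechMention(independentPublicSources: number): EvidenceTier {
  if (independentPublicSources >= 2) return "observed";  // directly verified
  if (independentPublicSources === 1) return "inferred"; // assumed fallback
  return "estimated";                                    // baseline-only
}
```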
02 — WHAT THE SYSTEM INFERS
When the public enrichment reveals a company's technology stack, the system applies heuristic rules to detect structural patterns that commonly correlate with financial exposure.
AI Tool Redundancy
Multiple overlapping AI tools (e.g., OpenAI + Anthropic + GitHub Copilot) suggest capability duplication.
Observability Overlap
Multiple monitoring/analytics platforms (e.g., Datadog + Amplitude) indicate feature overlap.
Plan Oversize
Enterprise-tier tools detected for organizations below 50 employees.
Multi-Cloud Waste
Multiple cloud providers suggest underutilized commitments.
Shadow IT Risk
Rapid hiring signals correlate with ungoverned tool adoption.
License Sprawl
Large tool footprints carry statistically predictable inactive license rates.
Elevated Per-Employee Spend
Declared spend exceeding industry median per-employee benchmarks.
Inferred signals are always labeled as such and are never presented as observed facts. Each carries a bounded impact range, not a point estimate.
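One of these heuristics (Plan Oversize: enterprise-tier tooling below 50 employees) can be sketched as below. The function name, signal shape, and the 8–15% impact range are illustrative assumptions, not published values; only the rule's trigger condition comes from the list above.

```typescript
// Hypothetical sketch of one heuristic rule (Plan Oversize).
// The impact range is illustrative, not a published figure.
interface InferredSignal {
  rule: string;
  tier: "inferred";              // inferred signals are always labeled as such
  impactRange: [number, number]; // bounded range, never a point estimate
}

function checkPlanOversize(
  employees: number,
  enterpriseTierTools: string[],
): InferredSignal | null {
  // Trigger: enterprise-tier tools detected at an org below 50 employees.
  if (employees < 50 && enterpriseTierTools.length > 0) {
    return {
      rule: "planOversize",
      tier: "inferred",
      impactRange: [0.08, 0.15], // assumed share of tool spend at risk
    };
  }
  return null;
}
```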
03 — WHAT THE SYSTEM ESTIMATES
When signal-level data is insufficient, the system falls back to industry-calibrated baselines to produce bounded exposure estimates.
BASELINE MODEL
12–22% of annual IT spend is the typical "Ghost Tax" range for organizations with 50–500 employees. Source: Flexera 2024, Zylo 2024, Gartner 2025 composite.
WHEN BASELINES ARE USED
When no monthly spend is declared, the system estimates it at ~380 EUR/employee/month. This is clearly marked as "estimated" and carries the lowest confidence tier.
Estimated outputs always carry the lowest confidence scores and are explicitly separated from observed and inferred signals in the proof architecture.
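The baseline fallback can be worked through with the published figures: ~380 EUR/employee/month when no spend is declared, and a 12–22% Ghost Tax share of annual IT spend. The function name and return shape are assumptions for illustration.

```typescript
// Worked sketch of the baseline fallback. The 380 EUR/employee/month
// default and the 12-22% range come from the methodology; everything
// else (names, shape) is assumed.
function estimateGhostTaxRange(employees: number, declaredMonthlyEur?: number) {
  const monthly = declaredMonthlyEur ?? employees * 380; // estimated fallback
  const annual = monthly * 12;
  return {
    basis: declaredMonthlyEur === undefined ? "estimated" : "declared",
    annualItSpendEur: annual,
    ghostTaxLowEur: annual * 0.12,
    ghostTaxHighEur: annual * 0.22,
  };
}
```

For a 100-person organization with no declared spend, this yields an estimated 456,000 EUR annual IT spend and a Ghost Tax range of roughly 54,720 to 100,320 EUR, flagged as "estimated".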
04 — CONFIDENCE MODEL
Every output carries a numeric confidence score from 0 to 100. This score is derived from four weighted inputs:
Exa enrichment depth (up to 25 points): more public signals mean higher confidence in the technology footprint.
Vector memory matches (up to 20 points): similar historical cases in the knowledge base improve accuracy.
Detected signal count (up to 30 points): more independent signals mean stronger convergence.
Declared spend data (15 points): user-provided spend data materially improves exposure accuracy.
The system never claims 100/100 confidence; the maximum is capped at 85. Below 30, results include an explicit limitation warning. Confidence tiers: strong (≥60), moderate (≥35), directional (<35).
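The scoring can be sketched as below. The point ceilings (25/20/30/15), the 85 cap, and the tier cutoffs come from the methodology; how each raw input is measured before capping is assumed.

```typescript
// Sketch of the four-input confidence score with the published caps.
// Only the ceilings, the 85 cap, and the tier cutoffs are from the
// methodology text; the sub-scoring of each input is assumed.
function confidenceScore(inputs: {
  enrichmentDepth: number; // capped at 25
  memoryMatches: number;   // capped at 20
  signalCount: number;     // capped at 30
  spendDeclared: boolean;  // flat 15 when present
}): { score: number; tier: "strong" | "moderate" | "directional" } {
  const raw =
    Math.min(inputs.enrichmentDepth, 25) +
    Math.min(inputs.memoryMatches, 20) +
    Math.min(inputs.signalCount, 30) +
    (inputs.spendDeclared ? 15 : 0);
  const score = Math.min(raw, 85); // never claims 100/100
  const tier = score >= 60 ? "strong" : score >= 35 ? "moderate" : "directional";
  return { score, tier };
}
```

Note that a perfect raw total of 90 still reports 85, which is how the "never overclaimed" cap is enforced.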
05 — BOUNDARIES AND CAVEATS
Does not access internal billing systems, ERP, or vendor APIs.
Does not read contracts, invoices, or utilization logs.
Does not perform real-time monitoring or continuous scanning.
Does not provide department-level or per-user attribution.
Does not use neural networks or ML models for detection — all scoring is deterministic and heuristic (Exa's neural search is used only for public enrichment).
Cannot detect exposure patterns that leave no public signal.
Actual exposure may differ from estimates — ranges reflect structural uncertainty.
These limitations are displayed in the analysis output itself, not hidden in fine print.
06 — WHY THE OUTPUT IS DECISION-USEFUL
The public/self-serve analysis works from publicly available signals and optional declared inputs. This is sufficient to:
Identify the likely shape and magnitude of financial exposure.
Classify signals by evidence tier so the buyer knows what is proven vs projected.
Produce bounded ranges that are directionally reliable for budget conversations.
Generate stakeholder memos that frame the case for internal circulation.
Create competitive pressure via peer benchmarking (when data is sufficient).
Quantify the cost of inaction through loss velocity.
07 — WHAT DEEPENS IN THE PAID PROTOCOL
The paid Detection Protocol ($490 / €490) adds a structured data intake phase where the organization provides billing exports, license inventories, and vendor contracts. This enables:
Vendor-level corrective actions
Specific renegotiation, downgrade, and consolidation recommendations per vendor.
Utilization-based license audit
Inactive and underutilized seats identified with exact counts.
Contract timeline analysis
Renewal dates, auto-renewal clauses, and negotiation windows mapped.
Implementation support
Sequenced action plan with owner assignment and timeline.
The paid protocol does not replace the public analysis — it deepens it. Confidence scores increase materially when internal data is available.