GHOST TAX
DETECTION METHODOLOGY
This page explains what the engine observes, what it infers, what it estimates, and how confidence is formed. We publish our methodology, our limitations, and our boundaries.
EXECUTIVE SUMMARY
60s: free scan, no system access
85/100: maximum confidence score (never overclaimed)
$490: Full Detection Protocol
Ghost Tax detects financial exposure in SaaS, Cloud, and AI spending using public signals and optional declared inputs. No system access required. The free scan delivers a structured preview in 60 seconds. The paid protocol ($490) adds deep enrichment, stakeholder memos, and vendor negotiation playbooks — delivered in 48 hours.
PIPELINE ARCHITECTURE
The analysis runs as 21 deterministic phases in strict sequence. Each phase emits its result as one JSON object per line (newline-delimited JSON, NDJSON) over HTTP, so the UI renders incrementally as evidence accumulates. The executive snapshot streams last. Reordering is not permitted.
01 enrichment
02 context
03 exposure
04 lossVelocity
05 costOfDelay
06 diagnosis
07 causalGraph
08 proofEngine
09 proof
10 marketMemory
11 peerComparison
12 driftMonitor
13 correctionMomentum
14 scenarios
15 counterfactual
16 decisionFriction
17 decisionPressure
18 negotiation
19 confidenceModel
20 decisionPack
21 executiveSnapshot
Source: lib/analysis.ts (~2100 lines) — server-only, no client-side AI. Streaming endpoint: app/api/intel/route.ts (maxDuration=60).
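The fixed ordering and the NDJSON framing can be sketched as follows. The phase names match the published pipeline, but the `frame` helper and its payload shape are illustrative assumptions, not the actual lib/analysis.ts implementation.

```typescript
// Illustrative sketch only: phase names come from the published pipeline;
// the framing helper is an assumption, not the real server code.
const PHASES = [
  "enrichment", "context", "exposure", "lossVelocity", "costOfDelay",
  "diagnosis", "causalGraph", "proofEngine", "proof", "marketMemory",
  "peerComparison", "driftMonitor", "correctionMomentum", "scenarios",
  "counterfactual", "decisionFriction", "decisionPressure", "negotiation",
  "confidenceModel", "decisionPack", "executiveSnapshot",
] as const;

type Phase = (typeof PHASES)[number];

// NDJSON framing: one complete JSON object per line, so the client can
// parse and render each phase result as soon as its line arrives.
function frame(phase: Phase, payload: Record<string, unknown>): string {
  return JSON.stringify({ phase, ...payload }) + "\n";
}
```

Because each line is a self-contained JSON object, the client never has to buffer the full response before rendering the first phase.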
01 — WHAT THE SYSTEM OBSERVES
The system collects observable data from two channels: public web enrichment (via Exa neural search) and user-declared inputs.
PUBLIC ENRICHMENT
USER-DECLARED INPUTS
Observed signals carry the highest evidence weight. When the system can directly verify a technology mention from multiple public sources, that signal is classified as "observed."
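The tier rule above can be sketched as a small classifier. Only the multi-source "observed" branch comes from the methodology text; the single-source and zero-source branches here are illustrative assumptions.

```typescript
// Sketch of the evidence-tier rule. Only "two or more independent public
// sources => observed" is stated in the methodology; the other branches
// are assumptions for illustration.
type EvidenceTier = "observed" | "inferred" | "estimated";

function tierForTechMention(independentPublicSources: number): EvidenceTier {
  if (independentPublicSources >= 2) return "observed";  // directly verified
  if (independentPublicSources === 1) return "inferred"; // assumed fallback
  return "estimated";                                    // baseline-only
}
```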
02 — WHAT THE SYSTEM INFERS
When the public enrichment reveals a company's technology stack, the system applies heuristic rules to detect structural patterns that commonly correlate with financial exposure.
AI Tool Redundancy
Multiple overlapping AI tools (e.g., OpenAI + Anthropic + GitHub Copilot) suggest capability duplication.
Observability Overlap
Multiple monitoring/analytics platforms (e.g., Datadog + Amplitude) indicate feature overlap.
Plan Oversize
Enterprise-tier tools detected for organizations below 50 employees.
Multi-Cloud Waste
Multiple cloud providers suggest underutilized commitments.
Shadow IT Risk
Rapid hiring signals correlate with ungoverned tool adoption.
License Sprawl
Large tool footprints carry statistically predictable inactive license rates.
Elevated Per-Employee Spend
Declared spend exceeding industry median per-employee benchmarks.
Inferred signals are always labeled as such and are never presented as observed facts. Each carries a bounded impact range, not a point estimate.
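One of these heuristics (Plan Oversize: enterprise-tier tooling below 50 employees) can be sketched as below. The function name, signal shape, and the 8–15% impact range are illustrative assumptions, not published values; only the rule's trigger condition comes from the list above.

```typescript
// Hypothetical sketch of one heuristic rule (Plan Oversize).
// The impact range is illustrative, not a published figure.
interface InferredSignal {
  rule: string;
  tier: "inferred";              // inferred signals are always labeled as such
  impactRange: [number, number]; // bounded range, never a point estimate
}

function checkPlanOversize(
  employees: number,
  enterpriseTierTools: string[],
): InferredSignal | null {
  // Trigger: enterprise-tier tools detected at an org below 50 employees.
  if (employees < 50 && enterpriseTierTools.length > 0) {
    return {
      rule: "planOversize",
      tier: "inferred",
      impactRange: [0.08, 0.15], // assumed share of tool spend at risk
    };
  }
  return null;
}
```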
03 — WHAT THE SYSTEM ESTIMATES
When signal-level data is insufficient, the system falls back to industry-calibrated baselines to produce bounded exposure estimates.
BASELINE MODEL
12–22% of annual IT spend is the typical "Ghost Tax" range for organizations with 50–500 employees. Source: Flexera 2024, Zylo 2024, Gartner 2025 composite.
WHEN BASELINES ARE USED
When no monthly spend is declared, the system estimates it at ~380 EUR/employee/month. This is clearly marked as "estimated" and carries the lowest confidence tier.
Estimated outputs always carry the lowest confidence scores and are explicitly separated from observed and inferred signals in the proof architecture.
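The baseline fallback can be worked through with the published figures: ~380 EUR/employee/month when no spend is declared, and a 12–22% Ghost Tax share of annual IT spend. The function name and return shape are assumptions for illustration.

```typescript
// Worked sketch of the baseline fallback. The 380 EUR/employee/month
// default and the 12-22% range come from the methodology; everything
// else (names, shape) is assumed.
function estimateGhostTaxRange(employees: number, declaredMonthlyEur?: number) {
  const monthly = declaredMonthlyEur ?? employees * 380; // estimated fallback
  const annual = monthly * 12;
  return {
    basis: declaredMonthlyEur === undefined ? "estimated" : "declared",
    annualItSpendEur: annual,
    ghostTaxLowEur: annual * 0.12,
    ghostTaxHighEur: annual * 0.22,
  };
}
```

For a 100-person organization with no declared spend, this yields an estimated 456,000 EUR annual IT spend and a Ghost Tax range of roughly 54,720 to 100,320 EUR, flagged as "estimated".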
04 — CONFIDENCE MODEL
Every output carries a numeric confidence score from 0 to 100. This score is derived from four weighted inputs:
Exa enrichment depth (up to 25 points): more public signals mean higher confidence in the technology footprint.
Vector memory matches (up to 20 points): similar historical cases in the knowledge base improve accuracy.
Detected signal count (up to 30 points): more independent signals mean stronger convergence.
Declared spend data (15 points): user-provided spend data materially improves exposure accuracy.
The system never claims 100/100 confidence; the maximum is capped at 85. Below 30, results include an explicit limitation warning. Confidence tiers: strong (≥60), moderate (≥35), directional (<35).
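The scoring can be sketched as below. The point ceilings (25/20/30/15), the 85 cap, and the tier cutoffs come from the methodology; how each raw input is measured before capping is assumed.

```typescript
// Sketch of the four-input confidence score with the published caps.
// Only the ceilings, the 85 cap, and the tier cutoffs are from the
// methodology text; the sub-scoring of each input is assumed.
function confidenceScore(inputs: {
  enrichmentDepth: number; // capped at 25
  memoryMatches: number;   // capped at 20
  signalCount: number;     // capped at 30
  spendDeclared: boolean;  // flat 15 when present
}): { score: number; tier: "strong" | "moderate" | "directional" } {
  const raw =
    Math.min(inputs.enrichmentDepth, 25) +
    Math.min(inputs.memoryMatches, 20) +
    Math.min(inputs.signalCount, 30) +
    (inputs.spendDeclared ? 15 : 0);
  const score = Math.min(raw, 85); // never claims 100/100
  const tier = score >= 60 ? "strong" : score >= 35 ? "moderate" : "directional";
  return { score, tier };
}
```

Note that a perfect raw total of 90 still reports 85, which is how the "never overclaimed" cap is enforced.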
05 — BOUNDARIES AND CAVEATS
Does not access internal billing systems, ERP, or vendor APIs.
Does not read contracts, invoices, or utilization logs.
Does not perform real-time monitoring or continuous scanning.
Does not provide department-level or per-user attribution.
Does not use neural networks or ML models for detection — all scoring is deterministic and heuristic (Exa's neural search is used only for public enrichment).
Cannot detect exposure patterns that leave no public signal.
Actual exposure may differ from estimates — ranges reflect structural uncertainty.
These limitations are displayed in the analysis output itself, not hidden in fine print.
06 — WHY THE OUTPUT IS DECISION-USEFUL
The public/self-serve analysis works from publicly available signals and optional declared inputs. This is sufficient to:
Identify the likely shape and magnitude of financial exposure.
Classify signals by evidence tier so the buyer knows what is proven vs projected.
Produce bounded ranges that are directionally reliable for budget conversations.
Generate stakeholder memos that frame the case for internal circulation.
Create competitive pressure via peer benchmarking (when data is sufficient).
Quantify the cost of inaction through loss velocity.
07 — WHAT DEEPENS IN THE PAID PROTOCOL
The paid Detection Protocol ($490 / €490) adds a structured data intake phase where the organization provides billing exports, license inventories, and vendor contracts. This enables:
Vendor-level corrective actions
Specific renegotiation, downgrade, and consolidation recommendations per vendor.
Utilization-based license audit
Inactive and underutilized seats identified with exact counts.
Contract timeline analysis
Renewal dates, auto-renewal clauses, and negotiation windows mapped.
Implementation support
Sequenced action plan with owner assignment and timeline.
The paid protocol does not replace the public analysis — it deepens it. Confidence scores increase materially when internal data is available.