Accuracy benchmark

How accurate are the deep rules?

Each of GoForLaunch's structural data-flow rules is measured against paired fixtures: a vulnerable sample that must be flagged (recall) and a correctly scoped safe sample that must not be (specificity). The numbers below are recomputed from the live scanner engine on every page load, so they cannot drift from the rules they describe.

Recall: 100%. 34/34 vulnerable fixtures correctly flagged.

Specificity: 100%. 37/37 safe fixtures correctly cleared (no false positives).

Paired fixtures: 34 vulnerable + 37 safe across 21 rules.
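Both headline ratios reduce to counting fixture outcomes. The sketch below is illustrative only: it assumes a hypothetical `FixtureResult` shape (not the actual format of `lib/scanner/benchmark-cases.ts`) and computes recall as flagged vulnerable fixtures over all vulnerable fixtures, and specificity as cleared safe fixtures over all safe fixtures.

```typescript
// Hypothetical fixture outcome; the real benchmark cases may be shaped differently.
interface FixtureResult {
  rule: string;
  expected: "vulnerable" | "safe"; // what the fixture is meant to be
  flagged: boolean;                // whether the engine reported a finding
}

interface Metrics {
  truePositives: number;   // vulnerable fixtures that were flagged
  vulnerableTotal: number;
  trueNegatives: number;   // safe fixtures that were left alone
  safeTotal: number;
  recall: number;          // truePositives / vulnerableTotal (NaN if no vulnerable fixtures)
  specificity: number;     // trueNegatives / safeTotal (NaN if no safe fixtures)
}

function computeMetrics(results: FixtureResult[]): Metrics {
  let tp = 0, vuln = 0, tn = 0, safe = 0;
  for (const r of results) {
    if (r.expected === "vulnerable") {
      vuln++;
      if (r.flagged) tp++;
    } else {
      safe++;
      if (!r.flagged) tn++;
    }
  }
  return {
    truePositives: tp,
    vulnerableTotal: vuln,
    trueNegatives: tn,
    safeTotal: safe,
    recall: vuln === 0 ? NaN : tp / vuln,
    specificity: safe === 0 ? NaN : tn / safe,
  };
}

// Usage: one vulnerable fixture caught, one safe fixture cleared, one safe fixture wrongly flagged.
const demo: FixtureResult[] = [
  { rule: "IDOR", expected: "vulnerable", flagged: true },
  { rule: "IDOR", expected: "safe", flagged: false },
  { rule: "IDOR", expected: "safe", flagged: true }, // a false positive
];
const m = computeMetrics(demo);
console.log(`recall ${m.truePositives}/${m.vulnerableTotal}, specificity ${m.trueNegatives}/${m.safeTotal}`);
// prints "recall 1/1, specificity 1/2"
```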

By rule

IDOR: recall 4/4 · specificity 7/7

Reads the actual where-keys of find/update/delete in [id] routes; flags id-only queries with no owner column and no post-fetch ownership check.
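A minimal sketch of the decision this rule describes, assuming the route's queries have already been extracted into a hypothetical `QuerySite` record. The owner-column names in `OWNER_COLUMNS` are illustrative guesses, and honoring the post-fetch check only for reads is an assumption of this sketch (a check after an update or delete runs too late), not a documented detail of the rule.

```typescript
// Hypothetical, already-extracted view of one Prisma-style query in an [id] route.
type Method = "findUnique" | "findFirst" | "update" | "delete";

interface QuerySite {
  method: Method;
  whereKeys: string[];               // keys present in the `where` object
  postFetchOwnershipCheck: boolean;  // e.g. comparing row.userId to the session user after the fetch
}

// Assumed owner columns; a real rule would likely make this configurable.
const OWNER_COLUMNS = new Set(["userId", "ownerId", "authorId"]);

function flagsIdor(q: QuerySite): boolean {
  const keyedById = q.whereKeys.includes("id");
  const hasOwnerColumn = q.whereKeys.some((k) => OWNER_COLUMNS.has(k));
  // Assumption: only reads can be cleared by a check that runs after the query.
  const isRead = q.method === "findUnique" || q.method === "findFirst";
  const clearedByCheck = isRead && q.postFetchOwnershipCheck;
  return keyedById && !hasOwnerColumn && !clearedByCheck;
}

const idOnlyDelete: QuerySite = { method: "delete", whereKeys: ["id"], postFetchOwnershipCheck: false };
const scopedDelete: QuerySite = { method: "delete", whereKeys: ["id", "userId"], postFetchOwnershipCheck: false };
const checkedRead: QuerySite = { method: "findUnique", whereKeys: ["id"], postFetchOwnershipCheck: true };

console.log(flagsIdor(idOnlyDelete), flagsIdor(scopedDelete), flagsIdor(checkedRead)); // true false false
```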

Mass mutation: recall 2/2 · specificity 2/2

Flags Prisma deleteMany/updateMany calls with no (or an empty) where clause: a single call that deletes or rewrites every row.
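Assuming a Prisma call site has been reduced to a hypothetical `PrismaCall` record, the check amounts to: a bulk method whose `where` is missing or has no keys. This is an illustrative sketch, not the scanner's implementation.

```typescript
// Hypothetical simplified representation of a Prisma client call site.
interface PrismaCall {
  model: string;
  method: string; // e.g. "deleteMany", "updateMany", "findMany"
  args?: { where?: Record<string, unknown>; data?: Record<string, unknown> };
}

const BULK_METHODS = new Set(["deleteMany", "updateMany"]);

function isMassMutation(call: PrismaCall): boolean {
  if (!BULK_METHODS.has(call.method)) return false;
  const where = call.args?.where;
  // No where clause at all, or `where: {}`, matches every row in the table.
  return where === undefined || Object.keys(where).length === 0;
}

const calls: PrismaCall[] = [
  { model: "session", method: "deleteMany" },                                              // prisma.session.deleteMany()
  { model: "post", method: "updateMany", args: { where: {}, data: { published: false } } }, // empty where
  { model: "session", method: "deleteMany", args: { where: { userId: "u1" } } },            // scoped
  { model: "post", method: "findMany" },                                                    // not a mutation
];
console.log(calls.map((c) => isMassMutation(c))); // [ true, true, false, false ]
```

The scoped form keeps an explicit filter, for example `prisma.session.deleteMany({ where: { userId } })`.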

Concurrency: recall 1/1 · specificity 1/1

Flags read-modify-write (findUnique → update/upsert) check-then-act races; ignores idempotent findUnique → delete.
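A sketch of the ordering check over a hypothetical flattened list of database calls made by one handler: any `findUnique` followed by an `update` or `upsert` on the same model is reported, while `findUnique` followed by `delete` is skipped, as the rule describes. The `DbCall` shape and the model names are assumptions for illustration.

```typescript
// Hypothetical flattened view of the database calls in one request handler, in order.
interface DbCall {
  model: string;
  method: string;
}

// Writes that, after a read of the same model, form a check-then-act race.
const RACY_WRITES = new Set(["update", "upsert"]);

function findReadModifyWrite(calls: DbCall[]): Array<[DbCall, DbCall]> {
  const races: Array<[DbCall, DbCall]> = [];
  calls.forEach((read, i) => {
    if (read.method !== "findUnique") return;
    for (const write of calls.slice(i + 1)) {
      // findUnique -> delete is not in RACY_WRITES: repeating a delete leaves the same end state.
      if (write.model === read.model && RACY_WRITES.has(write.method)) {
        races.push([read, write]);
      }
    }
  });
  return races;
}

const redeemCoupon: DbCall[] = [
  { model: "coupon", method: "findUnique" }, // read remaining uses
  { model: "coupon", method: "update" },     // write back a decremented value
];
const revokeSession: DbCall[] = [
  { model: "session", method: "findUnique" },
  { model: "session", method: "delete" },
];
console.log(findReadModifyWrite(redeemCoupon).length, findReadModifyWrite(revokeSession).length); // 1 0
```

The usual remedy is to make the check and the write one atomic statement. In Prisma, one option for a hypothetical `remaining` column is a conditional `updateMany({ where: { id, remaining: { gt: 0 } }, data: { remaining: { decrement: 1 } } })`, then checking the returned `count`.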

Supabase RLS: recall 2/2 · specificity 4/4

Flags PostgREST .delete()/.update() calls with no filter in the chain; the rule is context-aware, so matches inside strings and comments are ignored.
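To illustrate the context-awareness, this sketch first blanks out string literals and comments, then treats each `;`-terminated statement as one query chain and reports a `.delete(` or `.update(` after `.from(` when no filter method follows it. `FILTERS` is a subset of supabase-js filter method names. Splitting on `;` and ignoring template interpolation and regex literals are simplifications of this sketch, not properties of the real rule.

```typescript
// Blank out string and comment contents (same length, newlines kept) so that
// code-looking text inside them cannot match. Simplification: template
// interpolation `${...}` and regex literals are not understood.
function maskStringsAndComments(src: string): string {
  let out = "";
  let i = 0;
  while (i < src.length) {
    const c = src.charAt(i);
    const next = src.charAt(i + 1);
    if (c === "/" && next === "/") {
      // Line comment: mask up to, but not including, the newline.
      while (i < src.length && src.charAt(i) !== "\n") {
        out += " ";
        i++;
      }
    } else if (c === "/" && next === "*") {
      // Block comment: mask through the closing */ (or to end of input).
      const end = src.indexOf("*/", i + 2);
      const stop = end === -1 ? src.length : end + 2;
      out += src.slice(i, stop).replace(/[^\n]/g, " ");
      i = stop;
    } else if (c === '"' || c === "'" || c === "`") {
      out += c; // keep the opening quote
      i++;
      while (i < src.length && src.charAt(i) !== c) {
        if (src.charAt(i) === "\\") {
          out += " "; // mask the backslash, then the escaped character below
          i++;
        }
        if (i < src.length) {
          out += src.charAt(i) === "\n" ? "\n" : " ";
          i++;
        }
      }
      if (i < src.length) {
        out += c; // keep the closing quote
        i++;
      }
    } else {
      out += c;
      i++;
    }
  }
  return out;
}

// A subset of supabase-js filter methods; any of these after the mutation counts as a filter.
const FILTERS = ["eq", "neq", "gt", "gte", "lt", "lte", "like", "ilike", "is", "in", "contains", "match", "filter", "not", "or"];

function findUnfilteredMutations(src: string): string[] {
  const code = maskStringsAndComments(src);
  const findings: string[] = [];
  for (const stmt of code.split(";")) {
    const mutation = /\.(delete|update)\s*\(/.exec(stmt);
    if (!mutation || !/\.from\s*\(/.test(stmt)) continue;
    const rest = stmt.slice(mutation.index); // filters must come after the mutation
    const filtered = FILTERS.some((f) => new RegExp(`\\.${f}\\s*\\(`).test(rest));
    if (!filtered) findings.push(mutation[1]);
  }
  return findings;
}

const sample = [
  'await supabase.from("posts").delete(); // wipes every row',
  'await supabase.from("posts").update({ hidden: true }).eq("id", postId);',
  'const tip = "never call supabase.from(\'x\').delete() unfiltered";',
  '// await supabase.from("logs").delete();',
].join("\n");

const unfiltered = findUnfilteredMutations(sample); // returns ["delete"] (only the first line)
console.log(unfiltered);
```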

Input Validation: recall 1/1 · specificity 2/2

Alias tracking: recall 2/2 · specificity 2/2

Missing-await auth: recall 2/2 · specificity 1/1

JWT decode: recall 2/2 · specificity 1/1

Identity fallback: recall 2/2 · specificity 1/1

Taint (SQL): recall 1/1 · specificity 1/1

Taint (SSRF): recall 1/1 · specificity 0/0

Taint (path): recall 1/1 · specificity 0/0

Taint (eval): recall 1/1 · specificity 0/0

Taint (interproc): recall 1/1 · specificity 1/1

Taint (clean): recall 0/0 · specificity 1/1

Billing: recall 1/1 · specificity 2/2

Launch Config: recall 1/1 · specificity 2/2

Scalability: recall 1/1 · specificity 2/2

Launch Assets: recall 4/4 · specificity 4/4

Build Config: recall 2/2 · specificity 1/1

Email: recall 2/2 · specificity 2/2
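The headline figures are pooled over fixtures rather than averaged per rule. This sketch re-derives them from the per-rule counts in the list above; rules with `0/0` (no vulnerable fixtures, or no safe ones) add nothing to the totals and have no defined ratio, so they print as `n/a`. The `Tally` type and helper names are hypothetical.

```typescript
// Per-rule counts copied from the "By rule" list.
type Tally = { tp: number; vuln: number; tn: number; safe: number };
const tally = (tp: number, vuln: number, tn: number, safe: number): Tally => ({ tp, vuln, tn, safe });

const byRule: Record<string, Tally> = {
  "IDOR": tally(4, 4, 7, 7),
  "Mass mutation": tally(2, 2, 2, 2),
  "Concurrency": tally(1, 1, 1, 1),
  "Supabase RLS": tally(2, 2, 4, 4),
  "Input Validation": tally(1, 1, 2, 2),
  "Alias tracking": tally(2, 2, 2, 2),
  "Missing-await auth": tally(2, 2, 1, 1),
  "JWT decode": tally(2, 2, 1, 1),
  "Identity fallback": tally(2, 2, 1, 1),
  "Taint (SQL)": tally(1, 1, 1, 1),
  "Taint (SSRF)": tally(1, 1, 0, 0),
  "Taint (path)": tally(1, 1, 0, 0),
  "Taint (eval)": tally(1, 1, 0, 0),
  "Taint (interproc)": tally(1, 1, 1, 1),
  "Taint (clean)": tally(0, 0, 1, 1),
  "Billing": tally(1, 1, 2, 2),
  "Launch Config": tally(1, 1, 2, 2),
  "Scalability": tally(1, 1, 2, 2),
  "Launch Assets": tally(4, 4, 4, 4),
  "Build Config": tally(2, 2, 1, 1),
  "Email": tally(2, 2, 2, 2),
};

// A ratio with a zero denominator is undefined, not 100% or 0%.
function ratio(num: number, den: number): string {
  return den === 0 ? "n/a" : `${num}/${den} (${Math.round((100 * num) / den)}%)`;
}

// Pool counts across rules (micro-average) instead of averaging per-rule percentages.
const totals = Object.values(byRule).reduce<Tally>(
  (acc, r) => ({ tp: acc.tp + r.tp, vuln: acc.vuln + r.vuln, tn: acc.tn + r.tn, safe: acc.safe + r.safe }),
  { tp: 0, vuln: 0, tn: 0, safe: 0 },
);
const ruleCount = Object.keys(byRule).length;

console.log(`recall ${ratio(totals.tp, totals.vuln)}, specificity ${ratio(totals.tn, totals.safe)} across ${ruleCount} rules`);
// recall 34/34 (100%), specificity 37/37 (100%) across 21 rules

// Rules where one of the two ratios is undefined.
const partial = Object.entries(byRule)
  .filter(([, r]) => r.vuln === 0 || r.safe === 0)
  .map(([name]) => name);
console.log(partial); // Taint (SSRF), Taint (path), Taint (eval), Taint (clean)
```

Because every rule is at 100% here, pooling and averaging per-rule percentages agree; they would diverge if a rule with few fixtures began to fail.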

Methodology

Every fixture is a real code snippet run through the same deterministic engine the product ships (71 fixtures total). No LLM passes are involved in these numbers.
Recall counts vulnerable fixtures that are flagged. Specificity counts safe fixtures that are correctly left alone — the false-positive control.
Context-awareness is part of the safe set: vulnerable patterns appearing only inside strings or comments must not be flagged.
The full fixture list lives in lib/scanner/benchmark-cases.ts and is asserted in CI by tests/scanner-benchmark.test.ts.
The engine has three layers: fast regex rules, AST-lite structural rules (with one-hop alias tracking), and a real recursive-descent parser feeding a cross-module interprocedural taint engine. The taint engine tracks request input into dangerous sinks, including through helpers defined in other files; it clears taint at sanitizers and drops import edges that don't resolve to a unique target.
What this is not: a real-world accuracy guarantee. These numbers show that the rules behave as designed on a curated set; on an arbitrary repo, false-positive and false-negative rates will be higher. Taint tracking crosses modules within the scanned files but does not follow into third-party packages, and the regex rules and optional LLM passes are not measured by this benchmark. GoForLaunch is a high-precision scanner for the Next.js/Supabase/Stripe stack: strong, but not a full replacement for mature SAST tools such as CodeQL.
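The interprocedural taint layer can be illustrated with a toy model. This is not GoForLaunch's engine: it uses a hypothetical three-kind expression IR, follows a call only when exactly one scanned function has that name, and treats dropped edges as clean (an assumption that trades recall for fewer false positives). `runQuery`, `escapeSql`, `format` and the file paths are made-up names.

```typescript
// Toy IR (hypothetical): request input, variables, and calls.
type Expr =
  | { kind: "source" }
  | { kind: "var"; name: string }
  | { kind: "call"; callee: string; args: Expr[] };

type Stmt =
  | { kind: "assign"; target: string; value: Expr }
  | { kind: "return"; value: Expr }
  | { kind: "sink"; sink: string; arg: Expr }; // a dangerous use, e.g. raw SQL

interface Fn { file: string; name: string; params: string[]; body: Stmt[] }
interface Finding { file: string; fn: string; sink: string }

function runTaint(entry: Fn, scanned: Fn[], sanitizers: Set<string>): Finding[] {
  const findings: Finding[] = [];

  // Follow an edge only if exactly one scanned function has that name.
  function resolve(callee: string): Fn | undefined {
    const hits = scanned.filter((f) => f.name === callee);
    return hits.length === 1 ? hits[0] : undefined;
  }

  // Walks fn with the given argument taint; returns whether its return value is tainted.
  function analyze(fn: Fn, argTaint: boolean[], stack: string[]): boolean {
    const id = `${fn.file}#${fn.name}`;
    if (stack.includes(id)) return false; // break recursion cycles
    const env = new Map<string, boolean>();
    fn.params.forEach((p, i) => env.set(p, argTaint[i] ?? false));

    function evalExpr(e: Expr): boolean {
      if (e.kind === "source") return true;
      if (e.kind === "var") return env.get(e.name) ?? false;
      const args = e.args.map((a) => evalExpr(a));
      if (sanitizers.has(e.callee)) return false; // sanitizer clears taint
      const target = resolve(e.callee);
      // Unresolved or ambiguous edges are dropped and treated as clean.
      return target ? analyze(target, args, [...stack, id]) : false;
    }

    let returnsTaint = false;
    for (const s of fn.body) {
      if (s.kind === "assign") env.set(s.target, evalExpr(s.value));
      else if (s.kind === "return") returnsTaint = evalExpr(s.value) || returnsTaint;
      else if (evalExpr(s.arg)) findings.push({ file: fn.file, fn: fn.name, sink: s.sink });
    }
    return returnsTaint;
  }

  analyze(entry, [], []);
  return findings;
}

// Shorthand constructors for the toy IR.
const v = (name: string): Expr => ({ kind: "var", name });
const call = (callee: string, ...args: Expr[]): Expr => ({ kind: "call", callee, args });
const sinkOn = (param: string): Stmt => ({ kind: "sink", sink: "raw SQL", arg: v(param) });

// Hypothetical helpers in other files; `format` exists twice, so it cannot resolve uniquely.
const runQuery: Fn = { file: "lib/db.ts", name: "runQuery", params: ["sql"], body: [sinkOn("sql")] };
const formatA: Fn = { file: "lib/a.ts", name: "format", params: ["x"], body: [sinkOn("x")] };
const formatB: Fn = { file: "lib/b.ts", name: "format", params: ["x"], body: [sinkOn("x")] };
const scanned = [runQuery, formatA, formatB];
const sanitizers = new Set(["escapeSql"]);

// A route handler that reads request input into `q`, then hands it to `use`.
const handler = (use: Expr): Fn => ({
  file: "app/api/search/route.ts",
  name: "GET",
  params: [],
  body: [
    { kind: "assign", target: "q", value: { kind: "source" } },
    { kind: "assign", target: "result", value: use },
  ],
});

const vulnerable = runTaint(handler(call("runQuery", v("q"))), scanned, sanitizers);
const sanitized = runTaint(handler(call("runQuery", call("escapeSql", v("q")))), scanned, sanitizers);
const ambiguous = runTaint(handler(call("format", v("q"))), scanned, sanitizers);
console.log(vulnerable.length, sanitized.length, ambiguous.length); // 1 0 0
```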

See it on your own repo

These rules run on your code in minutes — or preview a full report first.
