How to Use an AI Vulnerability Scanner Safely on Your Own Source Code


If your business runs on a custom in-house application, there’s a good chance it has been sitting on your risk list for a while. The original developer left two years ago. The framework is older than the junior engineer who just joined. Nobody’s done a real security review since the last pen test, and the last pen test cost more than the team budget could comfortably absorb. An AI vulnerability scanner can now surface likely security issues in source code quickly. But you don’t want to be the one who pasted the entire repo into a chat window and hoped for the best.

Adelia Risk is a cybersecurity firm that helps small and midsized businesses with in-house dev teams turn an AI vulnerability scanner from a marketing demo into a repeatable, audit-friendly process. This post is the explanatory companion to our AI Vulnerability Scanner Checklist for In-House Source Code. It walks through how to pick the right tool, harden the workstation before the scan starts, set up the scan so the output is useful, and understand where AI scanning fits next to a real penetration test. The checklist is the tactical version. This post explains why each step is included.

Why This Matters Now

Anthropic shipped three security-review products in the past nine months. The /security-review slash command appeared in Claude Code in August 2025. The anthropics/claude-code-security-review GitHub Action followed. Claude Security, the hosted multi-stage product, went into limited preview in February 2026 and into public beta on April 30, 2026. The Frontier Red Team has used these tools to find more than 500 real vulnerabilities in production open-source code, including a 23-year-old bug in the Linux kernel. The category is real, but it needs guardrails.

The tooling is also unsafe by default. In April 2026, security researcher Aonan Guan and Johns Hopkins collaborators Zhengyu Liu and Gavin Zhong disclosed a prompt-injection flaw in the same claude-code-security-review GitHub Action that Anthropic ships to help dev teams.

The exploit, nicknamed “Comment and Control,” lets an attacker hide instructions inside a pull-request title, issue body, or comment. The agent ingests the comment as if a trusted reviewer wrote it, and on workflows configured with pull_request_target secret access, it leaks repo secrets out of the runner. Anthropic classified it as CVSS 9.4 Critical. The action’s own README states it’s “not hardened against prompt injection attacks and should only be used to review trusted PRs.”

That tension is the working definition of an AI vulnerability scanner today: an LLM-powered tool that reads your source code and reports likely security weaknesses (SQL injection, cross-site scripting, hardcoded secrets, broken access control, business-logic flaws) faster than any human reviewer. Cheap to run, generous with output, and dangerous if you let it touch a production developer’s workstation without thinking it through.

Pick the Right AI Vulnerability Scanner for Your Code Base

The first decision is which tool you’re actually using. There are four reasonable options. The checklist’s “Pick the Right Entry Point” section covers the tradeoffs in detail; here’s the short version.

The /security-review slash command. Built into Claude Code. As of late April 2026, Claude Code is included with paid Claude plans, including Pro, Max, Team, and Enterprise. Pro still has lower per-session limits than Max or Team. It runs against pending changes inside a terminal or IDE. Best for ad-hoc work or a developer scanning their own branch before opening a PR.
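
In practice that’s a two-step habit, assuming the Claude Code CLI is installed and signed in on a paid plan:

```bash
# Start Claude Code in the repo, then run the built-in review
# against your pending changes from the interactive prompt:
claude
> /security-review
```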

The anthropics/claude-code-security-review GitHub Action. Open source. Posts findings as PR comments, scoped to the diff. This is the option with the known prompt-injection issue. Use it on internal pull requests from trusted contributors only. Configure your repo to require manual approval before workflows run on PRs from outside contributors. Anthropic’s own guidance is that the Action isn’t hardened against prompt injection. That’s the company saying don’t point this at the open internet.

Claude Security (hosted). Public beta as of April 30, 2026, after a February 2026 limited research preview. Multi-stage adversarial verification, a central dashboard, and CSV and Markdown export. Anthropic’s Use Claude Security help article currently lists Claude Security as available in public beta for Enterprise users. The hosted product is also where Anthropic describes multi-stage verification of findings before they are surfaced.

A custom Claude Code skill. A reusable, parameterized prompt plus reference files, stored as .md in .claude/skills/. This is the right call when off-the-shelf tools do not understand your stack (Struts 1.3, COBOL, an internal DSL), when business logic is the actual risk (multi-tenant data isolation, authorization rules), or when policy reasons rule out the hosted product. We see this most often for AI code audit work on legacy frameworks the slash command was never benchmarked against.

Pick one and standardize. A team using all four interchangeably does not have a baseline.

Pre-Flight Hardening Before You Scan

Most of the safety work happens before the scanner ever opens your code. The checklist’s “Do These First” section is the backbone of this work. If you only do five things in this whole guide, do these.

CHECKLIST EXTRACT

Do These First

Strip secrets, credentials, and property files from the code base before scanning: API keys, database passwords, OAuth tokens, and .env files in the working tree can be picked up by an AI agent and treated as usable access during a scan. Pull them out, replace with placeholders, and commit the cleanup before pointing the scanner at the code.

Scrub prior commit history for secrets too: Removing a secret from HEAD does not remove it from git log. An AI agent can be pointed at history too. Run a history scan and rotate any secret that ever appeared, even if it has since been “removed.”

Move the code to local disk, not a network share, before scanning: Network shares can slow scans, expose file paths in logs or telemetry, and create cross-machine permission surprises if the agent starts looking beyond the intended folder.

Run scans on a segregated, hardened machine with restricted outbound access: A virtual machine or dedicated laptop with no production access and limited outbound destinations keeps the blast radius smaller.

Allowlist only the network endpoints the scanner needs: Block everything outbound by default, then permit Anthropic API endpoints and required package registries.

The single biggest mistake is pointing an AI scanner at code that still contains live credentials. The phrase “find secrets in source code” is a Google query for a reason. If the scanner can read a live secret, treat it as usable access.

We have seen a goal-seeking Claude Code instance, blocked from pulling credentials from 1Password, walk the local filesystem and find a credential someone had accidentally left in the wrong file. The agent finished the job, but through a path nobody had intended to authorize.

Strip secrets first. Run a history scan, because removing a secret from HEAD doesn’t remove it from git log. Rotate anything a scan finds, even if it’s been “removed” for a year.
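
Tooling choice is yours. As one hedged example, gitleaks scans full commit history out of the box and git-filter-repo can rewrite it; neither tool is mandated by the checklist, and the expressions file below is illustrative:

```bash
# Scan the full commit history, not just HEAD (gitleaks does this by default)
gitleaks detect --source . --report-path gitleaks-history.json

# Rotate anything it finds FIRST. Then, if you also want the secret out of
# history, rewrite it on a fresh clone (git-filter-repo is destructive):
git filter-repo --replace-text expressions.txt
# expressions.txt holds one replacement per line, e.g.:
#   AKIAEXAMPLEOLDKEY==>REDACTED
```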

The segregated machine matters for the same reason. The agent can inherit the filesystem and network access the host has, including SSH keys, browser sessions, and 1Password agents. A clean virtual machine or a dedicated laptop with no production access keeps the blast radius small.

The August 2025 GitHub Copilot RCE (CVE-2025-53773) showed how an attacker can flip an AI agent’s auto-approve flag through a poisoned source-code comment, turn off all user confirmations (“YOLO mode”), and self-replicate the injection through normal refactoring. The result was wormable remote command execution on the developer’s machine. Use a workstation you’d be comfortable wiping.

Outbound network access is the next layer. Block everything by default, then allowlist only what the scanner needs (Anthropic API endpoints, required package registries). Anthropic’s tooling sends telemetry and error reporting to third-party services by default; for HIPAA, CJIS, or CUI workloads, set the opt-out environment variables (DISABLE_TELEMETRY=1, DISABLE_ERROR_REPORTING=1, CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1). The application keeps working. On Bedrock, Vertex, or Foundry, these opt-outs are the default.
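
On the scan machine itself, the opt-outs are three environment variables, named exactly as above:

```bash
# Set before launching Claude Code; telemetry, error reporting, and other
# nonessential outbound traffic stay off for the session.
export DISABLE_TELEMETRY=1
export DISABLE_ERROR_REPORTING=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
```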

Set Up the Scan Correctly

Once the workstation is hardened, the remaining scan setup decisions are about getting useful output, not safe output; the safety work is already done.

Pick the model and effort tier that match the code base. Opus 4.7 with the 1M-context window handles larger code bases in a single pass and is auto-enabled on Max, Team, and Enterprise plans (Pro plans require /extra-usage opt-in). Sonnet is faster and cheaper for smaller code bases or first-pass triage. Higher effort tiers (“high” or “extra-high”) slow the scan but produce better reasoning on subtle flaws.
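
If you’re driving this from the CLI, the model choice can be made per session. This sketch assumes Claude Code’s --model flag and its opus/sonnet aliases:

```bash
claude --model opus     # bigger code base, subtler flaws, slower and pricier
claude --model sonnet   # smaller code base or first-pass triage
```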

Run a triage scan on the first pass. Deep audits are wasted effort if you haven’t yet decided where to focus. A triage scan surfaces a candidate-finding list in 20 to 60 minutes on a small to mid-sized code base. Once you have that list, run a deep audit on the validated, interesting targets.

Define the languages, frameworks, and file paths the scan should target. Pointing the scanner at the whole repo (including node_modules, build artifacts, vendor code, and test fixtures) wastes context and produces findings on code you don’t own.
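
One way to enforce those exclusions, rather than merely request them in the prompt, is Claude Code’s permission deny rules in .claude/settings.json. The paths below are illustrative, not a complete list:

```json
{
  "permissions": {
    "deny": [
      "Read(./.env)",
      "Read(./.env.*)",
      "Read(./node_modules/**)",
      "Read(./dist/**)",
      "Read(./vendor/**)"
    ]
  }
}
```

Deny rules also back up the secrets-stripping step: even if a stray .env survives the cleanup, the agent can’t read it.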

Watch the context window. Around 33 to 50 percent context usage, output quality degrades. Have the agent produce a structured handover document summarizing findings so far, run /clear, then resume from the handover. Avoid /compact; for security work, its lossy summarization can drop a real finding. Anthropic’s “task complete” notification is unreliable, so watch the token counter, not the clock.
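
The handover document doesn’t need to be elaborate. One shape that works (the structure and finding names here are ours, purely illustrative):

```markdown
# Scan handover — repo @ commit abc1234
## Validated findings
- FIND-001: SQL injection in ReportDao.search(), src/dao/ReportDao.java:142
## Candidates not yet verified
- FIND-002: possible reflected XSS in results.jsp
## Coverage
- Done: src/dao/, src/web/. Not started: src/batch/, secrets sweep
## Next step on resume
- Finish src/batch/, then run the secrets sweep as a separate pass
```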

Validate Findings Before They Become Tickets

The biggest practical risk of AI static code analysis isn’t that it misses things. It’s that it produces too many things, most of which are wrong.

As of mid-2025, before Anthropic’s adversarial verification layer shipped in Claude Security, independent benchmarks showed AI scanners had high false-positive rates. Semgrep tested Claude Code on 11 real-world Python apps in September 2025 and reported a 14 percent true-positive rate against an 86 percent false-positive rate. Even on IDOR, the class where Claude Code was strongest, 88 percent of findings were false positives.

The 2026 RealVuln benchmark (26 vulnerable Python repos, 796 hand-labeled cases) put Claude Sonnet 4.6 in the lead among general-purpose LLM scanners with precision near 0.78, well ahead of older Semgrep numbers but still trailing security-specialized tools on recall-weighted scores. The takeaway is the same either way: every finding needs human validation before it becomes a Jira ticket.

The validation workflow isn’t optional. In one engagement, we ran a triage scan against a small Java application; in roughly 20 minutes, Claude Code surfaced about 13 candidate findings. That ratio (a dozen-plus findings in twenty minutes) is fine for a triage layer. It’s not fine if you treat the output as a finished vulnerability report and hand it to a developer to “go fix the thirteen items.” Some of those findings are real. Most are noise. Without a validation step, you bury the team in tickets and erode their trust in the tool.

The validation pattern that works: a second-pass QA review on every finding. A different model, a fresh context window, or a senior developer reads each finding, looks at the cited code, and marks it valid, invalid, or “needs human review.” File only validated findings into your tracker.
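
The second pass can be as simple as a fresh-context prompt. This wording is ours, not Anthropic’s; adapt it to your stack:

```
You are a skeptical application-security reviewer. For each finding in
findings.md: open the cited file and line, trace the input to the sink,
and label the finding VALID, INVALID, or NEEDS HUMAN REVIEW with one
sentence of justification. Do not fix anything. Err toward INVALID unless
you can show a concrete path from attacker-controlled input to the flaw.
```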

Re-run the scan if the first pass returns “no issues found”; AI scanning is non-deterministic, and the same prompt against the same code can return three findings, then six, then eleven across three runs. A single clean run is one data point, not proof.

Write a brief reasoning log for each accepted or rejected finding. One or two sentences: what it is, whether it’s valid, and what was done. That log, not the raw scanner output, is the audit-trail artifact your auditor will ask for. SOC 2 CC7.1 and the broader Trust Services Criteria expect identified vulnerabilities to be evaluated and resolved through documented remediation activities on a timely basis. A 10,000-finding dashboard with no triage and no remediation log fails the control even when the scanner is sophisticated.
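
Two lines per finding is enough; the entries below are invented to show the shape:

```
2026-05-12  FIND-003  Hardcoded SMTP password in mailer.properties
  Valid. Credential rotated, moved to secrets manager. Ticket APP-412.

2026-05-12  FIND-004  "SQL injection" in ReportDao.search()
  Invalid. Query is parameterized; scanner misread concatenation in a log statement.
```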

Operationalize Across the Team

A custom skill is how you turn one senior developer’s experience into something a junior teammate can run. The relevant question isn’t “what is a skill?” The question is “what does the skill know that a fresh prompt doesn’t?”

A Claude Code custom skill is a markdown file (with optional supporting files) that grounds the agent in a role, a process, and a set of guardrails before the actual work starts. You could type “you’re a web app cybersecurity expert, review this code base” every time. You’d get something usable.

The reason to build a skill is that the first version is rarely the final version. Each refinement (telling it you use Java with Struts 1.3 and JSP, telling it to spawn a QA sub-agent that argues with the primary agent’s findings, telling it to run a secrets sweep separately from the OWASP Top 10 pass, telling it where reports go) gets committed back to the skill. Six months in, the skill knows your stack, your frameworks, and your team’s quirks.
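
A minimal sketch of what that skill file might look like, using the Struts example above. The frontmatter fields follow Claude Code’s skill format; everything else is illustrative:

```markdown
---
name: security-review-legacy-java
description: Security review for our Struts 1.3 / JSP code base. Use for
  any security scan, audit, or vulnerability review request on this repo.
---

You are a web application security reviewer for a Java code base built on
Struts 1.3 and JSP. Review only files under src/; skip vendor/ and build/.

Process:
1. Run the secrets sweep as a separate pass from the OWASP Top 10 pass.
2. For each candidate finding, spawn a QA sub-agent to argue against it;
   drop anything the sub-agent convincingly refutes.
3. Write surviving findings to reports/security/, one file per scan, with
   file, line, severity, and a one-sentence exploit path each.

Never modify code. Never report a finding without citing file and line.
```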

Source-control the skills. If three developers each tweak their own copy on their laptops, the team scans against three different baselines, and improvements never get shared. Check Claude Code custom skills into the repo (or a shared skills repo), branch them, and review changes the way you review code. Anthropic’s Team and Enterprise plans support shared skill libraries; for Pro plans, a shared git repo with PR review on changes prevents drift.

Set a scanning cadence appropriate to risk tolerance. Common patterns: every PR (via the GitHub Action on internal PRs only), weekly on the main branch, or monthly for less active code bases. Quarterly is the floor for PCI-scoped code (Req 11.3 mandates quarterly internal authenticated scans). Train the team on what “no issues found” means: this scan, with this model, on this version of the code, didn’t surface anything in this run. It doesn’t mean the code is safe.

For proprietary code, use commercial plans (Team, Enterprise, API, Bedrock, Vertex, or Foundry). Per Anthropic’s data usage docs, commercial plans don’t train on customer code by default. Consumer plans (Free, Pro, Max) shifted to opt-in for training in late 2025, but the toggle exists and has been a common source of confusion. For HIPAA-regulated code, confirm BAA and Zero Data Retention eligibility with Anthropic enterprise sales before scanning; Claude Code CLI is only BAA-eligible for qualified accounts with ZDR enabled.

Where AI Scanning Fits and Where Pen Tests Still Belong

AI scanning is a vulnerability-management control. It isn’t a penetration test. The distinction is load-bearing for compliance.

PCI DSS 4.0 Requirement 11.4 requires manual-led penetration testing at least annually and after significant changes, performed by qualified, organizationally independent testers. The pending HIPAA Security Rule NPRM (published December 27, 2024, finalization targeted May 2026) would explicitly require pen testing every 12 months. SOC 2 auditors typically expect deeper assurance than automated scanning.

SMB external web-app pen tests run $10,000 to $50,000 (Adelia Risk practitioner experience, 2026). Anyone selling a “pen test” under that range is selling automated scanning with a fancier invoice.

AI scanning maps cleanly to the vulnerability-management side of the same frameworks. PCI DSS 4.0 Req 11.3 (quarterly internal authenticated scans) is satisfiable with a documented AI scanning workflow. SOC 2 CC7.1 (vulnerability identification) and NIST SP 800-171 control 3.11.2 (CMMC L2 vulnerability scanning) work the same way. The audit artifact is the loop, not the dashboard: when the scan ran, what was scanned, what model and effort tier, who reviewed the findings, what was filed, what was remediated, and when.

A common question on internal-only applications: what about a nefarious employee? An attacker who already has access doesn’t need to learn cross-site scripting; they can just take the data through the application. Insider risk is real, but the controls that address it are access management, logging, and monitoring, not whether the internal app’s HTML is escaped correctly. AI scanning addresses external-attacker risk and code-quality risk. It doesn’t address insider risk.

Bottom Line

An AI vulnerability scanner like Claude Code, Codex, or Anthropic’s hosted Claude Security gives a small dev team a fast, repeatable read on likely security weaknesses in their own code. With a segregated machine, secrets stripped, telemetry handled, and a validation workflow in place, it’ll surface real bugs in twenty minutes and turn institutional knowledge into a skill any teammate can run. The careless version (developer workstation, secrets still in the repo, single-pass scan filed straight to Jira) is the kind of shortcut that fails an auditor and burns out the dev team.

Adelia Risk helps small and mid-sized regulated businesses harden the workstations, write the skills, build the pen test cadence around them, and decide where AI scanning ends and a human pen tester begins. If you want a second set of eyes on the way your team is rolling out an AI vulnerability scanner, our Virtual CISO service is built for exactly that work.


Josh Ablett

Josh Ablett, CISSP, has been meeting regulations and stopping hackers for 20 years. He has rolled out cybersecurity programs that have successfully passed rigorous audits by the SEC, the FDIC, the OCC, HHS, and scores of customer auditors. He has also built programs that comply with a wide range of privacy and security regulations such as CMMC, HIPAA, GLBA, SEC/FINRA, and state privacy laws. He has worked with companies ranging from 5 people to 55,000 people.
