Every codebase has vulnerabilities; that is close to a mathematical certainty. The average commercial software application pulls in dozens of open-source dependencies, often many more, each carrying its own history of CVEs, misconfigurations, and undiscovered flaws. Multiply that across the services, libraries, and infrastructure components in a typical enterprise environment, and the attack surface becomes enormous. The question is not whether vulnerabilities exist. The question is whether you are finding them before someone else does.
At NextLink Labs, we have integrated Anthropic's Claude into our security assessment workflows to do exactly that. This is not a theoretical exercise or a demo built to impress at a conference: we use it in production to evaluate client codebases, identify risks in their software supply chains, and surface the vulnerabilities that traditional scanning tools miss. I want to walk through what that looks like in practice, why it works, and where we see this going.
To be clear, we are not arguing that traditional security tools are useless. SAST tools, DAST scanners, and SCA platforms are foundational. We use them, and our clients use them. They catch a meaningful share of known vulnerabilities, and they are a necessary part of any security posture.
Still, existing tools have well-documented blind spots. Static analysis tools are pattern-matchers at their core. They are excellent at flagging known vulnerability signatures like SQL injection patterns, buffer overflows, and hardcoded credentials. Where they struggle is with business logic vulnerabilities, context-dependent security flaws, and the kinds of subtle misconfigurations that only become dangerous in combination with other design decisions. Engineering teams routinely face hundreds or thousands of findings, many of which are false positives, and the triage burden alone can consume entire sprints.
Software Composition Analysis tools are better at catching known CVEs in dependencies, but they are fundamentally reactive. They check package versions against a database of disclosed vulnerabilities. If a vulnerability has not been disclosed and cataloged, SCA will not find it. In addition, even when it does flag a known CVE, the tool typically cannot tell you whether that vulnerability is actually reachable in the context of your application. Is the vulnerable function called? Is the input path exposed? Does the configuration make it exploitable in your specific environment?
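To make the "fundamentally reactive" point concrete, here is a minimal sketch of what an SCA-style version check reduces to. The advisory data below is a hand-written placeholder, not real CVE data; real tools query databases such as OSV or the NVD, but the core logic is the same range comparison.

```python
# Advisory: package name -> (first vulnerable version, first fixed version).
# Hypothetical data for illustration only.
ADVISORIES = {
    "examplelib": ((1, 2, 0), (1, 4, 2)),
}

def parse_version(v: str) -> tuple:
    """Parse a simple 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def is_affected(package: str, version: str) -> bool:
    """Return True if the installed version falls in a vulnerable range."""
    if package not in ADVISORIES:
        return False  # no disclosed advisory: the tool reports nothing
    introduced, fixed = ADVISORIES[package]
    return introduced <= parse_version(version) < fixed

print(is_affected("examplelib", "1.3.7"))  # inside the vulnerable range: True
print(is_affected("examplelib", "1.4.2"))  # at the fix boundary: False
print(is_affected("otherlib", "0.9.0"))    # undisclosed, so invisible: False
```

The third call is the blind spot in miniature: an undisclosed vulnerability produces no advisory row, so the check silently passes.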
That gap between "this CVE exists in a dependency you use" and "this CVE is actually exploitable in your application" is where a significant amount of real security risk lives. It is also where Claude adds the most value.
Our approach is not to replace existing security tooling. It is to add an analytical layer on top of it that can reason about code the way a senior security engineer does: with context, and with an understanding of how different components interact.
Here is how we structure it across the engagements we run for clients:
When we assess a client's application, one of the first things we do is feed the full dependency manifest (package.json, requirements.txt, go.mod, Cargo.toml, whatever the ecosystem) into Claude along with the lock file and the actual usage patterns in the codebase. We are not just asking "which of these packages have known CVEs." We are asking Claude to reason about the dependency graph: which packages are direct versus transitive, which vulnerable functions are imported and called, what the data flow looks like from user input through the vulnerable code path, and whether the application's configuration makes a given vulnerability exploitable or not.
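The direct-versus-transitive distinction can be sketched in a few lines. The structures below are simplified stand-ins for a package.json and its lock file, not a real parser; the point is that the manifest names only direct dependencies, while the lock file resolves the full graph.

```python
# Simplified stand-ins for a manifest and lock file (illustrative data).
manifest = {"dependencies": {"web-framework": "^4.0.0"}}  # direct deps only

lockfile = {  # every resolved package, with its own requirements
    "web-framework": {"version": "4.2.1", "requires": ["tiny-parser"]},
    "tiny-parser": {"version": "0.3.0", "requires": []},
}

# Anything resolved in the lock file but absent from the manifest
# arrived transitively.
direct = set(manifest["dependencies"])
for name, entry in lockfile.items():
    kind = "direct" if name in direct else "transitive"
    print(f"{name} {entry['version']} ({kind})")
```

This classification is the first layer of context we hand to Claude: a transitive dependency three hops deep warrants a different reachability question than a library the application imports directly.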
This is a fundamentally different question than what SCA tools answer. A traditional scanner might flag a critical CVE in a library you depend on, but Claude can tell you that the vulnerable function in that library is never called by your code, that the input vector required to trigger it is not exposed in your API surface, and that the severity in your context is effectively low. Conversely, it can trace a path from a user-facing endpoint through three layers of abstraction into a transitive dependency and demonstrate that the vulnerability is not just present but reachable. That distinction changes how engineering teams prioritize remediation.
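A rough sketch of the reachability question, using Python's ast module: is a known-vulnerable function from a dependency actually imported and called in first-party code? Here "vulnlib.render_unsafe" is a hypothetical target, and this only catches direct, statically visible calls; dynamic dispatch, re-exports, and aliasing need deeper analysis (or a reviewer).

```python
import ast

# Hypothetical first-party module that imports and calls the target.
SOURCE = """
from vulnlib import render_unsafe

def handler(user_input):
    return render_unsafe(user_input)
"""

def calls_function(source: str, module: str, func: str) -> bool:
    """True if `func` is imported from `module` and directly called."""
    tree = ast.parse(source)
    imported = any(
        isinstance(node, ast.ImportFrom)
        and node.module == module
        and any(alias.name == func for alias in node.names)
        for node in ast.walk(tree)
    )
    called = any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func
        for node in ast.walk(tree)
    )
    return imported and called

print(calls_function(SOURCE, "vulnlib", "render_unsafe"))  # True
```

A static check like this is the mechanical starting point; the harder part, which we lean on Claude for, is tracing whether user-controlled input can actually reach that call site.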
Beyond dependencies, we use Claude to review application code directly. This includes first-party source code, infrastructure-as-code configurations (Terraform, CloudFormation, Kubernetes manifests), CI/CD pipeline definitions, and API contracts. We structure these reviews around specific threat categories like the OWASP Top 10, CWE classifications, and cloud-specific misconfigurations. But the real value is that Claude can identify vulnerabilities that do not map neatly to any known pattern.
A significant percentage of real-world breaches originate not from code vulnerabilities but from misconfigurations: overly permissive IAM policies, publicly exposed storage buckets, missing encryption at rest, and security groups with unnecessary open ports, to name a few. We feed infrastructure-as-code files into Claude and ask it to evaluate them against both established benchmarks (CIS, NIST) and practical security principles.
What makes this more useful than a checklist tool is that Claude can evaluate combinations of configurations. A single IAM policy might look acceptable in isolation. But when Claude sees that the same role also has access to an S3 bucket containing PII, that the bucket's logging is disabled, and that the role is assumable by a broadly-scoped Lambda function triggered by a public API endpoint, it can connect those dots and surface a compound risk that no individual finding would have flagged.
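An illustrative sketch of a compound-risk rule like the one just described. Each finding looks benign alone; the rule fires only when the combination creates an exposure path. The field names and finding shapes are invented for illustration, not a real scanner's output format.

```python
# Hypothetical normalized findings from an infrastructure scan.
findings = [
    {"type": "iam_role", "name": "etl-role", "reads": ["pii-bucket"]},
    {"type": "s3_bucket", "name": "pii-bucket", "contains_pii": True,
     "logging_enabled": False},
    {"type": "lambda", "name": "ingest-fn", "assumes_role": "etl-role",
     "trigger": "public_api"},
]

def compound_risks(findings):
    """Flag public -> Lambda -> role -> unlogged-PII-bucket chains."""
    by_name = {f["name"]: f for f in findings}
    risks = []
    for fn in (f for f in findings if f["type"] == "lambda"):
        if fn.get("trigger") != "public_api":
            continue
        role = by_name.get(fn.get("assumes_role"))
        if not role:
            continue
        for bucket_name in role.get("reads", []):
            bucket = by_name.get(bucket_name)
            if (bucket and bucket.get("contains_pii")
                    and not bucket.get("logging_enabled")):
                risks.append(
                    f"public endpoint -> {fn['name']} -> {role['name']} "
                    f"-> {bucket_name}: unlogged read path to PII"
                )
    return risks

for risk in compound_risks(findings):
    print(risk)
```

Hand-writing rules like this for every dangerous combination does not scale, which is exactly why we ask a model to reason over the findings instead; the sketch just shows the shape of the reasoning.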
I want to get specific about how we operationalize this, because the difference between "we ask Claude to look at code" and a reliable, repeatable security analysis process is significant.
The single most important factor in the quality of Claude's analysis is the quality of the context we provide. We do not dump an entire codebase into a prompt and ask it to "find vulnerabilities"; that approach produces shallow, generic output. Context quality determines output quality.
Instead, we structure our analysis in layers. We start with architecture-level context: what the application does, how services communicate, what the trust boundaries are, where sensitive data flows. After architecture, we move into targeted code review, feeding Claude specific modules along with the context it needs to understand how those modules fit into the larger system. We also include relevant configuration files, API schemas, authentication flows, and deployment descriptors alongside the code under review.
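A minimal sketch of the layering idea: assemble the review prompt from explicit, ordered sections rather than raw source. The layer names and sample content here are illustrative placeholders, not our actual templates.

```python
def build_review_prompt(architecture: str, module_source: str,
                        configs: dict, focus: str) -> str:
    """Compose a layered review prompt: architecture first, then the
    module under review, then supporting configs, then the ask."""
    sections = [
        ("System architecture", architecture),
        ("Module under review", module_source),
    ]
    sections += [(f"Config: {path}", body) for path, body in configs.items()]
    sections.append(("Review focus", focus))
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)

prompt = build_review_prompt(
    architecture="Two services; payments-api handles card data.",
    module_source="def charge(card, amount): ...",
    configs={"deploy/api.yaml": "expose: public"},
    focus="Trace untrusted input into the payment path.",
)
print(prompt.splitlines()[0])  # first section header
```

The ordering matters: putting architecture and trust boundaries before the code gives the model the frame it needs to judge whether a given pattern is actually dangerous in this system.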
We rarely get the deepest findings on the first pass. Our workflow is deliberately iterative. An initial review surfaces high-level observations and areas of concern. We then go deeper on those areas, asking Claude to trace specific data flows, evaluate edge cases, consider attack scenarios, and assess the exploitability of identified weaknesses. This mirrors how a human penetration tester works. Broad reconnaissance first, then targeted investigation of the most promising findings.
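The control flow of that iterative process can be sketched as a worklist: a broad pass yields areas of concern, and each finding seeds a deeper, targeted pass. `run_pass` below is a stand-in for a structured model interaction; it consults a canned script here purely so the loop is visible and deterministic.

```python
# Canned responses standing in for model output (illustrative only).
CANNED = {
    "broad survey": ["auth module: weak session handling"],
    "auth module: weak session handling": ["session token never rotated"],
    "session token never rotated": [],  # no further leads; this branch ends
}

def run_pass(question: str) -> list:
    """Stand-in for one structured interaction with the model."""
    return CANNED.get(question, [])

queue, findings = ["broad survey"], []
while queue:
    leads = run_pass(queue.pop(0))  # broad first, then targeted follow-ups
    findings.extend(leads)
    queue.extend(leads)             # each finding seeds a deeper pass

print(findings)
```

In a real engagement each `run_pass` carries the accumulated context of earlier passes, which is what lets later interactions ask sharper questions than the first one could.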
In practice, this means a single engagement might involve dozens of structured interactions with Claude, each building on the findings of the previous pass. The cumulative result is an analysis with a depth and breadth that would take a human team significantly longer to produce.
Every finding Claude surfaces goes through human validation before it reaches a client. This is non-negotiable. Our security engineers review each finding, verify its accuracy, assess its real-world exploitability, and contextualize it for the client's specific environment and risk tolerance. Claude is not the auditor. Claude is the force multiplier that allows our auditors to cover more ground, go deeper, and miss less.
The practical outcome for the organizations we work with is straightforward: they get a more thorough security assessment, faster, with findings that are more actionable than what traditional tooling alone would produce.
I want to be direct about the boundaries, because overpromising on AI capabilities does a disservice to everyone.
Claude is not a replacement for a security team. It does not perform dynamic testing. It cannot verify runtime behavior. It does not replace penetration testing where you need someone probing a live system. And like any AI system, it can produce false positives of its own or miss things that a human expert with deep domain-specific knowledge would catch.
What it is, instead, is a powerful augmentation of human expertise. It allows our team to be more thorough, more consistent, and faster than we could be without it. It raises the floor on every assessment we deliver and frees our senior engineers to focus their attention on the most complex, nuanced findings rather than spending cycles on the kind of systematic review that AI handles well.
We are actively investing in deepening this capability. The models are getting better at reasoning about complex systems. Context windows are expanding, which allows us to include more of a codebase in a single analysis pass. We are building internal tooling that integrates Claude more tightly into our assessment pipeline: automated dependency graph construction, code flow analysis, and structured finding generation that feeds directly into our reporting workflows.
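To illustrate what "structured finding generation" means in practice, here is a sketch of the record shape such a pipeline might normalize model output into before it feeds reporting. The field names are illustrative, not a published schema.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class Finding:
    """One normalized finding, ready for human validation and reporting."""
    title: str
    severity: str          # e.g. "low" | "medium" | "high" | "critical"
    component: str         # file, module, or resource identifier
    reachable: bool        # did analysis show an exploit path?
    evidence: str          # the trace or config excerpt supporting it
    validated_by: str = "" # filled in during human review, never by the model

finding = Finding(
    title="Unlogged public read path to PII bucket",
    severity="high",
    component="deploy/iam/etl-role.tf",
    reachable=True,
    evidence="public API -> ingest-fn -> etl-role -> pii-bucket",
)
print(json.dumps(asdict(finding), indent=2))
```

Keeping `validated_by` as a field the model cannot populate is a small design choice that encodes the human-in-the-loop requirement directly into the data model.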
The organizations that will be best positioned over the next several years are the ones that recognize a fundamental shift is happening in how software security works. The volume and complexity of software systems is growing faster than security teams can scale. The attack surface is expanding. The dependency chains are getting deeper. AI is not a nice-to-have in this environment. It is becoming the only practical way to maintain comprehensive visibility into where your vulnerabilities truly are.
The question is not whether AI will play a role in software security. It is whether you will be using it proactively to find your vulnerabilities, or learning about them reactively when someone else does.
If you are responsible for a software portfolio and you want to understand where your real risks are, not just the risks your current tooling can see, that is exactly the kind of problem we solve at NextLink Labs. Reach out; we would welcome the conversation.