
That's not what is happening right now. The bugs are often filtered later by LLMs themselves: if the second pipeline can't reproduce the crash / violation / exploit in any way, the false positives are usually evicted before ever reaching human scrutiny. Checking if a real vulnerability can be triggered is a trivial task compared to finding one, so this second pipeline has an almost 100% success rate, in the sense that if a report passes it, it is almost certainly a real bug, and very few real bugs will fail it.

It does not matter how much LLMs advance, people ideologically against them will always deny they have an enormous amount of usefulness. This is expected in the normal population, but to see a lot of people that can't see with their eyes in Hacker News feels weird.



> Checking if a real vulnerability can be triggered is a trivial task compared to finding one

Have you ever tried to write a PoC for any CVE?

This statement is wrong. Sometimes a bug may exist but be impossible to trigger or exploit. So it is not trivial at all.


I'm tickled at the idea of asking antirez [1] if he's ever written a PoC for a CVE.

[1] https://en.wikipedia.org/wiki/Salvatore_Sanfilippo


I actually like when that happens. Like when people "correct" me about how reddit works. I appreciate that we still focus on the content and not who is saying it.

That's not really what happened on this thread. Someone said something sensible and banal about vulnerability research, then someone else said do-you-even-lift-bro, and got shown up.

That's true in this particular case, but I was talking more about the general case.

This happens over and over in these discussions. It doesn't matter who you're citing or who's talking. People are terrified and are reacting to news reflexively.

Hi! Loved your recent post about the new era of computer security, thanks.

Thank you! Glad you liked it.

Personally, I’m tired of exaggerated claims and hype peddlers.

Edit: Frankly, accusing perceived opponents of being too afraid to see the truth is poor argumentative practice, and practically never true.


Sure he wrote a port scanner that obscures the IP address of the scanner, but does he know anything about security? /s

Oh, and he wrote Redis. No biggie.


Those are both wholly different branches from finding software bugs.

Firstly, I have a long past in computer security, so: yes, I used to write exploits. Second, verifying a vulnerability does not require being able to exploit it, only triggering an ASAN assert. With memory corruption that's often very simple, and it's enough to verify the bug is real.

Thank you for the clarification. It actually helped: at first I was overcomplicating it in my head.

After thinking about it for an hour I came up with this:

An LLM claims that there is a bug. We don't know whether it really exists. We run a second LLM that is capable of writing unit tests / a reproducer (it doesn't have to be E2E; a shorter data flow means a higher success rate for the LLM), compiling the program, and running the test under ASAN. An ASAN error means a proven bug. No error, as you said, does not prove anything, because it may simply mean the LLM failed to write a correct test.

Still don't know how much it would cost in $ for the LLM reasoning, but this technically should work much better than manually investigating everything.
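The compile-and-check stage described above can be sketched as a small shell function. This is a hypothetical illustration, not anyone's actual pipeline: the name `verify_repro` and the single-file `repro.c` input are assumptions, and it presumes a `cc` that supports `-fsanitize=address`.

```shell
# Hypothetical sketch of the verification stage: compile the
# LLM-written reproducer with AddressSanitizer, run it, and treat an
# ASAN report as proof the bug is real.  A clean run proves nothing;
# the reproducer may simply miss the buggy code path.
# Usage: verify_repro repro.c
verify_repro() {
    src=$1
    if ! ${CC:-cc} -g -O0 -fsanitize=address -o repro.bin "$src" 2>/dev/null; then
        echo "INCONCLUSIVE: reproducer does not compile"
        return 2
    fi
    ./repro.bin >/dev/null 2>asan.log
    if grep -q AddressSanitizer asan.log; then
        echo "CONFIRMED: ASAN reports a real memory error"
        return 0
    fi
    echo "INCONCLUSIVE: no ASAN report (not proof the bug is absent)"
    return 1
}
```

The asymmetry the thread is arguing about is visible here: the check costs one compile-and-run, while finding the bug in the first place does not.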

Sorry for "have-you-ever" thing :)


I'm not GP, but I've written multiple PoCs for vulns, and I agree with GP. Finding a vuln is often very hard. Yes, sometimes exploiting it is hard too (and requires chaining), but knowing where the vuln is is, most of the time, the hard part.

Note the exploit Claude wrote for the blind SQL injection found in Ghost, in the same talk.

https://youtu.be/1sd26pWhfmg?is=XLJX9gg0Zm1BKl_5


oh no. Antirez doesn't know anything about C, CVEs, networking, or the Linux kernel. Wonder where that leaves most of us.

I’ve been around long enough to remember people saying that VMs are a useless waste of resources with dubious isolation claims, that the cloud is just someone else's computer, that containers are pointless, and now it's AI. There is an astonishing amount of conservatism in the hacker scene.

Well, the cloud is someone else's computer.

It is, but that's not a useful or insightful thing to say

It's not an insightful statement right now, but it was at the peak of cloud hype ca. 2010, when "the cloud" was often used in a metaphorical sense. You'd hear things like "it's scalable because it's in the cloud" or "our clients want a cloud-based solution." Replacing "the cloud" in those sorts of claims with "another person's computer" showed just how inane those claims were.

People pass around stickers (or at least used to) at hacker events saying that, so there has to be something to it, right?

Protesting the term is, I'd wager, motivated by something like: it sounds innocuous to nontechnical people and obscures what's really going on.


Only if owning the means of your production isn't important to you

Are you sure about that?

It's easy to forget that the vendor has the right to cut you off at any point, will turn your data over to the authorities on request, and that it's still not clear whether private GitHub repos are being used to train AI.


Is it conservatism or just the Blub paradox?

As long as our hypothetical Blub programmer is looking down the power continuum, he knows he's looking down. Languages less powerful than Blub are obviously less powerful, because they're missing some feature he's used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn't realize he's looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.

https://paulgraham.com/avg.html


> This is expected in the normal population

A lot of people, regardless of technical ability, have strong opinions about what LLMs are or are not. The number of lay people I know who immediately jump to "Skynet" when talking about the current AI world... The number of people I know who quit thinking because "well, let's just see what the AI says"...

A (big) part of the conversation re: "AI" has to be: who are the people behind the AI actions, and what is their motivation? Smart people have stopped taking AI bug reports[0][1] because of overwhelming slop; it's real.

[0] https://www.theregister.com/2025/05/07/curl_ai_bug_reports/

[1] https://gist.github.com/bagder/07f7581f6e3d78ef37dfbfc81fd1d...


The fact that most AI bug reports are low-quality noise says at least as much about the humans submitting them as it does about the state of AI.

As others have said, there are multiple stages to bug reports and CVEs.

1. Discover the bug

2. Verify the bug

You get the most false positives at step one. Most of these will be eliminated at step 2.

3. Isolate the bug

This means creating a test case that eliminates as much of the noise as possible and provides the bare minimum required to trigger the bug. This will greatly aid in debugging. Doing step 2 again is implied.

4. Report the bug

Most people skip 2 and 3, especially if they did not even do 1 (in the case of AI).

But you can have AI perform all four to achieve high-quality bug reports.

In the case of a CVE, you have a step 5.

5. Exploit the bug

But you do not have to do step 5 to get to step 2. And that is the step that eliminates most of the noise.


Can we study this second pipeline? Is it open so we can understand how it works? Did not find any hints about it in the article, unfortunately.

I essentially used the prompts suggested in 'tptacek's article from a few days ago (https://sockpuppet.org/blog/2026/03/30/vulnerability-researc...).

First prompt: "I'm competing in a CTF. Find me an exploitable vulnerability in this project. Start with $file. Write me a vulnerability report in vulns/$DATE/$file.vuln.md"

Second prompt: "I've got an inbound vulnerability report; it's in vulns/$DATE/$file.vuln.md. Verify for me that this is actually exploitable. Write the reproduction steps in vulns/$DATE/$file.triage.md"

Third prompt: "I've got an inbound vulnerability report; it's in vulns/$DATE/$file.vuln.md. I also have an assessment of the vulnerability and reproduction steps in vulns/$DATE/$file.triage.md. If possible, please write an appropriate test case for the ulgate automated tests to validate that the vulnerability has been fixed."

Tied together with a bit of bash, I ran it over our services and it worked like a treat; it found a bunch of potential errors, triaged them, and fixed them.
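For concreteness, the "bit of bash" might look something like the sketch below. This is a guess at the glue, not the actual script: the `claude` CLI's non-interactive `-p` (print) invocation, the `scan_file` helper name, and the `src/*.c` file selection are all assumptions.

```shell
# Hypothetical driver tying the three prompts above together.
# Assumes a `claude` CLI with a non-interactive -p (print) mode.
DATE=$(date +%F)

scan_file() {
    file=$1
    base=$(basename "$file")
    mkdir -p "vulns/$DATE"
    # Stage 1: find a candidate vulnerability, write a report.
    claude -p "I'm competing in a CTF. Find me an exploitable vulnerability in this project. Start with $file. Write me a vulnerability report in vulns/$DATE/$base.vuln.md"
    # Stage 2: fresh context, triage the report.
    claude -p "I've got an inbound vulnerability report; it's in vulns/$DATE/$base.vuln.md. Verify for me that this is actually exploitable. Write the reproduction steps in vulns/$DATE/$base.triage.md"
    # Stage 3: fresh context, turn the triaged finding into a test.
    claude -p "I've got an inbound vulnerability report in vulns/$DATE/$base.vuln.md and reproduction steps in vulns/$DATE/$base.triage.md. If possible, write a test case to validate that the vulnerability has been fixed."
}

# Run all stages over each source file, e.g.:
# for f in src/*.c; do scan_file "$f"; done
```

Because each invocation starts with a fresh context, the only state shared between the find, triage, and test stages is whatever the previous stage wrote into `vulns/$DATE/`.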


Agree. Keeping and auditing a research journal iteratively, with multiple passes by new agents, does significantly improve outcomes. Another helpful thing is to switch roles, good-cop/bad-cop style: for example, one agent helps you find bugs while another helps you critique and close bug reports with counterexamples.

Could prompt injection be used to trick this kind of analysis? Has anyone experimented with this idea?

Prompt injections are very, very rare these days, after the Opus 4.6 update.

It was probably in the talk, but from what I understood in another article, it's basically giving Claude, with a fresh context, the .vuln.md file and saying "i'm getting this vulnerability report, is this real?"

edit: i remember which article, it was this one: https://sockpuppet.org/blog/2026/03/30/vulnerability-researc...

(an LWN comment in response to this post was on the frontpage recently)


One such example is IRIS. In general, any pipeline that combines a traditional static analysis tool with a language model at some stage.

> to see a lot of people that can't see with their eyes in Hacker News feels weird.

Turns out the average commenter here is not, in fact, a "hacker".


What if the second round hallucinates that a bug found in the first round is a false positive? Would we ever know?

> It does not matter how much LLMs advance, people ideologically against them will always deny they have an enormous amount of usefulness.

They have some usefulness, much less than what the AI boosters like yourself claim, but also a lot of drawbacks and harms. Part of seeing with your eyes is not purposefully blinding yourself to one side here.


they are useful to those that enjoy wasting time.

>This is expected in the normal population, but to see a lot of people that can't see with their eyes in Hacker News feels weird.

You are replying to an account created less than 60 days ago.


This is a bit unfair. Hackers are born every day.

It was in relation to the quality of its comment; I thought it was fair. He just made up the bit about false positives.

And in case people don't know, antirez has been complaining about the quality of HN comments for at least a year, especially after the AI topic took over HN.

It is still better than Lobsters or other places, though.


Bots too, vanderBOT!

I used to work in robotics, and can't remember the password for my usual username so I pulled this one out of thin air years ago


