Hacker News | Best Comments
Most-upvoted comments of the last 48 hours. You can change the number of hours like this: bestcomments?h=24.

Someone said - in Linux, everything is a file. In Microsoft, everything is a copilot. Lol.

Refreshing to see an honest and balanced take on AI coding. This is what real AI-assisted coding looks like once you get past the initial wow factor of having the AI write code that executes and does what you asked.

This experience is familiar to every serious software engineer who has used AI code gen and then reviewed the output:

> But when I reviewed the codebase in detail in late January, the downside was obvious: the codebase was complete spaghetti. I didn’t understand large parts of the Python source extraction pipeline, functions were scattered in random files without a clear shape, and a few files had grown to several thousand lines. It was extremely fragile; it solved the immediate problem but it was never going to cope with my larger vision.

Some people never get to the part where they review the code. They go straight to their LinkedIn or blog and start writing (or having ChatGPT write) posts about how manual coding is dead and they’re done writing code by hand forever.

Some people review the code and declare it unusable garbage, then also go to their social media and post how AI coding is completely useless and they’re not going to use it for anything.

This blog post shows the journey that anyone not in one of those two vocal minorities is going through right now: a realization that AI coding tools can be a large accelerator, but that you need to learn how to use them correctly in your workflow and you need to stay involved in the code. It’s not as clickbaity as the extreme takes that get posted all the time, and it’s a little disappointing to learn that hard work is still required, but it is a realistic and balanced take on the state of AI coding.


I'm getting the impression that a lot of people in this thread think this is because they violated an open-source license, and are saying things to the effect of "they're just the ones who got caught". I also thought that was the scandal initially. (And when it comes to license violations, yes, there's absolutely more where that came from.)

But that's just the cherry on top. I don't think they're being thrown out because they violated a license. There are really serious fraud allegations. Allegedly they were rubber-stamping noncompliant customers, leaving those customers exposed to potential criminal liability under regulations like HIPAA.

https://deepdelver.substack.com/p/delve-fake-compliance-as-a...

I've only skimmed this so I do not endorse these allegations, but I think it's context missing from this discussion.


This book was SO GOOD.

It's bleak. I always imagined that rich/powerful people only created suffering if that suffering was required for certain goals. It's easier for me to bear injustice when it's a zero-sum game. But the story of Facebook is not that. Facebook didn't make ethical sacrifices for profit -- its executives just didn't care to understand the consequences of their actions. I wish those folks could feel how much harm they've caused.


> Schwartz's experiment is the most revealing, and not for the reason he thinks. What he demonstrated is that Claude can, with detailed supervision, produce a technically rigorous physics paper. What he actually demonstrated, if you read carefully, is that the supervision is the physics. Claude produced a complete first draft in three days. It looked professional. The equations seemed right. The plots matched expectations. Then Schwartz read it, and it was wrong. Claude had been adjusting parameters to make plots match instead of finding actual errors. It faked results. It invented coefficients. [...] Schwartz caught all of this because he's been doing theoretical physics for decades. He knew what the answer should look like. He knew which cross-checks to demand. [...] If Schwartz had been Bob instead of Schwartz, the paper would have been wrong, and neither of them would have known.

And so the paradox is, the LLMs are only useful† if you're Schwartz, and you can't become Schwartz by using LLMs.

Which means we need people like Alice! We have to make space for people like Alice, and find a way to promote her over Bob, even though Bob may seem to be faster.

The article gestures at this but I don't think it comes down hard enough. It doesn't seem practical. But we have to find a way, or we're all going to be in deep trouble when the next generation doesn't know how to evaluate what the LLMs produce!

---

† "Useful" in this context means "helps you produce good science that benefits humanity".


Having listened to the book on Audible, I'm both shocked at the behavior of the executive team, and not surprised all at the same time. What bothers me about all of this is what it says about us. It says we're willing to give rich and powerful people a pass just because they make overtures towards something we care about.

We wouldn't give our children a pass like this, nor would we teach our children to act this way, but we're perfectly willing to allow fully grown adults to act like this.

Here's just one example, there are plenty more:

Sheryl Sandberg inviting the author of the book to sleep in her bed next to her on the company jet, and the petulant and vindictive behavior when the author said 'no'.

Everyone in the orbit of the executive team knew about this behavior, and everyone gave it a pass, even going so far as to defend it and to protect Sheryl. This behavior should be universally deplored, and yet it is not.


This isn't surprising. What is not mentioned is that Claude Code also found a thousand false-positive bugs, which developers spent three months ruling out.

Really fascinating how this works; it's basically context-aware decoding. From the paper:

> Code interleaves fork positions, where several continuations are genuinely plausible and may correspond to different solution approaches, with lock positions, where syntax and semantics leave little ambiguity but a low-probability distractor tail still remains… The best global decoding setting is therefore necessarily a compromise; we call this tension the precision-exploration conflict.

In other words, just like us, the model needs to shift from "exploration" in "fork" mode (divergent thinking to produce a creative solution) to "precision" in "lock" mode (producing syntactically correct code).

What this paper shows is that their simple technique (SSD) can improve the ranking of optimal tokens in both lock and fork positions, meaning the model is more likely to explore when it should be exploring, and more likely to be precise when it needs to be.
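A rough way to picture the fork/lock distinction is entropy-gated decoding: be greedy where the distribution is peaked, sample where it is flat. To be clear, this is just an illustration of the precision-exploration trade-off the paper names, not the paper's actual SSD technique, and the threshold and toy distributions below are made up:

```python
import math
import random

def entropy(probs):
    # Shannon entropy (in bits) of a next-token distribution
    return -sum(p * math.log2(p) for p in probs if p > 0)

def adaptive_pick(tokens, probs, lock_threshold=0.5, rng=None):
    """Greedy at 'lock' positions (low entropy), sample at 'fork' positions.

    Illustrative only: a single global temperature cannot serve both kinds
    of position, which is the conflict the comment describes.
    """
    rng = rng or random.Random(0)
    if entropy(probs) < lock_threshold:
        # lock position: take the single dominant continuation,
        # cutting off the low-probability distractor tail
        return max(zip(tokens, probs), key=lambda tp: tp[1])[0]
    # fork position: sample, keeping genuinely plausible alternatives alive
    return rng.choices(tokens, weights=probs, k=1)[0]

# lock-like distribution: syntax leaves little ambiguity
print(adaptive_pick([")", "]", "}"], [0.98, 0.01, 0.01]))  # ")"
# fork-like distribution: several plausible solution approaches
print(adaptive_pick(["for", "while", "map"], [0.4, 0.35, 0.25]))
```

At the peaked distribution the gate goes greedy; at the flat one it samples, which is the behavior a single global decoding setting has to compromise on.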

I love that we're still learning the emergent properties of LLMs!


“They were careless people, Tom and Daisy- they smashed up things and creatures and then retreated back into their money or their vast carelessness or whatever it was that kept them together, and let other people clean up the mess they had made.” ― F. Scott Fitzgerald, The Great Gatsby

I suspect people are misdiagnosing the root cause of why Anthropic is doing this a bit.

I don't think this is particularly about the financial impact of people using OpenClaw - they can adjust the amount of tokens in a subscription quite easily.

I think the root cause is that Anthropic is capacity constrained so is having to make choices about the customers they want to serve and have chosen people who use Claude Code above other segments.

We know Anthropic weren't as aggressive as OpenAI through 2025 in signing huge capacity deals with the hyperscalers and instead signed smaller deals with more neo-clouds, and we know some of the neo-clouds have had trouble delivering capacity as quickly as they promised.

We also know Claude Code usage is growing very fast - almost certainly faster since December 2025 than Anthropic predicted 12 months ago when they were doing 12-month capacity planning.

We know Anthropic has suffered from brown-outs in Claude availability.

Put this all together and a reasonable hypothesis is that Anthropic is choosing which customers to service rather than raising prices.


Copilot is just Microsoft's term for AI. How many products have Copilot? Just about all of them.

America was in practice running an empire that collected tribute from the rest of planet earth in exchange for entries in a database denominated in a currency they controlled and that was accepted everywhere. Really the only way it could go wrong is putting it under the control of someone who doesn't understand the kayfabe...

In case someone is missing context, this is Google (apparently together with Meta, Microsoft, and Snap) coming out in favour of Chat Control legislation. This is something EU citizens have so far fought tooth and nail to repel. The fact that these US companies known for spying on people and invading privacy in the name of profit are lobbying for the legislation should be a warning to us all to avoid their services.

YC has no problem with morally questionable behavior; many YC startups do things that are just as shady, and YC is, ultimately, not responsible for what these startups choose to do. Delve’s problem is that they betrayed so many other YC companies in the process. An important value of being in YC is access to a ready-made customer base. The licensing issue is nothing compared to the fake audits, but it is an affront to the YC community; hence, kicked from the community.

I’m sure that if Delve had only engaged in fraudulent audits, or had only resold another YC company’s product, they would have been allowed to stay; the problem is that all of it combined pissed off enough other YC companies.


Author here. A few people are arguing against a stronger claim than the repo is meant to make. Also, this was very much intended as a joke, not research-level commentary.

This skill is not intended to reduce hidden reasoning / thinking tokens. Anthropic’s own docs suggest more thinking budget can improve performance, so I would not claim otherwise.

What it targets is the visible completion: less preamble, less filler, less polished-but-nonessential text. Since only the post-completion output is “cavemanned”, the code itself hasn’t been affected by the skill at all :)

Also surprising to hear so little faith in RL. I’m quite sure Anthropic’s models have been so heavily tuned to be coding agents that you cannot “force” a model to degrade immensely.

The fair criticism is that my “~75%” README number is from preliminary testing, not a rigorous benchmark. That should be phrased more carefully, and I’m working on a proper eval now.

Also yes, skills are not free: Anthropic notes they consume context when loaded, even if only skill metadata is preloaded initially.

So the real eval is end-to-end:

- total input tokens
- total output tokens
- latency
- quality/task success
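Those end-to-end numbers could be tallied with something like the sketch below. This is not the author's harness; `RunStats` and every figure in it are hypothetical, purely to show what a baseline-vs-skill comparison would measure:

```python
from dataclasses import dataclass

@dataclass
class RunStats:
    """Per-task measurements from one agent run (all fields hypothetical)."""
    input_tokens: int
    output_tokens: int
    latency_s: float
    success: bool

def summarize(runs):
    # Aggregate per-task stats into the end-to-end numbers that matter:
    # total tokens in/out, mean latency, and task success rate.
    n = len(runs)
    return {
        "total_input_tokens": sum(r.input_tokens for r in runs),
        "total_output_tokens": sum(r.output_tokens for r in runs),
        "mean_latency_s": sum(r.latency_s for r in runs) / n,
        "success_rate": sum(r.success for r in runs) / n,
    }

# made-up numbers: the skill adds some input tokens (loaded skill context)
# but shrinks the visible completion
baseline   = [RunStats(1200, 900, 4.1, True), RunStats(800, 700, 3.2, False)]
with_skill = [RunStats(1350, 400, 2.5, True), RunStats(950, 320, 2.1, True)]

print(summarize(with_skill)["total_output_tokens"])  # 720
```

The point of summarizing this way is that a skill can look like a win on output tokens alone while losing on total input tokens or task success, so all four dimensions need to be reported together.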

There is actual research suggesting concise prompting can reduce response length substantially without always wrecking quality, though it is task-dependent and can hurt in some domains. (https://arxiv.org/html/2401.05618v3)

So my current position is: interesting idea, narrower claim than some people think, needs benchmarks, and the README should be more precise until those exist.


It's pretty depressing that on a corner of the internet that's supposed to be a gathering of tech/geeks/nerds/stem people, discussing topics that "good hackers would find interesting", it's seemingly impossible to have a single thread about something like this that isn't almost entirely negative or political bickering.

The thing is, agents aren’t going away. So if Bob can do things with agents, he can do things.

I mourn the loss of working on intellectually stimulating programming problems, but that’s a part of my job that’s fading. I need to decide if the remaining work - understanding requirements, managing teams, what have you - is still enjoyable enough to continue.

To be honest, I’m looking at leaving software because the job has turned into a different sort of thing than what I signed up for.

So I think this article is partly right, Bob is not learning those skills which we used to require. But I think the market is going to stop valuing those skills, so it’s not really a _problem_, except for Bob’s own intellectual loss.

I don’t like it, but I’m trying to face up to it.


What if you "lose" your Google / Apple account, like this sanctioned judge of the International Criminal Court? Crazy to imagine that we are still baking dependency on US providers into European societies, even though there are clear indications we should be doing the opposite.

It reminds me of around 2002 when Microsoft named everything ".net".

> YC is, ultimately, not responsible for what these startups choose to do.

Of course they're responsible for their investments; they're just not liable. YC has a lot to answer for in the damage it's wreaked over the years.


I think Google has done some cool stuff, and I think in a lot of ways they're, at least historically, one of the less evil big tech players.

I gotta say, though, that my experience with trying to get them to sort out any kind of issue with their services makes me reluctant to spend any money with them.

I bought a Pixel phone. As per the sales terms, the phone came with one year of Gemini AI Pro service. Except, the redemption process to get the year of service didn't work for me. I contacted Google, they never fixed it or offered any solution. I simply didn't get the year of service I was promised.

My friend, who bought a Pixel around the same time, also wasn't able to get the year of Gemini they were promised.

That same friend has a Google One subscription, billed through their phone carrier. Recently, Google (or the provider?) discontinued that specific Google One plan, as well as the option to bill via your carrier. This was all covered in an email sent to my friend. As consolation, the email explained, my friend was given the option to switch to a different plan, billed monthly by Google (instead of their phone carrier), with 6 months free. Except, the new plan, and the 6 months free, wasn't selectable as a plan type for their account. So my friend emails Google about it and, to my complete lack of surprise, Google was unwilling/unable to provide any resolution.

At this point, I legitimately don't understand why, unless I had no other option, I would pick Google for services. They clearly put no real effort into resolving any service issues for any customer that's not spending millions with them.


In Finland we have an old saying: "If liquor, tar and sauna won’t help, an illness is fatal."

It's worth reading this follow-up LKML post by Andres Freund (who works on Postgres): https://lore.kernel.org/lkml/yr3inlzesdb45n6i6lpbimwr7b25kqk...

That's not what is happening right now. The bugs are often filtered later by LLMs themselves: if a second pipeline can't reproduce the crash / violation / exploit in any way, the false positives are usually evicted before ever reaching human scrutiny. Checking whether a reported vulnerability can actually be triggered is a trivial task compared to finding one, so this second pipeline is highly reliable: if a report passes it, it is almost certainly a real bug, and very few real bugs fail to pass it.

It doesn't matter how much LLMs advance; people ideologically against them will always deny that they have an enormous amount of usefulness. That's expected in the general population, but to see so many people on Hacker News refuse to believe their own eyes feels weird.
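The two-stage filter described above can be pictured as a simple triage loop. This is only a sketch of the shape of such a pipeline: `reproduce` is a hypothetical callback standing in for whatever harness actually re-runs the program against a candidate's trigger input, and the toy data is made up:

```python
def triage(candidates, reproduce):
    """Keep only candidate bugs whose crash/violation can be reproduced.

    `reproduce` is a hypothetical callback: it re-runs the target against a
    candidate's trigger input and reports whether the failure fires. Reports
    that don't reproduce never reach a human reviewer.
    """
    confirmed, rejected = [], []
    for bug in candidates:
        (confirmed if reproduce(bug) else rejected).append(bug)
    return confirmed, rejected

# toy stand-in: here a "real" bug is simply one with a non-empty trigger
candidates = [
    {"id": 1, "trigger": b"\x00" * 16},   # reproducible crash
    {"id": 2, "trigger": b""},            # hallucinated report
    {"id": 3, "trigger": b"overflow!"},   # reproducible crash
]
confirmed, rejected = triage(candidates, lambda bug: bool(bug["trigger"]))
print([b["id"] for b in confirmed])  # [1, 3]
```

The asymmetry the comment relies on lives entirely in `reproduce`: checking a concrete trigger is cheap and near-deterministic, so false positives are filtered out long before anyone spends three months on them.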

German implementer here. We have to use some kind of attestation mechanism per the eIDAS implementing acts. That doesn't work without operating system support.

The initial limitation to Google/Android is not great, we know that, and we have support for other OSs on our list (like, e.g., GrapheneOS). It is simply a matter of where we focus our energy at the moment, not that we don't see the issues.


German citizen here. So why is an implementation going forward when you already know it will not serve all citizens? Why are we not refusing to implement this until we know we can make it work on all devices?

Personally I recently switched from an AOSP based android without Google Play to Ubuntu Touch. In the future with better hardware support I will probably switch to postmarketOS.


I think attestation should be abolished altogether. An app should have absolutely no way of knowing what kind of device it’s running on or what changes the user has made to the system. It is up to each individual to ensure the security of their own device. App developers should do no more than offer recommendations. If someone wants to use GrapheneOS, root their device (not recommended), or run the whole thing in an emulator, a homemade compatibility layer under Linux, or a custom port for MS-DOS, that should be possible.

Requiring people to use products from one of two private American companies with a bad track record of locking people out of their accounts is more than “not great”. Some things are better not done if they can’t be done well.

remember when gambling was illegal?

and the idea of advertising gambling on television wasn't even something conceivable?

and, even more so, the idea that sports entertainment channels would be directly involved in the operation of gambling was just completely beyond comprehension?

ahhh, the remote, halcyon, bygone days of 2018...


I'm going to place an order for the book right now. I encourage you all to do the same.

We the people hold the power to keep in check the immoral companies, governments, and other unscrupulous entities that would exploit the collective to enrich the few. And ultimately that's through our money and how we spend it.

Screw Meta and their anti-human business model.

