> Another common argument I've heard is that Generative AI is helpful when you need to write code in a language or technology you are not familiar with. To me this also makes little sense.
I'm not sure I get this one. When I'm learning new tech I almost always have questions. I used to google them. If I couldn't find an answer I might try posting on Stack Overflow. Sometimes, as I was typing the question, their search would finally kick in and surface the answer (via similar questions). Other times I'd post the question and, if it didn't get closed, maybe get an answer a few hours or days later.
Now I just ask ChatGPT or Gemini and more often than not it gives me the answer. That alone and nothing else (agent modes, AI editing or generating files) is enough to increase my output. I get answers 10x faster than I used to. I'm not sure what that has to do with the point about learning. Getting answers to those questions is learning, regardless of where the answer comes from.
plasticeagle 10 hours ago [-]
ChatGPT and Gemini literally only know the answer because they read StackOverflow. Stack Overflow only exists because they have visitors.
What do you think will happen when everyone is using the AI tools to answer their questions? We'll be back in the world of Encyclopedias, in which central authorities spent large amounts of money manually collecting information and publishing it. And then they spent a good amount of time finding ways to sell that information to us, which was only fair because they spent all that time collating it. The internet pretty much destroyed that business model, and in some sense the AI "revolution" is trying to bring it back.
Also, he's specifically talking about having a coding tool write the code for you, he's not talking about using an AI tool to answer a question, so that you can go ahead and write the code yourself. These are different things, and he is treating them differently.
socalgal2 9 hours ago [-]
> ChatGPT and Gemini literally only know the answer because they read StackOverflow. Stack Overflow only exists because they have visitors.
I know this isn't true because I work on an API that has no answers on Stack Overflow (too new), nor does it have answers anywhere else. Yet the AI seems to be able to accurately answer many questions about it. To be honest, I've been somewhat shocked at this.
gwhr 5 hours ago [-]
What kind of API is it? Curious if it's a common problem that the AI was able to solve?
bbarnett 9 hours ago [-]
It is absolutely true, and AI cannot think, reason, comprehend anything it has not seen before. If you're getting answers, it has seen it elsewhere, or it is literally dumb, statistical luck.
That doesn't mean it knows the answer. That means it guessed or hallucinated correctly. Guessing isn't knowing.
edit: people seem to be missing my point, so let me rephrase. Of course AIs don't think, but that wasn't what I was getting at. There is a vast difference between knowing something, and guessing.
Guessing, even in humans, is just the human mind statistically and automatically weighing probabilities and suggesting what may be the answer.
This is akin to what a model might do, without any real information. Yet in both cases, there's zero validation that anything is even remotely correct. It's 100% conjecture.
It therefore doesn't know the answer, it guessed it.
When it comes to being correct about a language or API that there's zero info on, it's just pure happenstance that it got it correct. It's important to know the differences, and not say it "knows" the answer. It doesn't. It guessed.
One of the most massive issues with LLMs is we don't get a probability response back. You ask a human "Do you know how this works", and an honest and helpful human might say "No" or "No, but you should try this. It might work".
That's helpful.
Conversely a human pretending it knows and speaking with deep authority when it doesn't is a liar.
LLMs need more of this type of response, which indicates certainty or not. They're useless without this. But of course, an LLM indicating a lack of certainty, means that customers might use it less, or not trust it as much, so... profits first! Speak with certainty on all things!
demosthanos 5 hours ago [-]
This is wrong. I write toy languages and frameworks for fun. These are APIs that simply don't exist outside of my code base, and LLMs are consistently able to:
* Read the signatures of the functions.
* Use the code correctly.
* Answer questions about the behavior of the underlying API by consulting the code.
Of course they're just guessing if they go beyond what's in their context window, but don't underestimate context window!
bbarnett 5 hours ago [-]
So, you're saying you provided examples of the code and APIs and more, in the context window, and it succeeds? That sounds very much unlike the post I responded to, which claimed "no knowledge". You're also seemingly missing this:
"If you're getting answers, it has seen it elsewhere"
The context window is 'elsewhere'.
semiquaver 1 hours ago [-]
This is moving the goalposts vs the original claim upthread that LLMs are just regurgitating human-authored Stack Overflow answers and without those answers they would be useless.
It’s silly to say that something LLMs can reliably do is impossible and every time it happens it’s “dumb luck”.
demosthanos 5 hours ago [-]
If that's the distinction you're drawing then it's totally meaningless in the context of the question of where the information is going to come from if not Stack Overflow. We're never in a situation where we're using an open source library that has zero information about it: The code is by definition available to be put in the context window.
As they say, it sounds like you're technically correct, which is the best kind of correct. You're correct within the extremely artificial parameters that you created for yourself, but not in any real world context that matters when it comes to real people using these tools.
fnordpiglet 2 hours ago [-]
The argument is futile as the goalposts move constantly. In one moment the assertion is that it's just mega copy-paste; then, when evidence is shown that it's able to one-shot construct seemingly novel and correct answers from an API spec or grammar never seen before, the goalposts move to "it's unable to produce results on things it's never been trained on or in its context" - as if making up a fake language and asking it to write code in it, and its inability to do so without a grammar, is an indication of literally anything.
To anyone who has used these tools in anger it's remarkable, given they're only trained on large corpuses of language and feedback, that they're able to produce what they do. I don't claim they exist outside their weights, that's absurd. But the entire point of non-linear activation functions with many layers and parameters is to learn highly complex non-linear relationships. The fact that they can be trained as much as they are, on as much data as they have, without overfitting or gradient explosions means the very nature of language contains immense information in its encoding and structure, and the network, by definition of how it works and is trained, does -not- just return what it was trained on. It's able to curve-fit complex functions that interrelate semantic concepts that are clearly not understood as we understand them, but in some ways it represents an "understanding" that's sometimes perhaps more complex and nuanced than even we can manage.
Anyway the stochastic parrot euphemism misses the point that parrots are incredibly intelligent animals - which is apt since those who use that phrase are missing the point.
Workaccount2 3 hours ago [-]
>It is absolutely true, and AI cannot think, reason, comprehend anything it has not seen before. If you're getting answers, it has seen it elsewhere, or it is literally dumb, statistical luck.
How would you reconcile this with the fact that SOTA models are only a few TB in size? Trained on exabytes of data, yet only a few TB in the end.
Correct answers couldn't be dumb luck either, because otherwise the models would pretty much only hallucinate (the space of wrong answers is many orders of magnitude larger than the space of correct answers), similar to the early proto GPT models.
efavdb 2 hours ago [-]
Could it be that there is a lot of redundancy in the training data?
daveguy 1 hours ago [-]
> How would you reconcile this with the fact that SOTA models are only a few TB in size? Trained on exabytes of data, yet only a few TB in the end.
This is false. You are off by ~4 orders of magnitude by claiming these models are trained on exabytes of data. It is closer to 500TB of more curated data at most. Contrary to popular belief LLMs are not trained on "all of the data on the internet". I responded to another one of your posts that makes this false claim here:
You want to say this guy's experience isn't reproducible? That's one thing, but that's probably not the case unless you're assuming they're pretty stupid themselves.
You want to say that it IS reproducible, but that "that doesn't mean AI can think"? Okay, but that's not what the thread was about.
hombre_fatal 7 hours ago [-]
This doesn't seem like a useful nor accurate way of describing LLMs.
When I built my own programming language and used it to build a unique toy reactivity system and then asked the LLM "what can I improve in this file", you're essentially saying it "only" could help me because it learned how it could improve arbitrary code before in other languages and then it generalized those patterns to help me with novel code and my novel reactivity system.
"It just saw that before on Stack Overflow" is a bad trivialization of that.
It saw what on Stack Overflow? Concrete code examples that it generalized into abstract concepts it could apply to novel applications? Because that's the whole damn point.
skydhash 6 hours ago [-]
Programming languages, by their nature of being formal notation, only have a few patterns to follow, all of them listed in the grammar of that language. And then there are only so many libraries out there. I believe there are more unique comments and other code explanations out there than unique code patterns. Take something like MDN, where there's a full page of text for every JavaScript, HTML, and CSS symbol.
PeterStuer 9 hours ago [-]
What would convince you otherwise? The reason I ask is because you sound like you have made up your mind philosophically, not based on practical experience.
rsanheim 9 hours ago [-]
It's just pattern matching. Most APIs, and hell, most code, is not unique or special. It's all been done thousands of times before. That's why an LLM can be helpful on some tool you've written just for yourself and never released anywhere.
As to 'knows the answer', I don't even know what that means with these tools.
All I know is if it is helpful or not.
danielbln 5 hours ago [-]
Also, most problems are decomposable into simpler, certainly not novel parts. That intractable unicorn problem I hear so much about is probably composed of very pedestrian sub-problems.
CamperBob2 1 hours ago [-]
'Pattern matching' isn't just all you need, it's all there is.
jumploops 8 hours ago [-]
> It is absolutely true, and AI cannot think, reason, comprehend anything it has not seen before.
The amazing thing about LLMs is that we still don’t know how (or why) they work!
Yes, they’re magic mirrors that regurgitate the corpus of human knowledge.
But as it turns out, most human knowledge is already regurgitation (see: the patent system).
Novelty is rare, and LLMs have an incredible ability to pattern match and see issues in “novel” code, because they’ve seen those same patterns elsewhere.
Do they hallucinate? Absolutely.
Does that mean they’re useless? Or does that mean some bespoke code doesn’t provide the most obvious interface?
Having dealt with humans, the confidence problem isn’t unique to LLMs…
skydhash 6 hours ago [-]
> The amazing thing about LLMs is that we still don’t know how (or why) they work!
You may want to take a course in machine learning and read a few papers.
semiquaver 1 hours ago [-]
Parent is right. We know mechanically how LLMs are trained and used but why they work as well as they do is very much not known.
whateverbrah 3 hours ago [-]
That was sarcasm by the poster, in case you failed to notice.
js8 4 hours ago [-]
Sorry, but that's reductionism. We don't know how the human brain works, and you won't get there by studying quantum electrodynamics.
LLMs are insanely complex systems and their emergent behavior is not explained by the algorithm alone.
dboreham 4 hours ago [-]
Suspect you and the parent poster are thinking on different levels.
rainonmoon 5 hours ago [-]
> the corpus of human knowledge.
Goodness this is a dim view on the breadth of human knowledge.
jamesrcole 5 hours ago [-]
What do you object to about it? I don't see an issue with referring to "the corpus of human knowledge". "Corpus" pretty much just means a collection of something.
jazzyjackson 4 hours ago [-]
Human knowledge != Reddit/Twitter/Wikipedia
oezi 3 hours ago [-]
Conversely, what do you posit is part of human knowledge but isn't scrapable from the internet?
jazzyjackson 1 hours ago [-]
I mean, as far as a corpus goes, I suppose all text on the internet gets pretty close if most books are included, but even then you’re mostly looking at English language books that have been OCR’d.
But I look down my nose at conceptions that human knowledge is packagable as plain text, our lives, experience, and intelligence is so much more than the cognitive strings we assemble in our heads in order to reason. It’s like in that movie Contact when Jodie Foster muses that they should have sent a poet. Our empathy and curiosity and desires are not encoded in UTF8. You might say these are realms other than knowledge, but woe to the engineer who thinks they’re building anything superhuman while leaving these dimensions out, they’re left with a cold super-rationalist with no impulse to create of its own.
jamesrcole 3 hours ago [-]
Who said it was? I’m pretty sure they’re trained on a lot more than just those.
gejose 2 hours ago [-]
I'm sorry but this is a gross oversimplification. You can also apply this to the human brain.
"<the human brain> cannot think, reason, comprehend anything it has not seen before. If you're getting answers, it has seen it elsewhere, or it is literally dumb, statistical luck."
semiquaver 1 hours ago [-]
> ChatGPT and Gemini literally only know the answer because they read StackOverflow
Obviously this isn’t true. You can easily verify this by inventing and documenting an API and feeding that description to an LLM and asking it how to use it. This works well. LLMs are quite good at reading technical documentation and synthesizing contextual answers from it.
Taylor_OD 2 hours ago [-]
> ChatGPT and Gemini literally only know the answer because they read StackOverflow. Stack Overflow only exists because they have visitors.
I mean... They also can read actual documentation. If I'm doing any API work, or working in a language I'm not familiar with, I ask the LLM to include the source it got its answer from and to use official documentation when possible.
That lowers the hallucination rate significantly and also lets me ensure said function or code actually does what the LLM reports it does.
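Roughly what I mean, as a minimal Python sketch with the OpenAI client (the model name and the prompt wording here are just placeholders, not a recommendation):

    # Minimal sketch: tell the model up front to ground its answers in
    # official documentation and to name its sources. Placeholder model name.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any chat model works the same way
        messages=[
            {"role": "system", "content": (
                "When answering API questions, prefer official documentation. "
                "For every function or flag you mention, name the source "
                "(official docs URL or man page) so the reader can verify it."
            )},
            {"role": "user", "content": "How do I paginate results in the GitHub REST API?"},
        ],
    )
    print(resp.choices[0].message.content)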
In theory, all stackoverflow answers are just regurgitated documentation, no?
sothatsit 2 hours ago [-]
> I mean... They also can read actual documentation.
This 100%. I use o3 as my primary search engine now. It is brilliant at finding relevant sources, summarising what is relevant from them, and then also providing the links to those sources so I can go read them myself. The release of o3 was a turning point for me where it felt like these models could finally go and fetch information for themselves. 4o with web search always felt inadequate, but o3 does a very good job.
> In theory, all stackoverflow answers are just regurgitated documentation, no?
This is unfair to StackOverflow. There is a lot of debugging and problem solving that has happened on that platform of undocumented bugs or behaviour.
erikerikson 3 hours ago [-]
I broadly agree that new knowledge will still need to be produced, and that overuse of LLMs could undermine that, yet... when was the last time you paid to read an API's docs? It costs money for companies to make those too.
olmo23 9 hours ago [-]
Where does the knowledge come from? People can only post to SO if they've read the code or the documentation. I don't see why LLMs couldn't do that.
nobunaga 8 hours ago [-]
ITT: people who think LLMs are AGI and can produce output that the LLM has come up with out of thin air or by doing research. Go speak with someone who is actually an expert in this field about how LLMs work and why the training data is so important. I'm amazed that people in the CS industry seem to talk like they know everything about a tech after using it, without ever even writing a line of code for an LLM. Our industry is doomed with people like this.
usef- 8 hours ago [-]
This isn't about being AGI or not, and it's not "out of thin air".
Modern implementations of LLMs can "do research" by performing searches (whose results are fed into the context), or in many code editors/plugins, the editor will index the project codebase/docs and feed relevant parts into the context.
My guess is they either were using the LLM from a code editor, or one of the many LLMs that do web searches automatically (ie. all of the popular ones).
They are answering non-stackoverflow questions every day, already.
nobunaga 4 hours ago [-]
Yeah, doing web searches could be called research, but that's not what we are talking about. Read the parent of the parent. It's about being able to answer questions that aren't in its training data. People are talking about LLMs making scientific discoveries that humans haven't. A ridiculous take. It's not possible and, with the current state of tech, never will be. I know what LLMs are trained on. That's not the topic of conversation.
semiquaver 59 minutes ago [-]
> Its about being able to answer questions thats not in its training data.
This happens all the time via RAG. The model “knows” certain things via its weights, but it can also inject much more concrete post-training data into its context window via RAG (e.g. web searches for documentation), from which it can usefully answer questions about information that may be “not in its training data”.
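A minimal sketch of that flow in Python (the retrieval function and the passages are stand-ins invented for illustration; in practice the retrieval step is a web search, a vector index, or just grep over a repo):

    # Retrieval-augmented generation in miniature: fetch reference text at
    # query time and put it in the prompt, so the answer can be grounded in
    # material that was never in the training data.

    def search_docs(query: str) -> list[str]:
        # Stand-in for a real retrieval step; these passages are made up.
        return [
            "Authentication uses a bearer token in the Authorization header.",
            "GET /v1/widgets returns a paginated list; pass ?page=N to page.",
        ]

    def build_prompt(question: str, passages: list[str]) -> str:
        context = "\n".join(f"- {p}" for p in passages)
        return (
            "Answer using only the documentation below. "
            "If the answer is not there, say so.\n\n"
            f"Documentation:\n{context}\n\n"
            f"Question: {question}"
        )

    prompt = build_prompt("How do I authenticate?", search_docs("auth"))
    print(prompt)  # this combined string is what actually goes to the model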
oezi 2 hours ago [-]
A large part of research is just about creatively re-arranging symbolic information and LLMs are great at this kind of research. For example discovering relevant protein sequences.
planb 6 hours ago [-]
I think the time has come to not mean LLMs when talking about AI. An agent with web access can do so much more and hallucinates way less than "just" the model. We should start seeing the model as a building block of an AI system.
raincole 4 hours ago [-]
> LLM has come up with out of thin air
People don't think that. Especially not the commentor you replied to. You're human-hallucinating.
People think LLMs are trained on raw documents and code besides Stack Overflow. Which is very likely true.
nobunaga 4 hours ago [-]
Read the parent of the parent. It's about being able to answer questions that aren't in its training data. People are talking about LLMs making scientific discoveries that humans haven't. A ridiculous take. It's not possible and, with the current state of tech, never will be. I know what LLMs are trained on. That's not the topic of conversation.
CamperBob2 1 hours ago [-]
We'll start writing documentation for primary consumption by LLMs rather than human readers. The need for sites like SO will not vanish overnight but it will diminish drastically.
kypro 8 hours ago [-]
The idea that LLMs can only spew out text they've been trained on is a fundamental misunderstanding of how modern backprop training algorithms work. A lot of work goes into refining training algorithms to prevent overfitting of the training data.
Generalisation is something that neural nets are pretty damn good at, and given the complexity of modern LLMs, the idea that they cannot generalise the fairly basic logical rules and patterns found in code such that they're able to provide answers to inputs unseen in the training data is quite an extreme position.
fpoling 4 hours ago [-]
Yet the models do not (yet) reason. Try to ask them to solve a programming puzzle or exercise from an old paper book that was not scanned. They will produce total garbage.
Models work across programming languages because it turned out programming languages and API are much more similar than one could have expected.
socalgal2 11 hours ago [-]
To add, another experience I had. I was using an API I'm not that familiar with. My program was crashing. Looking at the stack trace I didn't see why. Maybe if I had many months experience with this API it would be obvious but it certainly wasn't to me. For fun I just copy and pasted the stack trace into Gemini. ~60 frames worth of C++. It immediately pointed out the likely cause given the API I was using. I fixed the bug with a 2 line change once I had that clue from the AI. That seems pretty useful to me. I'm not sure how long it would have taken me to find it otherwise since, as I said, I'm not that familiar with that API.
nottorp 10 hours ago [-]
You remember when Google used to do the same thing for you way before "AI"?
Okay, maybe sometimes the post about the stack trace was in Chinese, but a plain search used to be capable of giving the same answer as a LLM.
It's not that LLMs are better, it's that search got enshittified.
averageRoyalty 8 hours ago [-]
A horse used to get you places just like a car could. A whisk worked as well as a blender.
We have a habit of finding efficiencies in our processes, even if the original process did work.
chasd00 5 hours ago [-]
I don't think search used to do everything LLMs do now, but you have a very good point. Search has gotten much worse. I would say search is about the quality it was just before Google launched. My general search needs are being met more and more by Claude; I use Google only when I know very specific keywords, because of SEO spam and ads.
socalgal2 9 hours ago [-]
I remember when I could paste an error message into Google and get an answer. I do not remember pasting a 60 line stack trace into Google and getting an answer, though I'm pretty sure I honestly never tried that. Did it work?
0x000xca0xfe 5 hours ago [-]
Yes, pasting lots of seemingly random context into Google used to work shockingly well.
I could break most passwords of an internal company application by googling the SHA1 hashes.
It was possible to reliably identify plants or insects by just googling all the random words or sentences that would come to mind describing it.
(None of that works nowadays, not even remotely)
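(For anyone wondering why the hash trick worked: an unsalted SHA-1 of a common password is a fixed string, so cracker sites and forum posts had already indexed it. A toy illustration in Python, with an obviously made-up example password:)

    import hashlib

    # An unsalted hash depends only on the password, so the same password
    # always yields the same 40-character hex digest.
    digest = hashlib.sha1("password123".encode()).hexdigest()
    print(digest)  # paste this into a search engine (back then) and the
                   # plaintext would often come straight back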
Philpax 10 hours ago [-]
Google has never identified the logical error in a block of code for me. I could find what an error code was, yes, but it's of very little help when you don't have a keyword to search.
jasode 9 hours ago [-]
>You remember when Google used to do the same thing for you way before "AI"?
[...] stack trace [...], but a plain search used to be capable of giving the same answer as a LLM.
The "plain" Google Search before LLM never had the capability to copy&paste an entire lengthy stack trace (e.g. ~60 frames of verbose text) because long strings like that exceeds Google's UI. Various answers say limit of 32 words and 5784 characters: https://www.google.com/search?q=limit+of+google+search+strin...
Before LLM, the human had to manually visually hunt through the entire stack trace to guess at a relevant smaller substring and paste that into Google the search box. Of course, that's do-able but that's a different workflow than an LLM doing it for you.
To clarify, I'm not arguing that the LLM method is "better". I'm just saying it's different.
nottorp 6 hours ago [-]
That's a good point, because now that I think of it, I never pasted a full stack trace in a search engine. I selected what looked to be the relevant part and pasted that.
But I did it subconsciously. I never thought of it until today.
Another skill that LLM use can kill? :)
swader999 5 hours ago [-]
Those truly were the dark ages. I don't know how people did it. They were a different breed.
FranzFerdiNaN 9 hours ago [-]
It was just as likely that Google would point you towards a stackoverflow question that was closed because it was considered a duplicate of a completely different question.
nsonha 8 hours ago [-]
> when Google used to do the same thing for you way before "AI"?
Which is never? Do you often just lie to win arguments? An LLM gives you a synthesized answer; a search engine only returns what already exists. By definition it cannot give you anything that is not a super obvious match.
nottorp 7 hours ago [-]
> Which is never?
In my experience it was "a lot". Because my stack traces were mostly hardware related problems on arm linux in that period.
But I suppose your stack traces were much different and superior and no one can have stack traces that are different from yours. The world is composed of just you and your project.
> Do you often just lie to win arguments?
I do not enjoy being accused of lying by someone stuck in their own bubble.
When you said "Which is never" did you lie consciously or subconsciously btw?
SpaceNugget 3 hours ago [-]
According to a quick search on google, which is not very useful these days, the maximum query length is 32 words or 2000 characters and change depending on which answer you trust.
Whatever it is specifically, the idea that you could just paste a 600-line stack trace unmodified into Google, especially "way before AI", and get pointed to the relevant bit for your exact problem is obviously untrue.
raxxorraxor 4 hours ago [-]
For anything non-trivial you have to verify the results.
I disabled AI autocomplete and cannot understand how people can use it. It was mostly an extra key press on backspace for me.
That said, learning new languages is possible without searching anything. With a local model, you can do that offline and have a vast library of knowledge at hand.
The Gemini results integrated in Google are very bad as well.
The main problem I see isn't people lazily asking AI how to use the toilet, but that real knowledge bases like Stack Overflow and similar will vanish for lack of participation.
BlackFly 8 hours ago [-]
One of the many ways that search got worse over time was the promotion of blog spam over actual documentation. Generally, I would rather have good API documentation or a user guide that leads me through the problem so that next time I know how to help myself. Reading through good API documentation often also educates you about the overall design and associated functionality that you may need to use later. Reading the manual for technology that you will be regularly using is generally quite profitable.
Sometimes, a function doesn't work as advertised or you need to do something tricky, you get a weird error message, etc. For those things, stackoverflow could be great if you could find someone who had a similar problem. But the tutorial level examples on most blogs might solve the immediate problem without actually improving your education.
It would be similar to someone solving your homework problems for you. Sure you finished your homework, but that wasn't really learning. From this perspective, ChatGPT isn't helping you learn.
blueflow 8 hours ago [-]
Your parent searches for answers, you search for documentation. That's why AI works for him and not for you.
ryanackley 7 hours ago [-]
You're completely missing his point. If nobody figures things out for themselves, there's a risk that at some point AI won't have anything to learn from, since people will stop writing blog posts on how they figured something out and answering Stack Overflow questions.
Sure, there is a chance that one day AI will be smart enough to read an entire codebase and chug out exhaustively comprehensive and accurate documentation. I'm not convinced that is guaranteed to happen before our collective knowledge falls off a cliff.
blueflow 7 hours ago [-]
Read it again, slowly. FSVO "works":
Thats why AI works for him and not for you.
We both agree.
turtlebits 11 hours ago [-]
It's perfect for small boilerplate utilities. If I need a browser extension/tampermonkey script, I can get up and running quickly without having to read docs/write manifests. These are small projects where without AI, I wouldn't have bothered to even start.
At the very least, AI can be extremely useful for autocompleting simple code logic or automatically finding replacements when I'm copying code/config and making small changes.
perrygeo 2 hours ago [-]
> Getting answers to those question is learning, regardless of where the answer comes from.
Sort of. The process of working through the question is what drives learning. If you just receive the answer with zero effort, you are explicitly bypassing the brain's learning mechanism.
There's a huge difference between your workflow and fully agentic AIs, though.
Asking an AI for the answer in the way you describe isn't exactly zero effort. You need to formulate the question and mold the prompt to get your response, and integrate the response back into the project. And in doing so you're learning! So YOUR workflow has learning built in, because you actually use your brain before and after the prompt.
But not so with vibe coding and Agentic LLMs. When you hit submit and get the tokens automatically dumped into your files, there is no learning happening. Considering AI agents are effectively trying to remove any pre-work (ie automating prompt eng) and post-work (ie automating debugging, integrating), we can see Agentic AI as explicitly anti-learning.
Here's my recent vibe coding anecdote to back this up. I was working on an app for an e-ink tablet dashboard and the tech stack of least resistance was C++ with QT SDK and their QML markup language with embedded javascript. Yikes, lots of unfamiliar tech. So I tossed the entire problem at Claude and vibe coded my way to a working application. It works! But could I write a C++/QT/QML app again today - absolutely not. I learned almost nothing. But I got working software!
Eisenstein 2 hours ago [-]
The logical conclusion of this is 'the AI just solves the problem by coding without telling you about it'. If we think about 'what happens when everyone vibe-codes to solve their problems' then we get to 'the AI solves the problem for you, and you don't even see the code'.
Vibe-coding is just a stop on the road to a more useful AI and we shouldn't think of it as programming.
rich_sasha 1 hours ago [-]
I sort of disagree with this argument in TFA, as you say, though the rest of the article highlights a limitation. If I'm unfamiliar with the API, I can't judge whether the answer is good.
There is a sweet spot of situations I know well enough to judge a solution quickly, but not well enough to write code quickly, but that's a rather narrow case.
PeterStuer 9 hours ago [-]
I love learning new things. With AI I am learning more, and faster.
I used to be on the Microsoft stack for decades. Windows, Hyper-V, .NET, SQL Server ... .
Got tired of MS's licensing BS and I made the switch.
Not all of these were completely new, but I had never dove in seriously.
Without AI, this would have been a long and daunting project. AI made this so much smoother. It never tires of my very basic questions.
It does not always answer 100% correctly the first time (tip: paste in the docs of the specific version of the thing you are trying to figure out, as it sometimes has out-of-date or mixed-version knowledge), but most often it can be nudged and prodded to a very helpful result.
AI is just undeniably a better teacher than Google or Stack Overflow ever were. You still do the learning, but the AI is great at getting you to learn.
rootnod3 5 hours ago [-]
I might be an outlier, but I much prefer reading the documentation myself. One of the reasons I love using FreeBSD and OpenBSD as daily drivers. The documentation is just so damn good. Is it a pain in the ass at the beginning? Maybe. But I need far fewer documentation lookups over time and do not have to rely on AI for that.
Don't get me wrong, I tried. But even when pasting the documentation in, the number of times it just hallucinated parameters and arguments that were not even there was such a huge waste of time that I don't see the value in it.
greybox 9 hours ago [-]
I trust ChatGPT and Gemini a lot less than Stack Overflow. On Stack Overflow I can see the context in which the answer to the original question was given. AI does not do this. I've asked ChatGPT questions about CMake, for instance, that it got subtly wrong; if I had not noticed this it would have cost me a lot of time.
thedelanyo 10 hours ago [-]
So AI is basically best as a search engine.
jrm4 2 hours ago [-]
As I've said a bunch.
AI is a search engine that can also remix its results, often to good effect.
groestl 10 hours ago [-]
That's right.
cess11 10 hours ago [-]
I mean, it's just a compressed database with a weird query engine.
nikanj 10 hours ago [-]
And ChatGPT never closes your question without answer because it (falsely) thinks it's a duplicate of a different question from 13 years ago
nottorp 10 hours ago [-]
But it does give you a ready to copy paste answer instead of a 'teach the man how to fish' answer.
nikanj 9 hours ago [-]
I'd rather have a copy paste answer than a "go fish" answer
addandsubtract 9 hours ago [-]
Not if you prompt it to explain the answer it gives you.
nottorp 9 hours ago [-]
Not the same thing. Copying code, even with comprehensive explanations, teaches less than writing/adjusting your own code based on advice.
elbear 5 hours ago [-]
It can also do that if you ask it. It can give you exercises that you can solve. But you have to specifically ask, because by default it just gives you code.
nottorp 5 hours ago [-]
Of course, I originally was picking on Stack Overflow's moderation.
Which strongly discouraged trying to teach people.
yard2010 8 hours ago [-]
I think the main issue here is trust. When you google something you develop a sense for bullshit so you can "feel" the sources and weigh them accordingly. Using a chat bot, this bias doesn't hold, so you don't know what is just SEO bullshit reiterated in sweet words and what's not.
lexandstuff 15 hours ago [-]
Great article. The other thing that you miss out on when you don't write the code yourself is that sense of your subconscious working for you. Writing code has a side benefit of developing a really strong mental model of a problem, that kinda gets embedded in your neurons and pays dividends down the track, when doing stuff like troubleshooting or deciding on how to integrate a new feature. You even find yourself solving problems in your sleep.
I haven't observed any software developers operating at even a slight multiplier from the pre-LLM days at the organisations I've worked at. I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.
nerevarthelame 13 hours ago [-]
> I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.
I think that's a really elegant way to put it. Google Research tried to measure LLM impacts on productivity in 2024 [1]. They gave their subjects an exam and assigned them different resources (a book versus an LLM). They found that the LLM users actually took more time to finish than those who used a book, and that only novices on the subject material actually improved their scores when using an LLM.
But the participants also perceived that they were more accurate and efficient using the LLM, when that was not the case. The researchers suggested that it was due to "reduced cognitive load" - asking an LLM something is easy and mostly passive. Searching through a book is active and can feel more tiresome. Like you said: people are getting addicted to not having to expend brain energy to solve problems, and mistaking that for productivity.
You're twisting the results. Just because they took more time doesn't mean their productivity went down. On the contrary, if you can perform an expert task with much less mental effort (which 99% of orgs should prioritize), that is an absolute win. Work is an extremely mentally draining and soul-crushing experience for the majority of people; if AI can lower that while maintaining roughly the same result, with subjects allocating only, say, 25% of their mental energy – that's an amazing win.
didibus 13 hours ago [-]
If I follow what you are saying, employers won't see any benefits, but employees, while they will take the same time and create the same output in the same amount of time, will be able to do so at a reduced mental strain?
Personally, I don't know if this is always a win, mostly because I enjoy the creative and problem solving aspect of coding, and reducing that to something that is more about prompting, correcting, and mentoring an AI agent doesn't bring me the same satisfaction and joy.
Vinnl 8 hours ago [-]
Steelmanning their argument, employers will see benefits because while the employee might be more productive than with an LLM in the first two hours of the day, the cognitive load reduces their productivity as the day goes on. If employees are able to function at a higher level for longer during their day with an LLM, that should benefit the employer.
tsurba 9 hours ago [-]
And how long have you been doing this? Because that sounds naive.
After doing programming for a decade or two, the actual act of programming is not enough to be ”creative problem solving”, it’s the domain and set of problems you get to apply it to that need to be interesting.
>90% of programming tasks at a company are usually reimplementing things and algorithms that have been done a thousand times before by others, and you’ve done something similar a dozen times. Nothing interesting there. That is exactly what should and can now be automated (to some extent).
In fact solving problems creatively to keep yourself interested, when the problem itself is boring is how you get code that sucks to maintain for the next guy. You should usually be doing the most clear and boring implementation possible. Which is not what ”I love coding” -people usually do (I’m definitely guilty).
To be honest this is why I went back to get a PhD, ”just coding” stuff got boring after a few years of doing it for a living. Now it feels like I’m just doing hobby projects again, because I work exactly on what I think could be interesting for others.
AstroBen 2 hours ago [-]
> not having to expend brain energy to solve problems, and they're mistaking that for productivity
Couldn't this result in being able to work longer for less energy, though? With really hard mentally challenging tasks I find I cap out at around 3-4 hours a day currently
Like imagine if you could walk at running speed. You're not going faster.. but you can do it for way longer so your output goes up if you want it to
waprin 15 hours ago [-]
To some degree, traditional coding and AI coding are not the same thing, so it's not surprising that some people are better at one than the other. The author is basically saying that he's much better at coding than AI coding.
But it's important to realize that AI coding is itself a skill that you can develop. It's not just "pick the best tool and let it go". Managing prompts and managing context has a much higher skill ceiling than many people realize. You might prefer manual coding, but you might just be bad at AI coding, and you might prefer it if you improved at it.
With that said, I'm still very skeptical of letting the AI drive the majority of the software work, despite meeting people who swear it works. I personally am currently preferring "let the AI do most of the grunt work but get good at managing it and shepherding the high level software design".
It's a tiny bit like drawing vs photography, and if you look through that lens it's obvious that many people who draw might not like photography.
dspillett 9 hours ago [-]
> To some degree, traditional coding and AI coding are not the same thing
LLM-based¹ coding, at least beyond simple auto-complete enhancements (using it directly & interactively as what it is: Glorified Predictive Text) is more akin to managing a junior or outsourcing your work. You give a definition/prompt, some work is done, you refine the prompt and repeat (or fix any issues yourself), much like you would with an external human. The key differences are turnaround time (in favour of LLMs), reliability (in favour of humans, though that is mitigated largely by the quick turnaround), and (though I suspect this is a limit that will go away with time, possibly not much time) lack of usefulness for "bigger picture" work.
This is one of my (several) objections to using it: I want to deal with and understand the minutia of what I am doing, I got into programming, database bothering, and infrastructure kicking, because I enjoyed it, enjoyed learning it, and wanted to do it. For years I've avoided managing people at all, at the known expense of reduced salary potential, for similar reasons: I want to be a tinkerer, not a manager of tinkerers. Perhaps call me back when you have an AGI that I can work alongside.
--------
[1] Yes, I'm a bit of a stick-in-the-mud about calling these things AI. Next decade they won't generally be considered AI like many things previously called AI are not now. I'll call something AI when it is, or very closely approaches, AGI.
rwmj 8 hours ago [-]
Another difference is that your junior will, over time, learn, and you'll also get a sense of whether you can trust them. If after a while they aren't learning and you can't trust them, you get rid of them. GenAI doesn't gain knowledge in the same way, and you're always going to have the same level of trust in it (which in my experience is limited).
Also, if my junior argued back and was wrong repeatedly, that'd be bad. Lucky that has never happened with AIs ...
averageRoyalty 8 hours ago [-]
Cline, Roocode etc have the concept of rules that can be added to over time. There are heaps of memory bank and orchestration methods for AI.
LLMs absolutely can improve over time.
danielbln 5 hours ago [-]
> I want to be a tinkerer, not a manager of tinkerers.
We all want many things; that doesn't mean someone will pay you for them. You want to tinker? Great, awesome, more power to you, tinker on personal projects to your heart's content. However, if someone pays you to solve a problem, then it is your job to find the best, most efficient way to cleanly do it. Can LLMs do this on their own most of the time? I think not, not right now at least. The combination of skilled human and LLM? Most likely, yes.
dspillett 35 minutes ago [-]
If it gets to the point where I can't compete in the role with those using LLMs, I'll move on. I'm not happy with remote teams essentially being the only way of working these days (if you aren't working alone) anyway, and various other directions the industry has moved in (the shit-show that is client-side stack for instance!).
Maybe I'll retrain for lab work, I know a few people in the area, yeah I'd need a pay cut, but… Heck, I've got the mortgage paid, so I could take quite a cut and not be destitute, especially if I get sensible and keep my savings where they are and building instead of getting tempted to spend them! I don't think it'll get to that point for quite a few years though, and I might have been due to throw the towel in by that point anyway. It might be nice to reclaim tinkering as a hobby rather than a chore!
thefz 4 hours ago [-]
> I want to deal with and understand the minutia of what I am doing, I got into programming, database bothering, and infrastructure kicking, because I enjoyed it, enjoyed learning it, and wanted to do it
A million times yes.
And we live in a time in which people want to be called "programmers" because it's oh-so-cool, but aren't doing the work necessary to earn the title.
mitthrowaway2 13 hours ago [-]
The skill ceiling might be "high" but it's not like investing years of practice to become a great pianist. The most experienced AI coder in the world has about three years of practice working this way, much of which is obsoleted because the models have changed to the point where some lessons learned on GPT 3.5 don't transfer. There aren't teachers with decades of experience to learn from, either.
freehorse 9 hours ago [-]
Moreover, the "ceiling" may still be below the "code works" level, and you have no idea when you start if it is or not.
dr_dshiv 11 hours ago [-]
It’s mostly attitude that you are learning. Playfulness, persistence and a willingness to start from scratch again and again.
suddenlybananas 11 hours ago [-]
>persistence and a willingness to start from scratch again and again.
i.e. continually gambling and praying the model spits something out that works instead of thinking.
tsurba 10 hours ago [-]
Gambling is where I end up if I’m tired and try to get an LLM to build my hobby project for me from scratch in one go, not really bothering to read the code properly. It’s stupid and a waste of time. Sometimes it’s easier to get started this way though.
But more seriously, in the ideal case refining a prompt based on a misunderstanding of an LLM due to ambiguity in your task description is actually doing the meaningful part of the work in software development. It is exactly about defining the edge cases, and converting into language what is it that you need for a task. Iterating on that is not gambling.
But of course, if you are not doing that, but just trying to coax a "smarter" LLM with (hopefully soon to be deprecated) "prompt engineering" tricks, then that is about building yourself a skill that can become useless tomorrow.
HPsquared 10 hours ago [-]
Most things in life are like that.
chii 8 hours ago [-]
why is the process important? If they can continuously trial and error their way into a good output/result, then it's a fine outcome.
suddenlybananas 7 hours ago [-]
Why is thinking important? Think about it a bit.
chii 7 hours ago [-]
is it more important for a chess engine to be able to think? Or is it able to win by brute force through searching a sufficient outcome?
If the outcome is indistinguishable from using "thinking" as the process rather than brute force, why would the process matter regarding how the outcome was achieved?
suddenlybananas 5 hours ago [-]
maybe if programming were a well-defined game like chess, but it's not.
chii 5 hours ago [-]
the grammar of a programming language is just as well defined. And the defined-ness of the "game" isn't required for my argument.
Your concept of thinking is the classic rhetoric - as soon as some "AI" manages to achieve something it previously wasn't capable of, it's no longer AI and is just the xyz process. It happened with chess engines, with AlphaGo, and with LLMs. The implication being that human "thinking" is somehow unique, and only AI that replicates it can be considered to be "thinking".
notnullorvoid 13 hours ago [-]
Is it a skill worth learning though? How much does the output quality improve? How transferable is it across models and tools of today, and of the future?
From what I see of AI programming tools today, I highly doubt the skills developed are going to transfer to tools we'll see even a year from now.
vidarh 10 hours ago [-]
Given I see people insisting these tools don't work for them at all, and some of my results recently include spitting out a 1k line API client with about 5 brief paragraphs of prompts, and designing a website (the lot, including CSS, HTML, copy, database access) and populating the directory on it with entries, I'd think the output quality improves a very great deal.
From what I see of the tools, I think the skills developed largely consists of skills you need to develop as you get more senior anyway, namely writing detail-oriented specs and understanding how to chunk tasks. Those skills aren't going to stop having value.
notnullorvoid 3 hours ago [-]
If I had a greenfield project that was low on novelty I would happily use AI to get a prototype out the door quickly. I basically never work on those kinds of projects, though, and I've seen AI tools royally screw up enough times given clear direction, on both novel and trivial tasks, in existing code bases.
Detailed specs are certainly a transferable skill; what isn't is the tedious hand-holding and defensive prompting. In my entire career I've worked with a lot of people, and only one required as much hand-holding as AI. That person was using AI to do all their work.
jyounker 36 minutes ago [-]
Describing things in enough detail that someone else can implement them is a pretty important skill. Learning how to break up a large project into smaller tasks that you can then delegate to others is also a pretty important skill.
npilk 5 hours ago [-]
Maybe this is yet another application of the bitter lesson. It's not worth learning complex processes for partnering with AI models, because any productivity gains will pale in comparison to the performance improvement from future generations.
notnullorvoid 3 hours ago [-]
Perhaps... Even if I'm being optimistic though there is a ceiling for just how much productivity can be gained. Natural language is much more lossy compared to programming languages, so you'll still need a lot of natural language input to get the desired output.
serpix 13 hours ago [-]
Regarding using AI tools for programming, it is not an all-or-nothing choice. You can pick a grunt-work task such as "Tag every such and such terraform resource with a uuid" and let it do just that. Nothing to do with quality, but everything to do with a simple task and not having to bother with the tedium.
autobodie 12 hours ago [-]
Why use AI to do something so simple? You're only increasing the possibility that it gets done wrong. Multi-cursor editing will be faster anyway.
barsonme 11 hours ago [-]
Why not? I regularly have a couple Claude instances running in the background chewing through simple yet time consuming tasks. It’s saved me many hours of work and given me more time to focus on the important parts.
dotancohen 10 hours ago [-]
> a couple Claude instances running in the background chewing through simple yet time consuming tasks.
If you don't mind, I'd love to hear more about this. How exactly are they running the background? What are they doing? How do you interact with them? Do they have access to your file system?
Thank you!
Philpax 9 hours ago [-]
I would guess that they're running multiple instances of Claude Code [0] in the background. You can give it arbitrary tasks up to a complexity ceiling that you have to figure out for yourself. It's a CLI agent, so you can just give it directives in the relevant terminal. Yes, they have access to the filesystem, but only what you give them.
Those tasks can take hours, or at least long enough where multiple tasks are running in the background? The page says $17 per month. That's unlimited usage?
If so, it does seem that AI just replaced me at my job... don't let them know. A significant portion of my projects are writing small business tools.
Philpax 6 hours ago [-]
> Those tasks can take hours, or at least long enough where multiple tasks are running in the background?
Maybe not hours, but extended periods of time, yes. Agents are very quick, so they can frequently complete tasks that would have taken me hours in minutes.
> The page says $17 per month. That's unlimited usage?
Each plan has a limited quota; the Pro plan offers you enough to get in and try out Claude Code, but not enough for serious use. The $100 and $200 plans still have quotas, but they're quite generous; people have been able to get orders of magnitude of API-cost-equivalents out of them [0].
> If so, it does seem that AI just replaced me at my job... don't let them know. A significant portion of my projects are writing small business tools.
Perhaps, but for now, you still need to have some degree of vague competence to know what to look out for and what works best. Might I suggest using the tools to get work done faster so that you can relax for the rest of the day? ;)
With such tedious tasks does it not take you just as long to verify it didn't screw up than if you had done it yourself?
stitched2gethr 12 hours ago [-]
It will very soon be the only way.
skydhash 15 hours ago [-]
> But it's important to realize that AI coding is itself a skill that you can develop. It's not just , pick the best tool and let it go. Managing prompts and managing context has a much higher skill ceiling than many people realize
No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spent setting things up). But it's not like GDB or using UNIX as an IDE, where you need a whole book just to get started.
> It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.
While they share a lot of principles (around composition, poses,...), they are different activities with different output. No one conflates the two. You don't draw and think you're going to capture a moment in time. The intent is to share an observation with the world.
furyofantares 13 hours ago [-]
> No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spent setting things up). But it's not like GDB or using UNIX as an IDE, where you need a whole book just to get started.
The skill floor is something you can pick up in a few minutes and find it useful, yes. I have been spending dedicated effort toward finding the skill ceiling and haven't found it.
I've picked up lots of skills in my career, some of which were easy, but some of which required dedicated learning, or practice, or experimentation. LLM-assisted coding is probably in the top 3 in terms of effort I've put into learning it.
I'm trying to learn the right patterns to use to keep the LLM on track and keeping the codebase in check. Most importantly, and quite relevant to OP, I'd like to use LLMs to get work done much faster while still becoming an expert in the system that is produced.
Finding the line has been really tough. You can get a LOT done fast without this requirement, but personally I don't want to work anywhere that has a bunch of systems that nobody's an expert in. On the flip side, as in the OP, you can have this requirement and end up slower by using an LLM than by writing the code yourself.
philomath_mn 3 hours ago [-]
Anywhere I can follow your takes on LLM-assisted coding?
oxidant 15 hours ago [-]
I do not agree it is something you can pick up in an hour. You have to learn what AI is good at, how different models code, how to prompt to get the results you want.
If anything, prompting well is akin to learning a new programming language. What words do you use to explain what you want to achieve? How do you reference files/sections so you don't waste context on meaningless things?
I've been using AI tools to code for the past year and a half (Github Copilot, Cursor, Claude Code, OpenAI APIs) and they all need slightly different things to be successful and they're all better at different things.
AI isn't a panacea, but it can be the right tool for the job.
15123123 14 hours ago [-]
I am also interested in how much these skills are at the mercy of OpenAI. Like, IIRC, 1 or 2 years ago there was an uproar among AI "artists" saying that their art was ruined because of model changes (or maybe the system prompt changed).
>I do not agree it is something you can pick up in an hour.
But it's also interesting that the industry is selling the opposite ( with AI anyone can code / write / draw / make music ).
>You have to learn what AI is good at.
More often than not I find you need to learn what the AI is bad at, and this is not a fun experience.
oxidant 12 hours ago [-]
Of course that's what the industry is selling because they want to make money. Yes, it's easy to create a proof of concept but once you get out of greenfield into 50-100k tokens needed in the context (reading multiple 500 line files, thinking, etc) the quality drops and you need to know how to focus the models to maintain the quality.
"Write me a server in Go" only gets you so far. What is the auth strategy, what endpoints do you need, do you need to integrate with a library or API, are there any security issues, how easy is the code to extend, how do you get it to follow existing patterns?
I find I need to think AND write more than I would if I was doing it myself because the feedback loop is longer. Like the article says, you have to review the code instead of having implicit knowledge of what was written.
That being said, it is faster for some tasks, like writing tests (if you have good examples) and doing basic scaffolding. It needs quite a bit of hand holding which is why I believe those with more experience get more value from AI code because they have a better bullshit meter.
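To make the "write me a server in Go" point concrete, the difference is roughly between that one-liner and a prompt like this (purely an illustrative sketch; the endpoint, paths and file names are made up):

```
Add a POST /projects/:id/archive endpoint to the existing Go server.
Follow the handler/service/repository layout used in internal/projects/.
Auth: reuse the existing session middleware; only project owners may archive.
Don't add new dependencies. Extend the table-driven tests in
internal/projects/handler_test.go and call out any security concerns.
```

Most of the work is answering those questions up front; the prompt is just where the answers end up.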
skydhash 7 hours ago [-]
> What is the auth strategy, what endpoints do you need, do you need to integrate with a library or API, are there any security issues, how easy is the code to extend, how do you get it to follow existing patterns?
That is the realm of software engineering, not of using LLMs. You have to answer all of these questions even with traditional coding. Because they’re not coding questions, they’re software design questions. And before that, there were software analysis questions preceded by requirements gathering questions.
A lot of replies around the thread are conflating coding activities with the parent set of software engineering activities.
oxidant 3 hours ago [-]
Agreed, but people sell "vibe coding" without acknowledging you need more than vibes.
LLMs can help answer the questions. However, they're not going to necessarily make the correct choices or implementation without significant input from the user.
solumunus 13 hours ago [-]
OpenAI? They are far from the forefront here. No one is using their models for this.
15123123 11 hours ago [-]
You can substitute the SaaS company of your choice.
viraptor 14 hours ago [-]
> It's something you can pick up in a few minutes
You can start in a few minutes, sure. (Also you can start using gdb in minutes) But GP is talking about the ceiling. Do you know which models work better for what kind of task? Do you know what format is better for extra files? Do you know when it's beneficial to restart / compress context? Are you using single prompts or multi stage planning trees? How are you managing project-specific expectations? What type of testing gives better results in guiding the model? What kind of issues are more common for which languages?
Correct prompting is what makes the difference these days in benchmarks like SWE-bench Verified.
sothatsit 14 hours ago [-]
I feel like there is also a very high ceiling to how much scaffolding you can produce for the agents to get them to work better. This includes custom prompts, custom CLAUDE.md files, other documentation files for Claude to read, and especially how well and quickly your linting and tests can run, and how much functionality they cover. That's not to mention MCP and getting Claude to talk to your database or open your website using Playwright, which I have not even tried yet.
For example, I have a custom planning prompt that I will give a paragraph or two of information to, and then it will produce a specification document from that by searching the web and reading the code and documentation. And then I will review that specification document before passing it back to Claude Code to implement the change.
This works because it is a lot easier to review a specification document than it is to review the final code changes. So, if I understand it and guide it towards how I would want the feature to be implemented at the specification stage, that sets me up to have a much easier time reviewing the final result as well. Because it will more closely match my own mental model of the codebase and how things should be implemented.
And it feels like that is barely scratching the surface of setting up the coding environment for Claude Code to work in.
freehorse 9 hours ago [-]
And where will all this skill go when, a year from now, newer models use different tools and require different scaffolding?
The problem with overinvesting in a brand new, developing field is that you get skills that are soon to be redundant. You can hope that the skills are gonna transfer to what will be needed after, but I am not sure if that will be the case here. There was a lot of talk about prompting techniques ("prompt engineering") last year, and now most of these are redundant and I really don't think I have learnt something that is useful enough for the new models, nor have I actually understood something. These are all tips-and-tricks level, shallow stuff.
I think these skills are just like learning how to use some tools in an IDE. They increase productivity, which is great, but if you have to switch IDEs they may not actually help you with the new things you have to learn in the new environment. Moreover, these are just skills in how to use some tools; they allow you to do things, but we cannot compare learning how to use tools vs actually learning and understanding the structure of a program. The former is obviously a shallow form of knowledge/skill, easily replaceable, easily redundant and probably not transferable (in the current context). I would rather invest more time in the latter and actually get somewhere.
sothatsit 7 hours ago [-]
A lot of the changes to get agents to work well are just good practice anyway. That's what is nice about getting these agents to work well - often, it just involves improving your dev tooling and documentation, which can help real human developers as well. I don't think this is going to become irrelevant any time soon.
The things that will change may be prompts or MCP setups or more specific optimisations like subagents. Those may require more consideration of how much you want to invest in setting them up. But the majority of setup you do for Claude Code is not only useful to Claude Code. It is useful to human developers and other agent systems as well.
> There was a lot of talk about prompting techniques ("prompt engineering") last year and now most of these are redundant.
Not true, prompting techniques still matter a lot to a lot of applications. It's just less flashy now. In fact, prompting techniques matter a ton for optimising Claude Code and creating commands like the planning prompt I created. It matters a lot when you are trying to optimise for costs and use cheaper models.
> I think these skills are just like learning how to use some tools in an ide.
> if you have to switch ide they may not actually help you
A lot of the skills you learn in one IDE do transfer to new IDEs. I started using Eclipse and that was a steep learning curve. But later I switched to IntelliJ IDEA and all I had to re-learn were key-bindings and some other minor differences. The core functionality is the same.
Similarly, a lot of these "agent frameworks" like Claude Code are very similar in functionality, and switching between them as the landscape shifts is probably not as large of a cost as you think it is. Often it is just a matter of changing a model parameter or changing the command that you pass your prompt to.
Of course it is a tradeoff, and that tradeoff probably changes a lot depending upon what type of work you do, your level of experience, how old your codebases are, how big your codebases are, the size of your team, etc... it's not a slam dunk that it is definitely worthwhile, but it is at least interesting.
viraptor 13 hours ago [-]
> then it will produce a specification document from that
I like a similar workflow where I iterate on the spec, then convert that into a plan, then feed that step by step to the agent, forcing full feature testing after each one.
bcrosby95 13 hours ago [-]
When you say specification, what, specifically, does that mean? Do you have an example?
I've actually been playing around with languages that separate implementation from specification under the theory that it will be better for this sort of stuff, but that leaves an extremely limited number of options (C, C++, Ada... not sure what else).
I've been using C and the various LLMs I've tried seem to have issues with the lack of memory safety there.
sothatsit 12 hours ago [-]
A "specification" as in a text document outlining all the changes to make.
For example, it might include: Overview, Database Design (Migration, Schema Updates), Backend Implementation (Model Updates, API updates), Frontend Implementation (Page Updates, Component Design), Implementation Order, Testing Considerations, Security Considerations, Performance Considerations.
It sounds like a lot when I type it out, but it is pretty quick to read through and edit.
The specification document is generated by a planning prompt that tells Claude to analyse the feature description (the couple paragraphs I wrote), research the repository context, research best practices, present a plan, gather specific requirements, perform quality control, and finally generate the planning document.
I'm not sure if this is the best process, but it seems to work pretty well.
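If it helps to make that concrete, the planning prompt itself is nothing magical - it's basically a checklist along these lines (a simplified sketch rather than my exact prompt; the output path is just an example):

```
You are preparing a specification document for a feature.

1. Read the feature description I give you (a paragraph or two).
2. Research the repository: existing modules, patterns and conventions
   the change has to fit into.
3. Research best practices for the libraries involved (search the web if needed).
4. Present a high-level plan and ask me any clarifying questions.
5. Do a quality-control pass: migrations, tests, security, performance.
6. Write the final specification to docs/specs/<feature>.md with sections for
   Overview, Database Design, Backend, Frontend, Implementation Order,
   and Testing/Security/Performance Considerations.
```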
viraptor 12 hours ago [-]
Like a spec you'd hand to a contractor. List of requirements, some business context, etc. Not a formal algorithm spec.
My basic initial prompt for that is: "we're creating a markdown specification for (...). I'll start with basic description and at each step you should refine the spec to include the new information and note what information is missing or could use refinement."
sagarpatil 14 hours ago [-]
Yeah, you can’t do sh*t in an hour.
I spend a good 6-8 hours every day using Claude Code, and I actually spend an hour every day trying new AI tools, it’s a constant process.
It definitely takes more than minutes to discover the ways that your model is going to repeatedly piss you off and set up guardrails to mitigate those problems.
JimDabell 13 hours ago [-]
> It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up).
This doesn’t give you any time to experiment with alternative approaches. It’s equivalent to saying that the first approach you try as a beginner will be as good as it possibly gets, that there’s nothing at all to learn.
dingnuts 15 hours ago [-]
> You might prefer manual coding, but you might just be bad at AI coding and you might prefer it if you improved at it.
ok but how much am I supposed to spend before I supposedly just "get good"? Because based on the free trials and the pocket change I've spent, I don't consider the ROI worth it.
qinsig 15 hours ago [-]
Avoid using agents that can just blow through money (cline, roocode, claudecode with API key, etc).
Instead you can get comfortable prompting and managing context with aider.
Or you can use claude code with a pro subscription for a fair amount of usage.
I agree that seeing the tools waste several dollars just to make a mess you need to discard is frustrating.
goalieca 15 hours ago [-]
And how often do your prompting skills have to change as the models evolve?
badsectoracula 13 hours ago [-]
It won't be the hippest of solutions, but you can use something like Devstral Small with a full open source setup to start experimenting with local LLMs and a bunch of tools - or just chat with it through a chat interface. I ping-ponged between Devstral running as a chat interface and my regular text editor some time ago to make a toy raytracer project [0] (output) [1] (code).
While it wasn't the fanciest integration (nor the best of codegen), it was good enough to "get going" (the loop was to ask the LLM to do something, then me do something else in the background, then fix and merge the changes it made - even though i often had to fix stuff[2], sometimes it was less of a hassle than if i had to start from scratch[3]).
It can give you a vague idea that with more dedicated tooling (i.e. something that does automatically what you'd do by hand[4]) you could do more interesting things (combining with some sort of LSP functionality to pass function bodies to the LLM would also help), though personally i'm not a fan of the "dedicated editor" that seems to be used and i think something more LSP-like (especially if it can also work with existing LSPs) would be neat.
IMO it can be useful for a bunch of boilerplate-y or boring work. The biggest issue i can see is that the context is too small to include everything (imagine, e.g., throwing the entire Blender source code at an LLM, which i don't think even the largest of cloud-hosted LLMs can handle), so there needs to be some external way to store stuff dynamically, but also for the LLM to know that the external stuff is available, look it up and store things if needed. Not sure how exactly that'd work though, to the extent where you could -say- open up a random Blender source code file, point to a function, ask the LLM to make a modification, have it reuse any existing functions in the codebase where appropriate (without you pointing them out) and then, if needed, have the LLM also update the code where the function you modified is used (e.g. if you added/removed some argument or changed the semantics of its use).
[2] e.g. when i asked it to implement a BVH to speed up things it made something that wasn't hierarchical and actually slowed down things
[3] the code it produced for [2] was fixable to do a simple BVH
[4] i tried a larger project and wrote a script that `cat`ed and `xclip`ed a bunch of header files to pass to the LLM so it knows the available functions and each function had a single line comment about what it does - when the LLM wrote new functions it also added that comment. 99% of these oneliner comments were written by the LLM actually.
grogenaut 15 hours ago [-]
how much time did you spend learning your last language to become comfortable with it?
stray 15 hours ago [-]
You're going to spend a little over $1k to ramp up your skills with AI-aided coding. It's dirt cheap in the grand scheme of things.
viraptor 14 hours ago [-]
Not even close. I'm still under $100, creating full apps. Stick to reasonable models and you can achieve and learn a lot. You don't need the latest and greatest in max mode (or whatever the new one calls it) for the majority of tasks. You don't have to throw the whole project context at the service every time either.
dingnuts 15 hours ago [-]
do I get a refund if I spend a grand and I'm still not convinced? at some point I'm going to start lying to myself to justify the cost and I don't know how much y'all earn but $1k is getting close
theoreticalmal 14 hours ago [-]
Would you ask for a refund from a university class if you didn’t get a job or skill from it? Investing in a potential skill is a risk and carries an opportunity cost, that’s part of what makes it a risk
HDThoreaun 14 hours ago [-]
No one is forcing you to improve. If you don’t want to invest in yourself that is fine, you’ll just be left behind.
asciimov 14 hours ago [-]
How are those without that kind of scratch supposed to keep up with those that do?
theoreticalmal 14 hours ago [-]
This kind of seems like asking “how are poor people supposed to keep up with rich people” which we seem to not have a long term viable answer for right now
wiseowise 13 hours ago [-]
What makes you think those without that kind of scratch are supposed to keep up?
asciimov 12 hours ago [-]
For the past 10 years we have been telling everyone to learn to code, now it’s learn to build AI prompts.
Before, a poor kid with computer access could learn to code nearly for free, but if it costs $1k just to get started with AI, that poor kid will never have that opportunity.
wiseowise 12 hours ago [-]
For the past 10 years scammers and profiteers have been telling everyone to learn to code, not we.
sagarpatil 14 hours ago [-]
Use free tiers?
throwawaysleep 14 hours ago [-]
If you lack "that kind of scratch", you are at the learning stage for software development, not the keeping up stage. Either that or horribly underpaid.
bevr1337 13 hours ago [-]
I recently had a coworker tell me he liked his last workplace because "we all spoke the same language." It was incredible how much he revealed about himself with what he thought was a simple fact about engineer culture. Your comment reminds me of that exchange.
- Employers, not employees, should provide workplace equipment or compensation for equipment. Don't buy bits for the shop, nails for the foreman, or Cursor for the tech lead.
- The workplace is not a meritocracy. People are not defined by their wealth.
- If $1,000 does not represent an appreciable amount of someone's assets, they are doing well in life. Approximately half of US citizens cannot afford rent if they lose a paycheck.
- Sometimes the money needs to go somewhere else. Got kids? Sick and in the hospital? Loan sharks? A pool full of sharks and they need a lot of food?
- Folks can have different priorities and it's as simple as that
We (my employer, that is) are still unsure if new dev tooling is improving productivity. If we find out it was unhelpful, I'll be very glad I didn't lose my own money.
15123123 13 hours ago [-]
$100 per month for a SaaS is quite a lot outside of Western countries. People are not even spending that much on a VPN or a password manager.
rwmj 8 hours ago [-]
> What I think happens is that these people save time because they only spot review the AI generated code, or skip the review phase altogether, which as I said above would be a deal breaker for me.
In my experience it's that they dump the code into a pull request and expect me to review it. So GenAI is great if someone else is doing the real work.
anelson 7 hours ago [-]
I’ve experienced this as well. If management is not competent they can’t tell (or don’t want to hear) when a “star” performer is actually a very expensive wrapper around a $20/mo cursor subscription.
Unlike the author of the article I do get a ton of value from coding agents, but as with all tools they are less than useless when wielded incompetently. This becomes more damaging in an org that already has perverse incentives which reward performative slop over diligent and thoughtful engineering.
skydhash 6 hours ago [-]
Git blame can do a lot in those situations. Find the general location of the bug, then assign everyone that has touched it to the ticket.
cardanome 5 hours ago [-]
Is that really something you are doing in your job?
Most of my teams have been very allergic to assigning personal blame and management very focused on making sure everyone can do everything and we are always replaceable. So maybe I could phrase it like "X could help me with this" but saying X is responsible for the bug would be a no no.
skydhash 5 hours ago [-]
Not really. I was talking more in the context of the parent comment. If your management is dysfunctional, allowing AI slop without the accountability, then you go with this extreme measure.
I don't mind fixing bugs, but I do mind reckless practices that introduce them.
danielbln 5 hours ago [-]
I don't understand this, the buck stops with the PR submitter. If they get repeated feedback about their PRs that are just passed-through AI slop, then the team lead or whatever should give them a stern talking to.
pera 4 hours ago [-]
That would be a reasonable thing to do, unfortunately this doesn't always happen. Say for example that your company is quite behind schedule and decides to pay some cheap contractors to work on anything that doesn't require domain expertise: in 2025 these cheap contractors will 100% vibe code their way through their assigned tickets. They will open PRs that look "nearly there" and basically hope for all green checks in your CI/CD pipeline. If that doesn't happen then they will try to bruteforce^W vibe code the PR for a couple of hours. If it still doesn't pass, then they claim that the PR is ready but there is something wrong, for example with an external component which they can't touch due to contractual reasons...
One of the most bizarre experiences I have had over this past year was dealing with a developer who would screen share a ChatGPT session where they were trying to generate a test payload with a given schema, getting something that didn't pass schema validation, and then immediately telling me that there must be a bug in the validator (from Apache foundation). I was truly out of words.
mdavid626 16 minutes ago [-]
AI offers many solutions and also brings many problems. Some people like to see only the good side and act as if there were no bad side.
One of the biggest problems I see with AI is that it gets people used to NOT thinking. It takes lots of time and energy to learn to program and design complex software. AI doesn’t solve this - humans need to have these skills in order to supervise it. But why would new programmers learn them? AI writes their code! It’s already hard to convince them otherwise. This only leads to bad things.
Technology without proper control and wisdom destroys human things. We have seen this many times already.
maujun 5 minutes ago [-]
Agreed.
StackOverflow makes it easier to not think and copy-paste.
Autocomplete makes it easier to not think and make typos (Hopefully you have static typing).
Package management makes it easier to not think and introduce heavy dependencies.
C makes it easier to not think and forget to initialize variables.
I make it easier to not think and read without considering evil (What if every word I say has evil intention and effect?)
jumploops 16 hours ago [-]
> It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.
As someone who uses Claude Code heavily, this is spot on.
LLMs are great, but I find the more I cede control to them, the longer it takes to actually ship the code.
I’ve found that the main benefit for me so far is the reduction of RSI symptoms, whereas the actual time savings are mostly exaggerated (even if it feels faster in the moment).
adriand 15 hours ago [-]
Do you have to review the code? I’ll be honest that, like the OP theorizes, I often just spot review it. But I also get it to write specs (often very good, in terms of the ones I’ve dug into), and I always carefully review and test the results. Because there is also plenty of non-AI code in my projects I didn’t review at all, namely, the myriad open source libraries I’ve installed.
jumploops 15 hours ago [-]
Yes, I’m actually working on another project with the goal of never looking at the code.
For context, it’s just a reimplementation of a tool I built.
Let’s just say it’s going a lot slower than the first time I built it by hand :)
hatefulmoron 14 hours ago [-]
It depends on what you're doing. If it's a simple task, or you're making something that won't grow into something larger, eyeballing the code and testing it is usually perfect. These types of tasks feel great with Claude Code.
If you're trying to build something larger, it's not good enough. Even with careful planning and spec building, Claude Code will still paint you into a corner when it comes to architecture. In my experience, it requires a lot of guidance to write code that can be built upon later.
The difference between the AI code and the open source libraries in this case is that you don't expect to be responsible for the third-party code later. Whether you or Claude ends up working on your code later, you'll need it to be in good shape. So, it's important to give Claude good guidance to build something that can be worked on later.
vidarh 10 hours ago [-]
If you let it paint you into a corner, why are you doing so?
I don't know what you mean by "a lot of guidance". Maybe I just naturally do that, but to me there's not been much change in the level of guidance I need to give Claude Code or my own agent vs. what I'd give developers working for me.
Another issue is that as long as you ensure it builds good enough tests, the cost of telling it to just throw out the code it builds later and redo it with additional architectural guidance keeps dropping.
The code is increasingly becoming throwaway.
hatefulmoron 9 hours ago [-]
> If you let it paint you into a corner, why are you doing so?
What do you mean? If it were as simple as not letting it do so, I would do as you suggest. I may as well stop letting it be incorrect in general. Lots of guidance helps avoid it.
> Maybe I just naturally do that, but to me there's not been much change in the level of guidance I need to give Claude Code or my own agent vs. what I'd give developers working for me.
Well yeah. You need to give it lots of guidance, like someone who works for you.
> the cost of telling it to just throw out the code it builds later and redo it with additional architectural guidance keeps dropping.
It's a moving target for sure. My confidence with this in more complex scenarios is much smaller.
vidarh 8 hours ago [-]
> What do you mean? If it were as simple as not letting it do so, I would do as you suggest.
I'm arguing it is as simple as that. Don't accept changes that muddle up the architecture. Take attempts to do so as evidence that you need to add direction. Same as you presumably would - at least I would - with a developer.
hatefulmoron 8 hours ago [-]
My concern isn't that it's messing up my architecture as I scream in protest from the other room, powerless to stop it. I agree with you and I think I'm being quite clear. Without relatively close guidance, it will paint you into a corner in terms of architecture. Guide it, direct it, whatever you want to call it.
cbsmith 15 hours ago [-]
There's an implied assumption here that code you write yourself doesn't need to be reviewed from a context different from the author's.
There's an old expression: "code as if your work will be read by a psychopath who knows where you live" followed by the joke "they know where you live because it is future you".
Generative AI coding just forces the mindset you should have had all along: start with acceptance criteria, figure out how you're going to rigorously validate correctness (ideally through regression tests more than code reviews), and use the review process to come up with consistent practices (which you then document so that the LLM can refer to it).
It's definitely not always faster, but waking up in the morning to a well documented PR, that's already been reviewed by multiple LLMs, with successfully passing test runs attached to it sure seems like I'm spending more of my time focused on what I should have been focused on all along.
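"Start with acceptance criteria" can be as small as writing the regression test down before you prompt anything. A minimal sketch, assuming a hypothetical formatPrice helper and using Node's built-in test runner:

```
// Acceptance criteria captured as a regression test, written before any
// implementation (human or LLM) of the hypothetical formatPrice exists.
import { test } from "node:test";
import assert from "node:assert/strict";
import { formatPrice } from "./formatPrice.js";

test("formats whole euros without decimals", () => {
  assert.equal(formatPrice(1200, "EUR"), "€1,200");
});

test("rejects negative amounts", () => {
  assert.throws(() => formatPrice(-1, "EUR"));
});
```

The specific assertions don't matter; what matters is that the LLM's PR has to clear a bar you wrote down before you ever saw its code.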
Terr_ 14 hours ago [-]
There's an implied assumption here that developers who end up spending all their time reviewing LLM code won't lose their skills or become homicidal. :p
cbsmith 13 hours ago [-]
Fair enough. ;-)
I'm actually curious about the "lose their skills" angle though. In the open source community it's well understood that if anything reviewing a lot of code tends to sharpen your skills.
Terr_ 12 hours ago [-]
I expect that comes from the contrast and synthesis between how the author is anticipating things will develop or be explained, versus what the other person actually provided and trying to understand their thought process.
What happens if the reader no longer has enough of that authorial instinct, their own (opinionated) independent understanding?
I think the average experience would drift away from "I thought X was the obvious way but now I see that by doing Y you avoid that other problem, cool" and towards "I don't see the LLM doing anything too unusual compared to when I ask it for things, LGTM."
cbsmith 11 hours ago [-]
It seems counter intuitive that the reader would no longer have that authorial instinct due to lack of writing. Like, maybe they never had it, in which case, yes. But being exposed to a lot of different "writing opinions" tends to hone your own.
Let's say you're right though, and you lose that authorial instinct. If you've got five different proposals/PRs from five different models, each one critiqued by the other four, the needs for authorial instinct diminish significantly.
layer8 8 hours ago [-]
I don’t find this convincing. People generally don’t learn how to write a good novel just by reading a lot of them.
jyounker 22 minutes ago [-]
On the other hand, people who write good novels tend to read a lot. Reading isn't sufficient, but intensive reading generally seems to be required.
ramraj07 4 hours ago [-]
That's a great perspective, but it's possible you're in a thread where no one wants to believe AI actually helps with coding.
sagarpatil 14 hours ago [-]
I always use Claude Code to debug issues, there’s no point in trying to do this yourself when AI can fix it in minutes (easy to verify if you write tests first)
o3 with new search can do things in 5 mins that will take me at least 30 mins if I’m very efficient.
Say what you want but the time savings is real.
layer8 8 hours ago [-]
Tests can never verify the correctness of code, they only spot-check for incorrectness.
susshshshah 13 hours ago [-]
How do you know what tests to write if you don’t understand the code?
9rx 13 hours ago [-]
Same way you normally would? Tests are concerned with behaviour. The code that implements the behaviour is immaterial.
wiseowise 13 hours ago [-]
How do you do TDD without having code in the first place? How does QA verify without reading the source?
adastra22 12 hours ago [-]
I’m not sure I understand this statement. You give your program parameters X and expect result Y, but instead get Z. There is your test, embedded in the problem statement.
mleonhard 15 hours ago [-]
I solved my RSI symptoms by keeping my arms warm all the time, while awake or asleep. Maybe that will work for you, too?
jumploops 15 hours ago [-]
My issue is actually due to ulnar nerve compression related to a plate on my right clavicle.
Years of PT have enabled me to work quite effectively and minimize the flare ups :)
hooverd 16 hours ago [-]
Is anybody doing cool hybrid interfaces? I don't actually want to do everything in conversational English, believe it or not.
jumploops 16 hours ago [-]
My workflow is to have spec files (markdown) for any changes I’m making, and then use those to keep Claude on track/pull out of the trees.
Not super necessary for small changes, but basically a must have for any larger refactors or feature additions.
I usually use o3 for generating the specs; also helpful for avoiding context pollution with just Claude Code.
adastra22 12 hours ago [-]
I do similar and find that this is the best compromise that I have tried. But I still find myself nodding along with OP. I am more and more finding that this is not actually faster, even though it certainly seems so.
bdamm 16 hours ago [-]
Isn't that what Windsurf or Cursor are?
marssaxman 14 hours ago [-]
So far as I can tell, generative AI coding tools make the easy part of the job go faster, without helping with the hard part of the job - in fact, possibly making it harder. Coding just doesn't take that much time, and I don't need help doing it. You could make my coding output 100x faster without materially changing my overall productivity, so I simply don't bother to optimize there.
nsonha 3 hours ago [-]
No software engineer needs any help if they keep working in the same stack and problem domain that they already know front to back after a few years of doing the same thing. They wouldn't even need any coding tool. But that's a pretty useless thing to say. To each their own.
resource_waste 6 hours ago [-]
I have it write algorithms, explain why my code isn't working, write API calls, or make specific functions.
The entire code? Not there, but with debuggers, I've even started doing that a bit.
Jonovono 13 hours ago [-]
Are you a plumber perhaps?
kevinventullo 10 hours ago [-]
I’m not sure I follow the question. I think of plumbing as being the exact kind of verbose boilerplate that LLMs are quite good at automating.
In contrast, when I’m trying to do something truly novel, I might spend days with a pen and paper working out exactly what I want to do and maybe under an hour coding up the core logic.
On the latter type of work, I find LLMs to be high variance with mostly negative ROI. I could probably improve the ROI by developing a better sense of what they are and aren’t good at, but of course that itself is rapidly changing!
marssaxman 2 hours ago [-]
Not if I can help it, no; I don't have the patience.
worik 13 hours ago [-]
I am.
That is the mental model I have for the work (computer programing) i like to do and am good at.
Plumbing
Jonovono 2 hours ago [-]
I like it!
tptacek 13 hours ago [-]
I'm fine with anybody saying AI agents don't work for their work-style and am not looking to rebut this piece, but I'm going to take this opportunity to call something out.
The author writes "reviewing code is actually harder than most people think. It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself". That sounds within an SD of true for me, too, and I had a full-time job close-reading code (for security vulnerabilities) for many years.
But it's important to know that when you're dealing with AI-generated code for simple, tedious, or rote tasks --- what they're currently best at --- you're not on the hook for reading the code that carefully, or at least, not on the same hook. Hold on before you jump on me.
Modern Linux kernels allow almost-arbitrary code to be injected at runtime, via eBPF (which is just a C program compiled to an imaginary virtual RISC). The kernel can mostly reliably keep these programs from crashing the kernel. The reason for that isn't that we've solved the halting problem; it's that eBPF doesn't allow most programs at all --- for instance, it must be easily statically determined that any backwards branch in the program runs for a finite and small number of iterations. eBPF isn't even good at determining that condition holds; it just knows a bunch of patterns in the CFG that it's sure about and rejects anything that doesn't fit.
That's how you should be reviewing agent-generated code, at least at first; not like a human security auditor, but like the eBPF verifier. If I so much as need to blink when reviewing agent output, I just kill the PR.
If you want to tell me that every kind of code you've ever had to review is equally tricky to review, I'll stipulate to that. But that's not true for me. It is in fact very easy to me to look at a rote recitation of an idiomatic Go function and say "yep, that's what that's supposed to be".
sensanaty 9 hours ago [-]
But how is this a more efficient way of working? What if you have to have it open 30 PRs before 1 of them is acceptable enough to not outright ignore? It sounds absolutely miserable, I'd rather review my human colleague's work because in 95% of cases I can trust that it's not garbage.
The alternative where I boil a few small lakes + a few bucks in return for a PR that maybe sometimes hopefully kinda solves the ticket sounds miserable. I simply do not want to work like that, and it doesn't sound even close to efficient or speedier or anything like that, we're just creating extra work and extra waste for literally no reason other than vague marketing promises about efficiency.
kasey_junk 7 hours ago [-]
If you get to 2 or 3 and it hasn’t done what you want you fall back to writing it yourself.
But in my experience this is _signal_. If the AI can't get to it with minor back and forth, then something needs work: your understanding, the specification, the tests, your code factoring, etc.
The best case scenario is your agent one-shots the problem. But close behind that is that your agent finds a place where a little cleanup makes everybody’s life easier: you, your colleagues and the bot. And your company is now incentivized to invest in that.
The worst case is you took the time to write 2 prompts that didn’t work.
smaudet 13 hours ago [-]
I guess my challenge is that "if it was a rote recitation of an idiomatic go function", was it worth writing?
There is a certain, style, lets say, of programming, that encourages highly non re-usable code that is both at once boring and tedious, and impossible to maintain and thus not especially worthwhile.
The "rote code" could probably have been expressed, succinctly, in terms that border on "plain text", but with more rigueur de jour, with less overpriced, wasteful, potentially dangerous models in-between.
And yes, machines like the eBPF verifier must follow strict rules to cut out the chaff, of which there is quite a lot, but it neither follows that we should write everything in eBPF, nor does it follow that because something can throw out the proverbial "garbage", that makes it a good model to follow...
Put another way, if it was that rote, you likely didn't need nor benefit from the AI to begin with, a couple well tested library calls probably sufficed.
sesm 9 hours ago [-]
I would put it differently: when you already have a mental model of what the code is supposed to do and how, then reviewing is easy: just check that the code conforms to that model.
With an arbitrary PR from a colleague or security audit, you have to come up with mental model first, which is the hardest part.
tptacek 13 hours ago [-]
Yes. More things should be rote recitations. Rote code is easy to follow and maintain. We get in trouble trying to be clever (or DRY) --- especially when we do it too early.
Important tangential note: the eBPF verifier doesn't "cut out the chaff". It rejects good, valid programs. It does not care that the programs are valid or good; it cares that it is not smart enough to understand them; that's all that matters. That's the point I'm making about reviewing LLM code: you are not on the hook for making it work. If it looks even faintly off, you can't hurt the LLM's feelings by killing it.
smaudet 12 hours ago [-]
> We get in trouble trying to be clever (or DRY)
Certainly, however:
> That's the point I'm making about reviewing LLM code: you are not on the hook for making it work
The second portion of your statement is either confusing (something unsaid) or untrue (you are still ultimately on the hook).
Agentic AI is just yet another, as you put it way to "get in trouble trying to be clever".
My previous point stands - if it was that cut and dry, then a (free) script/library could generate the same code. If your only real use of AI is to replace template systems, congratulations on perpetuating the most over-engineered template system ever. I'll stick with a provable, free template system, or just not write the code at all.
vidarh 10 hours ago [-]
> The second portion of your statement is either confusing (something unsaid) or untrue (you are still ultimately on the hook).
You're missing the point.
tptacek is saying he isn't the one who needs to fix the issue because he can just reject the PR and either have the AI agent refine it or start over. Or ultimately resort to writing the code himself.
He doesn't need to make the AI written code work, and so he doesn't need to spend a lot of time reading the AI written code - he can skim it for any sign it looks even faintly off and just kill it if that's the case instead of spending more time on it.
> My previous point stands - if it was that cut and dry, then a (free) script/library could generate the same code.
There's a vast chasm between simple enough that a non-AI code generator can generate it using templates and simple enough that a fast read-through is enough to show that it's okay to run.
As an example, the other day I had my own agent generate a 1kloc API client for an API. The worst case scenario other than failing to work would be that it would do something really stupid, like deleting all my files. Since it passes its tests, skimming it was enough for me to have confidence that nowhere does it do any file manipulation other than reading the files passed in. For that use, that's sufficient since it otherwise passes the tests and I'll be the only user for some time during development of the server it's a client for.
But no template based generator could write that code, even though it's fairly trivial - it involved reading the backend API implementation and rote-implementation of a client that matched the server.
smaudet 9 hours ago [-]
> But no template based generator could write that code, even though it's fairly trivial
Not true at all, in fact this sort of thing used to happen all the time 10 years ago, code reading APIs and generating clients...
> He doesn't need to make the AI written code work, and so he doesn't need to spend a lot of time reading the AI written code - he can skim it for any sign it looks even faintly off and just kill it if that's the case instead of spending more time on it.
I think you are missing the point as well, that's still review, that's still being on the hook.
Words like "skim" and "kill" are the problem here, not a solution. They point to a broken process that looks like its working...until it doesn't.
But I hear you say "all software works like that", well, yes, to some degree. The difference being, one you hopefully actually wrote and have some idea what's going wrong, the other one?
Well, you just have to sort of hope it works and when it doesn't, well you said it yourself. Your code was garbage anyways, time to "kill" it and generate some new slop...
vidarh 8 hours ago [-]
> Not true at all, in fact this sort of thing used to happen all the time 10 years ago, code reading APIs and generating clients...
Where is this template based code generator that can read my code, understand it, and generate a full client including a CLI, that include knowing how to format the data, and implement the required protocols?
In 30 years of development, I've seen nothing like it.
> I think you are missing the point as well, that's still review, that's still being on the hook.
I don't know if you're being intentionally obtuse, or what, but while, yes, you're on the hook for the final deliverable, you're not on the hook for fixing a specific instance of code, because you can just throw it away and have the AI do it all over.
The point you seem intent on missing is that the cost of throwing out the work of another developer is high, while the cost of throwing out the work of an AI assistant is next to nothing, and so where you need to carefully review a co-workers code because throwing it away and starting over from scratch is rarely an option, with AI generated code you can do that at the slightest whiff of an issue.
> Words like "skim" and "kill" are the problem here, not a solution. They point to a broken process that looks like its working...until it doesn't.
No, they are not a problem at all. They point to a difference in opportunity cost. If the rate at which you kill code is too high, it's a problem irrespective of source. But the point is that this rate can be much higher for AI code than for co-workers before it becomes a problem, because the cost of starting over is orders of magnitude different, and this allows for a very different way of treating code.
> Well, you just have to sort of hope it works and when it doesn't
No, I don't "hope it works" - I have tests.
kenjackson 13 hours ago [-]
I can read code much faster than I can write it.
This might be the defining line for Gen AI - people who can read code faster than they can write it will find it useful, and those who write faster than they can read won’t use it.
globnomulous 11 hours ago [-]
> I can read code much faster than I can write it.
I have known and worked with many, many engineers across a wide range of skill levels. Not a single one has ever said or implied this, and in not one case have I ever found it to be true, least of all in my own case.
I don't think it's humanly possible to read and understand code faster than you can write and understand it to the same degree of depth. The brain just doesn't work that way. We learn by doing.
kenjackson 5 hours ago [-]
You definitely can. For example I know x86. I can read it and understand it quite well. But if you asked me to write even a basic program in it, it would take me a considerable amount of time.
The same goes with shell scripting.
But more importantly you don’t have to understand code to the same degree and depth. When I read code I understand what the code is doing and if it looks correct. I’m not going over other design decisions or implementation strategies (unless they’re obvious). If I did that then I’d agree. I'd also stop doing code reviews and just write everything myself.
autobodie 12 hours ago [-]
I think that's wrong. I only have to write code once, maybe twice. But when using AI agents, I have to read many (5? 10? I will always give up before 15) PRs before finding one close enough that I won't have to rewrite all of it. This nonsense has not saved me any time, and the process is miserable.
I also haven't found any benefit in aiming for smaller or larger PRs. The aggregate efficiency seems to even out because smaller PRs are easier to weed through, but they are not less likely to be trash.
kenjackson 12 hours ago [-]
I only generate the code once with GenAI and typically fix a bug or two - or at worst use its structure. Rarely do I toss a full PR.
It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.
omnicognate 9 hours ago [-]
The problem is that at this stage we mostly just have people's estimates of their own success to go on, and nobody thinks they're incompetent. Nobody's going to say "AI works really well for, me but I just pump out dross my colleagues have to fix" or "AI doesn't work for me but I'm an unproductive, burnt out hack pretending I'm some sort of craftsman as the world leaves me behind".
This will only be resolved out there in the real world. If AI turns a bad developer, or even a non-developer, into somebody that can replace a good developer, the workplace will transform extremely quickly.
So I'll wait for the world to prove me wrong but my expectation, and observation so far, is that AI multiplies the "productivity" of the worst sort of developer: the ones that think they are factory workers who produce a product called "code". I expect that to increase, not decrease, the value of the best sort of developer: the ones who spend the week thinking, then on Friday write 100 lines of code, delete 2000 and leave a system that solves more problems than it did the week before.
mwcampbell 1 hours ago [-]
I aspire to live up to your description of the best sort of developer. But I think there might also be a danger that that approach can turn into an excuse for spending the week overthinking (possibly while goofing off as well; I've done it), then writing a first cut on Friday, leaving no time for the multiple iterations that are often necessary to get to the best solution. In other words, I think sometimes it's necessary to just start coding sooner than we'd like so we can start iterating toward the right solution. But that "unproductive, burnt out hack" line hits a bit too close to home for me these days, and I'm starting to entertain the possibility that an LLM-based agent might have more energy for doing those multiple iterations than I do.
autobodie 5 hours ago [-]
My experiences so far suggest that you might be right.
dagw 10 hours ago [-]
> It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.
It is 100% a function of what you are trying to build, what language and libraries you are building it in, and how sensitive that thing is to factors like performance and getting the architecture just right. I've experienced building functioning systems with hardly any intervention, and repeatedly failing to get code that even compiles after over an hour of effort. There exists a small, but popular, subset of programming tasks where gen AI excels, and a massive tail of tasks where it is much less useful.
stitched2gethr 12 hours ago [-]
Why would you review agent generated code any differently than human generated code?
tptacek 11 hours ago [-]
Because you don't care about the effort the agent took and can just ask for a do-over.
greybox 9 hours ago [-]
For simple, tedious or rote tasks, I have templates bound to hotkeys in my IDE. They even come with configurable variable sections that you can fill in afterwards, or base on some highlighted code before hitting the hotkey. Also, it's free.
112233 13 hours ago [-]
This is a radical and healthy way to do it. Obviously wrong — reject. Obviously right — accept. In any other case — also reject, as non-obvious.
I guess it is far removed from the advertised use case. Also, I feel one would be better off having auto-complete powered by an LLM in this case.
bluefirebrand 12 hours ago [-]
> Obviously right — accept.
I don't think code is ever "obviously right" unless it is trivially simple
vidarh 10 hours ago [-]
Auto-complete means having to babysit it.
The more I use this, the longer the LLM works before I even look at the output, beyond maybe having it chug along on another screen and occasionally glancing over.
My shortest runs now usually take minutes, with the LLM expanding my prompt into a plan, writing the tests, writing the code, linting its code, fixing any issues, and writing a commit message before I even review things.
tptacek 13 hours ago [-]
I don't find this to be the case. I've used (and hate) autocomplete-style LLM code generation. But I can feed 10 different tasks to Codex in the morning and come back and pick out the 3-4 I think might be worth pursuing, and just re-prompt the 7 I kill. That's nothing like interactive autocomplete, and drastically faster than I could work without LLM assistance.
monero-xmr 13 hours ago [-]
I mostly just approve PRs because I trust my engineers. I have developed a 6th sense for thousand-line PRs and knowing which 100-300 lines need careful study.
Yes I have been burned. But 99% of the time, with proper test coverage it is not an issue, and the time (money) savings have been enormous.
"Ship it!" - me
theK 12 hours ago [-]
I think this points out the crux of the difference between collaborating with other devs and collaborating with an AI. The article correctly states that the AI will never learn your preferences or the idiosyncrasies of the specific projects/company etc. because it effectively is amnesic. You cannot trust the AI the same way you trust other known collaborators, because you don't have a real relationship with it.
loandbehold 10 hours ago [-]
Most AI coding tools are working on this problem. E.g. with Claude Code you can add your preferences to a claude.md file. When I notice myself repeatedly correcting an AI mistake, I add an instruction to claude.md to avoid it in the future. claude.md is exactly that: a memory of your preferences, idiosyncrasies and other project-related info.
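For example, after a few rounds of corrections a claude.md ends up looking something like this (a hypothetical sketch - every path, command and helper named here is made up; the file is just freeform markdown):

```
# Project notes

- Use pnpm, not npm, for installs and scripts.
- New backend code goes under src/server/; don't create new top-level folders.
- Use the existing apiFetch wrapper instead of calling fetch directly.
- Never edit generated files under src/gen/; change the schema and re-run codegen.
- Run `pnpm lint && pnpm test` before declaring a task done.
```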
vidarh 10 hours ago [-]
I do something to the effect of "Update LLM.md with what you've learned" at the end of every session, coupled with telling it what is wrong when I reject a change. It works. It could work better, but it works.
autobodie 12 hours ago [-]
Haha, doing this with AI will bury you in a very deep hole.
roxolotl 15 hours ago [-]
> But interns learn and get better over time. The time that you spend reviewing code or providing feedback to an intern is not wasted, it is an investment in the future. The intern absorbs the knowledge you share and uses it for new tasks you assign to them later on.
This is the piece that confuses me about the comparison to a junior or an intern. Humans learn about the business, the code, the history of the system. And then they get better. Of course there’s a world where agents can do that, and some of the readme/doc solutions do that but the limitations are still massive and so much time is spent reexplaining the business context.
viraptor 13 hours ago [-]
You don't have to reexplain the business context. Save it to the mdc file if it's important. The added benefit is that the next real person looking at the code can also use that to learn - it's actually cool that having good, up-to-date documentation is now an asset.
adastra22 12 hours ago [-]
Do you find your agent actually respecting the mdc file? I don’t.
viraptor 11 hours ago [-]
There should be no difference between the mdc and the text in the prompt. Try something drastic like "All of your responses should be in Chinese". If it doesn't happen, they're not included correctly. Otherwise, yeah, they work modulo the usual issues of prompt adherence.
adastra22 10 hours ago [-]
I suspect that Cursor is summarizing the context window, and the .mdc directives are the first thing left on the cutting room floor.
xarope 15 hours ago [-]
I think this is how certain LLMs end up with 14k worth of system prompts
Terr_ 13 hours ago [-]
"Be fast", "Be Cheap", "Be Good".
*dusts off hands* Problem solved! Man, am I great at management or what?
freeone3000 15 hours ago [-]
Put the business context in the system prompt.
danieltanfh95 15 hours ago [-]
AI models are fundamentally trained on patterns from existing data - they learn to recognize and reproduce successful solution templates rather than derive solutions from foundational principles. When faced with a problem, the model searches for the closest match in its training experience rather than building up from basic assumptions and logical steps.
Human experts excel at first-principles thinking precisely because they can strip away assumptions, identify core constraints, and reason forward from fundamental truths. They might recognize that a novel problem requires abandoning conventional approaches entirely. AI, by contrast, often gets anchored to what "looks similar" and applies familiar frameworks even when they're not optimal.
Even when explicitly prompted to use first-principles analysis, AI models can struggle because:
- They lack the intuitive understanding of when to discard prior assumptions
- They don't naturally distinguish between surface-level similarity and deep structural similarity
- They're optimized for confident responses based on pattern recognition rather than uncertain exploration from basics
This is particularly problematic in domains requiring genuine innovation or when dealing with edge cases where conventional wisdom doesn't apply.
Context poisoning, intended or not, is a real problem that humans are able to solve relatively easily while current SotA models struggle.
adastra22 12 hours ago [-]
So are people. People are trained on existing data and learn to reproduce known solutions. They also take this to the meta level—a scientist or engineer is trained on methods for approaching new problems which have yielded success in the past. AI does this too. I’m not sure there is actually a distinction here.
danieltanfh95 7 hours ago [-]
Of course there is. Humans can pattern match as a means to save time. LLMs pattern match as their only mode of communication and “thought”.
Humans are also not as susceptible to context poisoning, unlike llms.
adastra22 3 hours ago [-]
Human thought is associative (pattern matching) as well. This is very well established.
esailija 6 hours ago [-]
There is a difference between extrapolating from just a few examples vs interpolating between a trillion examples.
pSYoniK 10 hours ago [-]
I've been reading these posts for the past few months and the comments too. I've tried Junie a bit and I've used ChatGPT in the past for some bash scripts (which, for the most part, did what they were supposed to do), but I can't seem to find the use case.
Using them for larger bits of code feels silly as I find subtle bugs or subtle issues in places, so I don't necessarily feel comfortable passing in more things. Also, large bits of code I work with are very business logic specific and well abstracted, so it's hard to try and get ALL that context into the agent.
I guess what I'm trying to ask here is: what exactly do you use agents for? I've seen YouTube videos, but a good chunk of those are people getting a bunch of TypeScript generated for some front-end, or generating some cobbled-together front end that has Stripe added in, and everyone is celebrating as if this is some massive breakthrough.
So when people say "regular tasks" or "rote tasks" what do you mean? You can't be bothered to write a db access method/function using some DB access library? You are writing the same regex testing method for the 50th time? You keep running into the same problem and you're still writing the same bit of code over and over again? You can't write some basic sql queries?
Also not sure about others, but I really dislike having to do code reviews when I am unable to really gauge the skill of the dev I'm reviewing. If I know I have a junior with 1-2 years maybe, then I know to focus a lot on logic issues (people can end up cobbling together the previous simple bits of code) and if it's later down the road at 2-5 years then I know that I might focus on patterns or look to ensure that the code meets the standards, look for more discreet or hidden bugs. With agent output it could oscillate wildly between those. It could be a solidly written search function, well optimized, or it could be a nightmarish SQL query that's impossible to untangle.
Thoughts?
I do have to say I found it good when working on my own to get another set of "eyes" and ask things like "are there more efficient ways to do X" or "can you split this larger method into multiple ones" etc
dvt 14 hours ago [-]
I'm actually quite bearish on AI in the generative space, but even I have to admit that writing boilerplate is "N" times faster using AI (use your favorite N). I hate when people claim this without any proof, so literally today this is what I asked ChatGPT:
write a stub for a react context based on this section (which will function as a modal):
```
<section>
// a bunch of stuff
</section>
```
Worked great, it created a few files (the hook, the provider component, etc.), and I then added them to my project. I've done this a zillion times, but I don't want to do it again, it's not interesting to me, and I'd have to look up stuff if I messed it up from memory (which I likely would, because provider/context boilerplate sucks).
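For a sense of what that boilerplate looks like, here's a minimal sketch along the lines of what it produced (a hypothetical reconstruction for illustration, not the actual generated files; names like `ModalProvider` and the exact shape of `useModal` are assumptions):
```
import { createContext, useCallback, useContext, useState } from "react";
import type { ReactNode } from "react";

// Context value: open a modal with arbitrary content, or close it.
interface ModalContextValue {
  openModal: (content: ReactNode) => void;
  closeModal: () => void;
}

const ModalContext = createContext<ModalContextValue | null>(null);

// Provider renders its children plus the modal <section> whenever content is set.
export function ModalProvider({ children }: { children: ReactNode }) {
  const [content, setContent] = useState<ReactNode | null>(null);

  const openModal = useCallback((c: ReactNode) => setContent(c), []);
  const closeModal = useCallback(() => setContent(null), []);

  return (
    <ModalContext.Provider value={{ openModal, closeModal }}>
      {children}
      {content !== null && (
        <section role="dialog" aria-modal="true">
          {content}
        </section>
      )}
    </ModalContext.Provider>
  );
}

// Hook consumed by components rendered under the provider.
export function useModal(): ModalContextValue {
  const ctx = useContext(ModalContext);
  if (!ctx) throw new Error("useModal must be used inside a ModalProvider");
  return ctx;
}
```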
Now, I can just do `const myModal = useModal(...)` in all my components. Cool. This saved me at least 30 minutes, and 30 minutes of my time is worth way more than 20 bucks a month. (N.B.: All this boilerplate might be a side effect of React being terrible, but that's beside the point.)
skydhash 6 hours ago [-]
For this case, i will probably lift off the example from the library docs. Or spend 5 minutes writing a bare implementation as it would be all I need at the time.
That’s an issue I have with generated code. More often, I start with a basic design that evolves based on the project needs. It’s an iterative process that can span the whole timeline. But with generated code, it’s a whole solution that fits the current needs, but it’s a pain to refactor.
Winsaucerer 13 hours ago [-]
This kind of thing is my main use: boilerplate stuff, and scripts that I don't care about -- e.g., if I need a quick bash script to do a one-off task.
For harder problems, my experience is that it falls over, although I haven't been refining my LLM skills as much as some do. It seems that the bigger the project, the more it integrates with other things, the worse AI is. And moreover, for those tasks it's important for me or a human to do it because (a) we think about edge cases while we work through the problem intellectually, and (b) it gives us a deep understanding of the system.
mellosouls 3 hours ago [-]
The author makes the excellent point that LLM-coding still has a human bottleneck at the code review point - regardless of whether the issue at hand is fixed or not.
Leaving aside the fact that this isn't an LLM problem; we've always had tech debt due to cowboy devs and weak management or "commercial imperatives":
I'd be interested to know if any of the existing LLM ELO style leaderboards mark for code quality in addition to issue fixing?
The former seems a particularly useful benchmark as they become more powerful in surface abilities.
NoGravitas 2 hours ago [-]
> Leaving aside the fact that this isn't an LLM problem; we've always had tech debt due to cowboy devs and weak management or "commercial imperatives":
But this is one of the core problems with LLM coding, right? It accelerates an already broken model of software development (worse is better) rather than trying to help fix it.
frankc 16 hours ago [-]
I just don't agree with this. I am generally telling the model how to do the work according to an architecture I specify using technology I understand. The hardest part for me in reviewing someone else's code is understanding their overall solution and how everything fits together, as it's not likely to be exactly the way I would have structured the code or solved the problem. However, with an LLM that generally isn't an issue, since we have pre-agreed upon a solution path. If that is not what is happening, then likely you are letting the model get too far ahead.
There are other times when I am building a stand-alone tool and am fine with whatever it wants to do because it's not something I plan to maintain and its functional correctness is self-evident. In that case I don't even review what it's doing unless it's stuck. This is closer to actual vibe coding. It isn't something I would do for something I am integrating into a larger system, but I will for something like a CLI tool that I use to enhance my workflow.
ken47 16 hours ago [-]
You can pre-agree on a solution path with human engineers too, with a similar effect.
SpaceNugget 2 hours ago [-]
I think the point of the comment you replied to is that "reviewing code" is different in a regular work situation of reviewing a coworkers PR vs checking that the LLM generated something that matches what you requested.
I don't send my coworkers lists of micromanaged directions that give me a pretty clear expectation of what their PR is going to look like. I do however, occasionally get tagged on a review for some feature I had no part in designing, in a part of some code base I have almost no experience with.
Reviewing that the components you asked for do what you asked is a much easier scenario.
Maybe if people are asking an LLM to build an entire product from scratch with no guidance it would take a lot more effort to read and understand the output. But I don't think most people do that on a daily basis.
bigbuppo 13 hours ago [-]
Don't try to argue with those using AI coding tools. They don't interact well with actual humans, which is why they've been relegated to talking to the computer. We'll eventually have them all working on some busy projects to help with "marketing" to keep them distracted while the decent programmers that can actually work in a team environment can get back to useful work free of the terrible programmers and marketing departments.
wiseowise 12 hours ago [-]
> that can actually work in a team environment can get back to useful work free of the terrible programmers
Is that what you and your buddies talk about at two hour long coffee/smoke breaks while “terrible” programmers work?
bigbuppo 12 minutes ago [-]
I mostly just look at numbers every once in a while and try to keep them going in the right direction.
mdavid626 1 hours ago [-]
I fully agree. People who save lots of time are the people who don’t care about the code. If it builds and the golden scenario works, it’s good to go. It doesn’t matter that it’ll cost multiple times more to fix the bugs. Hey, they were fast!
zmmmmm 16 hours ago [-]
I think there's a key context difference here in play which is that AI tools aren't better than an expert on the language and code base that is being written. But the problem is that most software isn't written by such experts. It's written by people with very hazy knowledge of the domain and only partial knowledge of the languages and frameworks they are using. Getting it to be stylistically consistent or 100% optimal is far from the main problem. In these contexts AI is a huge help, I find.
kachapopopow 14 hours ago [-]
AI is a tool like any other, you have to learn to use it.
I had AI create me a k8s device plugin for supporting SR-IOV-only vGPUs. Something nvidia calls "vendor specific" and basically offers little to no support for in their public repositories for Linux KVM.
I loaded up a new go project in goland, opened up Junie, typed what I needed and what I have, went to make tea, came back, looked over the code to make sure it wasn't going to destroy my cluster (thankfully most operations were read-only), deployed it with the generated helm chart and it worked (nearly) first try.
Before this I really had no idea how to create device plugins other than knowing what they are and even if I did, it would have easily taken me an hour or more to have something working.
The only thing AI got wrong is that the virtual functions were symlinks and not directories.
The entire project is good enough that I would consider opensourcing it. With 2 more prompts I had configmap parsing to initialize virtual functions on-demand.
Drunkfoowl 14 hours ago [-]
[dead]
ritz_labringue 8 hours ago [-]
AI is really useful when you already know what code needs to be written. If you can explain it properly, the AI will write it faster than you can and you'll save time because it is quick to check that this is actually the code you wanted to write.
So "programming with AI" means programming in your mind and then using the AI to materialize it in the codebase.
Tzt 6 hours ago [-]
Well, kinda? I often know what chunks / functions I need, but I'm too lazy to think through how to implement them exactly, how they should work inside. Yeah, you need to have an overall idea of what you are trying to make.
royal__ 16 hours ago [-]
I get confused when I see stances like this, because it gives me the sense that maybe people just aren't using coding tools efficiently.
90% of my usage of Copilot is just fancy autocomplete: I know exactly what I want, and as I'm typing out the line of code it finishes it off for me. Or, I have a rough idea of the syntax I need to use a specific package that I use once every few months, and it helps remind me what the syntax is, because once I see it I know it's right. This usage isn't really glamorous, but it does save me tiny bits of time in terms of literal typing, or a simple search I might need to do. Articles like this make me wonder if people who don't like coding tools are trying to copy and paste huge blocks of code; of course it's slower.
kibibu 16 hours ago [-]
My experience is that the "fancy autocomplete" is a focus destroyer.
I know what function I want to write, start writing it, and then bam! The screen fills with ghost text that may partly be what I want, but probably not quite.
Focus shifts from writing to code review. I wrest my attention back to the task at hand, type some more, and bam! New ghost text to distract me.
Ever had the misfortune of having a conversation with a sentence-finisher? Feels like that.
Perhaps I need to bind to a hot key instead of using the default always-on setting.
---
I suspect people using the agentic approaches skip this entirely and therefore have a more pleasant experience overall.
atq2119 14 hours ago [-]
It's fascinating how differently people's brains work.
Autocomplete is a total focus destroyer for me when it comes to text, e.g. when writing a design document. When I'm editing code, it sometimes trips me up (hitting tab to indent but ending up accepting a suggestion instead), but without destroying my focus.
I believe your reported experience, but mine (and presumably many others') is different.
skydhash 15 hours ago [-]
That usage is the most disruptive for me. With normal intellisense and a library you're familiar with, you can predict the completion and just type normally with minimal interruption. With no completion, I can just touch type and fix the errors after the short burst. But having whole lines pop up break that flow state.
With unfamiliar syntax, I only need a few minutes and a cheatsheet to get back in the groove. Then typing goes back to that flow state.
Typing code is always semi-unconscious. Just like you don't pay that much attention to every character when you're writing notes on paper.
Editing code is where I focus on it, but I'm also reading docs, running tests,...
aryehof 11 hours ago [-]
These days, many programmers and projects are happy to leave testing and defect discovery to end users, under the guise of “but we have unit tests and CI”. That’s exacerbated when using LLM driven code with abandon.
The author is one who appears unwilling to do so.
xpe 4 hours ago [-]
> In recent times I had to learn Rust, Go, TypeScript, WASM, Java and C# for various projects, and I wouldn't delegate this learning effort to an AI, even if it saved me time.
Either/or fallacy. There exist a varied set of ways to engage with the technology. You can read reference material and ask for summarization. You can use language models to challenge your own understanding.
Are people really this clueless? (Yes, I know the answer, but this is a rhetorical device.)
Think, people. Human intelligence is competing against artificial intelligence, and we need to step it up. Probably a good time to stop talking like we’re in Brad Pitt’s latest movie, Logical Fallacy Club. If we want to prove our value in a competitive world, we need to think and write well.
I sometimes feel like bashing flawed writing is mean, but maybe the feedback will get through. Better to set a quality bar. We should aim to be our best.
Fraterkes 4 hours ago [-]
Let me help you remove the beam from your own eye first: this comment leaves me with the impression that your writing isn’t great.
xpe 3 hours ago [-]
I welcome specific and actionable criticism. Would you like to engage with my (a) substance; (b) tone; (c) something else?
didibus 13 hours ago [-]
You could argue that AI-generated code is a black box, but let's adjust our perspective here. When was the last time you thoroughly reviewed the source code of a library you imported? We already work with black boxes daily as we evaluate libraries by their interfaces and behaviors, not by reading every line.
The distinction isn't whether code comes from AI or humans, but how we integrate and take responsibility for it. If you're encapsulating AI-generated code behind a well-defined interface and treating it like any third party dependency, then testing that interface for correctness is a reasonable approach.
The real complexity arises when you have AI help write code you'll commit under your name. In this scenario, code review absolutely matters because you're assuming direct responsibility.
I'm also questioning whether AI truly increases productivity or just reduces cognitive load. Sometimes "easier" feels faster but doesn't translate to actual time savings. And when we do move quicker with AI, we should ask if it's because we've unconsciously lowered our quality bar. Are we accepting verbose, oddly structured code from AI that we'd reject from colleagues? Are we giving AI-generated code a pass on the same rigorous review process we expect for human written code? If so, would we see the same velocity increases from relaxing our code review process amongst ourselves (between human reviewers)?
materielle 12 hours ago [-]
I’m not sure that the library comparison really works.
Libraries are maintained by other humans, who stake their reputation on the quality of the library. If a library gets a reputation of having a lax maintainer, the community will react.
Essentially, a chain of responsibility, where each link in the chain has an incentive to behave well else they be replaced.
Who is accountable for the code that AI writes?
layer8 8 hours ago [-]
Would you use a library that was written by AI without anyone having supervised it and thoroughly reviewed the code? We use libraries without checking their source code because of the human thought process and quality control that has gone into them, and their existing reputation. Nobody would use a library that no one else has ever seen and whose source code no human has ever laid their eyes on. (Excluding code generated by deterministic vetted tools here, like transpilers or parser generators.)
bluefirebrand 12 hours ago [-]
> When was the last time you thoroughly reviewed the source code of a library you imported?
Doesn't matter, I'm not responsible for maintaining that particular code
The code in my PRs has my name attached, and I'm not trusting any LLM with my name
didibus 12 hours ago [-]
Exactly, that's what I'm saying. Commit AI code under its own name. Then the code under your name can use the AI code as a black box. If your code that uses AI code works as expected, it is similar to when using libraries.
If you consider that AI code is not code any human needs to read or later modify by hand (AI code gets modified by AI), then all you need to do is fully test it; if it all works, it's good. Now you can call into it from your own code.
benediktwerner 11 hours ago [-]
I don't see what that does. The AI hardly cares about its reputation, and I also can't really blame the AI when my boss or a customer asks me why something failed, so what does committing under its name do?
I'm ultimately still responsible for the code. And unlike AI, library authors put their own and their libraries' reputation on the line.
adastra22 12 hours ago [-]
These days, I review external dependencies pretty thoroughly. I did not use to. This is because of AI slop though.
jpcrs 10 hours ago [-]
I use AI daily, currently paying for Claude Code, Gemini and Cursor. It really helps me on my personal toy projects, it’s amazing at getting a POC running and validate my ideas.
My company just had internal models that were mediocre at best, but at the beginning this year they finally enabled Copilot for everyone.
At the beginning I was really excited for it, but it’s absolutely useless for work. It just doesn’t work on big old enterprise projects. In an enterprise environment everything is composed of so many moving pieces, knowledge scattered across places, internal terminology, etc. Maybe in the future, with better MCP servers or whatever, it’ll be possible to feed all the context into it to make it spit out something useful, but right now, at work, I just use AI as a search engine (and it’s pretty good at that, when you have the knowledge to detect when it has subtle problems).
HPsquared 10 hours ago [-]
I think a first step for these big enterprise codebases (also applicable to documentation) is to collect it into a big ball and finetune on it.
redhale 8 hours ago [-]
This line by the author, in response to one of the comments, betrays the core of the article imo:
> The quality of the code these tools produce is not the problem.
So even if an AI could produce code of a quality equal to or surpassing the author's own code quality, they would still be uninterested in using it.
To each their own, but it's hard for me to accept an argument that such an AI would provide no benefit, even if one put priority on maintaining high quality standards. I take the point that the human author is ultimately responsible, but still.
throwaway12345t 5 hours ago [-]
I don’t understand this one at all. Say you need to update a somewhat unique implementation of a component across 5 files. In pseudocode, it might take you 30 seconds to type out whatever needs to be done. It would take maybe 3-4 minutes to do it.
I set that up to run then do something different. I come back in a couple minutes, scan the diffs which match expectations and move on to the next task.
That’s not everything but those menial tasks where you know what needs to be done and what the final shape should look like are great for AI. Pass it off while you work on more interesting problems.
cwoolfe 5 hours ago [-]
I have found AI generated code to be overly verbose and complex. It usually generates 100 lines and I take a few of them and adapt them to what I want. The best cases I've found for using it are asking specific technical questions, helping me learn a new code language, having it generate ideas on how to solve a problem for brainstorming. It also does well with bounded algorithmic problems that are well specified i.e. write a function that takes inputs and produces outputs according to xyz. I've found it's usually sorely lacking in domain knowledge (i.e. it is not an expert on the iOS SDK APIs, not an expert in my industry, etc.)
mettamage 5 hours ago [-]
My heuristic: the more you're solving a solved problem that is just tedious, memory-intensive work, the more it makes sense to take a crack at using AI. It will probably one-shot your solution with minimal tweaks required.
The more you deviate from that, the more you have to step in.
But given that I constantly forget how to open a file in Python, I still have a use for it. It basically supplanted Stackoverflow.
ukprogrammer 7 hours ago [-]
> “It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.”
There’s your issue, the skill of programming has changed.
Typing gets fast; so does review once robust tests already prove X, Y, Z correctness properties.
With the invariants green, you get faster at grokking the diff, feed style nits back into the system prompt, and keep tuning the infinite tap to your taste.
osigurdson 3 hours ago [-]
That is my experience with Windsurf / Cursor type tools. Faster for some things but generally super annoying and slow.
The Codex workflow however really is a game changer imo. It takes the time to ensure changes are consistent with other code and the async workflow is just so much nicer.
Kiro 10 hours ago [-]
> I believe people who claim that it makes them faster or more productive are making a conscious decision to relax their quality standards to achieve those gains.
Yep, this is pretty much it. However, I honestly feel that AI writes so much better code than me that I seldom need to actually fix much in the review, so it doesn't need to be as thorough. AI always takes more tedious edge-cases into account and applies best practices where I'm much sloppier and take more shortcuts.
animex 12 hours ago [-]
I write mostly boilerplate and I'd rather have the AI do it. The AI is also slow, which is actually great, because it allows me to run 2 or 3 AI workspaces working on different tickets/problems at the same time.
Where AI especially excels is helping me do maintenance tickets on software I rarely touch (or sometimes never have touched). It can quickly read the codebase, and together we can quickly arrive at the place where the patch/problem lies and quickly correct it.
I haven't written anything "new" in terms of code in years, so I'm not really learning anything from coding manually but I do love solving problems for my customers.
ed_mercer 15 hours ago [-]
> It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.
Hard disagree. It's still way faster to review code than to manually write it. Also the speed at which agents can find files and the right places to add/edit stuff alone is a game changer.
Winsaucerer 13 hours ago [-]
There's a difference between reviewing code by developers you trust, and reviewing code by developers you don't trust or AI you don't trust.
Although tbh, even in the worst case I think I am still faster at reviewing than writing. The only difference is, though, that those reviews will never have had the same depth of thought and consideration as when I write the code myself. So reviews are quicker, but also less thorough/robust than writing for me.
bluefirebrand 12 hours ago [-]
> also less thorough/robust than writing for me.
This strikes me as a tradeoff I'm absolutely not willing to make, not when my name is on the PR
sensanaty 8 hours ago [-]
I'm fast at reviewing PRs because I know the person on the other end and can trust that they got things correctly. I'll focus on the meaty, tricky parts of their PR, but I can rest assured that they matched the design, for example, and not have to verify every line of CSS they wrote.
This is a recipe for disaster with AI agents. You have to read every single line carefully, and this is much more difficult for the large majority of people out there than if you had written it yourself. It's like reviewing a Junior's work, except I don't mind reviewing my Junior colleague's work because I know they'll at least learn from the mistakes and they're not a black box that just spews bullshit.
__loam 15 hours ago [-]
You are probably not being thorough enough.
zacksiri 8 hours ago [-]
LLMs are relatively new technology. I think it's important to recognize the tool for what it is and how it works for you. Everyone is going to get different usage from these tools.
What I personally find is. It's great for helping me solve mundane things. For example I'm recently working on an agentic system and I'm using LLMs to help me generate elasticsearch mappings.
There is no part of me that enjoys making JSON mappings; it's not fun, nor does it engage my curiosity as a programmer, and I'm also not going to learn much from generating elasticsearch mappings over and over again. For problems like this, I'm happy to just let the LLM do the job. I throw some JSON at it and I've got a prompt that's good enough that it will spit out results deterministically and reliably.
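(To give a concrete sense of what such a mapping looks like, here's a hypothetical sketch, not the actual mappings from this project: a sample document gets turned into an index mapping roughly like this.)
```
// Hypothetical example: for documents shaped like
//   { "title": "Hello", "published_at": "2024-01-01", "tags": ["news"], "views": 3 }
// the generated Elasticsearch index mapping would be roughly:
const articleMapping = {
  mappings: {
    properties: {
      title: { type: "text" },          // analyzed, full-text searchable
      published_at: { type: "date" },   // parsed as a date
      tags: { type: "keyword" },        // exact values for filtering/aggregations
      views: { type: "integer" },
    },
  },
};

export default articleMapping;
```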
However if I'm exploring / coding something new, I may try letting the LLM generate something. Most of the time though in these cases I end up hitting 'Reject All' after I've seen what the LLM produces, then I go about it in my own way, because I can do better.
It all really depends on what problem you are trying to solve. I think for mundane tasks LLMs are just wonderful and help get them out of the way.
If I put myself into the shoes of a beginner programmer, LLMs are amazing. There is so much I could learn from them. Ultimately what I find is that LLMs will help lower the barrier of entry to programming but do not remove the need to learn to read / understand / reason about the code. Beginners will be able to go much further on their own before seeking out help.
If you are more experienced you will probably also get some benefits, but ultimately you'd probably want to do it your own way, since there is no way LLMs will replace an experienced programmer (not yet anyway).
I don't think it's wise to completely dismiss LLMs in your workflow, at the same time I would not rely on it 100% either, any code generated needs to be reviewed and understood like the post mentioned.
thefz 4 hours ago [-]
I learned C# first then async/await then the TPL then MVVM by banging my head against the problems I had to solve. I still retain the knowledge to this day because I had to think long and hard and test a lot, prototype and verify.
Having a chatbot tell me what to write would not have had the same effect.
It's like having someone tell you the solutions to your homework.
conductr 3 hours ago [-]
Has a lot to do with what you’re building. It does front end and CRUD apps pretty well. With things like games and more complex programs, I feel like I’m fighting it more and should just write the code myself.
Zaylan 10 hours ago [-]
I've had a similar experience. These tools are pretty helpful for small scripts or quick utility code, but once you're working on something with a more complex structure and lots of dependencies, they tend to slow down. Sometimes it takes more effort to fix what they generate than to just write it myself.
I still use them, but more as a support tool than a real assistant.
handfuloflight 16 hours ago [-]
Will we be having these conversations for the next decade?
wiseowise 12 hours ago [-]
It’s the new “I use Vim/Emacs/Ed over IDE”.
ken47 16 hours ago [-]
Longer.
adventured 16 hours ago [-]
The conversations will climb the ladder and narrow.
Eventually: well, but, the AI coding agent isn't better than a top 10%/5%/1% software developer.
And it'll be that the coding agents can't do narrow X thing better than a top tier specialist at that thing.
The skeptics will forever move the goal posts.
jdbernard 15 hours ago [-]
If the AI actually outperforms humans in the full context of the work, then no, we won't. It will be so much cheaper and faster that businesses won't have to argue at all. Those that adopt them will massively outcompete those that don't.
However, assuming we are still having this conversation, that alone is proof to me that the AI is not that capable. We're several years into "replace all devs in six months." We will have to continue to wait and see.
ukprogrammer 7 hours ago [-]
> If the AI actually outperforms humans in the full context of the work, then no, we won't. It will be so much cheaper and faster that businesses won't have to argue at all. Those that adopt them will massively outcompete those that don't.
This. The devs outcompeting by using AI today are too busy shipping, rather than wasting time writing blog posts about what, ultimately, is a skill issue.
wiseowise 12 hours ago [-]
> If the AI actually outperforms humans in the full context of the work, then no, we won't.
IDEs outperform any “dumb” editor in the full context of work. You don’t see any fewer posts about “I use Vim, btw” (and I say this as a Vim user).
karl11 14 hours ago [-]
There is an important concept alluded to here around skin in the game: "the AI is not going to assume any liability if this code ever malfunctions" -- it is one of the issues I see w/ self-driving cars, planes, etc. If it malfunctions, there is no consequence for the 'AI' (no skin in the game) but there are definitely consequences for any humans involved.
nottorp 10 hours ago [-]
> The problem is that I'm going to be responsible for that code, so I cannot blindly add it to my project and hope for the best.
Responsibility and "AI" marketing are two non-intersecting sets.
edg5000 12 hours ago [-]
It's a bit like going from assembly to C++, except we don't have good rigid rules for high-level program specification. If we had a rigid "high-level language" to express programs, orders of magnitude more high-level than C++ and others, then we could maybe evaluate it for correctness and get 100% output reliability, perhaps. All the languages I picked up, I picked them up when they were at least 10 years old. I'm trying to use AI a bit these days for programming, but it feels like what it must have felt like using C++ when it first became available: promising but not usable (yet?) for most programming situations.
mentalgear 6 hours ago [-]
From all my experience over several years, the best stance towards AI-assisted development is: "Trust, but verify" (each change). Which is in stark contrast to brittle "vibe coding" (which might work for demos but nothing else).
Tzt 6 hours ago [-]
What do you mean several years, it became feasible like 6 months ago lol. No, gpt3.5 doesn't count, it's a completely useless thing.
bawana 5 hours ago [-]
Is AI's relationship to knowledge the same as an index fund is to equities? Does the fact that larger and larger groups of people use AI result in more homogeneous and 'blindered' thinking?
fshafique 16 hours ago [-]
"do not work for me", I believe, is the key message here. I think a lot of AI companies have crafted their tools such that adoption has increased as the tools and the output got better. But there will always be a few stragglers, non-normative types, or situations where the AI agent is just not suitable.
lexandstuff 14 hours ago [-]
Maybe, but there's also some evidence that AI coding tools aren't making anyone more productive. One study from last year found that there was no increase in developer velocity but a dramatic increase in bugs.[1] Granted, the technology has advanced since this study, but many of the fundamental issues of LLM unreliability remain. Additionally, a recent study has highlighted the significant cognitive costs associated with offloading problem-solving onto LLMs, revealing that individuals who do so develop significantly weaker neural connectivity than those who don't [2].
It's very possible that AI is literally making us less productive and dumber. Yet they are being pushed by subscription-peddling companies as if it is impossible to operate without them. I'm glad some people are calling it out.
One year ago I probably would've said the same. But I started dabbling with it recently, and I'm awed by it.
afarviral 13 hours ago [-]
This has been my experience as well, but there are plenty of assertions here that are not always true, e.g. "AI coding tools are sophisticated enough (they are not) to fix issues in my projects" … but how do you know this if you are not constantly checking whether the tooling has improved? I think for a certain level of issue AI can tackle it and improve things, but there's only a subset of the available models and of a multitude of workflows that will work well, but unfortunately we are drowning in many that are mediocre at best and many like me give up before finding the winning combination.
layer8 8 hours ago [-]
You omitted “with little or no supervision”, which I think is crucial to that quote. It’s pretty undisputed that having an AI fix issues in your code requires some amount of supervision that isn’t negligible. I.e. you have to review the fixes, and possibly make some adjustments.
noiv 10 hours ago [-]
I've started to finish some abandoned half-ready side projects with Claude Pro on Desktop with filesystem MCP. Used to high-quality code, it took me some time to teach Claude to follow conventions. Now it works like a charm: we work on a requirements.md until all questions are answered, and then I let Claude go. The only thing left is convincing clients to embrace code assistants.
block_dagger 15 hours ago [-]
> For every new task this "AI intern" resets back to square one without having learned a thing!
I guess the author is not aware of Cursor rules, AGENTS.md, CLAUDE.md, etc. Task-list oriented rules specifically help with long term context.
adastra22 12 hours ago [-]
Do they? I have found that with Cursor at least, the model very quickly starts ignoring rules.
stray 15 hours ago [-]
You can lead a horse to the documentation, but you can't make him think.
wiseowise 12 hours ago [-]
Thinking is a means to an end, not the end goal.
Or are you talking about OP not knowing AI tools enough?
b0a04gl 14 hours ago [-]
clarity is exactly why ai tools could work well for anyone. they're not confused users, they know what they want and that makes them ideal operators of these systems. if anything, the friction they're seeing isn't proof the tools are broken, it's proof the interface is still too blunt. you can't hand off intent without structure. but when someone like that uses ai with clean prompts, tight scope, and review discipline, the tools usually align. it's not either-or. the tools aren't failing them, they're underutilised.
zengyue 8 hours ago [-]
I think it is more suitable for creation rather than modification, so when repeated attempts still don't work, I will delete it and let it rewrite, which often solves the problem.
edg5000 12 hours ago [-]
A huge bottleneck seems to be the lack of memory between sessions, at least with Claude Code. Sure, I can write things into a text file, but it's not the same as having an AI actually remember the work done earlier.
Is this possible in any way today? Does one need to use Llama or DeepSeek, and do we have to run it on our own hardware to get persistence?
Aeolun 10 hours ago [-]
> The part that I enjoy the most about working as a software engineer is learning new things, so not knowing something has never been a barrier for me.
To me the part I enjoy most is making things. Typing all that nonsense out is completely incidental to what I enjoy about it.
sagarpatil 14 hours ago [-]
What really baffles me are the claims from:
Anthropic: 80% of the code is generated by AI
OpenAI: 70-80%
Google/Microsoft: 30%
root_axis 13 hours ago [-]
The use of various AI coding tools is so diffuse that there isn't even a practical way to measure this. You can be assured those numbers are more or less napkin math based on some arbitrary AI performance factor applied to the total code writing population of the company.
layer8 7 hours ago [-]
Microsoft and Google have the much larger and older code bases.
nojs 14 hours ago [-]
This does not contradict the article - it may be true, and yet not significantly more productive, because of the increased review burden.
s_ting765 10 hours ago [-]
Author makes very good points. Someone has to be responsible for the AI generated code, and if it's not going to be you then no one should feel obligated to pull the auto-generated PR.
euleriancon 14 hours ago [-]
> The truth that may be shocking to some is that open source contributions submitted by users do not really save me time either, because I also feel I have to do a rigorous review of them.
This truly is shocking. If you are reviewing every single line of every package you intend to use how do you ever write any code?
adastra22 12 hours ago [-]
That’s not what he said. He said he reviews every line of every pull request he receives to his own projects. Wouldn’t you?
abenga 13 hours ago [-]
You do not need to review every line of every package you use, just the subset of the interface you import/link and use. You have to review every line of code you commit into your project. I think attempting to equate the two is dishonest dissembling.
euleriancon 13 hours ago [-]
To me, the point the friend is making is, just like you said, that you don't need to review every line of code in a package, just the interface.
The author misses the point that there truly is code that you trust without seeing it.
At the moment AI code isn't as trustworthy as a well tested package but that isn't intrinsic to the technology, just a byproduct of the current state.
As AI code becomes more reliable, it will likely become the case that you only need to read the subset of the interface you import/link and use.
bluefirebrand 12 hours ago [-]
This absolutely is intrinsic to the workflow
Using a package that hundreds of thousands of other people use is low risk, it is battle tested
It doesn't matter how good AI code gets, a unique solution that no one else has ever touched is always going to be more brittle and risky than an open source package with tons of deployments
And yes, if you are using an Open Source package that has low usage, you should be reviewing it very carefully before you embrace it
Treat AI code as if you were importing from a git repo with 5 installs, not a huge package with Mozilla funding
root_axis 13 hours ago [-]
> At the moment AI code isn't as trustworthy as a well tested package but that isn't intrinsic to the technology, just a byproduct of the current state
This remains to be seen. It's still early days, but self-attention scales quadratically. This is a major red flag for the future potential of these systems.
joelthelion 11 hours ago [-]
I think it's getting clear that, at the current stage, AI coding agents are mostly useful for people working either on small projects or on isolated new features. People who maintain a large framework find them less useful.
xpe 5 hours ago [-]
> Unfortunately these claims are just based on the perception of the subjects themselves, so there is no hard data to back them up.
Did the author take their own medicine and measure their own productivity?
scotty79 2 hours ago [-]
> It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.
Even if that were true for everybody, reviews would still be worth doing, because when the code is reviewed it gets more than one pair of eyes looking at it.
So it's still worth using AI even if it's slower than writing the code yourself. Because you wouldn't have made the mistakes that the AI made, and the AI wouldn't make the mistakes you would have made.
It still might not be personally worth it for you, though, if you prefer writing code to reading it. Until you can set up AI as a reviewer for yourself.
lvl155 8 hours ago [-]
Analogous to assembly, we need standardized AI language/styles.
skydhash 16 hours ago [-]
I do agree with these points in my situation. I don't actually care for speed or having generated snippets for unfamiliar domains. Coding for me has always been about learning. Whether I'm building out a new feature or solving a bug, programming is always a learning experience. The goal is to bring forth a solution that a computer can then perform, but in the process you learn about how and, more importantly, why you should solve a problem.
The concept of why can get nebulous in a corporate setting, but it's nevertheless fun to explore. At the end of the day, someone have a problem and you're the one getting the computer to solve it. The process of getting there is fun in a way that you learn about what irks someone else (or yourself).
Thinking about the problem and its solution can be augmented with computers (I'm not memorizing the Go standard library). But computers are simple machines with very complex abstractions built on top of them. The thrill is in thinking in terms of two worlds: the real one where the problem occurs and the computing one where the solution will come forth. The analogy may be more understandable to someone who's learned two or more languages and thinks about the nuances between using them to depict the same reality.
Same as TFA, I'm spending most of my time manipulating a mental model of the solution. When I get to code, it's just a translation. But the mental model is diffuse, so getting it written gives it a firmer existence. LLM generation mostly disrupts the process. The only way it really helps is as a more pliable form of Stack Overflow, but I've only used Stack Overflow as human-authored annotations of the official docs.
cinbun8 10 hours ago [-]
As someone who heavily utilizes AI for writing code, I disagree with all the points listed. AI is faster, a multiplier, and in many instances, the equivalent of an intern. Perhaps the code it writes is not like the code written by humans, but it serves as a force multiplier. Cursor makes $500 million for a reason.
p1dda 14 hours ago [-]
It would be interesting to see which is faster/better in competitive coding, the human coder or the human using AI to assist in coding.
Apparently models are not doing great for problems out of distribution.
p1dda 9 hours ago [-]
It goes to show that the LLMs aren't intelligent in the way humans are. LLMs are a really great replacement for googling though
asciimov 14 hours ago [-]
It would only be interesting if the problem was truly novel. If the AI has already been trained on the problem it’ll just push out a solution.
wiseowise 12 hours ago [-]
It already happened. Last year AI submissions completely destroyed AoC, as far as I remember.
nilirl 9 hours ago [-]
The main claim made: When there's money or reputation to be lost, code requires the same amount of cognition, irrespective of who wrote it, AI or not.
Best counter claim: Not all code has the same risk. Some code is low risk, so the risk of error does not detract from the speed gained. For example, for proof of concepts or hobby code.
The real problem: Disinformation. Needless extrapolation, poor analogies, over valuing anecdotes.
But there's money to be made. What can we do, sometimes the invisible hand slaps us silly.
freehorse 9 hours ago [-]
> Best counter claim: Not all code has the same risk. Some code is low risk, so the risk of error does not detract from the speed gained. For example, for proof of concepts or hobby code.
Counter counter claim for these use cases: when I do a proof of concept, I actually want to increase my understanding of said concept at the same time, learn the challenges involved, and in general get a better idea of how feasible things are. An AI can be useful for asking questions, asking for reviews, alternative solutions, inspiration etc (it may have something interesting to add or not), but if we are still in the territory of "this matters" I would rather not substitute the actual learning experience and deeper understanding with having an AI generate code faster. Similar for hobby projects: do I need that thing to just work, or do I actually care to learn how it is done? If the learning/understanding is not important in a context, I would say using AI to generate the code is a great time-saver. Otherwise, I may still use AI, but not in the same way.
nilirl 8 hours ago [-]
Fair. I rescind those examples and revise my counter: When you gain much more from speed than you lose with errors, AI makes sense.
Revised example: Software where the goal is design experimentation; like with trying out variations of UX ideas.
dpcan 14 hours ago [-]
This article is just simply not true for most people who have figured out how to use AI properly when coding. Since switching to Cursor, my coding speed and efficiency have probably increased 10x, conservatively. When I'm using it to code in languages I've used for 25+ years, it's a breeze to look over the function it just saved me time on by pre-thinking and typing it out for me. Could I have done it myself? Yeah, but it would have taken longer if I even had to go look up one tiny thing in the documentation, like the order of parameters for a function, or that little syntax thing I never use...
Also, the auto-complete with tools like Cursor is mind-blowing. When I can press tab to have it finish the next 4 lines of a prepared statement, or it just knows the next 5 variables I need to define because I just set up a function that will use them... that's a huge time saver when you add it all up.
My policy is simple, don't put anything AI creates into production if you don't understand what it's doing. Essentially, I use it for speed and efficiency, not to fill in where I don't know at all what I'm doing.
amlib 13 hours ago [-]
What do you even mean by a 10x increase in efficiency? Does that mean you commit 10x more code every day? Or that "you" essentially "type" code 10x faster? In the latter case, all the other tasks surrounding code would still take around the same time, netting you much less than a 10x increase in overall productivity, probably less than 2x?
dpcan 4 hours ago [-]
My favorite example, and the ones I show my team and my employer, is that I can have AI look at a string of fields for my database table and generate all the views for the display, add, and edit forms for those fields in exactly the way I instruct, and that saves me as much as 30 minutes every time I do it. If I do this 8 times in a day, that would save me about 4 hours. Especially when those forms require things like lookups and extra JavaScript functionality.
Another great example is the power of tabbing with Cursor. If I want to change the parameters of a function in my React app, I can be at one of the functions anywhere in my screen, add a variable that relates to what is being rendered, and I can now quickly tab through to find all the spots that are also affected in that screen, and then it usually helps apply the changes to the function. It's like smart search and replace where I can see every change that needs to be made, but it knows how to make it more intelligently than just replacing a line of code - and I didn't have to write the regex to find it, AND it usually helps get the work done in the function as well to reflect the change. That could save me 3-5 minutes, and I could do that 5 times a day maybe, and another almost half-hour is saved.
The point is, these small things add up SO fast. Now I'm incredibly efficient because the tedious part of programming has been sped up so much.
asciimov 14 hours ago [-]
Out of curiosity how much are you spending on AI?
How much do you believe a programmer needs to lay out to “get good”?
dpcan 4 hours ago [-]
I have a $20/month GPT subscription, and the $20/month cursor plan. I've yet to come close to going over my limits with either service. I use the unlimited Tab completions in cursor which are what end up saving me an enormous amount of time. I probably use 5 to maybe 10 chats a day in cursor, but I jump over to GPT if I think I'm going to require a few extra chats to get to the bottom of something.
I think that getting "good" at using AI means that you figure out exactly how to formulate your prompts so that the results are what you are looking for given your code base. It also means knowing when to start new chats, and when to have it focus on very specific pieces of code, and finally, knowing what it's really bad at doing.
For example, if I need to have it take a list of 20 fields and create the HTML view for the form, it can do it in a few seconds, and I know to tell it, for example, to use Bootstrap, Bootstrap icons, Bootstrap modals, responsive rows and columns, and I may want certain fields aligned certain ways, buttons in certain places for later, etc, and then I have a form - and just saved myself probably 30 minutes of typing it out and testing the alignment etc. If I do things like this 8 times a day, that's 4 hours of saved time, which is game changing for me.
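To make that concrete, here's a hypothetical fragment of the kind of output I mean, for just two made-up fields (Bootstrap 5 classes; not code from my actual project):
```
import React from "react";

// Hypothetical generated fragment: two form fields in a responsive Bootstrap row.
export function CustomerFormFields() {
  return (
    <div className="row g-3">
      <div className="col-md-6">
        <label htmlFor="email" className="form-label">Email</label>
        <input id="email" type="email" className="form-control" required />
      </div>
      <div className="col-md-6">
        <label htmlFor="signupDate" className="form-label">Signup date</label>
        <input id="signupDate" type="date" className="form-control" />
      </div>
    </div>
  );
}
```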
epiccoleman 13 hours ago [-]
I am currently subscribed to Claude Pro, which is $20/mo and gives you plenty to experiment with by giving you access to Projects and MCP in Claude Desktop and also Claude Code for a flat monthly fee. (I think there are usage limits but I haven't hit them).
I've probably fed $100 in API tokens into the OpenAI and Anthropic consoles over the last two years or so.
I was subscribed to Cursor for a while too, though I'm kinda souring on it and looking at other options.
At one point I had a ChatGPT pro sub, I have found Claude more valuable lately. Same goes for Gemini, I think it's pretty good but I haven't felt compelled to pay for it.
I guess my overall point is you don't have to break the bank to try this stuff out. Shell out the $20 for a month, cancel immediately, and if you miss it when it expires, resub. $20 is frankly a very low bar to clear - if it's making me even 1% more productive, $20 is an easy win.
hooverd 15 hours ago [-]
The moat is that juniors, never having worked without these tools, provide revenue to AI middlemen. Ideally they're blasting their focus to hell on short form video and stimulants, and are mentally unable to do the job without them.
Terr_ 13 hours ago [-]
Given the creeping appeal of LLMs as cheating tools in education, some of them may be arriving in the labor market with their brains already cooked.
wilkinsonsmooth 3 hours ago [-]
One thing about AI that I feel like no company already inserting it into their workforce is thinking about is what the future looks like when your company depends on it. If AI is doing the work that junior employees used to do, then you are losing the base knowledge that your employees used to learn. Maybe in the coming years it starts to take over more and more roles that people used to do and companies can decrease their workforce. AI comes a lot cheaper than real people (at least that's the selling point).
Most tech companies however tend to operate following a standard enshittification schedule. First they are very cheap, supported by investments and venture capitalists. Then they build a large user base who become completely dependent on them as alternatives disappear (in this case as they lose the institutional knowledge that their employees used to have). Then they seek to make money so the investors can make their profits. In this case I could see the cost of AI rising a lot, after companies have already built it in to their business. AI eventually has to start making money. Just like Amazon had to, and Facebook, and Uber, and Twitter, and Netflix, etc.
From all the talk I see of companies embracing AI wholeheartedly it seems like they aren't looking any further than the next quarter. It only costs so much per month to replace so many man hours of work! I'm sure that won't last once AI is deeply embedded into so many businesses that they can start charging whatever they want to.
bilalq 13 hours ago [-]
I've found "agents" to be an utter disappointment in their current state. You can never trust what they've done and need to spend so much time verifying their solution that you may as well have just done it yourself in the first place.
However, AI code reviewers have been really impressive. We run three separate AI reviewers right now and are considering adding more. One of these reviewers is kind of noisy, so we may drop it, but the others have been great. Sure, they have false positives sometimes and they don't catch everything. But they do catch real issues and prevent customer impact.
The Copilot style inline suggestions are also decent. You can't rely on it for things you don't know about, but it's great at predicting what you were going to type anyway.
nreece 13 hours ago [-]
Heard someone say the other day "AI coding is just advanced scaffolding right now." Made me wonder if we're expecting too much out of it, at least for now.
varispeed 6 hours ago [-]
What is the purpose of this article?
bdamm 16 hours ago [-]
No offense intended, but this is written by a guy who has the spare time to write the blog. I can only assume his problem space is pretty narrow. I'm not sure what his workflow is like, but personally I am interacting with so many different tools, in so many different environments, with so many unique problem sets, that being able to use AIs for error evaluation, and yes, for writing code, has indeed been a game changer. In my experience it doesn't replace people at all, but they sure are powerful tools. Can they write unsupervised code? No. Do you need to read the code they write? Yes, absolutely. Can the AIs produce bugs that take time to find? Yes.
But despite all that, the tools can find problems, get information, and propose solutions so much faster and across such a vast set of challenges that I simply cannot imagine going back to working without them.
This fellow should keep on working without AIs. All the more power to him. And he can ride that horse all the way into retirement, most likely. But it's like ignoring the rise of IDEs, or Google search, or AWS.
ken47 16 hours ago [-]
> rise of IDEs, or Google search, or AWS.
None of these things introduced the risk of directly breaking your codebase without very close oversight. If LLMs can surpass that hurdle, then we’ll all be having a different conversation.
stray 14 hours ago [-]
A human deftly wielding an LLM can surpass that hurdle. I laugh at the idea of telling Claude Code to do the needful and then blindly pushing to prod.
bdamm 15 hours ago [-]
This is not the right way to look at it. You don't have to have the LLMs directly coding your work unsupervised to see the enormous power that is there.
And besides, not all LLMs are the same when it comes to breaking existing functions. I've noticed that Claude 3.7 is far better at not breaking things that already work than whatever it is that comes with Cursor by default, for example.
wiseowise 12 hours ago [-]
Literally everything in this list, except AWS, introduces the risk of breaking your code base without close oversight. The same people who copy-paste LLM code into IDEs are yesterday's copy-pasters from SO and random Google searches.
satisfice 15 hours ago [-]
You think he's not using the tools correctly. I think you aren't doing your job responsibly. You must think he isn't trying very hard. I think you are not trying very hard...
That is the two sides of the argument. It could only be settled, in principle, if both sides were directly observing each other's work in real-time.
But, I've tried that, too. 20 years ago in a debate between dedicated testers and a group of Agilists who believed all testing should be automated. We worked together for a week on a project, and the last day broke down in chaos. Each side interpreted the events and evidence differently. To this day the same debate continues.
bdamm 49 minutes ago [-]
I am absolutely responsible for my work. That's why I spend so much time reading the code that I and others on my team write, and it's why I spend so much time building enormous test systems, and pulling deeply on the work of others. Thousands and thousands of hours go into work that the customer will never see, because I am responsible.
People's lives are literally at stake. If my systems screw up, people can die.
And I will continue to use AI to help get through all that. It doesn't make me any less responsible for the result.
nurettin 12 hours ago [-]
I simply don't like the code it writes. Whenever I try using LLMs, it is like wrestling for conciseness. Terrible code, of which almost certainly 1/10 is errors or "extras" I don't need. At this point I am simply using it to motivate me to move forward.
Writing a bunch of orm code feels boring? I make it generate the code and edit. Importing data? I just make it generate inserts. New models are good at reformatting data.
Using a third party Library? I force it to look up every function doc online and it still has errors.
Adding transforms and pivots to sql while keeping to my style? It is a mess. Forget it. I do that by hand.
worik 13 hours ago [-]
There are tasks I find AI (I use DeepSeek) useful for
I have not found it useful for large programming tasks. But for small tasks, a sort of personalised boiler plate, I find it useful
abalashov 4 hours ago [-]
I think the point about owning the code is the significant one. If you’re just doing some throwaway prototyping or trying stuff, fine. But if you really need to commit to ownership and maintenance and care and feeding of this code, best just write it yourself, if only for the reason that writing it engenders the appropriate level of understanding while removing the distraction of AI slop code review.
Where I find it genuinely useful is in extremely low-value tasks, like localisation constants for the same thing in other languages, without having to tediously run that through an outside translator. I think that mostly goes in the "fancy inline search" category.
Otherwise, I went back from Cursor to normal VS Code, and mostly have Copilot autocompletions off these days because they're such a noisy distraction and break my thought process. Sometimes they add something of value, sometimes not, but I'd rather not have to confront that question with every keystroke. That's not "10x" at all.
Yes, I've tried the more "agentic" workflow and got down with Claude Code for a while. What I found is that its changes are so invasive and chaotic--and better prompts don't really prevent this--that it has the same implications for maintainability and ownership referred to above. For instance, I have a UIKit-based web application to which I recently asked Claude Code to add dark theme options, and it rather brainlessly injected custom styles into dozens of components and otherwise went to town, in a classic "optimise for maximum paperclip production" kind of way. I spent a lot more time un-F'ing what it did throughout the code base than I would have spent adding the functionality myself in an appropriately conservative fashion. Sure, a better prompt would probably have helped, but that would have required knowing in advance what chaos it was going to wreak, so as to ask it to refrain from that as part of the prompt. The possibility of this happening with every prompt is not only daunting, but a rabbit hole of cognitive load that distracts from real work.
I will concede it does a lot better--occasionally, very impressively--with small and narrow tasks, but those tasks at which it most excels are so small that the efficiency benefit of formulating the prompt and reviewing the output is generally doubtful.
There are those who say these tools are just in their infancy, AGI is just around the corner, etc. As far as I can tell from observing the pace of progress in this area (which is undeniably impressive in strictly relative terms), this is hype and overextrapolation. There are some fairly obvious limits to their training and inference, and any programmer would be wise to keep their head down, ignore the hype, use these tools for what they're good at and studiously avoid venturing into "fundamentally new ways of working".
satisfice 15 hours ago [-]
Thank you for writing what I feel and experience, so that I don't have to.
Which is kind of like if AI wrote it: except someone is standing behind those words.
globnomulous 12 hours ago [-]
Decided to post my comment here rather than on the author's blog. Dang and tonhow, if the tone is too personal or polemical, I apologize. I don't think I'm breaking any HN rules.
Commenter Doug asks:
> > what AI coding tools have you utilized
Miguel replies:
> I don't use any AI coding tools. Isn't that pretty clear after reading this blog post?
Doug didn't ask what tools you use, Miguel. He asked which tools you have used. And the answer to that question isn't clear. Your post doesn't name the ones you've tried, despite using language that makes clear that you have in fact used them (e.g. "my personal experience with these tools"). Doug's question isn't just reasonable. It's exactly the question an interested, engaged reader will ask, because it's the question your entire post begs.
I can't help but point out the irony here: you write a great deal on the meticulousness and care with which you review other people's code, and criticize users of AI tools for relaxing standards, but the AI-tool user in your comments section has clearly read your lengthy post more carefully and thoughtfully than you read his generous, friendly question.
And I think it's worth pointing out that this isn't the blog post's only head scratcher. Take the opening:
> People keep asking me If I use Generative AI tools for coding and what I think of them, so this is my effort to put my thoughts in writing, so that I can send people here instead of having to repeat myself every time I get the question.
Your post never directly answers either question. Can I infer that you don't use the tools? Sure. But how hard would it be to add a "no?" And as your next paragraph makes clear, your post isn't "anti" or "pro." It's personal -- which means it also doesn't say much of anything about what you actually think of the tools themselves. This post won't help the people who are asking you whether you use the tools or what you think of them, so I don't see why you'd send them here.
> my personal experience with these tools, from a strictly technical point of view
> I hope with this article I've made the technical issues with applying GenAI coding tools to my work clear.
Again, that word: "clear." No, the post not only doesn't make clear the technical issues; it doesn't raise a single concern that I think can properly be described as technical. You even say in your reply to Doug, in essence, that your resistance isn't technical, because for you the quality of an AI assistant's output doesn't matter. Your concerns, rather, are practical, methodological, and to some extent social. These are all perfectly valid reasons for eschewing AI coding assistants. They just aren't technical -- let alone strictly technical.
I write all of this as a programmer who would rather blow his own brains out, or retire, than cede intellectual labor, the thing I love most, to a robot -- let alone line the pockets of some charlatan 'thought leader' who's promising to make a reality of upper management's dirtiest wet dream: in essence, to proletarianize skilled work and finally liberate the owners of capital from the tyranny of labor costs.
I also write all of this, I guess, as someone who thinks commenter Doug seems like a way cool guy, a decent chap who asked a reasonable question in a gracious, open way and got a weirdly dismissive, obtuse reply that belies the smug, sanctimonious hypocrisy of the blog post itself.
Oh, and one more thing: AI tools are poison. I see them as incompatible with love of programming, engineering quality, and the creation of safe, maintainable systems, and I think they should be regarded as a threat to the health and safety of everybody whose lives depend on software (all of us), not because of the dangers of machine super intelligence but because of the dangers of the complete absence of machine intelligence paired with the seductive illusion of understanding.
andrewstuart 13 hours ago [-]
He’s saying it’s not faster because he needs to impose his human analysis on it which is slow.
That’s fine, but it’s an arbitrary constraint he chooses, and it’s wrong to say AI is not faster. It is. He just won’t let it be faster.
Some won’t like to hear this, but no-one reviews the machine code that a compiler outputs. That’s the future, like it or not.
You can’t say compilers are slow because I add on the time I take to analyse the machine code. That’s you being slow.
bluefirebrand 12 hours ago [-]
> no-one reviews the machine code that a compiler outputs
That's because compilers are generally pretty trustworthy. They aren't necessarily bug free, and when you do encounter compiler bugs it can be extremely nasty, but mostly they just work
If compilers were wrong as often as LLMs are, we would be reviewing machine code constantly
purerandomness 8 hours ago [-]
A compiler produces the same, deterministic output, every single time.
A stochastic parrot can never be trusted, let alone one that tweaks its model every other night.
I totally get that not all code ever written needs to be correct.
Some throw-away experiments can totally be one-shot by AI, nothing wrong with that. Depending on the industry one works in, people might be on different points of the expectation spectrum for correctness, and so their experience with LLMs vary.
It's the RAD tool discussion of the 2000s, or the "No-Code" tools debate of the last decade, all over again.
sneak 14 hours ago [-]
It’s harder to read code than it is to write it, that’s true.
But it’s also faster to read code than to write it. And it’s faster to loop a prompt back to fixed code to re-review than to write it.
AlotOfReading 13 hours ago [-]
I've written plenty of code that's much faster to write than to read. Most dense, concise code will require a lot more time building a mental model to read than it took to turn that mental model into code in the first place.
Dusksky 7 hours ago [-]
[dead]
Mila-Cielo 15 hours ago [-]
[dead]
15 hours ago [-]
blueboo 9 hours ago [-]
Skeptics find that talking themselves out of trying them is marvellously effective for convincing themselves they’re right.
strangescript 16 hours ago [-]
Everyone is still thinking about this problem the wrong way. If you are still running one agent, on one project at a time, yes, it's not going to be all that helpful if you are already a fast, solid coder.
Run three, run five. Prompt with voice annotation. Run them when you would normally need a cognitive break. Run them while you watch Netflix on another screen. Have them do TDD. Use an orchestrator. There are so many more options.
I feel like another problem is that, deep down, most developers hate debugging other people's code, and that's effectively what this is at times. It doesn't matter if your associate ran off and saved you 50k lines of typing; you would still rather do it yourself than debug the code.
I would give you grave warnings, telling you the time is nigh, adapt or die, etc, but it doesn't matter. Eventually these agents will be good enough that the results will surpass you even in simple one task at a time mode.
kibibu 15 hours ago [-]
I have never seen people work harder to dismantle their own industry than software engineers are right now.
marssaxman 14 hours ago [-]
We've been automating ourselves out of our jobs as long as we've had them; somehow, despite it all, we never run out of work to do.
kibibu 8 hours ago [-]
We've automated bullshit tedium work, like building and deploying, but this is the first time in my memory that people are actively trying to automate all the fun parts away.
Closest parallel I can think of is the code-generation-from-UML era, but that explicitly kept the design decisions on the human side, and never really took over the world.
strangescript 15 hours ago [-]
What exactly is the alternative? Wish it away? Developers have been automating away jobs for decades; it seems hypocritical to complain about it now.
hooverd 14 hours ago [-]
who gets the spoils?
sponnath 14 hours ago [-]
Can you actually demonstrate this workflow producing good software?
hooverd 15 hours ago [-]
Sounds like a way to blast your focus into a thousand pieces
giantg2 5 hours ago [-]
The true test is: can it write tests? Ask the dev if they use it to write tests. The answer to #1 is that it can't, really. The answer to #2 should be no.
AI can write some tests, but it can't design thorough ones. Perhaps the best way to use AI is to have a human writing thorough and well documented tests as part of TDD, asking AI to write code to meet those tests, then thoroughly reviewing that code.
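To make that concrete, here is a rough sketch of the split I mean, in Python with pytest (the discount function and its rules are invented for illustration): the human writes the tests that pin down the behaviour, the AI is asked to write an implementation that makes them pass, and the review is mostly about whether the tests are the right tests.

    # Hypothetical TDD split: human-authored tests first; the AI is then asked
    # to write apply_discount() so they pass, and the human reviews both.
    import pytest

    def apply_discount(total: float, loyalty_years: int) -> float:
        # The part you would hand to the AI; shown here so the example runs.
        if total < 0:
            raise ValueError("total must be non-negative")
        rate = min(0.05 * (loyalty_years // 2), 0.20)  # 5% per 2 full years, capped at 20%
        return total * (1 - rate)

    # Human-written tests: these encode the actual design decisions.
    def test_no_discount_under_two_years():
        assert apply_discount(total=50.0, loyalty_years=1) == 50.0

    def test_five_percent_at_two_years():
        assert apply_discount(total=100.0, loyalty_years=2) == pytest.approx(95.0)

    def test_discount_capped_at_twenty_percent():
        assert apply_discount(total=100.0, loyalty_years=50) == pytest.approx(80.0)

    def test_negative_total_rejected():
        with pytest.raises(ValueError):
            apply_discount(total=-1.0, loyalty_years=1)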
AI saves me just a little time by writing boilerplate stuff for me, just one step above how IDEs have been providing generated getters and setters.
One of the most massive issues with LLMs is that we don't get a probability response back. You ask a human "Do you know how this works?", and an honest and helpful human might say "No" or "No, but you should try this. It might work".
That's helpful.
Conversely a human pretending it knows and speaking with deep authority when it doesn't is a liar.
LLMs need more of this type of response, which indicates certainty or not. They're useless without this. But of course, an LLM indicating a lack of certainty, means that customers might use it less, or not trust it as much, so... profits first! Speak with certainty on all things!
* Read the signatures of the functions.
* Use the code correctly.
* Answer questions about the behavior of the underlying API by consulting the code.
Of course they're just guessing if they go beyond what's in their context window, but don't underestimate context window!
"If you're getting answers, it has seen it elsewhere"
The context window is 'elsewhere'.
It’s silly to say that something LLMs can reliably do is impossible and every time it happens it’s “dumb luck”.
As they say, it sounds like you're technically correct, which is the best kind of correct. You're correct within the extremely artificial parameters that you created for yourself, but not in any real world context that matters when it comes to real people using these tools.
To anyone who has used these tools in anger, it’s remarkable, given they’re only trained on large corpora of language and feedback, that they’re able to produce what they do. I don’t claim they exist outside their weights; that’s absurd. But the entire point of non-linear activation functions with many layers and parameters is to learn highly complex non-linear relationships. The fact that they can be trained as much as they are, with as much data as they have, without overfitting or gradient explosions means the very nature of language contains immense information in its encoding and structure, and the network, by definition of how it works and is trained, does -not- just return what it was trained on. It’s able to curve-fit complex functions that inter-relate semantic concepts that are clearly not understood as we understand them, but in some ways it represents an “understanding” that’s sometimes perhaps more complex and nuanced than even we can manage.
Anyway, the stochastic parrot metaphor misses the point that parrots are incredibly intelligent animals - which is apt, since those who use that phrase are missing the point.
How would you reconcile this with the fact that SOTA models are only a few TB in size? Trained on exabytes of data, yet only a few TB in the end.
Correct answers couldn't be dumb luck either, because otherwise the models would pretty much only hallucinate (the space of wrong answers is many orders of magnitude larger than the space of correct answers), similar to the early proto GPT models.
This is false. You are off by ~4 orders of magnitude by claiming these models are trained on exabytes of data. It is closer to 500TB of more curated data at most. Contrary to popular belief LLMs are not trained on "all of the data on the internet". I responded to another one of your posts that makes this false claim here:
https://news.ycombinator.com/item?id=44283713
You want to say this guy's experience isn't reproducible? That's one thing, but that's probably not the case unless you're assuming they're pretty stupid themselves.
You want to say that it Is reproducible, but that "that doesn't mean AI can think"? Okay, but that's not what the thread was about.
When I built my own programming language and used it to build a unique toy reactivity system and then asked the LLM "what can I improve in this file", you're essentially saying it "only" could help me because it learned how it could improve arbitrary code before in other languages and then it generalized those patterns to help me with novel code and my novel reactivity system.
"It just saw that before on Stack Overflow" is a bad trivialization of that.
It saw what on Stack Overflow? Concrete code examples that it generalized into abstract concepts it could apply to novel applications? Because that's the whole damn point.
As to 'knows the answer', I don't even know what that means with these tools. All I know is whether it is helpful or not.
The amazing thing about LLMs is that we still don’t know how (or why) they work!
Yes, they’re magic mirrors that regurgitate the corpus of human knowledge.
But as it turns out, most human knowledge is already regurgitation (see: the patent system).
Novelty is rare, and LLMs have an incredible ability to pattern match and see issues in “novel” code, because they’ve seen those same patterns elsewhere.
Do they hallucinate? Absolutely.
Does that mean they’re useless? Or does that mean some bespoke code doesn’t provide the most obvious interface?
Having dealt with humans, the confidence problem isn’t unique to LLMs…
You may want to take a course in machine learning and read a few papers.
LLMs are insanely complex systems and their emergent behavior is not explained by the algorithm alone.
Goodness this is a dim view on the breadth of human knowledge.
But I look down my nose at conceptions that human knowledge is packageable as plain text; our lives, experience, and intelligence are so much more than the cognitive strings we assemble in our heads in order to reason. It's like in that movie Contact when Jodie Foster muses that they should have sent a poet. Our empathy and curiosity and desires are not encoded in UTF-8. You might say these are realms other than knowledge, but woe to the engineer who thinks they're building anything superhuman while leaving these dimensions out; they're left with a cold super-rationalist with no impulse to create of its own.
"<the human brain> cannot think, reason, comprehend anything it has not seen before. If you're getting answers, it has seen it elsewhere, or it is literally dumb, statistical luck."
Obviously this isn’t true. You can easily verify this by inventing and documenting an API and feeding that description to an LLM and asking it how to use it. This works well. LLMs are quite good at reading technical documentation and synthesizing contextual answers from it.
I mean... they can also read actual documentation. If I'm doing any API work or using a language I'm not familiar with, I ask the LLM to include the source it got its answer from and to use official documentation when possible.
That lowers the hallucination rate significantly and also lets me ensure said function or code actually does what the llm reports it does.
In theory, all stackoverflow answers are just regurgitated documentation, no?
This 100%. I use o3 as my primary search engine now. It is brilliant at finding relevant sources, summarising what is relevant from them, and then also providing the links to those sources so I can go read them myself. The release of o3 was a turning point for me where it felt like these models could finally go and fetch information for themselves. 4o with web search always felt inadequate, but o3 does a very good job.
> In theory, all stackoverflow answers are just regurgitated documentation, no?
This is unfair to StackOverflow. There is a lot of debugging and problem solving that has happened on that platform of undocumented bugs or behaviour.
Modern implementations of LLMs can "do research" by performing searches (whose results are fed into the context), or in many code editors/plugins, the editor will index the project codebase/docs and feed relevant parts into the context.
My guess is they either were using the LLM from a code editor, or one of the many LLMs that do web searches automatically (ie. all of the popular ones).
They are answering non-stackoverflow questions every day, already.
This happens all the time via RAG. The model “knows” certain things via its weights, but it can also inject much more concrete post-training data into its context window via RAG (e.g. web searches for documentation), from which it can usefully answer questions about information that may be “not in its training data”.
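A stripped-down sketch of what that looks like in practice (a toy keyword scorer standing in for real web search or an embedding index; every name here is made up):

    import re

    # RAG in miniature: fetch relevant snippets and put them into the prompt,
    # so the model answers from its context window rather than from its weights.
    def tokens(text: str) -> set[str]:
        return set(re.findall(r"[a-z_]+", text.lower()))

    def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
        # Toy relevance score: count shared words. Real systems use a search
        # engine or an embedding index here.
        q = tokens(question)
        return sorted(docs, key=lambda d: -len(q & tokens(d)))[:k]

    def build_prompt(question: str, docs: list[str]) -> str:
        context = "\n\n".join(retrieve(question, docs))
        return ("Answer using only the documentation below. "
                "If it isn't covered, say so.\n\n"
                f"--- docs ---\n{context}\n\n--- question ---\n{question}")

    if __name__ == "__main__":
        api_docs = [
            "create_widget(name, size) returns a widget id; size is in pixels.",
            "delete_widget(widget_id) removes the widget and frees its resources.",
            "list_widgets() returns the ids of all widgets in creation order.",
        ]
        # This string is what actually reaches the model as context.
        print(build_prompt("How do I remove a widget?", api_docs))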
People don't think that. Especially not the commentor you replied to. You're human-hallucinating.
People think LLMs are trained on raw documents and code besides StackOverflow, which is very likely true.
Generalisation is something that neural nets are pretty damn good at, and given the complexity of modern LLMs, the idea that they cannot generalise the fairly basic logical rules and patterns found in code, such that they're able to provide answers to inputs unseen in the training data, is quite an extreme position.
Models work across programming languages because it turned out programming languages and APIs are much more similar than one could have expected.
Okay, maybe sometimes the post about the stack trace was in Chinese, but a plain search used to be capable of giving the same answer as an LLM.
It's not that LLMs are better, it's that search got enshittified.
We have a habit of finding efficiencies in our processes, even if the original process did work.
I could break most passwords of an internal company application by googling the SHA1 hashes.
It was possible to reliably identify plants or insects by just googling all the random words or sentences that would come to mind describing it.
(None of that works nowadays, not even remotely)
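For anyone who never saw the hash trick: it worked because the unsalted SHA1 of a common password is the same well-known hex string everywhere it leaks, so the digest itself made a perfectly good search term. A rough illustration in Python (to show why unsalted fast hashes are a bad idea, not a recommendation):

    import hashlib

    # An unsalted, fast hash of a common password is a globally recognizable string.
    for pw in ["password", "letmein", "qwerty123"]:
        print(pw, "->", hashlib.sha1(pw.encode()).hexdigest())
    # Pasting one of those digests into a search engine used to surface
    # precomputed lookup tables; salts and slow KDFs are what kill the trick.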
The "plain" Google Search before LLMs never had the capability to take an entire copy&pasted lengthy stack trace (e.g. ~60 frames of verbose text) because long strings like that exceed the limits of Google's search UI. Various answers say a limit of 32 words and 5784 characters: https://www.google.com/search?q=limit+of+google+search+strin...
Before LLMs, the human had to manually and visually hunt through the entire stack trace to guess at a relevant smaller substring and paste that into the Google search box. Of course, that's doable, but it's a different workflow than an LLM doing it for you.
To clarify, I'm not arguing that the LLM method is "better". I'm just saying it's different.
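For what it's worth, the manual triage step being described is roughly this, sketched as toy Python (the heuristic, paths and names are invented; it is not what Google or an LLM actually does):

    # Reduce a long traceback to a short, pasteable search string:
    # the exception message plus the last frame from your own code.
    def searchable_snippet(trace: str, project_hint: str = "myapp") -> str:
        lines = [ln.strip() for ln in trace.strip().splitlines()]
        exception = lines[-1]                               # e.g. KeyError: 'user_id'
        frames = [ln for ln in lines if ln.startswith("File ")]
        ours = [f for f in frames if project_hint in f] or frames
        return f"{exception} {ours[-1]}"[:200]              # short enough for a query box

    if __name__ == "__main__":
        example = "\n".join([
            "Traceback (most recent call last):",
            '  File "/usr/lib/python3.11/runner.py", line 88, in run',
            "    handler(event)",
            '  File "/srv/myapp/views.py", line 42, in handler',
            '    user = ctx["user_id"]',
            "KeyError: 'user_id'",
        ])
        print(searchable_snippet(example))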
But I did it subconsciously. I never thought of it until today.
Another skill that LLM use can kill? :)
Which is never? Do you often just lie to win arguments? An LLM gives you a synthesized answer; a search engine only returns what already exists. By definition it cannot give you anything that is not a super obvious match.
In my experience it was "a lot", because my stack traces were mostly hardware-related problems on ARM Linux in that period.
But I suppose your stack traces were much different and superior and no one can have stack traces that are different from yours. The world is composed of just you and your project.
> Do you often just lie to win arguments?
I do not enjoy being accused of lying by someone stuck in their own bubble.
When you said "Which is never" did you lie consciously or subconsciously btw?
Whatever it is specifically, the idea that you could just paste a 600-line stack trace unmodified into Google, especially "way before AI", and get pointed to the relevant bit for your exact problem is obviously untrue.
I disabled AI autocomplete and cannot understand how people can use it. It was mostly an extra key press on backspace for me.
That said, learning new languages is possible without searching anything. With a local model, you can do that offline and have a vast library of knowledge at hand.
The Gemini results integrated in Google are very bad as well.
I don't see the main problem as people lazily asking AI how to use the toilet; it's that real knowledge bases like Stack Overflow and similar will vanish because of lacking participation.
Sometimes, a function doesn't work as advertised or you need to do something tricky, you get a weird error message, etc. For those things, stackoverflow could be great if you could find someone who had a similar problem. But the tutorial level examples on most blogs might solve the immediate problem without actually improving your education.
It would be similar to someone solving your homework problems for you. Sure you finished your homework, but that wasn't really learning. From this perspective, ChatGPT isn't helping you learn.
Sure, there is a chance that one day AI will be smart enough to read an entire codebase and chug out exhaustively comprehensive and accurate documentation. I'm not convinced that is guaranteed to happen before our collective knowledge falls off a cliff.
At the very least, AI can be extremely useful for autocompleting simple code logic or automatically finding replacements when I'm copying code/config and making small changes.
Sort of. The process of working through the question is what drives learning. If you just receive the answer with zero effort, you are explicitly bypassing the brain's learning mechanism.
There's a huge difference between your workflow and fully agentic AIs, though.
Asking an AI for the answer in the way you describe isn't exactly zero effort. You need to formulate the question and mold the prompt to get your response, and integrate the response back into the project. And in doing so you're learning! So YOUR workflow has learning built in, because you actually use your brain before and after the prompt.
But not so with vibe coding and Agentic LLMs. When you hit submit and get the tokens automatically dumped into your files, there is no learning happening. Considering AI agents are effectively trying to remove any pre-work (ie automating prompt eng) and post-work (ie automating debugging, integrating), we can see Agentic AI as explicitly anti-learning.
Here's my recent vibe coding anecdote to back this up. I was working on an app for an e-ink tablet dashboard and the tech stack of least resistance was C++ with QT SDK and their QML markup language with embedded javascript. Yikes, lots of unfamiliar tech. So I tossed the entire problem at Claude and vibe coded my way to a working application. It works! But could I write a C++/QT/QML app again today - absolutely not. I learned almost nothing. But I got working software!
Vibe-coding is just a stop on the road to a more useful AI and we shouldn't think of it as programming.
There is a sweet spot of situations I know well enough to judge a solution quickly, but not well enough to write code quickly, but that's a rather narrow case.
I used to be on the Microsoft stack for decades. Windows, Hyper-V, .NET, SQL Server ... .
Got tired of MS's licensing BS and I made the switch.
This meant learning Proxmox, Linux, Pangolin, UV, Python, JS, Bootstrap, Nginx, Plausible, SQLite, Postgres ...
Not all of these were completely new, but I had never dove in seriously.
Without AI, this would have been a long and daunting project. AI made this so much smoother. It never tires of my very basic questions.
It does not always answer 100% correctly the first time (tip: paste in the docs for the specific version of the thing you are trying to figure out, as it sometimes has out-of-date or mixed-version knowledge), but most often it can be nudged and prodded to a very helpful result.
AI is just an undeniably superior teacher than Google or Stack Overflow ever was. You still do the learning, but the AI is great in getting you to learn.
Don't get me wrong, I tried. But even when pasting the documentation in, the amount of times it just hallucinated parameters and arguments that were not even there were such a huge waste of time, I don't see the value in it.
AI is a search engine that can also remix its results, often to good effect.
Which strongly discouraged trying to teach people.
I haven't observed any software developers operating at even a slight multiplier from the pre-LLM days at the organisations I've worked at. I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.
I think that's a really elegant way to put it. Google Research tried to measure LLM impacts on productivity in 2024 [1]. They gave their subjects an exam and assigned them different resources (a book versus an LLM). They found that the LLM users actually took more time to finish than those who used a book, and that only novices on the subject material actually improved their scores when using an LLM.
But the participants also perceived that they were more accurate and efficient using the LLM, when that was not the case. The researchers suggested that it was due to "reduced cognitive load" - asking an LLM something is easy and mostly passive. Searching through a book is active and can feel more tiresome. Like you said: people are getting addicted to not having to expend brain energy to solve problems, and mistaking that for productivity.
[1] https://storage.googleapis.com/gweb-research2023-media/pubto...
Personally, I don't know if this is always a win, mostly because I enjoy the creative and problem solving aspect of coding, and reducing that to something that is more about prompting, correcting, and mentoring an AI agent doesn't bring me the same satisfaction and joy.
After doing programming for a decade or two, the actual act of programming is not enough to be ”creative problem solving”, it’s the domain and set of problems you get to apply it to that need to be interesting.
>90% of programming tasks at a company are usually reimplementing things and algorithms that have been done a thousand times before by others, and you’ve done something similar a dozen times. Nothing interesting there. That is exactly what should and can now be automated (to some extent).
In fact solving problems creatively to keep yourself interested, when the problem itself is boring is how you get code that sucks to maintain for the next guy. You should usually be doing the most clear and boring implementation possible. Which is not what ”I love coding” -people usually do (I’m definitely guilty).
To be honest this is why I went back to get a PhD, ”just coding” stuff got boring after a few years of doing it for a living. Now it feels like I’m just doing hobby projects again, because I work exactly on what I think could be interesting for others.
Couldn't this result in being able to work longer for less energy, though? With really hard mentally challenging tasks I find I cap out at around 3-4 hours a day currently
Like imagine if you could walk at running speed. You're not going faster.. but you can do it for way longer so your output goes up if you want it to
But it's important to realize that AI coding is itself a skill that you can develop. It's not just "pick the best tool and let it go." Managing prompts and managing context has a much higher skill ceiling than many people realize. You might prefer manual coding, but you might just be bad at AI coding, and you might prefer it if you improved at it.
With that said, I'm still very skeptical of letting the AI drive the majority of the software work, despite meeting people who swear it works. I personally am currently preferring "let the AI do most of the grunt work but get good at managing it and shepherding the high level software design".
It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.
LLM-based¹ coding, at least beyond simple auto-complete enhancements (using it directly & interactively as what it is: Glorified Predictive Text) is more akin to managing a junior or outsourcing your work. You give a definition/prompt, some work is done, you refine the prompt and repeat (or fix any issues yourself), much like you would with an external human. The key differences are turnaround time (in favour of LLMs), reliability (in favour of humans, though that is mitigated largely by the quick turnaround), and (though I suspect this is a limit that will go away with time, possibly not much time) lack of usefulness for "bigger picture" work.
This is one of my (several) objections to using it: I want to deal with and understand the minutia of what I am doing, I got into programming, database bothering, and infrastructure kicking, because I enjoyed it, enjoyed learning it, and wanted to do it. For years I've avoided managing people at all, at the known expense of reduced salary potential, for similar reasons: I want to be a tinkerer, not a manager of tinkerers. Perhaps call me back when you have an AGI that I can work alongside.
--------
[1] Yes, I'm a bit of a stick-in-the-mud about calling these things AI. Next decade they won't generally be considered AI like many things previously called AI are not now. I'll call something AI when it is, or very closely approaches, AGI.
Also, if my junior argued back and was wrong repeatedly, that'd be bad. Lucky that has never happened with AIs ...
LLMs absolutely can improve over time.
We all want many things, doesn't mean someone will pay you for it. You want to tinker? Great, awesome, more power to you, tinker on personal projects to your heart's content. However, if someone pays you to solve a problem, then it is our job to find the best, most efficient way to cleanly do it. Can LLMs do this on their own most of the time? I think not, not right now at least. The combination of skilled human and LLM? Most likely, yes.
Maybe I'll retrain for lab work, I know a few people in the area, yeah I'd need a pay cut, but… Heck, I've got the mortgage paid, so I could take quite a cut and not be destitute, especially if I get sensible and keep my savings where they are and building instead of getting tempted to spend them! I don't think it'll get to that point for quite a few years though, and I might have been due to throw the towel in by that point anyway. It might be nice to reclaim tinkering as a hobby rather than a chore!
A million times yes.
And we live in a time in which people want to be called "programmers" because it's oh-so-cool but not doing the work necessary to earn the title.
i.e. continually gambling and praying the model spits something out that works instead of thinking.
But more seriously, in the ideal case, refining a prompt when the LLM misunderstands it due to ambiguity in your task description is actually doing the meaningful part of the work in software development. It is exactly about defining the edge cases, and converting into language what it is that you need for a task. Iterating on that is not gambling.
But of course, if you are not doing that, but just trying to get a ”smarter” LLM with (the hopefully deprecated study of) ”prompt engineering” tricks, then that is building yourself a skill that can become useless tomorrow.
If the outcome is indistinguishable from using "thinking" as the process rather than brute force, why would the process matter regarding how the outcome was achieved?
Your concept of thinking is the classic rhetoric - as soon as some "AI" manages to achieve something it previously wasn't capable of, it's no longer AI and is just some xyz process. It happened with chess engines, with AlphaGo, and with LLMs. The implication being that human "thinking" is somehow unique and only an AI that replicates it can be considered to be "thinking".
From what I see of AI programming tools today, I highly doubt the skills developed are going to transfer to tools we'll see even a year from now.
From what I see of the tools, I think the skills developed largely consists of skills you need to develop as you get more senior anyway, namely writing detail-oriented specs and understanding how to chunk tasks. Those skills aren't going to stop having value.
Detailed specs are certainly a transferable skill, what isn't is the tedious hand holding and defensive prompting. In my entire career I've worked with a lot of people, only one required as much hand holding as AI. That person was using AI to do all their work.
Thank you!
[0]: https://www.anthropic.com/claude-code
If so, it does seem that AI just replaced me at my job... don't let them know. A significant portion of my projects are writing small business tools.
Maybe not hours, but extended periods of time, yes. Agents are very quick, so they can frequently complete tasks that would have taken me hours in minutes.
> The page says $17 per month. That's unlimited usage?
Each plan has a limited quota; the Pro plan offers you enough to get in and try out Claude Code, but not enough for serious use. The $100 and $200 plans still have quotas, but they're quite generous; people have been able to get orders of magnitude of API-cost-equivalents out of them [0].
> If so, it does seem that AI just replaced me at my job... don't let them know. A significant portion of my projects are writing small business tools.
Perhaps, but for now, you still need to have some degree of vague competence to know what to look out for and what works best. Might I suggest using the tools to get work done faster so that you can relax for the rest of the day? ;)
[0]: https://xcancel.com/HaasOnSaaS/status/1932713637371916341
No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spent setting things up). But it's not like GDB or using UNIX as an IDE, where you need a whole book just to get started.
> It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.
While they share a lot of principles (around composition, poses,...), they are different activities with different output. No one conflates the two. You don't draw and think you're going to capture a moment in time. The intent is to share an observation with the world.
The skill floor is something you can pick up in a few minutes and find it useful, yes. I have been spending dedicated effort toward finding the skill ceiling and haven't found it.
I've picked up lots of skills in my career, some of which were easy, but some of which required dedicated learning, or practice, or experimentation. LLM-assisted coding is probably in the top 3 in terms of effort I've put into learning it.
I'm trying to learn the right patterns to use to keep the LLM on track and keeping the codebase in check. Most importantly, and quite relevant to OP, I'd like to use LLMs to get work done much faster while still becoming an expert in the system that is produced.
Finding the line has been really tough. You can get a LOT done fast without this requirement, but personally I don't want to work anywhere that has a bunch of systems that nobody's an expert in. On the flip side, as in the OP, you can have this requirement and end up slower by using an LLM than by writing the code yourself.
If anything, prompting well is akin to learning a new programming language. What words do you use to explain what you want to achieve? How do you reference files/sections so you don't waste context on meaningless things?
I've been using AI tools to code for the past year and a half (Github Copilot, Cursor, Claude Code, OpenAI APIs) and they all need slightly different things to be successful and they're all better at different things.
AI isn't a panacea, but it can be the right tool for the job.
>I do not agree it is something you can pick up in an hour.
But it's also interesting that the industry is selling the opposite (with AI anyone can code / write / draw / make music).
>You have to learn what AI is good at.
More often than not I find you need to learn what the AI is bad at, and this is not a fun experience.
"Write me a server in Go" only gets you so far. What is the auth strategy, what endpoints do you need, do you need to integrate with a library or API, are there any security issues, how easy is the code to extend, how do you get it to follow existing patterns?
I find I need to think AND write more than I would if I was doing it myself because the feedback loop is longer. Like the article says, you have to review the code instead of having implicit knowledge of what was written.
That being said, it is faster for some tasks, like writing tests (if you have good examples) and doing basic scaffolding. It needs quite a bit of hand holding which is why I believe those with more experience get more value from AI code because they have a better bullshit meter.
That is software engineering realm, not using LLMs realm. You have to answer all of these questions even with traditional coding. Because they’re not coding questions, they’re software design questions. And before that, there were software analysis questions preceded by requirements gathering questions.
A lot of replies around the thread is conflating coding activities with the parent set of software engineering activities.
LLMs can help answer the questions. However, they're not going to necessarily make the correct choices or implementation without significant input from the user.
You can start in a few minutes, sure. (Also you can start using gdb in minutes) But GP is talking about the ceiling. Do you know which models work better for what kind of task? Do you know what format is better for extra files? Do you know when it's beneficial to restart / compress context? Are you using single prompts or multi stage planning trees? How are you managing project-specific expectations? What type of testing gives better results in guiding the model? What kind of issues are more common for which languages?
Correct prompting is what makes a difference these days in tasks like SWE-verified.
For example, I have a custom planning prompt that I will give a paragraph or two of information to, and then it will produce a specification document from that by searching the web and reading the code and documentation. And then I will review that specification document before passing it back to Claude Code to implement the change.
This works because it is a lot easier to review a specification document than it is to review the final code changes. So, if I understand it and guide it towards how I would want the feature to be implemented at the specification stage, that sets me up to have a much easier time reviewing the final result as well. Because it will more closely match my own mental model of the codebase and how things should be implemented.
And it feels like that is barely scratching the surface of setting up the coding environment for Claude Code to work in.
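In case the shape of that is unclear, here is a bare-bones version of the same two-step idea using the Anthropic Python SDK directly instead of Claude Code (the model id, prompt wording and file names are placeholders of mine, not the actual setup described above):

    # Plan-then-implement sketch: generate a spec, pause for human review,
    # then feed the approved spec back in for implementation proposals.
    import pathlib
    import anthropic

    client = anthropic.Anthropic()          # reads ANTHROPIC_API_KEY from the environment
    MODEL = "claude-sonnet-4-20250514"      # assumed model id; substitute your own

    def ask(prompt: str) -> str:
        reply = client.messages.create(
            model=MODEL,
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}],
        )
        return reply.content[0].text

    feature = "Add a dark theme toggle to the settings page."

    # Step 1: a spec document that a human reviews and edits before any code exists.
    spec = ask(
        "Write a markdown implementation spec for this feature. Cover schema "
        "changes, API changes, UI changes, testing and open questions.\n\n"
        f"Feature: {feature}"
    )
    pathlib.Path("SPEC.md").write_text(spec)
    input("Review and edit SPEC.md, then press Enter to continue...")

    # Step 2: implementation guided by the approved spec (printed here; an
    # agent like Claude Code would go on to edit files and run tests).
    print(ask("Propose code changes implementing this spec:\n\n"
              + pathlib.Path("SPEC.md").read_text()))

The review happens at the spec stage, where it is cheap, which is the whole point of the workflow described above.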
The problem with overinvesting in a brand new, developing field is that you get skills that will soon be redundant. You can hope that the skills are going to transfer to what will be needed after, but I am not sure that will be the case here. There was a lot of talk about prompting techniques ("prompt engineering") last year, and now most of these are redundant; I really don't think I have learnt anything that is useful enough for the new models, nor have I actually understood something deeper. These are all tips-and-tricks level, shallow stuff.
I think these skills are just like learning how to use some tools in an ide. They increase productivity, it's great but if you have to switch ide they may not actually help you with the new things you have to learn in the new environment. Moreover, these are just skills in how to use some tools; they allow you to do things, but we cannot compare learning how to use tools vs actually learning and understanding the structure of a program. The former is obviously a shallow form of knowledge/skill, easily replaceable, easily redundant and probably not transferable (in the current context). I would rather invest more time in the latter and actually get somewhere.
The things that will change may be prompts or MCP setups or more specific optimisations like subagents. Those may require more consideration of how much you want to invest in setting them up. But the majority of setup you do for Claude Code is not only useful to Claude Code. It is useful to human developers and other agent systems as well.
> There was a lot of talk about prompting techniques ("prompt engineering") last year and now most of these are redundant.
Not true, prompting techniques still matter a lot to a lot of applications. It's just less flashy now. In fact, prompting techniques matter a ton for optimising Claude Code and creating commands like the planning prompt I created. It matters a lot when you are trying to optimise for costs and use cheaper models.
> I think these skills are just like learning how to use some tools in an ide.
> if you have to switch ide they may not actually help you
A lot of the skills you learn in one IDE do transfer to new IDEs. I started using Eclipse and that was a steep learning curve. But later I switched to IntelliJ IDEA and all I had to re-learn were key-bindings and some other minor differences. The core functionality is the same.
Similarly, a lot of these "agent frameworks" like Claude Code are very similar in functionality, and switching between them as the landscape shifts is probably not as large of a cost as you think it is. Often it is just a matter of changing a model parameter or changing the command that you pass your prompt to.
Of course it is a tradeoff, and that tradeoff probably changes a lot depending upon what type of work you do, your level of experience, how old your codebases are, how big your codebases are, the size of your team, etc... it's not a slam dunk that it is definitely worthwhile, but it is at least interesting.
I like a similar workflow where I iterate on the spec, then convert that into a plan, then feed that step by step to the agent, forcing full feature testing after each one.
I've actually been playing around with languages that separate implementation from specification under the theory that it will be better for this sort of stuff, but that leaves an extremely limited number of options (C, C++, Ada... not sure what else).
I've been using C and the various LLMs I've tried seem to have issues with the lack of memory safety there.
For example, it might include: Overview, Database Design (Migration, Schema Updates), Backend Implementation (Model Updates, API updates), Frontend Implementation (Page Updates, Component Design), Implementation Order, Testing Considerations, Security Considerations, Performance Considerations.
It sounds like a lot when I type it out, but it is pretty quick to read through and edit.
The specification document is generated by a planning prompt that tells Claude to analyse the feature description (the couple paragraphs I wrote), research the repository context, research best practices, present a plan, gather specific requirements, perform quality control, and finally generate the planning document.
I'm not sure if this is the best process, but it seems to work pretty well.
My basic initial prompt for that is: "we're creating a markdown specification for (...). I'll start with basic description and at each step you should refine the spec to include the new information and note what information is missing or could use refinement."
Here’s what today’s task list looks like:
1. Test TRAE/Refact.ai/Zencoder: 70% on SWE verified
2. https://github.com/kbwo/ccmanager: use git tree to manage multiple Claude Code sessions
3. https://github.com/julep-ai/julep/blob/dev/AGENTS.md: Read and implement
4. https://github.com/snagasuri/deebo-prototype: Autonomous debugging agent (MCP)
5. https://github.com/claude-did-this/claude-hub: connects Claude Code to GitHub repositories.
This doesn’t give you any time to experiment with alternative approaches. It’s equivalent to saying that the first approach you try as a beginner will be as good as it possibly gets, that there’s nothing at all to learn.
ok but how much am I supposed to spend before I supposedly just "get good"? Because based on the free trials and the pocket change I've spent, I don't consider the ROI worth it.
Instead you can get comfortable prompting and managing context with aider.
Or you can use claude code with a pro subscription for a fair amount of usage.
I agree that seeing the tools just waste several dollars to just make a mess you need to discard is frustrating.
While it wasn't the fanciest integration (nor the best of codegen), it was good enough to "get going" (the loop was to ask the LLM to do something, then me do something else in the background, then fix and merge the changes it made - even though i often had to fix stuff[2], sometimes it was less of a hassle than if i had to start from scratch[3]).
It can give you a vague idea that with more dedicated tooling (i.e. something that does automatically what you'd do by hand[4]) you could do more interesting things (combining with some sort of LSP functionality to pass function bodies to the LLM would also help), though personally i'm not a fan of the "dedicated editor" that seems to be used and i think something more LSP-like (especially if it can also work with existing LSPs) would be neat.
IMO it can be useful for a bunch of boilerplate-y or boring work. The biggest issue i can see is that the context is too small to include everything (imagine, e.g., throwing the entire Blender source code in an LLM which i don't think even the largest of cloud-hosted LLMs can handle) so there needs to be some external way to store stuff dynamically but also the LLM to know that external stuff are available, look them up and store stuff if needed. Not sure how exactly that'd work though to the extent where you could -say- open up a random Blender source code file, point to a function, ask the LLM to make a modification, have it reuse any existing functions in the codebase where appropriate (without you pointing them out) and then, if needed, have the LLM also update the code where the function you modified is used (e.g. if you added/removed some argument or changed the semantics of its use).
[0] https://i.imgur.com/FevOm0o.png
[1] https://app.filen.io/#/d/e05ae468-6741-453c-a18d-e83dcc3de92...
[2] e.g. when i asked it to implement a BVH to speed up things it made something that wasn't hierarchical and actually slowed down things
[3] the code it produced for [2] was fixable to do a simple BVH
[4] i tried a larger project and wrote a script that `cat`ed and `xclip`ed a bunch of header files to pass to the LLM so it knows the available functions and each function had a single line comment about what it does - when the LLM wrote new functions it also added that comment. 99% of these oneliner comments were written by the LLM actually.
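That kind of throwaway tooling really is only a few lines - the paths and xclip flags below are my assumptions about a typical X11 setup, not the exact script described in [4]:

    # Gather header files and push them onto the clipboard so they can be
    # pasted into an LLM chat as context. Paths and flags are illustrative.
    import pathlib
    import subprocess

    headers = sorted(pathlib.Path("src").rglob("*.h"))
    blob = "\n".join(f"// ===== {h} =====\n{h.read_text()}" for h in headers)

    # xclip -selection clipboard reads stdin into the X11 clipboard.
    subprocess.run(["xclip", "-selection", "clipboard"],
                   input=blob.encode(), check=True)
    print(f"copied {len(headers)} headers ({len(blob)} chars)")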
Before, a poor kid with computer access could learn to code nearly for free, but if it costs $1k just to get started with AI, that poor kid will never have that opportunity.
- Employers, not employees, should provide workplace equipment or compensation for equipment. Don't buy bits for the shop, nails for the foreman, or Cursor for the tech lead.
- the workplace is not a meritocracy. People are not defined by their wealth.
- If $1,000 does not represent an appreciable amount of someone's assets, they are doing well in life. Approximately half of US citizens cannot afford rent if they lose a paycheck.
- Sometimes the money needs to go somewhere else. Got kids? Sick and in the hospital? Loan sharks? A pool full of sharks and they need a lot of food?
- Folks can have different priorities and it's as simple as that
We're (my employer) still unsure if new dev tooling is improving productivity. If we find out it was unhelpful, I'll be very glad I didn't lose my own money.
In my experience it's that they dump the code into a pull request and expect me to review it. So GenAI is great if someone else is doing the real work.
Unlike the author of the article I do get a ton of value from coding agents, but as with all tools they are less than useless when wielded incompetently. This becomes more damaging in an org that already has perverse incentives which reward performative slop over diligent and thoughtful engineering.
Most of my teams have been very allergic to assigning personal blame and management very focused on making sure everyone can do everything and we are always replaceable. So maybe I could phrase it like "X could help me with this" but saying X is responsible for the bug would be a no no.
I don't mind fixing bugs, but I do mind reckless practices that introduce them.
One of the most bizarre experiences I have had over this past year was dealing with a developer who would screen share a ChatGPT session where they were trying to generate a test payload with a given schema, getting something that didn't pass schema validation, and then immediately telling me that there must be a bug in the validator (from Apache foundation). I was truly out of words.
One of the biggest problems I see with AI is that it gets people used to NOT thinking. It takes lots of time and energy to learn to program and design complex software. AI doesn’t solve this - humans need to have these skills in order to supervise it. But why would new programmers learn them? AI writes their code! It’s already hard to convince them otherwise. This only leads to bad things.
Technology without proper control and wisdom, destroys human things. We saw this many times already.
StackOverflow makes it easier not think and copy-paste. Autocomplete makes it easier to not think and make typos (Hopefully you have static typing). Package management makes it easier to not think and introduce heavy dependencies. C makes it easier to not think and forget to initialize variables. I make it easier to not think and read without considering evil (What if every word I say has evil intention and effect?)
As someone who uses Claude Code heavily, this is spot on.
LLMs are great, but I find the more I cede control to them, the longer it takes to actually ship the code.
I’ve found that the main benefit for me so far is the reduction of RSI symptoms, whereas the actual time savings are mostly exaggerated (even if it feels faster in the moment).
For context, it’s just a reimplementation of a tool I built.
Let’s just say it’s going a lot slower than the first time I built it by hand :)
If you're trying to build something larger, it's not good enough. Even with careful planning and spec building, Claude Code will still paint you into a corner when it comes to architecture. In my experience, it requires a lot of guidance to write code that can be built upon later.
The difference between the AI code and the open source libraries in this case is that you don't expect to be responsible for the third-party code later. Whether you or Claude ends up working on your code later, you'll need it to be in good shape. So, it's important to give Claude good guidance to build something that can be worked on later.
I don't know what you mean by "a lot of guidance". Maybe I just naturally do that, but to me there's not been much change in the level of guidance I need to give Claude Code or my own agent vs. what I'd give developers working for me.
Another issue is that as long as you ensure it builds good enough tests, the cost of telling it to just throw out the code it builds later and redo it with additional architectural guidance keeps dropping.
The code is increasingly becoming throwaway.
What do you mean? If it were as simple as not letting it do so, I would do as you suggest. I may as well stop letting it be incorrect in general. Lots of guidance helps avoid it.
> Maybe I just naturally do that, but to me there's not been much change in the level of guidance I need to give Claude Code or my own agent vs. what I'd give developers working for me.
Well yeah. You need to give it lots of guidance, like someone who works for you.
> the cost of telling it to just throw out the code it builds later and redo it with additional architectural guidance keeps dropping.
It's a moving target for sure. My confidence with this in more complex scenarios is much smaller.
I'm arguing it is as simple as that. Don't accept changes that muddle up the architecture. Take attempts to do so as evidence that you need to add direction. Same as you presumably would - at least I would - with a developer.
There's an old expression: "code as if your work will be read by a psychopath who knows where you live" followed by the joke "they know where you live because it is future you".
Generative AI coding just forces the mindset you should have had all along: start with acceptance criteria, figure out how you're going to rigorously validate correctness (ideally through regression tests more than code reviews), and use the review process to come up with consistent practices (which you then document so that the LLM can refer to it).
It's definitely not always faster, but waking up in the morning to a well documented PR, that's already been reviewed by multiple LLMs, with successfully passing test runs attached to it sure seems like I'm spending more of my time focused on what I should have been focused on all along.
I'm actually curious about the "lose their skills" angle though. In the open source community it's well understood that if anything reviewing a lot of code tends to sharpen your skills.
What happens if the reader no longer has enough of that authorial instinct, their own (opinionated) independent understanding?
I think the average experience would drift away from "I thought X was the obvious way but now I see that by doing Y you avoid that other problem, cool" and towards "I don't see the LLM doing anything too unusual compared to when I ask it for things, LGTM."
Let's say you're right though, and you lose that authorial instinct. If you've got five different proposals/PRs from five different models, each one critiqued by the other four, the need for authorial instinct diminishes significantly.
Years of PT have enabled me to work quite effectively and minimize the flare ups :)
Not super necessary for small changes, but basically a must have for any larger refactors or feature additions.
I usually use o3 for generating the specs; also helpful for avoiding context pollution with just Claude Code.
The entire code? Not there, but with debuggers, I've even started doing that a bit.
In contrast, when I’m trying to do something truly novel, I might spend days with a pen and paper working out exactly what I want to do and maybe under an hour coding up the core logic.
On the latter type of work, I find LLM’s to be high variance with mostly negative ROI. I could probably improve the ROI by developing a better sense of what they are and aren’t good at, but of course that itself is rapidly changing!
That is the mental model I have for the work (computer programming) I like to do and am good at.
Plumbing
The author writes "reviewing code is actually harder than most people think. It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself". That sounds within an SD of true for me, too, and I had a full-time job close-reading code (for security vulnerabilities) for many years.
But it's important to know that when you're dealing with AI-generated code for simple, tedious, or rote tasks --- what they're currently best at --- you're not on the hook for reading the code that carefully, or at least, not on the same hook. Hold on before you jump on me.
Modern Linux kernels allow almost-arbitrary code to be injected at runtime, via eBPF (which is just a C program compiled to an imaginary virtual RISC). The kernel can mostly reliably keep these programs from crashing the kernel. The reason for that isn't that we've solved the halting problem; it's that eBPF doesn't allow most programs at all --- for instance, it must be easily statically determined that any backwards branch in the program runs for a finite and small number of iterations. The verifier isn't even good at determining whether that condition holds; it just knows a bunch of patterns in the CFG that it's sure about and rejects anything that doesn't fit.
That's how you should be reviewing agent-generated code, at least at first; not like a human security auditor, but like the eBPF verifier. If I so much as need to blink when reviewing agent output, I just kill the PR.
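To make the loop-bound rule concrete: the verifier operates on compiled BPF bytecode rather than source, but the shape of the property it checks is easy to show in ordinary code. A rough Go sketch, purely as an analogy (both functions are invented):

    package sketch

    // acceptedShape: the backward branch runs a small, statically known
    // number of times; this is the kind of bound a verifier can prove.
    func acceptedShape(pkt [64]byte) int {
        sum := 0
        for i := 0; i < len(pkt); i++ {
            sum += int(pkt[i])
        }
        return sum
    }

    // rejectedShape: the trip count depends on runtime data, so a
    // verifier-style check can't bound it, even though it terminates.
    func rejectedShape(pkt []byte) int {
        n := 0
        for len(pkt) > 0 && pkt[0] != 0 {
            pkt = pkt[1:]
            n++
        }
        return n
    }

Reviewing in that mode means treating the second shape the way the verifier does: not trying to prove it correct, just noting it isn't a pattern you're sure about and sending it back.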
If you want to tell me that every kind of code you've ever had to review is equally tricky to review, I'll stipulate to that. But that's not true for me. It is in fact very easy for me to look at a rote recitation of an idiomatic Go function and say "yep, that's what that's supposed to be".
The alternative where I boil a few small lakes + a few bucks in return for a PR that maybe sometimes hopefully kinda solves the ticket sounds miserable. I simply do not want to work like that, and it doesn't sound even close to efficient or speedier or anything like that, we're just creating extra work and extra waste for literally no reason other than vague marketing promises about efficiency.
But in my experience this is _signal_. If the AI can't get to it with minor back and forth, then something needs work: your understanding, the specification, the tests, your code factoring, etc.
The best case scenario is your agent one-shots the problem. But close behind that is your agent finding a place where a little cleanup makes everybody’s life easier: you, your colleagues, and the bot. And your company is now incentivized to invest in that.
The worst case is you took the time to write 2 prompts that didn’t work.
There is a certain style, let's say, of programming that encourages highly non-reusable code: code that is at once boring and tedious, impossible to maintain, and thus not especially worthwhile.
The "rote code" could probably have been expressed succinctly, in terms that border on "plain text", but with more rigor and without overpriced, wasteful, potentially dangerous models in between.
And yes, machines like the eBPF verifier must follow strict rules to cut out the chaff, of which there is quite a lot, but it neither follows that we should write everything in eBPF, nor does it follow that because something can throw out the proverbial "garbage", that makes it a good model to follow...
Put another way, if it was that rote, you likely didn't need nor benefit from the AI to begin with, a couple well tested library calls probably sufficed.
With an arbitrary PR from a colleague or security audit, you have to come up with mental model first, which is the hardest part.
Important tangential note: the eBPF verifier doesn't "cut out the chaff". It rejects good, valid programs. It does not care that the programs are valid or good; it cares that it is not smart enough to understand them; that's all that matters. That's the point I'm making about reviewing LLM code: you are not on the hook for making it work. If it looks even faintly off, you can't hurt the LLM's feelings by killing it.
Certainly, however:
> That's the point I'm making about reviewing LLM code: you are not on the hook for making it work
The second portion of your statement is either confusing (something unsaid) or untrue (you are still ultimately on the hook).
Agentic AI is just yet another, as you put it way to "get in trouble trying to be clever".
My previous point stands - if it was that cut and dry, then a (free) script/library could generate the same code. If your only real use of AI is to replace template systems, congratulations on perpetuating the most over-engineered template system ever. I'll stick with a provable, free template system, or just not write the code at all.
You're missing the point.
tptacek is saying he isn't the one who needs to fix the issue because he can just reject the PR and either have the AI agent refine it or start over. Or ultimately resort to writing the code himself.
He doesn't need to make the AI written code work, and so he doesn't need to spend a lot of time reading the AI written code - he can skim it for any sign it looks even faintly off and just kill it if that's the case instead of spending more time on it.
> My previous point stands - if it was that cut and dry, then a (free) script/library could generate the same code.
There's a vast chasm between simple enough that a non-AI code generator can generate it using templates and simple enough that a fast read-through is enough to show that it's okay to run.
As an example, the other day I had my own agent generate a 1kloc API client for an API. The worst case scenario other than failing to work would be that it would do something really stupid, like deleting all my files. Since it passes its tests, skimming it was enough for me to have confidence that nowhere does it do any file manipulation other than reading the files passed in. For that use, that's sufficient since it otherwise passes the tests and I'll be the only user for some time during development of the server it's a client for.
But no template based generator could write that code, even though it's fairly trivial - it involved reading the backend API implementation and rote-implementation of a client that matched the server.
Not true at all; in fact this sort of thing was happening all the time 10 years ago: tools reading APIs and generating clients...
> He doesn't need to make the AI written code work, and so he doesn't need to spend a lot of time reading the AI written code - he can skim it for any sign it looks even faintly off and just kill it if that's the case instead of spending more time on it.
I think you are missing the point as well, that's still review, that's still being on the hook.
Words like "skim" and "kill" are the problem here, not a solution. They point to a broken process that looks like its working...until it doesn't.
But I hear you say "all software works like that", well, yes, to some degree. The difference being, one you hopefully actually wrote and have some idea what's going wrong, the other one?
Well, you just have to sort of hope it works and when it doesn't, well you said it yourself. Your code was garbage anyways, time to "kill" it and generate some new slop...
Where is this template based code generator that can read my code, understand it, and generate a full client including a CLI, that include knowing how to format the data, and implement the required protocols?
In 30 years of development, I've seen nothing like it.
> I think you are missing the point as well, that's still review, that's still being on the hook.
I don't know if you're being intentionally obtuse, or what, but while, yes, you're on the hook for the final deliverable, you're not on the hook for fixing a specific instance of code, because you can just throw it away and have the AI do it all over.
The point you seem intent on missing is that the cost of throwing out the work of another developer is high, while the cost of throwing out the work of an AI assistant is next to nothing, and so where you need to carefully review a co-workers code because throwing it away and starting over from scratch is rarely an option, with AI generated code you can do that at the slightest whiff of an issue.
> Words like "skim" and "kill" are the problem here, not a solution. They point to a broken process that looks like its working...until it doesn't.
No, they are not a problem at all. They point to a difference in opportunity cost. If the rate at which you kill code is too high, it's a problem irrespective of source. But the point is that this rate can be much higher for AI code than for co-workers before it becomes a problem, because the cost of starting over is orders of magnitude different, and this allows for a very different way of treating code.
> Well, you just have to sort of hope it works and when it doesn't
No, I don't "hope it works" - I have tests.
This might be the defining line for Gen AI - people who can read code faster will find it useful and those that write faster than they can read won’t use it.
I have known and worked with many, many engineers across a wide range of skill levels. Not a single one has ever said or implied this, and in not one case have I ever found it to be true, least of all in my own case.
I don't think it's humanly possible to read and understand code faster than you can write and understand it to the same degree of depth. The brain just doesn't work that way. We learn by doing.
The same goes with shell scripting.
But more importantly you don’t have to understand code to the same degree and depth. When I read code I understand what the code is doing and whether it looks correct. I’m not going over other design decisions or implementation strategies (unless they’re obvious). If I did that then I’d agree. I’d also stop doing code reviews and just write everything myself.
I also haven't found any benefit in aiming for smaller or larger PRs. The aggregate efficiency seems to even out, because smaller PRs are easier to weed through but they are not less likely to be trash.
It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.
This will only be resolved out there in the real world. If AI turns a bad developer, or even a non-developer, into somebody that can replace a good developer, the workplace will transform extremely quickly.
So I'll wait for the world to prove me wrong but my expectation, and observation so far, is that AI multiplies the "productivity" of the worst sort of developer: the ones that think they are factory workers who produce a product called "code". I expect that to increase, not decrease, the value of the best sort of developer: the ones who spend the week thinking, then on Friday write 100 lines of code, delete 2000 and leave a system that solves more problems than it did the week before.
It is 100% a function of what you are trying to build, what language and libraries you are building it in, and how sensitive that thing is to factors like performance and getting the architecture just right. I've experienced building functioning systems with hardly any intervention, and repeatedly failing to get code that even compiles after over an hour of effort. There exists small, but popular, subset of programming tasks where gen AI excels, and a massive tail of tasks where it is much less useful.
I guess it is far removed from the advertised use case. Also, I feel one would be better off having auto-complete powered by an LLM in this case.
I don't think code is ever "obviously right" unless it is trivially simple
The more I use this, the longer the LLM works before I even look at the output, beyond maybe having it chug along on another screen and occasionally glancing over.
My shortest runs now usually take minutes of the LLM expanding my prompt into a plan, writing the tests, writing the code, linting its code, fixing any issues, and writing a commit message before I even review things.
Yes I have been burned. But 99% of the time, with proper test coverage it is not an issue, and the time (money) savings have been enormous.
"Ship it!" - me
This is the piece that confuses me about the comparison to a junior or an intern. Humans learn about the business, the code, the history of the system. And then they get better. Of course there’s a world where agents can do that, and some of the readme/doc solutions do that but the limitations are still massive and so much time is spent reexplaining the business context.
*dusts off hands* Problem solved! Man, am I great at management or what?
Human experts excel at first-principles thinking precisely because they can strip away assumptions, identify core constraints, and reason forward from fundamental truths. They might recognize that a novel problem requires abandoning conventional approaches entirely. AI, by contrast, often gets anchored to what "looks similar" and applies familiar frameworks even when they're not optimal.
Even when explicitly prompted to use first-principles analysis, AI models can struggle because:
- They lack the intuitive understanding of when to discard prior assumptions
- They don't naturally distinguish between surface-level similarity and deep structural similarity
- They're optimized for confident responses based on pattern recognition rather than uncertain exploration from basics
This is particularly problematic in domains requiring genuine innovation or when dealing with edge cases where conventional wisdom doesn't apply.
Context poisoning, intended or not, is a real problem that humans are able to solve relatively easily while current SotA models struggle.
Humans are also not as susceptible to context poisoning, unlike llms.
Using them for larger bits of code feels silly as I find subtle bugs or subtle issues in places, so I don't necessarily feel comfortable passing in more things. Also, large bits of code I work with are very business logic specific and well abstracted, so it's hard to try and get ALL that context into the agent.
I guess what I'm trying to ask here is what exactly do you use agents for? I've seen youtube videos but a good chunk of those are people getting a bunch of typescript generated and have some front-end or generate some cobbled together front end that has Stripe added in and everyone is celebrating as if this is some massive breakthrough.
So when people say "regular tasks" or "rote tasks" what do you mean? You can't be bothered to write a db access method/function using some DB access library? You are writing the same regex testing method for the 50th time? You keep running into the same problem and you're still writing the same bit of code over and over again? You can't write some basic sql queries?
Also, not sure about others, but I really dislike having to do code reviews when I am unable to really gauge the skill of the dev I'm reviewing. If I know I have a junior with 1-2 years maybe, then I know to focus a lot on logic issues (people can end up cobbling together the previous simple bits of code), and if it's later down the road at 2-5 years then I know that I might focus on patterns or look to ensure that the code meets the standards, and look for more discreet or hidden bugs. With agent output it could oscillate wildly between those. It could be a solidly written, well optimized search function, or it could be a nightmarish SQL query that's impossible to untangle.
Thoughts?
I do have to say I found it good when working on my own to get another set of "eyes" and ask things like "are there more efficient ways to do X" or "can you split this larger method into multiple ones" etc
Now, I can just do `const myModal = useModal(...)` in all my components. Cool. This saved me at least 30 minutes, and 30 minutes of my time is worth way more than 20 bucks a month. (N.B.: All this boilerplate might be a side effect of React being terrible, but that's beside the point.)
That’s an issue I have with generated code. More often, I start with a basic design that evolves based on the project needs. It’s an iterative process that can span the whole timeline. But with generated code, it’s a whole solution that fits the current needs, but it’s a pain to refactor.
For harder problems, my experience is that it falls over, although I haven't been refining my LLM skills as much as some do. It seems that the bigger the project, the more it integrates with other things, the worse AI is. And moreover, for those tasks it's important for me or a human to do it because (a) we think about edge cases while we work through the problem intellectually, and (b) it gives us a deep understanding of the system.
Leaving aside the fact that this isn't an LLM problem; we've always had tech debt due to cowboy devs and weak management or "commercial imperatives":
I'd be interested to know if any of the existing LLM ELO style leaderboards mark for code quality in addition to issue fixing?
The former seems a particularly useful benchmark as they become more powerful in surface abilities.
But this is one of the core problems with LLM coding, right? It accelerates an already broken model of software development (worse is better) rather than trying to help fix it.
There are other times when I am building a stand-alone tool and am fine with whatever it wants to do because it's not something I plan to maintain and its functional correctness is self-evident. In that case I don't even review what it's doing unless it's stuck. This is more actual vibe code. This isn't something I would do for something I am integrating into a larger system, but I will for something like a CLI tool that I use to enhance my workflow.
I don't send my coworkers lists of micromanaged directions that give me a pretty clear expectation of what their PR is going to look like. I do however, occasionally get tagged on a review for some feature I had no part in designing, in a part of some code base I have almost no experience with.
Reviewing that the components you asked for do what you asked is a much easier scenario.
Maybe if people are asking an LLM to build an entire product from scratch with no guidance it would take a lot more effort to read and understand the output. But I don't think most people do that on a daily basis.
Is that what you and your buddies talk about at two hour long coffee/smoke breaks while “terrible” programmers work?
I had AI create me a k8s device plugin for supporting SR-IOV-only vGPUs. Something nvidia calls "vendor specific" and basically offers little to no support for in their public repositories for Linux KVM.
I loaded up a new go project in goland, opened up Junie, typed what I needed and what I have, went to make tea, came back, looked over the code to make sure it wasn't going to destroy my cluster (thankfully most operations were read-only), deployed it with the generated helm chart and it worked (nearly) first try.
Before this I really had no idea how to create device plugins other than knowing what they are and even if I did, it would have easily taken me an hour or more to have something working.
The only thing AI got wrong is that the virtual functions were symlinks and not directories.
The entire project is good enough that I would consider opensourcing it. With 2 more prompts I had configmap parsing to initialize virtual functions on-demand.
90% of my usage of Copilot is just fancy autocomplete: I know exactly what I want, and as I'm typing out the line of code it finishes it off for me. Or, I have a rough idea of the syntax I need to use a specific package that I use once every few months, and it helps remind me what the syntax is, because once I see it I know it's right. This usage isn't really glamorous, but it does save me tiny bits of time in terms of literal typing, or a simple search I might need to do. Articles like this make me wonder if people who don't like coding tools are trying to copy and paste huge blocks of code; of course it's slower.
I know what function I want to write, start writing it, and then bam! The screen fills with ghost text that may partly be what I want but probably not quite.
Focus shifts from writing to code review. I wrest my attention back to the task at hand, type some more, and bam! New ghost text to distract me.
Ever had the misfortune of having a conversation with a sentence-finisher? Feels like that.
Perhaps I need to bind to a hot key instead of using the default always-on setting.
---
I suspect people using the agentic approaches skip this entirely and therefore have a more pleasant experience overall.
Autocomplete is a total focus destroyer for me when it comes to text, e.g. when writing a design document. When I'm editing code, it sometimes trips me up (hitting tab to indent but end up accepting a suggestion instead), but without destroying my focus.
I believe your reported experience, but mine (and presumably many others') is different.
With unfamiliar syntax, I only need a few minutes and a cheatsheet to get back in the groove. Then typing goes back to that flow state.
Typing code is always semi-unconscious. Just like you don't pay that much attention to every character when you're writing notes on paper.
Editing code is where I focus on it, but I'm also reading docs, running tests,...
The author is one who appears unwilling to do so.
Either/or fallacy. There exist a varied set of ways to engage with the technology. You can read reference material and ask for summarization. You can use language models to challenge your own understanding.
Are people really this clueless? (Yes, I know the answer, but this is a rhetorical device.)
Think, people. Human intelligence is competing against artificial intelligence, and we need to step it up. Probably a good time to stop talking like we’re in Brad Pitt’s latest movie, Logical Fallacy Club. If we want to prove our value in a competitive world, we need to think and write well.
I sometimes feel like bashing flawed writing is mean, but maybe the feedback will get through. Better to set a quality bar. We should aim to be our best.
The distinction isn't whether code comes from AI or humans, but how we integrate and take responsibility for it. If you're encapsulating AI-generated code behind a well-defined interface and treating it like any third party dependency, then testing that interface for correctness is a reasonable approach.
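A minimal sketch of that framing in Go (all names here are invented): the interface is the contract, and the tests exercise only the contract, regardless of who or what wrote the implementation behind it.

    // pricing_test.go: the interface is the contract; the checks below don't
    // care whether a human or an agent wrote the implementation.
    package pricing

    import "testing"

    type Discounter interface {
        Discount(totalCents int64) int64
    }

    // checkDiscounter runs the same contract checks against any implementation.
    func checkDiscounter(t *testing.T, d Discounter) {
        t.Helper()
        if got := d.Discount(0); got != 0 {
            t.Errorf("Discount(0) = %d, want 0", got)
        }
        if got := d.Discount(10000); got < 0 || got > 10000 {
            t.Errorf("Discount(10000) = %d, want a value in [0, 10000]", got)
        }
    }

    // flatTenPercent stands in for the generated implementation under test.
    type flatTenPercent struct{}

    func (flatTenPercent) Discount(totalCents int64) int64 { return totalCents / 10 }

    func TestFlatTenPercentMeetsContract(t *testing.T) {
        checkDiscounter(t, flatTenPercent{})
    }

Swapping in a different implementation, generated or not, means re-running the same checks, which is most of what "testing that interface for correctness" amounts to in practice.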
The real complexity arises when you have AI help write code you'll commit under your name. In this scenario, code review absolutely matters because you're assuming direct responsibility.
I'm also questioning whether AI truly increases productivity or just reduces cognitive load. Sometimes "easier" feels faster but doesn't translate to actual time savings. And when we do move quicker with AI, we should ask if it's because we've unconsciously lowered our quality bar. Are we accepting verbose, oddly structured code from AI that we'd reject from colleagues? Are we giving AI-generated code a pass on the same rigorous review process we expect for human written code? If so, would we see the same velocity increases from relaxing our code review process amongst ourselves (between human reviewers)?
Libraries are maintained by other humans, who stake their reputation on the quality of the library. If a library gets a reputation of having a lax maintainer, the community will react.
Essentially, a chain of responsibility, where each link in the chain has an incentive to behave well else they be replaced.
Who is accountable for the code that AI writes?
Doesn't matter, I'm not responsible for maintaining that particular code
The code in my PRs has my name attached, and I'm not trusting any LLM with my name
If you consider AI code to be code that no human needs to read or later modify by hand (AI code gets modified by AI), then all you want to do is fully test it; if it all works, it's good. Now you can call into it from your own code.
I'm ultimately still responsible for the code. And unlike AI, library authors put their own and their libraries' reputation on the line.
My company just had internal models that were mediocre at best, but at the beginning this year they finally enabled Copilot for everyone.
At the beginning I was really excited for it, but it’s absolutely useless for work. It just doesn’t work on big old enterprise projects. In an enterprise environment everything is composed of so many moving pieces, knowledge scattered across places, internal terminology, etc. Maybe in the future, with better MCP servers or whatever, it’ll be possible to feed all the context into it to make it spit out something useful, but right now, at work, I just use AI as a search engine (and it’s pretty good at that, when you have the knowledge to detect when it has subtle problems)
> The quality of the code these tools produce is not the problem.
So even if an AI could produce code of a quality equal to or surpassing the author's own code quality, they would still be uninterested in using it.
To each their own, but it's hard for me to accept an argument that such an AI would provide no benefit, even if one put priority on maintaining high quality standards. I take the point that the human author is ultimately responsible, but still.
I set that up to run then do something different. I come back in a couple minutes, scan the diffs which match expectations and move on to the next task.
That’s not everything but those menial tasks where you know what needs to be done and what the final shape should look like are great for AI. Pass it off while you work on more interesting problems.
The more you deviate from that, the more you have to step in.
But given that I constantly forget how to open a file in Python, I still have a use for it. It basically supplanted Stackoverflow.
There’s your issue, the skill of programming has changed.
Typing gets fast; so does review once robust tests already prove X, Y, Z correctness properties.
With the invariants green, you get faster at grokking the diff, feed style nits back into the system prompt, and keep tuning the infinite tap to your taste.
The Codex workflow however really is a game changer imo. It takes the time to ensure changes are consistent with other code and the async workflow is just so much nicer.
Yep, this is pretty much it. However, I honestly feel that AI writes so much better code than me that I seldom need to actually fix much in the review, so it doesn't need to be as thorough. AI always takes more tedious edge-cases into account and applies best practices where I'm much sloppier and take more shortcuts.
Where AI especially excels is helping me do maintenance tickets on software I rarely touch (or sometimes never have touched). It can quickly read the codebase, and together we can quickly arrive at the place where the patch/problem lies and quickly correct it.
I haven't written anything "new" in terms of code in years, so I'm not really learning anything from coding manually but I do love solving problems for my customers.
Hard disagree. It's still way faster to review code than to manually write it. Also the speed at which agents can find files and the right places to add/edit stuff alone is a game changer.
Although tbh, even in the worst case I think I am still faster at reviewing than writing. The only difference is, though, that those reviews will never have had the same depth of thought and consideration as when I write the code myself. So reviews are quicker, but also less thorough/robust than writing, for me.
This strikes me as a tradeoff I'm absolutely not willing to make, not when my name is on the PR
This is a recipe for disaster with AI agents. You have to read every single line carefully, and this is much more difficult for the large majority of people out there than if you had written it yourself. It's like reviewing a Junior's work, except I don't mind reviewing my Junior colleague's work because I know they'll at least learn from the mistakes and they're not a black box that just spews bullshit.
What I personally find is: it's great for helping me solve mundane things. For example, I'm currently working on an agentic system and I'm using LLMs to help me generate Elasticsearch mappings.
There is no part of me that enjoys making JSON mappings; it's not fun, nor does it engage my curiosity as a programmer, and I'm also not going to learn much from generating Elasticsearch mappings over and over again. For problems like this, I'm happy to just let the LLM do the job. I throw some JSON at it and I've got a prompt that's good enough that it will spit out results deterministically and reliably.
However if I'm exploring / coding something new, I may try letting the LLM generate something. Most of the time though in these cases I end up hitting 'Reject All' after I've seen what the LLM produces, then I go about it in my own way, because I can do better.
It all really depends on what the problem you are trying to solve. I think for mundane tasks LLMs are just wonderful and helps get out of the way.
If I put myself into the shoes of a beginner programmer LLMs are amazing. There is so much I could learn from them. Ultimately what I find is LLMs will help lower the barrier of entry to programming but does not mitigate the need to learn to read / understand / reason about the code. Beginners will be able to go much further on their own before seeking out help.
If you are more experienced you will probably also get some benefits but ultimately you'd probably want to do it your own way since there is no way LLMs will replace experienced programmer (not yet anyway).
I don't think it's wise to completely dismiss LLMs in your workflow, at the same time I would not rely on it 100% either, any code generated needs to be reviewed and understood like the post mentioned.
Having a chatbot telling me what to write would not have had the same effect.
It's like having someone tell you the solutions to your homework.
I still use them, but more as a support tool than a real assistant.
Eventually: well, but, the AI coding agent isn't better than a top 10%/5%/1% software developer.
And it'll be that the coding agents can't do narrow X thing better than a top tier specialist at that thing.
The skeptics will forever move the goal posts.
However, assuming we are still having this conversation, that alone is proof to me that the AI is not that capable. We're several years into "replace all devs in six months." We will have to continue to wait and see what it actually does.
This. The devs outcompeting others by using AI today are too busy shipping to waste time writing blog posts about what is, ultimately, a skill issue.
IDEs outperform any “dumb” editor in the full context of work. You don’t see any fewer posts about “I use Vim, btw” (and I say this as a Vim user).
Responsibility and "AI" marketing are two non-intersecting sets.
It's very possible that AI is literally making us less productive and dumber. Yet they are being pushed by subscription-peddling companies as if it is impossible to operate without them. I'm glad some people are calling it out.
[1] https://devops.com/study-finds-no-devops-productivity-gains-...
[2] https://arxiv.org/abs/2506.08872
I guess the author is not aware of Cursor rules, AGENTS.md, CLAUDE.md, etc. Task-list oriented rules specifically help with long term context.
Or are you talking about OP not knowing AI tools enough?
Is this possible in any way today? Does one need to use Llama or DeepSeek, and do we have to run it on our own hardware to get persistence?
To me the part I enjoy most is making things. Typing all that nonsense out is completely incidental to what I enjoy about it.
This truly is shocking. If you are reviewing every single line of every package you intend to use how do you ever write any code?
Using a package that hundreds of thousands of other people use is low risk, it is battle tested
It doesn't matter how good AI code gets, a unique solution that no one else has ever touched is always going to be more brittle and risky than an open source package with tons of deployments
And yes, if you are using an Open Source package that has low usage, you should be reviewing it very carefully before you embrace it
Treat AI code as if you were importing from a git repo with 5 installs, not a huge package with Mozilla funding
This remains to be seen. It's still early days, but self-attention scales quadratically. This is a major red flag for the future potential of these systems.
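For anyone who hasn't seen why: in the standard scaled dot-product formulation,

    \text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V,
    \qquad Q, K, V \in \mathbb{R}^{n \times d}

the score matrix Q K^T is n x n for a context of length n, so time and memory grow on the order of n^2 d: doubling the context length roughly quadruples the attention cost. Sub-quadratic approximations exist, but the standard form is quadratic in context length.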
Did the author take their own medicine and measure their own productivity?
Even if that were true for everybody, reviews would still be worth doing, because when the code is reviewed it gets more than one pair of eyes looking at it.
So it's still worth using AI even if it's slower than writing the code yourself, because you wouldn't have made the mistakes the AI made, and the AI wouldn't make the mistakes you would have made.
It still might be personally not worth it for you though if you prefer to write code than to read it. Until you can set up AI as a reviewer for yourself.
The concept of why can get nebulous in a corporate setting, but it's nevertheless fun to explore. At the end of the day, someone has a problem and you're the one getting the computer to solve it. The process of getting there is fun in that you learn what irks someone else (or yourself).
Thinking about the problem and its solution can be augmented with computers (I'm not memorizing the Go standard library). But computers are simple machines with very complex abstractions built on top of them. The thrill is in thinking in terms of two worlds: the real one where the problem occurs and the computing one where the solution will come forth. The analogy may be more understandable to someone who's learned two or more languages and thinks about the nuances of using them to depict the same reality.
Same as the TFA, I'm spending most of my time manipulating a mental model of the solution. When I get to the code, it's just a translation. But the mental model is diffuse, so getting it written gives it a firmer existence. LLM generation mostly disrupts that process. The only way they really help is as a more pliable form of Stack Overflow, but I've only ever used Stack Overflow as human-authored annotations of the official docs.
Apparently models are not doing great for problems out of distribution.
Best counter claim: Not all code has the same risk. Some code is low risk, so the risk of error does not detract from the speed gained. For example, for proof of concepts or hobby code.
The real problem: Disinformation. Needless extrapolation, poor analogies, over valuing anecdotes.
But there's money to be made. What can we do, sometimes the invisible hand slaps us silly.
Counter-counter claim for these use cases: when I do a proof of concept, I actually want to increase my understanding of said concept at the same time, learn the challenges involved, and in general get a better idea of how feasible things are. An AI can be useful for asking questions, asking for reviews, alternative solutions, inspiration, etc. (it may have something interesting to add or not), but if we are still in "this matters" territory I would rather not substitute the actual learning experience and deeper understanding with having an AI generate code faster. Similarly for hobby projects: do I need that thing to just work, or do I actually care to learn how it is done? If the learning/understanding is not important in a given context, then using AI to generate the code is a great time-saver. Otherwise, I may still use AI, but not in the same way.
Revised example: Software where the goal is design experimentation; like with trying out variations of UX ideas.
Also, the auto-complete with tools like Cursor is mind-blowing. When I can press tab to have it finish the next 4 lines of a prepared statement, or it just knows the next 5 variables I need to define because I just set up a function that will use them... that's a huge time saver when you add it all up.
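For anyone wondering what that kind of boilerplate looks like, a prepared statement in Go's database/sql is the sort of thing autocomplete can finish after the first line or two. A sketch, assuming a Postgres-style driver; the schema and names are invented:

    package orders

    import (
        "context"
        "database/sql"
        "time"
    )

    // insertOrder is the kind of rote, shape-obvious code where tab-complete
    // does most of the typing; table and column names are hypothetical.
    func insertOrder(ctx context.Context, db *sql.DB, customerID string, totalCents int64) error {
        stmt, err := db.PrepareContext(ctx, `
            INSERT INTO orders (customer_id, total_cents, created_at)
            VALUES ($1, $2, $3)`)
        if err != nil {
            return err
        }
        defer stmt.Close()

        _, err = stmt.ExecContext(ctx, customerID, totalCents, time.Now())
        return err
    }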
My policy is simple, don't put anything AI creates into production if you don't understand what it's doing. Essentially, I use it for speed and efficiency, not to fill in where I don't know at all what I'm doing.
Another great example is the power of tabbing with Cursor. If I want to change the parameters of a function in my React app, I can be at one of the functions anywhere on my screen, add a variable that relates to what is being rendered, and I can now quickly tab through to find all the spots that are also affected in that screen, and then it usually helps apply the changes to the function. It's like smart search and replace where I can see every change that needs to be made, but it knows how to make it more intelligently than just replacing a line of code - and I didn't have to write the regex to find it, AND it usually helps get the work done in the function as well to reflect the change. That could save me 3-5 minutes, and I could do that 5 times a day maybe, and another almost half-hour is saved.
The point is, these small things add up SO fast. Now I'm incredibly efficient because the tedious part of programming has been sped up so much.
How much do you believe a programmer needs to lay out to “get good”?
I think that getting "good" at using AI means that you figure out exactly how to formulate your prompts so that the results are what you are looking for given your code base. It also means knowing when to start new chats, and when to have it focus on very specific pieces of code, and finally, knowing what it's really bad at doing.
For example, if I need to have it take a list of 20 fields and create the HTML view for the form, it can do it in a few seconds, and I know to tell it, for example, to use Bootstrap, Bootstrap icons, Bootstrap modals, responsive rows and columns, and I may want certain fields aligned certain ways, buttons in certain places for later, etc, and then I have a form - and just saved myself probably 30 minutes of typing it out and testing the alignment etc. If I do things like this 8 times a day, that's 4 hours of saved time, which is game changing for me.
I've probably fed $100 in API tokens into the OpenAI and Anthropic consoles over the last two years or so.
I was subscribed to Cursor for a while too, though I'm kinda souring on it and looking at other options.
At one point I had a ChatGPT pro sub, I have found Claude more valuable lately. Same goes for Gemini, I think it's pretty good but I haven't felt compelled to pay for it.
I guess my overall point is you don't have to break the bank to try this stuff out. Shell out the $20 for a month, cancel immediately, and if you miss it when it expires, resub. $20 is frankly a very low bar to clear - if it's making me even 1% more productive, $20 is an easy win.
Most tech companies however tend to operate following a standard enshittification schedule. First they are very cheap, supported by investments and venture capitalists. Then they build a large user base who become completely dependent on them as alternatives disappear (in this case as they lose the institutional knowledge that their employees used to have). Then they seek to make money so the investors can make their profits. In this case I could see the cost of AI rising a lot, after companies have already built it in to their business. AI eventually has to start making money. Just like Amazon had to, and Facebook, and Uber, and Twitter, and Netflix, etc.
From all the talk I see of companies embracing AI wholeheartedly it seems like they aren't looking any further than the next quarter. It only costs so much per month to replace so many man hours of work! I'm sure that won't last once AI is deeply embedded into so many businesses that they can start charging whatever they want to.
However, AI code reviewers have been really impressive. We run three separate AI reviewers right now and are considering adding more. One of these reviewers is kind of noisy, so we may drop it, but the others have been great. Sure, they have false positives sometimes and they don't catch everything. But they do catch real issues and prevent customer impact.
The Copilot style inline suggestions are also decent. You can't rely on it for things you don't know about, but it's great at predicting what you were going to type anyway.
But despite all that, the tools can find problems, get information, and propose solutions so much faster and across such a vast set of challenges that I simply cannot imagine going back to working without them.
This fellow should keep on working without AIs. All the more power to him. And he can ride that horse all the way into retirement, most likely. But it's like ignoring the rise of IDEs, or Google search, or AWS.
None of these things introduced the risk of directly breaking your codebase without very close oversight. If LLMs can surpass that hurdle, then we’ll all be having a different conversation.
And besides, not all LLMs are the same when it comes to breaking existing functions. I've noticed that Claude 3.7 is far better at not breaking things that already work than whatever it is that comes with Cursor by default, for example.
That is the two sides of the argument. It could only be settled, in principle, if both sides were directly observing each other's work in real-time.
But, I've tried that, too. 20 years ago in a debate between dedicated testers and a group of Agilists who believed all testing should be automated. We worked together for a week on a project, and the last day broke down in chaos. Each side interpreted the events and evidence differently. To this day the same debate continues.
People's lives are literally at stake. If my systems screw up, people can die.
And I will continue to use AI to help get through all that. It doesn't make me any less responsible for the result.
Writing a bunch of orm code feels boring? I make it generate the code and edit. Importing data? I just make it generate inserts. New models are good at reformatting data.
Using a third party Library? I force it to look up every function doc online and it still has errors.
Adding transforms and pivots to sql while keeping to my style? It is a mess. Forget it. I do that by hand.
I have not found it useful for large programming tasks. But for small tasks, a sort of personalised boilerplate, I find it useful.
Where I find it genuinely useful is in extremely low-value tasks, like localisation constants for the same thing in other languages, without having to tediously run that through an outside translator. I think that mostly goes in the "fancy inline search" category.
Otherwise, I went back from Cursor to normal VS Code, and mostly have Copilot autocompletions off these days because they're such a noisy distraction and break my thought process. Sometimes they add something of value, sometimes not, but I'd rather not have to confront that question with every keystroke. That's not "10x" at all.
Yes, I've tried the more "agentic" workflow and got down with Claude Code for a while. What I found is that its changes are so invasive and chaotic--and better prompts don't really prevent this--that it has the same implications for maintainability and ownership referred to above. For instance, I have a UIKit-based web application to which I recently asked Claude Code to add dark theme options, and it rather brainlessly injected custom styles into dozens of components and otherwise went to town, in a classic "optimise for maximum paperclip production" kind of way. I spent a lot more time un-F'ing what it did throughout the code base than I would have spent adding the functionality myself in an appropriately conservative fashion. Sure, a better prompt would probably have helped, but that would have required knowing what chaos it was going to wreak in advance, as to ask it to refrain from that as part of the prompt. The possibility of this happening with every prompt is not only daunting, but a rabbit hole of cognitive load that distracts from real work.
I will concede it does a lot better--occasionally, very impressively--with small and narrow tasks, but those tasks at which it most excels are so small that the efficiency benefit of formulating the prompt and reviewing the output is generally doubtful.
There are those who say these tools are just in their infancy, AGI is just around the corner, etc. As far as I can tell from observing the pace of progress in this area (which is undeniably impressive in strictly relative terms), this is hype and overextrapolation. There are some fairly obvious limits to their training and inference, and any programmer would be wise to keep their head down, ignore the hype, use these tools for what they're good at and studiously avoid venturing into "fundamentally new ways of working".
Which is kind of like if AI wrote it: except someone is standing behind those words.
Commenter Doug asks:
> > what AI coding tools have you utilized
Miguel replies:
> I don't use any AI coding tools. Isn't that pretty clear after reading this blog post?
Doug didn't ask what tools you use, Miguel. He asked which tools you have used. And the answer to that question isn't clear. Your post doesn't name the ones you've tried, despite using language that makes clear that you have in fact used them (e.g. "my personal experience with these tools"). Doug's question isn't just reasonable. It's exactly the question an interested, engaged reader will ask, because it's the question your entire post begs.
I can't help but point out the irony here: you write a great deal on the meticulousness and care with which you review other people's code, and criticize users of AI tools for relaxing standards, but the AI-tool user in your comments section has clearly read your lengthy post more carefully and thoughtfully than you read his generous, friendly question.
And I think it's worth pointing out that this isn't the blog post's only head scratcher. Take the opening:
> People keep asking me If I use Generative AI tools for coding and what I think of them, so this is my effort to put my thoughts in writing, so that I can send people here instead of having to repeat myself every time I get the question.
Your post never directly answers either question. Can I infer that you don't use the tools? Sure. But how hard would it be to add a "no?" And as your next paragraph makes clear, your post isn't "anti" or "pro." It's personal -- which means it also doesn't say much of anything about what you actually think of the tools themselves. This post won't help the people who are asking you whether you use the tools or what you think of them, so I don't see why you'd send them here.
> my personal experience with these tools, from a strictly technical point of view
> I hope with this article I've made the technical issues with applying GenAI coding tools to my work clear.
Again, that word: "clear." No, the post not only doesn't make clear the technical issues; it doesn't raise a single concern that I think can properly be described as technical. You even say in your reply to Doug, in essence, that your resistance isn't technical, because for you the quality of an AI assistant's output doesn't matter. Your concerns, rather, are practical, methodological, and to some extent social. These are all perfectly valid reasons for eschewing AI coding assistants. They just aren't technical -- let alone strictly technical.
I write all of this as a programmer who would rather blow his own brains out, or retire, than cede intellectual labor, the thing I love most, to a robot -- let alone line the pockets of some charlatan 'thought leader' who's promising to make a reality of upper management's dirtiest wet dream: in essence, to proletarianize skilled work and finally liberate the owners of capital from the tyranny of labor costs.
I also write all of this, I guess, as someone who thinks commenter Doug seems like a way cool guy, a decent chap who asked a reasonable question in a gracious, open way and got a weirdly dismissive, obtuse reply that belies the smug, sanctimonious hypocrisy of the blog post itself.
Oh, and one more thing: AI tools are poison. I see them as incompatible with love of programming, engineering quality, and the creation of safe, maintainable systems, and I think they should be regarded as a threat to the health and safety of everybody whose lives depend on software (all of us), not because of the dangers of machine super intelligence but because of the dangers of the complete absence of machine intelligence paired with the seductive illusion of understanding.
That’s fine, but it’s an arbitrary constraint he chooses, and it’s wrong to say AI is not faster. It is. He just won’t let it be faster.
Some won’t like to hear this, but no-one reviews the machine code that a compiler outputs. That’s the future, like it or not.
You can’t say compilers are slow because I add on the time I take to analyse the machine code. That’s you being slow.
That's because compilers are generally pretty trustworthy. They aren't necessarily bug free, and when you do encounter compiler bugs it can be extremely nasty, but mostly they just work
If compilers were wrong as often as LLMs are, we would be reviewing machine code constantly
A stochastic parrot can never be trusted, let alone one that tweaks its model every other night.
I totally get that not all code ever written needs to be correct.
Some throw-away experiments can totally be one-shot by AI, nothing wrong with that. Depending on the industry one works in, people might be on different points of the expectation spectrum for correctness, and so their experience with LLMs vary.
It's the RAD tool discussion of the 2000s, or the "No-Code" tools debate of the last decade, all over again.
But it’s also faster to read code than to write it. And it’s faster to loop a prompt back to fixed code to re-review than to write it.
Run three, run five. Prompt with voice annotation. Run them when normally you need a cognitive break. Run them while you watch netflix on another screen. Have them do TDD. Use an orchestrator. So many more options.
I feel like another problem is that deep down most developers hate debugging other people's code, and that's effectively what this is at times. It doesn't matter if your associate ran off and saved you 50k lines of typing, you would still rather do it yourself than debug the code.
I would give you grave warnings, telling you the time is nigh, adapt or die, etc, but it doesn't matter. Eventually these agents will be good enough that the results will surpass you even in simple one task at a time mode.
Closest parallel I can think of is the code-generation-from-UML era, but that explicitly kept the design decisions on the human side, and never really took over the world.
AI can write some tests, but it can't design thorough ones. Perhaps the best way to use AI is to have a human writing thorough and well documented tests as part of TDD, asking AI to write code to meet those tests, then thoroughly reviewing that code.
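A minimal sketch of that division of labour in Go (the function name and cases are invented): the human writes the table of cases that pins down the behaviour, and the generated implementation is only accepted once every case passes.

    // slug_test.go: the table of cases is human-authored and is the real
    // specification; Slugify is the part you would hand to the model to
    // write (a trivial placeholder here so the sketch compiles).
    package text

    import (
        "strings"
        "testing"
    )

    func Slugify(s string) string {
        return strings.ReplaceAll(strings.ToLower(s), " ", "-")
    }

    func TestSlugify(t *testing.T) {
        cases := []struct {
            name, in, want string
        }{
            {"lowercases", "Hello", "hello"},
            {"spaces become dashes", "hello world", "hello-world"},
            {"empty input stays empty", "", ""},
        }
        for _, tc := range cases {
            t.Run(tc.name, func(t *testing.T) {
                if got := Slugify(tc.in); got != tc.want {
                    t.Errorf("Slugify(%q) = %q, want %q", tc.in, got, tc.want)
                }
            })
        }
    }

Reviewing then becomes mostly a matter of checking that the tests really do capture the intent, which is the part the human is best placed to judge.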
AI saves me just a little time by writing boilerplate stuff for me, just one step above how IDEs have been providing generated getters and setters.