tyingq 1 days ago [-]
The abrupt swing in many non-technology company IT departments from "hey developer, you aren't using enough tokens" to this is just too funny.
And I'm seeing almost no self-awareness from leaders. They are making decisions about things that they just don't understand. And are completely unworried about it. Just blindly following whatever the news cycle is about AI.
datakan 1 days ago [-]
The closer people live to the consequences of their decisions, the more rational they become. Until leaders (and I use that term loosely) are held accountable, the insanity will continue.
mrandish 19 hours ago [-]
In addition to being true, this observation is profound. When designing any multi-step system that relies on humans making decisions, whether in governance, organizations or economies, placing root causes as close to end effects as possible is almost always better.
greesil 1 days ago [-]
Their only accountability is to the stock price. The insanity will continue.
dfedbeef 1 days ago [-]
As long as our stock price continues to... Continues to rise... Which... Hmm... I'm just now reading our balance sheet. Is this number right? Great, thanks.
As I was saying, you're all fired.
Henchman21 23 hours ago [-]
I’m willing to bet that most of us here are capable of acquiring pitchforks and torches.
I predict that will be their comeuppance; it will begin a new era in history.
oofbey 1 days ago [-]
I’m sorry you are used to working with out of touch leadership. Not all companies are like that. Even big ones can have smart, empathetic leaders. Although very often money gets in the way of empathy.
rf15 1 days ago [-]
Money also has the problematic tendency to warp the people around you; it's its own kind of gravity. The more powerful you are, the more you attract yesmannerism and the more you lose touch with what's going on.
therealdrag0 24 hours ago [-]
Also notably these attributes don’t make one infallible. I see a lot of engineers judging from the sidelines without any sense of how to run large orgs and how you have to make tough calls with imperfect info all the time.
pdimitar 1 days ago [-]
You hiring?
OutOfHere 4 hours ago [-]
Being out of touch is the default state for leadership. They mostly just parrot the news with a multi-month lag.
bunderbunder 1 days ago [-]
I've been enjoying journalist Ed Zitron's recent diatribes about how impossible it is to find a business leader who had a plan for measuring their ROI from adopting AI coding.
What he says he's consistently hearing from them mirrors what I saw at my own employer: they thought they had ROI metrics, but they actually only had usage metrics such as "lines of code committed" or "number of pull requests". The only way those could possibly work as an ROI measure is if your business charges customers by the line of code.
conception 21 hours ago [-]
What that really means is they had no valid metric for measuring developer productivity before, either. AI or not.
bunderbunder 7 hours ago [-]
Measuring productivity of developers isn’t really in line with what needs to happen, either. A team can be incredibly productive and still generate negative 100% ROI if what they are building so industriously is stuff that nobody wants to buy.
Which reflects another thing I’ve seen at work. A lot of what AI coding has enabled is diving headfirst into quagmires. Our costs have spiked - not just because of the token spend, also because we gotta pay the cloud platform to run all these new services, operators to operate them, marketers to market them, etc. - but revenue hasn’t budged.
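A minimal sketch of the arithmetic behind the "negative 100% ROI" point, using the standard (gain - cost) / cost definition. All figures and cost categories below are made up for illustration; none come from the thread.

```python
def roi(revenue_gain: float, total_cost: float) -> float:
    """Return on investment as a fraction: (gain - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (revenue_gain - total_cost) / total_cost


# Hypothetical quarter: plenty of output, no new revenue.
costs = {"tokens": 120_000, "cloud": 80_000, "ops_and_marketing": 200_000}
total_cost = sum(costs.values())

lines_committed = 1_500_000  # a usage metric; it never enters the ROI formula
new_revenue = 0.0            # the "revenue hasn't budged" case

print(f"ROI: {roi(new_revenue, total_cost):.0%}")  # prints "ROI: -100%"
```

Note that `lines_committed` can grow without bound and the result does not change, which is why a usage metric cannot stand in for a return.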
no-name-here 16 hours ago [-]
But at least pre-AI, most managers presumably measured devs subjectively on relevant performance. Using systems where employees who burn the most tokens ($) per week ‘win’ is crazy - just ask the AI to spin up subagents to implement every conceivable approach to a task, then spin up an agent judge to pick the winner, and repeat. You've immediately got 50x or whatever of your previous usage from that alone.
canyp 16 hours ago [-]
I had cynically done this sort of tokenmaxxing for a while as a burnt offering to the token-hungry non-leadership.
Eventually I got tired of it and got back to work.
sdeframond 1 days ago [-]
Groups resist change - the bigger the group, the more resistance there is.
As a leader, pushing for rapid change cannot really be nuanced lest the push dissipates into the organization's entropy.
HarHarVeryFunny 1 days ago [-]
Perhaps, but the change you get (if any) is most likely to be what you push for and reward/punish.
It's irrational to push for tokenmaxxing (literally "please increase our AI spending") and not expect that this is the result you are going to get. You won't get a productivity increase, since that is not what you are pushing for - you will get token usage maximization (engineers running inane agentic tasks against your code base to increase usage, using company-paid AI for their side projects, etc, etc).
lanstin 1 days ago [-]
The evidence suggests that many tech leaders do not realize that an immediate result of heavy handed uninformed top down decision making is transforming the “work together, succeed together, giving quality” ethos into a cynical game theory minimax effort to game whatever stupid arbitrary metrics are used to implement the top down fad of the quarter; do it consistently and you get a work force that can be given a metric and immediately, instinctively, tell you how the work flow will be adjusted for the new metric, and where the difficult problems will be shunted to.
SpicyLemonZest 1 days ago [-]
I'm not sure the leaders would disagree with what you're saying. They tokenmaxxed to understand what it looks like when AI gets into every corner of the business; now they feel they've gotten enough info (or at least that more info wouldn't be worth the cost), so they're adding in cost controls. As the article says, this is not great for AI model providers trying to predict what their future revenue is going to be, but it's not obvious that there's any mistake here for AI users.
HarHarVeryFunny 1 days ago [-]
> They tokenmaxxed to understand what it looks like when AI gets into every corner of the business
Perhaps that is what they were trying to do, but the reality is that all they will have got is a large token bill. The decision makers may have hoped that tokens would be used in the most productive fashion possible so they could evaluate if the cost was worth it, but what they will have actually got is what they asked for and measured: high token usage (applied to whatever people needed to do to get their usage stats up, regardless of productivity).
The other business-as-usual factor is that there will be false reporting up the chain, so if the company understands the CEO wants to see high AI usage and productivity gains, then s/he will see high AI usage (a large token bill) and will be fed success reports of corresponding productivity gains.
In a typical corporate environment, if all your peers are reporting success, achieving what the CEO wants, do you want to be the only one reporting failure? So - everyone reports high AI usage (easy for the employees to make happen), and most everyone also reports productivity gains if they understand this is the expectation.
cratermoon 1 days ago [-]
I’m imagining a lot of programmers suddenly being given the impossible task of reporting what worked and what didn’t, and middle management making up some retrospective evaluation with fat PowerPoint decks and meaningless graphs in an effort to present to C-levels some measures of success other than token use.
HarHarVeryFunny 1 days ago [-]
As the saying goes "figures can't lie, but liars can figure".
If you want to report productivity gains or cost savings from some initiative (increased AI usage or whatever) and need some stats to point to, then you just point to whatever is working, for whatever reason, and attribute the success to the new initiative.
In a company I used to work for, one manager, when pushed to increase machine learning usage (a few years back, before ML became AI), just renamed his product from foo to foo-ML (with ZERO ML usage), and reported how well it is working. He has since been promoted twice.
cratermoon 1 days ago [-]
It’s not clear companies were measuring anything but token usage. What information could leadership have collected to determine what worked, what didn’t, and what needs more data? Other than the balance sheet and revenue, do companies actually have sufficient information to understand the results?
SpicyLemonZest 1 days ago [-]
Were they trying to measure other things? Definitely. The COO at Uber, one of the examples in the source article, has talked publicly about how they've searched for (and so far failed to find) a link between micro-level metrics driven by AI and concrete improvements in high level project velocity.
Do these measurements have sufficient information? As much as any, I'd guess. It sounds like you already know that it's pretty hard in general to measure the productive output of software development organizations.
cratermoon 22 hours ago [-]
I have no doubt a few companies, like Uber, were measuring other things and had applicable metrics in place before adopting Clod or CoPilot or whatever automation.
I'm speaking in the general sense of companies adopting the latest hype without reflection.
qoez 1 days ago [-]
I feel like most successful businesses have such a moat of required capital to compete with them that, even though in theory poor decisions like this are supposed to give entrepreneurs an opening when the big dogs make a wrong move, it doesn't end up happening.
morgan814 1 days ago [-]
> leaders
Don’t play their game and call them leaders. They are management, bosses, executives.
> They are making decisions about things that they just don't understand. And are completely unworried about it.
Clowns, even.
> Just blindly following whatever the news cycle is about AI.
But followers might be most apt.
——
This is such a huge pet peeve of mine. Describing management goofs using their language that makes them sound all-so-brilliant. We constantly watch these people do the dumbest shit and then they go around describing themselves as “thought leaders” and “servant leaders”. When, really, most are just clowns with fragile egos.
And, while I’m rambling, they’ve tried to take away the fact we are workers by calling us individual contributors. Using language to attempt to hide the hierarchy and power dynamic at play. It just…bothers me so much.
joquarky 1 days ago [-]
I don't hear them refer to themselves as "job creators" much these days.
And many of them still claim they are "risk takers", but have effectively insulated themselves from risk by socializing losses.
danaris 5 hours ago [-]
> Don’t play their game and call them leaders. They are management, bosses, executives.
You're falling into a common trap here: the ambiguity of the English language.
"Leader" means multiple different things. Yes, it means someone who has leadership qualities—who genuinely inspires those around them to do better, or who boldly marches into the unknown and gets people to follow them.
But it also means "someone in charge of a thing."
Now it's certainly true that many people in charge of things who are also really bad at actually inspiring or getting people to follow them (aside from with threats of destitution) also play on that ambiguity to try to convince people that because they're in charge of things, they must also be Good Leaders, and that's crappy...but yelling at others for using the term casually is very much an "old man yells at cloud" situation.
vasco 1 days ago [-]
During ZIRP they discovered that the way to lead companies nowadays is to become a maxxer of whatever the current fad is, and the more you maxx the better. And then when things change and you're wrong, you get to be a strong leader and, in ZIRP's case, fire everyone you over-hired; with AI it will be similar.
Why be a normal guy that waits to see what happens and is measured and pragmatic when you can get attention basically through the whole cycle by being the earliest adopter, adopt it to the maxx, then also be the loudest big brain when the tide changes and be praised for "taking hard decisions" when you revert everything you said so far?
The fakemaxxing economy.
janussunaj 1 days ago [-]
A special case of the more general cringe economy we're in. The dumbest, most outrageous ideas win, amplified by social media. Say stupid sh*t loudly, be wrong, profit.
steve1977 1 days ago [-]
That's nothing new though. It's just very obvious this time.
surgical_fire 1 days ago [-]
I've never seen self-awareness from leaders. They always lead on vibes.
Understanding this was one of the most important things in my career.
im3w1l 1 days ago [-]
Having studied control theory, I think it makes perfect sense. When trying to make a system target a new level it's quite natural for there to be overshoot that needs to be reined in. It's also natural for the correction to go too far and need to be corrected in turn. This is not indicative of stupidity; it's completely normal.
It would only be laughable if they waited way too long to reverse course, but I don't think that's the case.
RJIb8RBYxzAMX9u 22 hours ago [-]
Suppose I'm driving at 20 kph, and I set my cruise control to 40 kph. My car then goes WOT, overshoots my target speed and hits 120 kph, at which point it slams on the brakes[0], dropping my speed to 15 kph. It repeats until it finally settles at my target speed. (Rhetorical question) would that be considered "completely normal"?
Over/undershoots and corrections are of course unavoidable and normal; the absurdity is in the magnitude and rate of change. Furthermore, this is giving it the benefit of the doubt, that measuring AI spend is a good indicator; that's arguably also in dispute. To stretch my car analogy a bit more: it would be like the cruise control system has to hit the target speed, but it only has data from the O2 sensors.
[0] I know that the "classic" cruise control system cannot apply the brakes, but hey no analogy's perfect.
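The overshoot argument can be made concrete with a toy simulation. This is a sketch under stated assumptions: a generic second-order step response with a damping ratio `zeta`, not a model of any real cruise controller, and it does not try to reproduce the specific speeds in the analogy. The function name and parameters are hypothetical.

```python
def simulate(zeta: float, omega: float = 1.0, start: float = 20.0,
             target: float = 40.0, dt: float = 0.001, t_end: float = 60.0):
    """Step response of a toy second-order system (semi-implicit Euler).

    zeta is the damping ratio: below 1 the response overshoots and rings,
    at 1 (critical damping) it approaches the target without overshoot.
    Returns (peak speed, final speed).
    """
    v, dv = start, 0.0
    peak = v
    for _ in range(int(t_end / dt)):
        # Restoring pull toward the target minus a damping term.
        acc = omega ** 2 * (target - v) - 2 * zeta * omega * dv
        dv += acc * dt
        v += dv * dt
        peak = max(peak, v)
    return peak, v


for zeta in (0.1, 0.7, 1.0):
    peak, final = simulate(zeta)
    print(f"zeta={zeta}: peak={peak:.1f} kph, final={final:.1f} kph")
```

All three runs settle near 40 kph, so "some overshoot is normal" and "this much overshoot is absurd" are both consistent with control theory: the lightly damped run overshoots by well over half the step, while the critically damped one does not overshoot at all. The difference is how the controller is tuned.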
adammarples 22 hours ago [-]
It's not like they accidentally overshot. They were telling people to tokenmax; they didn't even know you could overshoot, they thought it was exponential gains all the way. Subtle ideas like balance were not on their minds.
im3w1l 16 hours ago [-]
Intentionally overshooting can be a legitimate strategy.
onlyrealcuzzo 1 days ago [-]
The actual cost is going to drop 99% in ~4 years.
How much of that makes it into enterprise pricing is TBD, since none of the hyperscalers are making money yet off selling AI inference.
Almost all businesses are jumping the gun. For most of their use cases, AI is either not yet good enough on its own, or good enough but too expensive.
No one wants to get left behind, so everyone's trying to get onto it now, even though it's not ready for what most enterprises want to do with it.
It's easy for them to look at a small startup without billions of lines of legacy business logic debt, see them having success, and wonder why they can't have just as much - or more - since they're bigger, so they should have better and more success, right???
Wrong...
But when it gets ~99% cheaper for local inference over the next 4 years, at the same time the price per watt improves 4x -> a lot of those cases will start to pencil out.
BearOso 1 days ago [-]
Going from Opus 4.5 to 4.7 secretly required 6x more compute to run. 4.8 is apparently 30% more on top. I haven't seen any optimizations lately aside from distillation. Nobody's optimizing, they're just scaling up.
rescbr 1 days ago [-]
> Nobody's optimizing
The Chinese, since they lack computing hardware due to US export controls, are.
trollbridge 1 days ago [-]
And our export controls are going to turn China into a winner in the AI arms race if we're not careful.
rented_mule 1 days ago [-]
I retired a few years ago, but I still write a fair bit of code. I was using Copilot's code completion before I retired, but coding agents hadn't come around yet. I've been wanting to try them, but I kept putting it off, and now the price increases make it hard to justify.
So I just started trying CodeWhale (https://github.com/Hmbown/CodeWhale) with DeepSeek V4. I expected to be impressed by the abilities (which still require plenty of oversight). I didn't expect to be completely shocked by how cheap it is. After most of a week of using it 4-8 hours a day, which would amount to a full week of coding in many jobs after you account for non-coding activities, I'm about to hit $3 in total usage. So we're talking $10-20 per month for single-agent use by a full time software developer? And I'm sure some of my usage is waste as I'm still getting my head around things like compaction. If I take a break for a few weeks, I pay nothing because there is no subscription.
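A back-of-the-envelope check of the $10-20 per month extrapolation. The one assumption not stated above is how many days "most of a week" covers; the sketch tries 5 and 7, and `monthly_estimate` is a hypothetical helper.

```python
def monthly_estimate(spend_usd: float, days_used: int, days_per_month: int = 30) -> float:
    """Scale observed spend linearly to a month of similar daily use."""
    return spend_usd / days_used * days_per_month


spend_so_far = 3.00  # USD, the "about to hit $3" figure

for days_used in (5, 7):  # assumption: "most of a week" is 5 to 7 days
    estimate = monthly_estimate(spend_so_far, days_used)
    print(f"{days_used} days of use -> about ${estimate:.2f} per month")
```

Both cases land inside the $10-20 range, before accounting for any waste from things like compaction.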
If DeepSeek and Xiaomi MiMo stay within a few months of the US-based models in terms of capabilities and US companies don't figure out how to drastically cut prices, I can't see how China hasn't already won. Protectionism would be one reason, but that might be ceding 50-90% of the total addressable market, and bring us closer to moving knowledge work out of the US the same way we did with manufacturing because it's too expensive in the US.
sgc 19 hours ago [-]
How are you using it? More to complete specific functions or scripts, or for larger architectural design and longer implementation runs?
rented_mule 11 hours ago [-]
My initial use was in a repo where I create models for 3d printing using a library called build123d. There are a handful of parametric models and then many instances of those models with parameters (one that's 24 mm in diameter with a cutout, another that's 42 mm in diameter but no cutout, etc.). I tend to be in a hurry when I want a new parametric model, so I've ended up just copying the one that's the most similar and changing what I want to be different.
The first big task was to find the common bits and abstract them out. It did a great job of creating a plan, summarized in a table, that gave a name to shared chunks, the line numbers in various files where they appeared, line counts of new functions vs. removed bits, and some pros/cons about splitting out each chunk. It was very well "thought out", so I told it to go ahead. It did a nice job other than straying from my coding conventions. That gave me a chance to build out my AGENTS.md file (it helped with that, too).
Once that was done, I had it create automated tests for the newly abstracted parts. I think this is probably a bad practice... I believe humans should at least define what the tests are testing so that there is a deeper understanding of what oversight is in place. But I was just trying things. It surprised me how well it did. The biggest surprise was that the tests seemed quite inspired by vision. It would try different parameters and then have comments about making sure the shape protruded in a certain way, then code that did that. I expected it to refactor a bunch of the code to make it more testable. It found a way to not touch the code while testing everything I asked it to with just two simple mocks - I hadn't foreseen that, but it felt quite practical. It was passing around several opaque tuples in the tests and accessing items in them by index. I prompted it to replace the first one with a frozen, kw-only dataclass. Then a second. On the second request, it saw the pattern and did the rest without me asking. It created 44 tests across a handful of files.
The next part is where I was the least happy. I use ruff and ty to check my code with almost all checks enabled. It was mostly good about the ruff issues. But for the type checking, it just wanted to disable 6-8 rules for the entire repo in pyproject.toml, or at least for all the tests. I had to repeatedly tell it not to and it kept telling me it wasn't recommended. When it finally gave in, it fixed most of the type issues (build123d has lots of types specified, but many operations result in type conflicts because things are so deeply overloaded). The things it didn't fix, it just left a comment to ignore type checking altogether on that line. After I did a little more browbeating, it finally changed the comments to only disable specific rules. To be fair, and unlike most of my other repos, I've had to spend way too much time getting types right in this repo myself.
My last task involved a small library management system for our little town library (tracking library cards, books, DVDs, check-outs/check-ins, etc.). I inherited it from someone who had built the entire web app out of bash/awk/troff scripts with the data in text files burdened by a lot of schema changes that he didn't really know how to deal with. I'm halfway through moving it to Python/FastAPI/SQLite. I asked it to do a security audit of the entire code base, both the newer parts and the old parts that are still in bash/awk/troff. It found everything I knew about and a few things I didn't know about. It made a decent assessment of the risks/impact of each issue. It also called out design decisions that were good security practices. One of the next big tasks will be to see how it does at continuing the migration - it has enough examples of how I've done it that I suspect it can do something fairly consistent with my thinking. I'll probably have it do one or two web pages. When I feel like it understands what I'm after, I'll tell it to use sub-agents to do the rest. I'll be very happy if I don't have to tease apart any more troff scripts that are generating PDF files!
zzleeper 22 hours ago [-]
Holy F.. $3 .. once I'm done with my base cursor allocation, each nontrivial question costs $5 . And yes, I'm now switching to a mix of codex and ds4pro
trollbridge 1 days ago [-]
DeepSeek and Alibaba would like to have a word.
whatthesmack 5 hours ago [-]
Hasn't everything DeepSeek and Alibaba created thus far been distilled from the results of many, many accounts logging into Claude and ChatGPT? And that's why there's so much bot detection now at US frontier labs? Doesn't that make the Chinese labs dependent until some unknown point in the future on advancements of US frontier labs? While what they currently provide is cheap, it seems like it's artificially cheap and somewhat static because they took others' intellectual property (no comment needed about US frontier labs stealing the world's knowledge... that's a separate topic).
NekkoDroid 4 hours ago [-]
> Hasn't everything DeepSeek and Alibaba created thus far been distilled from the results of many, many accounts logging into Claude and ChatGPT?
I doubt it is really any different to what the US labs do [1]. I never really bought the "they were basically all just distilling from us" shtick from Anthropic, I just assumed they were either comparing or also creating training data as basically any lab is doing.
Do you mean the marginal cost by the producer, or the cost on the consumer? I can't see the price of electricity falling much, and the demand curve is apparently exponential if the hype is to be believed.
trollbridge 1 days ago [-]
DeepSeek V4 Pro is 99% cheaper than similarly performing models were 2 years ago (if such a model even existed).
Computing has always been about how to wring out more efficiency. The ENIAC was 150,000 watts, with 3 phase 240 volt power, and cost about $500,000.
My day to day laptop (a year old) is 35 watts, with 1 phase 20 volt power, and cost $1,000, so that's 99.98% less power consumption, 99.8% cheaper, and it has about 10 orders of magnitude more computing power, all on a time span of 80 years.
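The percentages above can be checked directly. This sketch uses only the figures quoted in the comment; the prices are nominal dollars as stated, with no inflation adjustment.

```python
def pct_reduction(old: float, new: float) -> float:
    """Percentage by which `new` is lower than `old`."""
    return (1 - new / old) * 100


eniac_watts, eniac_cost = 150_000, 500_000
laptop_watts, laptop_cost = 35, 1_000

print(f"power: {pct_reduction(eniac_watts, laptop_watts):.2f}% less")   # 99.98% less
print(f"price: {pct_reduction(eniac_cost, laptop_cost):.1f}% cheaper")  # 99.8% cheaper
```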
cratermoon 1 days ago [-]
Moore’s law is dead.
HappMacDonald 20 hours ago [-]
It died before AI came around and today's coding agents are somewhere upwards of twice as competent as whatever the state of the art of automatic coding was in 2020. 8I
mrandish 18 hours ago [-]
A good chunk of that was one-time gains from shifting GPU and memory architectures to better match what LLMs need at scale as well as some algorithmic improvements. Most of the low-hanging architecture optimization has already been harvested. We'll certainly have more algorithmic gains but the consensus is they'll generally be smaller and less frequent.
There's always a chance we'll have some dramatic gains far larger than DeepSeek's optimizations a year ago, but it hasn't happened again yet at even that scale. It would be nice but I certainly wouldn't count on it.
packetlost 1 days ago [-]
I don't see how this is even remotely true. Unless there's some super breakthrough into a fundamentally different architecture, there's not really a path to a 50% reduction in price, much less a 99% reduction.
kilroy123 1 days ago [-]
In fairness, I think _current_ capabilities will be cheaper. So the models of today will be run drastically cheaper in 4 years.
onlyrealcuzzo 1 days ago [-]
And yet 90% drops for the same level of quality every 18 months have happened like clockwork...
And the technology already exists on the algorithmic front TODAY to lock in another 10x gain -> when, typically, algorithmic gains only account for ~30% of that drop and the other ~70% comes from better data (often synthetic) and knowledge distillation from frontier models.
Just look at DeepSeek's pricing...
datakan 1 days ago [-]
What makes you think prices will drop? Everyone I’ve spoken to believes they will only skyrocket. Genuinely curious
onlyrealcuzzo 1 days ago [-]
The technology already exists now on the algorithmic front for the next 10x drop between everyone adopting DeepSeek's MLA, MoE (mostly already done), Medusa (a better version of Google's speculative decoding), Kimi's Attn Residuals, and Mimo's Sliding Window Attn, and (possibly) Microsoft's 1.58b (this may be a nothing burger).
Historically, the price for the same level of quality has dropped ~90% every 18 months.
Historically, algorithmic gains are only ~30% of the pie, but there's enough out there to get to 10x, with just what's available already. The other ~70% of the pie is better training data (often synthetic) and distilling frontier knowledge. There's no sign we are tapped out on that front.
Additionally, GRAM (from ~10 days ago) is likely to be a 5-10x on its own (if not substantially more for smaller models). It's unlikely within 4 years LeCun's JEPA ideas and similar ideas like GRAM applied to LLMs have ZERO impact. The preliminary results are absolutely astounding (5000x better reasoning - this is not peanuts).
Further, that's not even counting that cost per watt is still dropping ~2x every 2 years on its own on the hardware front.
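To see what those rates imply over the 4-year horizon, here is a small compounding sketch. It only extrapolates the two rates quoted in this thread (a 90% drop per 18 months for the same quality, and cost per watt halving every 2 years); it assumes they hold steady and says nothing about whether they will. The helper name is hypothetical.

```python
def remaining_fraction(drop_per_period: float, period_months: float,
                       horizon_months: float) -> float:
    """Fraction of the original cost left after compounding a periodic drop."""
    return (1 - drop_per_period) ** (horizon_months / period_months)


horizon = 48  # months, i.e. ~4 years

price_left = remaining_fraction(0.90, 18, horizon)  # 90% drop per 18 months
hw_left = remaining_fraction(0.50, 24, horizon)     # 2x cheaper per 2 years

print(f"same-quality price left: {price_left:.2%} (a {1 - price_left:.1%} drop)")
print(f"cost per watt left: {hw_left:.0%} (a {1 / hw_left:.0f}x improvement)")
```

Under those assumptions the first rate alone compounds to a drop of more than 99% in 4 years, and the hardware rate alone gives the 4x per-watt figure. The two are shown separately because the thread does not say whether the 90% figure already includes hardware gains.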
If you look at the "cost" of inference. People think it's electricity - but it's currently almost ~80% hardware amortization. The memory shortage is not going to last, nor are Nvidia's ~80-90% margins.
The human brain is still 8-10 orders of magnitude more efficient than the best LLMs of today. With ~1/10th of global capex riding on AI, if you don't think they're going to knock off 2 orders of magnitude more, when it's this obvious and easy... I don't know what to tell you...
Sure, it might take 6 years instead of 4. My crystal ball isn't perfect.
HarHarVeryFunny 1 days ago [-]
Sure, the price will come down a lot, even if we can argue about the timeline.
I think what will also happen, once we get past this current CEO AI FOMO mania, is that companies will start to look at AI spending more rationally like any other company expense, and will revert to more rational decision making.
Even if the cost comes down considerably over the next few years, that's plenty of time for companies to look at their financial results and question why AI expenditure isn't resulting in an increase in revenue and/or profitability.
datakan 1 days ago [-]
This is great food for thought, thank you
onlyrealcuzzo 1 days ago [-]
Additionally, on the context front -> all the labs are aware that for many tasks you can get 10x+ increases in output quality by feeding better context.
This won't really show up in benchmarks, but it will impact real world usage on the most common use cases.
I'm doing a study right now on the impacts of better context for small models to fix bugs.
A very dumb algorithm can make small models perform at 10x+ model sizes. I'll be surprised if it can't get to 20x+
rednb 1 days ago [-]
I didn't take you seriously initially but after reading this, I think you are the real deal.
Thank you for sharing this and for having the intellectual courage to hold to a sound reasoning that may be unpopular initially.
Nimitz14 1 days ago [-]
This is mostly slop. But you may be directionally correct
AllegedAlec 22 hours ago [-]
> The actual cost is going to drop 99% in ~4 years.
And fusion power is just 2 decades into the future!
jjav 21 hours ago [-]
Full self driving guaranteed here before the end of the year (every year).
mrandish 18 hours ago [-]
> The actual cost is going to drop 99% in ~4 years.
We have little visibility into current frontier model costs at mass scale. As a broad historical trend, tech costs tend to fall over longer time periods but your claim far exceeds Moore's Law rates in its heyday - and that heyday is long gone.
In 2021 TSMC announced it was increasing its price per gate for new nodes for the first time in its history. In the past five years cutting edge nodes have delivered ~8-15% real-world performance gains on average at costs at least 10-20% more than the last node. If you're positing a string of unprecedented efficiency breakthroughs in LLM algorithms - such extraordinary claims require extraordinary evidence.
bakugo 1 days ago [-]
Prices have been very obviously trending up, not down. Even open weights models are becoming more expensive with every release. Computer hardware is ballooning in price.
onlyrealcuzzo 1 days ago [-]
Prices are going up for BETTER quality -> not for the SAME level of quality.
People are willing to pay more for BETTER quality.
You obviously haven't seen DeepSeek v4 Pro's pricing if you think pricing only goes up...
bakugo 8 hours ago [-]
Maybe so, but that becomes irrelevant when you consider that the new, better quality instantly becomes the expected baseline. So the price of the "baseline" quality is going up regardless.
Let's look at GPU prices as an example. Around 12 years ago, I bought a GTX 970 for around $350. That was considered a very good GPU at the time. Today, the "equivalent" GPU model (RTX 5070) now costs almost double. Of course, the newer GPU is much more powerful (more than double, in fact), but all the things you'd use a GPU for have also advanced and now expect an entirely new level of performance as a baseline, such that the older GPU is fairly worthless today. So most people agree that GPUs in general have become more expensive.
Regarding DeepSeek's price: it's obviously subsidized, and unlikely to match the actual inference cost right now.
abalashov 1 days ago [-]
Just wait for the next model and the next model architecture. Just wait for it, bro.
onlyrealcuzzo 1 days ago [-]
Gemini 3.5 flash is 25% cheaper than 3.1 pro, and outperforms it on almost every benchmark, most by a pretty wide margin...
Rebelgecko 15 hours ago [-]
It's still 5x more expensive than 2.5 flash
abalashov 24 hours ago [-]
Cool.
bigstrat2003 17 hours ago [-]
There has never yet been a new model which actually improved over the previous ones. They suck just as much, and in the same ways, as the models of 3 years ago.
trollbridge 1 days ago [-]
Grab a 5090 and run Qwen 3.6 35b on it (6 parameter seems to work best for me).
Then buy $10 (or $2, if you're cheap, and they take PayPal) of DeepSeek credits.
Whilst you're at it spring for a Claude subscription too and GPT.
Switch models between Qwen, DeepSeek Flash, DeepSeek Pro, and you can meet 99% of your code generation needs.
Hop over to Opus 4.7 (or 4.8, but I haven't really used it yet) and GPT-5.5 when doing very complex architecture/design or troubleshooting something where DeepSeek Pro is getting stuck.
It is ridiculous how cheap this stuff is now. It's affordable at third world prices.
Supermancho 16 hours ago [-]
None of that is cheap.
> spring for a Claude subscription too and GPT.
You started with some random pricing then veered off into impractical hand waving. Far above third world prices...unless you count the USA as third world, I guess.
amazingamazing 1 days ago [-]
AI is overhyped. I have yet to see an end user product that in itself isnt a wrapper around LLMs that is impressive created by LLM assistance. I have also yet to see dramatic increases of revenue of companies using LLMs that don't involve selling things in its supply chain. Is it a nice affordance? Sure. 1T capex good? No.
If it was so good I would expect to see 2005-2015 advancements yearly.
Meanwhile China is blowing past the world with real improvements in the real world- solar, EVs, etc. meanwhile people keep making their fancy sans serif websites about todo apps, faster than ever before. Useless.
criddell 1 days ago [-]
> I have yet to see an end user product that in itself isnt a wrapper around LLMs that is impressive created by LLM assistance.
I don’t disagree that AI is overhyped. But I think you are probably looking in the wrong place.
I think most software that is written isn’t really a product, at least not a public product. It’s an in-house tool or a one-off project needed to complete some larger task. People everywhere are always writing small programs that make their life or job just a bit easier (and explains why so many corporate projects are little more than an excel spreadsheet).
And there are a lot of people who have made custom software just for themselves with AI. Not a product, just a tool or project that finally made sense to build.
pessimizer 1 days ago [-]
But where's the revenue from those? It has to add up to a couple trillion dollars to break even on the capital spending.
pocksuppet 1 days ago [-]
Would you say the same about any other tool, like where is the revenue caused by Susan in accounting having a computer, shouldn't we take away her computer if she can't prove a benefit?
amazingamazing 1 days ago [-]
The benefit of a computer would be trivial to demonstrate.
The benefits of computers were obvious before they went into production.
After 1T in spend, it's still not clear that AIs will beat out a $30k secretary.
Also, your link has nothing to do with computers.
_aavaa_ 7 hours ago [-]
The benefits of having a computer that we can now interact with in plain natural language, that can extract intent from vague questions/statements, and that can piece together answers is obvious.
The link talks directly about the disconnect between the supposed productivity benefits of a technology and the measured productivity benefits of it in practice. And provides historical context about why the “obvious” benefits of a computer did not materialize when it was introduced; business and their processes had to be rebuilt around the computer before real gains were seen.
gamblor956 1 hours ago [-]
But LLMs can't actually do that any better than a 30k secretary with no training.
It's a structural deficiency in the way they work that can't just be handwaved away.
mxschumacher 1 days ago [-]
not sure one would expect huge revenue increases from these internal tools, but maybe dramatic cost savings? Surely a lot of corporate processes could be automated?
bunderbunder 1 days ago [-]
That's been the dream for the 40 years I've been paying attention. And in that time, I've seen plenty of incremental changes but never the kind of sudden sea change that the hype machine anticipates.
The perennial reality is that automation is inherently inflexible, so there's only so much of it that you can do before you've committed a huge strategic blunder by making your business resistant to change and severely curtailing its ability to cope with situations that don't cleanly fit the mold. So then we need to hack in ways to deal with the exceptions, but, since they're hacked in, they're often painful and time consuming. Sometimes so much so that after the new process stabilizes it turns out to be even more cumbersome and require more manual effort than the system it replaced.
When anyone other than a technologist suggests doing that kind of thing, we call it "bureaucracy", and we hate it. I think maybe what we have trouble seeing is that there's actually a pretty fundamental difference between automating purely technical processes like server deployment, and automating processes that are fundamentally about mediating human interactions.
criddell 23 hours ago [-]
> It has to add up to a couple trillion dollars to break even
It doesn't have to and I'm pretty sure it won't.
kajman 1 days ago [-]
> Meanwhile China is blowing past the world with real improvements in the real world- solar, EVs, etc. meanwhile people keep making their fancy sans serif websites about todo apps, faster than ever before. Useless.
Very little about the American economy even makes sense for keeping the edge on LLMs beyond a few years. All the things I would think would be required: energy, research, construction capacity, labor costs -- it's pretty hard to deny who's on the upswing these days. China cranking out current generation microchips will be the last nail in the coffin.
dawnerd 1 days ago [-]
Productivity gains seem to be at best a wash when you factor in the massive tech debt cleanup and additional time needed to spec and review.
trollbridge 1 days ago [-]
Misuse of AI tools because of continuing a fundamentally broken software development process.
trollbridge 1 days ago [-]
AI is both overhyped and revolutionary at the same time.
I would agree that a lot of companies talking a big talk about using LLMs are failing to actually apply it in a sensible way to their business.
simplesocieties 21 hours ago [-]
In the time before LLMs, humans made satellites, Concorde, life-saving medical surgery, James-Webb Space Telescope, communication at the speed of light, the list goes on.
What changes have LLMs (Not AI, not machine learning in general, I'm not going to waste time discussing the definition), LLMs made in the past 4 years that indicate anything close to the above? Solving a whiteboard math problem?
threatofrain 1 days ago [-]
Oh, war is transforming hard.
gonzalohm 1 days ago [-]
In my opinion, the problem is not even the cost. The problem is that people are using AI for running recurrent stuff instead of writing code to automate it.
For example, imagine that you are comparing two documents (let's assume diff doesn't exist). You could ask an AI to compare the differences for you, or you could use AI to write a tool to do it. For whatever reason, people are starting to go with the former, not realizing that now they basically have to pay to compare documents.
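A minimal sketch of the second option, assuming the documents are plain text and using Python's standard `difflib` module. The function name `compare_documents` is made up for illustration. Once a tool like this exists, each comparison costs no tokens.

```python
import difflib


def compare_documents(old_text: str, new_text: str) -> list[str]:
    """Return a unified diff of two plain-text documents, one line per item."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # lineterm="" because the inputs were split without their newlines.
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="old", tofile="new", lineterm=""
        )
    )


if __name__ == "__main__":
    old = "alpha\nbeta\ngamma"
    new = "alpha\nBETA\ngamma"
    print("\n".join(compare_documents(old, new)))
```

The output is deterministic and repeatable, which is the point: the model is paid once to write the tool, not on every comparison.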
bluejay2387 1 days ago [-]
I have exposure to AI initiatives at several companies including a few F500s. I have seen teams dump huge logs into frontier models that took hours to get so-so results that we were able to replace with a few lines of Python code at 1000 times the speed and 100% accuracy. When asked why they were doing this they literally said "because we don't understand the subject matter so we were depending on the AI". I saw one team file a complaint with a vendor about a frontier-backed coding harness and its inability to consistently format headers because they were using it as a reporting engine. When I recommended they just use the coding tool to write code to generate reports, you would have thought I had just cured cancer from their response. I frequently see people complain about the fact that AI is going to take their jobs and then see them gripe about the fact that AI is 'worthless' because it can't do more of their job than it already does. It's easy to see the difference between the people seeing 10x productivity gains from leveraging AI and those who aren't, and it's not the AI.
mxschumacher 1 days ago [-]
I have trouble understanding these situations, e.g. the AI itself would presumably make the suggestion to write a Python script for such a task. It seems to me that there are two huge problems right now:
* understanding which category of problems an LLM is an appropriate solution for (rather than throwing LLMs at any and all problems)
* matching model capability (and therefore cost) to the problem at hand. You can easily overspend massively by using a model that's too powerful
sbarre 1 days ago [-]
I've heard this framed as "AI raises the floor by 2x or less but raises the ceiling by 10x or more"
irishcoffee 1 days ago [-]
Someone asked me if I was using models for fantasy sports, and if it was smart enough to help make decisions about drafting.
My answer: no, but it was able to help me find the website and social handles for every beat writer for every team, and generate a simple website where I can do a daily skim of teams/players and draw my own conclusions.
LLMs are a tool, not a panacea.
throwatdem12311 1 days ago [-]
Laziness, pure and simple. The inevitable consequence of “the LLM is the compiler now”. And what do you even expect people to do when they are forced at threat of termination to use AI for everything as much as possible? Not to mention people are being pressured to do insane things like review hundreds of pull requests per day and deliver like 15 features per week so OBVIOUSLY there isn’t time to build out proper tooling. Just shove everything in a prompt and call it a day. Some people have families to feed, just do what you’re told.
CompoundEyes 1 days ago [-]
Agreed. I’ve been telling my team to build up internal packages so we can push all that ad hoc reinvention into something more tangible and deterministic. Invest the $$$ in inference into something the agent can reach for next time that’s neutral and consumable by other code to reduce future spend.
trollbridge 1 days ago [-]
Yes. Build compact CLI-driven tools, write a skill for it (you can use your agent to do most of this work for you).
It just requires being willing to think instead of mashing prompts into a keyboard.
jerojero 1 days ago [-]
Because you look at the work from the perspective of a programmer, not the perspective of a regular person.
Normal people have never gone around automating their work. The most automation they do is dynamic tables on excel sheets.
I obviously know building a tool that can programmatically do something is a better solution, but I think that requires a fundamental shift in how people work. People need to be told by someone "this is how you should be using the AI" but right now they're simply told "use the AI".
gonzalohm 19 hours ago [-]
I'm talking about programmers doing this. That's what's sad. These were normal people before, but it feels like they have some kind of AI schizophrenia now. They don't use their brains anymore.
avereveard 1 days ago [-]
Same. Even Opus favors short-term solutions and scripts with a billion flags that constantly require rescanning to understand how to launch them. It is a constant struggle to get it to build sane defaults and reusable scripts that run with minimal parameters.
gonzalohm 1 days ago [-]
Yeah, and what's up with adding dry run to everything? I saw some code that doesn't write anything but still the AI added a dry run which had a completely different codebase
duskdozer 1 days ago [-]
Because dry run is in a lot of scripts in its training data. It's not "thinking" about the script or the concept of a dry run.
lanstin 1 days ago [-]
And everything configurable gets an environment variable. Editing the first few lines in the script is a fine way to configure things in Python.
plmpsu 1 days ago [-]
AI can do things around semantic analysis that a deterministic diff tool cannot.
I understand and agree with your point though.
bilekas 1 days ago [-]
I'm curious if you could give me an example of something that couldn't be done deterministically. We have fuzzy search/matching too. Regex is a monster when used correctly.
GeoAtreides 1 days ago [-]
A model can 'analyze' the intent of a patch, 'understand' it, and then correctly merge it in a derived codebase, going further than merely resolving conflicts.
plmpsu 1 days ago [-]
Pretty much anything for which you'd need intelligence of any kind. Questions such as: Do these two paragraphs have the same semantic meaning? Do they have the same sentiment? Do these two methods have the same contract? etc.
Not all documents are code, and even with code, deterministic tools get you only so far.
8 hours ago [-]
SpicyLemonZest 1 days ago [-]
I sometimes find myself with thousands of log lines from a problematic execution and a known good reference, wondering nonspecifically if "something weird" happened in the first one. I don't think there's any matching-based solution there; you need a scan process that understands variations in execution time, object identifiers, etc. aren't meaningful.
bilekas 1 days ago [-]
You would need specific domain knowledge and a very clever parser, I've done one for a ridiculously over engineered system but a pain. That's fair but how often would you need it? Certainly not token maxing amounts!
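A rough sketch of the easy part of such a parser, assuming ISO-style timestamps, `0x...` object identifiers, and durations in `ms` or `s`. These patterns and the function names are illustrative assumptions, and real logs would need rules written with knowledge of the system. It masks the fields that vary between runs, then reports lines that appear only in the suspect log.

```python
import difflib
import re

# Assumed log formats; replace with patterns that match your own logs.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?")
HEX_ID = re.compile(r"0x[0-9a-fA-F]+")
DURATION = re.compile(r"\b\d+(?:\.\d+)?\s?(?:ms|s)\b")


def normalize(line: str) -> str:
    """Mask fields that differ between runs without being meaningful."""
    line = TIMESTAMP.sub("<TS>", line)
    line = HEX_ID.sub("<ID>", line)
    line = DURATION.sub("<DUR>", line)
    return line


def unexpected_lines(reference: str, suspect: str) -> list[str]:
    """Return normalized lines present in the suspect log but not the reference."""
    ref = [normalize(line) for line in reference.splitlines()]
    sus = [normalize(line) for line in suspect.splitlines()]
    # ndiff prefixes lines found only in the second sequence with "+ ".
    return [d[2:] for d in difflib.ndiff(ref, sus) if d.startswith("+ ")]
```

This only handles variation that a regex can describe. Deciding whether a surviving difference is actually "something weird" is the part that still needs judgment.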
SpicyLemonZest 1 days ago [-]
It's a spectrum. Could it be worth it to run that as a first pass on every report of anything going wrong, just in case it produces a useful insight? Depends on how much engineering time it saves!
In practice, I've found the answer seems to be "not much", because human triage is still required to understand whether the insight is correct and useful. But I'm not sure that was obvious in retrospect.
bilekas 1 days ago [-]
It's this and worse. To use your example, it's like people using AI to write a diff algorithm, incorrectly, then using AI to fix it, because they don't know that diff exists already. Laziness and starting development with a very low level of understanding. People think lowering the barrier to entry is a good thing, when in reality there are just fundamentals and things you just have to know before you can start using a tool like LLMs properly.
rich_sasha 1 days ago [-]
Isn't that the supposed point of it though? At least how it is marketed/hyped. Don't use your brain, you don't need one, spend all your thinking energy on... dunno, something else, and leave all the "mundane" stuff to AI. Just pay for the tokens, it's going to make you 10x more efficient, the $1000/month is worth it.
m3nu 1 days ago [-]
100% this. For my own company I mostly build deterministic workflows that may have a simple AI step in the middle using an appropriate Chinese model in a very limited way. I wouldn't want to burn tokens to satisfy some metric.
With this AI is a fallback and not the default. Sounds like large companies have it backwards.
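A minimal sketch of that "AI as fallback" shape, assuming a ticket-routing task. The rules, labels, and the `model_fallback` callable are hypothetical placeholders, not anything from a real product or API. Deterministic rules run first, and the model is called only when none of them match.

```python
import re
from typing import Callable, Optional

# Hypothetical routing rules for illustration only.
RULES = [
    (re.compile(r"\b(invoice|refund|charge)\b", re.I), "billing"),
    (re.compile(r"\b(password|login|2fa)\b", re.I), "account"),
]


def classify_deterministic(text: str) -> Optional[str]:
    """Return a label if any rule matches, otherwise None."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return None


def classify(text: str, model_fallback: Callable[[str], str]) -> tuple[str, bool]:
    """Return (label, used_model). The model runs only when no rule matches."""
    label = classify_deterministic(text)
    if label is not None:
        return label, False
    return model_fallback(text), True
```

Passing the model in as a plain callable keeps the workflow testable without network access, and the `used_model` flag makes it easy to count how often tokens are actually spent.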
dawnerd 1 days ago [-]
Same with writing boilerplate code. It’s been a solved problem yet here we are.
This is why we have business analysts and software developers.
To help identify inefficiencies and to build technical solutions.
r_lee 1 days ago [-]
it's all about cost at the end of the day. if you're allowed and encouraged to tokenmaxx, then of course this'll happen.
cyanydeez 1 days ago [-]
Oh no! People are doing what they've been told to do!
cindyllm 1 days ago [-]
[dead]
jgalt212 1 days ago [-]
I agree, but even this use case isn't the most wasteful. The interwebs says Agentic consumes 50% of token use, but I'd hazard this number is north of 90% for many shops. My cynical view of Agentic is its sole purpose is to make "number go up".
id 1 days ago [-]
Look at me! I'm the smartest guy. I've wasted 10M tokens! No one has wasted more!
kermatt 3 hours ago [-]
My org has a monthly team plan, and because I don't use what I would call unsupervised agents, I rarely come close to exceeding limits. I guess I am one of those "chat only users" that so many articles limit my output. A split terminal with Vim on the left and Claude on the right has been a great combination. Neanderthalic AI.
Personally I'd consider efficient and economical use of AI to be a key skill for a good developer. Using AI for everything at full throttle would appear to be a crutch. I wonder if this will eventually become a hiring criterion for companies without unlimited budgets (most of them).
dgellow 1 days ago [-]
The cost is a problem, but IMHO more important is delegating so much of your internal knowledge, thinking, and systems to a 3rd party.
We are very close to the point where if Claude and ChatGPT APIs are down, companies cannot function. How is that introduced so quickly into so many critical places without taking that specific fact in consideration? What is the plan for all those companies whose workflows now depend heavily on a remote LLM whenever the services get cut? What if your company account gets banned?
In some ways it is worse than depending on a company for hosting, because even your debugging tools are based on AI. MCP is great to go through datadog, sentry, until your agent or the MCP server are down and you don't know how to look for the issue yourself because you do not actually understand how your systems work.
HappMacDonald 17 hours ago [-]
> We are very close to the point where if Claude and ChatGPT APIs are down, companies cannot function.
Contrast with Gmail/Gsuite/Outlook365/QuickbooksOnline/etc are down, though.
What you cite here isn't a direct attack on AI but on centralized service provision in general. Unfortunately that battle has been lost for decades, now.
dgellow 11 hours ago [-]
None of those are doing the actual development. Here we are talking about a technology people delegate judgement, technical expertise to. It’s way, way deeper of an integration than a standard saas
duskdozer 1 days ago [-]
Those sound like problems for another quarter. The people making the decisions ride the AI hype wave, and if in the worst case the company tanks one day, they take their severance package and leave.
new_account_102 1 days ago [-]
[dead]
cs702 1 days ago [-]
There's an old saying, "in the land of the blind, the one-eyed man is king."
Here we have the opposite: In the land of the one-eyed, the blind are leading.
The blind in this case are all those executives and managers who don't understand much about AI's current potential and limitations, and so far have treated it like a magic button that will solve everything. The one-eyed are rank-and-file employees who maybe sort of know a little more about AI.
pocksuppet 1 days ago [-]
Executives and managers are the ones who correctly understood which game was being played. The game we are playing is not one of making good products, it's one of getting money from people who both have more money and are stupider than you. They're succeeding at that. We're also doing it, but we're not getting as much money.
bastawhiz 1 days ago [-]
In many cases, the people who have more money and are stupider than you are other executives. Sam Altman is arguably one of the executives who know how the game is played. OpenAI is at the front. Microsoft's executives are an example of the ones who got played.
1970-01-01 1 days ago [-]
Would have been nice to see 'soaring costs' with numbers. WSJ could do better here. Hundreds of thousands of dollars a month is nothing compared to how much they take with better financial models.
scronkfinkle 1 days ago [-]
On the one hand, organizations are without question using LLM's well beyond what is actually necessary, and as reality kicks in they're forced to scale back accordingly. However at the same time, on intervals counted in months, we're seeing breakthroughs both in hardware and software that dramatically reduce the cost of inference.
Between corporate FOMO and the rapidly decreasing costs of actually running LLM's I'm interested to see at which side of the spectrum these two meet
lumost 1 days ago [-]
They are likely also starting to realize that the end result of their anthropic contract is that nobody but anthropic knows how to run their business. Why would anthropic not treat their business like a utility in the future?
Majeh905 1 days ago [-]
Don't have a subscription to wsj.
Only thing I can say AI was useful for, in a corporate environment, was learning a new coding language on the fly. Gives me a baseline to work off of and fix.
But I can learn without it, too. A nice tool, but not a need.
dude250711 1 days ago [-]
> Don't have a subscription to wsj.
An ironic analogy sort of, once media started hiding behind a paywall, I just stopped reading them rather than paying. Same with LLMs - usable if cheap/free.
Havoc 1 days ago [-]
Corporate or corporate in programming space?
90%+ of corporate people are not programmers. One programmer can cause the same token damage with a bunch of concurrent agents as a couple thousand Karens in compliance asking a chatbot questions.
It's much easier to deliver incremental AI ROI on the latter even if it's hard to measure/quantify. 1,000 tokens might point this compliance person in the right direction on a key problem. Meanwhile 1,000 tokens doesn't get you anything useful on coding.
wg0 1 days ago [-]
The other day we (wrongly) concluded that product market fit has been achieved and now the rivers of hot molten milk chocolate and honey are all that's in the future etc.
Where is the tokenmaxxing chad / chadette that burnt a half a billion dollars in a single month?
OutOfHere 4 hours ago [-]
Tokenmaxxing is absurd. Using a fixed cost monthly plan seems sensible. Taking time to review the generated code is a good thing.
UltraSane 1 days ago [-]
These articles are weird because rationing consumption based on price is one of the most fundamental concepts in economics.
elevation 1 days ago [-]
Another reason to favor using AI to build automation instead of relying on it in prod: the risk of war and global instability.
If LLMs are genuinely helpful or even decisive in a military engagement, you can expect any host country to commandeer whatever data centers they need, leaving commercial entities to bid up the prices on the leftover capacity.
Another risk is that data centers are a great target for cyber warfare.
It’s ideal if your business can leverage LLMs when they’re online but continue to operate profitably when they’re offline.
lanstin 1 days ago [-]
Even regular warfare, if the Middle East AWS regions are an indication. The giant and arguably excessive data centers being built are not hardened physically.
checkaiclaims 1 days ago [-]
As a developer, I don’t think it’s just that costs are going up. I’m also seeing more people lately talk about “vibe slop”.
OptionOfT 23 hours ago [-]
We have some dude at work who runs their own agent that makes constant commits. We're supposed to review the agent's output.
checkaiclaims 22 hours ago [-]
Exactly. The bottleneck becomes human review, not code generation. Agents can generate commits faster than humans can verify whether those commits should exist.
BearOso 1 days ago [-]
I've noticed as well. A lot of pull requests are just agents running constantly, hoping to have produced something of value. Entropy is at an all-time high, though.
22 hours ago [-]
checkaiclaims 22 hours ago [-]
The bottleneck becomes human review, not code generation. A PR can look plausible and still add more entropy than value.
marcosdumay 1 days ago [-]
There's a paywall, but it's an interesting question how much of the recent explosion of the AI companies revenues is because of the explosion in prices, and how much their customers will accept the increased prices.
sbochins 21 hours ago [-]
This phase that all these companies went through doesn’t seem that bad. Before, these places had a big problem where all their employees didn’t understand how to use AI for their work. Now they’ve overspent and tokenmaxxed and haven’t seen much from it. The next phase is to set the goalpost lower and set quotas based on who uses AI more effectively. Eventually the folks that use it well and are productive will bring in ROI. Then you can fire all the folks that aren’t using it effectively and replace them with people that know how to use it. We’re already starting to see this.
dangus 1 days ago [-]
In comments on other threads on this subject, I’ve seen the general idea that these article headlines are overstating the pullback from AI.
In other words, the news cycle is looking for an AI story that lands with readers, and the examples of Uber blowing through its AI budget and Microsoft discontinuing use of Claude internally are not good indicators.
I agree that those aren’t good indicators.
However, at some point we have to remember that CEOs and boards of directors are just regular morons who read the news the same way everyone else does.
At some point, if a lot of corporate leaders associate AI with mediocre results, high costs, and public backlash, they might just start saying “this juice isn’t worth the squeeze.”
It will be interesting to see Anthropic’s “revenue bubble” pop as this happens. At least it should hopefully free up some capacity.
yowo 15 hours ago [-]
- Global economy on the verge of depression.
- ChatGPT drops, AI is perfect to be our savior.
- AI glorified as the great messiah.
- Everyone worships stocks even remotely related to AI.
- Execs desperate for relevance boast about tokenmaxxing.
- SHTF
- burst
- last year flagship GPUs and DRAM are sold used for the price of a burger.
- Laidoff people start using local AI as hardware price drops to make actual useful stuff
- New round of bootstrapped tech bros that eventually give birth to the new metaverse/NFT/etc.. hype.
486sx33 6 hours ago [-]
[dead]
glass1122 1 days ago [-]
[dead]
ath3nd 1 days ago [-]
[dead]
feverzsj 1 days ago [-]
LLM doesn't work, let alone profit.
ninkendo 1 days ago [-]
Yesterday I updated our dependency on the sqlx crate and put up a PR, and it failed in the CI build in a way I couldn’t reproduce locally.
I asked codex to take a look, and it:
- Grabbed the CI logs on its own to figure out what the CI error was
- Looked at my local setup
- Looked at the changes in sqlx from 0.8 to 0.9
And figured out that sqlx depends on an updated version of the “whoami” crate but doesn’t specify default features, which causes it to fall back on a stub implementation that makes the default user “anonymous”, which was failing to authenticate to the UNIX socket we use in our CI Postgres server. It patched the environment variable for our docker container to explicitly specify a username and the issue was fixed.
It would’ve taken me probably several hours to figure this out on my own. It took codex maybe 5 minutes.
Tell me again how LLM’s “don’t work”?
orwin 1 days ago [-]
I agree with your point in the broad sense, but the example might be bad. If sqlx is an important crate, and not stable yet, upgrading it without reading the changelog is honestly a flaw in your team process. Using the AI to fix organisational issues is typically one of the reasons I'm very skeptical of AI improving productivity in the long run.
I'm not taking a shot, to be clear, we had a similar issue a few years ago and we made sure this wouldn't happen again, that's absolutely not a shot, nor do I think it's a character flaw to use AI, au contraire, this is a very good use. I'm just worried that because AI is so good at fixing minor issues caused by governance/organisation flaws, we will be stuck using it to fix those and be trapped in mediocrity (that's not an issue for me, mediocrity is where I work best, but I'm a bit sad for the great Devs I've worked with.)
ninkendo 1 days ago [-]
> If sqlx is an important crate, and not stable yet, upgrading it without reading the changelog is honestly a flaw in your team process
It’s not in the changelog though, this is an update of a transitive dependency that inadvertently changed the default behavior. sqlx didn’t document this because they didn’t even know it changed.
Even if it was a documented change, our process caught it because it was caught by CI. The issue itself was only a result of how our CI was configured (we had a database url with a domain socket path that didn’t explicitly specify a username, and we inadvertently relied on the default of “the current user”, which the whoami crate now defaults to “anonymous”.) I don’t see an issue in our “team process” (whatever that means) at all.
pocksuppet 1 days ago [-]
You used it in a way where the result was simple and you could verify its correctness. You used it as a super-search tool, it's good at that. It's a different use case than having it generate a lot of code from scratch.
janussunaj 1 days ago [-]
Exactly. If people understood that this is super-search and super-autocomplete, we'd maybe find a real net-positive use for the tech. But I think the conversational tone will keep fooling us, especially since the LLM providers have heavily invested in that direction.
gamblor956 14 hours ago [-]
It sounds like the real problem was that the programmer was not familiar with the tools they were using and decided to dig themselves out of a hole of their own making by turning to AI instead of learning to use their tools better.
r_lee 1 days ago [-]
elaborate please, how does it not work?
bigstrat2003 16 hours ago [-]
They cannot be trusted to produce output that works (let alone works well) because they are just statistical models, without any actual understanding of what they produce. That means that you have to carefully review every single line of code they produce, because you don't know where the hallucinations will be. But by the time you do that, you have saved no time at all (indeed, in my experience you lose time), because typing the code was never the part that took time. It was understanding the problem. So if you use an LLM, you spend a bunch of money for zero gain in productivity, or you sacrifice quality and pray there aren't nasty bugs lurking. I certainly think it's fair to call that state of affairs "it doesn't work".
What he says he's consistently hearing from them mirrors what I saw at my own employer: they thought they had ROI metrics, but they actually only had usage metrics such as "lines of code committed" or "number of pull requests". The only way those could possibly work as an ROI measure is if your business charges customers by the line of code.
Which reflects another thing I’ve seen at work. A lot of what AI coding has enabled is diving headfirst into quagmires. Our costs have spiked - not just because of the token spend, also because we gotta pay the cloud platform to run all these new services, operators to operate them, marketers to market them, etc. - but revenue hasn’t budged.
Eventually I got tired of it and got back to work.
As a leader, pushing for rapid change cannot really be nuanced lest the push dissipates into the organization's entropy.
It's irrational to push for tokenmaxxing (literally "please increase our AI spending") and not expect that this is the result you are going to get. You won't get productivity increase, since that is not what you are pushing for - you will get token usage maximization (engineers running inane agentic tasks against your code base to increase usage, using company paid AI for their side projects, etc, etc).
Perhaps that is what they were trying to do, but the reality is that all they will have got is a large token bill. The decision makers may have hoped that tokens would be used in the most productive fashion possible so they could evaluate if the cost was worth it, but what they will have actually got is what they asked for and measured: high token usage (applied to whatever people needed to do to get their usage stats up, regardless of productivity).
The other business-as-usual factor is that there will be false reporting up the chain, so if the company understands the CEO wants to see high AI usage and productivity gains, then s/he will see high AI usage (a large token bill) and will be fed success reports of corresponding productivity gains.
In a typical corporate environment, if all your peers are reporting success, achieving what the CEO wants, do you want to be the only one reporting failure? So - everyone reports high AI usage (easy for the employees to make happen), and most everyone also reports productivity gains if they understand this is the expectation.
If you want to report productivity gains or cost savings from some initiative (increased AI usage or whatever) and need some stats to point to, then you just point to whatever is working, for whatever reason, and attribute the success to the new initiative.
In a company I used to work for, one manager, when pushed to increase machine learning usage (a few years back, before ML became AI), just renamed his product from foo to foo-ML (with ZERO ML usage), and reported how well it is working. He has since been promoted twice.
Do these measurements have sufficient information? As much as any, I'd guess. It sounds like you already know that it's pretty hard in general to measure the productive output of software development organizations.
Don’t play their game and call them leaders. They are management, bosses, executives.
> They are making decisions about things that they just don't understand. And are completely unworried about it.
Clowns, even.
> Just blindly following whatever the news cycle is about AI.
But followers might be most apt.
——
This is such a huge pet peeve of mine. Describing management goofs using their language that makes them sound all-so-brilliant. We constantly watch these people do the dumbest shit and then they go around describing themselves as “thought leaders” and “servant leaders”. When, really, most are just clowns with fragile egos.
And, while I’m rambling, they’ve tried to take away the fact we are workers by calling us individual contributors. Using language to attempt to hide the hierarchy and power dynamic at play. It just…bothers me so much.
And many of them still claim they are "risk takers", but have effectively insulated themselves from risk by socializing losses.
You're falling into a common trap here: the ambiguity of the English language.
"Leader" means multiple different things. Yes, it means someone who has leadership qualities—who genuinely inspires those around them to do better, or who boldly marches into the unknown and gets people to follow them.
But it also means "someone in charge of a thing."
Now it's certainly true that many people in charge of things who are also really bad at actually inspiring or getting people to follow them (aside from with threats of destitution) also play on that ambiguity to try to convince people that because they're in charge of things, they must also be Good Leaders, and that's crappy...but yelling at others for using the term casually is very much an "old man yells at cloud" situation.
Why be a normal guy that waits to see what happens and is measured and pragmatic when you can get attention basically through the whole cycle by being the earliest adopter, adopt it to the maxx, then also be the loudest big brain when the tide changes and be praised for "taking hard decisions" when you revert everything you said so far?
The fakemaxxing economy.
Understanding this was one of the most important things in my career.
It would only be laughable if they waited way too long to reverse course, but I don't think that's the case.
Over/undershoots and corrections are of course unavoidable and normal; the absurdity is at the magnitude and rate of change. Furthermore, this is giving it the benefit of the doubt, that measuring AI spend is a good indicator; that's arguably also in dispute. To stretch my car analogy a bit more: it would be like the cruise control system has to hit the target speed, but it only has data from the O2 sensors.
[0] I know that the "classic" cruise control system cannot apply the brakes, but hey no analogy's perfect.
How much that makes it into enterprise pricing is TBD, since none of the hyperscalers are making money yet off selling AI inference.
Almost all businesses are ahead of the gun. For most of their use cases, AI is either not yet good enough on its own, or good enough but too expensive.
No one wants to get left behind, so everyone's trying to get onto it now, even though it's not ready for what most enterprises want to do with it.
It's easy for them to look at a small startup without billions of lines of legacy business logic debt, see them having success, and wonder why they can't have just as much, or more. They're bigger, so they should have better and more success, right???
Wrong...
But when it gets ~99% cheaper for local inference over the next 4 years, at the same time as the price per watt improves 4x, a lot of those cases will start to pencil out.
The Chinese, since they lack computing hardware due to US export controls, are.
So I just started trying CodeWhale (https://github.com/Hmbown/CodeWhale) with DeepSeek V4. I expected to be impressed by the abilities (which still require plenty of oversight). I didn't expect to be completely shocked by how cheap it is. After most of a week of using it 4-8 hours a day, which would amount to a full week of coding in many jobs after you account for non-coding activities, I'm about to hit $3 in total usage. So we're talking $10-20 per month for single-agent use by a full time software developer? And I'm sure some of my usage is waste as I'm still getting my head around things like compaction. If I take a break for a few weeks, I pay nothing because there is no subscription.
If DeepSeek and Xiaomi MiMo stay within a few months of the US-based models in terms of capabilities and US companies don't figure out how to drastically cut prices, I can't see how China hasn't already won. Protectionism would be one reason, but that might be ceding 50-90% of the total addressable market, and bring us closer to moving knowledge work out of the US the same way we did with manufacturing because it's too expensive in the US.
The first big task was to find the common bits and abstract them out. It did a great job of creating a plan, summarized in a table, that gave a name to shared chunks, the line numbers in various files where they appeared, line counts of new functions vs. removed bits, and some pros/cons about splitting out each chunk. It was very well "thought out", so I told it to go ahead. It did a nice job other than straying from my coding conventions. That gave me a chance to build out my AGENTS.md file (it helped with that, too).
Once that was done, I had it create automated tests for the newly abstracted parts. I think this is probably a bad practice... I believe humans should at least define what the tests are testing so that there is a deeper understanding of what oversight is in place. But I was just trying things. It surprised me how well it did. The biggest surprise was that the tests seemed quite inspired by vision. It would try different parameters and then have comments about making sure the shape protruded in a certain way, then code that did that. I expected it to refactor a bunch of the code to make it more testable. It found a way to not touch the code while testing everything I asked it to with just two simple mocks - I hadn't foreseen that, but it felt quite practical. It was passing around several opaque tuples in the tests and accessing items in them by index. I prompted it to replace the first one with a frozen, kw-only dataclass. Then a second. On the second request, it saw the pattern and did the rest without me asking. It created 44 tests across a handful of files.
The next part is where I was the least happy. I use ruff and ty to check my code with almost all checks enabled. It was mostly good about the ruff issues. But for the type checking, it just wanted to disable 6-8 rules for the entire repo in pyproject.toml, or at least for all the tests. I had to repeatedly tell it not to and it kept telling me it wasn't recommended. When it finally gave in, it fixed most of the type issues (build123d has lots of types specified, but many operations result in type conflicts because things are so deeply overloaded). The things it didn't fix, it just left a comment to ignore type checking altogether on that line. After I did a little more brow beating, it finally changed the comments to only disable specific rules. To be fair, and unlike most of my other repos, I've had to spend way too much time getting types right in this repo myself.
My last task involved a small library management system for our little town library (tracking library cards, books, DVDs, check-outs/check-ins, etc.). I inherited it from someone who had built the entire web app out of bash/awk/troff scripts with the data in text files burdened by a lot of schema changes that he didn't really know how to deal with. I'm halfway through moving it to Python/FastAPI/SQLite. I asked it to do a security audit of the entire code base, both the newer parts and the old parts that are still in bash/awk/troff. It found everything I knew about and a few things I didn't know about. It made a decent assessment of the risks/impact of each issue. It also called out design decisions that were good security practices. One of the next big tasks will be to see how it does at continuing the migration - it has enough examples of how I've done it that I suspect it can do something fairly consistent with my thinking. I'll probably have it do one or two web pages. When I feel like it understands what I'm after, I'll tell it to use sub-agents to do the rest. I'll be very happy if I don't have to tease apart any more troff scripts that are generating PDF files!
I doubt it is really any different to what the US labs do [1]. I never really bought the "they were basically all just distilling from us" shtick from Anthropic, I just assumed they were either comparing or also creating training data as basically any lab is doing.
[1]: https://www.reddit.com/r/ClaudeCode/comments/1tqaist/opus_48...
Do you mean the marginal cost by the producer, or the cost on the consumer? I can't see the price of electricity falling much, and the demand curve is apparently exponential if the hype is to be believed.
Computing has always been about how to wring out more efficiency. The ENIAC was 150,000 watts, with 3 phase 240 volt power, and cost about $500,000.
My day to day laptop (a year old) is 35 watts, with 1 phase 20 volt power, and cost $1,000, so that's 99.98% less power consumption, 99.8% cheaper, and it has about 10 orders of magnitude more computing power, all on a time span of 80 years.
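The percentages above can be checked with a few lines. This sketch uses only the wattage and price figures quoted in the comment; the helper name `percent_reduction` is just for illustration.

```python
def percent_reduction(old, new):
    """Return how much smaller `new` is than `old`, as a percentage."""
    return (1 - new / old) * 100


# Figures as quoted in the comment: ENIAC vs. a year-old laptop.
power = percent_reduction(150_000, 35)   # watts
cost = percent_reduction(500_000, 1_000)  # dollars, not inflation-adjusted

print(f"power: {power:.2f}% lower, cost: {cost:.1f}% lower")
```

Note the dollar comparison is nominal; adjusting the ENIAC price for inflation would make the drop larger still.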
There's always a chance we'll have some dramatic gains far larger than DeepSeek's optimizations a year ago, but it hasn't happened again yet at even that scale. It would be nice but I certainly wouldn't count on it.
And the technology already exists on the algorithmic front TODAY to lock in another 10x gain, when, typically, algorithmic gains only account for ~30% of that drop and the other ~70% comes from better data (often synthetic) and knowledge distillation from frontier models.
Just look at DeepSeek's pricing...
Historically, every 18 months the price for the same level of quality has gone down ~90%.
See: https://www.reddit.com/r/LocalLLaMA/comments/1gpr2p4/llms_co...
And Chart 13 here: https://www.rdworldonline.com/ais-great-compression-20-chart...
And here: https://epoch.ai/data-insights/llm-inference-price-trends
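To see what that trend implies if it simply continued, here is a back-of-the-envelope sketch. It assumes the 90%-per-18-months figure claimed above holds unchanged for the whole period, which is exactly the point in dispute; the function name is illustrative.

```python
def remaining_fraction(drop_per_period, period_months, horizon_months):
    """Fraction of today's price left after compounding a fixed drop."""
    periods = horizon_months / period_months
    return (1 - drop_per_period) ** periods


# Assumption: a 90% drop every 18 months, sustained for 4 years (48 months).
frac = remaining_fraction(0.90, 18, 48)
print(f"price after 4 years: {frac:.4%} of today's")
print(f"i.e. {(1 - frac) * 100:.2f}% cheaper")
```

Under that assumption the "~99% cheaper in 4 years" figure follows directly, so the disagreement is over whether the trend continues, not over the arithmetic.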
Historically, algorithmic gains are only ~30% of the pie, but there's enough out there to get to 10x, with just what's available already. The other ~70% of the pie is better training data (often synthetic) and distilling frontier knowledge. There's no sign we are tapped out on that front.
Additionally, GRAM (from ~10 days ago) is likely to be a 5-10x on its own (if not substantially more for smaller models). It's unlikely within 4 years LeCun's JEPA ideas and similar ideas like GRAM applied to LLMs have ZERO impact. The preliminary results are absolutely astounding (5000x better reasoning - this is not peanuts).
Further, that's not even counting that cost per watt is still dropping ~2x every 2 years on its own on the hardware front.
Look at the "cost" of inference: people think it's electricity, but it's currently ~80% hardware amortization. The memory shortage is not going to last, nor are Nvidia's ~80-90% margins.
The human brain is still 8-10 orders of magnitude more efficient than the best LLMs of today. With ~1/10th of global capex riding on AI, if you don't think they're going to knock off 2 orders of magnitude more, when it's this obvious and easy... I don't know what to tell you...
Sure, it might take 6 years instead of 4. My crystal ball isn't perfect.
I think what will also happen, once we get past this current CEO AI FOMO mania, is that companies will start to look at AI spending more rationally like any other company expense, and will revert to more rational decision making.
Even if the cost comes down considerably over the next few years, that's plenty of time for companies to look at their financial results and question why AI expenditure isn't resulting in increase in revenue and/or profitability.
See https://arxiv.org/abs/2604.04364.
This won't really show up in benchmarks, but it will impact real world usage on the most common use cases.
I'm doing a study right now on the impacts of better context for small models to fix bugs.
A very dumb algorithm can make small models perform like models 10x+ their size. I'll be surprised if it can't get to 20x+.
Thank you for sharing this and for having the intellectual courage to hold to a sound reasoning that may be unpopular initially.
And fusion power is just 2 decades into the future!
We have little visibility into current frontier model costs at mass scale. As a broad historical trend, tech costs tend to fall over longer time periods but your claim far exceeds Moore's Law rates in its heyday - and that heyday is long gone.
In 2021 TSMC announced it was increasing its price per gate for new nodes for the first time in its history. In the past five years cutting edge nodes have delivered ~8-15% real-world performance gains on average at costs at least 10-20% more than the last node. If you're positing a string of unprecedented efficiency breakthroughs in LLM algorithms, such extraordinary claims require extraordinary evidence.
People are willing to pay more for BETTER quality.
You obviously haven't seen DeepSeek v4 Pro's pricing if you think pricing only goes up...
Let's look at GPU prices as an example. Around 12 years ago, I bought a GTX 970 for around $350. That was considered a very good GPU at the time. Today, the "equivalent" GPU model (RTX 5070) now costs almost double. Of course, the newer GPU is much more powerful (more than double, in fact), but all the things you'd use a GPU for have also advanced and now expect an entirely new level of performance as a baseline, such that the older GPU is fairly worthless today. So most people agree that GPUs in general have become more expensive.
Regarding DeepSeek's price: it's obviously subsidized, and unlikely to match the actual inference cost right now.
Then buy $10 (or $2, if you're cheap, and they take PayPal) of DeepSeek credits.
Whilst you're at it spring for a Claude subscription too and GPT.
Switch models between Qwen, DeepSeek Flash, DeepSeek Pro, and you can meet 99% of your code generation needs.
Hop over to Opus 4.7 (or 4.8, but I haven't really used it yet) and GPT-5.5 when doing very complex architecture/design or troubleshooting something where DeepSeek Pro is getting stuck.
It is ridiculous how cheap this stuff is now. It's affordable at third world prices.
> spring for a Claude subscription too and GPT.
You started with some random pricing then veered off into impractical hand waving. Far above third world prices...unless you count the USA as third world, I guess.
If it was so good I would expect to see 2005-2015 advancements yearly.
Meanwhile China is blowing past the world with real improvements in the real world: solar, EVs, etc. Meanwhile, people keep making their fancy sans serif websites about todo apps, faster than ever before. Useless.
I don’t disagree that AI is overhyped. But I think you are probably looking in the wrong place.
I think most software that is written isn’t really a product, at least not a public product. It’s an in-house tool or a one-off project needed to complete some larger task. People everywhere are always writing small programs that make their life or job just a bit easier (and explains why so many corporate projects are little more than an excel spreadsheet).
And there are a lot of people who have made custom software just for themselves with AI. Not a product, just a tool or project that finally made sense to build.
After $1T in spend, it's still not clear that AIs will beat out a $30k secretary.
Also, your link has nothing to do with computers.
The link talks directly about the disconnect between the supposed productivity benefits of a technology and the measured productivity benefits of it in practice. And provides historical context about why the “obvious” benefits of a computer did not materialize when it was introduced; business and their processes had to be rebuilt around the computer before real gains were seen.
It's a structural deficiency in the way they work that can't just be hand-waved away.
The perennial reality is that automation is inherently inflexible, so there's only so much of it that you can do before you've committed a huge strategic blunder by making your business resistant to change and severely curtailing its ability to cope with situations that don't cleanly fit the mold. So then we need to hack in ways to deal with the exceptions, but, since they're hacked in, they're often painful and time consuming. Sometimes so much so that after the new process stabilizes it turns out to be even more cumbersome and require more manual effort than the system it replaced.
When anyone other than a technologist suggests doing that kind of thing, we call it "bureaucracy", and we hate it. I think maybe what we have trouble seeing is that there's actually a pretty fundamental difference between automating purely technical processes like server deployment, and automating processes that are fundamentally about mediating human interactions.
It doesn't have to and I'm pretty sure it won't.
Very little about the American economy even makes sense for keeping the edge on LLMs beyond a few years. All the things I would think would be required: energy, research, construction capacity, labor costs -- it's pretty hard to deny who's on the upswing these days. China cranking out current generation microchips will be the last nail in the coffin.
I would agree that a lot of companies talking a big talk about using LLMs are failing to actually apply it in a sensible way to their business.
What changes have LLMs (not AI, not machine learning in general; I'm not going to waste time discussing the definition) made in the past 4 years that indicate anything close to the above? Solving a whiteboard math problem?
For example, imagine that you are comparing two documents (let's assume diff doesn't exist). You could ask an AI to compare the differences for you, or you could use AI to write a tool to do it. For whatever reason, people are starting to go with the former, not realizing that now they basically have to pay every time they compare documents.
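To make the "write the tool once" option concrete, here is a minimal sketch using Python's standard `difflib`. The function name and sample text are made up for illustration; the point is only that after the tool exists, each comparison costs zero tokens.

```python
import difflib


def compare_documents(old_text, new_text):
    """Return a unified diff of two documents as a list of lines."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="old", tofile="new", lineterm=""
        )
    )


old = "alpha\nbeta\ngamma"
new = "alpha\nbeta changed\ngamma"
diff = compare_documents(old, new)
print("\n".join(diff))
```

The result is also deterministic, so the same two documents always produce the same diff, which is not something you can assume when asking a model to spot the differences.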
My answer: no, but it was able to help me find the website and social handles for every beat writer for every team, and generate a simple website where I can do a daily skim of teams/players and draw my own conclusions.
LLMs are a tool, not a panacea.
It just requires being willing to think instead of mashing prompts into a keyboard.
Normal people have never gone around automating their work. The most automation they do is dynamic tables on excel sheets.
I obviously know building a tool that can programmatically do something is a better solution, but I think that requires a fundamental shift in how people work. People need to be told by someone "this is how you should be using the AI" but right now they're simply told "use the AI".
I understand and agree with your point though.
In practice, I've found the answer seems to be "not much", because human triage is still required to understand whether the insight is correct and useful. But I'm not sure that was obvious in retrospect.
With this AI is a fallback and not the default. Sounds like large companies have it backwards.
This is why we have business analysts and software developers.
To help identify inefficiencies and to build technical solutions.
Personally I'd consider efficient and economical use of AI to be a key skill for a good developer. Using AI for everything at full throttle would appear to be a crutch. I wonder if this will eventually become a hiring criterion for companies without unlimited budgets (most of them).
We are very close to the point where if Claude and ChatGPT APIs are down, companies cannot function. How is that introduced so quickly into so many critical places without taking that specific fact in consideration? What is the plan for all those companies whose workflows now depend heavily on a remote LLM whenever the services get cut? What if your company account gets banned?
In some ways it is worse than depending on a company for hosting, because even your debugging tools are based on AI. MCP is great to go through datadog, sentry, until your agent or the MCP server are down and you don't know how to look for the issue yourself because you do not actually understand how your systems work.
Contrast that with what happens when Gmail/Gsuite/Outlook365/QuickbooksOnline/etc are down, though.
What you cite here isn't a direct attack on AI but on centralized service provision in general. Unfortunately that battle has been lost for decades, now.
Here we have the opposite: In the land of the one-eyed, the blind are leading.
The blind in this case are all those executives and managers who don't understand much about AI's current potential and limitations, and so far have treated it like a magic button that will solve everything. The one-eyed are rank-and-file employees who maybe sort of know a little more about AI.
Between corporate FOMO and the rapidly decreasing costs of actually running LLMs, I'm interested to see where on the spectrum these two meet.
Only thing I can say AI was useful for, in a corporate environment, was learning a new coding language on the fly. Gives me a baseline to work off of and fix.
But I can learn without it, too. A nice tool, but not a need.
An ironic analogy sort of, once media started hiding behind a paywall, I just stopped reading them rather than paying. Same with LLMs - usable if cheap/free.
90%+ of corporate people are not programmers. 1 programmer can cause the same token damage with a bunch of concurrent agents as a couple thousand Karens in compliance asking a chatbot questions.
It's much easier to deliver incremental AI ROI on the latter even if it's hard to measure/quantify. 1000 tokens might point this compliance person in the right direction on a key problem. Meanwhile 1000 tokens doesn't get you anything useful on coding.
If LLMs are genuinely helpful or even decisive in a military engagement, you can expect any host country to commandeer whatever data centers they need, leaving commercial entities to bid up the prices on the leftover capacity.
Another risk is that data centers are a great target for cyber warfare.
It’s ideal if your business can leverage LLMs when they’re online but continue to operate profitably when they’re offline.
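One way to read that in code is a sketch like the following, where the LLM is an optional enhancement and a deterministic path always exists. Everything here is hypothetical: `classify_ticket`, the keyword rules, and the `llm` callable stand in for whatever your workflow and provider client actually look like.

```python
def classify_ticket(text, llm=None):
    """Route a support ticket, with or without an LLM available.

    `llm` is a hypothetical callable taking the ticket text and returning
    a queue name; pass None when no provider is configured.
    """
    if llm is not None:
        try:
            return llm(text)
        except Exception:
            # Outage, timeout, banned account: deliberately broad, since any
            # failure should fall through to the rule-based path below.
            pass

    # Deterministic fallback: cruder, but the business keeps running.
    lowered = text.lower()
    if "refund" in lowered or "invoice" in lowered:
        return "billing"
    return "general"


def provider_down(_text):
    # Simulates the remote service being unavailable.
    raise ConnectionError("LLM provider unreachable")


print(classify_ticket("Please refund my last invoice", llm=provider_down))
```

The fallback will be less accurate than the model, and that's fine. What matters is that it exists and gets exercised before the outage happens.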
In other words, the news cycle is looking for an AI story that lands with readers, and that the example of Uber blowing through its AI budget and Microsoft discontinuing use of Claude internally are not good indicators.
I agree that those aren’t good indicators.
However, at some point we have to remember that CEOs and boards of directors are just regular morons who read the news the same way everyone else does.
At some point, if a lot of corporate leaders associate AI with mediocre results, high costs, and public backlash, they might just start saying “this juice isn’t worth the squeeze.”
https://news.ycombinator.com/item?id=48268871
https://news.ycombinator.com/item?id=48238896
https://news.ycombinator.com/item?id=48307098
- ChatGPT drops, AI is perfect to be our savior.
- AI glorified as the great messiah.
- Everyone worships stocks even remotely related to AI.
- Execs desperate for relevance boast about tokenmaxxing.
- SHTF
- burst
- Last year's flagship GPUs and DRAM are sold used for the price of a burger.
- Laid-off people start using local AI as hardware prices drop to make actual useful stuff
- New round of bootstrapped tech bros that eventually give birth to the new metaverse/NFT/etc.. hype.
I asked codex to take a look, and it:
- Grabbed the CI logs on its own to figure out what the CI error was
- Looked at my local setup
- Looked at the changes in sqlx from 0.8 to 0.9
And figured out that sqlx depends on an updated version of the “whoami” crate but doesn’t specify default features, which causes it to fall back on a stub implementation that makes the default user “anonymous”, which was failing to authenticate to the UNIX socket we use in our CI Postgres server. It patched the environment variable for our docker container to explicitly specify a username and the issue was fixed.
It would’ve taken me probably several hours to figure this out on my own. It took codex maybe 5 minutes.
Tell me again how LLMs “don’t work”?
I'm not taking a shot, to be clear, we had a similar issue a few years ago and we made sure this wouldn't happen again, that's absolutely not a shot, nor do I think it's a character flaw to use AI, au contraire, this is a very good use. I'm just worried that because AI is so good at fixing minor issues caused by governance/organisation flaws, we will be stuck using it to fix those and be trapped in mediocrity (that's not an issue for me, mediocrity is where I work best, but I'm a bit sad for the great Devs I've worked with.)
It’s not in the changelog though, this is an update of a transitive dependency that inadvertently changed the default behavior. sqlx didn’t document this because they didn’t even know it changed.
Even if it was a documented change, our process caught it because it was caught by CI. The issue itself was only a result of how our CI was configured (we had a database url with a domain socket path that didn’t explicitly specify a username, and we inadvertently relied on the default of “the current user”, which the whoami crate now defaults to “anonymous”.) I don’t see an issue in our “team process” (whatever that means) at all.