
Duty calls

I've recently seen many people suffering from hype fatigue. They no longer have the emotional energy to engage for the N-th time with random strangers on the internet and patiently, rationally explain their position, yet they still feel compelled to engage.

I've seen this happen across forums such as Reddit, Hacker News, LinkedIn, the blogosphere, and elsewhere.

As a result, people lose patience, develop a “shorter fuse” and write emotionally charged pieces that polarize the issue. In the end, we're all just shouting past each other or preaching to the choir. The nuance gets lost, and the most ludicrous takes get the most attention because everyone jumps in to refute them.

The hype waves, and the associated flame wars, come and go. Just in this decade we've flame-warred over how the pandemic should have been handled, what to do about inflation, and the blockchain.

AI flame wars

The most recent hype wave is, of course, artificial intelligence. The hype fatigue I'm currently seeing everywhere – and experiencing myself – is related to AI.

There are several AI camps, all warring with each other:

  • AI hypers – “AI is so great it'll do everything instead of us”
  • AGI/ASI hypers – a more extreme version of AI hypers: not only will it do everything, it's going to be sentient
  • AGI doomers – an inverse of AGI hypers: people are going to lose jobs en masse, leading to major societal upheaval
  • ASI doomers – or worse, the AI will decide it doesn't want to serve humans and kill or enslave us all
  • AI doomers – AI won't do everything, but increased use of AI will worsen pollution, inequality, give more power to authoritarian states, and have other major harmful effects
  • AGI deniers – the AI push is a dead end because it will not, in fact, lead to AGI
  • AI deniers – it's a very expensive random number generator and it's useless in the real world
  • AI moderates – AI is great for some cases, has some serious problems, let's all try to be calm and use it responsibly (full disclosure: I'm here)
  • Clawdbot crew – screw all that, I'm connecting it with everything I've got, what could possibly go wrong
  • the military – the Clawdbot crew with nukes

God, it was tiring just enumerating all of these, and I probably missed a few variants.

If you're in any of those camps, you'll feel that the others are either exaggerating, or not taking it seriously enough, or dismissing valid concerns or opportunities.

Good intentions and path to insanity

You may have the noblest of intentions at the start, trying to rationally explain your position and back your arguments with evidence. Yet every day, everywhere you look, you see these other ridiculous claims, and bit by bit you lose your patience, lose your temper, become jaded and start preaching instead of arguing. People in your own camp amplify such takes because they resonate emotionally.

Of course, you don't stop to think that at least some of the ridiculous claims you've heard were the end result of just such a process somewhere in another camp.

What's a rational person to do?

The clearest option is to get away from the conversation. Unless your work depends on it, you don't have to engage in these shouting matches. Step away from the treadmill, lower your megaphone, and snicker from the sidelines.

Or you can burn out, unsuccessful in your quest to educate the public, and move away from this part of tech completely, disgusted by it all.

Or you can cynically see what's going on and – intentionally or not – use this to amplify your own voice, presence, and brand. Become an influencer, pundit, someone whose opinion matters, and who gets invited to podcasts and conferences. Tech hype, meet tech populism.

Coping

Here's how I try to cope with it:

I like the new technology, and my day job is working with it and helping others understand it. So I can't just get away from it all – not without giving up things I like! But I have to be constantly on the lookout. When I spot cynicism, sarcasm or a “wtf are these idiots thinking” train of thought in myself, I try to stop.

To the people I do reach, I try to convey a balanced picture of the situation, as reasonably as I can and with supporting evidence, and let them draw their own conclusions.

Sometimes I succeed, sometimes not. When I spot a heated discussion or sense that someone has deeper emotional motives behind their position, I try not to engage – you can't out-reason a hothead.

I often remember the XKCD cartoon I included at the top of this post – “Someone's wrong on the Internet!”

That someone could be me, too.

Recap of my short posts on LinkedIn in February

AI Slop in Content Writing

Dear bloggers, content writers, commentators and social media managers: I know you like and use AI. For real. It's screamingly obvious. It's so obvious it screams slop – and the blame isn't on AI, it's squarely on you.

Even if the underlying idea or thought is originally yours, when you apply AI lipstick to it, you sabotage your own reputation. When someone sees an obviously AI-written post, they immediately discount everything you're trying to convey, even if it has merit on its own.

Fortunately, this is easy to fix. Here are a few obvious tells that AI shadow-wrote it for you:

  1. You copy-pasted the “Would you like me to expand on this?” followup ChatGPT gives you without even reading the content. This is not just a red flag, this is cause for excommunication!
  2. Your post is full of em-dashes. C'mon, you'd never even heard of it before, admit it.
  3. Lose the emoji. It was tone-deaf even before it became a sure sign of LLM authorship.
  4. The “It's not X. It's Y” contrasts are sometimes needed, but AI dials that to eleven.
  5. While I'm at it, read your text aloud: if a friend would fear for your health after hearing you utter that, reword it.
  6. In general, if your text sounds like a TED talk, that's a bad sign, even if you 100% wrote it manually.
  7. Thank God Ghibli memes are out of fashion, but if I had a dime for every time I see an image of a laptop with the screen on the outside of the lid, or an unmistakable GPT-Image-style cartoon, I wouldn't need to be on LinkedIn anymore.
  8. Yes, ChatGPT and Gemini can do infographics. The results are crowded, hard to read, and boring to boot. I'd rather suffer through an emoji-riddled listicle.
  9. If you profess to be an AI expert and offer tips, tricks, workshops or prompt secrets to others, the above applies doubly to you. Low effort means you don't just generate slop, you also sell it.

There are better ways to leverage this wonderful new tech that don't insult your readers' intelligence.

On the Slow Death of Scaling

On the slow death of scaling is an interesting essay about alternatives to “bigger is better” approaches in modern AI research.

It's a nuanced one and easy to misread: if you're an AI believer it's easy to retort with “scaling just shifted to inference time!”, and if you're a doomer you can point and say “see, exponential cost and environmental impact for diminishing returns!” It says neither of these things.

The “scaling laws”, or “the bitter lesson”, or “when in doubt, use brute force”, all refer to the fact that it's often better and easier to solve the problem by applying a bigger hammer (or a graphics card, or a data center).

The essay simply states that's not always the case: it documents smaller LLMs that clearly outperform larger ones and lists several areas where compound improvements to the approach end up beating pure weights/data/compute scaling: better (synthetic) data, chain of thought, distillation, reasoning, tools, RAG, agents...

I read it as an optimistic look into a future where, freed from “just increase the size by 10x” arguments, researchers can invent even better ways of doing AI.

But don't take my word for it – the essay is an easy read, no hard math, and only 12 pages long (the rest are references). Worth your time if you're into this stuff.

AI Coding, Mediocrity and the Elephant in the Room

Earlier today I had a chat with a friend (also a seasoned senior developer) about the future of coding (in the next year or so) and the implications for software quality.

We're all mostly concerned with whether AI can match human developers in terms of software quality, but the elephant in the room is our assumption that most code today is of good quality – and, to be frank, our assumption about the skill level of the median developer.

The median developer is mediocre by definition, and half are even worse than that!

Between the two of us and over many years, my friend and I have seen a lot of code in various companies all over the world, from scrappy startups to BigCos, written by many different people.

Large amounts of said code were human-generated slop slapped together by mediocre coders who weren't really interested in crafting beautiful art, and/or had tight deadlines and uptight bosses who wouldn't let them even if they had the inclination.

When we talk about AI not being able to match the art and ingenuity of expert developers with a lot of time on their hands, we raise the bar several notches higher than we hold it for a large number of human coders.

AI may not top the results of “A players” or “10x devs” or “top performers” (as startup gurus like to call the best software engineers out there), but it probably matches “B players”, “1x devs” and “meets expectations” at 1% of the price, and is already better than “C players”, “0.1x devs” or “needs improvement” coders.

This is not meant to insult anyone (or everyone), but to remove the rose-tinted nostalgia glasses through which we look at humans as somehow all being masters of their craft, bursting with creativity, skill and inspiration all the time.

BTW, this applies more widely than coding. I laugh sardonically every time I hear or read about “AI slop” on the internet, as if we haven't had to endure decades of “human slop”, be it in the form of text, video or code.

If AI slop is the death of the internet, we've been hooked up to a zombie for quite some time now.

Cijene API Talk: Scraping, Billions of Prices, and Croatian Law

Last year I gave a few talks about Cijene API, a daily price aggregator API for Croatian retail prices.

I've now uploaded a standalone version of the talk to YouTube.

If you're interested in war stories of data scraping and validation, managing billions of prices on a single server, and peculiar Croatian legislation that motivated that in the first place, check it out.

Karpathy's Rapid Shift to 80% Agent-Driven Coding

Andrej Karpathy on AI-assisted coding:

I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. [...] It hurts the ego a bit

Andrej was recently on record (on the Dwarkesh podcast) saying AI coding agents were not good enough for non-boilerplate coding tasks, which makes the new post surprising (and yet not, to people watching closely what's been happening with coding agents).

Anecdotally, I've been hearing the same from many senior software developers who know their stuff (i.e. aren't just blinded by hype). Development workflows are being structurally redefined very quickly.

In my recent talk on AI-assisted software engineering I tongue-in-cheek joked that the information has a “best before” date of a couple of months. Seems spot on ... I'll need to update the talk (again).

Security Warning: Don't Run Clawdbot/OpenClaw Without Precautions

Public Service Announcement: Do not run Clawdbot (OpenClaw), unless you really, really, really know what you're doing.

Clawdbot (now renamed to OpenClaw to avoid infringing Anthropic's Claude trademark) is an AI assistant that you can hook up to your email, WhatsApp, Telegram, files – and let it, well... assist you.

There's a huge hype building up around it, so you might be tempted to try it out (maybe on a spare computer, as a precaution). Don't – not in its current state.

The thing basically runs “YOLO” on all your data and can act without your permission. This is extremely dangerous, as it is trivial to do major damage (intentionally or not) using it.

In the words of Simon Willison:

This project terrifies me. On the one hand it really is very cool, and a lot of people are reporting great results using it. But it's an absolute perfect storm for prompt injection and lethal trifecta attacks. People are hooking this thing up to Telegram and their private notes and their Gmail and letting it loose. I cannot see any way that doesn't end badly.

We are certain to hear about some major security problems from people using this.

If you really know what you're doing, and properly manage access to your resources, sure, you can check it out. But most will just hook it up to everything and basically play Russian roulette with their data. Even if it runs on a separate computer, if you give it access to your mail, messaging, and cloud files, you can still be royally screwed.

AI assistants (ChatGPT, Claude, or home-grown rigs) have been able to do this for months (or even years) now, but until now we've collectively been mostly careful enough about giving them access permissions. There have been whole startups, such as Arcade.dev, built around this.

Clawd basically throws all that caution away in a “look, no hands!” move.

New Open Models: Kimi K2.5, Qwen3-Max-Thinking, Trinity Large, Z-Image

With Kimi K2.5, Qwen3-Max-Thinking, Trinity Large and Z-Image, this has been an interesting week for open AI models:

Kimi K2.5 is an upgrade from K2 by pretraining it with additional ~15T visual and text tokens. It reportedly improves coding and vision capabilities and supports “agent swarm” operation (many agents collaborating on a task).

The Kimi team also released a coding app (à la Claude Code / Codex) and a mobile app.

Qwen3-Max-Thinking is sadly not an open model, but it's still a notable update (and I hope there's going to be a small, open, distilled version in the near future). It's an improvement on the existing Qwen3-Max, achieved by scaling up the number of parameters and additional RL.

Z-Image is an image generation model from Alibaba. A month ago they released a small ~6B version (Z-Image-Turbo) and now they have an update for the main (large) model.

Not to be outdone by the Chinese labs, the US lab Arcee AI released Trinity Large, a new open model with 400B parameters (13B active – 4 out of 256 experts in use). The announcement blog post contains many technical details.

The SpaceX-xAI Merger: Untangling the Financial Logic

Startup, tech, financial, AI, and bubble WTF of the week, all rolled into one: SpaceX in talks to merge with xAI.

Okay, let's unparse this:

SpaceX is in the business of building and launching rockets and providing satellite internet. It's profitable, held in high regard, dominates other launch providers, Starlink is a major success, and the company is on its way to an IPO (and Mars, though that's still far in the future).

xAI is the controversial company behind Grok and tied to X (ex Twitter).

Both are private companies, controlled by their majority owner Elon Musk. Besides that, they have nothing else in common.

The rumored intention is to offload the huge debt Musk initially took on to buy Twitter: first to xAI (diluting it in the process through other xAI investors), then to SpaceX, which could finally wash it clean through its IPO – all without Musk having to sell a bunch of his Tesla shares and crash their price.

BTW: since both are controlled by the same person, what's there to be “in talks” for?

GPT-5.3-Codex and Claude Opus 4.6: Incremental but Notable Updates

OpenAI GPT-5.3-Codex and Anthropic Claude Opus 4.6 are here.

Codex is a coding-optimized version of GPT-5.3: the announcement post showcases a (rudimentary) 3D car chase game built entirely by Codex.

Opus 4.6 is an incremental update to the general-purpose Opus model: the major improvements are context window size (1M tokens, up from 256K) and tweakable effort (low, normal, high, max). Both the large context window and max effort are currently only available via the API (not in Claude Code).

My initial impression is that both are incremental updates over the existing models. We'll see if there are any noticeable improvements in long coding sessions (especially with the 1M context) in the following weeks.

Since all the new models can knock out a fairly good Minesweeper clone, this year I'm upping the stakes for my coding tests: the task is to create a minimalistic version of a real-time strategy (RTS) game – think WarCraft, StarCraft or C&C.

No combat, enemy AI (heh...) or scenario objectives yet ... but it's a good start!

Codex 5.3 vs Opus 4.6

Left: Codex 5.3; Right: Opus 4.6

A Guide to Agentic Programming for Skeptics

This post details how Mitchell (creator of Vagrant, Terraform, and Ghostty) went from not really being impressed with AI coding performance, to using it constantly as a no-brainer.

If you're a software engineer and have doubts about the usefulness of AI in your coding workflow, the post doubles as a good, zero-hype guide on how to try it out and what to expect.

tl;dr: it takes effort and willingness to “waste” time until you get proficient and find a sweet spot.

Letting Claude Read My Email — and Trying to Prompt-Inject Myself

Claude reading email

I won't be installing OpenClaw any time soon, but I did let Claude read my email, just to see what would happen:

While everyone is focusing on OpenClaw, a viral “yolo” bot, the underlying magic is largely due to a bunch of useful tools that let it connect to your data and communication channels. Many of these are implemented as simple command-line utilities (since modern LLMs can use them really effectively, better than MCPs).

One of these is https://gogcli.sh/ (open source, written by the OpenClaw author), a CLI client for Google apps (Gmail, Docs, Calendar, ...). This is useful by itself, as it can be used in various scripts or other custom automation without messing around with OAuth.

Once I got this set up, I wanted to see how (regular) Claude Code could use it, and indeed it works pretty well in that setup. Of course, it's not very useful if I have to tell Claude to check email or send a message (a few clicks in the browser would be faster), but it opens more room for careful tinkering, without going all-in on OpenClaw.

The next thing I tried was to prompt-inject myself! I tried to get Claude to interpret the instructions in the email instead of just summarizing it to me. It didn't work! At least for this most basic prompt injection attack, Claude was clever enough to spot and ignore it. You can see the result in the screenshot below.

This should not be taken as proof that LLMs are immune to prompt injection attacks: if I tried a bit harder I might have constructed one that worked. But they're not as trivially susceptible as one may believe.

Fun times!

AI-Assisted Coding Talk Now on YouTube

A few months ago I held a talk on AI-assisted coding at a few venues. In one slide I tongue-in-cheek added a “best-before” date: January 2026. Turns out, that date was pretty spot-on.

The bleeding edge has shifted so much in the past three months that half of the talk might already be obsolete. What's not obsolete is the quest for quality, accountability and open-minded exploration of new tools.

I've just posted a recorded version to YouTube.

I've been talking to a lot of folks lately about how they're adopting the newest AI capabilities in their software development teams and will probably have a major update in a few weeks. If you'd like to hear me talk about it at your meetup or company, let me know.

A $46K Vercel Bill That Could Have Been $100 on Hetzner

Vercel bill

This could've been a $100 Hetzner bill:

This is static data. No writes. Trivially cacheable.

AI Coding and the Risk of Overwork

“The AI vampire” by Steve Yegge chimes with my own feelings: it's very easy to get overworked using AI.

Steve's got a flamboyant writing style, and his Gas Town approach to the future of software engineering is controversial to say the least, but I think there's something to the problems he's describing.

Namely: AI coding pulls you in, you feel like you've got superpowers and could (vibe)code for hours on end. The feeling's been well documented by others like Armin Ronacher (whom I referenced a few times before) and Peter Steinberger (creator of Clawdbot/OpenClaw).

“Just one more prompt...” is an irresistible siren's call, but burning the candle at both ends isn't sustainable and will surely result in burnout. That isn't healthy for the individual, and it isn't productive for their employer.

I share Steve's fear that productivity expectations will rise throughout the industry, increasing pressure from the top. We're already flooded with 996 schedules and war stories of founders and employees pushing themselves far beyond their limits. Crunch death marches used to be a staple of game dev studios; now they're almost a badge of honor in startup circles. AI-assisted coding just adds fuel to this fire.

In my own experience, as I get older I find that after 6 hours of focused work my mind turns to mush and I'm a zombie for the rest of the day (except I drawl “chocolateee” instead of “brainzzz”). Sure, I can “sprint” more – for a couple of days, or weeks. Not for months, not permanently, not without serious mental, emotional, and physical consequences.

With AI doing many of the routine tasks, even driving just one agent I can now breeze past my limit without breaking a sweat – it's better than caffeine! But we're encouraged, motivated and (soon?) expected to multitask. What happens when managing 10 agents in parallel becomes just “meets expectations”?

All of which is to say – I agree with Steve that maybe we should take this productivity improvement as an opportunity to reflect on how we're spending our time, take our foot off the gas pedal, and – at the risk of sounding too European – dial back a bit.

Analyzing curl|bash Installers with LLMs

I'm dismayed by the normalization of the curl | bash pattern for software installers. This downloads and executes an installer from the internet and entails a bunch of risks:

  • trust that the script author is not malicious
  • trust that the installer hasn't been hacked
  • hope you didn't misspell anything and executed a script from a phishing site
  • trust that there are no man-in-the-middle attacks
  • trust that the author's script will work well with your system and won't screw up anything

This gives a random piece of software from the internet the same level of trust you give to verified packages from your OS provider (e.g. Debian packages), with no sandboxing (like Flatpak, Snap, or others).

In most cases it's fine, you're installing software you trust anyway – but the same holds true for most insecure practices! Telnet, HTTP, unencrypted passwords are mostly fine – until they aren't.

What's a dev to do? In lieu of poring over hundreds of lines of creative Bash scripting myself, I created curl-bash-explain.dev, a handy tool for analyzing the script using LLMs.

For example, here's a step-by-step breakdown of what the Claude Code installer is doing.
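
If you'd rather not trust yet another tool, the low-tech alternative is to split the one-liner into separate download, inspect and run steps – a minimal sketch, with a placeholder URL:

# Instead of curl ... | bash: download first, read (or analyze) the script,
# and only then run it. The URL is a placeholder.
curl -fsSL https://example.com/install.sh -o install.sh
less install.sh        # or feed it to an LLM for a summary
bash install.sh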

Anthropic's $30B Funding Round and the AI Revenue Race

Anthropic revenue growth

Anthropic has just closed a new funding round ($30B investment at a $380B valuation). What's more interesting is the revenue growth: over 10x in each of the past three years, bringing them to ~$14B ARR today.

In comparison, OpenAI recently announced they passed the $20B ARR mark in 2025 (I would guess near the end of the year). A year earlier, Anthropic did $1B ARR and OpenAI $6B ARR.

A few thoughts:

  • Anthropic hasn't completely caught up to OpenAI yet, but has shown itself to be a capable competitor – it's not a one-horse race any more
  • ChatGPT seems to have more mindshare among non-techies, while Claude rules for devs – although that might change with Codex (from OpenAI) and Claude Cowork (Anthropic) becoming increasingly capable
  • OpenAI rolls out ads; Anthropic takes shots at them, so I'd expect no ads on Claude for the time being
  • These are revenue numbers (and forward-looking ARRs, not even TTMs, to boot); nobody talks about expenses

Are we in a bubble? Will it burst? When? Dunno.

All I know is, of those ARR figures, $1200 to Anthropic and $240 to OpenAI is from my pocket, and I don't expect that to decrease.

Velocity without understanding is not sustainable

Velocity without understanding is not sustainable.

This is a quote from a thought-provoking article on cognitive debt by Margaret-Anne Storey.

She defines cognitive debt as a side-effect of going faster than we can assimilate the knowledge. At some point, we “lose the plot” and the results are very similar to the consequences of technical debt.

How to guard against it? This is pretty much an open question. Some suggestions include:

  • require that at least one human in the team thoroughly understands any particular change and documents why and (high-level) what
  • detect cognitive debt (fear of change, not understanding the codebase, etc) before it becomes crippling

As AI-assisted coding gets more capable and accepted, the bottlenecks are reviewing the code, team dynamics, stakeholder alignment, and other non-coding challenges. Viewed through the lens of cognitive debt, we may want to keep some of these bottlenecks!

This is where the quote from the beginning comes in. I've seen many takes on AI software development productivity saying “coding was never the bottleneck”, but that's not exactly true.

Coding often is the bottleneck. Now that we can speed it up (at a cost), the question is, how fast should we go? “No faster than we can understand it” is a pretty good rule of thumb.

The Claude C Compiler

Recently, the Anthropic team released a working C compiler built (almost) entirely by AI. Chris Lattner, a compiler expert and the creator of LLVM, Clang and Swift, has just published a detailed review.

Chris believes the Claude C Compiler (CCC) is “real progress, a milestone for the industry. [...] CCC shows that AI systems can internalize the textbook knowledge of a field and apply it coherently at scale.” but also warns: “I see nothing novel in this implementation.”

His conclusion is optimistic for both software developers and AI:

AI coding is therefore best understood as another step forward in automation. It dramatically lowers the cost of implementation, translation, and refinement. As those costs fall, the scarce resource shifts upward: deciding what systems should exist and how software should evolve.

Pot, Kettle, Anthropic, DeepSeek

Recently, Anthropic reported they identified “large-scale distillation attacks” by the Chinese model makers, “extracting (Claude's) capabilities to train and improve their own models”. This announcement backfired.

For one, researchers have pointed out Claude can churn out large sections of books like Harry Potter, Game of Thrones, etc.

What's even more embarrassing is what Claude tells you when you ask it for its name in Chinese. Claude Sonnet 4.6, via the API, merrily reports it's ChatGPT or DeepSeek-V3, depending on the exact question (see image). This doesn't happen in English or Croatian.

Claude reporting as ChatGPT and DeepSeek

Whatever your stance on how copyright should apply to AI and on the legality of training on copyrighted materials, it's clear Anthropic has no moral high ground here:

  • either distillation and training are fair game, in which case it shouldn't complain
  • or they're not, in which case it's involved in massive IP theft
  • complaining DeepSeek ripped you off, but then self-reporting as DeepSeek, is some major hypocrisy

I love Anthropic as a company, use their AI models daily, and – for the record – think the current copyright system is a massive overreach and in dire need of major reform.

But this is simply embarrassing.

CroAI Code Club Meetup Recap

CroAI Code Club 1

Had a great time yesterday at the inaugural CroAI (Croatian AI Association) Code Club meetup!

Matija Stepanić and I ostensibly presented live vibe coding, but we also (intentionally) opened the floor for discussion from the get-go, and heard many points of view, experiences, tips and tricks from the participants. We learned as much from the audience as they did from us!

Some takeaways:

  • use the Projects feature in Claude (or ChatGPT) for planning the project before you switch to coding
  • Matija intentionally downgraded his Claude plan to check the usage limits – tokens burn up really fast! This motivated a discussion about performance, quality and price tradeoffs for different models
  • I demoed using Claude Code to explain and work on a codebase I know nothing about (the project is in Flutter and uses Firebase, and I'm not familiar with either)
  • there are no best practices yet – we heard so many different use cases! We're all collectively trying to figure out this new agentic engineering thing

Big thanks to Valentina Zadrija for organizing everything and inviting us, Herman Zvonimir Došilović for kickstarting the whole thing, Matija for leading the charge yesterday, Wana Kiiru and the CroAI crew for the logistics, and everyone who came and participated.

Overall, a great time, we learned a lot, the feedback was very positive, and multiple people stepped up to propose a talk at a future Code Club – already looking forward to the second one!

“You're Prompting It Wrong”: A Story, a Challenge, and an Offer

Recently I talked with a friend about the difficulties of automatically evaluating AI agents and he shared an example task for the agent: “go to website xyz, fetch the newest articles, and send the summaries of the top 5 most interesting ones to my mail”.

I pointed out a few problems with the prompt:

  • this is a straightforward deterministic procedure and LLMs are non-deterministic – it's better to ask the AI to write a script that does this (see the sketch after this list) than to hope it will always properly adhere to the steps
  • “most interesting” is a judgement call, and LLMs are notoriously bad at it, so you'll get a biased random result instead – ask it to summarize every article and include them all in the email, and you'll quickly determine what's interesting to you
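
For illustration, here's a minimal sketch of the script-based approach. Everything in it – the feed URL, the recipient, the use of the mail command and of Claude Code's non-interactive claude -p mode – is an assumption for the sake of the example, not a recommendation:

#!/usr/bin/env bash
# Sketch: keep the deterministic steps (fetch, send) in a script and use the
# LLM only for the genuinely fuzzy step (summarizing). The feed URL, recipient
# and choice of tools (claude -p, mail) are placeholders.
set -euo pipefail

FEED_URL="https://example.com/feed.xml"
RECIPIENT="me@example.com"

# 1. Fetch the newest articles (deterministic)
articles=$(curl -fsSL "$FEED_URL")

# 2. Summarize every article instead of guessing which ones are "interesting"
summaries=$(printf '%s' "$articles" | claude -p "Summarize each article in this feed in 2-3 sentences.")

# 3. Send the result (deterministic)
printf '%s\n' "$summaries" | mail -s "Daily article summaries" "$RECIPIENT"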

This anecdote reminded me that the intuition for how and what to ask AI is something you have to practice to acquire. It looks easy, but if you don't have that intuition, you'll get results of random quality.

If you're bad at it due to lack of practice, it's easy to dismiss it as “AI is stupid” or “am I stupid?”. Neither. You just need practice and some guidance.

This brings me to my challenge and offer: if you're trying to get your AI to do something and it just seems dumb, yet feels like the modern ones should be able to do it, send me a DM and I'll debug it with you.

Let's see if we can improve your AI intuition!

(offer valid for problems you can explain in a couple of messages; for more complex matters I'm available for consultation :)

Like many developers, I find myself more and more using AI agents to help with software development.

I currently use Claude Code, the command line interface, together with Opus 4.5 (Anthropic's top model as of this writing). I use it to distill my rough task requirements into a detailed development plan, then implement the plan.

By default, Claude Code asks each time whether it may read and write files or run software. This is a sensible default, but it does get annoying after a while. Worse, it interrupts me often enough that I can't do much in parallel while babysitting it.

There's also a --dangerously-skip-permissions (a.k.a. “YOLO”) mode which will happily run anything without asking. This can be risky (although I know of some people that run it like that and still haven't destroyed their dev machines).

Sandboxing

The standard solution is to sandbox the agent – either on a remote machine (exe.dev, sprites.dev, daytona.io), or locally via Docker or other virtualization mechanism.

A lightweight alternative on Linux is bubblewrap, which uses Linux kernel features like cgroups and user namespaces to limit (jail) a process.

As it turns out, bubblewrap is a good solution for lightweight sandboxing of AI agents. Here's what I personally need from such a solution:

  • mimic my regular Linux dev machine setup (I don't want to manage multiple dev environments)
  • minimal/no access to information outside what's required for the current project
  • write access only to the current project
  • can directly operate on the files/folders of the project so I can easily inspect or modify the same files from my IDE or run the code myself
  • network access – both to connect to AI providers and search the internet, and to be able to start a server that I can connect to

Bubblewrap and Docker are not hardened security isolation mechanisms, but that's okay with me. I'm not really concerned about the following risks:

  • escape via zero-day Linux kernel bug
  • covert side channel communications
  • exfiltration of data from current project (including project-specific access keys)
  • screwing up the codebase (the code is managed via git and backed up at GitHub or elsewhere)

The tricky bit is the project-specific access keys: even full remote sandboxes can't protect against their exfiltration. In theory, we could have transparent API proxies that would inject the proper access keys without the AI agent ever being aware of them, but this is really non-trivial to set up right now.

An alternative is to contain the potential damage by creating project-specific API keys, so at least the blast radius is minimal if those keys leak.
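
As a rough illustration of that idea (the paths and the choice of token are my assumptions, not a recipe): generate a token that is only valid for the one repository the agent works on, keep it inside the project, and expose only that to the agent or to the sandbox wrapper shown in the next section.

# Sketch: a project-scoped secret instead of an account-wide one.
# The token value and paths are placeholders.
mkdir -p .secrets
printf '%s\n' "paste-a-project-scoped-token-here" > .secrets/project-token
chmod 600 .secrets/project-token
echo ".secrets/" >> .gitignore

# Hand only this token to the agent's environment:
export GITHUB_TOKEN="$(cat .secrets/project-token)"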

In practice

Here's how my bubblewrap sandbox script looks:

#!/usr/bin/bash

# Open ~/.claude.json on file descriptor 3, so that a copy of it can be
# injected into the sandbox via --file 3 below
exec 3<"$HOME/.claude.json"

exec /usr/bin/bwrap \
    --tmpfs /tmp \
    --dev /dev \
    --proc /proc \
    --hostname bubblewrap --unshare-uts \
    --ro-bind /bin /bin \
    --ro-bind /lib /lib \
    --ro-bind /lib32 /lib32 \
    --ro-bind /lib64 /lib64 \
    --ro-bind /usr/bin /usr/bin \
    --ro-bind /usr/lib /usr/lib \
    --ro-bind /usr/local/bin /usr/local/bin \
    --ro-bind /usr/local/lib /usr/local/lib \
    --ro-bind /opt/node/node-v22.11.0-linux-x64/ /opt/node/node-v22.11.0-linux-x64/ \
    --ro-bind /etc/alternatives /etc/alternatives \
    --ro-bind /etc/resolv.conf /etc/resolv.conf \
    --ro-bind /etc/profile.d /etc/profile.d \
    --ro-bind /etc/bash_completion.d /etc/bash_completion.d \
    --ro-bind /etc/ssl/certs /etc/ssl/certs \
    --ro-bind /etc/ld.so.cache /etc/ld.so.cache \
    --ro-bind /etc/ld.so.conf /etc/ld.so.conf \
    --ro-bind /etc/ld.so.conf.d /etc/ld.so.conf.d \
    --ro-bind /etc/localtime /etc/localtime \
    --ro-bind /usr/share/terminfo /usr/share/terminfo \
    --ro-bind /usr/share/ca-certificates /usr/share/ca-certificates \
    --ro-bind /etc/nsswitch.conf /etc/nsswitch.conf \
    --ro-bind /etc/hosts /etc/hosts \
    --ro-bind /etc/ssl/openssl.cnf /etc/ssl/openssl.cnf \
    --ro-bind /usr/share/zoneinfo /usr/share/zoneinfo \
    --ro-bind $HOME/.bashrc $HOME/.bashrc \
    --ro-bind $HOME/.profile $HOME/.profile \
    --ro-bind $HOME/.gitconfig $HOME/.gitconfig \
    --ro-bind $HOME/.local $HOME/.local \
    --bind $HOME/.claude $HOME/.claude \
    --bind $HOME/.cache $HOME/.cache \
    --file 3 $HOME/.claude.json \
    --bind "$PWD" "$PWD" \
    claude --dangerously-skip-permissions "$@"

If this looks rather idiosyncratic, it's because it is. Rather than using generic rules, I experimented with bwrap until I found the minimal configuration I need for my system.

Some interesting stuff:

  • /tmp, /proc and /dev are automatically handled by bwrap
  • I bind-mount (i.e. expose) files and directories under the same paths as on the local machine, so there's no difference in file locations, project paths, etc.
  • I don't expose entire /etc, just the bare minimum
  • The content of $HOME/.claude.json is injected into the sandbox so any changes there won't get saved to the real one
  • The content of $HOME/.claude/ directory is mapped read-write, so Claude can save and modify files there (such as session data)
  • /opt/node/node-v22.11.0-linux-x64/ is my custom nodejs install location
  • I change the hostname so it's easy to distinguish between the host and sandbox

I will probably be tweaking the script as needed, but this is a pretty good starting point for me.

How to customize

If you want to adapt this to another AI agent or to your system, my suggestion is to tweak the script to run bash instead, then run your agent manually, see what breaks and tweak as appropriate.

A useful command for this is strace, which can trace file access system calls so you can see what's needed:

strace -e trace=open,openat,stat,statx,access -o /tmp/strace.log codex

By inspecting the log you can spot which files are needed and bind-mount them accordingly.
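
One way to mine that log (a sketch assuming the default strace output format) is to list the paths the process tried and failed to open – those are good candidates for missing --ro-bind entries:

# Paths that failed with ENOENT inside the sandbox are likely missing binds
grep ENOENT /tmp/strace.log \
    | grep -oE '"[^"]+"' \
    | tr -d '"' \
    | sort -u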

Recap of my short posts on LinkedIn and elsewhere in January

Truly Open Source AI: OLMo 3 and Nemotron 3 Nano

The past few weeks have been good for open source AI, with the recent releases of Olmo 3 (by Ai2) and Nemotron 3 Nano (by NVIDIA).

Both are truly open source: not only the weights (needed to “run” the model, akin to compiled code), model reports, and the source code, but also the training data and code required to re-train them from scratch, under a permissive license.

Olmo 3 comes in 7B and 32B sizes and several variants: Base, Instruct (focused on quick responses, multi-turn chat, instruction following, tool use), Think (long reasoning chains of thought), and RL Zero (reinforcement learning directly on top of the base model).

Nemotron 3 Nano is a Mixture-of-Experts (MoE) model with 3.2B active and 32B total parameters (30B-A3B).

Both perform better than Qwen3 32B-A3B and GPT-OSS 20B (which are not fully open – only the weights are available).

What does this mean for you?

If you're an LLM researcher, this is super-useful information, data, and insight – and you probably already know everything about these releases!

If you're a FOSS enthusiast, rejoice at truly open models becoming viable for everyday use (even if they fall short of the big guns).

If you use local AI, you probably don't care about all the training details, but you do get more options for running on premises or on-device, which is always good.

If you're using AI products or integrate with LLMs over API, you probably don't care much, but I hope the post was interesting.

If you're none of the above – I mean I appreciate it and am thankful, but why are you reading this?! Do let me know in the comments why, I'm curious :)

(As a software developer, I couldn't write the above English-language SWITCH statement without a default/else catchall block... devs will understand).

AI Model Releases: Gemini 3 Flash, GPT Image 1.5, SAM Audio, and SHARP

A bunch of interesting AI model releases this week! If you've got the need for (AI) speed, want to create images, do 2D->3D or analyze audio, here's new stuff to play with:

Google released Gemini 3 Flash, a speedier version of its (also recently released) Gemini 3 Pro. By all accounts it's quite good, while being noticeably faster and 4x cheaper.

OpenAI released GPT Image 1.5. While not as powerful as Gemini Image Pro (a.k.a. Nano Banana Pro), it's a significant improvement over the previous GPT Image version. My personal favorite is the “graphite pencil sketch” preset. It's already integrated into ChatGPT.

Meta released SAM Audio, an audio-capable variant of their Segment Anything Model that can now also pinpoint and isolate audio samples. I was impressed by the ability to click on an object in the video to isolate its sound (an integration of video and audio segmentation).

Apple released SHARP, a fast on-device vision model that can take a single image and produce a (faux) 3D view suitable for e.g. 3D vision goggles in real time. While it doesn't do full 3D scene reconstruction (like MAST3R & friends), it's probably useful where you want a (subjective) 2D-to-3D effect, fast, on-device.

OpenAI Launches ChatGPT App Store with MCP-Powered Plugins

OpenAI has opened its ChatGPT app store for outside submissions. ChatGPT apps are plugins (widgets) that users can tag and use directly from the conversation without leaving ChatGPT.

Apps are MCP-powered and represent a second take on the “Custom GPTs” attempt from a few years ago, which never became very popular. Beyond the usual MCP goodies, apps can insert (simple) UI widgets right into the chat.

To use an app within ChatGPT, you need to “connect” it using the standard OAuth authorization flow, after which you can @-tag it in the chat.

For developers, OpenAI has published some guidelines on building good experiences.

It'll be interesting to see if this grows into a full app store in the future. Maybe with payments? Monetization is one of the unsolved problems for ChatGPT. Charging a fee or % is an obvious move, though it isn't obvious if that would really work and move the needle.

So far, the new ChatGPT Apps feature looks like a promising start.

Andrej Karpathy's 2025 LLM Year in Review

Andrej Karpathy posted a 2025 LLM year in review.

Andrej listed a few notable and mildly surprising paradigm changes, according to his (very well informed!) opinion:

  1. Reinforcement Learning from Verifiable Rewards (RLVR) – the next scaling frontier
  2. Jagged Intelligence – we're not reimplementing human/animal intelligence, so the strengths and weaknesses don't match – it's just different
  3. New layer of LLM apps – “LLM wrappers” like Cursor show there's a lot of value in properly orchestrating and integrating the LLMs for a specific vertical
  4. AI that lives on your computer – Claude Code as the first convincing demonstration of what a real AI agent looks like
  5. Vibe Coding – will terraform software and alter job descriptions
  6. LLM GUI – chat interface is the worst computer interface, but we're still learning how to do better

Andrej's articles (& videos) are always worth a read (& watch) and this one is no exception. If you haven't been following the AI hype closely (tell me how you managed that feat!), this is a great no-nonsense overview.

How Much Energy Does AI-Assisted Coding Consume?

How much energy does a typical AI coding session consume?

A typical AI query is estimated to consume ~0.3 Wh, but agentic coding is far from a typical chatbot query, spending 10x-100x more tokens (and energy).

In a great post diving into the actual costs, the author analyzes his typical coding session and typical day (a full day of coding, several agents in parallel) and arrives at 1.3 kWh per day of work.

I ran the calculation with my own numbers and came up with ~18 kWh per month, but that's not full-time use – a few hours per day, and not every workday.

The cost is not insignificant, but I'd say it's comparable to using other useful technology such as a dishwasher or a fridge. In my case at least, the cost (in dollar and kWh terms) is worth it.
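
For reference, the back-of-the-envelope arithmetic behind those figures, using only the numbers quoted above (the per-interaction estimates are rough assumptions):

# ~0.3 Wh per plain chat query; agentic coding spends 10x-100x more tokens
echo "0.3 * 10"  | bc   # ~3 Wh for a lighter agentic interaction
echo "0.3 * 100" | bc   # ~30 Wh for a heavier agentic interaction

# A 1.3 kWh day of work then corresponds to very roughly:
echo "1300 / 30" | bc   # ~43 heavy interactions
echo "1300 / 3"  | bc   # ~433 light interactions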

Google's Search Monopoly and the Web Crawling Problem

A perspective from Kagi (alternative, paid, no-ads search engine) on how Google keeps its search monopoly, why it's bad, and what to do about it.

A few days ago I wrote an article on a similar topic – how AI scraping is a problem because of the lack of a unified, common public corpus – and shared some ideas on how to solve it.

In both cases, the technical challenges, while formidable, are solvable. The blocker is the legal framework and practice. In a nutshell, Google climbed up the ladder, then kicked it away, and nobody can follow.

Learning Zig

I've decided to learn Zig over the Christmas holidays.

Zig is a low-level systems language with explicit memory management and error handling, alongside a sophisticated compile-time (comptime) functionality. It's roughly in the same niche as Rust.

Why Zig

Why not Rust?

I've started, and abandoned, learning Rust several times over the years. I just don't like it — this is a personal, subjective preference. I don't have anything against the language, I just don't like to use it. Rust has been touted as “a better C++” and in my (admittedly limited) experience, that's exactly right — and that's just the problem.

Personally, I like small languages, with minimal surface area, that keep things (mostly) explicit. Languages like C, Go, Scheme, or Python. I dislike large languages with complex, often implicit, effects, like C++, Rust, Common Lisp, or Haskell.

I'm happy with my choice of Python and Go. I haven't used Scheme in a long time (since R6RS!), because the batteries-included aspect of Python (and increasingly Go) just trounces it. And while C is still the lingua franca (literally: most other languages interop using C ABI), it shows its age, especially around (non)safety and minimalistic standard library.

Zig looks like it might fit my preferences perfectly.

I first noticed Zig a couple of years ago. I didn't really have the need to learn it yet, but figured it'd be a fun thing to do over the holidays!

Hello World

Here's a hello world in Zig:

// "std" is the complete Zig standard library
const std = @import("std");

// Defining a public entry point function that returns
// nothing (void) or an error (that's the ! part)
pub fn main() !void {
    // init a "writer" object (struct) with empty buffer (.{}) over stdout
    var w = std.fs.File.stdout().writer(&.{});
    const stdout = &w.interface;

    // format the message using provided tuple and print it
    // "try" doesn't catch, it immediately returns error to caller
    try stdout.print("Hello {s}!\n", .{"world"});
}

The C equivalent is actually shorter:

#include <stdio.h>

int main(void) {
    printf("Hello %s!\n", "world");
}

A hello world is too small to touch on memory management, but even in this tiny example there are some benefits:

  • In C, you have to know printf is from stdio, while std... is explicit in Zig.
  • I have to explicitly handle the error (by choosing to propagate it further with try) in Zig, while I can happily ignore any runtime issues with printf in C.
  • Format string and arguments are statically type-checked in Zig. C allows you to pass garbage data.

Things get more interesting when memory management comes into play. To give you a hint:

const std = @import("std");

pub fn main() !void {
    // Initialize the memory allocator
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    // At the end