Senko Rašić

Recap of my short posts on LinkedIn in February

AI Slop in Content Writing

Dear bloggers, content writers, commentators and social media managers: I know you like and use AI. For real. It's screamingly obvious. It's so obvious it screams slop – and the blame isn't on AI, it's squarely on you.

Even if the underlying idea or thought is your original, when you apply AI lipstick to it, you sabotage your own reputation. When someone sees an obvious AI post, they immediately discount everything you're trying to convey, even if it does have merit on its own.

Fortunately, this is easy to fix. Here are a few obvious tells that AI shadow-wrote it for you:

  1. You copy-pasted the “Would you like me to expand on this?” follow-up ChatGPT gives you without even reading the content. This is not just a red flag, this is cause for excommunication!
  2. Your post is full of em-dashes. C'mon, you'd never even heard of them before, admit it.
  3. Lose the emoji. It was tone-deaf even before it became a sure sign of LLM authorship.
  4. The “It's not X. It's Y” contrasts are sometimes needed, but AI dials that to eleven.
  5. While I'm at it, read your text aloud: if a friend would worry for your health after hearing you utter that, reword it.
  6. In general, if your text sounds like a TED talk, that's a bad sign, even if you 100% wrote it manually.
  7. Thank God Ghibli memes are out of fashion, but if I had a dime for every time I saw an image of a laptop with the screen on the outside of the lid, or an unmistakable GPT-Image-style cartoon, I wouldn't need to be on LinkedIn anymore.
  8. Yes, ChatGPT and Gemini can do infographics. The results are crowded, hard to read, and boring to boot. I'd rather suffer through an emoji-riddled listicle instead.
  9. If you profess to be an AI expert and offer tips, tricks, workshops or prompt secrets to others, the above applies doubly to you. Low effort means you're not only generating slop, you're also selling it.

There are better ways to leverage this wonderful new tech that don't insult your readers' intelligence.

On the Slow Death of Scaling

On the slow death of scaling is an interesting essay about alternatives to “bigger is better” approaches in modern AI research.

It's a nuanced one and easy to misread: if you're an AI believer it's easy to retort with “scaling just shifted to inference time!”, and if you're a doomer you can point and say “see, exponential cost and environmental impact for diminishing returns!” It doesn't say either of these things.

The “scaling laws” or “the bitter lesson”, or “when in doubt, use brute force” refer to the fact that it's often better and easier to solve the problem by applying a bigger hammer (or a graphics card, or a data center).

The essay just states it's not always the case, documents smaller LLMs that obviously outperform larger ones and lists several areas where compound approach improvements end up being better than pure weights/data/compute scaling: better (synthetic) data, chain of thought, distillation, reasoning, tools, RAG, agents...

I read it as an optimistic look into the future where, free from “just increase the size by 10x” arguments, researchers can invent even better ways of doing AI.

But don't take my word for it – the essay is an easy read, no hard math, and only 12 pages long (the rest are references). Worth your time if you're into this stuff.

AI Coding, Mediocrity and the Elephant in the Room

Earlier today I had a chat with a friend (also a seasoned senior developer) about the future of coding (in the next year or so) and the implications for software quality.

We're all mostly concerned with whether AI can match human developers in terms of software quality, but the elephant in the room is our assumption that most code today is of good quality. And, to be frank, about the skill level of a median developer.

The median developer is mediocre by definition, and half are even worse than that!

Between the two of us and over many years, my friend and I have seen a lot of code in various companies all over the world, from scrappy startups to BigCos, written by many different people.

Large amounts of said code were human-generated slop slapped together by mediocre coders, who weren't really interested in crafting beautiful art, and/or had tight deadlines and uptight bosses who wouldn't let them even if they had an inclination to.

When we talk about AI not being able to match the art and ingenuity of expert developers with a lot of time on their hands, we raise the bar several notches higher than we hold it for a large number of human coders.

AI may not top the results of “A players” or “10x devs” or “top performers” (as startup gurus like to call the best software engineers out there), but it probably matches “B players”, “1x devs” and “meets expectations” at 1% of the price, and is already better than “C players”, “0.1x devs” or “needs improvement” coders.

This is not to insult anyone (or everyone), but to remove the rose-tinted nostalgia glasses through which we see humans as all being masters of their craft, bursting with creativity, skill and inspiration all the time.

BTW, this applies more widely than coding. I laugh sardonically every time I hear or read about “AI slop” on the internet, as if we haven't had to endure decades of “human slop”, be it in the form of text, video or code.

If AI slop is the death of the internet, we've been hooked up to a zombie for a while now.

Cijene API Talk: Scraping, Billions of Prices, and Croatian Law

Last year I gave a few talks about Cijene API, a daily price aggregator API for Croatian retail prices.

I've now uploaded a standalone version of the talk to YouTube.

If you're interested in war stories of data scraping and validation, managing billions of prices on a single server, and the peculiar Croatian legislation that motivated it all in the first place, check it out.

Karpathy's Rapid Shift to 80% Agent-Driven Coding

Andrej Karpathy on AI-assisted coding:

I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December. [...] It hurts the ego a bit

Andrej was recently on record (on the Dwarkesh Podcast) saying AI coding agents were not good enough for non-boilerplate coding tasks, which makes the new post so surprising (and yet not, to people watching closely what's been happening with coding agents).

Anecdotally, I've been hearing the same from many senior software developers who know their stuff (i.e. aren't just blinded by hype). Development workflows are being structurally redefined very quickly.

In my recent talk on AI-assisted software engineering I tongue-in-cheek joked that the information has a “best before” date of a couple of months. Seems spot on ... I'll need to update the talk (again).

Security Warning: Don't Run Clawdbot/OpenClaw Without Precautions

Public Service Announcement: Do not run Clawdbot (OpenClaw), unless you really, really, really know what you're doing.

Clawdbot (now renamed to OpenClaw to avoid infringing Anthropic's Claude trademark) is an AI assistant that you can hook up to your email, WhatsApp, Telegram, files, and let it, ...well, assist you.

There's a huge hype building up around it, so you might be tempted to try it out (maybe on a spare computer, as a precaution). Don't – not in its current state.

The thing basically runs “YOLO” on all your data and can act without your permission. This is extremely dangerous, as it is trivial to do major damage (intentionally or not) using it.

In the words of Simon Willison:

This project terrifies me. On the one hand it really is very cool, and a lot of people are reporting great results using it. But it's an absolute perfect storm for prompt injection and lethal trifecta attacks. People are hooking this thing up to Telegram and their private notes and their Gmail and letting it loose. I cannot see any way that doesn't end badly.

We are certain to hear about some major security problems from people using this.

If you really know what you're doing, and properly manage access to your resources, sure, you can check it out. But most will just hook it up to everything and basically play Russian roulette with their data. Even if it's run on a separate computer, if you give it access to your mail, messaging, and cloud files, you can still be royally screwed.

AI assistants (ChatGPT, Claude, or home-grown rigs) have been able to do this for months (or even years) now, but until now we've collectively been mostly careful enough about giving them access permissions. There have been whole startups, such as Arcade.dev, built around this.

Clawd basically throws all that caution away in a “look, no hands!” move.

New Open Models: Kimi K2.5, Qwen3-Max-Thinking, Trinity Large, Z-Image

With Kimi K2.5, Qwen3-Max-Thinking, Trinity Large and Z-Image, this has been an interesting week for open AI models:

Kimi K2.5 is an upgrade from K2 by pretraining it with additional ~15T visual and text tokens. It reportedly improves coding and vision capabilities and supports “agent swarm” operation (many agents collaborating on a task).

The Kimi team also released a coding app (à la Claude Code / Codex) and a mobile app.

Qwen3-Max-Thinking is sadly not an open model, but it's still a notable update (and I hope there will be a small, open, distilled version in the near future). It improves on the existing Qwen3-Max by scaling up the parameter count and additional RL.

Z-Image is an image generation model from Alibaba. A month ago they released a small ~6B version (Z-Image-Turbo) and now they have an update for the main (large) model.

Not to be outdone by the Chinese labs, the US lab Arcee AI released Trinity Large, a new open model with 400B parameters (13B active – 4 out of 256 experts in use). The announcement blog post contains many technical details.

The SpaceX-xAI Merger: Untangling the Financial Logic

Startup, tech, finance, AI, and bubble WTF of the week, all rolled into one: SpaceX in talks to merge with xAI.

Okay, let's unpack this:

SpaceX is in the business of building and launching rockets, and providing satellite internet. It's profitable, held in high regard, dominates over other launch providers, Starlink is a major success, and the company is on its way to IPO (and Mars, albeit that's still in the far future).

xAI is the controversial company behind Grok and tied to X (ex Twitter).

Both are private companies, controlled by their majority owner Elon Musk. Besides that, they have nothing else in common.

The rumored intention is to offload the huge debt that Musk initially took on for buying Twitter, first to xAI (diluting it in the process through other investors in xAI), then to SpaceX, which could finally wash it clean through its IPO, without Musk having to sell a bunch of his Tesla shares and crashing their price.

BTW: since both are controlled by the same person, what's there to be “in talks” for?

GPT-5.3-Codex and Claude Opus 4.6: Incremental but Notable Updates

OpenAI GPT-5.3-Codex and Anthropic Claude Opus 4.6 are here.

Codex is a coding-optimized version of GPT-5.3: the announcement post showcases a (rudimentary) 3D car chase game built completely by Codex.

Opus 4.6 is an incremental update to the general-purpose Opus model: the major improvements are context window size (1M tokens, up from 256K) and tweakable effort (low, normal, high, max). Both the large context window and max effort are currently only available via the API (not in Claude Code).

My initial impression is that both are incremental updates over the existing models. We'll see if there are any noticeable improvements in long coding sessions (especially with the 1M context) in the following weeks.

Since all new models can knock out a fairly good Minesweeper clone, this year I'm upping the stakes for my coding tests: the task is to create a minimalistic version of a Real Time Strategy (RTS) game – think WarCraft, StarCraft or C&C.

No combat, enemy AI (heh...) or scenario objectives yet ... but it's a good start!

Left: Codex 5.3; Right: Opus 4.6

A Guide to Agentic Programming for Skeptics

This post details how Mitchell Hashimoto (creator of Vagrant, Terraform, and Ghostty) went from not really being impressed with AI coding performance, to using it constantly as a no-brainer.

If you're a software engineer and have doubts about the usefulness of AI in your coding workflow, the post doubles as a good, zero-hype guide on how to try it out and what to expect.

tl;dr: it takes effort and willingness to “waste” time until you get proficient and find a sweet spot.

Letting Claude Read My Email — and Trying to Prompt-Inject Myself

Claude reading email

I won't be installing OpenClaw any time soon, but I did let Claude read my email, just to see what would happen:

While everyone is focusing on OpenClaw, a viral “yolo” bot, the underlying magic is largely due to a bunch of useful tools allowing you to connect to your data and communication channels. Many of these are implemented as simple command-line utilities (since modern LLMs can use them really effectively, better than MCPs).

One of these is https://gogcli.sh/ (open source, written by the OpenClaw author), a CLI client for Google apps (Gmail, Docs, Calendar, ...). This is useful by itself, as it can be used in various scripts or other custom automation without messing around with OAuth.

Once I got this set up, I wanted to see how (regular) Claude Code could use it, and indeed it works pretty well in that setup. Of course, it's not very useful if I have to tell Claude to check for email or send a message (a few clicks in the browser would do that faster), but it opens more room for careful tinkering, without going all-in with OpenClaw.

The next thing I tried was to prompt-inject myself! I tried to get Claude to interpret the instructions in the email instead of just summarizing it to me. It didn't work! At least for this most basic prompt injection attack, Claude was clever enough to spot and ignore it. You can see the result in the screenshot below.

This should not be taken as proof that LLMs are immune to prompt injection attacks: if I tried a bit harder I might have constructed one that worked. But they're not as trivially susceptible as one may believe.

Fun times!

AI-Assisted Coding Talk Now on YouTube

A few months ago I held a talk on AI-assisted coding at a few venues. In one slide I tongue-in-cheek added a “best-before” date: January 2026. Turns out, that date was pretty spot-on.

The bleeding edge has shifted so much in the past three months that half of the talk might already be obsolete. What's not obsolete is the quest for quality, accountability and open-minded exploration of new tools.

I've just posted a recorded version to YouTube.

I've been talking to a lot of folks lately about how they're adopting the newest AI capabilities in their software development teams and will probably have a major update in a few weeks. If you're interested in hearing me talk about it at your meetup or company, let me know.

A $46K Vercel Bill That Could Have Been $100 on Hetzner

Vercel bill

This could've been a $100 Hetzner bill:

This is static data. No writes. Trivially cacheable.

AI Coding and the Risk of Overwork

“The AI vampire” by Steve Yegge chimes with my own feelings: it's very easy to get overworked using AI.

Steve's got a flamboyant writing style and his Gas Town approach to future of software engineering is controversial to say the least, but I think there's something to the problems he's describing.

Namely: AI coding pulls you in, you feel like you've got superpowers and could (vibe)code for hours on end. The feeling's been well documented by others like Armin Ronacher (whom I referenced a few times before) and Peter Steinberger (creator of Clawdbot/OpenClaw).

“Just one more prompt...” is an irresistible siren's call, but burning the candle at both ends isn't sustainable and will surely result in burnout. This isn't healthy for an individual, and isn't productive for their employer.

I share Steve's fear that productivity expectations will rise throughout the industry, increasing pressure from the top. We're already flooded with 996 schedules and war stories of founders and employees pushing themselves far beyond their limits. Crunch death marches used to be a staple of game dev studios; now they're almost a badge of honor in startup circles. AI-assisted coding just adds fuel to this fire.

In my own experience, as I get older I find that after 6 hours of focused work my mind turns to mush and I'm a zombie for the rest of the day (except I drawl “chocolateee” instead of “brainzzz”). Sure, I can “sprint” more – for a couple of days, or weeks. Not for months, not permanently, not without serious mental, emotional, and physical consequences.

With AI doing many of the routine tasks, even driving just one agent I can now breeze past my limit without breaking a sweat – it's better than caffeine! But we're encouraged, motivated and (soon?) expected to multitask. What happens when managing 10 agents in parallel becomes just “meets expectations”?

All of which is to say – I agree with Steve that maybe we should take this productivity improvement opportunity to reflect how we're spending our time, remove our foot from the gas pedal, and – at the risk of sounding too European – dial back a bit.

Analyzing curl|bash Installers with LLMs

I'm dismayed by the normalization of the curl | bash pattern for software installers. This downloads and executes an installer from the internet and entails a bunch of risks:

  • trust that the script author is not malicious
  • trust that the installer hasn't been hacked
  • hope you didn't misspell anything and executed a script from a phishing site
  • trust that there are no man-in-the-middle attacks
  • trust that the author's script will work well with your system and won't screw up anything

This gives a random piece of software from the internet the same level of trust you give to verified packages from your OS provider (e.g. Debian packages), with no sandboxing (like Flatpak, Snap, or others).

In most cases, it's fine, you're installing software you trust anyway – but this holds true for most insecure practices! Telnet, HTTP, unencrypted passwords are mostly fine – until they aren't.
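One low-effort mitigation, sketched here, is to break the pipe: download the script to a file, skim it (or hand it to an LLM), and only then run it. The URL and file paths below are illustrative, not from any real installer.

```shell
# Download first -- don't pipe straight into bash.
# (URL is illustrative; substitute the installer you actually want.)
curl -fsSL https://example.com/install.sh -o /tmp/install.sh

# Eyeball it, or feed it to an LLM for a summary.
less /tmp/install.sh

# If the project publishes a checksum, compare it before running.
sha256sum /tmp/install.sh

# Only now execute it.
bash /tmp/install.sh
```

This doesn't remove the trust problem, but it does defeat misspelled-domain and swapped-at-download attacks, and it gives you a chance to read what you're about to run.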

What's a dev to do? In lieu of poring over hundreds of lines of creative Bash scripting myself, I created curl-bash-explain.dev, a handy tool for analyzing the script using LLMs.

For example, here's a step-by-step breakdown of what the Claude Code installer is doing.

Anthropic's $30B Funding Round and the AI Revenue Race

Anthropic revenue growth

Anthropic has just closed a new funding round ($30B investment at a $380B valuation). What's more interesting is revenue growth: over 10x in each of the past three years, bringing them to ~$14B ARR today.

In comparison, OpenAI recently announced they've passed the $20B ARR mark in 2025 (I would guess, near the end of the year). A year earlier, Anthropic did $1B ARR, and OpenAI $6B ARR.

A few thoughts:

  • Anthropic hasn't caught up to OpenAI completely yet, but has shown itself to be a capable competitor – it's not a one-horse race any more
  • ChatGPT seems to have more mindshare among non-techies, while Claude rules for devs: although that might change with Codex (from OpenAI) and Claude Cowork (Anthropic) becoming increasingly capable
  • OpenAI rolls out ads; Anthropic takes shots at them, so I'd expect no ads on Claude for the time being
  • These are revenue numbers (and forward-looking ARRs, not even TTMs, to boot); nobody talks about expenses

Are we in a bubble? Will it burst? When? Dunno.

All I know is, of those ARR figures, $1200 to Anthropic and $240 to OpenAI is from my pocket, and I don't expect that to decrease.

CroAI Code Club Meetup Recap

CroAI Code Club 1

Had a great time yesterday at the inaugural CroAI (Croatian AI Association) Code Club meetup!

Matija Stepanić and I ostensibly presented live vibe coding, but we also (intentionally) opened the floor for discussion from the get-go, and heard many points of view, experiences, tips and tricks from the participants. We learned as much from the audience as they did from us!

Some takeaways:

  • use the Projects feature in Claude (or ChatGPT) to plan the project before you switch to coding
  • Matija intentionally downgraded his Claude plan to check usage limits – tokens burn up really fast! This motivated a discussion about performance, quality and price tradeoffs for different models
  • I demoed Claude Code use to explain and work on a codebase I know nothing about (project is in Flutter and uses Firebase and I'm not familiar with either)
  • there are no best practices yet – we've heard so many different use cases! we're all collectively trying to figure out this new agentic engineering thing

Big thanks to Valentina Zadrija for organizing everything and inviting us, Herman Zvonimir Došilović for kickstarting the whole thing, Matija for leading the charge yesterday, Wana Kiiru and the CroAI crew for the logistics, and everyone who came and participated.

Overall, great time, learned a lot, the feedback was very positive and multiple people stepped up to propose a talk at a future Code Club – already looking forward to the second one!

“You're Prompting It Wrong”: A Story, a Challenge, and an Offer

Recently I talked with a friend about the difficulties of automatically evaluating AI agents and he shared an example task for the agent: “go to website xyz, fetch the newest articles, and send the summaries of the top 5 most interesting ones to my mail”.

I pointed out a few problems with the prompt:

  • this is a straightforward deterministic procedure and LLMs are non-deterministic – it's better to ask AI to write a script to do this than to hope it will always properly adhere to the steps
  • “most interesting” is a judgement call, and LLMs are notoriously bad at it; you'll get a biased random result instead – ask it to summarize every article and include them all in the email, and you will quickly determine what's interesting to you

This anecdote reminded me that the intuition of how and what to ask AI is something you have to practice to get. It looks easy, but if you don't have the intuition, you'll get results of random quality.

If you're bad at it due to lack of practice, it's easy to dismiss it as “AI is stupid” or “am I stupid?”. Neither. You just need practice and some guidance.

This brings me to my challenge and offer: if you're trying to get your AI to do something and it just seems dumb, yet feels like the modern ones should be able to do it, send me a DM and I'll debug it with you.

Let's see if we can improve your AI intuition!

(offer valid for problems you can explain in a couple of messages; for more complex matters I'm available for consultation :))

Sandboxing AI Coding Agents with Bubblewrap

Like many developers, I find myself more and more using AI agents to help with software development.

I currently use Claude Code, the command line interface, together with Opus 4.5 (Anthropic's top model as of this writing). I use it to distill my rough task requirements into a detailed development plan, then implement the plan.

By default, Claude Code asks each time if it may read and write files and run software. This is a sensible default, but it does get annoying after a time. Worse, it interrupts me often enough that I can't do much in parallel while babysitting it.

There's also a --dangerously-skip-permissions (a.k.a. “YOLO”) mode which will happily run anything without asking. This can be risky (although I know some people who run it like that and still haven't destroyed their dev machines).

Sandboxing

The standard solution is to sandbox the agent – either on a remote machine (exe.dev, sprites.dev, daytona.io), or locally via Docker or other virtualization mechanism.

A lightweight alternative on Linux is bubblewrap, which uses Linux kernel features like user namespaces to limit (jail) a process.

As it turns out, bubblewrap is a good solution for lightweight sandboxing of AI agents. Here's what I personally need from such a solution:

  • mimic my regular Linux dev machine setup (I don't want to manage multiple dev environments)
  • minimal/no access to information outside what's required for the current project
  • write access only to the current project
  • can directly operate on the files/folders of the project so I can easily inspect or modify the same files from my IDE or run the code myself
  • network access – both to connect to AI providers and search the internet, and to be able to start a server that I can connect to

Bubblewrap and Docker are not hardened security isolation mechanisms, but that's okay with me. I'm not really concerned about the following risks:

  • escape via zero-day Linux kernel bug
  • covert side channel communications
  • exfiltration of data from current project (including project-specific access keys)
  • screwing up the codebase (the code is managed via git and backed up at GitHub or elsewhere)

The access-keys bit is tricky, but even full remote sandboxes can't protect against that. In theory, we could have transparent API proxies that would inject proper access keys without the AI agent ever being aware of them, but this is really non-trivial to set up right now.

An alternative is to contain potential damage by creating project-specific API keys so at least the blast area is minimal if those keys are leaked.

In practice

Here's how my bubblewrap sandbox script looks:

#!/usr/bin/bash

exec 3<"$HOME/.claude.json"

exec /usr/bin/bwrap \
    --tmpfs /tmp \
    --dev /dev \
    --proc /proc \
    --hostname bubblewrap --unshare-uts \
    --ro-bind /bin /bin \
    --ro-bind /lib /lib \
    --ro-bind /lib32 /lib32 \
    --ro-bind /lib64 /lib64 \
    --ro-bind /usr/bin /usr/bin \
    --ro-bind /usr/lib /usr/lib \
    --ro-bind /usr/local/bin /usr/local/bin \
    --ro-bind /usr/local/lib /usr/local/lib \
    --ro-bind /opt/node/node-v22.11.0-linux-x64/ /opt/node/node-v22.11.0-linux-x64/ \
    --ro-bind /etc/alternatives /etc/alternatives \
    --ro-bind /etc/resolv.conf /etc/resolv.conf \
    --ro-bind /etc/profile.d /etc/profile.d \
    --ro-bind /etc/bash_completion.d /etc/bash_completion.d \
    --ro-bind /etc/ssl/certs /etc/ssl/certs \
    --ro-bind /etc/ld.so.cache /etc/ld.so.cache \
    --ro-bind /etc/ld.so.conf /etc/ld.so.conf \
    --ro-bind /etc/ld.so.conf.d /etc/ld.so.conf.d \
    --ro-bind /etc/localtime /etc/localtime \
    --ro-bind /usr/share/terminfo /usr/share/terminfo \
    --ro-bind /usr/share/ca-certificates /usr/share/ca-certificates \
    --ro-bind /etc/nsswitch.conf /etc/nsswitch.conf \
    --ro-bind /etc/hosts /etc/hosts \
    --ro-bind /etc/ssl/openssl.cnf /etc/ssl/openssl.cnf \
    --ro-bind /usr/share/zoneinfo /usr/share/zoneinfo \
    --ro-bind $HOME/.bashrc $HOME/.bashrc \
    --ro-bind $HOME/.profile $HOME/.profile \
    --ro-bind $HOME/.gitconfig $HOME/.gitconfig \
    --ro-bind $HOME/.local $HOME/.local \
    --bind $HOME/.claude $HOME/.claude \
    --bind $HOME/.cache $HOME/.cache \
    --file 3 $HOME/.claude.json \
    --bind "$PWD" "$PWD" \
    claude --dangerously-skip-permissions "$@"

If this looks rather idiosyncratic, it's because it is. Rather than using generic rules, I experimented with bwrap until I found the minimal configuration I need for my system.

Some interesting stuff:

  • /tmp, /proc and /dev are automatically handled by bwrap
  • I bind-mount (i.e. expose) files and directories under the same paths as on the local machine, so there's no difference in file locations, project paths, etc.
  • I don't expose entire /etc, just the bare minimum
  • The content of $HOME/.claude.json is injected into the sandbox so any changes there won't get saved to the real one
  • The content of $HOME/.claude/ directory is mapped read-write, so Claude can save and modify files there (such as session data)
  • /opt/node/node-v22.11.0-linux-x64/ is my custom nodejs install location
  • I change the hostname so it's easy to distinguish between the host and sandbox

I will probably be tweaking the script as needed, but this is a pretty good starting point for me.

How to customize

If you want to adapt this to another AI agent or to your system, my suggestion is to tweak the script to run bash instead, then run your agent manually, see what breaks, and tweak as appropriate.

A useful command for this is strace, which can trace file access system calls so you can see what's needed:

strace -e trace=open,openat,stat,statx,access -o /tmp/strace.log codex

Inspecting the log, you can spot which files are missing and bind-mount them as needed.
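For example, a quick filter over the log (log path matches the strace command above; the pipeline itself is just one way to do it) pulls out the paths the program failed to open – these are the candidates for additional --ro-bind entries:

```shell
# List unique quoted paths from the strace log that failed with ENOENT
# (file not found) -- likely candidates for extra bind mounts.
grep 'ENOENT' /tmp/strace.log \
    | grep -oE '"[^"]+"' \
    | tr -d '"' \
    | sort -u
```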

Recap of my short posts on LinkedIn and elsewhere in January

Truly Open Source AI: OLMo 3 and Nemotron 3 Nano

The past few weeks have been good for open source AI, with the recent releases of Olmo 3 (by Ai2) and Nemotron 3 Nano (NVIDIA).

Both are truly open source: not only the weights (needed to “run” the model, akin to compiled code), model reports, and the source code, but also the training data and code required to re-train them from scratch, under a permissive license.

Olmo 3 comes in 7B and 32B sizes and several variants: Base, Instruct (focus on quick responses, multi-turn chat, instruction following, tool use), Think (long reasoning chains of thought), and RL Zero (reinforcement learning directly on top of the base model).

Nemotron 3 Nano is a Mixture-of-Experts (MoE) model with 3.2B active and 32B total parameters (30B-A3B).

Both perform better than Qwen3 32B-A3B and GPT-OSS 20B (which are not fully open – just have the weights available).

What does this mean for you?

If you're an LLM researcher, this is super-useful information, data, and insight – and you probably already know everything about these releases!

If you're a FOSS enthusiast, rejoice at truly open models becoming viable for everyday use (even if they fall short of the big guns).

If you use local AI, you probably don't care about all the training details, but you do get more options for running on premises or on-device, which is always good.

If you're using AI products or integrate with LLMs over API, you probably don't care much, but I hope the post was interesting.

If you're none of the above – I mean I appreciate it and am thankful, but why are you reading this?! Do let me know in the comments why, I'm curious :)

(As a software developer, I couldn't write the above English-language SWITCH statement without a default/else catchall block... devs will understand).

AI Model Releases: Gemini 3 Flash, GPT Image 1.5, SAM Audio, and SHARP

A bunch of interesting AI model releases this week! If you've got the need for (AI) speed, want to create images, do 2D->3D or analyze audio, here's new stuff to play with:

Google released Gemini 3 Flash, a speedier version of its (also recently released) Gemini 3 Pro. By all accounts, it's quite good while being noticeably faster and 4x cheaper.

OpenAI released GPT Image 1.5. While not as powerful as Gemini Image Pro (a.k.a. Nano Banana Pro), it's a significant improvement over the previous GPT Image version. My personal favorite is the “graphite pencil sketch” preset. It's already integrated in ChatGPT.

Meta released SAM Audio, an audio-capable variant of their Segment Anything Model that can now also pinpoint and isolate audio samples. I was impressed by the ability to click on an object in the video to isolate its sound (integration of video and audio segmentation).

Apple released a fast on-device vision model, SHARP, that can take a single image and produce a (faux) 3D view suitable for e.g. 3D vision goggles in real time. While it doesn't do 3D scene reconstruction (like MASt3R & friends), it's probably useful where you want a (subjective) 2D-to-3D effect, fast, on-device.

OpenAI Launches ChatGPT App Store with MCP-Powered Plugins

OpenAI has opened its ChatGPT app store for outside submissions. ChatGPT apps are plugins (widgets) that users can tag and use directly from the conversation without leaving ChatGPT.

Apps are MCP-powered and represent a second take on “Custom GPTs” from a few years ago, which never became very popular. Beyond the usual MCP goodies, apps can insert (simple) UI widgets right into the chat.

To use an app within ChatGPT, you need to “connect” it using the standard OAuth authorization flow, after which you can @-tag it in the chat.

For developers, OpenAI has published some guidelines on building good experiences.

It'll be interesting to see if this grows into a full app store in the future. Maybe with payments? Monetization is one of the unsolved problems for ChatGPT. Charging a fee or a percentage is an obvious move, though it's not clear whether that would really work and move the needle.

So far, the new ChatGPT Apps feature looks like a promising start.

Andrej Karpathy's 2025 LLM Year in Review

Andrej Karpathy posted a 2025 LLM year in review.

Andrej listed a few notable and mildly surprising paradigm changes according to his (very well informed!) opinion:

  1. Reinforcement Learning from Verifiable Rewards (RLVR) – the next scaling frontier
  2. Jagged Intelligence – we're not reimplementing human/animal intelligence, so the strengths and weaknesses don't match – it's just different
  3. New layer of LLM apps – “LLM wrappers” like Cursor show there's a lot of value in properly orchestrating and integrating the LLMs for a specific vertical
  4. AI that lives on your computer – Claude Code as the first convincing demonstration of what a real AI agent looks like
  5. Vibe Coding – will terraform software and alter job descriptions
  6. LLM GUI – chat interface is the worst computer interface, but we're still learning how to do better

Andrej's articles (& videos) are always worth a read (& watch) and this one is no exception. If you haven't been following the AI hype closely (tell me how you managed that feat!), this is a great no-nonsense overview.

How Much Energy Does AI-Assisted Coding Consume?

How much energy does a typical AI coding session consume?

A typical AI query is estimated to consume ~0.3Wh, but agentic coding is far from a typical chatbot query, spending 10x-100x more tokens (and energy).

In a great post diving into the actual costs, the author analyzes his typical coding session and typical day (a full day of coding, several agents in parallel) to arrive at 1.3kWh per day of work.

I ran the calculation with my own numbers and came up with ~18kWh per month, but that's not full-time use – a few hours per day, and not every workday.
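As a sanity check, here's the back-of-the-envelope math as a Python sketch. The ~0.3Wh-per-query and 10x-100x figures come from the post above; the pace and usage numbers are my own assumptions, picked to roughly reproduce the ~18kWh/month figure:

```python
# All figures are rough estimates, not measurements.
WH_PER_CHAT_QUERY = 0.3    # typical single chatbot query (~0.3 Wh)
AGENTIC_MULTIPLIER = 100   # agentic coding: 10x-100x more tokens (upper end)
QUERIES_PER_HOUR = 20      # assumed pace of agent invocations
HOURS_PER_DAY = 2          # a few hours of AI-assisted coding per day
DAYS_PER_MONTH = 15        # not every workday

wh_per_day = WH_PER_CHAT_QUERY * AGENTIC_MULTIPLIER * QUERIES_PER_HOUR * HOURS_PER_DAY
kwh_per_month = wh_per_day * DAYS_PER_MONTH / 1000

print(f"{wh_per_day / 1000:.1f} kWh/day, {kwh_per_month:.0f} kWh/month")
```

Tweak any of the inputs and the conclusion barely moves: the order of magnitude stays that of a household appliance.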

The cost is not insignificant, but I'd say it's comparable to using other useful technology such as a dishwasher or a fridge. In my case at least, the cost (in dollar and kWh terms) is worth it.

Google's Search Monopoly and the Web Crawling Problem

A perspective from Kagi (an alternative paid, ad-free search engine) on how Google keeps its search monopoly, why that's bad, and what to do about it.

A few days ago I wrote an article on a similar topic – how AI scraping is a problem because there's no unified, common public corpus – and shared some ideas on how to solve it.

In both cases, the technical challenges, while formidable, are solvable. The blocker is the legal framework and practice. In a nutshell, Google climbed up the ladder, then kicked it away, and nobody can follow.

Learning Zig

I've decided to learn Zig over the Christmas holidays.

Zig is a low-level systems language with explicit memory management and error handling, alongside a sophisticated compile-time (comptime) functionality. It's roughly in the same niche as Rust.

Why Zig

Why not Rust?

I've started, and abandoned, learning Rust several times over the years. I just don't like it — this is a personal, subjective preference. I don't have anything against the language, I just don't like to use it. Rust has been touted as “a better C++” and in my (admittedly limited) experience, that's exactly right — and that's just the problem.

Personally, I like small languages, with minimal surface area, that keep things (mostly) explicit. Languages like C, Go, Scheme, or Python. I dislike large languages with complex, often implicit, effects, like C++, Rust, Common Lisp, or Haskell.

I'm happy with my choice of Python and Go. I haven't used Scheme in a long time (since R6RS!), because the batteries-included aspect of Python (and increasingly Go) just trounces it. And while C is still the lingua franca (literally: most other languages interop using C ABI), it shows its age, especially around (non)safety and minimalistic standard library.

Zig looks like it might fit my preferences perfectly.

I first noticed Zig a couple of years ago. I didn't really have the need to learn it yet, but figured it'd be a fun thing to do over the holidays!

Hello World

Here's a hello world in Zig:

// "std" is the complete Zig standard library
const std = @import("std");

// Defining a public entry point function that returns
// nothing (void) or an error (that's the ! part)
pub fn main() !void {
    // init a "writer" object (struct) with empty buffer (.{}) over stdout
    var w = std.fs.File.stdout().writer(&.{});
    const stdout = &w.interface;

    // format the message using provided tuple and print it
    // "try" doesn't catch, it immediately returns error to caller
    try stdout.print("Hello {s}!\n", .{"world"});
}

The C equivalent is actually shorter:

#include <stdio.h>

int main(void) {
    printf("Hello %s!\n", "world");
}

Hello world is too small to touch on memory management, but even in this tiny example there are some benefits:

  • In C, you have to know printf is from stdio, while std... is explicit in Zig.
  • I have to explicitly handle the error (by choosing to propagate it further with try) in Zig, while I can happily ignore any runtime issues with printf in C.
  • Format string and arguments are statically type-checked in Zig. C allows you to pass garbage data.

Things get more interesting when memory management comes into play. To give you a hint:

const std = @import("std");

pub fn main() !void {
    // Initialize the memory allocator
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    // At the end of this block, deinitialize the allocator
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Allocate 1024 bytes and handle errors
    const buffer: []u8 = allocator.alloc(u8, 1024) catch |err| {
        std.debug.print("Memory allocation failed: {}\n", .{err});
        return;
    };
    // At the end of this block, free the allocated buffer
    defer allocator.free(buffer);

    // Initialize stdin reader with the provided buffer
    var reader = std.fs.File.stdin().reader(buffer);
    const stdin = &reader.interface;

    std.debug.print("What's your name: ", .{});
    // "name" is a borrowed slice from somewhere within buffer
    const name = stdin.takeDelimiterExclusive('\n') catch "";
    std.debug.print("Hello, {s}!\n", .{name});

    // Deferred statements run in reverse (LIFO) order at scope exit
}

This is a bunch of work! In Zig, any function that needs to allocate memory takes a memory allocator as a parameter, making memory management front and center via dependency injection.

Learning strategy

The official Zig docs are hit and miss. The Documentation button links to the language reference, which is not really structured for beginners. The Getting started button references zig.guide, which is out of date (see below why that's a big thing). The Learn section of the site does list several more documentation resources.

After a few false starts, I found (also on that page) Introduction to Zig, an open-source and up-to-date book, which looks pretty solid so far.

My current learning strategy is to loosely follow the book. I often spend more time chasing rabbit holes (for example, UTF-8 handling) that I spot in a specific chapter. I'm also working on a toy busybox clone, giving me simple tasks for file and string handling. The idea is to immerse myself in the language and get it into my muscle memory.

I am using LLMs as TAs, asking for more background or rationale on some detail, for example why literal strings are null-terminated in Zig (spoiler: for easy C integration).

Bleeding edge

The current version of Zig, as I write this, is 0.15.2. There was a major change in 0.15 related to how reading from and writing to files works. In 0.16 this will change again, as several parts of the standard library are reorganized.

This proved to be a real blocker for me initially. I accidentally downloaded master (future 0.16) with some of the breaking changes in, and the documentation (for 0.15 or earlier versions) was extremely confusing.

With the language very much in flux, is there a point in trying to learn it now?

I would be skeptical, if not for examples of high-quality software written in Zig, like TigerBeetle and Ghostty. To me, these show the changes are not insurmountable and that now is perhaps a good time to start learning the language. In a few years, when the language is ready, I will be too!

Do I plan to write production code in Zig? Not currently — that's certainly not my motivation right now. In a few years? Who knows!

Lately I’ve been thinking about cloud platforms, SaaS, AI, scraping, costs, and the gradual closing of the web.

At first glance these seem like separate topics, but they share a set of assumptions that have quietly become “best practices” in the industry. Assumptions about what’s professional, what’s safe, what’s scalable, and what’s supposedly too hard for individuals or small teams to do themselves.

While I'm not anti-cloud or anti-SaaS in general, I do have a feeling these tools are often used far beyond where they make sense, largely due to marketing pressure and fear. That overuse creates real downstream effects: higher costs, lock-in, fragile systems, and eventually people closing off their own sites and apps just to stay afloat.

Cloud platforms and the myth of “serious infrastructure”

There’s a widespread belief that building something “serious” online requires “serious” platforms, which usually means public cloud infrastructure: AWS, GCP, Azure. Anything else is treated as amateurish.

The argument usually goes like this:

  • Cloud gives you scalability, reliability, and failover.
  • You shouldn’t self-host or manage servers yourself.
  • Professionals use managed platforms.
  • You'll need to hire sysadmins anyway, so it won't be cheaper.

This narrative sounds reasonable and may be true in some cases, but it's off the mark in many real-world ones.

Cloud platforms are not set-and-forget. Cloud solutions are complex beasts that are easy to misconfigure if you don't know what you're doing. As a result, you still need expertise to design, audit, and maintain the system, and over time you'll need changes, fixes, and migrations. Instead of Unix tools and config files, you use web consoles, access policies, managed services, logs, metrics, and alerts. That just shifts where the complexity lives, but the work and costs remain.

Cloud infrastructure also isn’t inherently more reliable. In recent months alone, AWS, Cloudflare, GitHub, and others have had significant outages. Shared platforms fail too, and when they do, failures are global. A small, well-understood system under your control is easier to reason about and recover.

Security follows a similar pattern. Cloud providers have professional teams, but they also represent high-value targets with enormous blast radii. A small, simple, well-patched server with a minimal setup is often simpler to secure and audit.

Cost is where the differences become unavoidable: following cloud “best practices” gets expensive fast. Compute is only the starting point, and then load balancers, replication, managed databases, traffic, logs, metrics, alerting, and add-ons all pile on. In practice, cloud setups are often an order of magnitude or more expensive, and they still require specialized knowledge.

The usual justification is that this avoids hiring a sysadmin. What actually happens is that you replace that role with an AWS, GCP, or Azure consultant at the same cost. If you can learn cloud tooling deeply enough to manage it yourself, you can learn Linux administration.

This pattern isn’t new. In the 2000s, Linux and Apache were considered unprofessional compared to Windows servers or branded Unix systems. Postgres and MySQL were dismissed in favor of Oracle. Linux routers were seen as inferior to Cisco hardware. These were marketing narratives framed as best practices, not technical inevitabilities.

Cloud infrastructure follows the same trajectory.

The Anti-Not-Invented-Here syndrome

A second pattern shows up in how people approach application architecture.

Instead of building small, simple components, there’s an increasing tendency to default to SaaS platforms and heavyweight frameworks for problems that are already well understood and mostly straightforward.

A blog is a good example.

Someone wants to write a blog. They reach for React and Next.js. That leads to client-side rendering, which causes SEO issues, so server-side rendering gets added. A remotely exploitable vulnerability appears in Next.js, raising concerns about arbitrary code execution. Running it on a plain Linux host now feels risky. Containers sound safer, or better yet, deploying to Vercel so someone else handles it.

At first it’s free. Then traffic grows. Bots and AI scrapers arrive. Bills start increasing. Now the problem is framed as “evil scrapers.”

None of this was necessary.

A blog is static content. Static HTML generation has existed for decades. Hosting it on a cheap VPS or static hosting costs almost nothing. The attack surface is minimal, and traffic volume doesn’t matter.
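To make that concrete, here's a toy static “generator” in Python. The file layout and HTML template are invented for this example – real generators like Hugo, Jekyll, or Pelican do the same job with far more polish – but the core of the job really is this small:

```python
import pathlib
import tempfile

# A deliberately tiny static "generator": wrap plain-text posts in an
# HTML shell and write them out as files. Template and layout are made up.
TEMPLATE = "<html><body><h1>{title}</h1><pre>{body}</pre></body></html>"

def build(src: pathlib.Path, out: pathlib.Path) -> list[pathlib.Path]:
    """Render every .txt post in src/ into an .html page in out/."""
    out.mkdir(exist_ok=True)
    pages = []
    for post in sorted(src.glob("*.txt")):
        html = TEMPLATE.format(title=post.stem, body=post.read_text())
        page = out / f"{post.stem}.html"
        page.write_text(html)
        pages.append(page)
    return pages

# Demo: one post in a temporary "site" directory
src = pathlib.Path(tempfile.mkdtemp())
(src / "hello.txt").write_text("My first post.")
pages = build(src, src / "public")
print(pages[0].name)  # -> hello.html
```

The output directory can be rsync'd to any cheap VPS or static host; there's nothing to exploit, and nothing scales with traffic except bandwidth.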

Unnecessary complexity creates cascading requirements: more infrastructure, more security layers, more tooling. Eventually outsourcing does make sense, but only because the system was made hard to operate in the first place.

The same thing happens with databases. Instead of running Postgres, MySQL, or SQLite locally, people jump straight to platforms like Supabase or RDS. They’re convenient and feature-rich, but most projects use only a small subset. If you later rely on the advanced features, moving away becomes painful or impossible. Growth then turns into a recurring cost problem.

Authentication is another example. Auth is a solved problem with solid libraries and established patterns. The (solid) advice to avoid rolling your own cryptography has expanded into a blanket avoidance of auth entirely. Instead of teaching people what not to do, the default is outsourcing to third-party services. The complexity doesn’t disappear, it just becomes vendor-specific.

This shifts effort from general skills to lock-in.
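To illustrate how well-trodden the auth path is, here's a minimal salted-and-stretched password check using only Python's standard library. A real app would more likely use a maintained argon2 or bcrypt library, and the iteration count here is a ballpark figure, but the patterns are established, not arcane:

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # ballpark; tune the work factor to your hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a unique salt."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the digest and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))  # True
print(verify_password("wrong", salt, digest))    # False
```

The two rules being followed here (a unique salt per password, constant-time comparison) are exactly the kind of “what not to do” knowledge worth teaching instead of outsourcing.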

Over time, SaaS subscriptions accumulate. Individually they seem minor, but together they add up, especially if you have unexpected usage spikes. Running your own product becomes expensive, which forces aggressive monetization, artificial limits, or scaling simply to justify the cost structure.

Scrapers and anti-scrapers

When infrastructure is expensive and usage-based, every request matters. Even serving a blog becomes a cost center. That’s where scrapers of any kind (but especially AI) become a visible problem.

Much of today’s hostility toward scraping is driven by real bills. If serving traffic costs money, unwanted traffic becomes something to block. Cloudflare rules, captchas, garbage responses: anything that reduces load.

With static files served at near-zero cost, this wouldn’t matter. Scraping would be irrelevant or even welcome. Scraping becomes a problem because of the underlying cost model.

The response to those costs is a gradual closing of the web.

Large platforms already operate this way. Facebook, chat platforms, and social networks allow data in while tightly controlling access out. Public content is often only accessible through proprietary interfaces, with limited or hostile APIs.

Individuals increasingly mirror this behavior. Access is blocked for everyone except Google or Bing for SEO reasons. Others are explicitly denied, not merely discouraged through robots.txt.

This dynamic strengthens incumbents. Building a competitor to Google today is limited by permissions, not technology. The web is increasingly crawlable only by the largest players.

The irony is that the actors people fear most (OpenAI, Facebook, Google, Anthropic) can easily bypass these barriers. They have the resources to do so. Smaller companies, researchers, and hobbyists do not. The web closes unevenly.

This creates a feedback loop: marketing-driven practices raise costs, higher costs incentivize restriction, and restriction concentrates power further.

DIY is an advantage

This trend is unlikely to reverse on its own. The incentives are strong, and the marketing is effective.

Understanding that you can operate systems yourself still matters. Hosting simple services and keeping systems boring and cheap are often the most robust choices available.

There's value in recognizing when you only need a small slice of what’s being sold. With AI-assisted coding, implementing that slice is easier than it’s ever been.

People who are comfortable one layer below the current fashion retain more options. They can decide when outsourcing makes sense and when it doesn’t.

Learn Linux. Go ahead and self-host PostgreSQL. Roll your own auth. Run your projects on cheap boxes. Set up your own VPN and load balancers. Keep It Simple, Stupid.

Doing Things Yourself is a real competitive advantage, and it’s becoming rare. I hope it doesn't become extinct.


Increasingly I notice when I talk about AI with others, we often mean subtly different things.

When I'm talking about AI, I'm talking about the technology (LLMs, agentic systems, etc.) and the type of product one can build using that technology (chatbots, assistants, classifiers, process automation, etc.)

I now see that for many others, that's not the primary concern. Instead, they think about a class of (consumer) products (ChatGPT, WhatsApp AI chatbot, MS Copilot) and/or the effects of people (mis?)using the technology (AI slop).

To illustrate: In a recent conversation I stated I believed AI models will improve in the future. In my mind, an LLM improving is an objective fact: I can task it with something and get a better result.

The reply was “in which direction, and for whom?” which is a (totally fair) product and business strategy issue – not a technical one. We indeed might have much better LLMs within much shittier chatbots!

Another conversation from a few weeks ago was about AI serving porn. LLMs are gigantic autocomplete machines, they'll serve whatever you want them to serve (to a first approximation). From a technical standpoint, that's a complete non-issue.

Looking at AI as a class of products increasingly used by everyone, including our children, and including people who are not tech (or AI) savvy – what, how, and why the product makers choose to serve IS very important!

This is similar to “social networks”. What we today call “social networks” are anything but – a far cry from the “web 2.0” social networks era, when the point was to interact with your (real) social circle. The name stuck, and the damage is still being assessed (witness Australia's recent ban on social networks for under-16s), but the “social network” aspect is not the problem: the hyper-optimized engagement machine peddling all sorts of questionable stuff is.

Sadly, I think the meme battle is already lost: AI is increasingly being understood as a class of products. The problem is when this mixup causes people to blame the technology for the questionable business and product strategies that big tech companies use to maximize shareholder value.

AI, the technology, is now blamed for layoffs, struggling artists, and slop, to name a few. In fact, companies have a great scapegoat for layoffs, Disney just made $1B from AI (none of which will go to struggling artists), and people have been hand-crafting slop in the name of SEO for years.

I don't have an answer. This post is mainly for my friends, to explain that when I'm bullish about AI, I'm bullish about the underlying technology. I'm not bullish about the way big tech is going to (ab)use it to maximize profit.

Hacker ethic is about using tech in inventive ways to improve people's lives. I believe AI has the potential to do so. Sadly, I know it will also be used by those of “Greed is good” ethic.

I hope when we direct our critique, we aim at the right target.

When you submit a pull-request, you accept full responsibility for the code you're submitting.

This comes up so often in conversation I am amazed that I need to spell this out at all – but here it is.

It doesn't matter if you vibe-coded, used AI-autocomplete, copy-pasted from Stack Overflow or from some other project, or if you asked your aunt to help you. By hitting that “Create PR” (or equivalent) button, you attest that you fully understand what the code is doing and that you have legal rights to submit it (ie. you're not stealing).

If I'm reviewing a PR, or any production code, and the author has no idea how or why it works, it's a red flag. It's worse if the author hides the fact that they used AI, Stack Overflow, or subcontracted someone on Upwork. In my book, that's a serious and unacceptable breach of professional conduct.

Note that there are situations where it's perfectly fine to have a bunch of spaghetti slapped together with duct tape: spikes, prototypes, quick throwaway code, or a low-impact internal tool. Wanna vibe-code that new app screen as a functional mockup? Knock yourself out!

The required quality of the code, and understanding of the details and effects, is unrelated to the tool used to create said code. I would expect any developer (except perhaps the most junior novices – they first need to learn this) to understand how much care they need to put in.

You can't abdicate your responsibility for the code. “The AI wrote it” carries the same weight as “the dog ate my homework.”

What if Pull Requests weren't linear?

Saša Jurić has a wonderful talk titled “Tell me a Story”, described as a “three-act monodrama that explores the quiet art of narrative structure in collaborative software work”. It's not available online yet, but if you have a chance to see it in-person, you should!

I won't spoil the theme here, but the talk in part touches on the topic of code reviews using git and GitHub (or a similar system, like GitLab): in a nutshell, how do you present and keep a coherent story of your changes for the reviewers?

For example: nice commit history

While I share the ideals, the reality starts to get messy when review feedback commits get added to the PR:

it's downhill from here

Saša does address that somewhat, but I still feel it's messy, cumbersome and tedious to deal with:

  1. You can merge the reviewed branch as-is, leaving the mess in your git history
  2. You can amend (fixup) individual commits as per the feedback, which will produce different commits, and reviewers will need to re-review everything on GitHub
  3. You can amend the commits after the PR is approved, cleaning up the branch before merging. If the feedback and rework is extensive, this can be hard to do and error-prone.
  4. You can squash everything while merging, which won't leave any trace of the discussion (Saša argues strongly against this)

I admit I often do 4 just because it's the easiest, and because I treat a branch as the atomic unit of work (once finished), but it does lead to large commits and often unrelated things creep in.

I had resigned myself to living in this messy reality until I stumbled on the Code Review Can Be Better post by Alex Kladov, which introduced me to interdiff reviews.

Interdiff reviews are a way to have your cake and eat it too:

  1. amend the relevant commits
  2. push everything as a separate series of commits
  3. the reviewer reviews the difference between the original series of commits and the new one
  4. repeat as needed
  5. merge the final reviewed series of commits, discard the intermediate ones

Intuitively this sounds like a good approach: each original commit is fixed where needed, the fixes are all reviewed, and the resulting git history looks nice and is easy to navigate.

This is all natively supported by git with the range-diff command.

Assuming the PR is made off of the main branch, new-feature is the original PR branch, and new-feature-v2 is the fixed branch after PR feedback: git range-diff main new-feature new-feature-v2 shows the changes between those two branches, per-commit.

A big downside of this approach is the complete lack of support from GitHub, GitLab, and similar services. Given that many developers use one of these for PR reviews, the workflow is hard to adopt.

However, moving code reviews to the local machine (perhaps helped by git worktree to avoid the pain of switching WIP branches) could allow curious teams to experiment and perhaps adopt the workflow.

One of the related challenges is where to hold the comments (discussion). The Code Review Can Be Better post mentions placing the review comments as code comments (the rationale is the comment should live next to the code that's being commented) and links to a very interesting talk on how Jane Street does code review.

I find the idea of PR-comments-in-code-comments intriguing: ideally, it does make sense! I don't think we're anywhere near there with the tooling though (Jane Street built their own).

I do want to explore the branch less traveled, though!

I've recently installed a new laptop, which was an opportunity for me to revisit and revise the default software I usually install on any new workstation I set up (I use the term “workstation” here to mean a laptop or desktop machine that I can comfortably use in my daily work — software development in web/backend, AI, audio/video/streaming and related areas).

Here's my latest setup, roughly in the order of installation:

Debian / Ubuntu

I prefer using a Debian-based distribution, ideally Debian stable if all the hardware is supported. Right now Debian 12 (current stable) is pretty old and doesn't support the latest (Meteor Lake) hardware. Instead of mucking around with testing or unstable (which can be fun, but the fun can strike at inopportune times), I've installed the latest Ubuntu (24.10), which supports almost everything out of the box (Linux users will not be surprised to hear I had to tweak some driver options to get suspend/resume working).

The OS installation is pretty standard, the only non-default option I pick is to encrypt the whole disk. Having a non-encrypted disk on a device that can be easily stolen is a no-go for me. On the desktop workstation, I also set up SSH so I can remotely access it from elsewhere, but don't open any ports on the router.

I don't like Ubuntu's tweaks to the GNOME desktop and Snap packages, so if installing Ubuntu I do remove those. In general, I prefer setting up the apt repositories for 3rd party packages (getting auto-updates and all the other apt goodies). If the app doesn't have a repository but has a .deb package, I'll install that. I can also live with flatpak packages, and as a last resort, I'll manually install the app into /opt/<app> (if it's a GUI app or has many files) or /usr/local/bin (if it's a single binary).

GNOME

GNOME is a pretty opinionated piece of software. The developers' particular set of opinions resonates with me and I've been a very happy user for the past 20 years or so. I prefer vanilla GNOME interface (ie. without Ubuntu tweaks) and only need minimal customizations (mostly a few key shortcuts).

I do tend to only use the basic system and utilities. Not because I don't want to use various GNOME apps, but they just don't fit in my preferred workflow much.

1Password

I keep all my passwords in 1p, so it is the first thing that gets installed on a new machine after the OS is installed. It's very easy to set up – install via their apt repo, scan the QR code on my mobile 1Password app, and it's all there.

I also keep my SSH private keys in 1Password and set it up as the ssh agent. This way, I only need to unlock 1Password to unlock my SSH keys.

Dropbox

The next app is Dropbox. I'm a paid user and keep everything important (documents, company and personal documents, some media files, etc.) there. I also use Dropbox to auto-upload my mobile photos and videos, and symlink Pictures/, Videos/, and Music/ to the respective Dropbox folders. Another useful option I use is to scan documents, bills, etc. with my mobile and have them auto-uploaded to Dropbox. Though the quality is not the same as with a proper scanner, it's good enough for most purposes.

I don't keep my code, or the dotfiles/settings in Dropbox.

Firefox / Firefox Dev Edition

I use Firefox as my main browser, and a separate installation of Firefox Developer Edition for development work. Although Firefox is already available on both Ubuntu (via Snap) and Debian (Firefox ESR), I remove those and install the latest version directly from Mozilla's apt repositories.

I love the Multi-Account Containers feature/extension in Firefox. I also install UBlock Origin (ad/tracking blocker), 1Password (1p integration) and Kagi (search engine) extensions. I use (and pay for) Kagi as my search engine, and I'm very happy with it.

Many of the services I use daily are web-based (Fastmail for private mail, Google apps for work, GitHub, etc.).

Tailscale

I have a personal VPN provisioned with Tailscale. Setup involves installing and enabling the tailscale client and logging in with my account. Once enabled, I can connect from my laptop to my desktop from anywhere without punching holes in my router or worrying about security.

Visual Studio Code

I use VSCode as my main editor, mostly for editing Python, JavaScript and Markdown files. I use the official Microsoft binary and immediately turn off all telemetry (hopefully all!).

I heavily use the Remote SSH feature of VSCode: most of my projects live on my desktop. When on the laptop, I open them via Remote SSH, and since that goes through the Tailscale VPN, I can do this anywhere in the world. The SSH latency (for terminal work) can be a bit high when accessing from another continent, though.

One thing I worried about when first setting this up was potential conflicts if the same project is opened locally (on the desktop) and remotely (from the laptop) at the same time, but I haven't had any issues with it.

VSCode has pretty good support for Python (including my linter/formatter of choice, ruff) and JavaScript. I also use GitHub Copilot, mostly as a smart auto-completion tool.

Obsidian

I love Obsidian, but keep it simple. I use it without any extra plugins or customizations – just a bunch of Markdown files in folders. I keep the data in Dropbox to get free sync with Dropsync on my phone.

CLI tools

My terminal app of choice is Tilix. It's fast, has tiling support, and integrates nicely with the rest of the GNOME desktop. I use Bash with minimal customizations (prompt, a few aliases, and history settings).

I use vim for quick edits in terminal (no config to speak of – just syntax and smart indentation), ripgrep to search in files and fdfind to search files by name/extension. When connecting to a remote server I prefer screen (shows my age, I guess).

As a Python developer, I love the new ruff (a linter/formatter) and uv (package manager) tools so these get installed immediately (just drop them in /usr/local/bin).

I use git for version control, and my personal and work repos are hosted on GitHub. I don't use their CLI app though.

Media

OBS Studio

For any kind of screen recording or streaming, I use OBS Studio. Pretty vanilla setup, works great out of the box, I barely scratch the surface of its capabilities.

CLI tools

I use the command-line mpv for video playback, ffmpeg and friends for audio/video manipulation in the command line, and yt-dlp for downloading videos from YouTube (hey, that's not piracy, I'm a YT Premium subscriber!).

GIMP

If I need to do some image editing (cropping, resizing, adding text, minimal tweaking) I use GIMP. I'm not a graphic designer or a photographer, so GIMP is more than enough for my needs.

Spotify

I still have an old, carefully curated archive of MP3s somewhere, but these days I just use Spotify across all my devices. After a few years of use I've favorited enough songs that its recommendations are mostly on point. Not everything is there, and for that I use YouTube. I also like that I can download all my liked songs for offline use (e.g. when outside wifi and mobile coverage, on a plane, or while roaming).

Online meetings

This depends on what I'm working on and with whom, but it's some combination of Slack, Zoom, Google Meet, and Discord (ideally all in the browser whenever possible).


This list is not exhaustive, but it covers the apps I use (almost) daily and invariably need on a computer. All of them have very good alternatives, so the list is highly subjective – these are the tools I prefer and that work well in my personal workflow. Each time I (re)install a workstation there's some tweaking, but these are the ones I keep coming back to.

Conventional wisdom these days, especially for startups, is to design your software architecture to be horizontally scalable and highly available. Best practices involve Kubernetes, microservices, multiple availability zones at a hyperscaler, zero-downtime deployments, database sharding and replication, with possibly a bit of serverless thrown in for good measure.

While scalability and high availability are certainly required for some, these solutions are often recommended as a panacea. Don't you want to be ready for that hockey-stick growth? Do you want to lose millions in sales when your site is down for a few hours? Nobody got fired for choosing Kubernetes¹, so why take the chance?

In the ensuing flame wars on Hacker News and elsewhere, this camp and the opposing You Ain't Gonna Need It (YAGNI) folks talk past each other, exchange anecdata, and blame resume-padding architecture astronauts for building it wrong.

A hidden assumption in many of these discussions is that higher availability is always better. Is it, though?

Everything else being equal, having a highly available system is better than having an unreliable one, in the same way that having fewer bugs in your code is better than having more. Everything else is rarely equal, though.

Approaches to increasing code quality – PR reviews (maybe from multiple people), high test coverage (maybe with a combination of statement-level and branch-level unit tests, integration tests, functional tests, and manual testing) – increase cost in proportion to the effort put in. A balance is reached when the expected cost of a bug (in terms of money, stress, damage, etc.) equals the additional cost incurred to avoid it.
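To make that trade-off concrete, here's a back-of-the-envelope version with entirely made-up numbers:

```shell
# Hypothetical figures: what bugs cost us per year vs. what extra QA would cost.
incidents_per_year=6
cost_per_incident=2000   # support time, refunds, reputation (made-up)
extra_qa_cost=10000      # more reviews and test coverage, per year (made-up)

expected_bug_cost=$((incidents_per_year * cost_per_incident))
if [ "$expected_bug_cost" -gt "$extra_qa_cost" ]; then
    echo "extra QA pays for itself (\$$expected_bug_cost > \$$extra_qa_cost)"
else
    echo "bugs are cheaper than the prevention effort"
fi
```

The point isn't the arithmetic, it's that the comparison exists at all: once you estimate both sides, "how much quality effort is enough" stops being a matter of taste.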

You can't add just a bit of Kubernetes, though. Decisions about horizontal scalability and high availability influence the entire application architecture (whichever way you choose) and are hard to change later. The additional architecture and ops complexity, as well as the extra platform cost to support it, goes up much more easily than it comes down.

Faced with this dilemma, it pays to first understand how much availability we really need, and how quickly we would need to scale up if it came to that. This is specific to each project, so let me share two examples from my personal experience:

At my previous startup, AWW, our product was a shared online whiteboard – share a link and draw together. It was used by millions of people worldwide, across all timezones, and the real-time nature meant it had to be up pretty much all the time to be usable. If you have a meeting or a tutoring session at 2PM and are using AWW, it better be working at that time! One stressful episode involved scheduled downtime on an early Sunday morning European time and getting angry emails from paying customers in India who couldn't tutor their students on the Saturday evening.

Clearly, for AWW, the higher the availability, the better. During COVID we also experienced the proverbial hockey-stick growth: servers were constantly “on fire” and most of the tech work consisted of keeping up with demand. A lot of complexity was introduced, and a lot of time and money was spent on having a system that was as reliable as it could be, and that we could scale.

On the other hand, at API Bakery, the product is a software scaffolding tool – describe your project, click a few buttons to configure it, get the source code and you're off to the races. It's a low-engagement product with very flexible time constraints. If it's down, no biggie – you can always retry a bit later. It's also not such a high-volume product that we'd lose a bunch of sales if it were down for a few hours. Finally, it's not likely to start growing so fast that it couldn't be scaled up the traditional way (buy a few bigger servers) in a reasonable time frame (days). It would be foolish to spend anywhere near as much effort or money on making it scale.

When thinking about the high-availability and scalability needs of a system, I look at three questions (with example answers):

1) How much trouble would you be in if something bad happened:

  • low – nobody would notice
  • minor – mild annoyance to someone, they'd have to retry later; small revenue loss
  • major – pretty annoying to a lot of your users, they're likely to complain or ask for a refund; significant revenue loss
  • critical – everything's on fire, you can't even deal with the torrent of questions or complaints, incurring significant revenue and reputation loss
  • catastrophic – you're fired, your company goes under, or both

2) How often are you prepared to experience these events:

  • low – daily or weekly
  • minor – once per month
  • major – once or twice per year at most
  • critical – hopefully never?
  • catastrophic – definitely never!

3) How much downtime corresponds to each severity level:

  • low – 30s/day (AWW – we had auto-recovery so this was mostly invisible to users), 5min/day (API Bakery)
  • minor – 5min/day (AWW), 1h/day (API Bakery)
  • major – 1h/day (AWW), several hours outage (API Bakery)
  • critical – 4h+/day (AWW), several days outage (API Bakery)
  • catastrophic – 2+ days (AWW), a few weeks outage (API Bakery)

These are example answers to give you intuition about thinking in terms of expected cost. In this case, it's obvious that the availability and scalability needs of AWW and API Bakery are wildly different.
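If you prefer the usual "nines" framing, the daily budgets above translate directly into uptime percentages. A quick converter (using awk, since shell arithmetic is integer-only):

```shell
# Convert a daily downtime budget (in seconds) into an uptime percentage.
uptime_pct() {
    awk -v d="$1" 'BEGIN { printf "%.3f", (86400 - d) / 86400 * 100 }'
}

echo "30s/day  -> $(uptime_pct 30)%"    # AWW's "low" severity budget
echo "5min/day -> $(uptime_pct 300)%"   # API Bakery's "low" severity budget
```

30 seconds per day works out to about 99.965% uptime and 5 minutes per day to about 99.653% – a useful sanity check when comparing your actual needs against vendor SLA marketing.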

Quantifying the costs of implementing (or not implementing) an architecture or infrastructure decision is harder, and also depends on the experience and skill set of the people involved. Personally, it's much easier for me to whip up a VPS with a Django app, a PostgreSQL database, and the Caddy web server, with auto-backups, than it is to muck around with Helmfiles, configure K8s ingress, and get autoscaling to work – but I know there are people who feel exactly the opposite.
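The "simple" end of that spectrum really is simple. A minimal sketch of the kind of Caddyfile involved – the domain and port are placeholders, and I'm assuming the Django app sits behind an app server like gunicorn on a local port:

```
# Caddyfile sketch -- example.com and port 8000 are placeholders
example.com {
    reverse_proxy localhost:8000
}
```

Caddy obtains and renews HTTPS certificates automatically, which is a big part of why this setup stays low-maintenance.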

When quantifying the cost, I think about:

  • is this something we already know how to do?
  • if not, does it make sense to try and learn it (accepting that we will certainly do a substandard job while we're learning)?
  • can we engage outside experts, and will we be dependent on them if we do?
  • what are the infrastructure costs, and how easy is it to scale them up or down?
  • how will the added complexity impact the ongoing development, growth and maintenance/ops of the system?
  • how far can we push current/planned architecture and what would changing the approach entail?

We might not get perfect answers to all these questions, but we will be better informed and can base the decision on our specific situation, rather than cargo-culting “best practices” invented or promoted by organizations in a wildly different position.


¹ I'm not hating on Kubernetes or containers in general here. Those are just currently the most common solutions people recommend for failover and scaling.