You are probably tired of hearing about "AI". Ever since ChatGPT's release in 2022, "AI" has been everywhere. It's very tiring.

Meme showing Mr Krabs in front of a blackboard saying 0 days without AI nonsense
Every day since ChatGPT

However, "AI", or more accurately, large language models (LLMs), have genuinely helped me a lot with my day to day work as a frontend developer. So in this post, I thought I would share my experience with LLMs and the tricks I picked up for getting the most out of them.

Hand-drawn sketch of the word Context with a crown on it

Mind-blowing moments with LLMs

I think the best way to describe my experience with LLMs is to describe the mind-blowing moments I had while using them over the last few years.

ChatGPT goes live (December 2022)

I was amazed by ChatGPT when it first came out. Nothing before it could create or manipulate text as creatively as it did. It was just fun to see what it could come up with.

Screenshot of a poem about space kittens composed by ChatGPT
The first thing I asked ChatGPT

Naturally, like many of you, I wanted to find its limits. I asked it to write poems. I quizzed it on various topics. It did all of that surprisingly well. But then it started to struggle when I asked it to write in different languages I knew. It could handle simple requests in Czech but not more advanced ones. It spelled some words correctly in Sinhala but the sentences didn't make any sense.

Screenshot of a nonsensical letter in Sinhala by ChatGPT
Comas Miliya, I demand to know about the special bank in your Colombo warehouse

This experience shaped how I saw LLMs in the years to come: they can do amazing things, but their output can always be wrong.

The early days of GitHub Copilot (December 2022)

At first, I didn't believe that LLMs could write code. It's one thing to write cute poems but another to write code that requires logic and an understanding of programming concepts. But then I asked ChatGPT to write a few JavaScript functions and React components and it wrote perfectly valid code.

I was still hesitant when GitHub Copilot came along a few weeks later. I am extremely frugal, and back then Copilot was only available with a paid subscription. I didn't know whether I would use it enough to justify the price. But I also knew that small improvements to the everyday experience add up over time, and Copilot promised a better user experience for generating code in VS Code. ChatGPT was also having frequent outages at the time as OpenAI struggled to scale up, and I figured I could use Copilot as an alternative, even for non-coding tasks. So I signed up, and I have been paying GitHub every month since.

The early versions of GitHub Copilot's VS Code extension were extremely simple. Write a comment, get some code. Write some code, get some more code. That was all it could do. But that alone was quite helpful. It could reliably generate utility functions, example data and example code. Given a few examples, it could generate boilerplate code, configuration files and JSON data much faster than I could write them. Waiting for Copilot's autocomplete, especially for trivial things like module imports, became a habit.

Recording of Copilot's autocomplete
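To give a sense of the comment-to-code pattern, here is a hypothetical reconstruction of the kind of completion Copilot would offer. The comment on the first line is the sort I would write; the function below it is the sort Copilot would fill in (the groupBy example is made up for illustration, not a recorded completion):

```js
// Group an array of objects by the value of a given key
function groupBy(items, key) {
  return items.reduce((groups, item) => {
    // Create the group on first sight of a key, then append to it
    (groups[item[key]] ??= []).push(item);
    return groups;
  }, {});
}

groupBy(
  [
    { type: 'fruit', name: 'apple' },
    { type: 'fruit', name: 'pear' },
    { type: 'veg', name: 'leek' },
  ],
  'type'
);
// => { fruit: [apple, pear], veg: [leek] }
```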

Chatting with GitHub Copilot (Late 2023)

GitHub Copilot Chat started rolling out in late 2023 and I joined the early preview. I didn't think much of it at the time, but a few weeks later I was very glad that I had.

I was modernizing an older React project and, after putting it off as long as I could, it was finally time to migrate it from vanilla Redux and Redux-Saga to Redux Toolkit. This involved refactoring many of the existing components, which was very tedious. Then I remembered Copilot Chat and figured I could craft a prompt to do the refactoring for me. Sure enough, I came up with a prompt that worked for a single component, then sat back and ran the same prompt on each component that needed refactoring. Copilot refactored each component in seconds, where it would have taken me several minutes. I was stretched thin that month, as I was temporarily allocated to multiple projects, and the $10 for Copilot was easily the best $10 I spent that month.
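The actual prompt and components were project-specific, but the shape of that refactor is well known. As a rough sketch (with a made-up todos slice rather than the real project code), each migration collapsed a hand-written action type, action creator and switch-based reducer into a single createSlice call:

```js
import { createSlice } from '@reduxjs/toolkit';

// Hypothetical slice standing in for the real project code.
// createSlice generates the action type, action creator and
// reducer that previously had to be written by hand.
const todosSlice = createSlice({
  name: 'todos',
  initialState: [],
  reducers: {
    todoAdded(state, action) {
      // Redux Toolkit uses Immer, so "mutating" the draft is safe.
      state.push(action.payload);
    },
  },
});

export const { todoAdded } = todosSlice.actions;
export default todosSlice.reducer;
```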

Cline and Claude 3.5 Sonnet with computer use (Late 2024)

I stopped following "AI" news closely because there was nothing remarkable for a while. But then I was jolted awake from my slumber by a LinkedIn post from Vojta.

Screenshot of a LinkedIn post
What the heck is Cline?

What the heck is Cline? I decided to find out. As I downloaded Cline, I saw that it had recently added support for something called computer use, a feature of the latest Claude Sonnet model. Computer use allowed Claude Sonnet to interact directly with the OS and other applications by taking screenshots, moving the cursor and clicking on UI elements. Cline could also open terminal windows and edit files directly in VS Code. All of this sounded too good to be true so after installing Cline and signing up on OpenRouter, I immediately tried to break it.

I wanted to test all this functionality on real tasks rather than simulated ones. I had been putting off a side project for a while and I thought it would be the perfect way to test this out. So I told Cline to set up PocketBase on my machine. I expected it to fail miserably. But it generated a valid URL to the PocketBase zip file, ran wget to download it and unzip to extract it, then ran the binary in the extracted folder. That was impressive. But then it opened a browser to PocketBase's admin dashboard and completed the initial setup by entering the admin credentials. 🤯

(Note: The recordings below are a reenactment of my early interactions with Cline. The prompts are slightly different and the sessions play out differently each time due to LLMs being LLMs.)

I opened the admin dashboard to see for myself whether it had been set up. It had. I hadn't expected it to get this far and now I wanted to see how far it could go. I asked it to create a new collection in PocketBase. It did that after some more clicking. I asked it to write a React component that could display the records in the collection. It generated valid code, but running that code threw an error. Aha! I pasted the error in and asked it to fix it. I didn't know the cause of the error myself. But it surmised that the error was due to the collection not having the correct permissions and, as I watched open-mouthed, it opened the admin dashboard, clicked the small collection settings icon and changed the permissions of the collection. 🤯
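For reference, the component in question looked roughly like this (a reconstruction using the PocketBase JavaScript SDK; the URL, collection and field names are placeholders, not the actual project values). The call failed because the collection's API rules didn't yet allow listing records:

```jsx
import { useEffect, useState } from 'react';
import PocketBase from 'pocketbase';

// PocketBase serves on port 8090 by default.
const pb = new PocketBase('http://127.0.0.1:8090');

export function RecordList() {
  const [records, setRecords] = useState([]);

  useEffect(() => {
    // Returns a 403 until the collection's list rule permits access.
    pb.collection('notes')
      .getFullList()
      .then(setRecords)
      .catch(console.error);
  }, []);

  return (
    <ul>
      {records.map((record) => (
        // Assumes the collection has a "title" field.
        <li key={record.id}>{record.title}</li>
      ))}
    </ul>
  );
}
```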

This was one of the biggest surprises of my life. I suddenly realized that LLMs had crossed the line into being capable agents.

Vibe coding with Aider (Late 2024)

After my experience with Cline, I wanted to find the limits of LLM agents. While Claude Sonnet's computer use was amazing, it was rather slow and very expensive. My initial hour-long experimentation session cost $20, twice what Copilot costs for an entire month. Moreover, what I really needed help with was the code, not setting things up in the UI. Cline without computer use would have been fit for this purpose, but it had a strange bug at the time where it couldn't read files on disk. So I started looking for alternatives.

I felt that the ideal agent would be one that was terminal-based so that it could be controlled entirely from the keyboard. It would be even better if it were open source and not locked into a single model or vendor. With Aider, I found all of these things.

Screenshot of Aider running in VS Code
Aider in VS Code

So I set up Aider with Claude 3.5 Sonnet and continued my small side project. I challenged myself to use Aider as much as possible without writing code myself, and it worked surprisingly well. In two days, I was able to finish a small React app with around four screens and a mobile-first layout, with Aider writing most of the layout code using components from Mantine. Admittedly, a lot of the React code it generated was not good, but it was easy to ask for changes until it matched what I wanted. For example, I had to keep asking Aider to remove unnecessary useState hooks that it added to the components. I had also decided to use XState after my positive experience with it in Timo, but Sonnet 3.5 struggled with this. It kept generating code that used deprecated methods, since XState had changed its API a lot between v4 and the current v5. But after I wrote some v5 state machines myself and added them to the prompt as examples, it got much better at writing them.
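For anyone hitting the same wall: the example machines I added to the prompt looked something like this minimal XState v5 toggle (a simplified stand-in for the app's real machines). Among many other API changes, v5 renamed interpret() to createActor(), which is exactly the kind of thing Sonnet kept getting wrong:

```js
import { createMachine, createActor } from 'xstate';

// A minimal v5-style machine with two states.
const toggleMachine = createMachine({
  id: 'toggle',
  initial: 'inactive',
  states: {
    inactive: { on: { TOGGLE: { target: 'active' } } },
    active: { on: { TOGGLE: { target: 'inactive' } } },
  },
});

// In v4 this would have been interpret(toggleMachine).
const actor = createActor(toggleMachine);
actor.subscribe((snapshot) => console.log(snapshot.value));
actor.start();
actor.send({ type: 'TOGGLE' }); // logs "active"
```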

Since then, I have used Aider for several small projects. I find it a lot more comfortable than writing code myself, especially when I am working on side projects in the evenings or weekends. It generates code much faster than I could write it but then it also generates some truly atrocious code at times that I need to step in and fix before it compounds the mess. It still feels magical to give commands and see the changes appear in a few seconds, to detach from the code and to focus on the outcome. Andrej Karpathy shared a similar sentiment on X and accidentally introduced the term "vibe coding".

Screenshot of tweet by Andrej Karpathy about vibe coding
And thus vibe coding was born

Documentation-driven development (2025)

I haven't played around with this a lot yet, but this is the direction LLM-assisted development seems to be heading right now. I don't know if anyone has coined a term for it yet, but documentation-driven development (DDD) is how I would describe it: you ask the LLM to generate a specification or plan, then ask it to execute parts of the plan while updating the document. Using documents to guide the behavior of LLM agents is also becoming a thing, with Claude's CLAUDE.md and Aider's CONVENTIONS.md. I find it funny that documentation went from being the least sexy thing to the thing all the cool kids are doing. (I originally named this practice spec-based or document-based development, but a quick search revealed "documentation-driven development", which describes the same practice aimed at the outcome of having good documentation.)

The problem I have with this and with vibe coding is the lack of control. After years of writing code myself and being in complete control of it, it is not easy to let go of that control and to put it in the hands of a fallible LLM. This will be my challenge in the months to come. Right now, I only use Aider on my own projects and I cautiously use Copilot for client work.

With the way things are going, it is possible that writing code will become as archaic as writing assembly when agents live alongside the code in the codebase and you interact with them to make all changes to it.

Getting the most out of LLMs

Here are a few things I have found that help get the most out of LLMs when using them to assist with code.

Context is king

LLMs are just token-guessing machines. Their output is only as good as their input, so what we put into the prompt heavily influences what we get out of them. Put too much in and the model gets lost. Put too little in and it will not understand the task correctly. You need to strike a balance between the two.

I found that I was more likely to get what I wanted from a model if I used specific technical terms. I found that feeding similar code dramatically improved the consistency and quality of the generated code, especially when the model struggled to generate code using that language, framework or library. I found that including too many files caused the model to go off the rails and return garbage results.

Build intuition

LLMs are very temperamental and each model has different capabilities and quirks. Their performance can also vary a lot between different languages, frameworks and libraries. The only way to get a grip on all this is to use them and see what they can do. After using a model for some time, you start to get a sense of the kinds of tasks it can handle and how granular those tasks need to be.

Stay mainstream

This is not good advice on its own. But it is one of the best ways to get better output from LLMs. LLMs perform better on popular languages, frameworks and libraries than on lesser known ones. So staying mainstream greatly improves the velocity of LLM-assisted development.

Explain and execute

Asking an LLM to explain how to complete a task is very helpful for larger tasks and for tasks that affect many parts of the codebase. This also gives you the chance to review whether the LLM has understood your request correctly and to correct it if necessary.

Plan and execute

One-shotting, that is, getting a model to do a complex task in one go, feels cool but is rarely what you actually want. With one-shotting, you are letting the model make all the decisions, and it can decide differently than you would. To have more control over the process, it is better to ask it to create a plan first and then adjust that plan as necessary. Then, depending on its capabilities, you can ask it to execute the entire plan or parts of it.

Reduce the time to prompt

LLM-assisted development can be represented as a loop that goes from prompt to code.

Hand-drawn illustration of a continuous loop between prompt and code

Often, the slowest part of this loop is creating the prompt, as you describe what needs to be done based on the requirements and a review of the existing code. Human comprehension is the limiting factor here, and while you can't do much about that, you can take steps to improve how fast you review code and enter the prompt. In my case, I try to do this by using more keyboard shortcuts and command-line interfaces.

Keep pushing the limits

LLMs and the tools around them have come a long way since ChatGPT's release in 2022. You might already have an impression of what they are capable of. But it is very likely that your impression is out of date. So keep playing with them and pushing their limits. Like me, you might be surprised by what they can do now.

What do you think?

Please let me know what you think about this post on the forum.

You can find the discussion about this post here.

Written by Prabashwara Seneviratne (bash)
