Tuning out the AI bro snakeoil - how to use LLMs and stay sane

So this is an article I originally wrote last year for our agency’s internal knowledge base. It’s primarily aimed at junior devs, and I thought I’d share it on my public blog for others to peruse.

To be clear, I use LLMs daily for work and play, whether that’s within Neovim or Zed, via locally run Ollama models, or through the ChatGPT/Gemini web interfaces, so I’m hardly an anti-LLM naysayer. But it’s definitely a breathlessly over-hyped sector.

This article also deals mostly with practical rather than ethical considerations, and is concerned with the realm of coding. Topics like bias, consent and copyright - and the rationale for, and the moral and economic impact of, for-profit corporations attempting to hoover up and repackage human knowledge, artistry and creativity - are a whole other minefield.

State of play

AI in general as well as its use in coding has been the subject of reams of hype lately, although we are arguably well into the trough of disillusionment.

There are essentially two main UIs for using AI to help you code: in-editor extensions that suggest or autocomplete code as you type (GitHub Copilot being the best-known example), and chat interfaces where you describe a problem or paste in code and ask questions (ChatGPT, Gemini, Claude and the like).

Other notable solutions in this space include Cursor (VS Code with AI baked in), Zed (a Rust-based text editor which now includes Claude and its own Zed AI out of the box), Windsurf (Codeium’s VS Code fork), Phind (an AI-powered code search engine) and Ollama (an easy way to run local AI models, if your machine/GPU stack is powerful enough).
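
As an aside, one nice thing about Ollama is that it exposes a simple local HTTP API, so you can wire a local model into your own scripts without touching any cloud service. Here’s a minimal sketch in Python (assuming the Ollama server is running on its default port and you’ve already pulled a model - the model name below is just an example):

```python
# Minimal sketch: querying a locally running Ollama server from Python.
# Assumes Ollama is running on its default port (11434) and that you've
# already pulled a model, e.g. `ollama pull llama3` -- the name is illustrative.
import requests

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    # The non-streaming endpoint returns the full completion under "response"
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask_local_model("Explain Python list comprehensions in one sentence."))
```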

An emerging trend is the so-called “agentic” mode, where LLMs carry out tasks autonomously. Cursor, Windsurf and Copilot are all exploring this, as is Devin - an “AI developer teammate” that by many accounts is pretty much useless. These all tend to be fairly impressive on a surface level, but notoriously error-prone in practice, and currently nowhere near production ready - in my experience at least.

The AI sector is extremely fast moving, so by the time you read this, there will probably be a dozen other popular tools.

While AI can be a useful productivity aid if used judiciously, it does have some quite important limitations. I’ll delve into these below and try to highlight the main shortcomings and trade-offs.

Limitations

1. Accuracy and Reliability

Whatever the latest Silicon Valley hypeman wants you to think, AI models are very far from infallible. They can generate code snippets or responses that seem correct at first glance but contain subtle errors. Small mistakes can easily lead to significant bugs or security vulnerabilities.

In fact by some accounts/measurements, LLMs are wrong about 50% of the time. However they are usually right enough to still be useful, as long as you know how to audit their output, and don’t just blindly accept their suggestions.

One of the risks is that users might trust AI suggestions too much, potentially overlooking critical errors that they would otherwise catch during manual coding. So-called agentic solutions particularly encourage this.

The AI’s training data tends to contain outdated or deprecated practices, leading to recommendations that may not adhere to the latest standards or best practices. Just because a solution is popular doesn’t mean it’s any good.

Finally, the code the LLM generates might be technically correct, but truly awful from a stylistic or maintainability perspective. Unless you keep it on a short leash, it has a definite propensity towards being super verbose, repetitive or downright weird.

2. Context Awareness

AI models often struggle with maintaining context over extended interactions. While they are generally OK at generating responses based on the immediate input, they tend to lose track of the broader context in longer or more complex sessions, or get sidetracked into dead ends, requiring users to constantly restate instructions or summarise previous interactions. The so-called “context window” is getting larger, and there are prompting techniques you can use to improve a chatbot’s focus, but YMMV.

Furthermore, you might find subtle differences in output from response to response within the same session, which can easily introduce bugs or unforeseen side effects - LLMs can be a bit over-eager to suggest changes you didn’t ask for, or might try to “correct” code that is explicitly engineered in a particular way.

For coding extensions, understanding the entire codebase context is challenging. For example, they might suggest code that works reasonably well in isolation, but doesn’t follow established patterns or integrate well with the overall system architecture. Systems like Cursor and Windsurf are supposedly designed to address this, but results can be hit and miss.

If you’re using an extension to help autocomplete repetitive code blocks, it may get the pattern mostly right, but usually introduces errors every few iterations.
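
To illustrate the kind of slip I mean, here’s a made-up example (not taken from any particular tool) of repetitive field-mapping code where the suggested pattern quietly drifts partway through:

```python
# Illustrative (hypothetical) example of autocomplete "pattern drift":
# mapping a handful of near-identical fields from an API payload.
def to_invoice_row(payload: dict) -> dict:
    return {
        "customer_name": payload["customer"]["name"],
        "customer_email": payload["customer"]["email"],
        "billing_city": payload["billing"]["city"],
        # A typical slip: the line below still looks right at a glance,
        # but reads from the wrong branch of the payload.
        "shipping_city": payload["billing"]["city"],  # should be payload["shipping"]["city"]
    }
```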

3. Security Concerns

Any AI-generated code and responses must be carefully scrutinised to avoid introducing vulnerabilities.

For example, they can suggest code vulnerable to injection attacks or other security flaws if not properly vetted.
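
A classic illustration is string-built SQL versus a parameterised query - the first is exactly the sort of thing an assistant will happily produce if the prompt nudges it that way. This sketch uses Python’s standard sqlite3 module:

```python
import sqlite3

# Hypothetical example of injection-prone generated code versus the
# parameterised version you actually want.
def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # BAD: user input is interpolated straight into the SQL string, so a
    # value like "x' OR '1'='1" changes the meaning of the query.
    return conn.execute(
        f"SELECT id, username FROM users WHERE username = '{username}'"
    ).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Better: let the driver bind the value as a parameter.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()
```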

Users can also easily and inadvertently share sensitive information with AI services, which is problematic when those services process input on external servers.

4. Creativity and Innovation

AI can assist with routine tasks and provide creative suggestions, but it cannot replace human ingenuity, intuition and innovation.

One common use case is leveraging LLMs for writing text content such as documentation or articles. The problem is that chatbots have a certain overly verbose and bullet-point ridden written style which often feels off and more than a bit… cringe. This means it’s usually quite obvious when someone has been using AI to generate text - it’s much better to write in your own voice than to sound like every smear of SEO slop on the interweb. I find it’s generally more productive to use an LLM to check and finesse your article, rather than the other way around.

There is also a related ethical and legal point: LLMs have been trained on human-generated material, much of it copyrighted.

Back in the code domain, an over-dependence on AI tools might stifle a developer’s ability to think critically and creatively solve problems for themselves, which hampers learning.

5. Hallucinations

AI models, including those used in coding extensions and chat applications, often produce outputs that are factually incorrect or nonsensical. This phenomenon is known as a “hallucination.”

AI may generate code or responses that seem plausible but are entirely wrong. For instance, it might suggest a non-existent library or completely invented API usage. Other errors are more subtle, like code that doesn’t compile because it’s missing essential imports or type declarations, or has scoping issues. Or, arguably harder to spot, the output might be riddled with non-obvious domain logic inconsistencies.
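
To make that concrete, here’s a fabricated example of the shape these hallucinations often take - the module and functions below are deliberately invented, but the call site reads like it could plausibly exist:

```python
# Fabricated example of a hallucinated API: none of these names exist,
# but the code looks entirely plausible at a glance.
import requests_cache_plus  # invented module -- pip install will fail

def fetch_profile(user_id: int):
    # `get_json` and `retries_with_backoff` are also invented; no real
    # library provides these helpers with these signatures.
    return requests_cache_plus.get_json(
        f"https://api.example.com/users/{user_id}",
        retries_with_backoff=3,
    )
```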

Additionally, LLMs usually present absolutely incorrect information with very high confidence, making it difficult for users to discern errors if they are not themselves subject matter experts, or not paying close enough attention.

Best Practices

So how do we mitigate these issues? These are some of the ways I’ve found that enable me to use AI coding tools more effectively.

  • CHECK AND CHECK AGAIN: Always review and validate AI-generated code and responses to ensure accuracy and appropriateness. Regularly audit for potential security vulnerabilities. Treat the code as if it came from an extremely inexperienced junior (albeit one with the sum total of the internet in their digital neurons), or a random stack overflow post. Use it as a starting point, but never trust it absolutely.
  • KEEP LEARNING: Basically, make sure you know what you’re doing. Stay updated with the latest best practices, coding standards, as well as security and ethical guidelines so that you can accurately vet AI suggestions.
  • KISS: Provide clear and concise context when interacting with AI tools to improve the relevance and accuracy of their outputs. The smaller and simpler the task, the more useful the results tend to be.
  • RTFM: Cross-check AI-generated outputs against reliable sources or documentation to ensure their correctness and practical applicability. Run the code! Write tests! (See the sketch after this list.)
  • BRAINSTORM: One area where chat bots are quite helpful is when we’re gaming out high level solutions. Provide general context and see what it comes up with, but take any specific prescriptions with a generous pinch of salt.
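
As a small illustration of the “write tests” point, even a handful of quick checks will catch the most common failure modes in a generated helper. This is just a sketch - slugify here stands in for whatever function the assistant wrote for you:

```python
# Quick sanity checks for a hypothetical AI-generated `slugify` helper.
# Run with pytest; plain asserts are enough for a first pass.
from myproject.text import slugify  # hypothetical location of the generated code

def test_basic_slug():
    assert slugify("Hello, World!") == "hello-world"

def test_collapses_whitespace_and_punctuation():
    assert slugify("  Tuning   out the -- snakeoil  ") == "tuning-out-the-snakeoil"

def test_empty_input():
    # Edge cases are exactly where generated code tends to fall over.
    assert slugify("") == ""
```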

Treat AI as a glorified autocomplete, a cue to create your own code, a semi-sentient search engine, a sort of idiot savant coding partner, and you won’t go too far wrong.

But remember not to rely on it too much, and to always be wary of blindly accepting anything that it churns out. We’re a long way from letting AI write whole codebases, unless you’re happy with an unmaintainable pile of bananas garbage (you might be, I don’t know, but I’m not). Try to tune out the AI bro snakeoil, there’s a lot of it about.

At the end of the day, whatever tools you use to produce it - always remember that you are responsible for the resulting code.

Addendum: Personally, I find LLMs useful for tasks such as writing tests, autocompleting boilerplate, exploring languages or codebases or patterns I’m less familiar with, writing throwaway scripts that I don’t need to maintain, or brainstorming higher level architectural or implementation decisions - but probably less so for writing robust production code. Usually it’s quicker overall for me to just browse documentation and write it myself, rather than reviewing and debugging whatever cranky code the LLM spews out. Yes, code review is a valuable endeavour and skill, but I’d rather spend my time reviewing code by actual humans, where there is some tangible mutual benefit.

Essentially, LLMs are a tool like any other, and I use them in the round, alongside my other tools.