Replacing GitHub Copilot with Local LLMs

Frank Fiegel · Oct 11 · Dev Community

As part of developing Glama, I try to stay at the cutting edge of everything AI, especially when it comes to LLM-enabled development. I've tried GitHub Copilot, Supermaven, and many other AI code completion tools. However, earlier this week I gave locally hosted LLMs a try, and I am not coming back.

Setup

These instructions assume that you are a macOS user.

The setup takes no more than a few minutes.

Download and install Ollama.

What about LM Studio? I saw a few posts debating one over the other. LM Studio has an intuitive UI; Ollama does not. However, my research led me to believe that Ollama is faster than LM Studio.

Install the model that you want to use.

ollama pull starcoder2:3b

I've evaluated a few and landed on starcoder2:3b. It provides a good balance of usefulness and inference speed.

For context, the following table shows the speed of each model.

Model           Tokens/second
starcoder2:3b   99
llama3.1:8b     54
codestral:22b   21
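To put those numbers in perspective, here is a quick back-of-the-envelope latency estimate. The ~30-token completion length is an illustrative assumption, not a measured figure:

```python
# Rough latency per completion at the measured speeds above.
# completion_tokens is a hypothetical, illustrative figure.
speeds = {"starcoder2:3b": 99, "llama3.1:8b": 54, "codestral:22b": 21}
completion_tokens = 30

for model, tokens_per_second in speeds.items():
    latency = completion_tokens / tokens_per_second
    print(f"{model}: ~{latency:.2f}s per completion")
```

At roughly 0.3 seconds per completion, starcoder2:3b is the only one of the three that feels instant while typing.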

Finally, install continue.dev – a VSCode extension that enables tab completion (and chat) using local LLMs.

Then update continue.dev settings to use the desired model.

{
  "models": [
    {
      "title": "Starcoder2",
      "provider": "ollama",
      "model": "starcoder2:3b"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Starcoder2",
    "provider": "ollama",
    "model": "starcoder2:3b"
  }
}
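For reference, Continue reads its settings from ~/.continue/config.json by default (path assumed from Continue's documentation). A minimal sketch that writes the config above and checks that the JSON parses before you reload VSCode:

```shell
# Assumed default config path for the Continue extension.
CONFIG="$HOME/.continue/config.json"
mkdir -p "$(dirname "$CONFIG")"

cat > "$CONFIG" <<'EOF'
{
  "models": [
    { "title": "Starcoder2", "provider": "ollama", "model": "starcoder2:3b" }
  ],
  "tabAutocompleteModel": {
    "title": "Starcoder2",
    "provider": "ollama",
    "model": "starcoder2:3b"
  }
}
EOF

# Sanity-check that the file is valid JSON.
python3 -m json.tool "$CONFIG" > /dev/null && echo "config OK"
```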

Restart VSCode and you should be good to go.

Ensure that you've disabled GitHub Copilot and other overlapping VSCode extensions.

Pros and Cons

Pros

  • Offline Availability: Work anywhere without relying on an internet connection.
  • Privacy: Your code and prompts never leave your machine, ensuring maximum data privacy.
  • Customization: Ability to fine-tune models to your specific needs or codebase.
  • No Subscription Costs: Once set up, there are no ongoing fees unlike many cloud-based services.
  • Consistent Performance: No latency issues due to poor internet connection or server load.
  • Open Source: Many local LLMs are open-source, allowing for community improvements and transparency.
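On the customization point: Ollama supports Modelfiles, which let you derive a variant of a base model with your own parameters or system prompt. A minimal sketch (the derived model name and context size are illustrative, not prescriptive):

```shell
# Derive a custom variant of starcoder2:3b via an Ollama Modelfile.
cat > Modelfile <<'EOF'
FROM starcoder2:3b
PARAMETER num_ctx 4096
EOF

# Then register it with Ollama and point continue.dev at the new name:
#   ollama create starcoder2-custom -f Modelfile
```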

Cons

  • Initial Setup Time: Requires some time and technical knowledge to set up properly.
  • Hardware Requirements: Local LLMs can be resource-intensive, requiring a reasonably powerful machine.
  • Limited Model Size: Typically, local models are smaller than their cloud-based counterparts, which might affect performance for some tasks.
  • Manual Updates: You need to manually update models and tools to get the latest improvements.

Closing Thoughts

I was hesitant to adopt local LLMs because services like GitHub Copilot "just work." However, as I've been traveling the world, I often found myself regretting having to depend on an Internet connection for my autocompletions. In that sense, switching to a local model has been a huge win for me. If Internet connectivity were not an issue, I think services like Supermaven would still be very appealing and worth the cost.

If you are not familiar with Supermaven and you are okay with depending on an Internet connection, it's worth checking out. Compared to GitHub Copilot, I found Supermaven's autocompletion to be much more reliable and much faster.

However, if you are like me and want your code completion to work with or without an Internet connection, then this setup is definitely worth a try.
