How to use

mirror of https://github.com/ggml-org/llama.vscode.git synced 2026-05-07 01:15:23 +00:00


How to use llama-vscode

llama-vscode is a VS Code extension for code completion, chat with AI, and agentic coding, focused on running models locally with llama.cpp.

Open the llama-vscode menu by clicking "llama-vscode" in the status bar or pressing Ctrl+Shift+M, then select 'Install/upgrade llama.cpp'. A restart is sometimes needed afterwards so that the path to llama-server is picked up.
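To see whether a restart is still needed after installing, you can check that the llama-server binary is visible on PATH. The following is a minimal sketch, not part of the extension: the helper name `find_llama_server` is hypothetical, and it only inspects the PATH of the Python process it runs in, which may differ from the PATH that the editor sees.

```python
import shutil
from typing import Optional


def find_llama_server(binary_name: str = "llama-server") -> Optional[str]:
    """Return the full path of the binary if it is on PATH, else None.

    Hypothetical helper for illustration; llama-vscode does not ship it.
    """
    return shutil.which(binary_name)


if __name__ == "__main__":
    path = find_llama_server()
    if path is None:
        print("llama-server not found on PATH; a restart may be needed")
    else:
        print(f"llama-server found at {path}")
```

If the binary is reported as missing right after installation, restarting the editor (or the terminal the script runs in) and checking again is a reasonable first step.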

This downloads the models (first time only) and runs llama.cpp servers locally, or uses external server endpoints, depending on the env.

For code completion - just start typing (uses the completion model).
For editing code with AI - select the code, right-click and choose 'llama-vscode Edit Selected Text with AI' (uses the chat model; tool support is not required).
For chat with AI (quick questions to a (local) AI instead of searching with Google) - select 'Chat with AI' from the llama-vscode menu (uses the chat model; tool support is not required; a llama.cpp server should be running at the model endpoint).
For agentic coding - select 'Show Llama Agent' from the llama-vscode menu (or press Ctrl+Shift+A) and start typing your questions or requests (uses the tools model, plus the embeddings model for some tools). This mode needs the most capable model; local usage is supported, but external, paid providers may give better results.
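Chat with AI expects a llama.cpp server to be running at the model endpoint, so a quick reachability check can save some guessing. The sketch below queries the `/health` route that llama-server exposes. The function name `server_is_up` is hypothetical, and the address `http://127.0.0.1:8080` is only an assumption (llama-server's default port); substitute the endpoint configured for your chat model.

```python
import urllib.error
import urllib.request


def server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and non-2xx replies
        # (HTTPError is a subclass of URLError).
        return False


if __name__ == "__main__":
    # Assumed endpoint for illustration; use the one from your own setup.
    endpoint = "http://127.0.0.1:8080"
    state = "is up" if server_is_up(endpoint) else "is not reachable"
    print(endpoint, state)
```

A `False` result can also mean the server is still loading the model, since the health route answers with an error status until the model is ready.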

If you want to use llama-vscode only for code completion, you can disable RAG from the llama-vscode menu to avoid indexing files.

If you are an existing user, you can continue using llama-vscode as before.

For more details, select 'View Documentation' from the llama-vscode menu.