Local ai runner

mirror of https://github.com/ggml-org/llama.vscode.git synced 2026-05-07 01:15:23 +00:00

Table of Contents

Use as local AI runner (as LM Studio, Ollama, etc.)

Overview
How to use it

Use as local AI runner (as LM Studio, Ollama, etc.)

Overview

llama-vscode could be used as a local AI runner (as LM Studio, Ollama, etc.) . Models are searched in Huggingface. After a model is selected, llama-vscode automatically downloads it and starts a llama-server with it. With this the user could start chatting with an AI.

How to use it

From llama-vscode menu select "Use as local AI runner" - llama view will be opened with buttons "llama.cpp", "Add", "Select", "Chat".
Click "llama.cpp" button to install/upgrade llama.cpp (if not yet done). The installation for Windows (with winget) and Mac (with brew) is automatic. For Linux, the user should do it manually (download the latest llama.cpp package for Linux and add the bin folder to the PATH)
Click "Add" button, enter search words to see a list of models from Huggingface, select a model, select quantization. If prefered - accept to start the model immediately. (not needed if the model is already added)
Click "Select" button and select a model to run (not needed if the model is already started in the previous step)
Click "Chat" button - a web page for chat with AI will be shown in VS Code

Enjoy talking with local AI.

https://github.com/user-attachments/assets/e75e96de-878b-43db-a45b-47cc0c554697