- File menu.ts refactored
- Predefined lists added for completion models, chat models, embeddings models, tools models and for envs
- Bugfixes
- If chat model is not selected, but a tools model is selected, it is used for generating commit messages, editing code with AI and in search_source tool
- xAI Grok 4 free (from OpenRouter) added to the initial models
- Chat with AI with project context removed (agent does it better)
- Chat with AI about llama-vscode is now with agent, not using webui
- Agent - new buttons "Tools Model" and "Agent" - possibility to view the selected model and agent and to change them.
- Added rules - setting agent_rules or llama-vscode-rules.md
- Added agent commands - setting agent_commands or llama-vscode menu item "agent commands...". These are shortcuts for frequently used prompts: in the agent, press "/" and select an agent command.
- Generate commit message now checks if there is a running chat model (or endpoint_chat)
- In the Agent UI, the response tokens are shown immediately as they arrive, not only when the complete response is received
- Bug fixes for Edit with AI
- tools_custom and context_custom settings are added
- -fa option is removed from huggingface download command
- "Add model" menu command is replaced with two commands: "Add local model" and "Add external model"
- Setting ask_install_llamacpp added to control whether llama-vscode asks the user to install llama.cpp
- Setting upgrade_llamacpp_hours added to control how often llama-vscode asks the user to upgrade llama.cpp
- If the user cancels the llama.cpp installation on startup, llama-vscode offers to disable future installation popups
- If the user cancels the llama.cpp upgrade on startup, llama-vscode offers to disable future upgrade popups (sets upgrade_llamacpp_hours to more than 8 years)
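
For a sense of scale, "more than 8 years" corresponds to a little over 70,000 hours. The sketch below only illustrates that arithmetic; the helper name `yearsToHours` is hypothetical, and the exact value llama-vscode writes to upgrade_llamacpp_hours is not stated in this changelog.

```typescript
// Hypothetical helper, not part of llama-vscode: converts years to hours,
// ignoring leap days.
const HOURS_PER_DAY = 24;
const DAYS_PER_YEAR = 365;

function yearsToHours(years: number): number {
  return years * DAYS_PER_YEAR * HOURS_PER_DAY;
}

// Any upgrade_llamacpp_hours value above this threshold is "more than 8 years".
const eightYearsInHours = yearsToHours(8); // 8 * 365 * 24 = 70080
```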
- Changes history added
- Chats can be selected, deleted, exported and imported
- llama-vscode UI (agent) is shown in a separate view now, not as part of Explorer view.
- Agent entity added - agents with different system prompts and default tools could be selected
- Fixed showing tables in llama agent
- Local envs with gpt-oss 20B added (also available for import from here)
* Increase the space for llama agent
* Fix a bug in showing llama agent
* Update the documentation for llama-vscode
* Envs with local gpt-oss for agent removed
- Llama Agent UI improved - look and feel, status, etc.
- New menus for managing completion models, chat models, embeddings models and tools models
- Concept of selected models - for completion, chat, embeddings and tools
- Orchestra concept introduced. An orchestra is a group of models. Starting (selecting) or stopping an orchestra starts (selects) or stops all of its models
- Import/export of orchestras and models from/to a file implemented
- OpenAI gpt-oss 20B added as a local model in tools models and chat models
- Predefined orchestras for different use cases - only completion, chat + completion, chat + agent, etc.
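
The orchestra idea above can be sketched roughly as follows. This is a minimal illustration only: the `Model` and `Orchestra` shapes, their fields and the model names are assumptions made for this example, not the types used inside llama-vscode.

```typescript
// Hypothetical shapes; llama-vscode's real types may differ.
type ModelKind = "completion" | "chat" | "embeddings" | "tools";

interface Model {
  name: string;
  kind: ModelKind;
  running: boolean;
}

class Orchestra {
  readonly name: string;
  private readonly models: Model[];

  constructor(name: string, models: Model[]) {
    this.name = name;
    this.models = models;
  }

  // Starting (selecting) the orchestra starts every model in the group.
  start(): void {
    for (const m of this.models) m.running = true;
  }

  // Stopping the orchestra stops every model in the group.
  stop(): void {
    for (const m of this.models) m.running = false;
  }

  // Kinds of the models that are currently running.
  runningKinds(): ModelKind[] {
    return this.models.filter((m) => m.running).map((m) => m.kind);
  }
}

// A "chat + completion" orchestra with two placeholder models.
const chatAndCompletion = new Orchestra("chat + completion", [
  { name: "completion-model", kind: "completion", running: false },
  { name: "chat-model", kind: "chat", running: false },
]);
chatAndCompletion.start();
```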
- Llama Agent UI in Explorer view
- OpenRouter API model selection (assumes your OpenRouter key is in setting Api_key_tools)
- MCP Support
- 9 internal tools available for use
- custom_tool - returns the content of a file or a web page
- custom_eval_tool - write your own tool in TypeScript/JavaScript
- Attach the selection to the context
- Configure maximum loops for Llama Agent
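
A loop limit of this kind usually bounds how many model/tool round trips an agent may take before giving up. The sketch below shows the general pattern only; `AgentStep`, `runAgent` and the result shape are hypothetical and do not describe llama-vscode's actual agent code.

```typescript
// Hypothetical step function: returns true once the agent has a final answer.
type AgentStep = (iteration: number) => boolean;

interface AgentResult {
  finished: boolean;
  loops: number;
}

// Runs the step function at most maxLoops times.
function runAgent(step: AgentStep, maxLoops: number): AgentResult {
  for (let i = 1; i <= maxLoops; i++) {
    if (step(i)) {
      return { finished: true, loops: i };
    }
  }
  // The limit was reached without a final answer.
  return { finished: false, loops: Math.max(0, maxLoops) };
}
```

A caller could surface `finished: false` to the user as "maximum loops reached" instead of letting the agent run indefinitely.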
* feat: enhance text editor functionality
- Added methods to expand selection to full lines
- Implemented functions to remove leading spaces from text
- Added functions to add leading spaces to text
* fix: don't send requests for updating the context if the completions are disabled
* refactor: remove unused code and optimize performance
- Removed duplicate code and optimized performance in `architect.ts` and `text-editor.ts`
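
The text editor helpers mentioned in this entry (expanding a selection to full lines, removing and adding leading spaces) could look roughly like the plain-string sketch below. The function names and signatures are assumptions for illustration; the real implementation in `text-editor.ts` works with the VS Code editor API and may differ.

```typescript
// Expands the offset range [start, end) so it covers whole lines.
// Returns the new [start, end) offsets (end excludes the trailing newline).
function expandToFullLines(text: string, start: number, end: number): [number, number] {
  const lineStart = start === 0 ? 0 : text.lastIndexOf("\n", start - 1) + 1;
  const nextBreak = text.indexOf("\n", end);
  const lineEnd = nextBreak === -1 ? text.length : nextBreak;
  return [lineStart, lineEnd];
}

// Counts the space characters at the beginning of a line.
function countLeadingSpaces(line: string): number {
  let n = 0;
  while (n < line.length && line[n] === " ") n++;
  return n;
}

// Removes up to `count` leading spaces from every line.
function removeLeadingSpaces(text: string, count: number): string {
  return text
    .split("\n")
    .map((line) => line.slice(Math.min(count, countLeadingSpaces(line))))
    .join("\n");
}

// Adds `count` leading spaces to every non-empty line.
function addLeadingSpaces(text: string, count: number): string {
  const pad = new Array(count + 1).join(" ");
  return text
    .split("\n")
    .map((line) => (line.length > 0 ? pad + line : line))
    .join("\n");
}
```

Removing the shared indentation before sending a selection to the model and adding it back afterwards is one plausible use of such helpers.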
* feat: add RAG configuration option `rag_max_files` to limit the number of files indexed for RAG search
* fix: update cosine similarity logic
- Updated cosine similarity function to use chunk.embedding instead of getting embedding again
- Fixed edge case where chunk.embedding is empty
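
As an illustration of the fix described above, the sketch below computes cosine similarity from an embedding already stored on the chunk and returns 0 when an embedding is empty. The `Chunk` shape and the `rankChunks` helper are assumptions for this example, not the extension's actual code.

```typescript
// Hypothetical chunk shape: the embedding is computed once and stored.
interface Chunk {
  content: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  // Edge case: an empty or mismatched embedding has no meaningful similarity.
  if (a.length === 0 || b.length === 0 || a.length !== b.length) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Zero vectors would otherwise cause a division by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Ranks chunks by similarity to the query, reusing chunk.embedding
// instead of requesting a new embedding for every comparison.
function rankChunks(queryEmbedding: number[], chunks: Chunk[]): Chunk[] {
  return chunks
    .slice()
    .sort(
      (x, y) =>
        cosineSimilarity(queryEmbedding, y.embedding) -
        cosineSimilarity(queryEmbedding, x.embedding)
    );
}
```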
* feat: update menu items
- Added "Start all models" item with description
* feat: update chat edit text prompt
- Improve formatting for instructions and original text
- Remove redundant chunks section
- Navigate to the first difference after opening diff panel
* feat: update configuration options for llama.cpp server API keys
- Added `llama-vscode.api_key_chat` and `llama-vscode.api_key_embeddings` configuration options
- Updated `llama-vscode.api_key` to use new key names
- Edit with AI - don't send chunks, navigate to the first change in the diff panel
* bug: update API key configuration
- Updated API key configuration for chat and embeddings endpoints
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
* feat: add commit message generation feature
- Implemented a new command to generate git commit messages using AI
- Added a new prompt template for generating commit messages
- Integrated the feature with the VS Code Git extension

* fix: don't add --lora option if lora_completion or lora_chat is undefined.
* fix: add a check for lora_completion and lora_chat settings not to be string "undefined" before using them.
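
The two fixes above amount to a guard like the one sketched here. The helper name `withLoraOption` and the sample launch command are hypothetical; only the `--lora` option and the check against the literal string "undefined" come from the entries themselves.

```typescript
// Appends --lora only when the setting holds a usable path.
function withLoraOption(launchCommand: string, loraPath: string | undefined): string {
  const path = (loraPath || "").trim();
  // Skip when the setting is missing, empty, or the literal string "undefined".
  if (path === "" || path === "undefined") {
    return launchCommand;
  }
  return launchCommand + " --lora " + path;
}

// Example with a placeholder launch command.
const base = "llama-server -m model.gguf";
const withAdapter = withLoraOption(base, "adapter.gguf");
const withoutAdapter = withLoraOption(base, "undefined");
```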
* Add RAG search for Ask With AI with project context
* Remove duplicated call for getting context.
* Chat with project supports providing files as context with @ prefix (i.e. @test.cpp)
* Reindex files if rag settings are changed
* Add menu item for starting embedding server on mac.
* Improve excluding the files from .gitignore; reduce the memory usage of the BM25 algorithm.
* Improve updating file chunks on save; add a progress bar for calculating embeddings for RAG search.
* Add prefix llama-vscode for the shortcut commands. This way it is easier to filter them.
* Removed sending extra context chunks to the chat server. Show an error in case of a problem with the embeddings server. If the embeddings server endpoint is not available, a message is shown and only BM25 filtering is used.
* Typing error fix in translations
* style : fix whitespaces + disable extra context for chat edit
* config : adjust params
* menu : fix embedding commands
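
The @ prefix for attaching files (for example @test.cpp) can be recognized with a small parser like the one below. This is a sketch under assumptions: the function name `extractFileMentions`, the rule that a mention must follow whitespace, and the stripping of trailing punctuation are illustrative choices, not the extension's documented behavior.

```typescript
// Returns the file names mentioned with an @ prefix, without duplicates.
function extractFileMentions(prompt: string): string[] {
  // A mention starts at the beginning of the text or after whitespace,
  // so e-mail-like text such as a@b.c is not treated as a file.
  const pattern = /(^|\s)@([^\s@]+)/g;
  const files: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prompt)) !== null) {
    // Drop punctuation that trails the file name in a sentence.
    const name = match[2].replace(/[.,;:!?)]+$/, "");
    if (name.length > 0 && files.indexOf(name) === -1) {
      files.push(name);
    }
  }
  return files;
}
```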
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
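
For readers unfamiliar with the BM25 filtering mentioned in the entries above, here is a compact sketch of the standard Okapi BM25 scoring formula. The tokenizer, the parameter defaults (`k1 = 1.5`, `b = 0.75`) and the function names are assumptions for illustration; llama-vscode's own implementation may tokenize and tune differently.

```typescript
// Very simple tokenizer: lowercase words made of letters, digits and underscores.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9_]+/).filter((t) => t.length > 0);
}

// Returns one BM25 score per document for the given query.
function bm25Scores(query: string, docs: string[], k1 = 1.5, b = 0.75): number[] {
  const n = docs.length;
  if (n === 0) return [];
  const tokenized = docs.map(tokenize);
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / n || 1;

  // Document frequency: in how many documents each term occurs.
  const df: Record<string, number> = Object.create(null);
  for (const doc of tokenized) {
    const seen: Record<string, boolean> = Object.create(null);
    for (const term of doc) {
      if (!seen[term]) {
        seen[term] = true;
        df[term] = (df[term] || 0) + 1;
      }
    }
  }

  const queryTerms = tokenize(query);
  return tokenized.map((doc) => {
    // Term frequencies for this document.
    const tf: Record<string, number> = Object.create(null);
    for (const term of doc) tf[term] = (tf[term] || 0) + 1;

    let score = 0;
    for (const term of queryTerms) {
      const f = tf[term] || 0;
      if (f === 0) continue;
      const idf = Math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5));
      const lengthNorm = 1 - b + (b * doc.length) / avgLen;
      score += (idf * f * (k1 + 1)) / (f + k1 * lengthNorm);
    }
    return score;
  });
}
```

Because BM25 needs only token counts, it can serve as a fallback filter when no embeddings endpoint is reachable.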
* Edit a selected text with a command - Ctrl+Shift+e for entering prompt, Tab for accepting the change.
* Code edits - reject the suggestion by pressing Escape
* Prompt for text edit is improved, context menu item for editing selected text added.
* Remove the context from the edit prompt as the output includes part of it; In the diff window show 25 lines before and after the change to facilitate comparison.
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
Training properties added: launch_training_completion and launch_training_chat hold the commands for starting model training.
Properties lora_completion and lora_chat were also added for the location of the .gguf LoRA files. If they are not empty, LoRA adapter options are added when the server is started from properties launch_completion and launch_chat.
The chat UI cannot use a LoRA adapter for now. This will require a change in webui.
* Implement Ask AI - open an AI chat window backed by a local model, with or without project context, from the menu or with Ctrl+; (without project context) or Ctrl+Shift+; (with project context). New property endpoint_chat for the chat server endpoint.
* Add menu items for showing ask AI windows
* Reduce menu items - remove those with 0.5B models
* Remove default value for self-signed certificate file
* various fixes
* Rename ask-ai to chat-with-ai, clear the sent chunks to ai in case the window is closed.
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* initial openai compatible api endpoint integration
* fix watch
* added openAiClientModel to config; tested with local vllm server
* fixed config and completions to work with FIM models by default
* remove unnecessary try catch
* core : remove repeating suffix of a suggestion + fix speculative FIM (#18)
* Remove repeating suffix of a suggestion
* If linesuffix is empty - cut the repeating suffix of the suggestion.
* If there is a linesuffix, suggest only one line, don't make hidden second request
* Fix the caching of the future suggestion in case of max inputPrefix length.
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
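
One way to read "remove repeating suffix of a suggestion" is: if the end of the suggestion repeats the text that already follows the cursor, cut the repeated part. The sketch below implements that interpretation only; the function name `cutRepeatingSuffix` and the exact matching rule are assumptions, and the extension's real logic may differ.

```typescript
// Cuts the longest tail of the suggestion that duplicates the start of the
// text already present after the cursor.
function cutRepeatingSuffix(suggestion: string, textAfterCursor: string): string {
  const maxOverlap = Math.min(suggestion.length, textAfterCursor.length);
  for (let len = maxOverlap; len > 0; len--) {
    const tail = suggestion.slice(suggestion.length - len);
    if (tail === textAfterCursor.slice(0, len)) {
      return suggestion.slice(0, suggestion.length - len);
    }
  }
  // No overlap: keep the suggestion unchanged.
  return suggestion;
}

// The model repeats the closing brace that is already in the file.
const trimmed = cutRepeatingSuffix("return a + b;\n}", "\n}\n\nmain();");
```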
* core : disable trimming of suggestions
* release : v0.0.6
* readme : add CPU-only configs
* fixed configuration/settings UI
* fixed conflicts
* fix watch
* fixed
* fixes
* update version
* readme : add example
* core : fix cutting the lines of a suggestion (#22)
* Fix the problem with cutting the lines of a suggestion after the first one.
* Remove the less important checks on cutting the suggestion.
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
* Fix manual trigger without cache + accept always on pressing a Tab (#25)
* Ensure Ctrl+Shift+L always makes a new request to the servers.
* If a suggestion is visible - pressing a Tab always accepts it.
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
* make api key optional for openai compatible endpoints as well
* updated to work with llama.cpp without api key
* removed this.handleOpenAICompletion() call from prepareLlamaForNextCompletion per @igardev
* updated package-lock.json after build
---------
Co-authored-by: igardev <49397134+igardev@users.noreply.github.com>
Co-authored-by: igardev <ivailo.gardev@akros.ch>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
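
Making the API key optional typically means the Authorization header is sent only when a key is configured. The sketch below shows that pattern with the Bearer scheme used by OpenAI-compatible endpoints; the helper name `buildRequestHeaders` is hypothetical and this is not the extension's actual request code.

```typescript
// Builds request headers; Authorization is added only when a key is set.
function buildRequestHeaders(apiKey?: string): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  const key = (apiKey || "").trim();
  if (key !== "") {
    headers["Authorization"] = "Bearer " + key;
  }
  return headers;
}

// A local llama.cpp server started without an API key needs no Authorization header.
const localHeaders = buildRequestHeaders();
// A hosted endpoint would receive the configured key (placeholder value here).
const hostedHeaders = buildRequestHeaders("my-secret-key");
```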