- Setting max_parallel_completions determines how many completions to generate in parallel (default 3)
- Shortcuts: Alt+] shows the next completion, Alt+[ shows the previous completion
- Requires a llama.cpp build from after December 6, 2025 (commit c42712b), but is backward compatible (generates one completion with older versions)
- Skills (https://agentskills.io/home) can now be parsed by the LLM and added to the prompt
- The skills_folder setting determines where the skill descriptions are located. If it is empty, the <project_folder>/skills folder is used by default
- Anthropic models currently support skills best. The open source models will probably catch up.
* menu.ts refactored: service classes extracted
- Agent "Ask" added for asking questions about the project without changing the files
- Predefined free models from OpenRouter added (xAI removed, as it is no longer free)
- Some bugs fixed
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
- Changes history added
- Chats can be selected, deleted, exported and imported
- llama-vscode UI (agent) is shown in a separate view now, not as part of Explorer view.
- Agent entity added - agents with different system prompts and default tools could be selected
- Fixed showing tables in llama agent
- Local envs with gpt-oss 20B added (also available for import from here)
Llama Agent UI improved - look and feel, status, etc.
New menus for managing completion models, chat models, embeddings models and tools models
Concept of selected models - for completion, chat, embeddings and tools
Orchestra concept introduced. An orchestra is a group of models. Starting (selecting) or stopping an orchestra starts (selects) or stops all of its models
Import/Export orchestra and models from/to file implemented
OpenAI gpt-oss 20B added as a local one in tools models and chat models
Predefined Orchestras for different use cases - only completion, chat + completion, chat + agent, etc.
- Llama Agent UI in Explorer view
- OpenRouter API model selection (assumes your OpenRouter key is in the setting Api_key_tools)
- MCP Support
- 9 internal tools available for use
- custom_tool - returns the content of a file or a web page
- custom_eval_tool - write your own tool in TypeScript/JavaScript
- Attach the selection to the context
- Configure maximum loops for Llama Agent
* feat: add commit message generation feature
- Implemented a new command to generate git commit messages using AI
- Added a new prompt template for generating commit messages
- Integrated the feature with the VS Code Git extension

* Add RAG search for Ask With AI with project context
* Remove duplicated call for getting context.
* Chat with project supports providing files as context with @ prefix (i.e. @test.cpp)
* Reindex files if rag settings are changed
* Add menu item for starting embedding server on mac.
* Improve excluding the files listed in .gitignore; reduce the memory usage of the BM25 algorithm.
* Improve updating file chunks on save; add a progress bar for calculating embeddings for RAG search.
* Add prefix llama-vscode for the shortcut commands. This way it is easier to filter them.
* Stop sending extra context chunks to the chat server. Show an error if there is a problem with the embeddings server. If the embeddings server endpoint is not available, show a message and use only BM25 filtering.
* Typing error fix in translations
* style : fix whitespaces + disable extra context for chat edit
* config : adjust params
* menu : fix embedding commands
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* initial openai compatible api endpoint integration
* fix watch
* added openAiClientModel to config; tested with local vllm server
* fixed config and completions to work with FIM models by default
* remove unnecessary try catch
* core : remove repeating suffix of a suggestion + fix speculative FIM (#18)
* Remove repeating suffix of a suggestion
* If linesuffix is empty - cut the repeating suffix of the suggestion.
* If there is a linesuffix, suggest only one line, don't make hidden second request
* Fix the caching of the future suggestion in case of max inputPrefix length.
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
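
One plausible reading of the "remove repeating suffix" fix above is sketched below. This is a minimal illustration, not the extension's actual code: the function name `trimRepeatingSuffix` and the parameter `textAfterCursor` are hypothetical, and the assumption is that a suggestion should not end with text that already follows the cursor.

```typescript
// Hypothetical helper, not taken from the extension's source.
// Assumption: the "repeating suffix" is the longest tail of the suggestion
// that is also the beginning of the text already present after the cursor.
function trimRepeatingSuffix(suggestion: string, textAfterCursor: string): string {
  const maxOverlap = Math.min(suggestion.length, textAfterCursor.length);
  // Try the longest possible overlap first, then shrink.
  for (let len = maxOverlap; len > 0; len--) {
    if (suggestion.endsWith(textAfterCursor.slice(0, len))) {
      // Drop the overlapping tail so it is not inserted a second time.
      return suggestion.slice(0, suggestion.length - len);
    }
  }
  // No overlap: keep the suggestion unchanged.
  return suggestion;
}
```

With this sketch, a suggestion of `foo(bar);` in front of an existing `);` would be shortened to `foo(bar`, while a suggestion with no overlap is left as it is.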
* core : disable trimming of suggestions
* release : v0.0.6
* readme : add CPU-only configs
* fixed configuration/settings UI
* fixed conflicts
* fix watch
* fixed
* fixes
* update version
* readme : add example
* core : fix cutting the lines of a suggestion (#22)
* Fix the problem with cutting the lines of a suggestion after the first one.
* Remove the less important checks on cutting the suggestion.
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
* Fix manual trigger without cache + accept always on pressing a Tab (#25)
* Ensure Ctrl+Shift+L always makes a new request to the servers.
* If a suggestion is visible - pressing a Tab always accepts it.
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
* make api key optional for openai compatible endpoints as well
* updated to work with llama.cpp without api key
* removed this.handleOpenAICompletion() call from prepareLlamaForNextCompletion per @igardev
* updated package-lock.json after build
---------
Co-authored-by: igardev <49397134+igardev@users.noreply.github.com>
Co-authored-by: igardev <ivailo.gardev@akros.ch>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* First draft version of llama.vscode plugin
* Update the instructions on how to run the llama.cpp server on Mac in the Readme file. Remove unneeded imports and unused variables.
* Fix the problem with not sending extra context.
* Reduce the number of requests on fast typing or deleting. Fix an error on suggestion when the cursor is on the last line.
* Reduce last completion if the typed chars are the first chars of it
* Small fixes and improvements
- Next word in case of end of line should be the first word of the next line
- Similar for next line
- Avoid sending requests on accepting next word or next line.
* Fix the problem with wrong prompt sending on removing chars with backspace.
No caching if the suggestion is empty.
n_indent parameter added in the request to llama-server
* Revert the publisher name change as it results in error on creating the installation file
* - Ctrl+Shift+L forces triggering a request (no cache)
- Status message improved
- other minor fixes
* - n_indent added in the request to llama server
- Ctrl+Alt+C - copy chunks to the clipboard
* - n_indent parameter is now correctly sent to llama server
- Search in cache extended - now searches for a match that partially or completely includes the prompt
- Messages in the status bar are now short (for users, not for developers)
- Setting for choosing language for the status bar messages is added.
* Readme file updated and other small refactorings.
* Run the slow pick-chunks operations asynchronously
* Fix error on search in cache.
Don't send a request if one is still running.
* Fix error on ignoring the result of the last request in some situations.
* Change the key assignment:
Copy chunks and cache: Ctrl+Shift+,
Accept next word: Ctrl+right arrow
* - Improve the time calculation for status bar
- Show additional info only if show_info is true. Always show basic info (thinking..., no suggestion + time)
* Fix the error on accepting a line when the cursor is at the last line.
* Minor whitespace clean-up
* More whitespaces
* Update readme
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
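
The entry "Reduce last completion if the typed chars are the first chars of it" in the list above can be pictured with a short sketch. It is an illustration under stated assumptions, not the plugin's real implementation: `reduceSuggestion` and both parameter names are hypothetical, and it assumes the last suggestion and the characters typed since it was shown are available as plain strings.

```typescript
// Hypothetical helper, not taken from the plugin's source.
// If the user typed the first characters of the last suggestion, return the
// remaining part so it can be shown without a new server request.
// Returns undefined when the suggestion no longer applies or is used up.
function reduceSuggestion(lastSuggestion: string, typed: string): string | undefined {
  if (!lastSuggestion.startsWith(typed)) {
    // The typed text diverged from the suggestion, so it cannot be reused.
    return undefined;
  }
  const rest = lastSuggestion.slice(typed.length);
  return rest.length > 0 ? rest : undefined;
}
```

For example, after a suggestion of `console.log(x)` the user types `con`; the sketch returns `sole.log(x)`, which could be displayed immediately instead of requesting a new completion.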