Commit graph

9 commits

Author SHA1 Message Date
Alex619829
e35880e7ac Add code search with zoekt support (#8827)
This PR adds zoekt as a code search engine for forgejo. This Pull Request is a continuation of the discussion #8302.
The meilisearch search engine was not suitable, as it is not designed for searching by code. The zoekt project was proposed instead. Zoekt copes well with code indexing, but its operating principle differs from such search engines as elasticsearch.
While elasticsearch can return a result in a ready-made form (with pagination, ready-made snippets, etc.) and forgejo only needs to show this result in the interface with a little work with the data, zoekt works completely differently.

Zoekt finds matches in the repository index and returns a response. The response contains a line with the search word, its number from the file, and also a context, if specified in the request. This response is not suitable for Forgejo, so you need to assemble it yourself. To assemble the response from Zoekt into a form acceptable for Forgejo, I had to write some code and create a new function `searchZoektResult`, since the existing `searchResult` function is completely unsuitable for this search engine. I also had to write logic for pagination, highlighting, and correct display of lines in found snippets with a match, but this is a feature of Zoekt.
At the moment, Zoekt does not support deleting a repository index by repo_id, it only supports complete deletion of all repositories. But I still implemented the Delete function, which deletes a specific repository by its ID.

Co-authored-by: Aleksandr Gamzin <gamzin@altlinux.org>
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/8827
Reviewed-by: Shiny Nematoda <snematoda@noreply.codeberg.org>
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
2026-05-28 20:52:34 +02:00
Shiny Nematoda
cdc27b0d62 feat: add support to opt-in for fuzzy search (#10378)
The rationale for keeping it behind a flag is due to fuzzy search being computationally intensive #5261
Admins may opt-in by setting the `[indexer].REPO_INDEXER_FUZZY_ENABLED` flag to true.

Closes #10331

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/10378
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Co-authored-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
Co-committed-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
2025-12-17 13:51:48 +01:00
chavacava
99d697263f chore(cleanup): replaces unnecessary calls to formatting functions by non-formatting equivalents (#7994)
This PR replaces unnecessary calls to formatting functions (`fmt.Printf`, `fmt.Errorf`, ...) by non-formatting equivalents.
Resolves #7967

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/7994
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Co-authored-by: chavacava <chavacava@noreply.codeberg.org>
Co-committed-by: chavacava <chavacava@noreply.codeberg.org>
2025-05-29 17:34:29 +02:00
Gusted
2457f5ff22 chore: branding import path (#7337)
- Massive replacement of changing `code.gitea.io/gitea` to `forgejo.org`.
- Resolves forgejo/discussions#258

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/7337
Reviewed-by: Earl Warren <earl-warren@noreply.codeberg.org>
Reviewed-by: Michael Kriese <michael.kriese@gmx.de>
Reviewed-by: Beowulf <beowulf@beocode.eu>
Reviewed-by: Panagiotis "Ivory" Vasilopoulos <git@n0toose.net>
Co-authored-by: Gusted <postmaster@gusted.xyz>
Co-committed-by: Gusted <postmaster@gusted.xyz>
2025-03-27 19:40:14 +00:00
Shiny Nematoda
3816db68aa feat(code search): replace fuzzy search with union search for indexer (#6947)
Fuzzy searching for code has been known to be problematic #5264 and in my personal opinion isn't very useful.

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/6947
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Co-authored-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
Co-committed-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
2025-03-11 21:22:51 +00:00
Shiny Nematoda
ee214cb886 feat: filepath filter for code search (#6143)
Added support for searching content in a specific directory or file.

Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/6143
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Reviewed-by: 0ko <0ko@noreply.codeberg.org>
Co-authored-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
Co-committed-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
2024-12-22 12:24:29 +00:00
6543
d5319feb85 Refactor code_indexer to use an SearchOptions struct for PerformSearch (#29724)
similar to how it's already done for the issue_indexer

---
*Sponsored by Kithara Software GmbH*

Conflicts:
	routers/web/repo/search.go
2024-03-18 12:25:05 +00:00
6543
38c3cc4eb7
Patch in exact search for meilisearch (#29671)
meilisearch does not have an search option to contorl fuzzynes per query
right now:
 - https://github.com/meilisearch/meilisearch/issues/1192
 - https://github.com/orgs/meilisearch/discussions/377
 - https://github.com/meilisearch/meilisearch/discussions/1096

so we have to create a workaround by post-filter the search result in
gitea until this is addressed.

For future works I added an option in backend only atm, to enable
fuzzynes for issue indexer too.
And also refactored the code so the fuzzy option is equal in logic to
code indexer

---
*Sponsored by Kithara Software GmbH*
Conflicts:
	routers/web/repo/search.go
	trivial context confict s/isMatch/isFuzzy/
2024-03-11 23:37:00 +07:00
Jason Song
375fd15fbf
Refactor indexer (#25174)
Refactor `modules/indexer` to make it more maintainable. And it can be
easier to support more features. I'm trying to solve some of issue
searching, this is a precursor to making functional changes.

Current supported engines and the index versions:

| engines | issues | code |
| - | - | - |
| db | Just a wrapper for database queries, doesn't need version | - |
| bleve | The version of index is **2** | The version of index is **6**
|
| elasticsearch | The old index has no version, will be treated as
version **0** in this PR | The version of index is **1** |
| meilisearch | The old index has no version, will be treated as version
**0** in this PR | - |


## Changes

### Split

Splited it into mutiple packages

```text
indexer
├── internal
│   ├── bleve
│   ├── db
│   ├── elasticsearch
│   └── meilisearch
├── code
│   ├── bleve
│   ├── elasticsearch
│   └── internal
└── issues
    ├── bleve
    ├── db
    ├── elasticsearch
    ├── internal
    └── meilisearch
```

- `indexer/interanal`: Internal shared package for indexer.
- `indexer/interanal/[engine]`: Internal shared package for each engine
(bleve/db/elasticsearch/meilisearch).
- `indexer/code`: Implementations for code indexer.
- `indexer/code/internal`: Internal shared package for code indexer.
- `indexer/code/[engine]`: Implementation via each engine for code
indexer.
- `indexer/issues`: Implementations for issues indexer.

### Deduplication

- Combine `Init/Ping/Close` for code indexer and issues indexer.
- ~Combine `issues.indexerHolder` and `code.wrappedIndexer` to
`internal.IndexHolder`.~ Remove it, use dummy indexer instead when the
indexer is not ready.
- Duplicate two copies of creating ES clients.
- Duplicate two copies of `indexerID()`.


### Enhancement

- [x] Support index version for elasticsearch issues indexer, the old
index without version will be treated as version 0.
- [x] Fix spell of `elastic_search/ElasticSearch`, it should be
`Elasticsearch`.
- [x] Improve versioning of ES index. We don't need `Aliases`:
- Gitea does't need aliases for "Zero Downtime" because it never delete
old indexes.
- The old code of issues indexer uses the orignal name to create issue
index, so it's tricky to convert it to an alias.
- [x] Support index version for meilisearch issues indexer, the old
index without version will be treated as version 0.
- [x] Do "ping" only when `Ping` has been called, don't ping
periodically and cache the status.
- [x] Support the context parameter whenever possible.
- [x] Fix outdated example config.
- [x] Give up the requeue logic of issues indexer: When indexing fails,
call Ping to check if it was caused by the engine being unavailable, and
only requeue the task if the engine is unavailable.
- It is fragile and tricky, could cause data losing (It did happen when
I was doing some tests for this PR). And it works for ES only.
- Just always requeue the failed task, if it caused by bad data, it's a
bug of Gitea which should be fixed.

---------

Co-authored-by: Giteabot <teabot@gitea.io>
2023-06-23 12:37:56 +00:00