Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

Join the chat at https://discord.gg/MT27AG5EVE. License: MIT.

Tantivy

Tantivy is a full text search engine library written in Rust.

It is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense it is not an off-the-shelf search engine server, but rather a crate that can be used to build such a search engine.

Tantivy is, in fact, strongly inspired by Lucene's design.

Benchmark

The following benchmark breaks down performance for different types of queries and collections.

Your mileage WILL vary depending on the nature of queries and their load.

Features

  • Full-text search
  • Configurable tokenizer (stemming available for 17 Latin languages, with third-party support for Chinese (tantivy-jieba and cang-jie), Japanese (lindera and tantivy-tokenizer-tiny-segmenter), and Korean (lindera + lindera-ko-dic-builder))
  • Fast (check out the 🐎 benchmark 🐎)
  • Tiny startup time (<10ms), perfect for command line tools
  • BM25 scoring (the same as Lucene)
  • Natural query language (e.g. (michael AND jackson) OR "king of pop")
  • Phrase queries (e.g. "michael jackson")
  • Incremental indexing
  • Multithreaded indexing (indexing English Wikipedia takes < 3 minutes on my desktop)
  • Mmap directory
  • SIMD integer compression when the platform/CPU includes the SSE2 instruction set
  • Single valued and multivalued u64, i64, and f64 fast fields (equivalent of doc values in Lucene)
  • &[u8] fast fields
  • Text, i64, u64, f64, dates, and hierarchical facet fields
  • LZ4 compressed document store
  • Range queries
  • Faceted search
  • Configurable indexing (optional term frequency and position indexing)
  • Cheesy logo with a horse
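
To make the BM25 bullet above more concrete, here is a minimal sketch of the textbook BM25 formula for one term in one document, using the smoothed idf form that Lucene documents. It is not Tantivy's actual scoring code: the function names are hypothetical, and the parameters k1 = 1.2 and b = 0.75 are the conventional defaults, assumed here rather than read from Tantivy.

```rust
/// Conventional BM25 parameters (assumed here, not read from Tantivy).
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Inverse document frequency in the smoothed form, which is never negative.
fn idf(doc_count: u64, doc_freq: u64) -> f64 {
    let n = doc_count as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of a single term to a single document (hypothetical helper).
fn bm25_term_score(
    term_freq: u32,
    doc_len: u32,
    avg_doc_len: f64,
    doc_count: u64,
    doc_freq: u64,
) -> f64 {
    let tf = f64::from(term_freq);
    // Longer-than-average documents get a larger denominator, hence a lower score.
    let length_norm = K1 * (1.0 - B + B * f64::from(doc_len) / avg_doc_len);
    idf(doc_count, doc_freq) * tf * (K1 + 1.0) / (tf + length_norm)
}

fn main() {
    // Made-up corpus statistics: 1,000 documents with an average length of 100 tokens.
    let rare = bm25_term_score(2, 100, 100.0, 1_000, 10);
    let common = bm25_term_score(2, 100, 100.0, 1_000, 900);
    println!("rare term: {:.3}, common term: {:.3}", rare, common);
}
```

The score of a multi-term query is the sum of these per-term contributions. A term that appears in few documents contributes more than one that appears almost everywhere.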

Non-features

  • Distributed search is out of the scope of Tantivy. That being said, Tantivy is a library upon which one could build a distributed search engine. Features that support this, such as serializable and mergeable collector state, are within the scope of Tantivy.
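
As a rough illustration of why mergeable collector state helps when building distributed search on top of a library, the sketch below merges the top-k hits reported by several shards into one global top-k. The `Hit` type and the `merge_top_k` function are hypothetical and are not part of Tantivy's API.

```rust
use std::cmp::Ordering;

/// A scored hit as one shard might report it (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    shard: usize,
    doc_id: u32,
    score: f32,
}

/// Merges per-shard top-k lists into one global top-k list, best score first.
fn merge_top_k(per_shard: Vec<Vec<Hit>>, k: usize) -> Vec<Hit> {
    let mut all: Vec<Hit> = per_shard.into_iter().flatten().collect();
    // Descending by score; incomparable scores (NaN) are treated as equal.
    all.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(Ordering::Equal));
    all.truncate(k);
    all
}

fn main() {
    let shard0 = vec![
        Hit { shard: 0, doc_id: 7, score: 3.2 },
        Hit { shard: 0, doc_id: 1, score: 1.1 },
    ];
    let shard1 = vec![
        Hit { shard: 1, doc_id: 4, score: 2.5 },
        Hit { shard: 1, doc_id: 9, score: 0.4 },
    ];
    for hit in merge_top_k(vec![shard0, shard1], 3) {
        println!("shard {} doc {} score {}", hit.shard, hit.doc_id, hit.score);
    }
}
```

As long as every shard returns at least its own k best hits, the merged list is the exact global top-k, so each shard only needs to ship a small, serializable result.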

Getting started

Tantivy works on stable Rust (>= 1.27) and supports Linux, MacOS, and Windows.

How can I support this project?

There are many ways to support this project.

  • Use Tantivy and tell us about your experience on Discord or by email (paul.masurel@gmail.com)
  • Report bugs
  • Write a blog post
  • Help with documentation by asking questions or submitting PRs
  • Contribute code (you can join our Discord server)
  • Talk about Tantivy around you

Contributing code

We use the GitHub Pull Request workflow: reference a GitHub ticket and/or include a comprehensive commit message when opening a PR.

Clone and build locally

Tantivy compiles on stable Rust (>= 1.27). To check out and build the project, run:

    git clone https://github.com/quickwit-oss/tantivy.git
    cd tantivy
    cargo build

Run tests

Some tests will not run with just cargo test because of fail-rs. To run the tests exhaustively, run ./run-tests.sh.

Debug

You might find it useful to step through the programme with a debugger.

A failing test

Make sure you haven't run cargo clean after the most recent cargo test or cargo build to guarantee that the target/ directory exists. Use this bash script to find the name of the most recent debug build of Tantivy and run it under rust-gdb:

    find target/debug/ -maxdepth 1 -executable -type f -name "tantivy*" -printf '%TY-%Tm-%Td %TT %p\n' | sort -r | cut -d " " -f 3 | xargs -I RECENT_DBG_TANTIVY rust-gdb RECENT_DBG_TANTIVY

Now that you are in rust-gdb, you can set breakpoints on lines and methods that match your source code and run the debug executable with flags that you normally pass to cargo test like this:

    (gdb) run --test-threads 1 --test $NAME_OF_TEST
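
The find pipeline above relies on GNU find options (-executable and -printf) that BSD and macOS find do not provide. As a portable alternative, here is a minimal sketch using only the Rust standard library that returns the most recently modified file in a directory whose name starts with a given prefix. The function name `most_recent_build` is hypothetical, and unlike the pipeline it does not check the executable bit, so it may pick up non-executable files with a matching name.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Returns the most recently modified regular file directly inside `dir`
/// whose file name starts with `prefix`, or `None` if nothing matches.
fn most_recent_build(dir: &Path, prefix: &str) -> io::Result<Option<PathBuf>> {
    let mut newest: Option<(SystemTime, PathBuf)> = None;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let metadata = entry.metadata()?;
        let name = entry.file_name();
        if !metadata.is_file() || !name.to_string_lossy().starts_with(prefix) {
            continue;
        }
        let modified = metadata.modified()?;
        // Keep this entry if it is the first match or newer than the best so far.
        if newest.as_ref().map_or(true, |(best, _)| modified > *best) {
            newest = Some((modified, entry.path()));
        }
    }
    Ok(newest.map(|(_, path)| path))
}

fn main() {
    match most_recent_build(Path::new("target/debug"), "tantivy") {
        Ok(Some(path)) => println!("{}", path.display()),
        Ok(None) => eprintln!("no file starting with \"tantivy\" in target/debug"),
        Err(err) => eprintln!("could not read target/debug: {}", err),
    }
}
```

The printed path can then be passed to rust-gdb by hand.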

An example

By default, Cargo compiles everything in the examples/ directory in debug mode when you run cargo test. This makes it easy to write examples that reproduce bugs and then run them under the debugger:

    rust-gdb target/debug/examples/$EXAMPLE_NAME
    (gdb) run