Commit graph

2,618 commits

Author SHA1 Message Date
Pascal Seitz
1e859fd78d fix term aggregation u32::MAX overflow issue 2026-06-18 17:07:43 +08:00
Pascal Seitz
f451fa938f explain why naive scorer must accumulate scores in WAND order 2026-06-17 18:58:58 +08:00
Pascal Seitz
2a82dd6f64 fix flaky test 2026-06-17 18:58:58 +08:00
Pascal Seitz
c096b2ad89 aggregation/terms: charge fused term_counts to the memory limit
term_counts (one u32/term) was allocated but not charged to
AggregationLimitsGuard, so a memory limit could be exceeded silently.
Charge it, skip allocating it when unbounded, and add a regression test.
2026-06-16 21:23:23 +08:00
Pascal Seitz
ac7a3d347c add comment, hoist variables 2026-06-16 21:23:23 +08:00
Pascal Seitz
03520a0719 add top level comment 2026-06-16 21:23:23 +08:00
Pascal Seitz
86a4c47bed merge loops, histo with bounds may benefit from single vec opt 2026-06-16 21:23:23 +08:00
Pascal Seitz
3ca510dff0 aggregation/terms: tidy fused term×histogram grid construction
Rename the value threaded through build_segment_term_collector and
maybe_build_collector from max_term_id to col_max_val/max_column_val — it
is the column's max value, only later reused as the max term id. Make the
grid-size arithmetic overflow-/zero-safe (saturating_add, checked_div).
2026-06-16 21:23:23 +08:00
Pascal Seitz
3cb400c300 clarify counts/term_counts field docs
Spell out that `counts` is the flattened per-term × time-bucket grid (each
term's own contiguous slice) and that `term_counts` is only needed when the
per-term total can't be derived from that grid (i.e. with hard bounds).
2026-06-16 21:23:23 +08:00
Pascal Seitz
ef13489d63 skip hard_bounds that can't exclude any value
When a histogram's hard_bounds are wider than the column's value range, the
per-doc `bounds.contains` check can never fail. Collapse such bounds to the
unbounded sentinel in `normalize_histogram_req`, so both the general histogram
hot loop and the fused term×histogram path skip the check — the latter then
derives per-term counts from the grid (the ~17% win) instead of falling back to
per-doc counting just because `bounds != [MIN, MAX]`.

Only the collect-time filter is affected: empty-bucket emission reads
`req.hard_bounds` directly, and hard_bounds only ever clips that range, so a
wider-than-data bound leaves results unchanged. Covered by new tests on the
general and fused paths, including mid-interval (bucket-splitting) bounds.

Also tighten the fused-path u32-overflow guard to bound on `num_vals()` (the
per-value increment count) rather than `num_docs()`, and document why the fused
collector's hot-loop fields are hoisted into locals (re-reading them from memory
each iteration measured ~15% slower).
2026-06-16 21:23:23 +08:00
Pascal Seitz
9f7aea4765 derive term counts 2026-06-16 21:23:23 +08:00
Pascal Seitz
2c8536ab11 add specialized TermHistogram 2026-06-16 21:23:23 +08:00
Pascal Seitz
05f4c02ac5 add dense histogram, optional sub-buckets 2026-06-16 21:23:23 +08:00
Pascal Seitz
d137779219 add no sub-gg fastpath 2026-06-16 21:23:23 +08:00
Pascal Seitz
8f9846ac80 use get_range when possible 2026-06-16 21:23:23 +08:00
Mohammad Dashti
799f7b4646 Built SUM final result in each branch directly.
Keeps the empty-bucket coercion visible at the boundary instead of a
shared binding, following the reviewer's suggested shape.
2026-06-16 03:10:30 +08:00
Mohammad Dashti
fc88d80726 docs: drop downstream-specific name from none_if_no_match doc
The flag's purpose is described well enough by "SQL-style consumers";
no need to call out a specific downstream.
2026-06-16 03:10:30 +08:00
Mohammad Dashti
6a684e7c38 feat: opt-in none_if_no_match flag on SumAggregation for SQL-style null
Switch the default serialized output of `sum` on empty / all-missing
buckets back to `"value": 0` to match Elasticsearch, and gate the
SQL-style `"value": null` behavior behind a new
`none_if_no_match: Option<bool>` flag on `SumAggregation`.

`IntermediateSum::finalize` still returns `Option<f64>` internally so
the Rust API stays parallel to min/max/avg, but the ES-vs-SQL choice is
made at the boundary in `IntermediateMetricResult::into_final_metric_result`:
`None` is coerced to `Some(0.0)` unless `none_if_no_match` is set on the
aggregation request.

Adds `AggregationVariants::as_sum()` accessor for that boundary check
and two end-to-end tests covering both the default ES behavior and the
opt-in null behavior on an empty index.
2026-06-16 03:10:30 +08:00
Mohammad Dashti
94fe52cc67 docs: clarify SUM finalize returning None diverges from Elasticsearch
Surface the trade-off in the doc comment so future reviewers see why
this differs from ES (which returns "value": 0 for sum over
empty/all-missing buckets) and what consumers (ParadeDB SQL NULL) the
None variant is meant to serve.
2026-06-16 03:10:30 +08:00
Mohammad Dashti
2ff39f6f7f fix: return None from SUM when no values were collected
IntermediateSum::finalize() returned Some(0.0) even when count==0
(all documents had missing/NULL values). This differs from MIN, MAX,
and AVG which all return None for count==0.

The 0.0 came from IntermediateStats' default sum initialization.
Consumers (like ParadeDB) that map None to SQL NULL were incorrectly
getting 0 for SUM on all-NULL groups.

Fixes paradedb/paradedb#4621
2026-06-16 03:10:30 +08:00
Windforce17
1d06328cb3 Add BlockSegmentPostings::rank() for skip-list-based positional counting
Add a public rank(target) method on BlockSegmentPostings that returns the
number of docs with a doc id strictly smaller than target. It jumps to the
candidate block through the skip list and decodes a single block, so the cost
is O(skip-list entries) + one block decode rather than O(doc_freq).

This is a useful primitive for range counting over a posting list (e.g. number
of matches in a [lo, hi) doc-id window) without iterating every matched doc.

To support it, expose SkipReader::remaining_docs() (pub(crate)). Like seek(),
rank() advances the cursor forward only and must be called with non-decreasing,
valid (<= TERMINATED) targets. Adds a unit test covering multi-block lists and
the below-first / above-last / empty edge cases.
2026-06-15 18:56:49 +08:00
Pascal Seitz
b19f0ddc77 fix clippy 2026-06-09 23:14:12 +08:00
Pascal Seitz
b4acfcf881 cleanup AggregationsSegmentCtx
The metric/cardinality/histogram _mut getters had no callers needing
mutation; their two uses already pass the resulting reference as &T.

simplify req_data ownership: clone into collectors, Rc only for filter BitSet

Replace Vec<Option<Box<T>>> + take/put-back round-trip with Vec<T> +
direct clone into collector. Collectors now own their per-segment
request data outright, removing the borrow-checker dance that the
take/put-back pattern existed to satisfy.

The structural clones are cheap (Column<u64> is Arc-internal) except
for the filter aggregation, whose DocumentQueryEvaluator carries a
precomputed per-segment BitSet sized by max_doc. Wrap that in
Rc<DocumentQueryEvaluator> so FilterAggReqData::clone() bumps a
refcount instead of duplicating the BitSet. Move SegmentFilterCollector's
matching_docs_buffer out of FilterAggReqData so its pre-allocated
capacity is preserved per collector instead of being lost on every clone.
2026-06-09 23:14:12 +08:00
Kanishk Sachan
70a8e56ee5 test(postings): add unit tests for TermFrequencyRecorder
Closes #2285

The TermFrequencyRecorder was completely untested. Add five focused tests:

- term_frequency_recorder_has_term_freq: verifies the recorder
  correctly advertises term-frequency support via has_term_freq()
- term_frequency_recorder_zero_docs: term_doc_freq() returns Some(0)
  before any documents are recorded
- term_frequency_recorder_term_doc_freq_single_doc: one document with
  two occurrences yields term_doc_freq() == Some(1)
- term_frequency_recorder_term_doc_freq_multiple_docs: three documents
  with varying term frequencies yield term_doc_freq() == Some(3),
  confirming the count tracks documents, not occurrences
- term_frequency_recorder_single_occurrence_per_doc: each of three
  documents has exactly one occurrence
- term_frequency_recorder_high_frequency_doc: a single document with
  1000 occurrences still yields term_doc_freq() == Some(1)
2026-06-06 14:44:51 +08:00
Paul Masurel
a27c64998f
Cargo clippy fix (#2943)
Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>
2026-06-01 14:39:44 +02:00
Paul Masurel
46b3fb9ed3
Relying on upstream version of datasketch and stop using HLL 4. (#2936)
We were relying on a fork for:

a bugfix in LIST serialization
a better API exposing a new Coupon type, required for caching coupons.
We also stop using HLL8 in hope to fix
https://datadoghq.atlassian.net/browse/CLOUDPREM-625

Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>
2026-05-19 13:29:35 +02:00
Mohammad Dashti
d99a5d4e91 Rename validate_aggregation_fields to validate_aggregation_fields_exist
Applies @PSeitz's review suggestion to make the function name more
descriptive of what it checks. Also adds a doc note clarifying why
validation is opt-in rather than enforced by default.
2026-05-16 15:45:20 +08:00
Mohammad Dashti
2de6f075ce Fixed the example 2026-05-16 15:45:20 +08:00
Mohammad Dashti
18080067c7 Applied PR comment:
I would move it outside of the aggregation. You can fetch the fields from the aggregation request and do a validation in a helper function
2026-05-16 15:45:20 +08:00
Mohammad Dashti
95db7d2e5c Revert "Revert all impl."
This reverts commit d5e0991549a05bf80f19f853f7689ad69f96e7e5.
2026-05-16 15:45:20 +08:00
Mohammad Dashti
fc017c4c74 Applied PR comments. 2026-05-16 15:45:20 +08:00
Mohammad Dashti
141c91d028 Added a flag: strict_validation 2026-05-16 15:45:20 +08:00
Mohammad Dashti
36a83e7c1a Fixed agg validation 2026-05-16 15:45:20 +08:00
jinhelin
be11f8a6a1 Fix opening positions file error 2026-05-14 15:55:59 +08:00
Pascal Seitz
edfb02b47e switch to enum, fix mixed types for cardinality agg 2026-05-05 16:39:51 +08:00
Pascal Seitz
d0fad88bac use bitsets for card agg 2026-05-05 16:39:51 +08:00
James Sewell
4480cf0a98
Enable BMW for single-scorer boolean queries by removing early return in scorer_union (#2915)
The early return for `scorers.len() == 1` in `scorer_union` short-circuits a single TermScorer into `SpecializedScorer::Other`, bypassing the `TermUnion` path that enables block-max WAND (BMW) in `for_each_pruning`.

This was originally addressed in PR #2898 (backed out), which added a special case in `BooleanWeight::for_each_pruning`. PR #2912 (merged as d27ca164a) added a single-scorer fast path inside `block_wand` itself, but did not remove this early return — so a single SHOULD TermScorer still never reaches the BMW path.

Removing the early return lets a single TermScorer with freq reading flow through to `SpecializedScorer::TermUnion`, where `block_wand` → `block_wand_single_scorer` handles it efficiently.
2026-04-28 14:49:53 -07:00
Pascal Seitz
d47abdf104 early cut off for order by sub agg in term agg 2026-04-28 16:59:59 +02:00
trinity-1686a
ca139d8eb1
Merge pull request #2910 from quickwit-oss/abdul.andha/composite-agg-after
Composite aggregations: send after key on last page
2026-04-27 23:38:52 +02:00
Abdul Andha
ac508108aa
address pr comment 2026-04-27 12:39:38 -04:00
Paul Masurel
63da5a21b2
Optimizing top K using Adrien Grand's ideas (#2865)
* Optimizing top K using Adrien Grand's ideas

https://jpountz.github.io/2025/08/28/compiled-vs-vectorized-search-engine-edition.html

* Suffix-sum pruning for multi-term intersection candidates

After scoring each secondary in Phase 2, check whether remaining
secondaries' block_max scores can still beat the threshold. Skip
to the next candidate early if impossible, avoiding expensive seeks
into later secondaries.

Improves three-term intersection by ~8% on the balanced benchmark
while keeping two-term performance neutral.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Claude CR comment

* Removed 16 term scorer limit.

---------

Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 12:14:40 +02:00
lif
54cd5bba98
fix: skip sentinel facet ords in harvest to prevent wrong root (#2867)
When a document has the exact registered facet path (not a child),
compute_collapse_mapping_one maps it to a sentinel (u64::MAX, 0).
Without filtering, harvest() passes u64::MAX to ord_to_term which
resolves to the last dictionary entry, producing a spurious facet
from an unrelated branch.

Skip entries where facet_ord == u64::MAX in harvest().

Closes #2494

Signed-off-by: majiayu000 <1835304752@qq.com>
2026-04-25 22:23:30 +02:00
Paul Masurel
d27ca164a9
block_wand: use single-scorer path when there is only one scorer 2026-04-25 16:35:00 +02:00
James Sewell
322286ee16
Tighen Block-Max in single-scorer (#2897)
In the Block-Max WAND single-scorer, it uses block_max_score() < threshold,
whereas the multi-term one uses  block_max_score_upperbound <= threshold.

As both of these are guarded later on with if score > threshold we can
use the more efficent form in single-scorer.

Single-scorer block skip (<, should be <=): https://github.com/quickwit-oss/tantivy/blob/main/src/query/boolean_query/block_wand.rs#L231
Multi-scorer block skip (already <=): https://github.com/quickwit-oss/tantivy/blob/main/src/query/boolean_query/block_wand.rs#L179
Single-scorer per-doc guard (>): https://github.com/quickwit-oss/tantivy/blob/main/src/query/boolean_query/block_wand.rs#L246
Multi-scorer per-doc guard (>): https://github.com/quickwit-oss/tantivy/blob/main/src/query/boolean_query/block_wand.rs#L206

This will improve performance when there are many identical scores.
2026-04-25 14:13:07 +02:00
RJ Barman
73ad18fa1e
fix: Add space for missing sentinel in allowed bitset when a missing key is provided (#119) (#2907)
## Bug Overview
Under certain conditions, a `terms` aggregation request can cause a
bounds-check panic. Those conditions are:
- The queried field must be a text field
- There must be a segment where the number of distinct terms in it's
dictionary for the queried field is divisible by 64 (i.e.e where
`count(term_dict.keys) % 64 == 0`)
- That same segment must contain at least one document that does not
contain this field.
- The request contain a `missing` key that is a string.
- The request must contain an `include` or `exclude` filter.
For example:
```json
{
    "my_bool": {
        "terms": {
            "field": "title",
            "include": "foo",
            "missing": "__NULL__",
        }
    }
}
```
Check out the added tests in `src/aggregation/bucket/term_agg.rs` to see
this in action

## How the bug happens
### Preparation
While preparing the aggregation nodes:
1) When we've provided a `missing` key, we derive a missing sentinel.
For string keys this column's max value (which for string keys is always
the number of terms in this segment) + 1.
2) for string columns only, we optionally prep an "allowed" `BitSet` for
allowed term ids. (`build_allowed_term_ids_for_str` in
`src/aggregation/agg_data.rs`)
- If no `include` or `exclude` filter is provided, this just returns
`None`, causing this check to be skipped down the line
- Otherwise the bitset is initialized to be able to hold the exact
number of terms in the segments term dictionary, and the bits are set to
signify which terms are to be included in the results.

### Collection
If we have an "allowed" `BitSet`, filter documents against that. For
each document, we check if the `BitSet` contains the documents term id.
For documents without the field, this is the missing sentinel we derived
earlier, minus 1 (to account for zero-based indexing): `(num_terms + 1)
- 1`.However, the `BitSet`s size is only `num_terms`. Normally, this
slips by without a problem, but if `num_terms % 64 == 0`, this will
cause a panic.

### Why `BitSet` panics
`BitSet` is represented under the hood by a boxed slice of `u64`s. When
you go to check a bit using `BitSet::contains`, it must determine which
of those `u64`s the bit is in, and then the position within that `u64`
of the bit.

In cases where the number of terms is not divisible by 64, the `BitSet`
must waste some bits. When we then look up the missing sentinel's bit,
it happens to be one of those wasted bits, for which `BitSet` is happy
to return the value of. For example, if the number of terms was 63:
```rust
let bitset_init_size = 63; // so BitSet's boxed slice has a length of 1, capable of holding 64 bits, term id [0, 62]
let missing_sentinel = 63; // num_terms + 1 - 1;
let byte_pos = missing_sentinel / 64; // 0 - within the valid slice
let bit_pos = missing_sentinel % 64; // 63 - hits the 1 wasted bit
```

But if the number of terms is indeed divisible by 64, then the `BitSet`
is perfectly aligned to the byte boundary:
```rust
let bitset_init_size = 64; // so BitSet's boxed slice has a length of 1, capable of holding 64 bits, term ids [0, 63]
let missing_sentinel = 64; // num_terms + 1 - 1, 
let byte_pos = missing_sentinel / 64; // 1 - idx 1 >= slice length 1
let bit_pos = missing_sentinel % 64; // 0 
```
We try to access a byte outside of the bounds of the boxed slice,
causing a panic from the bounds check to failing.

## Fixing it
The fix is simple. If we need to account for the missing sentinel,
initialize the `BitSet` with capacity for one more bit.

## Tests
- Added a bunch of unit tests that hit these conditions. I ensured they
failed without the fix, and that they now pass.
- All unit tests pass with the fix in place

## Other
- The investigation that led to finding this bug began with
https://github.com/paradedb/paradedb/issues/4746.
2026-04-25 14:11:47 +02:00
Abdul Andha
4fbae92187
send after key on last page 2026-04-24 15:33:26 -04:00
Pascal Seitz
2e16243f9a fix memory consumption for histogram 2026-04-21 13:58:39 +02:00
Pascal Seitz
73c711ec74 perf(agg): only measure active parent bucket in composite collect
Same change as 26a589e for SegmentCompositeCollector: get_memory_consumption
summed across all parent_buckets on every block, scaling with outer bucket
cardinality. Pass parent_bucket_id and index the single bucket.
2026-04-21 07:26:58 +02:00
Pascal Seitz
cb037c8079 add inline 2026-04-21 07:26:58 +02:00
Pascal Seitz
ed3453606b agg fix: compute memory consumption only for current bucket 2026-04-21 07:26:58 +02:00