[8.19](backport #50137) filestream: Fix shutdown logic and improve benchmark #50182

mergify[bot] wants to merge 1 commit into 8.19
Conversation
Fix filestream benchmark correctness and shutdown, add high-file-count sub-benchmarks

**Shutdown fix (`notifyObserver`):** During shutdown the watcher goroutine that drains `notifyChan` exits before harvesters finish. The old blocking send in `notifyObserver` stalled every closing harvester until the task group's 1-minute timeout expired. Replace the blocking send with a `select` on `canceler.Done()` so harvesters unblock immediately when the input is cancelled.

**Benchmark fixes (correctness):** The inode-mode benchmarks were silently broken: `file_identity` defaulted to `fingerprint` even though `prospector.scanner.fingerprint.enabled` was `false`, so every file received the same empty-fingerprint identity and only one harvester was started out of N files. Explicitly set `file_identity.native` / `file_identity.fingerprint` to match the scanner mode so each file gets its own identity and harvester.

**Benchmark fixes (hangs / timeouts):** Without `close.reader.on_eof: true` harvesters waited for more data after EOF, preventing the pipeline from closing until the 60-second `task.Group.Stop` timeout expired. Combined with a 1-second `check_interval` this made multi-file benchmarks extremely slow. Set `close.reader.on_eof: true` and lower `check_interval` to 100 ms so harvesters close promptly.

**Benchmark refactoring:**

- Consolidate duplicated single-file / multi-file / inode / fingerprint sub-benchmarks into a table-driven loop.
- Add 1 000-file and 10 000-file fingerprint sub-benchmarks to stress per-file overhead (logger cloning, reader pipeline setup, fingerprint I/O).
- Replace deprecated logging setup with local loggers.
- Only buffer the event channel when events are actually collected, avoiding a 10 000-slot channel allocation in benchmarks that discard events.

(cherry picked from commit 2977528)

Conflicts: `filebeat/input/filestream/input_test.go`
Cherry-pick of 2977528 has failed. To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
**TL;DR:** The failing Buildkite step is caused by unresolved Git conflict markers committed in …

**Root cause:** The PR head content at commit … This is consistent with the backport conflict note already present on the PR (…).

**Remediation / Verification / Follow-up:** After resolving and pushing, this Buildkite failure should clear; any remaining failures (if present) can then be analyzed independently.

From workflow: PR Buildkite Detective
This pull request has not been merged yet. Could you please review and merge it @orestisfl? 🙏
Proposed commit message
Fix filestream benchmark correctness and shutdown, add high-file-count sub-benchmarks
**Shutdown fix (`notifyObserver`):** During shutdown the watcher goroutine that drains `notifyChan` exits before harvesters finish. The old blocking send in `notifyObserver` stalled every closing harvester until the task group's 1-minute timeout expired. Replace the blocking send with a `select` on `canceler.Done()` so harvesters unblock immediately when the input is cancelled.
**Benchmark fixes (correctness):** The inode-mode benchmarks were silently broken: `file_identity` defaulted to `fingerprint` even though `prospector.scanner.fingerprint.enabled` was `false`, so every file received the same empty-fingerprint identity and only one harvester was started out of N files. Explicitly set `file_identity.native` / `file_identity.fingerprint` to match the scanner mode so each file gets its own identity and harvester.
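As a sketch, the explicit identity settings for the two scanner modes look roughly like this in filestream configuration (a hedged illustration based on the description above, not the literal benchmark config):

```yaml
# Inode-mode benchmarks: native identity, fingerprinting off.
type: filestream
prospector.scanner.fingerprint.enabled: false
file_identity.native: ~

# Fingerprint-mode benchmarks: both sides enabled so they agree.
# prospector.scanner.fingerprint.enabled: true
# file_identity.fingerprint: ~
```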
**Benchmark fixes (hangs / timeouts):** Without `close.reader.on_eof: true` harvesters waited for more data after EOF, preventing the pipeline from closing until the 60-second `task.Group.Stop` timeout expired. Combined with a 1-second `check_interval` this made multi-file benchmarks extremely slow. Set `close.reader.on_eof: true` and lower `check_interval` to 100 ms so harvesters close promptly.
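The two timeout-related settings as a filestream config fragment (a hedged sketch; key names per the filestream documentation, values per this commit message):

```yaml
type: filestream
# Close each harvester at EOF instead of waiting for more data
# (and eventually the 60 s task.Group.Stop timeout).
close.reader.on_eof: true
# Re-scan for files every 100 ms instead of the previous 1 s.
prospector.scanner.check_interval: 100ms
```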
**Benchmark refactoring:**

- Consolidate duplicated single-file / multi-file / inode / fingerprint sub-benchmarks into a table-driven loop.
- Add 1 000-file and 10 000-file fingerprint sub-benchmarks to stress per-file overhead (logger cloning, reader pipeline setup, fingerprint I/O).
- Replace deprecated logging setup with local loggers.
- Only buffer the event channel when events are actually collected, avoiding a 10 000-slot channel allocation in benchmarks that discard events.
Checklist
- I have made corresponding changes to the documentation
- I have made corresponding change to the default configuration files
- … `stresstest.sh` script to run them under stress conditions and race detector to verify their stability.
- I have added an entry in `./changelog/fragments` using the changelog tool.

How to test this PR locally