This guide covers development workflows, testing strategies, and contribution guidelines for the Semantic Code Search Indexer.
- Node.js 20+
- Docker with Compose v2 plugin
- Elasticsearch 9.x (for integration tests)
- Git
```bash
# Clone the repository
git clone <repo-url>
cd semantic-code-search-indexer

# Install dependencies
npm install

# Build the project
npm run build

# Run unit tests
npm test
```

- Numeric CLI flags such as `--concurrency`, `--batch-size`, `--delete-documents-page-size`, and `--parse-concurrency` must be positive integers. Invalid values fail fast with a clear error message.
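This guide does not show the project's own validation code. The following is a minimal sketch of the fail-fast behavior, built around a hypothetical `parsePositiveInteger` helper that checks a flag value before any work starts:

```typescript
/**
 * Hypothetical helper: parse a numeric CLI flag as a positive integer,
 * failing fast with an error that names the offending flag.
 */
function parsePositiveInteger(flag: string, raw: string): number {
  const trimmed = raw.trim();
  // Accept plain decimal digits only, so "1.5", "-2", "1e3", and "10abc" are rejected
  // instead of being coerced by Number() or parseInt()
  const value = /^\d+$/.test(trimmed) ? Number(trimmed) : NaN;
  if (!Number.isSafeInteger(value) || value < 1) {
    throw new Error(`Invalid value for ${flag}: "${raw}". Expected a positive integer.`);
  }
  return value;
}

// Example: validate flags before any indexing work starts
const concurrency = parsePositiveInteger('--concurrency', '4');
const batchSize = parsePositiveInteger('--batch-size', '500');
console.log({ concurrency, batchSize });
```

The sketch rejects anything that is not plain digits. This avoids silent coercion such as `parseInt('10abc')`, which returns `10`.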
We use Vitest as our testing framework, offering fast test execution, first-class TypeScript support, and an excellent developer experience.
Unit tests are fast and have no external dependencies. They test individual functions and classes in isolation using mocks.
Commands:

```bash
npm test              # Run all unit tests
npm run test:watch    # Watch mode for TDD
npm run test:ui       # Interactive UI mode (recommended for debugging)
npm run test:unit     # Explicitly run unit tests only
```

Quick sanity check (manual):

If you have an index available in Elasticsearch, you can run a quick semantic query from the CLI:

```bash
npm run search -- "where is config loaded?" --index <your-index> --limit 5
```

CLI help tip:

When using `npm run`, pass `--` before any flags so that npm forwards them to the underlying command instead of consuming them itself:

```bash
npm run search -- --help
```

Interactive UI Mode:
The test:ui command opens Vitest's web-based UI in your browser, providing:
- Real-time test results with rich diffs
- Interactive filtering and search
- Code coverage visualization
- File-based test navigation
- Re-run on file changes
- Click-to-source code navigation
This mode is especially useful for debugging flaky tests or understanding test failures.
Configuration:
- Unit test config: `vitest.config.ts`
- Unit test files: `tests/unit/**/*.test.ts`
- Integration test config: `vitest.integration.config.ts`
- Integration test files: `tests/integration/**/*.test.ts`
- Setup: `tests/setup.ts` (unit) / `tests/integration-setup.ts` (integration)
Parallelization:
- Local: Tests run in parallel across all CPU cores for maximum speed
- CI: Tests run serially (`maxWorkers: 1`) for stability and reproducibility
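The unit-test settings above might be expressed roughly as follows. This is a sketch under assumptions, not the repository's actual file. `include`, `setupFiles`, `pool`, and `maxWorkers` are Vitest config options, but how the project detects CI is assumed here. Worker-pool options have also changed between Vitest major versions, so check the documentation for the version pinned in `package.json`.

```typescript
// vitest.config.ts (illustrative sketch, not the repository's actual file)
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // Unit tests only; integration tests use vitest.integration.config.ts
    include: ['tests/unit/**/*.test.ts'],
    setupFiles: ['tests/setup.ts'],
    // Run test files in worker threads instead of child processes
    pool: 'threads',
    // GitHub Actions sets CI=true: use a single worker there. Locally, leave the
    // value undefined so Vitest applies its own default parallelism.
    maxWorkers: process.env.CI ? 1 : undefined,
  },
});
```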
Integration tests validate the full indexing pipeline against a real Elasticsearch instance with the ELSER model deployed.
```bash
npm run test:integration
```

This is a complete, isolated test run that:
- Sets up Elasticsearch 9.2.0 via Docker Compose (~15s)
- Deploys the ELSER sparse embedding model
- Runs all integration tests (~5s)
- Tears down the infrastructure (always, even on failure)
Total time: ~22 seconds per run. Clean slate every time!
Use this for:
- Pre-commit validation
- Quick one-off tests
- When you want guaranteed clean state
- Any time you don't mind the ~15s ES startup cost
For debugging or running tests repeatedly without ES startup overhead:
```bash
# Setup Elasticsearch once
npm run test:integration:setup

# Run integration tests (fast - no ES startup cost)
npm run test:integration:run   # ~5s
npm run test:integration:run   # ~5s (again)
npm run test:integration:run   # ~5s (as many times as needed)

# Inspect Elasticsearch manually if needed
curl -u elastic:testpassword http://localhost:9200/_cat/indices
curl -u elastic:testpassword http://localhost:9200/test-*/_count

# Teardown when done
npm run test:integration:teardown
```

This saves ~17s per run after the initial setup.
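The ~17s figure follows from the timings quoted above. An isolated run (setup + run + teardown) takes about 22s, and the test run itself takes about 5s. For N consecutive runs, the two approaches therefore cost approximately:

```math
\begin{aligned}
T_{\text{isolated}}(N) &= N\,(T_{\text{setup}} + T_{\text{run}} + T_{\text{teardown}}) \approx 22N\ \text{s} \\
T_{\text{persistent}}(N) &= T_{\text{setup}} + N\,T_{\text{run}} + T_{\text{teardown}} \approx (17 + 5N)\ \text{s} \\
T_{\text{isolated}}(N) - T_{\text{persistent}}(N) &= (N-1)\,(T_{\text{setup}} + T_{\text{teardown}}) \approx 17\,(N-1)\ \text{s}
\end{aligned}
```

For example, ten runs take about 220s in isolated mode versus about 67s with a persistent cluster.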
```bash
# Full workflow - single isolated run (most common)
npm run test:integration            # setup → run → teardown (clean slate each time)

# Manual control - keep ES running between test runs (for development)
npm run test:integration:setup      # Setup Elasticsearch environment
npm run test:integration:run        # Run tests only (ES must be running)
npm run test:integration:teardown   # Teardown Elasticsearch environment
```

| Command | ES Lifecycle | Time per Run | Best For | Cleanup |
|---|---|---|---|---|
| `test:integration` | Setup + Teardown every time | ~22s | Pre-commit, one-off tests, clean state | Automatic (always) |
| `setup` → `run` (×N) → `teardown` | Setup once, persist | ~5s each | Active dev, debugging, iteration | Manual (when done) |
Requirements:

- Docker Compose v2 (`docker compose` command)
  - Local: Docker Desktop (Mac/Windows) or Docker Engine 20.10+ with the Compose plugin (Linux)
  - CI: GitHub Actions `ubuntu-latest` runners include Docker Compose v2
- 4GB+ RAM available for Elasticsearch
- Port 9200 not in use
The `docker-compose.integration.yml` configuration:

- Runs Elasticsearch 9.2.0 in single-node mode
- Configures authentication (`elastic` / `testpassword`)
- Enables a trial license for the inference API
- Deploys the ELSER model to the `elser-inference-test` endpoint
- Includes health checks
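ELSER is deployed to an inference endpoint through the Elasticsearch Inference API. The sketch below only builds the request and does not send it. The `buildElserEndpointRequest` name, the `elasticsearch` service with the `.elser_model_2` model ID, and the allocation and thread counts are all assumptions. The project's setup scripts may use different settings.

```typescript
// Hypothetical request shape for creating a sparse_embedding inference endpoint
interface InferenceEndpointRequest {
  method: 'PUT';
  url: string;
  body: {
    service: string;
    service_settings: { model_id: string; num_allocations: number; num_threads: number };
  };
}

/** Builds (but does not send) the request that creates an ELSER-backed endpoint. */
function buildElserEndpointRequest(baseUrl: string, inferenceId: string): InferenceEndpointRequest {
  // Strip trailing slashes so the path joins cleanly
  const base = baseUrl.replace(/\/+$/, '');
  return {
    method: 'PUT',
    url: `${base}/_inference/sparse_embedding/${encodeURIComponent(inferenceId)}`,
    body: {
      service: 'elasticsearch',
      // Assumed settings: one small allocation is enough for a test cluster
      service_settings: { model_id: '.elser_model_2', num_allocations: 1, num_threads: 1 },
    },
  };
}

const request = buildElserEndpointRequest('http://localhost:9200', 'elser-inference-test');
console.log(request.method, request.url);
```

To send the request, `PUT` the JSON-serialized body to the URL with basic auth (`elastic` / `testpassword`).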
Integration tests run automatically in GitHub Actions with separate steps for better visibility:
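```yaml
integration_tests:
  runs-on: ubuntu-latest
  steps:
    # Checkout and Node.js setup are needed before the npm scripts can run
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: 20
    - run: npm ci
    - name: Setup Elasticsearch
      run: npm run test:integration:setup
    - name: Run integration tests
      run: npm run test:integration:run
    - name: Teardown Elasticsearch
      if: always() # Always runs, even if tests fail
      run: npm run test:integration:teardown
```

The `if: always()` condition ensures cleanup happens even when tests fail, preventing resource leaks in CI.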
- Unit tests: Use `pool: 'threads'` for better memory efficiency
- Integration tests: Also use threads, with configurable parallelization
- Global teardown: Ensures the Elasticsearch client is properly closed to prevent hanging connections
Integration tests use `.env.test` for configuration:

```bash
ELASTICSEARCH_ENDPOINT=http://localhost:9200
ELASTICSEARCH_USERNAME=elastic
ELASTICSEARCH_PASSWORD=testpassword
SCS_IDXR_ELASTICSEARCH_INFERENCE_ID=elser-inference-test

# Disable semantic search for tests (semantic_text + ELSER inference)
# This keeps the tests focused on correctness of the indexing pipeline and makes them much faster.
SCS_IDXR_DISABLE_SEMANTIC_TEXT=true

# Timeout for bulk operations
SCS_IDXR_ELASTICSEARCH_REQUEST_TIMEOUT=120000
```

Why `SCS_IDXR_DISABLE_SEMANTIC_TEXT=true`?
- It turns off semantic search for these tests by disabling the `semantic_text` field type and the associated ELSER inference at ingest time.
- Indices created with `SCS_IDXR_DISABLE_SEMANTIC_TEXT=true` do not have a `semantic_text` mapping, so semantic search queries will fail against those indices until they are recreated with semantic text enabled.
- It also makes the test suite much faster and less flaky.
- Full ELSER/semantic search behavior is validated separately (see `tests/integration/semantic_text_semantic_search.integration.test.ts`).
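Comparing the two mappings the flag implies shows why semantic queries fail against such indices. The sketch assumes a single hypothetical `content` field and a hypothetical `buildMappings` helper; the project's real index mapping has more fields. `semantic_text` with an `inference_id` is the Elasticsearch field type that runs inference at ingest time.

```typescript
// Illustrative sketch: the "content" field name and this helper are hypothetical.
type ContentFieldMapping =
  | { type: 'text' }
  | { type: 'semantic_text'; inference_id: string };

interface IndexMappings {
  properties: { content: ContentFieldMapping };
}

function buildMappings(disableSemanticText: boolean, inferenceId: string): IndexMappings {
  // Semantic text disabled: plain full-text field, no ELSER inference at ingest time.
  // Semantic queries against an index created this way fail until it is recreated.
  if (disableSemanticText) {
    return { properties: { content: { type: 'text' } } };
  }
  // Semantic text enabled: Elasticsearch calls the inference endpoint when documents are ingested
  return { properties: { content: { type: 'semantic_text', inference_id: inferenceId } } };
}

// Example: derive the inputs from the variables set in .env.test
const mappings = buildMappings(
  process.env.SCS_IDXR_DISABLE_SEMANTIC_TEXT === 'true',
  process.env.SCS_IDXR_ELASTICSEARCH_INFERENCE_ID ?? 'elser-inference-test',
);
console.log(JSON.stringify(mappings));
```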
Test fixtures:

- Location: `tests/fixtures/`
- Contains sample files in multiple languages (TypeScript, Python, Go, Java, etc.)
- Integration tests create a temporary Git repo from the fixtures
- Ensures consistent, reproducible test data
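The fixture-to-repository step can be sketched with Node's standard library and the `git` CLI. The `createFixtureRepo` name, the commit identity, and the temp-directory prefix are assumptions for illustration. The project's integration setup may do this differently.

```typescript
import { execFileSync } from 'node:child_process';
import { cpSync, mkdtempSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

/**
 * Hypothetical helper: copy a fixtures directory into a fresh temporary directory
 * and commit it as a single-commit Git repository. Returns the repository path.
 */
function createFixtureRepo(fixturesDir: string): string {
  const repoDir = mkdtempSync(join(tmpdir(), 'fixture-repo-'));
  cpSync(fixturesDir, repoDir, { recursive: true });

  // Run git inside the new directory and return its stdout as a string
  const git = (...args: string[]): string =>
    execFileSync('git', args, { cwd: repoDir, encoding: 'utf8' });

  git('init', '--quiet');
  git('add', '--all');
  // A local identity and disabled signing keep the commit independent of global Git config
  git(
    '-c', 'user.name=Fixture Bot',
    '-c', 'user.email=fixtures@example.com',
    '-c', 'commit.gpgsign=false',
    'commit', '--quiet', '-m', 'Add fixtures',
  );
  return repoDir;
}
```

A test suite would typically call this once in a `beforeAll` hook and delete the directory with `rmSync(repo, { recursive: true, force: true })` in `afterAll`.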
Integration tests explicitly fail with setup instructions if Elasticsearch is not running.
Solutions:
- Easiest: Use `npm run test:integration` (handles everything)
- Manual: Run `npm run test:integration:setup` first
- Verify: Check that Docker is running: `docker ps | grep elasticsearch`
- Health check: `curl -u elastic:testpassword http://localhost:9200/_cluster/health`
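A fail-fast check of this kind might look like the sketch below. The function name, its parameters, and the injected `fetchImpl` (which lets the check be tested without a running cluster) are hypothetical. The endpoint, credentials, and setup commands come from this guide. The check accepts a `green` or `yellow` cluster status, because a single-node cluster commonly reports `yellow` when replica shards cannot be assigned.

```typescript
// Shape of the subset of fetch() used below, so the check can be tested with a stub
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

/**
 * Hypothetical fail-fast guard: throws with setup instructions unless the
 * integration Elasticsearch answers /_cluster/health with green or yellow.
 */
async function assertElasticsearchReady(
  fetchImpl: FetchLike,
  endpoint = 'http://localhost:9200',
  username = 'elastic',
  password = 'testpassword',
): Promise<void> {
  const hint =
    'Elasticsearch is not available for integration tests. ' +
    'Run `npm run test:integration:setup` first, or use `npm run test:integration`.';
  const auth = Buffer.from(`${username}:${password}`).toString('base64');

  let res: Awaited<ReturnType<FetchLike>>;
  try {
    res = await fetchImpl(`${endpoint}/_cluster/health`, {
      headers: { Authorization: `Basic ${auth}` },
    });
  } catch (err) {
    // Connection refused, DNS failure, etc.: the cluster is not running
    throw new Error(`${hint} (connection failed: ${String(err)})`);
  }
  if (!res.ok) {
    throw new Error(`${hint} (HTTP ${res.status})`);
  }

  // Read the status field without a type assertion
  const body: unknown = await res.json();
  const status =
    typeof body === 'object' && body !== null && 'status' in body ? String(body.status) : 'unknown';
  if (status !== 'green' && status !== 'yellow') {
    throw new Error(`${hint} (cluster status: ${status})`);
  }
}
```

In Node.js 20, the global `fetch` satisfies `FetchLike`, so a setup file could call `await assertElasticsearchReady(fetch)`.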
Common causes of slow or hanging integration tests:

- ELSER inference slowness: Set `SCS_IDXR_DISABLE_SEMANTIC_TEXT=true` in `.env.test`
- Worker backpressure bug: Already fixed (see `src/utils/indexer_worker.ts`)
- Bulk indexing timeout: Increase `SCS_IDXR_ELASTICSEARCH_REQUEST_TIMEOUT` if needed
- Watch mode enabled: Integration tests set `watch: false` explicitly
Debug with logging:

```bash
# Integration test setup includes logging by default
# No additional flags needed - logs show automatically
npm run test:integration:run
```

Already fixed:
- Changed the Vitest pool from `forks` to `threads` for better memory efficiency
- Added `globalTeardown` to close Elasticsearch client connections
- These fixes resolved `Worker exited unexpectedly` errors
```bash
# Stop all integration test containers
npm run test:integration:teardown

# Verify no containers running
docker ps -a | grep semantic

# Force cleanup if needed
docker compose -f docker-compose.integration.yml down -v
```

If port 9200 is already in use:

```bash
# Find what's using the port
lsof -i :9200

# Stop existing Elasticsearch
docker ps | grep elasticsearch | awk '{print $1}' | xargs docker stop
```

We use ESLint with TypeScript-specific rules:

```bash
npm run lint              # Check for issues
npm run lint -- --fix     # Auto-fix issues
```

Prettier is configured for consistent code style:

```bash
npm run format            # Format all files
npm run format:check      # Check formatting without changes
```

Consider setting up pre-commit hooks for automatic linting and formatting.
- TypeScript: Strict mode enabled; avoid `any` and type assertions
- Imports: Use named imports and ESM syntax
- Functions: Keep under 50 lines, use early returns, avoid deep nesting
- Tests: BDD style with `describe('WHEN ...')`, `describe('AND ...')`, `it('SHOULD ...')`
- Async: Use `async`/`await` instead of `.then()` chains
- Comments: JSDoc for public APIs, inline comments for complex logic
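The following small, hypothetical example illustrates the early-return and `async`/`await` conventions. `fetchDocCount` stands in for a real Elasticsearch call, and the index name is made up:

```typescript
interface CountResult {
  count: number;
}

// Stand-in for a real Elasticsearch call: pretend only "code-chunks" exists
async function fetchDocCount(index: string): Promise<CountResult | undefined> {
  return index === 'code-chunks' ? { count: 42 } : undefined;
}

/** Summarizes an index using async/await and guard clauses instead of nested .then() callbacks. */
async function summarizeIndex(index: string): Promise<string> {
  const result = await fetchDocCount(index);
  // Guard clauses: handle the edge cases first and return early
  if (!result) return `${index}: not found`;
  if (result.count === 0) return `${index}: empty`;
  return `${index}: ${result.count} documents`;
}
```

The equivalent `.then()` version nests each check inside a callback. The `await` form keeps the happy path flat and readable.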
```text
.
├── src/
│   ├── commands/                   # CLI commands (index, setup, etc.)
│   ├── utils/                      # Core utilities (ES client, logger, workers)
│   ├── languages/                  # Language-specific parsers
│   └── config.ts                   # Environment configuration
├── tests/
│   ├── unit/                       # Unit tests
│   ├── integration/                # Integration tests
│   ├── fixtures/                   # Test data (sample code files)
│   ├── setup.ts                    # Unit test setup
│   └── integration-setup.ts        # Integration test setup
├── scripts/                        # Shell scripts (integration test lifecycle)
├── docs/                           # Documentation
├── vitest.config.ts                # Unit test configuration
├── vitest.integration.config.ts    # Integration test configuration
└── docker-compose.integration.yml  # ES for integration tests
```
1. Create a feature branch:

   ```bash
   git checkout -b feature/your-feature-name
   ```

2. Make changes with tests:
   - Add unit tests for new functions/classes
   - Add integration tests for new commands or ES interactions
   - Run tests frequently: `npm run test:watch`

3. Ensure quality:

   ```bash
   npm run lint
   npm run format
   npm test
   npm run test:integration
   ```

4. Commit with conventional commits:

   ```bash
   git commit -m "feat: add new feature"
   git commit -m "fix: resolve bug in worker"
   git commit -m "test: add integration test for indexing"
   ```

5. Push and create a PR:

   ```bash
   git push -u origin feature/your-feature-name
   # Open PR on GitHub
   ```
We use Conventional Commits:
- `feat:` New feature
- `fix:` Bug fix
- `test:` Add or update tests
- `docs:` Documentation changes
- `chore:` Maintenance tasks
- `refactor:` Code refactoring
- `perf:` Performance improvements
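To check commit headers locally (for example, from a Git `commit-msg` hook), you could start from the minimal sketch below. It assumes only the types listed above. It also accepts the optional scope (`fix(worker): ...`) and the breaking-change marker (`feat!: ...`) defined by the Conventional Commits specification. `isConventionalHeader` is a hypothetical name, not an existing project script.

```typescript
// Commit types allowed by this guide
const COMMIT_TYPES: readonly string[] = ['feat', 'fix', 'test', 'docs', 'chore', 'refactor', 'perf'];

// type, optional "(scope)", optional "!", then ": " and a non-empty description
const HEADER_RE = new RegExp(`^(${COMMIT_TYPES.join('|')})(\\([^()\\s]+\\))?!?: \\S.*$`);

/** Returns true when the first line of a commit message follows the convention. */
function isConventionalHeader(message: string): boolean {
  // Only the header (first line) is checked; the body is free-form
  const [header = ''] = message.split('\n', 1);
  return HEADER_RE.test(header);
}

console.log(isConventionalHeader('feat: add new feature'));
console.log(isConventionalHeader('Added a feature'));
```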
- Provide a clear description of the change
- Link related issues
- Include screenshots/logs for UI or behavior changes
- Ensure all tests pass in CI
- Request review from maintainers
- Manual Test Plan - Detailed E2E testing procedures
- Elasticsearch Deployment Guide - Production ES setup
- GCP Deployment Guide - Cloud deployment instructions
- Queue Recovery - Queue management and recovery procedures