smart_ide/docs/repo/script-remote-data-ssh-sync.md
Nicolas Cantu 58cc2493e5 chore: consolidate ia_dev module, sync tooling, and harden gateways (0.0.5)
Initial state:
- ia_dev was historically referenced as ./ia_dev in docs and integrations, while the vendored module lives under services/ia_dev.
- AnythingLLM sync and hook installation had error masking / weak exit signaling.
- Proxy layers did not validate proxy path segments, allowing path normalization tricks.

Motivation:
- Make the IDE-oriented workflow usable (sync -> act -> deploy/preview) with explicit errors.
- Reduce security footguns in proxying and script automation.

Resolution:
- Standardize IA_DEV_ROOT usage and documentation to services/ia_dev.
- Add SSH remote data mirroring + optional AnythingLLM ingestion.
- Extend AnythingLLM pull sync to support upload-all/prefix and fail on upload errors.
- Harden smart-ide-sso-gateway and smart-ide-global-api proxying with safe-path checks and non-leaking error responses.
- Improve ia-dev-gateway runner validation and reduce sensitive path leakage.
- Add site scaffold tool (Vite/React) with OIDC + chat via sso-gateway -> orchestrator.

Root cause:
- Historical layout changes (submodule -> vendored tree) and missing central contracts for path resolution.
- Missing validation for proxy path traversal patterns.
- Overuse of silent fallbacks (|| true, exit 0 on partial failures) in automation scripts.

Impacted features:
- Project sync: git pull + AnythingLLM sync + remote data mirror ingestion.
- Site frontends: SSO gateway proxy and orchestrator intents (rag.query, chat.local).
- Agent execution: ia-dev-gateway script runner and SSE output.

Code modified:
- scripts/remote-data-ssh-sync.sh
- scripts/anythingllm-pull-sync/sync.mjs
- scripts/install-anythingllm-post-merge-hook.sh
- cron/git-pull-project-clones.sh
- services/smart-ide-sso-gateway/src/server.ts
- services/smart-ide-global-api/src/server.ts
- services/smart-ide-orchestrator/src/server.ts
- services/ia-dev-gateway/src/server.ts
- services/ia_dev/tools/site-generate.sh

Documentation modified:
- docs/** (architecture, API docs, ia_dev module + integration, scripts)

Configurations modified:
- config/services.local.env.example
- services/*/.env.example

Files in deploy modified:
- services/ia_dev/deploy/*

Files in logs impacted:
- logs/ia_dev.log (runtime only)
- .logs/* (runtime only)

Databases and other sources modified:
- None

Off-project modifications:
- None

Files in .smartIde modified:
- .smartIde/agents/*.md
- services/ia_dev/.smartIde/**

Files in .secrets modified:
- None

New patch version in VERSION:
- 0.0.5

CHANGELOG.md updated:
- yes
2026-04-04 18:36:43 +02:00


# remote-data-ssh-sync (`scripts/remote-data-ssh-sync.sh`)

Pulls **deployed environment data** over SSH into a **local mirror** (not versioned in Git), then optionally ingests that mirror into **AnythingLLM**.

## Configuration source (per project)

`projects/<id>/conf.json`:

- `smart_ide.remote_data_access.environments.<env>.ssh_host_alias`
- `smart_ide.remote_data_access.environments.<env>.remote_data_directories[]`
- `smart_ide.anythingllm_workspace_slug[env]` (optional; required for ingestion)

## Mirror location

Default:

- `<smart_ide_root>/.data/remote-data/<projectId>/<env>/<role>/`

This directory is ignored by Git (see `.gitignore`).

Override:

- `SMART_IDE_REMOTE_DATA_MIRROR_ROOT=/abs/path`
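
The following minimal sketch shows how the mirror path could be composed. `mirror_dir` is a hypothetical helper, not code from the script. It assumes that `SMART_IDE_REMOTE_DATA_MIRROR_ROOT` replaces `<smart_ide_root>/.data/remote-data` while the `<projectId>/<env>/<role>/` suffix is still appended. It also rejects relative overrides, since the variable is documented as an absolute path.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve the local mirror directory for one role.
# Assumption: SMART_IDE_REMOTE_DATA_MIRROR_ROOT replaces
# "<smart_ide_root>/.data/remote-data"; the <projectId>/<env>/<role>
# suffix is appended either way.
mirror_dir() {
  local smart_ide_root="$1" project_id="$2" env="$3" role="$4"
  local root="${SMART_IDE_REMOTE_DATA_MIRROR_ROOT:-}"

  if [[ -z "$root" ]]; then
    # Default location under the smart_ide root (ignored by Git).
    root="${smart_ide_root%/}/.data/remote-data"
  elif [[ "$root" != /* ]]; then
    # The override is documented as an absolute path; refuse anything else.
    echo "SMART_IDE_REMOTE_DATA_MIRROR_ROOT must be absolute: $root" >&2
    return 1
  fi

  # Strip a trailing "/" from the root to avoid "//" in the result.
  printf '%s/%s/%s/%s/\n' "${root%/}" "$project_id" "$env" "$role"
}
```

For example, with no override set, `mirror_dir /opt/smart_ide enso test docv_dp_git_data` prints `/opt/smart_ide/.data/remote-data/enso/test/docv_dp_git_data/` (`/opt/smart_ide` is a placeholder root).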

## AnythingLLM ingestion

By default, the script attempts ingestion. If the required configuration is missing, it skips ingestion and reports the skip explicitly.

Inputs:

- `~/.config/4nk/anythingllm-sync.env` (optional): provides `ANYTHINGLLM_BASE_URL` and `ANYTHINGLLM_API_KEY`
- `projects/<id>/conf.json`: provides the workspace slug for the selected env

Implementation:

- calls `scripts/anythingllm-pull-sync/sync.mjs` with `--upload-all` on each mirrored role directory
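
The skip-or-run decision described above can be sketched as follows. Both functions are hypothetical. The sketch assumes the env file is a shell-sourceable `KEY=VALUE` file, and that the workspace slug has already been read from `conf.json` (passed as an empty string when absent). It deliberately does not reproduce the `sync.mjs` invocation, because this document specifies only its `--upload-all` flag.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the documented skip logic; not the
# script's actual code.

# Load ANYTHINGLLM_BASE_URL / ANYTHINGLLM_API_KEY from the optional env file.
# Assumption: the file is a shell-sourceable KEY=VALUE file.
load_anythingllm_env() {
  local env_file="${1:-$HOME/.config/4nk/anythingllm-sync.env}"
  # The file is optional: when it is absent, nothing is loaded and the
  # ingestion step is expected to report a skip instead of failing.
  if [[ -f "$env_file" ]]; then
    set -a            # export every variable assigned while sourcing
    # shellcheck disable=SC1090
    source "$env_file"
    set +a
  fi
}

# Print "run" when ingestion can proceed, or "skip: <reason>" otherwise.
# $1 is the workspace slug read from projects/<id>/conf.json ("" if absent).
ingestion_plan() {
  local slug="$1"
  if [[ -z "${ANYTHINGLLM_BASE_URL:-}" || -z "${ANYTHINGLLM_API_KEY:-}" ]]; then
    echo "skip: ANYTHINGLLM_BASE_URL or ANYTHINGLLM_API_KEY not set"
  elif [[ -z "$slug" ]]; then
    echo "skip: no workspace slug for this env"
  else
    echo "run"
  fi
}
```

Returning a printed reason rather than a bare exit status keeps the skip visible in logs. This matches the documented behaviour of skipping explicitly instead of silently.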

## Usage

Fetch, then ingest (default):

```bash
./scripts/remote-data-ssh-sync.sh --project enso --env test
```

Fetch only (no ingestion):

```bash
./scripts/remote-data-ssh-sync.sh --project enso --env test --no-anythingllm
```

Ingest only specific roles:

```bash
./scripts/remote-data-ssh-sync.sh --project enso --env test --roles docv_dp_git_data
```
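
Role selection can be pictured as a filter over the roles configured for the env. The sketch below is hypothetical. It assumes that `--roles` accepts a comma-separated list, that omitting it keeps every configured role, and that naming an unknown role is an error. The document shows only a single-role example, so the list format is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: select roles according to a --roles value.
# Assumptions: --roles is a comma-separated list; an empty value keeps
# every configured role; an unknown role is reported as an error.
# Usage: filter_roles "<--roles value>" <configured role>...
filter_roles() {
  local selected="$1"
  shift
  local -a wanted=() result=()
  local want role found

  if [[ -z "$selected" ]]; then
    # No selection: keep all configured roles, in configuration order.
    for role in "$@"; do
      echo "$role"
    done
    return 0
  fi

  IFS=',' read -r -a wanted <<< "$selected"
  for want in "${wanted[@]}"; do
    found=0
    for role in "$@"; do
      if [[ "$role" == "$want" ]]; then
        found=1
        break
      fi
    done
    if [[ $found -eq 0 ]]; then
      # Fail before printing anything, so callers never act on a partial list.
      echo "unknown role: $want" >&2
      return 1
    fi
    result+=("$want")
  done
  printf '%s\n' "${result[@]}"
}
```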

Dry-run (prints rsync command lines):

```bash
./scripts/remote-data-ssh-sync.sh --project enso --env test --dry-run
```
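
The per-directory rsync pull behind the dry-run output can be sketched as below. `rsync_pull` is hypothetical, and its flags (`-az -e ssh`) are an assumption, not necessarily what `scripts/remote-data-ssh-sync.sh` passes. The sketch shows one pattern: build the command once as an array, then either print it (dry run) or execute it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull one remote directory into its mirror directory.
# The rsync flags are an assumption, not necessarily the script's own.
# Usage: rsync_pull <dry_run 0|1> <ssh_host_alias> <remote_dir> <local_dir>
rsync_pull() {
  local dry_run="$1" host_alias="$2" remote_dir="$3" local_dir="$4"
  # -a: archive mode (recursive, preserves permissions, times, links)
  # -z: compress in transit; -e ssh: use SSH, so the host alias is
  # resolved through the user's SSH client config (e.g. ~/.ssh/config).
  # Trailing "/" on the source copies the directory *contents*.
  local -a cmd=(rsync -az -e ssh "${host_alias}:${remote_dir%/}/" "${local_dir%/}/")

  if [[ "$dry_run" == 1 ]]; then
    # Dry run: print a shell-quoted command line instead of running it.
    local line
    printf -v line '%q ' "${cmd[@]}"
    echo "${line% }"
  else
    # rsync creates the final directory but not missing parents.
    mkdir -p "$local_dir"
    "${cmd[@]}"
  fi
}
```

The trailing `/` on the source makes rsync copy the contents of the remote directory into the mirror directory rather than creating a nested subdirectory. This document does not say whether the real script also passes `--delete`, which would remove mirrored files that no longer exist remotely.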