From d6a61e7cbe8ad851fb5122a451414e4a7abad3dc Mon Sep 17 00:00:00 2001
From: 4NK <nicolas.cantu@pm.me>
Date: Fri, 3 Apr 2026 22:28:20 +0200
Subject: [PATCH] chandra: document and script local HuggingFace install (hf
 extra, run-chandra-hf)

- Add install-local-hf.sh (uv sync --extra hf or pip install -e .[hf])
- Add run-chandra-hf.sh defaulting to --method hf
- Expand .env.example for upstream/local.env (MODEL_CHECKPOINT, TORCH_*)
---
 docs/features/chandra-ocr-documents.md |  3 +-
 docs/repo/service-chandra.md           |  4 +-
 services/chandra/.env.example          | 21 +++++++---
 services/chandra/README.md             | 57 +++++++++++++++++---------
 services/chandra/install-local-hf.sh   | 22 ++++++++++
 services/chandra/run-chandra-hf.sh     | 10 +++++
 6 files changed, 90 insertions(+), 27 deletions(-)
 create mode 100755 services/chandra/install-local-hf.sh
 create mode 100755 services/chandra/run-chandra-hf.sh
diff --git a/docs/features/chandra-ocr-documents.md b/docs/features/chandra-ocr-documents.md
index 3009ebd..827dccd 100644
--- a/docs/features/chandra-ocr-documents.md
+++ b/docs/features/chandra-ocr-documents.md
@@ -7,7 +7,8 @@
 ## smart_ide integration
 
 - Directory: **`services/chandra/`** with the **`upstream/`** submodule.
-- Command: **`./run-chandra.sh`** (delegates to **`chandra`** in the **`upstream/.venv`** venv or to **`uv run chandra`**).
+- **Local Hugging Face inference**: run **`./install-local-hf.sh`**, then **`./run-chandra-hf.sh <pdf|directory> <output_directory>`**; configuration lives in **`upstream/local.env`** (see **`services/chandra/.env.example`**).
+- Other modes: **`./run-chandra.sh`** with **`--method vllm`** or **`--method hf`**, depending on what is installed.
 
 ## Possible chaining
 
diff --git a/docs/repo/service-chandra.md b/docs/repo/service-chandra.md
index e3b9e7e..f455ff2 100644
--- a/docs/repo/service-chandra.md
+++ b/docs/repo/service-chandra.md
@@ -16,7 +16,9 @@ OCR et extraction **structurée** (PDF / images → Markdown, HTML, JSON avec mi
 
 See **[`services/chandra/README.md`](../../services/chandra/README.md)** and **[features/chandra-ocr-documents.md](../features/chandra-ocr-documents.md)**.
 
-Configuration: environment variables or **`upstream/local.env`**; template: **`services/chandra/.env.example`**.
+**Local Hugging Face (recommended for a dev workstation with a GPU)**: from **`services/chandra/`**, run **`./install-local-hf.sh`**, copy **`.env.example`** to **`upstream/local.env`**, then run **`./run-chandra-hf.sh <input> <output>`**.
+
+Configuration: **`upstream/local.env`** (loaded by upstream); template: **`services/chandra/.env.example`** (`MODEL_CHECKPOINT`, `TORCH_DEVICE`, `MAX_OUTPUT_TOKENS`, `TORCH_ATTN`, and `HF_TOKEN` if needed).
 
 ## See also
 
diff --git a/services/chandra/.env.example b/services/chandra/.env.example
index e7fa238..15a8c91 100644
--- a/services/chandra/.env.example
+++ b/services/chandra/.env.example
@@ -1,10 +1,21 @@
-# Optional: copy to services/chandra/upstream/local.env (see upstream README).
-# Or export before running ./run-chandra.sh
+# Copy to services/chandra/upstream/local.env (loaded by pydantic-settings via find_dotenv).
+# https://github.com/datalab-to/chandra — local Hugging Face inference
 
-# MODEL_CHECKPOINT=datalab-to/chandra-ocr-2
-# MAX_OUTPUT_TOKENS=12384
+# Hugging Face model id (weights downloaded on first run)
+MODEL_CHECKPOINT=datalab-to/chandra-ocr-2
 
-# vLLM (default inference path for lightweight pip install)
+# Optional: force device, e.g. cuda:0, cpu
+# TORCH_DEVICE=cuda:0
+
+# Optional: flash attention — requires compatible GPU + flash-attn installed
+# TORCH_ATTN=flash_attention_2
+
+MAX_OUTPUT_TOKENS=12384
+
+# If the checkpoint were gated (not the case for chandra-ocr-2 by default), use:
+# HF_TOKEN=
+
+# --- vLLM only (not used with --method hf) ---
 # VLLM_API_BASE=http://localhost:8000/v1
 # VLLM_MODEL_NAME=chandra
 # VLLM_GPUS=0
diff --git a/services/chandra/README.md b/services/chandra/README.md
index 2e0f7f8..22111ac 100644
--- a/services/chandra/README.md
+++ b/services/chandra/README.md
@@ -5,34 +5,51 @@
 This directory, **`services/chandra/`**, contains:
 
 - **`upstream/`**: Git submodule pointing to **datalab-to/chandra**.
-- **`run-chandra.sh`**: runs the **`chandra`** CLI from the environment installed in **`upstream/`** (`uv` or `.venv`).
-- **`.env.example`**: common variables (vLLM, model); upstream also loads **`local.env`** from **`upstream/`** (not versioned).
+- **`install-local-hf.sh`**: installs the **Hugging Face** dependencies (Torch, Transformers, etc.) into **`upstream/.venv`**.
+- **`run-chandra-hf.sh`**: runs the CLI with **`--method hf`** (local inference) unless **`--method`** is already passed.
+- **`run-chandra.sh`**: runs **`chandra`** as is (pass **`--method vllm`** or **`--method hf`**).
+- **`.env.example`**: variables for **`upstream/local.env`** (model, device, output token limit, HF token).
 
-## Installation (once per machine)
+## Local setup (Hugging Face)
 
-From the submodule sources (recommended here):
+Once the submodule is checked out:
+
+```bash
+cd services/chandra
+./install-local-hf.sh
+cp .env.example upstream/local.env
+# Edit upstream/local.env if needed (TORCH_DEVICE, MAX_OUTPUT_TOKENS, HF_TOKEN).
+```
+
+- **GPU**: leave `TORCH_DEVICE` unset to get **`device_map="auto"`** (upstream behavior), or pin a device, e.g. **`TORCH_DEVICE=cuda:0`**.
+- **CPU**: possible but slow; set **`TORCH_DEVICE=cpu`**.
+- The **`MODEL_CHECKPOINT`** model is downloaded from Hugging Face on the first run (network connection required; significant disk space).
+
+Upstream recommends [flash-attention](https://github.com/Dao-AILab/flash-attention) for better GPU performance; once it is installed, set **`TORCH_ATTN=flash_attention_2`** in **`local.env`**.
+
+## Usage (local HF)
+
+```bash
+cd services/chandra
+./run-chandra-hf.sh /path/to/document.pdf ./ocr_output
+# input directory:
+./run-chandra-hf.sh /path/to/documents ./ocr_output
+```
+
+Equivalent: **`./run-chandra.sh … --method hf`**.
+
+## Usage (vLLM, optional)
+
+If you prefer a vLLM server over loading the model locally:
 
 ```bash
 cd services/chandra/upstream
 uv sync
-# optional: local Hugging Face model (heavy)
-# uv sync --extra hf
-```
-
-Without **`uv`**: create a venv, then run `pip install -e ".[hf]"` or `pip install -e .` depending on the inference mode (see the [upstream README](https://github.com/datalab-to/chandra/blob/master/README.md)).
-
-**vLLM inference** (lightweight on the client side if the server runs elsewhere): start the server as documented upstream (`chandra_vllm` after installing the package).
-
-## Usage
-
-```bash
-cd services/chandra
+# then start chandra_vllm as described in the upstream README
+cd ..
 ./run-chandra.sh input.pdf ./output --method vllm
-# or --method hf if the HF dependencies are installed
 ```
 
-CLI options (`--page-range`, `--max-workers`, etc.): same interface as the upstream **`chandra`** command.
-
 ## Role in smart_ide
 
 - **OCR / structured digitization** for document pipelines, upstream of **PageIndex** ([PageIndex](../pageindex/README.md)) or **AnythingLLM** / **docv**.
@@ -43,4 +60,4 @@ Documentation : [docs/repo/service-chandra.md](../../docs/repo/service-chandra.m
 ## Upstream resources
 
 - Repository: [datalab-to/chandra](https://github.com/datalab-to/chandra)
-- PyPI package: `chandra-ocr` (alternative to installing from **`upstream/`**)
+- PyPI package: `chandra-ocr` (alternative: `pip install "chandra-ocr[hf]"`)
diff --git a/services/chandra/install-local-hf.sh b/services/chandra/install-local-hf.sh
new file mode 100755
index 0000000..2d61e98
--- /dev/null
+++ b/services/chandra/install-local-hf.sh
@@ -0,0 +1,22 @@
+#!/usr/bin/env bash
+# Install Chandra with Hugging Face / Transformers backend (local GPU or CPU).
+set -euo pipefail
+ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+UP="${ROOT}/upstream"
+if [[ ! -d "${UP}/chandra" ]]; then
+	echo "Missing ${UP}/chandra — run: git submodule update --init services/chandra/upstream" >&2
+	exit 1
+fi
+cd "${UP}"
+if command -v uv >/dev/null 2>&1; then
+	uv sync --extra hf
+else
+	if [[ ! -d .venv ]]; then
+		python3 -m venv .venv
+	fi
+	"${UP}/.venv/bin/pip" install -U pip
+	"${UP}/.venv/bin/pip" install -e ".[hf]"
+fi
+echo ""
+echo "OK. Configure model (optional): cp ${ROOT}/.env.example ${UP}/local.env && edit"
+echo "Run OCR (local HF): ${ROOT}/run-chandra-hf.sh <input.pdf|dir> <output_dir>"
diff --git a/services/chandra/run-chandra-hf.sh b/services/chandra/run-chandra-hf.sh
new file mode 100755
index 0000000..774a4b3
--- /dev/null
+++ b/services/chandra/run-chandra-hf.sh
@@ -0,0 +1,10 @@
+#!/usr/bin/env bash
+# Run Chandra CLI with --method hf (Hugging Face local inference).
+set -euo pipefail
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+for a in "$@"; do
+	if [[ "$a" == "--method" || "$a" == --method=* ]]; then
+		exec "${SCRIPT_DIR}/run-chandra.sh" "$@"
+	fi
+done
+exec "${SCRIPT_DIR}/run-chandra.sh" "$@" --method hf