algo/docs/fixKnowledge/collatz_scission_palier_inference_and_output_dirs.md
ncantu 6d64ca1a50 Fix scission palier inference and create output dirs
**Motivations:**
- Make certificates reproducible when CSV columns do not encode the palier
- Avoid FileNotFoundError when writing certificates into new folders
- Reuse scission in the local H6 generator to avoid duplicated certificate logic

**Root causes:**
- palier inference relied on the maximum residue value when the class column name was generic
- scission assumed that output directories already existed
- empty CSV fields were coerced to 0

**Fixes:**
- Infer palier from explicit columns (palier/m) or from the filename, keeping the heuristic as a fallback
- Create the parent directory of the output JSON
- Skip empty class/sister values instead of adding residue 0

**Enhancements:**
- Use collatz_scission for certificate generation in the local H6 artefacts generator

**Affected files:**
- applications/collatz/collatz_k_scripts/collatz_scission.py
- applications/collatz/collatz_k_scripts/collatz_generate_local_h6_artefacts.py
- docs/fixKnowledge/collatz_scission_palier_inference_and_output_dirs.md
2026-03-09 01:18:36 +01:00

collatz_scission palier inference and output directory creation

Problem

The helper script applications/collatz/collatz_k_scripts/collatz_scission.py can produce incorrect certificates in two cases:

  • Incorrect palier inference: when the CSV class column has a generic name (e.g. classe_mod_2^m) and the covered classes are sparse or small (e.g. all values < 2^8 while the target modulus is 2^13), palier was inferred from the maximum class value. This yields a modulus power smaller than the intended one.
  • Missing output directories: run_scission() writes out_json_path without creating parent directories, which can raise FileNotFoundError when callers pass a new path under a non-existing folder.

Root cause

  • infer_palier() only supported:
    • parsing 2^m from the class column name, or
    • a fallback heuristic based on the maximum covered residue value. This heuristic is not reliable when the class column name does not encode the modulus power.
  • run_scission() assumed the output directory exists.
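The failure mode of the legacy fallback can be illustrated with a small sketch. This is a hypothetical reconstruction, not the actual code of infer_palier(). It assumes that the max-value heuristic picks the smallest m such that max(residues) < 2^m, and that the column-name parser only recognises a literal exponent such as 2^13.

```python
import re

# Hypothetical reconstruction of the legacy logic described above; the real
# infer_palier() in collatz_scission.py may differ in its details.
_POW_IN_NAME = re.compile(r"2\^(\d+)")


def legacy_infer_palier(class_column, residues):
    """Return the exponent m of the modulus 2^m using the legacy rules."""
    match = _POW_IN_NAME.search(class_column)
    if match:
        # The column name encodes a concrete exponent, e.g. "classe_mod_2^13".
        return int(match.group(1))
    # Fallback heuristic (assumed): smallest m with max(residues) < 2^m.
    return max(residues).bit_length()


# A generic column name ("2^m") plus small residues yields m = 8, not 13.
print(legacy_infer_palier("classe_mod_2^m", [3, 27, 255]))
```

When every covered residue is below 2^8, the heuristic cannot tell that the CSV was built for 2^13. Nothing in the data itself carries the missing high bits.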

Corrective actions

  • Prefer explicit palier columns:
    • If the CSV contains a numeric palier column, use it.
    • If the CSV contains a numeric m / modulus_power column (used as exponent in some pipelines), use it.
  • Fallback from filename: parse palier2p<m> from the CSV path when available.
  • Keep legacy fallback: keep the max-value heuristic as a last resort.
  • Create output directories: ensure out_json_path.parent exists before writing.
  • Do not add spurious residue 0: skip empty strings instead of coercing to 0 when parsing the class / sister columns.
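The resolution order above can be sketched as follows. This is an illustrative sketch, not the code in collatz_scission.py, and it makes several assumptions:

- rows are dicts as produced by csv.DictReader;
- the palier, m and modulus_power columns all store the exponent m;
- the legacy column-name check sits between the explicit columns and the filename.

The helper names _first_int, infer_palier (this signature) and parse_residues are hypothetical.

```python
import re
from pathlib import Path

# Column names and the "palier2p<m>" pattern come from the note; the helper
# names and the exact precedence of the column-name check are assumptions.
_POW_IN_NAME = re.compile(r"2\^(\d+)")
_PALIER_IN_FILENAME = re.compile(r"palier2p(\d+)")
_EXPLICIT_COLUMNS = ("palier", "m", "modulus_power")


def _first_int(rows, column):
    """Return the first non-empty value of `column` as an int, else None."""
    for row in rows:
        value = (row.get(column) or "").strip()
        if value:
            try:
                return int(value)
            except ValueError:
                return None  # non-numeric column: ignore it
    return None


def infer_palier(csv_path, class_column, rows, residues):
    """Infer the exponent m of the modulus 2^m, most explicit source first."""
    # 1. Explicit numeric columns.
    for column in _EXPLICIT_COLUMNS:
        value = _first_int(rows, column)
        if value is not None:
            return value
    # 2. Exponent encoded in the class column name, e.g. "classe_mod_2^13".
    match = _POW_IN_NAME.search(class_column)
    if match:
        return int(match.group(1))
    # 3. Exponent encoded in the filename, e.g. "palier2p13_cover.csv".
    match = _PALIER_IN_FILENAME.search(Path(csv_path).name)
    if match:
        return int(match.group(1))
    # 4. Legacy last resort: smallest m with max(residues) < 2^m.
    return max(residues).bit_length() if residues else 0


def parse_residues(rows, column):
    """Collect residues from `column`, skipping empty cells (no implicit 0)."""
    residues = []
    for row in rows:
        value = (row.get(column) or "").strip()
        if value:
            residues.append(int(value))
    return residues
```

With rows such as [{"classe_mod_2^m": "3"}, {"classe_mod_2^m": ""}], parse_residues returns [3] rather than [3, 0]. A file named palier2p13_cover.csv then resolves to 13 even though the residues alone would suggest a smaller exponent.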

Impact

  • Certificates generated via collatz_scission.py now carry a palier that matches the CSV's intended modulus power when the CSV provides it (or when the filename encodes it).
  • Callers can write certificates to new directories without pre-creating them.
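The directory-creation fix amounts to calling Path.mkdir(parents=True, exist_ok=True) on the parent of the output path before opening it. The sketch below shows this pattern. write_certificate is a hypothetical helper and not the actual body of run_scission(). The JSON formatting options (indent, sort_keys) are also illustrative choices.

```python
import json
from pathlib import Path


def write_certificate(certificate, out_json_path):
    """Write `certificate` as JSON, creating missing parent directories.

    Hypothetical helper illustrating the fix applied in run_scission().
    """
    out_path = Path(out_json_path)
    # Without this call, open() raises FileNotFoundError when the parent
    # directory does not exist yet; exist_ok=True makes reruns harmless.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as handle:
        json.dump(certificate, handle, indent=2, sort_keys=True)
    return out_path
```

Setting exist_ok=True keeps the call idempotent, so writing several certificates into the same new folder does not fail on the second write.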

Analysis modalities

  • For any certificate JSON, verify that:
    • palier matches the intended modulus power 2^m,
    • the clauses and covered sets do not contain a spurious 0.
  • Verify that writing a certificate under a fresh (not yet existing) directory does not fail.
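The first two checks can be automated on a loaded certificate. This is a sketch under assumed conditions. It assumes the certificate is a dict (e.g. from json.load) with top-level keys palier, clauses and covered, and that palier stores the exponent m. The real JSON layout may differ, and check_certificate is a hypothetical name. A residue 0 is only flagged for review, since the check cannot tell whether it is legitimate.

```python
def _iter_ints(value):
    """Yield every integer found in nested JSON lists and dict values."""
    if isinstance(value, bool):
        return  # JSON true/false are bools, not residues (False == 0)
    if isinstance(value, int):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from _iter_ints(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from _iter_ints(item)


def check_certificate(cert, expected_palier):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if cert.get("palier") != expected_palier:
        problems.append(f"palier is {cert.get('palier')!r}, expected {expected_palier}")
    for key in ("clauses", "covered"):
        if 0 in _iter_ints(cert.get(key, [])):
            problems.append(f"{key} contains residue 0; check for an empty CSV cell")
    return problems
```

The helper walks nested lists and dicts, so it does not depend on the exact shape of a clause.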

Deployment

  • No environment changes are required.
  • The fix is local to:
    • applications/collatz/collatz_k_scripts/collatz_scission.py