Hypercausal Chain Demo#

Introduction#

This example simulates a hypercausal chain of three connected nodes (HCNode) that share a sequential temporal flow. Each node wraps its own parametric backend (ParametricBackend). The nodes are coupled through causal propagation and trained together by gradient descent on finite-difference gradients.

General Flow Structure#

The model is a temporal hypercausal system in which each node contributes to a sequential information chain. It is built from four components:

- Parametric Backend: transforms inputs via \(S_t = \tanh(w \cdot x_t + b)\) and projects multiple possible futures.
- Linear Projector: expands \(S_t\) into \(K\) candidate branches \(S_{t+1}^{(k)}\).
- Loss functions: combine task accuracy, temporal consistency, and branch coherence into a single objective.
- Gradient Descent: updates all backends using finite-difference gradient approximations.

How to Run#

# From the project root
python -m examples.ex_hypercausal_chain_demo

# Or directly
python examples/ex_hypercausal_chain_demo.py

Relevant Code Snippets#

Definition of the ParametricBackend class (tanh transformation and projection)#

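import numpy as np

# BaseBackend and BackendConfig are used below; their import lines are not
# part of this excerpt.
from qmlhc.predictors import LinearProjector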
# Losses
from qmlhc.loss import MSELoss, ConsistencyLoss, CoherenceLoss
# Optimizer
from qmlhc.optim import make_gradient_descent


# ============================================================================
# 1) Parametric backend with projection via LinearProjector
# ============================================================================
class ParametricBackend(BaseBackend):
    """
    Deterministic backend with per-node parameters (w, b).

    The backend applies a tanh transformation and generates future projections
    centered around the current state.

    Methods
    -------
    run(params=None)
        Computes ``S_t = tanh(w * x + b)``.
    project_future(S_t, K)
        Uses a LinearProjector centered on ``S_t`` to generate K future branches.
    """

    def __init__(
        self,
        config: BackendConfig,
        w: float = 0.9,
        b: float = 0.05,
        proj_span: float = 0.25,
    ):
        super().__init__(config)
        self.w = float(w)
        self.b = float(b)
        # Internal linear projector: uses S_t as projection base (not x)
        self._projector = LinearProjector(weight=1.0, bias=0.0, span=float(proj_span))

    def get_params(self) -> dict:
        """
        Return parameters as arrays for compatibility with the Optimizer API.

        Returns
        -------
        dict
            Dictionary with keys ``"w"`` and ``"b"`` as NumPy arrays.
        """
        return {
            "w": np.array([self.w], dtype=float),
            "b": np.array([self.b], dtype=float),
        }

    def set_params(self, params: dict) -> None:
        """
        Update backend parameters if provided.

        Parameters
        ----------
        params : dict
            Dictionary that may contain keys ``"w"`` and/or ``"b"``.
        """
        if "w" in params:
            self.w = float(np.asarray(params["w"]).reshape(()))
        if "b" in params:
            self.b = float(np.asarray(params["b"]).reshape(()))

    def run(self, params: dict | None = None) -> np.ndarray:
        """
        Apply the backend transformation ``S_t = tanh(w * x + b)``.

        Parameters
        ----------
        params : dict or None, optional
            Optional parameter override for this run.

        Returns
        -------
        np.ndarray
            Transformed state vector ``S_t``.
        """
        if params:
            self.set_params(params)
        x = self._require_input()
        s_t = np.tanh(self.w * x + self.b)
        s_t = self._validate_state(s_t)
        return s_t

    def project_future(self, s_t: np.ndarray, branches: int = 2) -> np.ndarray:
        """
        Generate future states around ``s_t`` using a linear projector.

        Parameters
        ----------
        s_t : np.ndarray
            Current state vector.
        branches : int, optional
            Number of future branches (K). Default is 2.

Main function chain_demo_step() (chain flow, loss computation, and optimization)#

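    # Excerpt starts inside the finite-difference gradient helper; its
    # signature (taking loss_fn, params, apply_params_fn, eps) is not shown.
    grads = {}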
    base_params = {k: v.copy() for k, v in params.items()}
    apply_params_fn(base_params)
    base_loss = loss_fn()

    for k, v in base_params.items():
        perturbed = {kk: vv.copy() for kk, vv in base_params.items()}
        perturbed[k] = v + eps
        apply_params_fn(perturbed)
        l_eps = loss_fn()
        grad = (l_eps - base_loss) / eps
        grads[k] = np.array([grad], dtype=float)

    apply_params_fn(base_params)
    return grads


def dict_to_scalars(d: dict) -> dict:
    """
    Convert scalar ndarray values to safe Python floats for printing.

    Parameters
    ----------
    d : dict
        Dictionary of parameter arrays.

    Returns
    -------
    dict
        Dictionary with all values converted to floats.
    """
    out = {}
    for k, v in d.items():
        arr = np.asarray(v)
        if arr.shape == () or arr.size == 1:
            out[k] = float(arr.reshape(()).item())
        else:
            out[k] = arr.tolist()
    return out


def grad_l2_norm(grads: dict) -> float:
    """
    Compute L2 norm of all scalar gradients.

    Parameters
    ----------
    grads : dict
        Dictionary with scalar gradient arrays.

    Returns
    -------
    float
        L2 norm of gradients.
    """
    sq_sum = 0.0
    for v in grads.values():
        g = float(np.asarray(v).reshape(()).item())
        sq_sum += g * g
    return float(np.sqrt(sq_sum))


# ============================================================================
# 3) Hyper-causal chain demo + optimization step
# ============================================================================
def chain_demo_step():
    """
    Run a hyper-causal chain demo with one optimization step.

    Builds a sequential model of three nodes, executes multiple time steps,
    computes task, consistency, and coherence losses, and applies a single
    gradient-descent update using finite-difference gradients.

    Returns
    -------
    tuple
        (losses_before, losses_after)
    """
    D = 3
    K = 5
    T = 6  # Temporal sequence length

    model, nodes, backends = build_model_chain(D=D, K=K)

    # Input data (simple oscillatory pattern) and task targets
    t = np.arange(T, dtype=float)
    x_seq = np.stack(
        [
            0.3 * np.sin(0.7 * t + 0.0),
            0.2 * np.sin(0.7 * t + 0.8),
            0.1 * np.cos(0.7 * t + 0.3),
        ],
        axis=1,
    )

    target_seq = np.zeros((T, D), dtype=float)

    mse = MSELoss()
    cons = ConsistencyLoss(alpha=0.8, beta=1.2)
    coh = CoherenceLoss(mode="variance")

Functional Explanation#

The hypercausal chain is a multi-node causal model: each node transforms its input into a state, projects that state into candidate futures, and has its parameters corrected from local losses and temporal dependencies.

Parametric Transformation

Each node computes its local state:

\[S_t = \tanh(w \cdot x_t + b)\]

Here, \(w\) and \(b\) are node-specific parameters learned through gradient descent. Because \(\tanh\) maps every real input into \((-1, 1)\), all internal states stay bounded, which keeps the chained computations numerically well behaved.
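
As a minimal sketch of this transformation (standalone NumPy, not the qmlhc API; the helper name `tanh_state` and the input values are illustrative assumptions), the following applies the map to a 3-dimensional input and checks that every component stays inside the open interval:

```python
import numpy as np

def tanh_state(x, w=0.9, b=0.05):
    """Return S_t = tanh(w * x + b) for an input vector x."""
    x = np.asarray(x, dtype=float)
    return np.tanh(w * x + b)

# Illustrative input; the last component is deliberately large.
x_t = np.array([0.3, -0.2, 5.0])
s_t = tanh_state(x_t)

print(s_t)
print(bool(np.all(np.abs(s_t) < 1.0)))  # every component lies in (-1, 1)
```

The defaults `w=0.9` and `b=0.05` mirror the constructor defaults of ParametricBackend shown earlier.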

Future Projection (Linear Projector)

Each state generates \(K\) possible futures using a linear projector centered at the current state:

\[S_{t+1}^{(k)} = \text{LinearProjector}(S_t), \quad k \in \{1, \dots, K\}\]

This projection expands the local state into a hypercausal “fan” of possibilities, representing multiple potential outcomes for the next time step.
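
The internals of LinearProjector are not shown in this example, so the sketch below is only one plausible reading of a projector "centered at the current state": it assumes the \(K\) branches are evenly spaced offsets in \([-\text{span}, \text{span}]\) around `weight * s_t + bias`. The function name `project_branches` is hypothetical.

```python
import numpy as np

def project_branches(s_t, k=5, weight=1.0, bias=0.0, span=0.25):
    """Hypothetical projector: K branches evenly offset around weight * s_t + bias."""
    s_t = np.asarray(s_t, dtype=float)
    center = weight * s_t + bias
    offsets = np.linspace(-span, span, k)       # shape (K,)
    return center[None, :] + offsets[:, None]   # shape (K, D)

s_t = np.array([0.31, -0.13, 0.05])
branches = project_branches(s_t, k=5)

print(branches.shape)          # (5, 3)
print(branches.mean(axis=0))   # the fan is centered on s_t
```

With symmetric offsets the branch mean coincides with the current state, which is what makes the fan "centered". The real projector may space or scale its branches differently.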

Loss Composition

The total loss combines three complementary objectives:

\[\mathcal{L}_{total} = \mathcal{L}_{task} + 0.5 \, \mathcal{L}_{consistency} + 0.3 \, \mathcal{L}_{coherence}\]

- Task Loss (MSE):
  
  \[\mathcal{L}_{task} = \frac{1}{T} \sum_{t=1}^{T} \| S_t - Y_t \|^2\]
  
  Measures how close the node’s output is to the desired target trajectory.
- Consistency Loss (Triadic):
  
  \[\mathcal{L}_{consistency} = \alpha \| S_t - S_{t-1} \|^2 + \beta \| S_t - \hat{S}_{t+1} \|^2\]
  
  Ensures smooth temporal evolution between past, present, and predicted future states. The demo sets \(\alpha = 0.8\) and \(\beta = 1.2\).
- Coherence Loss:
  
  \[\mathcal{L}_{coherence} = \text{Var}_k\big(S_{t+1}^{(k)}\big)\]
  
  Penalizes excessive divergence among the \(K\) projected branches, maintaining causal stability.
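
The three terms and their weighting can be sketched in plain NumPy. This is not the qmlhc implementation of MSELoss, ConsistencyLoss, or CoherenceLoss: the function names are hypothetical, and the reductions (sum over state dimensions for the norms, mean over dimensions for the variance) are assumptions made for illustration.

```python
import numpy as np

def task_loss(states, targets):
    """Mean over time of the squared distance ||S_t - Y_t||^2."""
    return float(np.mean(np.sum((states - targets) ** 2, axis=1)))

def consistency_loss(s_prev, s_t, s_next_hat, alpha=0.8, beta=1.2):
    """alpha * ||S_t - S_{t-1}||^2 + beta * ||S_t - S_hat_{t+1}||^2."""
    past = np.sum((s_t - s_prev) ** 2)
    future = np.sum((s_t - s_next_hat) ** 2)
    return float(alpha * past + beta * future)

def coherence_loss(branches):
    """Variance across the K branches, averaged over state dimensions (assumed)."""
    return float(np.mean(np.var(branches, axis=0)))

def total_loss(task, cons, coh, w_cons=0.5, w_coh=0.3):
    """Weighted sum used by the demo: task + 0.5 * cons + 0.3 * coh."""
    return task + w_cons * cons + w_coh * coh

# Tiny made-up trajectory with T = 3 steps and D = 2 dimensions.
states = np.array([[0.1, 0.0], [0.2, -0.1], [0.3, 0.1]])
targets = np.zeros_like(states)
branches = states[1] + np.array([-0.1, 0.0, 0.1])[:, None]   # K = 3 branches

task = task_loss(states, targets)
cons = consistency_loss(states[0], states[1], states[2])
coh = coherence_loss(branches)
total = total_loss(task, cons, coh)
print(task, cons, coh, total)
```

Keeping the weights as arguments of `total_loss` makes it easy to see how much each term contributes before changing the 0.5 and 0.3 factors.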

Gradient Estimation and Parameter Update

Instead of backpropagation, the example estimates each gradient component with a forward finite difference, which costs one extra loss evaluation per parameter:

\[g_i = \frac{\mathcal{L}(\theta_i + \epsilon) - \mathcal{L}(\theta_i)}{\epsilon}\]

Each parameter update follows a simple gradient-descent rule:

\[\theta_i \leftarrow \theta_i - \eta \, g_i\]

where \(\eta\) is the learning rate.
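
The estimator and the update rule can be tried on a toy objective where the exact gradient is known. The sketch below is self-contained and does not use the qmlhc optimizer. The names `finite_diff_grads` and `gd_step`, the step size `eps=1e-6`, and the learning rate `lr=0.05` are illustrative assumptions.

```python
import numpy as np

def finite_diff_grads(loss_fn, params, eps=1e-6):
    """Forward-difference gradient for a dict of one-element parameter arrays."""
    base = loss_fn(params)
    grads = {}
    for name, value in params.items():
        shifted = {k: v.copy() for k, v in params.items()}
        shifted[name] = value + eps
        grads[name] = (loss_fn(shifted) - base) / eps
    return grads

def gd_step(params, grads, lr=0.05):
    """theta <- theta - lr * g for every parameter."""
    return {k: params[k] - lr * grads[k] for k in params}

x = np.array([0.3, -0.2, 0.1])

def loss(p):
    # Toy objective: drive tanh(w * x + b) toward zero.
    return float(np.mean(np.tanh(p["w"] * x + p["b"]) ** 2))

params = {"w": np.array([0.9]), "b": np.array([0.05])}
grads = finite_diff_grads(loss, params)

# Exact gradient of the toy objective, for comparison.
s = np.tanh(0.9 * x + 0.05)
exact = {"w": np.mean(2 * s * (1 - s**2) * x), "b": np.mean(2 * s * (1 - s**2))}

new_params = gd_step(params, grads)
print({k: (grads[k], float(exact[k])) for k in grads})
print(loss(params), "->", loss(new_params))
```

A forward difference carries a truncation error proportional to \(\epsilon\), so the estimate agrees with the exact gradient only to a few digits. That is usually enough for a demonstration with a handful of parameters.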

Optimization Loop
- The model runs for multiple time steps (\(T = 6\)), accumulating losses.
- The optimizer (make_gradient_descent) applies one parameter update across all nodes.
- Losses and parameters are printed before and after the update so the effect of the step can be inspected.
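
The steps above can be sketched end to end without the library. Several things here are assumptions rather than the demo's actual behavior: node \(i\) is taken to feed its state directly into node \(i+1\), only the task loss is used, and the learning rate 0.05 and step size 1e-6 are illustrative. The printed numbers will therefore not reproduce the output shown in the next section.

```python
import numpy as np

T = 6
t = np.arange(T, dtype=float)
x_seq = np.stack(
    [0.3 * np.sin(0.7 * t), 0.2 * np.sin(0.7 * t + 0.8), 0.1 * np.cos(0.7 * t + 0.3)],
    axis=1,
)                                    # shape (T, D) with D = 3
target_seq = np.zeros_like(x_seq)

def chain_loss(p):
    """Task loss of a 3-node chain; node i feeds node i + 1 (assumed wiring)."""
    total = 0.0
    for x_t, y_t in zip(x_seq, target_seq):
        s = x_t
        for i in range(3):
            s = np.tanh(p[f"b{i}_w"] * s + p[f"b{i}_b"])
        total += float(np.sum((s - y_t) ** 2))
    return total / T

def forward_diff(loss_fn, p, eps=1e-6):
    """One extra loss evaluation per parameter."""
    base = loss_fn(p)
    return {k: (loss_fn({**p, k: v + eps}) - base) / eps for k, v in p.items()}

params = {"b0_w": 0.9, "b0_b": 0.05, "b1_w": 0.95, "b1_b": 0.02, "b2_w": 1.05, "b2_b": 0.0}

before = chain_loss(params)
grads = forward_diff(chain_loss, params)
params_after = {k: v - 0.05 * grads[k] for k, v in params.items()}   # single GD update
after = chain_loss(params_after)

print(f"total BEFORE: {before:.6f}")
print(f"total AFTER : {after:.6f}")
```

The input sequence, zero targets, and initial parameter values are taken from the demo itself. Everything between them is a simplified stand-in.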

Exact Output#

=== Hypercausal Chain Demo ===
D=3, K=5, T=6

Parameters (before):
{'b0_w': 0.9, 'b0_b': 0.05, 'b1_w': 0.95, 'b1_b': 0.02, 'b2_w': 1.05, 'b2_b': 0.0}

Losses BEFORE update:
{'task': 0.02533261887044361, 'cons': 0.006637497192550465, 'coh': 0.029526745779767466, 'total': 0.037509391200649084}

Updating parameters with GD (finite-diff grads)...
||grad||_2 ≈ 3.556270e-01

Parameters (after):
{'b0_w': 0.8979223132592518, 'b0_b': 0.040790416564867205, 'b1_w': 0.9474490593467247, 'b1_b': 0.009730820141449475, 'b2_w': 1.0473899389089636, 'b2_b': -0.010405162840116874}

Losses AFTER update:
{'task': 0.019773491535848887, 'cons': 0.006664418126248786, 'coh': 0.02973869208514816, 'total': 0.03202730822451773}

Summary:
total BEFORE:  0.037509
total AFTER :  0.032027
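
The printed values can be cross-checked against the formulas above. The short script below recomputes each total from its components using the weights 0.5 and 0.3, and divides the size of the parameter change by the reported gradient norm. Under plain gradient descent that ratio equals the learning rate, which the output does not print. Reading it as about 0.05 is an inference from these numbers, not a documented setting.

```python
import numpy as np

before = {"task": 0.02533261887044361, "cons": 0.006637497192550465,
          "coh": 0.029526745779767466, "total": 0.037509391200649084}
after = {"task": 0.019773491535848887, "cons": 0.006664418126248786,
         "coh": 0.02973869208514816, "total": 0.03202730822451773}

def weighted_total(losses):
    """task + 0.5 * cons + 0.3 * coh, as in the loss composition formula."""
    return losses["task"] + 0.5 * losses["cons"] + 0.3 * losses["coh"]

for name, losses in (("BEFORE", before), ("AFTER", after)):
    print(name, bool(np.isclose(weighted_total(losses), losses["total"])))

p_before = np.array([0.9, 0.05, 0.95, 0.02, 1.05, 0.0])
p_after = np.array([0.8979223132592518, 0.040790416564867205, 0.9474490593467247,
                    0.009730820141449475, 1.0473899389089636, -0.010405162840116874])
grad_norm = 3.556270e-01

# For theta <- theta - eta * g, ||delta theta|| / ||g|| equals eta.
implied_lr = float(np.linalg.norm(p_before - p_after) / grad_norm)
print(implied_lr)
```

The components also show that the single step lowers the task loss while the consistency and coherence terms rise slightly. The weighted total still decreases.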
