Trust Region
Trust-Region (KL over States)
Wrapper that enforces a trust-region constraint on a base optimizer's updates, where the constraint is measured as a (symmetric) KL proxy over state branches. If a proposed update exceeds the KL bound, the wrapper performs a backtracking line search along the update direction until the bound is satisfied.
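One plausible reading of a "symmetric KL proxy over state branches" is the average of the two directed KL divergences between the branch distributions stored in the old and new info dicts. The sketch below assumes `info["branches"]` is a sequence of non-negative weights in a fixed branch order. The function name `symmetric_kl_branches` and the `eps` clipping are hypothetical and not part of this API.

```python
import math
from typing import List, Sequence


def _normalize(weights: Sequence[float], eps: float) -> List[float]:
    # Clip to eps so log() never sees zero, then renormalize to sum to 1.
    clipped = [max(float(w), eps) for w in weights]
    total = sum(clipped)
    return [w / total for w in clipped]


def symmetric_kl_branches(old_info: dict, new_info: dict, eps: float = 1e-12) -> float:
    """Hypothetical kl_fn: 0.5 * (KL(p || q) + KL(q || p)) over 'branches'."""
    p = _normalize(old_info["branches"], eps)
    q = _normalize(new_info["branches"], eps)
    if len(p) != len(q):
        raise ValueError("branch distributions must have the same length")
    kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)


# Unnormalized weights are accepted: [1, 1] -> [0.5, 0.5], [9, 1] -> [0.9, 0.1].
print(symmetric_kl_branches({"branches": [1, 1]}, {"branches": [9, 1]}))  # ~0.4394
```

Averaging both directions makes the value independent of argument order, which is what "symmetric" suggests; plain KL is not symmetric.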
- Interface:
  - initialize(params) -> state
  - step_params(model, params, context) -> (new_params, state)
- Requires:
  - a base optimizer that exposes step_params(…)
  - context["kl_fn"](old_info, new_info) -> float (the KL divergence or a proxy for it)
  - context["info"]: the current state's info dict (including 'branches' when available)
  - context["refresh_info"](model, params, context) -> info (a callable that returns the info dict for the given params)
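A minimal sketch of how the interface and requirements above could fit together. It assumes params are flat lists of floats and that the base optimizer's `step_params` takes the same `(model, params, context)` arguments and returns `(new_params, state)`. The class name `TrustRegionKL`, the knobs `max_kl`, `backtrack_factor` and `max_backtracks`, the toy `GradientDescent` base optimizer, and the choice to keep the old parameters when no backtracked step meets the bound are all illustrative assumptions.

```python
from typing import Any, Callable, Dict, List, Tuple

Params = List[float]
Info = Dict[str, Any]


class TrustRegionKL:
    """KL trust-region wrapper (sketch; names and knobs are hypothetical)."""

    def __init__(self, base, max_kl: float = 0.01,
                 backtrack_factor: float = 0.5, max_backtracks: int = 10):
        self.base = base
        self.max_kl = max_kl
        self.backtrack_factor = backtrack_factor
        self.max_backtracks = max_backtracks
        self.state: Dict[str, Any] = {}

    def initialize(self, params: Params) -> Dict[str, Any]:
        # Delegate to the base optimizer and record trust-region bookkeeping.
        self.state = {"base": self.base.initialize(params), "kl": 0.0,
                      "scale": 1.0, "accepted": True}
        return self.state

    def step_params(self, model: Any, params: Params,
                    context: Dict[str, Any]) -> Tuple[Params, Dict[str, Any]]:
        kl_fn: Callable[[Info, Info], float] = context["kl_fn"]
        refresh_info: Callable[[Any, Params, Dict[str, Any]], Info] = context["refresh_info"]
        old_info: Info = context["info"]

        # Full step proposed by the base optimizer (assumed same signature).
        proposed, base_state = self.base.step_params(model, params, context)
        direction = [n - o for n, o in zip(proposed, params)]

        scale = 1.0
        kl = float("inf")
        for _ in range(self.max_backtracks + 1):
            candidate = [o + scale * d for o, d in zip(params, direction)]
            kl = kl_fn(old_info, refresh_info(model, candidate, context))
            if kl <= self.max_kl:
                self.state = {"base": base_state, "kl": kl,
                              "scale": scale, "accepted": True}
                return candidate, self.state
            scale *= self.backtrack_factor  # shrink the step and retry

        # No backtracked step met the bound: keep the old parameters (assumption).
        self.state = {"base": base_state, "kl": kl, "scale": 0.0, "accepted": False}
        return list(params), self.state


class GradientDescent:
    """Toy base optimizer (hypothetical): one gradient step on sum(x**2)."""

    def __init__(self, lr: float):
        self.lr = lr

    def initialize(self, params: Params) -> Dict[str, Any]:
        return {}

    def step_params(self, model, params, context):
        return [x - self.lr * 2.0 * x for x in params], {}


# Usage: info is just the parameter vector and kl_fn is a squared-distance proxy.
ctx = {
    "kl_fn": lambda old, new: sum((a - b) ** 2 for a, b in zip(old["branches"], new["branches"])),
    "info": {"branches": [1.0, -1.0]},
    "refresh_info": lambda model, p, context: {"branches": list(p)},
}
tr = TrustRegionKL(GradientDescent(lr=0.5), max_kl=0.1)
tr.initialize([1.0, -1.0])
new_params, state = tr.step_params(None, [1.0, -1.0], ctx)
print(new_params, state["scale"])  # [0.875, -0.875] 0.125
```

In the usage lines the full step gives a proxy value of 2.0; halving the step three times reaches scale 0.125 with a proxy value of 0.03125, which is within the bound of 0.1 and is accepted.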