Surrogates

surrogates

Classes

MultiOutputGP

MultiOutputGP(X_train, Y_train, *, kernel='matern52', ard=True, noise_variance=1e-06, update_every=10, n_retrain_max=20)

Multi-output GP implemented as independent single-output GPs.

This class represents a vector-valued regression function by training one SingleOutputGP per output dimension. Outputs are therefore conditionally independent given inputs.

Shapes

Training:
- X_train: shape (n_train, n_dim)
- Y_train: shape (n_train, n_out)

Prediction:
- X: shape (n, n_dim)
- mean: shape (n, n_out)
- var: shape (n, n_out)

Update:
- X_new: shape (1, n_dim)
- y_new: shape (n_out,) or (1, n_out)

Parameters:

Name	Type	Description	Default
`X_train`	`ArrayLike`	Training inputs of shape `(n_train, n_dim)`.	required
`Y_train`	`ArrayLike`	Training targets of shape `(n_train, n_out)` (or `(n_train,)` for `n_out=1`).	required
`kernel`	`KernelName`	Passed to each `SingleOutputGP`.	`'matern52'`
`ard`	`bool`	Passed to each `SingleOutputGP`.	`True`
`noise_variance`	`float`	Passed to each `SingleOutputGP`.	`1e-06`
`update_every`	`int`	Passed to each `SingleOutputGP`.	`10`
`n_retrain_max`	`int`	Passed to each `SingleOutputGP`.	`20`

Notes

This is a pragmatic design that keeps the implementation simple and robust. If you need correlated multi-output GPs (coregionalisation), a different model class should be introduced.
Online updates append one joint observation by updating each output GP.

Attributes:

Name	Type	Description
`n_out`		Number of output dimensions.

Attributes

n_train `property`

n_train

Number of training points stored (shared across outputs).

Functions

predict

predict(X)

Predict mean and marginal variance for all outputs.

Parameters:

Name	Type	Description	Default
`X`	`ArrayLike`	Query inputs of shape `(n, n_dim)` or `(n_dim,)`.	required

Returns:

Type	Description
`mean`	Array of shape `(n, n_out)` of predictive means.
`var`	Array of shape `(n, n_out)` of predictive marginal variances.

update

update(X_new, y_new)

Append one new joint observation for all outputs.

Parameters:

Name	Type	Description	Default
`X_new`	`ArrayLike`	New input of shape `(1, n_dim)` (or `(n_dim,)` interpreted as one point).	required
`y_new`	`ArrayLike`	New outputs of shape `(n_out,)` or `(1, n_out)`.	required

Raises:

Type	Description
`ValueError`	If the shapes do not describe exactly one new joint observation.
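
The shape rules for a single joint observation can be sketched as follows. This is an illustration only, not the class's actual validation code; the helper name `as_single_observation` and its `n_dim`/`n_out` arguments are hypothetical.

```python
import numpy as np

def as_single_observation(X_new, y_new, n_dim, n_out):
    """Hypothetical helper: coerce inputs to shapes (1, n_dim) and (1, n_out)."""
    X = np.asarray(X_new, dtype=float)
    y = np.asarray(y_new, dtype=float)
    if X.ndim == 1:
        X = X[None, :]  # (n_dim,) is interpreted as one point
    if y.ndim == 1:
        y = y[None, :]  # (n_out,) is interpreted as one joint output
    if X.shape != (1, n_dim) or y.shape != (1, n_out):
        raise ValueError(
            f"expected exactly one joint observation, got X {X.shape} and y {y.shape}"
        )
    return X, y

X, y = as_single_observation([0.1, 0.2], [1.0, 2.0, 3.0], n_dim=2, n_out=3)
```

A batch such as `X_new` of shape `(2, n_dim)` fails the final shape check and raises `ValueError`, matching the behaviour documented above.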

log_likelihood

log_likelihood()

Return the sum of marginal log-likelihoods across output GPs.

SingleOutputGP

SingleOutputGP(X_train, y_train, *, kernel='matern52', ard=True, noise_variance=1e-06, update_every=10, n_retrain_max=20)

Single-output GP regression with input/output standardisation.

This is a thin wrapper around GPy.models.GPRegression that standardises inputs and outputs using sklearn.preprocessing.StandardScaler.

Standardisation is useful in practice because:

it makes kernel lengthscales easier to learn (inputs have comparable scales),
it stabilises optimisation of hyperparameters when outputs vary in magnitude.

Parameters:

Name	Type	Description	Default
`X_train`	`ArrayLike`	Training inputs of shape `(n_train, n_dim)` (or `(n_dim,)` for a single point).	required
`y_train`	`ArrayLike`	Training targets of shape `(n_train,)` or `(n_train, 1)`.	required
`kernel`	`KernelName`	Kernel name. Supported: `"rbf"`, `"matern32"`, `"matern52"`.	`'matern52'`
`ard`	`bool`	If True, use ARD (one lengthscale per input dimension).	`True`
`noise_variance`	`float`	Initial observation noise variance for the GP likelihood.	`1e-06`
`update_every`	`int`	Retraining period: after every `update_every` calls to `update`, the GP hyperparameters are re-optimised (until `n_retrain_max` is reached). Set `update_every<=0` to disable re-optimisation.	`10`
`n_retrain_max`	`int`	Maximum number of hyperparameter re-optimisations triggered by updates.	`20`

Notes

update appends exactly one new observation with shape (1, n_dim) -> (1, 1).
Hyperparameter optimisation uses GPy's .optimize() routine.
This class intentionally avoids any I/O and does not expose plotting.

Attributes

n_train `property`

n_train

Number of training points currently stored by the GP.

Functions

predict

predict(X)

Predict mean and marginal variance.

Parameters:

Name	Type	Description	Default
`X`	`ArrayLike`	Query inputs of shape `(n, n_dim)` or `(n_dim,)`.	required

Returns:

Type	Description
`mean`	Predictive mean of shape `(n,)` in the original (unstandardised) output space.
`var`	Predictive marginal variance of shape `(n,)` in the original output space.

Notes

GPy returns mean and variance in standardised output space; this method transforms them back to the original units. If y = a * y_std + b, then Var[y] = Var[a * y_std] = a^2 Var[y_std].
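
The back-transformation above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the class's actual code; `unstandardise`, `scale`, and `offset` are hypothetical names standing in for the output scaler's `a` and `b`.

```python
import numpy as np

def unstandardise(mean_std, var_std, scale, offset):
    """Map a standardised predictive mean/variance back to original units.

    Assumes y = scale * y_std + offset (hypothetical names for a and b).
    """
    mean_std = np.asarray(mean_std, dtype=float)
    var_std = np.asarray(var_std, dtype=float)
    mean = scale * mean_std + offset  # the offset shifts the mean only
    var = (scale ** 2) * var_std      # the variance scales with a^2; b drops out
    return mean, var

mean, var = unstandardise([0.0, 1.0], [1.0, 0.25], scale=3.0, offset=10.0)
```

Note that the variance is multiplied by the squared scale, not by the scale itself, and that the offset has no effect on it.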

update

update(X_new, y_new)

Append one new observation and optionally retrain hyperparameters.

Parameters:

Name	Type	Description	Default
`X_new`	`ArrayLike`	New input with shape `(1, n_dim)` (or `(n_dim,)` which is interpreted as one point).	required
`y_new`	`ArrayLike`	New target with shape `(1, 1)`, `(1,)`, or scalar-like.	required

Raises:

Type	Description
`ValueError`	If the provided data cannot be interpreted as exactly one new observation.

Notes

This method:

standardises (X_new, y_new) using the scalers fit at construction time,
appends the standardised data to the underlying GPy model,
triggers internal re-optimisation according to the update schedule.
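
One plausible reading of the update schedule, based on the `update_every` and `n_retrain_max` parameter descriptions, is sketched below. The class `RetrainSchedule` is hypothetical, and the exact counting rule (re-optimise when the update count is a multiple of `update_every`) is an assumption, not something the documentation confirms.

```python
class RetrainSchedule:
    """Hypothetical helper: decides whether an update triggers re-optimisation."""

    def __init__(self, update_every=10, n_retrain_max=20):
        self.update_every = update_every
        self.n_retrain_max = n_retrain_max
        self.n_updates = 0
        self.n_retrains = 0

    def step(self):
        """Register one update; return True if hyperparameters should be re-optimised."""
        self.n_updates += 1
        if self.update_every <= 0:
            return False  # re-optimisation disabled
        if self.n_retrains >= self.n_retrain_max:
            return False  # retraining budget exhausted
        if self.n_updates % self.update_every == 0:
            self.n_retrains += 1
            return True
        return False

schedule = RetrainSchedule(update_every=3, n_retrain_max=2)
fired = [schedule.step() for _ in range(10)]
```

With these settings the schedule fires on the 3rd and 6th updates and then stays silent, because the budget of two re-optimisations is spent.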

log_likelihood

log_likelihood()

Return the GP marginal log-likelihood in standardised space.

POD `dataclass`

POD(rank, randomized=True, n_oversamples=10, n_iter=2, random_state=0, mean_=None, components_=None, singular_values_=None, explained_energy_=None)

Proper Orthogonal Decomposition (POD) estimator.

The POD basis is computed via an SVD of the mean-centered snapshot matrix:

compute the column-wise mean of the snapshot matrix Y, denoted mean,
form the centered matrix Yc = Y - mean,
take the first rank right singular vectors as POD modes.
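
The three steps above can be sketched with a deterministic SVD in NumPy. This is a minimal illustration on synthetic data, not the estimator's actual implementation; the projection and reconstruction lines assume that coefficients are obtained by projecting centred snapshots onto the modes.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 50))  # synthetic snapshots, shape (n_snapshots, n_obs)
rank = 5

mean = Y.mean(axis=0)          # column-wise mean, shape (n_obs,)
Yc = Y - mean                  # centred snapshot matrix

# Thin SVD: Yc = U @ diag(S) @ Vt, rows of Vt are right singular vectors
U, S, Vt = np.linalg.svd(Yc, full_matrices=False)
components = Vt[:rank]         # POD modes as rows, shape (rank, n_obs)

A = Yc @ components.T          # coefficients, shape (n_snapshots, rank)
Y_hat = A @ components + mean  # rank-limited reconstruction
```

Because the modes are orthonormal, the Frobenius norm of the reconstruction error equals the root sum of squares of the discarded singular values.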

Parameters:

Name	Type	Description	Default
`rank`	`int`	Number of modes to retain. The effective rank is clipped to `min(n_snapshots, n_obs)`.	required
`randomized`	`bool`	If True, use randomized SVD (often faster for large snapshot matrices). If False, use a deterministic full SVD (`numpy.linalg.svd`).	`True`
`n_oversamples`	`int`	Randomized SVD parameter passed to `sklearn.utils.extmath.randomized_svd`.	`10`
`n_iter`	`int`	Randomized SVD parameter passed to `sklearn.utils.extmath.randomized_svd`.	`2`
`random_state`	`int`	Randomized SVD parameter passed to `sklearn.utils.extmath.randomized_svd`.	`0`

Attributes:

Name	Type	Description
`mean_`	`FloatArray \| None`	Mean snapshot over the training set, shape `(n_obs,)`.
`components_`	`FloatArray \| None`	POD modes, shape `(rank, n_obs)` (rows are modes).
`singular_values_`	`FloatArray \| None`	Retained singular values, shape `(rank,)`.
`explained_energy_`	`FloatArray \| None`	Cumulative explained energy, shape `(rank,)` in `[0, 1]`.

Notes

The explained energy is based on squared singular values. If S denotes the singular values of Yc, then the cumulative explained energy is

.. math::

E(k) = \frac{\sum_{i=1}^{k} S_i^2}{\sum_{i} S_i^2}.

This is commonly used to choose a rank by thresholding E(k) (e.g. 0.99).
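
A minimal sketch of rank selection by thresholding E(k), assuming the full set of singular values is available. The singular values and the helper name `cumulative_energy` are illustrative only.

```python
import numpy as np

def cumulative_energy(S):
    """Cumulative explained energy E(k) from singular values S."""
    energy = np.asarray(S, dtype=float) ** 2
    return np.cumsum(energy) / np.sum(energy)

S = np.array([4.0, 2.0, 1.0, 0.5])  # illustrative singular values
E = cumulative_energy(S)

# Smallest number of modes whose cumulative energy reaches the threshold
rank = int(np.searchsorted(E, 0.99) + 1)
```

Here three modes capture about 98.8% of the energy, so a 0.99 threshold selects all four.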

Attributes

is_fitted `property`

is_fitted

Whether fit has been called and fitted attributes are available.

n_time_ `property`

n_time_

Observation dimension of the fitted snapshots.

The name n_time_ is kept for compatibility with trajectory use cases, but it should be interpreted as the number of observation components (n_obs).

rank_ `property`

rank_

Effective retained rank after fitting.

Functions

fit

fit(Y)

Fit a POD basis from snapshots.

Parameters:

Name	Type	Description	Default
`Y`	`ArrayLike`	Snapshot matrix with shape `(n_snapshots, n_obs)`.	required

Returns:

Type	Description
`self`	Fitted instance.

Raises:

Type	Description
`ValueError`	If `rank` is not positive or if `Y` is not 2D.

transform

transform(Y)

Project snapshots onto the POD basis.

Parameters:

Name	Type	Description	Default
`Y`	`ArrayLike`	Snapshot matrix of shape `(n_snapshots, n_obs)`.	required

Returns:

Type	Description
`A`	POD coefficients of shape `(n_snapshots, rank)`.

Raises:

Type	Description
`RuntimeError`	If the POD instance is not fitted.
`ValueError`	If `Y` has an incompatible observation dimension.

inverse_transform

inverse_transform(A)

Reconstruct snapshots from POD coefficients.

Parameters:

Name	Type	Description	Default
`A`	`ArrayLike`	POD coefficient matrix of shape `(n_snapshots, rank)`.	required

Returns:

Type	Description
`Y_hat`	Reconstructed snapshots of shape `(n_snapshots, n_obs)`.

Raises:

Type	Description
`RuntimeError`	If the POD instance is not fitted.
`ValueError`	If `A` has an incompatible coefficient dimension.

fit_transform

fit_transform(Y)

Fit POD and return coefficients for the same snapshots.

Parameters:

Name	Type	Description	Default
`Y`	`ArrayLike`	Snapshot matrix of shape `(n_snapshots, n_obs)`.	required

Returns:

Type	Description
`A`	POD coefficients of shape `(n_snapshots, rank)`.

PODGPSurrogate `dataclass`

PODGPSurrogate(pod, gp, coeff_var_floor=1e-12, y_var_floor=1e-14)

POD-GP surrogate for trajectory- or field-valued model outputs.

This surrogate implements a common two-step construction for high-dimensional outputs:

Compression (POD): map each snapshot/trajectory y to a low-dimensional coefficient vector a ∈ R^r.
Regression (GP): learn the mapping θ → a using a Gaussian process.

The surrogate then provides predictions in the original observation space by reconstructing the output from the predicted POD coefficients (see POD.inverse_transform).

Functions

predict

predict(theta)

Predict mean output and pointwise variance in observation space.

Parameters:

Name	Type	Description	Default
`theta`	`ArrayLike`	Parameters of shape `(d,)` or batch of shape `(n, d)`.	required

Returns:

Type	Description
`y_mean`	Predictive mean in observation space. Shape `(n_obs,)` if `theta` is 1D; shape `(n, n_obs)` if `theta` is 2D.
`y_var`	Predictive pointwise variance in observation space, same shape as `y_mean`.

Notes

The returned variance is obtained by propagating coefficient-wise variances through the POD reconstruction under an independence approximation.
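
Under the independence approximation, each coefficient variance contributes through the squared entries of its mode. The sketch below illustrates that propagation; it is not the surrogate's actual code. The function name is hypothetical, and applying `y_var_floor` as a lower bound on the result is an assumption about what that parameter does.

```python
import numpy as np

def reconstruct_with_variance(a_mean, a_var, components, mean, y_var_floor=1e-14):
    """Hypothetical helper: propagate coefficient means/variances to observation space.

    Assumes y = mean + a @ components with independent coefficients, so that
    Var[y_j] = sum_i Var[a_i] * components[i, j] ** 2.
    """
    a_mean = np.atleast_2d(np.asarray(a_mean, dtype=float))  # (n, rank)
    a_var = np.atleast_2d(np.asarray(a_var, dtype=float))    # (n, rank)
    y_mean = a_mean @ components + mean                      # (n, n_obs)
    y_var = a_var @ components ** 2                          # (n, n_obs)
    return y_mean, np.maximum(y_var, y_var_floor)            # assumed use of the floor

components = np.array([[1.0, 0.0, 0.0],
                       [0.0, 0.6, 0.8]])  # two orthonormal modes, n_obs = 3
mean = np.array([1.0, 1.0, 1.0])
y_mean, y_var = reconstruct_with_variance([2.0, 5.0], [0.5, 1.0], components, mean)
```

Cross-covariances between coefficients are ignored here, which is exactly what the independence approximation means.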

update

update(theta, y_true)

Update the surrogate with one new high-fidelity observation.

This method projects the new HF snapshot into POD coefficient space and updates the GP with a single new training point.

Parameters:

Name	Type	Description	Default
`theta`	`ArrayLike`	Parameter vector of shape `(d,)` (or `(1, d)`).	required
`y_true`	`ArrayLike`	HF snapshot/trajectory in observation space, shape `(n_obs,)` (or `(1, n_obs)`).	required

Raises:

Type	Description
`ValueError`	If a batch (more than one observation) is provided.
`RuntimeError`	If the POD object is not fitted.

log_likelihood

log_likelihood()

Return the summed marginal log-likelihood of the underlying GP(s).

copy

copy()

Return a deep copy of the surrogate.

This is useful when a workflow needs independent surrogate state, e.g. when running truly independent chains or experiments.

Notes

Many active-learning workflows instead prefer shared state across deepcopies (see AdaptiveMetropolisShared).

Functions

pod_energy

pod_energy(Y, *, r_max=None, randomized=True, n_oversamples=10, n_iter=2, random_state=0)

Compute the cumulative POD energy curve for a snapshot matrix.

This is a lightweight helper for choosing a POD rank without instantiating a POD object.

Parameters:

Name	Type	Description	Default
`Y`	`ArrayLike`	Snapshot matrix with shape `(n_snapshots, n_obs)`.	required
`r_max`	`int \| None`	Maximum rank to compute. If None, uses `min(n_snapshots, n_obs)`.	`None`
`randomized`	`bool`	If True, use randomized SVD. If False, use deterministic SVD.	`True`
`n_oversamples`	`int`	Randomized SVD parameter.	`10`
`n_iter`	`int`	Randomized SVD parameter.	`2`
`random_state`	`int`	Randomized SVD parameter.	`0`

Returns:

Type	Description
`explained_energy`	1D array of length `r` (the computed rank). Entry `k` is the cumulative explained energy up to mode `k` (0-indexed). Values are in `[0, 1]`.

Raises:

Type	Description
`ValueError`	If `Y` is not 2D or if `r_max` is not positive when provided.
