Surrogates

surrogates

Classes

MultiOutputGP

MultiOutputGP(X_train, Y_train, *, kernel='matern52', ard=True, noise_variance=1e-06, update_every=10, n_retrain_max=20)

Multi-output GP implemented as independent single-output GPs.

This class represents a vector-valued regression function by training one SingleOutputGP per output dimension. Outputs are therefore conditionally independent given inputs.

Shapes

Training:
  • X_train: shape (n_train, n_dim)
  • Y_train: shape (n_train, n_out)

Prediction:
  • X: shape (n, n_dim)
  • mean: shape (n, n_out)
  • var: shape (n, n_out)

Update:
  • X_new: shape (1, n_dim)
  • y_new: shape (n_out,) or (1, n_out)

Parameters:

Name Type Description Default
X_train ArrayLike

Training inputs of shape (n_train, n_dim).

required
Y_train ArrayLike

Training targets of shape (n_train, n_out) (or (n_train,) for n_out=1).

required
kernel KernelName

Passed to each SingleOutputGP.

'matern52'
ard bool

Passed to each SingleOutputGP.

True
noise_variance float

Passed to each SingleOutputGP.

1e-06
update_every int

Passed to each SingleOutputGP.

10
n_retrain_max int

Passed to each SingleOutputGP.

20
Notes
  • This is a pragmatic design that keeps the implementation simple and robust. If you need correlated multi-output GPs (coregionalisation), a different model class should be introduced.
  • Online updates append one joint observation by updating each output GP.
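
As a rough illustration of the independent-outputs design, the sketch below fits one scalar model per column of Y_train and stacks the per-output predictions into (n, n_out) arrays. TinyGP is a hypothetical stand-in with a fixed RBF kernel and no hyperparameter optimisation; it is not SingleOutputGP and does not use GPy.

```python
import numpy as np


class TinyGP:
    """Hypothetical scalar GP stand-in: fixed RBF kernel, unit prior variance."""

    def __init__(self, X, y, lengthscale=1.0, noise=1e-6):
        self.X = np.asarray(X, dtype=float)
        self.lengthscale = lengthscale
        K = self._kernel(self.X, self.X) + noise * np.eye(len(self.X))
        self.L = np.linalg.cholesky(K)
        y = np.asarray(y, dtype=float)
        # alpha = K^{-1} y via two triangular solves.
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))

    def _kernel(self, A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * d2 / self.lengthscale**2)

    def predict(self, X):
        Ks = self._kernel(np.asarray(X, dtype=float), self.X)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = 1.0 - (v**2).sum(axis=0)  # marginal variance of the latent function
        return mean, np.maximum(var, 0.0)


rng = np.random.default_rng(0)
X_train = rng.uniform(-2.0, 2.0, size=(12, 2))          # (n_train, n_dim)
Y_train = np.column_stack([                              # (n_train, n_out)
    np.sin(X_train[:, 0]),
    np.cos(X_train[:, 1]),
    X_train.sum(axis=1),
])

# One independent scalar model per output dimension.
gps = [TinyGP(X_train, Y_train[:, j]) for j in range(Y_train.shape[1])]

X = rng.uniform(-2.0, 2.0, size=(5, 2))                  # (n, n_dim)
preds = [gp.predict(X) for gp in gps]
mean = np.column_stack([m for m, _ in preds])            # (n, n_out)
var = np.column_stack([v for _, v in preds])             # (n, n_out)
```

Because each output has its own model, no cross-output covariance is available; only marginal variances per output are returned.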

Attributes:

Name Type Description
n_out

Number of output dimensions.

See Also

SingleOutputGP Underlying scalar GP used per output dimension.

Attributes
n_train property
n_train

Number of training points stored (shared across outputs).

Functions
predict
predict(X)

Predict mean and marginal variance for all outputs.

Parameters:

Name Type Description Default
X ArrayLike

Query inputs of shape (n, n_dim) or (n_dim,).

required

Returns:

Type Description
mean

Array of shape (n, n_out) of predictive means.

var

Array of shape (n, n_out) of predictive marginal variances.

update
update(X_new, y_new)

Append one new joint observation for all outputs.

Parameters:

Name Type Description Default
X_new ArrayLike

New input of shape (1, n_dim) (or (n_dim,) interpreted as one point).

required
y_new ArrayLike

New outputs of shape (n_out,) or (1, n_out).

required

Raises:

Type Description
ValueError

If the shapes do not describe exactly one new joint observation.
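
The shape contract for a single joint observation could be checked along the following lines. This is a minimal sketch: as_single_observation is a hypothetical helper name, and the error message and exact validation order are assumptions, not the library's code.

```python
import numpy as np


def as_single_observation(X_new, y_new, n_dim, n_out):
    """Hypothetical helper: coerce inputs to (1, n_dim) and (1, n_out) or raise."""
    X = np.asarray(X_new, dtype=float)
    y = np.asarray(y_new, dtype=float)
    # A 1D vector is interpreted as one point / one joint output.
    if X.ndim == 1:
        X = X[None, :]
    if y.ndim == 1:
        y = y[None, :]
    if X.shape != (1, n_dim) or y.shape != (1, n_out):
        raise ValueError(
            f"expected exactly one observation with shapes (1, {n_dim}) and "
            f"(1, {n_out}); got {X.shape} and {y.shape}"
        )
    return X, y


X1, y1 = as_single_observation([0.1, 0.2], [1.0, 2.0, 3.0], n_dim=2, n_out=3)
```

Passing a batch, for example arrays of shape (2, 2) and (2, 3), raises ValueError in this sketch.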

log_likelihood
log_likelihood()

Return the sum of marginal log-likelihoods across output GPs.

SingleOutputGP

SingleOutputGP(X_train, y_train, *, kernel='matern52', ard=True, noise_variance=1e-06, update_every=10, n_retrain_max=20)

Single-output GP regression with input/output standardisation.

This is a thin wrapper around GPy.models.GPRegression that standardises inputs and outputs using sklearn.preprocessing.StandardScaler.

Standardisation is useful in practice because:

  • it makes kernel lengthscales easier to learn (inputs have comparable scales),
  • it stabilises optimisation of hyperparameters when outputs vary in magnitude.

Parameters:

Name Type Description Default
X_train ArrayLike

Training inputs of shape (n_train, n_dim) (or (n_dim,) for a single point).

required
y_train ArrayLike

Training targets of shape (n_train,) or (n_train, 1).

required
kernel KernelName

Kernel name. Supported: "rbf", "matern32", "matern52".

'matern52'
ard bool

If True, use ARD (one lengthscale per input dimension).

True
noise_variance float

Initial observation noise variance for the GP likelihood.

1e-06
update_every int

Retraining period: after every update_every calls to update, the GP hyperparameters are re-optimised (until n_retrain_max is reached). Set update_every<=0 to disable re-optimisation.

10
n_retrain_max int

Maximum number of hyperparameter re-optimisations triggered by updates.

20
Notes
  • update appends exactly one new observation with shape (1, n_dim) -> (1, 1).
  • Hyperparameter optimisation uses GPy's .optimize() routine.
  • This class intentionally avoids any I/O and does not expose plotting.
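
To make the kernel and ard parameters concrete, here is a NumPy sketch of the standard Matérn 5/2 kernel with one lengthscale per input dimension. The function name matern52_ard and its signature are illustrative assumptions; the class itself delegates kernel construction to GPy.

```python
import numpy as np


def matern52_ard(A, B, lengthscales, variance=1.0):
    """Matern 5/2 kernel matrix between rows of A and B with ARD lengthscales."""
    lengthscales = np.asarray(lengthscales, dtype=float)
    A = np.atleast_2d(np.asarray(A, dtype=float)) / lengthscales
    B = np.atleast_2d(np.asarray(B, dtype=float)) / lengthscales
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    s = np.sqrt(5.0) * np.sqrt(np.maximum(d2, 0.0))  # sqrt(5) * scaled distance
    return variance * (1.0 + s + s**2 / 3.0) * np.exp(-s)


# Short lengthscale in dimension 0, very long lengthscale in dimension 1.
ls = [0.5, 100.0]
origin = [0.0, 0.0]
k_move_dim0 = matern52_ard(origin, [1.0, 0.0], ls)[0, 0]
k_move_dim1 = matern52_ard(origin, [0.0, 1.0], ls)[0, 0]
```

With these lengthscales, a unit step along dimension 1 leaves the kernel value close to 1, while the same step along dimension 0 reduces it substantially. This is how ARD lets the model discount inputs that barely affect the output.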
See Also

MultiOutputGP Multi-output wrapper implemented as independent single-output GPs.

Attributes
n_train property
n_train

Number of training points currently stored by the GP.

Functions
predict
predict(X)

Predict mean and marginal variance.

Parameters:

Name Type Description Default
X ArrayLike

Query inputs of shape (n, n_dim) or (n_dim,).

required

Returns:

Type Description
mean

Predictive mean of shape (n,) in the original (unstandardised) output space.

var

Predictive marginal variance of shape (n,) in the original output space.

Notes

GPy returns mean and variance in standardised output space; this method transforms them back to the original units. If y = a * y_std + b, then Var[y] = Var[a * y_std] = a^2 Var[y_std].
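
The back-transformation can be sketched with plain NumPy. The values mean_std and var_std below are made-up placeholders for what a model in standardised space might return; a and b play the role of the output scaler's scale and offset.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=50.0, scale=4.0, size=200)  # targets in original units

# Standardise: y_std = (y - b) / a, equivalently y = a * y_std + b.
b = y.mean()
a = y.std()
y_std = (y - b) / a

# Placeholder predictions in standardised space (illustrative values only).
mean_std = np.array([0.0, 1.5, -0.7])
var_std = np.array([0.10, 0.25, 0.05])

# The offset b shifts the mean only; the scale a enters the variance squared.
mean = a * mean_std + b
var = a**2 * var_std
```

Note that the variance is scaled by a squared, not by a, and is unaffected by b.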

update
update(X_new, y_new)

Append one new observation and optionally retrain hyperparameters.

Parameters:

Name Type Description Default
X_new ArrayLike

New input with shape (1, n_dim) (or (n_dim,) which is interpreted as one point).

required
y_new ArrayLike

New target with shape (1, 1), (1,), or scalar-like.

required

Raises:

Type Description
ValueError

If the provided data cannot be interpreted as exactly one new observation.

Notes

This method:

  1. standardises (X_new, y_new) using the scalers fit at construction time,
  2. appends the standardised data to the underlying GPy model,
  3. triggers internal re-optimisation according to the update schedule.
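
One way to read the update schedule described by update_every and n_retrain_max is the counter logic below. RetrainSchedule is a hypothetical helper written for illustration; the actual bookkeeping inside the class may differ.

```python
from dataclasses import dataclass


@dataclass
class RetrainSchedule:
    """Hypothetical helper: decides whether an update triggers re-optimisation."""

    update_every: int = 10
    n_retrain_max: int = 20
    n_updates: int = 0
    n_retrains: int = 0

    def step(self) -> bool:
        """Record one update; return True if hyperparameters should be re-optimised."""
        self.n_updates += 1
        if self.update_every <= 0:
            return False  # re-optimisation disabled
        if self.n_retrains >= self.n_retrain_max:
            return False  # retraining budget exhausted
        if self.n_updates % self.update_every != 0:
            return False  # not yet at a retraining period boundary
        self.n_retrains += 1
        return True


schedule = RetrainSchedule(update_every=3, n_retrain_max=2)
fired = [schedule.step() for _ in range(10)]
# fired is True at updates 3 and 6 only; update 9 is blocked by n_retrain_max.
```
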
log_likelihood
log_likelihood()

Return the GP marginal log-likelihood in standardised space.

POD dataclass

POD(rank, randomized=True, n_oversamples=10, n_iter=2, random_state=0, mean_=None, components_=None, singular_values_=None, explained_energy_=None)

Proper Orthogonal Decomposition (POD) estimator.

The POD basis is computed via an SVD of the mean-centered snapshot matrix:

  • compute the column-wise mean of the snapshot matrix Y (stored as mean_),
  • form the centered matrix Yc = Y - mean,
  • take the first rank right singular vectors as POD modes.
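
The three steps above can be sketched with a deterministic thin SVD in NumPy. The free functions pod_fit, pod_transform and pod_inverse_transform are illustrative names that mirror the methods documented below; they are an assumption about the computation, not the class's source.

```python
import numpy as np


def pod_fit(Y, rank):
    """Return (mean, components, singular_values) for snapshots Y of shape (n_snapshots, n_obs)."""
    Y = np.asarray(Y, dtype=float)
    mean = Y.mean(axis=0)                      # column-wise mean, shape (n_obs,)
    Yc = Y - mean                              # centered snapshot matrix
    # Thin SVD: Yc = U @ diag(S) @ Vt; rows of Vt are the right singular vectors.
    _, S, Vt = np.linalg.svd(Yc, full_matrices=False)
    r = min(rank, Vt.shape[0])                 # clip to min(n_snapshots, n_obs)
    return mean, Vt[:r], S[:r]


def pod_transform(Y, mean, components):
    """Project snapshots onto the modes: coefficients of shape (n_snapshots, rank)."""
    return (np.asarray(Y, dtype=float) - mean) @ components.T


def pod_inverse_transform(A, mean, components):
    """Reconstruct snapshots of shape (n_snapshots, n_obs) from coefficients."""
    return np.asarray(A, dtype=float) @ components + mean


rng = np.random.default_rng(0)
# Synthetic snapshots that vary along exactly two directions around an offset.
Y = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 8)) + 3.0
mean, components, S = pod_fit(Y, rank=2)
A = pod_transform(Y, mean, components)
Y_hat = pod_inverse_transform(A, mean, components)
```

Since the synthetic data has only two independent directions of variation, two modes reconstruct it to floating-point accuracy.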

Parameters:

Name Type Description Default
rank int

Number of modes to retain. The effective rank is clipped to min(n_snapshots, n_obs).

required
randomized bool

If True, use randomized SVD (often faster for large snapshot matrices). If False, use a deterministic full SVD (numpy.linalg.svd).

True
n_oversamples int

Randomized SVD parameters passed to sklearn.utils.extmath.randomized_svd.

10
n_iter int

Randomized SVD parameters passed to sklearn.utils.extmath.randomized_svd.

2
random_state int

Randomized SVD parameters passed to sklearn.utils.extmath.randomized_svd.

0

Attributes:

Name Type Description
mean_ FloatArray | None

Mean snapshot over the training set, shape (n_obs,).

components_ FloatArray | None

POD modes, shape (rank, n_obs) (rows are modes).

singular_values_ FloatArray | None

Retained singular values, shape (rank,).

explained_energy_ FloatArray | None

Cumulative explained energy, shape (rank,) in [0, 1].

Notes

The explained energy is based on squared singular values. If S denotes the singular values of Yc, then the cumulative explained energy is

$$
E(k) = \frac{\sum_{i=1}^{k} S_i^2}{\sum_{i} S_i^2}.
$$

This is commonly used to choose a rank by thresholding E(k) (e.g. 0.99).
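
A minimal sketch of rank selection by energy threshold, using a deterministic SVD in NumPy. The synthetic data and the 0.99 threshold are illustrative; the array index is 0-based, so energy[k - 1] corresponds to E(k).

```python
import numpy as np

rng = np.random.default_rng(0)
n_snapshots, n_obs = 40, 15

# Synthetic snapshots built from five modes with rapidly decaying amplitudes.
modes = rng.normal(size=(5, n_obs))
amplitudes = rng.normal(size=(n_snapshots, 5)) * np.array([10.0, 5.0, 1.0, 0.1, 0.01])
Y = amplitudes @ modes

Yc = Y - Y.mean(axis=0)                       # mean-centered snapshot matrix
S = np.linalg.svd(Yc, compute_uv=False)       # singular values, descending
energy = np.cumsum(S**2) / np.sum(S**2)       # energy[k - 1] == E(k)

threshold = 0.99
# Smallest k with E(k) >= threshold (energy is non-decreasing, so searchsorted applies).
rank = min(int(np.searchsorted(energy, threshold)) + 1, energy.size)
```
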

Attributes
is_fitted property
is_fitted

Whether fit has been called and fitted attributes are available.

n_time_ property
n_time_

Observation dimension of the fitted snapshots.

The name n_time_ is kept for compatibility with trajectory use cases, but it should be interpreted as the number of observation components (n_obs).

rank_ property
rank_

Effective retained rank after fitting.

Functions
fit
fit(Y)

Fit a POD basis from snapshots.

Parameters:

Name Type Description Default
Y ArrayLike

Snapshot matrix with shape (n_snapshots, n_obs).

required

Returns:

Type Description
self

Fitted instance.

Raises:

Type Description
ValueError

If rank is not positive or if Y is not 2D.

transform
transform(Y)

Project snapshots onto the POD basis.

Parameters:

Name Type Description Default
Y ArrayLike

Snapshot matrix of shape (n_snapshots, n_obs).

required

Returns:

Type Description
A

POD coefficients of shape (n_snapshots, rank).

Raises:

Type Description
RuntimeError

If the POD instance is not fitted.

ValueError

If Y has an incompatible observation dimension.

inverse_transform
inverse_transform(A)

Reconstruct snapshots from POD coefficients.

Parameters:

Name Type Description Default
A ArrayLike

POD coefficient matrix of shape (n_snapshots, rank).

required

Returns:

Type Description
Y_hat

Reconstructed snapshots of shape (n_snapshots, n_obs).

Raises:

Type Description
RuntimeError

If the POD instance is not fitted.

ValueError

If A has an incompatible coefficient dimension.

fit_transform
fit_transform(Y)

Fit POD and return coefficients for the same snapshots.

Parameters:

Name Type Description Default
Y ArrayLike

Snapshot matrix of shape (n_snapshots, n_obs).

required

Returns:

Type Description
A

POD coefficients of shape (n_snapshots, rank).

PODGPSurrogate dataclass

PODGPSurrogate(pod, gp, coeff_var_floor=1e-12, y_var_floor=1e-14)

POD-GP surrogate for trajectory- or field-valued model outputs.

This surrogate implements a common two-step construction for high-dimensional outputs:

  1. Compression (POD): map each snapshot/trajectory y to a low-dimensional coefficient vector a ∈ R^r.
  2. Regression (GP): learn the mapping θ → a using a Gaussian process.

The surrogate then provides predictions in the original observation space by reconstructing snapshots from the predicted POD coefficients using the POD mean and modes.

Functions
predict
predict(theta)

Predict mean output and pointwise variance in observation space.

Parameters:

Name Type Description Default
theta ArrayLike

Parameters of shape (d,) or batch of shape (n, d).

required

Returns:

Type Description
y_mean

Predictive mean in observation space.

  • shape (n_obs,) if theta is 1D
  • shape (n, n_obs) if theta is 2D
y_var

Predictive pointwise variance in observation space, same shape as y_mean.

Notes

The returned variance is obtained by propagating coefficient-wise variances through the POD reconstruction under an independence approximation.
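
Under the independence approximation, if y = mean + Σ_i a_i φ_i with modes φ_i stored as rows of components, then Var[y_j] = Σ_i φ_ij² Var[a_i]. The sketch below implements this with NumPy. The function name propagate_variance is hypothetical, and the way coeff_var_floor and y_var_floor are applied here (as lower clamps before and after propagation) is an assumption about their role.

```python
import numpy as np


def propagate_variance(a_var, components, coeff_var_floor=1e-12, y_var_floor=1e-14):
    """Map coefficient variances (n, r) or (r,) to pointwise variances in observation space."""
    a_var = np.asarray(a_var, dtype=float)
    single = a_var.ndim == 1
    a_var = np.maximum(np.atleast_2d(a_var), coeff_var_floor)  # assumed clamp
    # Independent coefficients: squared mode entries weight the coefficient variances.
    y_var = np.maximum(a_var @ components**2, y_var_floor)     # assumed clamp
    return y_var[0] if single else y_var


rng = np.random.default_rng(0)
components = np.linalg.qr(rng.normal(size=(6, 3)))[0].T   # (rank=3, n_obs=6), orthonormal rows
a_var = rng.uniform(0.1, 1.0, size=(4, 3))                # (n=4, rank=3)
y_var = propagate_variance(a_var, components)             # (n=4, n_obs=6)
```

This equals the diagonal of componentsᵀ diag(a_var) components for each query point; covariances between observation components are discarded.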

update
update(theta, y_true)

Update the surrogate with one new high-fidelity observation.

This method projects the new HF snapshot into POD coefficient space and updates the GP with a single new training point.

Parameters:

Name Type Description Default
theta ArrayLike

Parameter vector of shape (d,) (or (1, d)).

required
y_true ArrayLike

HF snapshot/trajectory in observation space, shape (n_obs,) (or (1, n_obs)).

required

Raises:

Type Description
ValueError

If a batch (more than one observation) is provided.

RuntimeError

If the POD object is not fitted.

log_likelihood
log_likelihood()

Return the summed marginal log-likelihood of the underlying GP(s).

copy
copy()

Return a deep copy of the surrogate.

This is useful when a workflow needs independent surrogate state, e.g. when running truly independent chains or experiments.

Notes

Many active-learning workflows instead prefer shared state across deepcopies (see AdaptiveMetropolisShared).

Functions

pod_energy

pod_energy(Y, *, r_max=None, randomized=True, n_oversamples=10, n_iter=2, random_state=0)

Compute the cumulative POD energy curve for a snapshot matrix.

This is a lightweight helper for choosing a POD rank without instantiating a POD object.

Parameters:

Name Type Description Default
Y ArrayLike

Snapshot matrix with shape (n_snapshots, n_obs).

required
r_max int | None

Maximum rank to compute. If None, uses min(n_snapshots, n_obs).

None
randomized bool

If True, use randomized SVD. If False, use deterministic SVD.

True
n_oversamples int

Randomized SVD parameters.

10
n_iter int

Randomized SVD parameters.

2
random_state int

Randomized SVD parameters.

0

Returns:

Type Description
explained_energy

1D array of length r (the computed rank). Entry k is the cumulative explained energy up to mode k (0-indexed). Values are in [0, 1].

Raises:

Type Description
ValueError

If Y is not 2D or if r_max is not positive when provided.