Surrogates
Classes
MultiOutputGP
MultiOutputGP(X_train, Y_train, *, kernel='matern52', ard=True, noise_variance=1e-06, update_every=10, n_retrain_max=20)
Multi-output GP implemented as independent single-output GPs.
This class represents a vector-valued regression function by training one
SingleOutputGP per output dimension.
Outputs are therefore conditionally independent given inputs.
Shapes
Training:
- X_train: shape (n_train, n_dim)
- Y_train: shape (n_train, n_out)
Prediction:
- X: shape (n, n_dim)
- mean: shape (n, n_out)
- var: shape (n, n_out)
Update:
- X_new: shape (1, n_dim)
- y_new: shape (n_out,) or (1, n_out)
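The independent-outputs design above can be sketched with scikit-learn's `GaussianProcessRegressor` standing in for the GPy-backed `SingleOutputGP` (the helper functions and data here are illustrative, not part of the library):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def fit_independent_gps(X_train, Y_train):
    """Fit one scalar GP per output column (outputs conditionally independent)."""
    return [
        GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(
            X_train, Y_train[:, j]
        )
        for j in range(Y_train.shape[1])
    ]

def predict_independent_gps(gps, X):
    """Return (mean, var), each of shape (n, n_out)."""
    means, stds = zip(*(gp.predict(X, return_std=True) for gp in gps))
    return np.column_stack(means), np.column_stack(stds) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 2))  # (n_train, n_dim)
Y = np.column_stack([np.sin(X @ [1.0, 2.0]), np.cos(X @ [2.0, -1.0])])  # (n_train, n_out)
gps = fit_independent_gps(X, Y)
mean, var = predict_independent_gps(gps, X[:5])
print(mean.shape, var.shape)  # (5, 2) (5, 2)
```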
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X_train` | ArrayLike | Training inputs of shape `(n_train, n_dim)`. | required |
| `Y_train` | ArrayLike | Training targets of shape `(n_train, n_out)`. | required |
| `kernel` | KernelName | Passed to each `SingleOutputGP`. | `'matern52'` |
| `ard` | bool | Passed to each `SingleOutputGP`. | `True` |
| `noise_variance` | float | Passed to each `SingleOutputGP`. | `1e-06` |
| `update_every` | int | Passed to each `SingleOutputGP`. | `10` |
| `n_retrain_max` | int | Passed to each `SingleOutputGP`. | `20` |
Notes
- This is a pragmatic design that keeps the implementation simple and robust. If you need correlated multi-output GPs (coregionalisation), a different model class should be introduced.
- Online updates append one joint observation by updating each output GP.
Attributes:

| Name | Type | Description |
|---|---|---|
| `n_out` | int | Number of output dimensions. |
See Also
SingleOutputGP
Underlying scalar GP used per output dimension.
Attributes
n_train
property
n_train
Number of training points stored (shared across outputs).
Functions
predict
predict(X)
Predict mean and marginal variance for all outputs.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ArrayLike | Query inputs of shape `(n, n_dim)`. | required |
Returns:

| Type | Description |
|---|---|
| mean | Array of shape `(n, n_out)`. |
| var | Array of shape `(n, n_out)`. |
update
update(X_new, y_new)
Append one new joint observation for all outputs.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X_new` | ArrayLike | New input of shape `(1, n_dim)`. | required |
| `y_new` | ArrayLike | New outputs of shape `(n_out,)` or `(1, n_out)`. | required |
Raises:

| Type | Description |
|---|---|
| ValueError | If the shapes do not describe exactly one new joint observation. |
log_likelihood
log_likelihood()
Return the sum of marginal log-likelihoods across output GPs.
SingleOutputGP
SingleOutputGP(X_train, y_train, *, kernel='matern52', ard=True, noise_variance=1e-06, update_every=10, n_retrain_max=20)
Single-output GP regression with input/output standardisation.
This is a thin wrapper around GPy.models.GPRegression that standardises inputs
and outputs using sklearn.preprocessing.StandardScaler.
Standardisation is useful in practice because:
- it makes kernel lengthscales easier to learn (inputs have comparable scales),
- it stabilises optimisation of hyperparameters when outputs vary in magnitude.
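The effect of standardisation can be seen in a small sketch with `sklearn.preprocessing.StandardScaler` (the data here is synthetic, chosen only to exaggerate the scale mismatch):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal([0.0, 1000.0], [1.0, 50.0], size=(100, 2))  # inputs on wildly different scales
y = rng.normal(5e4, 1e3, size=(100, 1))                    # large-magnitude outputs

x_scaler = StandardScaler().fit(X)
y_scaler = StandardScaler().fit(y)
X_std, y_std = x_scaler.transform(X), y_scaler.transform(y)
# After standardisation every input column has unit variance, so a single
# initial lengthscale guess is reasonable for all dimensions, and the output
# magnitude no longer dominates the hyperparameter objective.
print(X_std.std(axis=0), y_std.std())
```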
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X_train` | ArrayLike | Training inputs of shape `(n_train, n_dim)`. | required |
| `y_train` | ArrayLike | Training targets of shape `(n_train, 1)`. | required |
| `kernel` | KernelName | Kernel name. Supported values include `'matern52'`. | `'matern52'` |
| `ard` | bool | If True, use ARD (one lengthscale per input dimension). | `True` |
| `noise_variance` | float | Initial observation noise variance for the GP likelihood. | `1e-06` |
| `update_every` | int | Retraining period: after every `update_every` updates, hyperparameters are re-optimised. | `10` |
| `n_retrain_max` | int | Maximum number of hyperparameter re-optimisations triggered by updates. | `20` |
Notes
- `update` appends exactly one new observation with shape `(1, n_dim) -> (1, 1)`.
- Hyperparameter optimisation uses GPy's `.optimize()` routine.
- This class intentionally avoids any I/O and does not expose plotting.
See Also
MultiOutputGP
Multi-output wrapper implemented as independent single-output GPs.
Attributes
n_train
property
n_train
Number of training points currently stored by the GP.
Functions
predict
predict(X)
Predict mean and marginal variance.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | ArrayLike | Query inputs of shape `(n, n_dim)`. | required |
Returns:

| Type | Description |
|---|---|
| mean | Predictive mean of shape `(n, 1)`. |
| var | Predictive marginal variance of shape `(n, 1)`. |
Notes
GPy returns mean and variance in standardised output space; this method
transforms them back to the original units. If y = a * y_std + b, then
Var[y] = Var[a * y_std] = a^2 Var[y_std].
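This back-transform can be checked numerically with `StandardScaler`, where `a` and `b` correspond to the scaler's `scale_` and `mean_` attributes:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
y = rng.normal(10.0, 3.0, size=(200, 1))
scaler = StandardScaler().fit(y)

# Suppose a GP predicts this variance in standardised output space:
var_std = np.array([0.25])
# Back-transform: y = a * y_std + b with a = scaler.scale_,
# so Var[y] = a^2 * Var[y_std] (the offset b does not affect variance).
var_orig = scaler.scale_ ** 2 * var_std
print(var_orig)
```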
update
update(X_new, y_new)
Append one new observation and optionally retrain hyperparameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X_new` | ArrayLike | New input with shape `(1, n_dim)`. | required |
| `y_new` | ArrayLike | New target with shape `(1, 1)`. | required |
Raises:

| Type | Description |
|---|---|
| ValueError | If the provided data cannot be interpreted as exactly one new observation. |
Notes
This method:
- standardises `(X_new, y_new)` using the scalers fit at construction time,
- appends the standardised data to the underlying GPy model,
- triggers internal re-optimisation according to the update schedule.
log_likelihood
log_likelihood()
Return the GP marginal log-likelihood in standardised space.
POD
dataclass
POD(rank, randomized=True, n_oversamples=10, n_iter=2, random_state=0, mean_=None, components_=None, singular_values_=None, explained_energy_=None)
Proper Orthogonal Decomposition (POD) estimator.
The POD basis is computed via an SVD of the mean-centered snapshot matrix:
- compute the column-wise mean `mean`,
- form the centered matrix `Yc = Y - mean`,
- take the first `rank` right singular vectors as POD modes.
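The three steps above can be sketched directly with NumPy using a deterministic full SVD (the function and variable names mirror the attribute names below but are illustrative):

```python
import numpy as np

def fit_pod(Y, rank):
    """Fit a POD basis from snapshots Y of shape (n_snapshots, n_obs)."""
    mean_ = Y.mean(axis=0)                 # column-wise mean snapshot
    Yc = Y - mean_                         # centered snapshot matrix
    _, S, Vt = np.linalg.svd(Yc, full_matrices=False)
    r = min(rank, Vt.shape[0])             # clip to the effective rank
    components_ = Vt[:r]                   # POD modes, shape (r, n_obs)
    singular_values_ = S[:r]
    explained_energy_ = np.cumsum(S**2) / np.sum(S**2)
    return mean_, components_, singular_values_, explained_energy_

rng = np.random.default_rng(0)
Y = rng.normal(size=(40, 8))
mean_, modes, svals, energy = fit_pod(Y, rank=3)
print(modes.shape)  # (3, 8)
```

The rows of `components_` are orthonormal, which is what makes projection and reconstruction simple matrix products.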
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `rank` | int | Number of modes to retain. The effective rank is clipped to the maximum rank supported by the snapshot matrix. | required |
| `randomized` | bool | If True, use randomized SVD (often faster for large snapshot matrices). If False, use a deterministic full SVD. | `True` |
| `n_oversamples` | int | Randomized SVD parameter. | `10` |
| `n_iter` | int | Randomized SVD parameter. | `2` |
| `random_state` | int | Randomized SVD parameter. | `0` |
Attributes:

| Name | Type | Description |
|---|---|---|
| `mean_` | FloatArray \| None | Mean snapshot over the training set, shape `(n_obs,)`. |
| `components_` | FloatArray \| None | POD modes, shape `(rank_, n_obs)`. |
| `singular_values_` | FloatArray \| None | Retained singular values, shape `(rank_,)`. |
| `explained_energy_` | FloatArray \| None | Cumulative explained energy, shape `(rank_,)`. |
Notes
The explained energy is based on squared singular values. If S denotes the singular
values of Yc, then the cumulative explained energy is
.. math::
E(k) = \frac{\sum_{i=1}^{k} S_i^2}{\sum_{i} S_i^2}.
This is commonly used to choose a rank by thresholding E(k) (e.g. 0.99).
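Selecting the smallest rank with E(k) above a threshold is a one-liner over the cumulative energy (the singular values here are made up for illustration):

```python
import numpy as np

S = np.array([10.0, 5.0, 1.0, 0.5, 0.1])       # singular values of Yc
energy = np.cumsum(S**2) / np.sum(S**2)        # E(k) for k = 1..len(S)
# Smallest k with E(k) >= 0.99 (searchsorted finds the first index at or
# above the threshold; +1 converts the 0-based index to a rank).
rank = int(np.searchsorted(energy, 0.99)) + 1
print(energy.round(4), rank)
```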
Attributes
is_fitted
property
is_fitted
Whether fit has been called and fitted attributes are available.
n_time_
property
n_time_
Observation dimension of the fitted snapshots.
The name n_time_ is kept for compatibility with trajectory use cases, but it
should be interpreted as the number of observation components (n_obs).
rank_
property
rank_
Effective retained rank after fitting.
Functions
fit
fit(Y)
Fit a POD basis from snapshots.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `Y` | ArrayLike | Snapshot matrix with shape `(n_snapshots, n_obs)`. | required |
Returns:

| Type | Description |
|---|---|
| self | Fitted instance. |

Raises:

| Type | Description |
|---|---|
| ValueError | If `Y` does not have a valid 2D shape. |
transform
transform(Y)
Project snapshots onto the POD basis.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `Y` | ArrayLike | Snapshot matrix of shape `(n_snapshots, n_obs)`. | required |

Returns:

| Type | Description |
|---|---|
| A | POD coefficients of shape `(n_snapshots, rank_)`. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the POD instance is not fitted. |
| ValueError | If `Y` has an incompatible shape. |
inverse_transform
inverse_transform(A)
Reconstruct snapshots from POD coefficients.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `A` | ArrayLike | POD coefficient matrix of shape `(n_snapshots, rank_)`. | required |

Returns:

| Type | Description |
|---|---|
| Y_hat | Reconstructed snapshots of shape `(n_snapshots, n_obs)`. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | If the POD instance is not fitted. |
| ValueError | If `A` has an incompatible shape. |
fit_transform
fit_transform(Y)
Fit POD and return coefficients for the same snapshots.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `Y` | ArrayLike | Snapshot matrix of shape `(n_snapshots, n_obs)`. | required |

Returns:

| Type | Description |
|---|---|
| A | POD coefficients of shape `(n_snapshots, rank_)`. |
PODGPSurrogate
dataclass
PODGPSurrogate(pod, gp, coeff_var_floor=1e-12, y_var_floor=1e-14)
POD-GP surrogate for trajectory- or field-valued model outputs.
This surrogate implements a common two-step construction for high-dimensional outputs:
- Compression (POD): map each snapshot/trajectory `y` to a low-dimensional coefficient vector `a ∈ R^r`.
- Regression (GP): learn the mapping `θ → a` using a Gaussian process.

The surrogate then provides predictions in the original observation space by reconstructing `y_hat = mean_ + a @ components_`.
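The two-step construction can be sketched end to end with NumPy and scikit-learn (the synthetic trajectories and the `predict` helper are illustrative, with `GaussianProcessRegressor` standing in for the GPy-backed GP):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
theta = rng.uniform(0, 1, size=(50, 2))                   # parameters
t = np.linspace(0, 1, 100)
Y = np.sin(2 * np.pi * np.outer(theta[:, 0], t)) * theta[:, 1:2]  # trajectories (50, 100)

# Step 1 -- compression: POD via SVD of the centered snapshots.
mean_ = Y.mean(axis=0)
_, _, Vt = np.linalg.svd(Y - mean_, full_matrices=False)
components_ = Vt[:5]                                      # rank-5 basis (5, 100)
A = (Y - mean_) @ components_.T                           # coefficients (50, 5)

# Step 2 -- regression: one GP per POD coefficient, theta -> a.
gps = [GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(theta, A[:, j])
       for j in range(A.shape[1])]

def predict(theta_new):
    a = np.column_stack([gp.predict(theta_new) for gp in gps])
    return mean_ + a @ components_                        # back to observation space

Y_hat = predict(theta[:3])
print(Y_hat.shape)  # (3, 100)
```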
Functions
predict
predict(theta)
Predict mean output and pointwise variance in observation space.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `theta` | ArrayLike | Parameters of shape `(n, n_dim)`. | required |

Returns:

| Type | Description |
|---|---|
| y_mean | Predictive mean in observation space. |
| y_var | Predictive pointwise variance in observation space, same shape as `y_mean`. |
Notes
The returned variance is obtained by propagating coefficient-wise variances through the POD reconstruction under an independence approximation.
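Because the reconstruction is linear in the coefficients, independent coefficient variances propagate as `Var[y_j] = Σ_i Var[a_i] · components_[i, j]²`. A minimal numerical check (names mirror the POD attributes; the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
# Orthonormal modes of shape (r, n_obs): rows of Q.T from a thin QR.
components_ = np.linalg.qr(rng.normal(size=(20, 4)))[0].T
coeff_var = np.array([[0.5, 0.2, 0.1, 0.05]])   # Var[a_i], shape (1, r)

# Reconstruction y = mean_ + a @ components_ is linear in a, so with
# independent coefficients: Var[y_j] = sum_i Var[a_i] * components_[i, j]**2.
y_var = coeff_var @ components_**2              # shape (1, n_obs)
print(y_var.shape)  # (1, 20)
```

Note that with orthonormal modes the total variance is preserved: summing `y_var` over observation components recovers the sum of the coefficient variances.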
update
update(theta, y_true)
Update the surrogate with one new high-fidelity observation.
This method projects the new HF snapshot into POD coefficient space and updates the GP with a single new training point.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `theta` | ArrayLike | Parameter vector of shape `(1, n_dim)`. | required |
| `y_true` | ArrayLike | HF snapshot/trajectory in observation space, shape `(n_obs,)` or `(1, n_obs)`. | required |
Raises:

| Type | Description |
|---|---|
| ValueError | If a batch (more than one observation) is provided. |
| RuntimeError | If the POD object is not fitted. |
log_likelihood
log_likelihood()
Return the summed marginal log-likelihood of the underlying GP(s).
copy
copy()
Return a deep copy of the surrogate.
This is useful when a workflow needs independent surrogate state, e.g. when running truly independent chains or experiments.
Notes
Many active-learning workflows instead prefer shared state across deepcopies
(see AdaptiveMetropolisShared).
Functions
pod_energy
pod_energy(Y, *, r_max=None, randomized=True, n_oversamples=10, n_iter=2, random_state=0)
Compute the cumulative POD energy curve for a snapshot matrix.
This is a lightweight helper for choosing a POD rank without instantiating a
POD object.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `Y` | ArrayLike | Snapshot matrix with shape `(n_snapshots, n_obs)`. | required |
| `r_max` | int \| None | Maximum rank to compute. If None, uses the maximum rank supported by `Y`. | `None` |
| `randomized` | bool | If True, use randomized SVD. If False, use deterministic SVD. | `True` |
| `n_oversamples` | int | Randomized SVD parameter. | `10` |
| `n_iter` | int | Randomized SVD parameter. | `2` |
| `random_state` | int | Randomized SVD parameter. | `0` |
Returns:

| Type | Description |
|---|---|
| explained_energy | 1D array of length `r_max` with the cumulative explained energy. |

Raises:

| Type | Description |
|---|---|
| ValueError | If `Y` does not have a valid 2D shape. |