Creative Commons License
www.cgranade.com/research/talks/quics/07-2016

Ferrie, Granade, Cory tfx

Granade et al. s87

Wiebe et al. tf3

Wiebe et al. 7nx

Wiebe et al. tdk

Granade, Combes, Cory bhdw

Wiebe and Granade bk9d

Liu and West 8c2

Doucet et al. bmch

Ferrie and Granade tdj

Granade 10012/9217

figure: Ferrie tb4

Svore et al. 1304.0741

Isard and Blake cc76f6

qinfer.org

Rejection and Particle Filtering for Hamiltonian Learning


Cassandra E. Granade
Centre for Engineered Quantum Systems $ \newcommand{\ee}{\mathrm{e}} \newcommand{\ii}{\mathrm{i}} \newcommand{\dd}{\mathrm{d}} \newcommand{\id}{𝟙} \newcommand{\TT}{\mathrm{T}} \newcommand{\defeq}{\mathrel{:=}} \newcommand{\Tr}{\operatorname{Tr}} \newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\rank}{\operatorname{rank}} \newcommand{\expect}{\mathbb{E}} \newcommand{\sket}[1]{|#1\rangle\negthinspace\rangle} \newcommand{\sbraket}[1]{\langle\negthinspace\langle#1\rangle\negthinspace\rangle} \newcommand{\Gini}{\operatorname{Ginibre}} \newcommand{\supp}{\operatorname{supp}} \newcommand{\ket}[1]{\left|#1\right\rangle} \newcommand{\bra}[1]{\left\langle#1\right|} \newcommand{\braket}[1]{\left\langle#1\right\rangle} $

Learning Hamiltonians is critical to a range of QIP tasks:

Metrology
Learning magnetic fields, etc.
Calibration
Static field / pulse power / crosstalk, etc.
Debugging/Diagnosis
$T_2$ estimation, other noise finding
Verification/Validation
Analog and digital quantum simulation
### **Example**: Ramsey Estimation ### Suppose $H = \omega \sigma_z / 2$ for some unknown $\omega$. Traditional approach: - Prepare $\ket{+} \propto \ket{0} + \ket{1}$, measure “click” w/ pr.: $ \|\bra{+} \ee^{\ii \omega t \sigma_z / 2} \ket{+}\|^2 = \cos^2(\omega t / 2) $. - Repeat for many “shots” to estimate click pr. - Repeat for many times to estimate signal.
You'll get something that looks a bit like this:
What's $\omega$? Fourier transform and look at the peak.
We can do better. # $H = H(\vec{x})$. # Hamiltonian learning is a special case of *parameter estimation*: given data $D$, what is $\vec{x}$?
--- We want an approach that can work for small and large quantum devices alike.

To get there, we consider several different approaches to parameter estimation.

Analytic Sergeevich et al. c4vv95
  Ferrie, Granade, and Cory tfx
Rejection filter Wiebe and Granade bk9d
Particle filter Doucet et al. bmch
Likelihood-free particle filter Ferrie and Granade tdj
### The Likelihood Function ### $\|\bra{+} \ee^{\ii \omega t \sigma_z / 2} \ket{+}\|^2$ defines probability $\Pr(d | \omega; t)$ for every outcome $d$, model $\omega$ and experiment $t$. Basis for both maximum-likelihood and Bayesian methods.
### Bayesian Parameter Estimation ### The likelihood tells us what we learn from data: $$ \Pr(\vec{x} | d; e) = \frac{\Pr(d | \vec{x}; e)}{\Pr(d | e)} \Pr(\vec{x}) $$ --- Estimate $\hat{x} = \expect[\vec{x} | d; e] = \int \vec{x} \Pr(\vec{x} | d; e)\dd \vec{x}$. - **Optimal** for mean—squared error.
Each posterior $\Pr(\vec{x} | d; e)$ encodes our uncertainty about $x$. What should we expect squared-error to be? **Scalar case**: $ \expect[(x - \hat{x})^2 | d; e] = \mathbb{V}[x \mid d; e]. $ **Vector case**: $\expect[(\vec{x} - \hat{x}) (\vec{x} - \hat{x})^\TT | d; e] = \Cov(\vec{x} | d; e)$.
We can use our posteriors to make *adaptive* decisions: $$ e_* = \operatorname{arg min}_e \expect_d[\mathbb{V}(x | d; e)] $$
### **Example**: $x = (\omega)$ ### Can analytically find posterior for Gaussian priors, use to adaptively choose $t_k$. ![](./figures/how-t2.png)

**Problem**: may be intractable to analytically compute $$ \hat{x} \defeq \int \Pr(\vec{x} | d; e) \dd\vec{x} = \int \frac{ \Pr(d | \vec{x}; e) }{ \int \Pr(d | \vec{x}; e) \Pr(\vec{x}) \dd\vec{x} } \Pr(\vec{x}) \dd\vec{x}. $$

--- **Answer**: numerically approximate $\int f(\vec{x}) \Pr(\vec{x} | d)\dd\vec{x}$. *Efficient* to use Monte Carlo integration if we can sample from $\Pr(\vec{x})$.
## Rejection Sampling ## Given samples from $\Pr(\vec{x})$ and likelihood function $\Pr(d | \vec{x}; e)$, how do we sample from posterior for datum $d$? - Draw $\vec{x} \sim \Pr(\vec{x})$, accept $\vec{x}$ w/ $\Pr(d | \vec{x}; e)$. Accepted samples are distributed according to posterior.

Rejection Sampling Isn't Enough

Let $D = {d_1, \dots, d_k}$ be a set of data.

$$ \Pr(\text{accept} | \vec{x}) = \Pr(D | \vec{x}) = \prod_{d \in D} \Pr(d | \vec{x}) \overset{k \to \infty}{\longrightarrow} 0. $$


Example: Biased Coin $x = (p)$

$\Pr(H | p) = p$, $d \in \{H, T\}$.

$p \approx 0.5 \Longrightarrow \Pr(d_1, \dots, d_k | p) \approx 1 / 2^k$.

We will accept exponentially few samples!

### Gaussian Resampling ### For each datum $d$, use rejection sampling to approximate posterior moments: - $\bar{x} \gets \expect[\vec{x} | d]$. - $\Sigma \gets \operatorname{Cov}[\vec{x} | d] = \expect[\vec{x} \vec{x}^\TT | d] - \bar{x} \bar{x}^\TT$.
--- At the next iteration, draw prior samples from Gaussian with these moments: $$ \vec{x} \mid d \sim \mathcal{N}(\bar{x}, \Sigma) $$ Keeps $\Pr(\text{accept}) \approx \text{constant}$.

Can compute $\bar{x}$, $\Sigma$ from one sample at a time by accumulating

$$ x_{\Sigma} = \sum x \text{ and } x^2_{\Sigma} = \sum x^2. $$

\begin{align} \bar{x} & = x_{\Sigma} / n_{\text{accept}} \\ \Sigma & = x^2_{\Sigma} / n_{\text{accept}} - \bar{x}^2. \end{align}

Welford's algorithm: numerically-stable modification.

Rejection Filtering (RejF)

Input: Prior mean $\bar{x}$, prior covariance $\Sigma$, number of attempts $m$.

  • For each datum $d$ and experiment $e$:
    • $n, \bar{x}', M_2 \gets 0$ Initialize Welford.
    • While $n < m$:
      • Draw $\vec{x} \sim \mathcal{N}(\bar{x}, \Sigma)$. Sample f/ prior.
      • Accept $\vec{x}$ w/ $\Pr(d | \vec{x}; e)$.
      • If accepted, update $n$, $\bar{x}'$, $M_2$.
    • $\bar{x} \gets \bar{x}'$, $\Sigma \gets M_2 / (n - 1)$. Est. moments.
## Advantages of RejF ## - Accept pr based on a single datum ⇒ likelihood $\not\to 0$. - Easy to implement - Never needed to remember each accepted $x$! - Very low-memory (constant # of registers), ideal for FPGA use. - Easily parallelizable - “Reset rule” can abort from approximation failures.

Example: Phase Estimation, $x = (\phi)$

Prepare state $\ket{\phi}$ s. t. $U\ket{\phi} = \ee^{\ii \phi}\ket{\phi}$, measure to learn $\phi$.

$\theta = 0 \Rightarrow$ freq. est likelihood, w/ $\phi = \omega$, $M = t$.

Applications

Interferometry / metrology Higgins et al. crwd6w
Gate calibration / robust PE Kimmel et al. bmrg
Quantum simulation and chemistry Reiher et al. 1605.03590

Example: Phase Estimation, $x = (\phi)$

**Drawback**: RejF requires posterior after each datum to be $\approx$ Gaussian.

We can solve this by using a more general approach: - Weaken Gaussian assumption. - Generalize the rejection sampling step.

Liu-West Resampler

If we remember each sample $\vec{x}$, we can use them to relax RejF assumptions.

Input: $a, h \in [0, 1]$ s.t. $a^2 + h^2 = 1$, distribution $p(\vec{x})$.

  • Approximate $\bar{x} \gets \expect[\vec{x}]$, $\Sigma \gets \operatorname{Cov}(\vec{x})$
  • Draw parent $\vec{x}$ from $p(\vec{x})$.
  • Draw $\vec{\epsilon} \sim \mathcal{N}(0, \Sigma)$.
  • Return new sample $\vec{x}' \gets a \vec{x} + (1 - a) \bar{x} + h \vec{\epsilon}$.
### Why Does Liu-West Work? ### \begin{align} \vec{x}' & \gets a \vec{x} + (1 - a) \bar{x} + h \vec{\epsilon} \\\\ \expect[\vec{x}'] & = [a + (1 - a)] \bar{x} \\\\ \Cov(\vec{x}') & = (a^2 + h^2) \Cov(\vec{x}). \\\\ \Longrightarrow a^2 + h^2 & = 1 \text{ preserves } \expect[\vec{x}], \Cov(\vec{x}). \end{align} --- Mixes original approximation with $1 - a$ of a Gaussian matching moments. - $a \to 0$: RejF (assumed density) approx - $a \to 1$: Bootstrap - $a = 0.98$: typical case (2% Gaussian).
### From Samples to Particles ### Define a particle $(w_i, \vec{x}_i)$ as a sample $\vec{x}_i$ and a weight $w_i \in [0, 1]$. - $\expect[\vec{x}] = \sum_i w_i \vec{x}_i$ - $\Cov(\vec{x}) = \sum_i w_i \vec{x}_i \vec{x}_i^\TT - \expect[\vec{x}]\expect^\TT[\vec{x}]$ - $\expect[f(\vec{x})] = \sum_i w_i f(\vec{x}_i)$ --- Corresponds to $ p(\vec{x}) \approx \sum_i w_i \delta(\vec{x} - \vec{x}_i). $

Particles can represent distributions using either
weights or positions.

Particle Filter

  • Draw $N$ initial samples $\vec{x}_i$ from the prior $\Pr(\vec{x})$ w/ uniform weights.
  • Instead of rej. sampling, update weights by \begin{align} \tilde{w}_i & = w_i \times \Pr(d | \vec{x}_i; e) \end{align}
  • Renormalize. \begin{align} w_i & \mapsto \tilde{w}_i / \sum_i \tilde{w}_i \end{align}
  • Periodically use Liu-West to draw new $\{\vec{x}_i\}$ with uniform weights. Store posterior in positions.

Useful for Hamiltonian models...

Rabi/Ramsey/Phase est. Granade et al. s87
Swap spectroscopy Stenberg et al. 7nw

...as well as other QIP tasks.

Tomography Huszár and Holsby s86
  Struchalin et al. bmg5
  Ferrie 7nt
  Granade et al. bhdw, 1605.05039
Randomized benchmarking Granade et al. zmz
Continuous measurement Chase and Geremia chk4q7
Interferometry/metrology Granade 10012/9217
### **Example**: $\vec{x} = (\omega, T_2)$ ### ![](figures/unknown-t2.png)

Example: $\vec{x} =$ COSY / Crosstalk

\( H(\omega_1, \omega_2, J) = \frac12 \left( \omega_1 \sigma_z \otimes \id + \omega_2 \id \otimes \sigma_z + J \sigma_z \otimes \sigma_z \right) \)

Example: $\vec{x} =$ COSY / Crosstalk

Error Bars

Particle filters also yield credible regions $X_\alpha$ s.t. $$ \Pr(\vec{x} \in X_\alpha | d; e) \ge \alpha. $$

E.g.: Posterior covariance ellipse, convex hull, MVEE.

#### **QInfer**: Particle Filter Implementation for Quantum Info #### We provide an open-source implementation for use in experiment and theory alike. Written in Python, works with MATLAB and Julia.

Assessing Performance

Risk: average error given $\color{white}{\vec{x}}$
Bayes risk: average error over $\color{white}{\vec{x}}$
--- - Exact $\hat{x}$ optimal for Bayes risk / squared error. - Difficult to *prove* particle filter risk; find in special cases or by running many trials.

Simulation Costs

\begin{align} \tilde{w}_i & = w_i \times \color{red}{\Pr(d | \vec{x}_i; e)} \\ w_i & \mapsto \tilde{w}_i / \sum_i \tilde{w}_i \end{align}

- $\Pr(d | \vec{x}_i; e)$ is a *simulation*. - Need to simulate for each particle (~1000s calls/datum). - Infeasible for large quantum models. Let's do better: use *quantum* simulation instead.
### Two Kinds of Simulation ### ![](figures/simulators.png) Weak simulators produce *plausible datasets*.
### Likelihood-Free RejF ### Replace rej. sampling step by drawing datum from likelihood instead of computing exact value: - Draw datum $d'$ from $\Pr(d | \vec{x}; e)$. - Accept $\vec{x}$ if $d = d'$.
### Likelihood-Free Particle Filtering ### Can also use frequencies $f$ from weak simulation to approximate likelihoods in particle filtering. $$ \widehat{\Pr}(d | \vec{x}; e) = \frac{f(d | \vec{x}; e)}{k} $$ $k$: number of weak simulation calls used.
#### **Example**: Noisy Coin #### How well can we learn the bias $x = (p)$ of a noisy coin? $$ \Pr(\text{click} | p) = 0.95 p + 0.1 (1 - p) $$
### Quantum Hamiltonian Learning ### ![](figures/qle.png)
### **Example**: Nearest-Neighbor 1D Ising ### ![](figures/qle-line-ising.png)
We can do more with access to a *trusted* simulator. ### Quantum Interactivity ### ![](figures/iqle.png) Perform weak simulations on trusted device only.

We design experiments using the

PGH: Particle Guess Heuristic

  • Draw $\vec{x}_-, \vec{x}_-'$ from current posterior.
  • Let $t = 1 / |\vec{x}_- - \vec{x}_-'|$.
  • Return $e = (\vec{x}_-, t)$.

Adaptively chooses experiments such that
$t |\vec{x}_- - \vec{x}_-'| \approx\,$ constant.

### **Example**: Ising on Complete Graph ### ![](figures/iqle-complete-ising.png)
Robust even to wrong model. ($0.5$ NN + $10^{-4}$ Complete) ![](figures/qhl-bad-model.png)

One important approximation: physical locality.

Approximation quality can be bounded if Lieb-Robinson velocity is finite.

Scan trusted device across untrusted.

Run particle filter only on supported parameters.

50 qubit Ising chain, 8 qubit simulator, 4 qubit observable

Going Further

Hyperparameterization Granade et al. s87
$\Pr(d | y) = \expect_x[\Pr(d | x) \Pr(x | y)]$.
Allows composing w/ noise, inhomogeneity, etc.
Time-dependence Isard and Blake cc76f6
Adding timestep update allows learning stochastic processes.
Model selection Ferrie 7nt
Using acceptance ratio or normalizations enables comparing models.
Quantum filtering Wiebe and Granade 1512.03145
Rejection filtering can be quantized using Harrow et al. bcz3hc.

Thank you!

Welford's Algorithm

Can compute $\bar{x}$, $\Sigma$ from one sample at a time. Numerically stable.

  • $n, \bar{x}, M_2 \gets 0$.
  • For each sample $x$:
    • $n \gets n + 1$ Record # of samples
    • $\Delta \gets x - \mu$ Diff to running mean
    • $\bar{x} \gets \bar{x} + \Delta / n$ Update running mean
    • $M_2 \gets M_2 + \Delta (x - \bar{x})$ Update running var
  • Return mean $\bar{x}$, variance $M_2 / (n - 1)$.

Vector case is similar.