Cassandra E. Granade

Centre for Engineered Quantum Systems

joint work with Joshua Combes and D. G. Cory • arXiv coming soon

https://www.cgranade.com/research/talks/iqc/09-2015/
$
\newcommand{\ee}{\mathrm{e}}
\newcommand{\ii}{\mathrm{i}}
\newcommand{\dd}{\mathrm{d}}
\newcommand{\id}{𝟙}
\newcommand{\TT}{\mathrm{T}}
\newcommand{\defeq}{\mathrel{:=}}
\newcommand{\Tr}{\operatorname{Tr}}
\newcommand{\Var}{\operatorname{Var}}
\newcommand{\Cov}{\operatorname{Cov}}
\newcommand{\rank}{\operatorname{rank}}
\newcommand{\expect}{\mathbb{E}}
\newcommand{\sket}[1]{|#1\rangle\negthinspace\rangle}
\newcommand{\sbraket}[1]{\langle\negthinspace\langle#1\rangle\negthinspace\rangle}
\newcommand{\Gini}{\operatorname{Ginibre}}
\newcommand{\supp}{\operatorname{supp}}
$

This talk can be summarized in one slide.

\begin{equation} \Pr(\text{model} | \text{data}) = \frac{\Pr(\text{data} | \text{model})}{\Pr(\text{data})} \Pr(\text{model}) \end{equation}

Bayesian methods for experimental data processing *work*.

**Quantum tomography:** Characterizing quantum states and processes from experimental data.

From a statistical perspective, this is a *parameter estimation*
problem.

We want tomographic methods that: - Provide accurate estimates. - Characterize their uncertainty. - Allow inclusion of prior knowledge. - Can track changes with time. - Come with “off-the-shelf” implementations.

We get the first two for free from taking a Bayesian perspective.

- What prior should we use? - How should we update estimates in time? - How can we efficiently implement Bayesian tomography?

In this talk, we will address all three challenges.

See also: - Jones [10/fpbsfw](https://dx.doi.org/10/fpbsfw) - Blume-Kohout [10/cn772j](https://dx.doi.org/10/cn772j)

Suppose we want to learn $x$ from data $D$.
Then
\[
\Pr(x | D) = \frac{\Pr(D | x)}{\Pr(D)} \Pr(x)
\]
is the *posterior* distribution over $x$.

$\Pr(x | D)$ describes what we know about $x$.

We estimate $x$ by the expectation value \[ \hat{x} = \expect[x | D] = \int x\,\Pr(x|D)\, \dd x. \]

This estimator is *optimal* for the mean squared
error.

We estimate our error in the same way: \begin{align*} \expect[(x - \hat{x}(D))^2 | D] & = \expect[(x - \expect[x | D])^2 | D] \\ & = \Var(x | D). \end{align*} For multiple parameters, \[ \expect[(\vec{x} - \hat{\vec{x}})^\TT (\vec{x} - \hat{\vec{x}}) | D] = \Tr(\Cov(\vec{x} | D)). \]

More generally, \[ \hat{f} = \expect[f(x) | D]. \]

$\Pr(D | x)$ is a *likelihood function*
that specifies an experimental model.

For state tomography, our likelihood is
*Born's rule*,
\[
\Pr(E | \rho) = \Tr[E \rho],
\]
where $\rho$ is a state and $E$ is a measurement effect.

Thus, Bayes' rule allows us to estimate $\rho$.

Same techniques as before apply to enable learning *snapshots* of dynamics.
Choi-Jamiołkowski isomorphism lets us rewrite process tomography as (ancilla-assisted)
state tomography.

\[ \Tr[E \Lambda(\rho)] = \Tr[(\id \otimes E) J(\Lambda) (\rho^\TT \otimes \id)] = \sbraket{\rho^\TT, E | J(\Lambda)} \]

We parameterize a state $\rho$ as a real vector $\vec{x}$, \[ x_i = \sbraket{B_i | \rho} = \Tr(B_i^\dagger \rho), \] where $\{B_i\}$ is a basis of Hermitian operators.

By convention, $\Tr(B_i) = \delta_{i0} / \sqrt{D}$. E.g.:

- Pauli basis
- Gell-Mann basis

For state tomography, the BME is approximately optimal for the fidelity (Ferrie and Keung [1503.00677](https://arxiv.org/abs/1503.00677)).

The error in an observable $X$ is given by the covariance superoperator $\Sigma\rho = \Cov(\sket{\rho})$, \begin{align*} \Var(X) & = \Var_{\rho}[\langle X\rangle_{\rho}] + \langle X^2\rangle_{\expect[\rho]} - \langle X\rangle_{\expect[\rho]}^2 \\ & = \sbraket{X | \Sigma\rho | X} + \sbraket{X | X | \hat\rho} - \langle X\rangle_{\expect[\rho]}. \end{align*} (Blume-Kohout 10/cn772j)

In practice, Bayesian mean estimation is not tractable in the exact case.
We thus use *particle filtering* to approximate.

- Huszár and Houlsby [10/s86](https://dx.doi.org/10/s86) - Ferrie [10/7nt](https://dx.doi.org/10/7nt)

Implemented by the QInfer library for Python.

\begin{align} \Pr(\rho) & \approx \sum_{p\in\text{particles}} w_p \delta(\rho - \rho_p) \\ w_p & \mapsto w_p \times \Pr(E | \rho_p) / \mathcal{N} \end{align}

Big advantage: we only need *samples* from the prior!

As we collect data, this becomes unstable, so we must *resample*.

Particle filtering is used in a range of quantum information applications.

- Hamiltonian learning: Granade *et al.* [10/s87](https://dx.doi.org/10/s87), Stenberg *et al.* [10/7nw](https://dx.doi.org/10/7nw) - Randomized benchmarking: Granade *et al.* [10/zmz](https://dx.doi.org/10/zmz) - Quantum Hamiltonian learning and bootstrapping: Wiebe *et al.* [10/tdk](https://dx.doi.org/10/tdk), Wiebe *et al.* [10/7nx](https://dx.doi.org/10/7nx) - Phase estimation: Wiebe and Granade [1508.00869](https://arxiv.org/abs/1508.00869)

(You Forgot My Favorite Algorithm)

- MCMC: works, but isn't streaming or time-dependent. - Rejection filtering (Wiebe and Granade [1508.00869](https://arxiv.org/abs/1508.00869)): only keeps sufficient statistics; need more expressive instrumental distribution.

Expressing as Bayesian parameter estimation / particle filtering problem affords us a few other advantages.

A credible region $R_\alpha$ for $\rho$ satisfies \[ \Pr(\rho \in R_{\alpha} | D) \ge \alpha. \]

Construct from particle approximation to posterior, covariance regions (Granade *et al* [10/s87](https://dx.doi.org/10/s87)), convex hull or minimum-volume enclosing ellipse (Ferrie [10/tb4](https://dx.doi.org/10/tb4)).

Built-in to QInfer.

- Akaike / Bayesian information criterion: Guţă *et al.* [10/7nz](https://dx.doi.org/10/7nz), van Enk and Blume-Kohout [10/7n2](https://dx.doi.org/10/7n2) - Bayes factors: Wiebe *et al.* [10/tdk](https://dx.doi.org/10/tdk) \begin{equation} \text{BF}(A : B) = \frac{\Pr(A | \text{data})}{\Pr(B | \text{data})} \end{equation} - Model averaging: Ferrie [10/7nt](https://dx.doi.org/10/7nt)

Bayes factor–based model selection built-in to QInfer.

\begin{align*} \text{Suppose } \vec{x} & \sim \Pr(\vec{x} | \vec{y}) \\ \text{Then, } \Pr(D | \vec{y}) & = \expect_{\vec{x} | \vec{y}} [\Pr(D | \vec{x}, \vec{y})] \\ & = \int \Pr(D | \vec{x}, \vec{y}) \Pr(\vec{x} | \vec{y})\,\dd\vec{x}. \end{align*} Marginalizing gives us a likelihood for the hyperparameters $\vec{y}$!

Allows us to include effects outside of Born's rule.
For instance, non-Cauchy decoherence (Granade *et al.* 10/s87).

But how do we choose our prior? Let's get to the meat of things.

In order to be useful, a prior over states should: - Have support over all states, - Allow us to encode our assumptions, and - Be efficient to draw samples from.

It helps to consider an *uninformative* prior first.

- Let $X$ be a $N \times K$ matrix with complex Gaussian entries.
- $\rho \gets XX^\dagger / \Tr(XX^\dagger)$.

$\rank(\rho) = K$. If $K = 1$, $\rho$ is pure. If $K = N$, Hilbert-Schmidt prior.

**NB:** Choosing $X$ to be real gives Ginibre over redits.

```
import qinfer as qi
basis = qi.tomography.pauli_basis(1)
prior = qi.tomography.GinibreReditDistribution(basis)
qi.tomography.plot_rebit_prior(prior, rebit_axes=[1, 3])
```

What makes $\Gini(N, K)$ uninformative? \[ \expect[\rho] = \id / N. \] The mean is the maximally-mixed state.

- Haar, uniform over pure states - Bures, uniform over mixed states - BCSZ, uniform over CPTP maps

How do we add information to the prior, specifically the prior estimate state $\rho_\mu$?

**Big idea:** Use an ensemble of
amplitude damping channels to transform a uniform prior.

Let $\phi$ be a *fiducial prior*.
Then, for scalars $\alpha,\beta$ and a state $\rho_*$,
draw $\rho_{\text{sample}}$ by:

- Draw $\rho \sim \phi$.
- Draw $\epsilon \sim \mathrm{Beta}(\alpha, \beta)$.
- $\rho_{\text{sample}} \gets (1 - \epsilon) \rho + \epsilon \rho_*$

**NB:** $\supp \pi \supseteq \supp \phi$

Choose $\rho_*$ s. t. $\expect_{\rho\sim\pi}[\rho] = \rho_\mu$:

\[ \rho_* = \left(\frac{\alpha + \beta}{\alpha}\right) \left( \rho_\mu - \frac{\beta}{\alpha+\beta} \frac{\id}{N} \right) \]

Choose $\alpha,\beta$ s. t. $\expect[\epsilon]$ is minimized: \[ \alpha = 1 \qquad \beta = \frac{\lambda_\min}{1 - N \lambda_\min}, \] where $\lambda_\min$ is the smallest eigenvalue of $\rho_\mu$.

*
That is, we contract the fiducial prior as little as possible
to obtain the desired mean.
*

This construction preserves the support of our “flat” prior, takes $\rho_\mu$ as an input and can be easily sampled.

Inherits other assumptions by convexity (e.g.: rebit, CPTP, positivity, etc.).

```
import qinfer as qi
import qutip as qt
I, X, Y, Z = qt.qeye(2), qt.sigmax(), qt.sigmay(), qt.sigmaz()
prior_mean = (I + (2/3) * Z + (1/3) * X) / 2
basis = qi.tomography.pauli_basis(1)
fiducial_prior = qi.tomography.GinibreReditDistribution(basis)
prior = qi.tomography.GADFLIDistribution(fiducial_prior, prior_mean)
```

QInfer's tomography support is backed by QuTiP.

Posterior covariance characterizes uncertainty.

Principal channels tell us which components dominate our uncertianty.

(a quick tutorial)

All models are wrong, some are useful. —Chris Ferrie

We've seen how to create bases and priors,
to finish we need a *model*, an
*updater* and a *heuristic*.

```
model = qi.BinomialModel(qi.tomography.TomographyModel(basis))
```

The sequential Monte Carlo updater performs Bayes updates using particle filtering.

```
updater = qi.smc.SMCUpdater(model, 2000, prior)
heuristic = qi.tomography.RandomPauliHeuristic(updater,
other_fields={'n_meas': 40}
)
```

This heuristic measures random Paulis 40 times each.

```
for idx_exp in xrange(50):
experiment = heuristic()
# This is where your data goes! 💗
# For now, we'll simulate. 💔
datum = model.simulate_experiment(true_state, experiment)
updater.update(datum, experiment)
```

Interlace Bayesian updates with
*diffusion*
\[
\rho(t_{k+1}) = \rho(t_k) + \Delta \eta.
\]
and *truncation*.

Draw each traceless parameter of the diffusion step $\Delta \eta$ from a Gaussian.

(Isard and Blake [10/cc76f6](https://dx.doi.org/10/cc76f6))

We don't need to assume that the "true" state follows a random walk.