Quantum Circuit Born Machine
The quantum circuit Born machine (QCBM) is a fresh approach to quantum machine learning. It uses a parameterized quantum circuit to learn machine learning tasks with gradient-based optimization. In this tutorial, we will show how to implement it with the Yao (幺) framework.
using Yao, Plots
@doc 幺
Extensible Framework for Quantum Algorithm Design for Humans.
A simple, easy-to-use, and extensible framework for quantum algorithm design.
幺 means unitary in Chinese.
Training target
Our training target is a Gaussian probability distribution:
function gaussian_pdf(n, μ, σ)
    x = collect(1:1<<n)                                        # the 2^n basis labels
    pl = @. 1 / sqrt(2pi * σ^2) * exp(-(x - μ)^2 / (2 * σ^2))  # Gaussian density at each label
    pl / sum(pl)                                               # normalize to a probability distribution
end
gaussian_pdf (generic function with 1 method)
const n = 6
const maxiter = 20
pg = gaussian_pdf(n, 2^5-0.5, 2^4)
fig = plot(0:1<<n-1, pg)
Build Circuits
Building Blocks
Gates are grouped into layers in a circuit; a layer can be an arbitrary rotation layer or a CNOT entangler. These are used as the basic building blocks of our Born machine.
Arbitrary Rotation
An arbitrary rotation is built with a rotation gate on Z, a rotation gate on X, and another rotation gate on Z: $Rz(\theta) \cdot Rx(\theta) \cdot Rz(\theta)$.
Since our input will be the $|0\dots 0\rangle$ state, the first layer of arbitrary rotations can just use $Rx(\theta) \cdot Rz(\theta)$, because an initial $Rz$ only adds a global phase to $|0\dots 0\rangle$. Likewise, the last layer can just use $Rz(\theta) \cdot Rx(\theta)$, since a final $Rz$ does not change measurement probabilities in the computational basis.
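A quick numerical check of the global-phase claim (a sketch; mat returns a block's matrix representation in Yao):
v = mat(Rz(0.3)) * [1.0 + 0im, 0.0]   # Rz(θ)|0⟩ = e^{-iθ/2}|0⟩, a global phase
abs2.(v)                              # [1.0, 0.0]: measurement probabilities unchanged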
In 幺, every Hilbert-space operator is a block type; this includes all quantum gates and quantum oracles. In general, the operators appearing in a quantum circuit can be divided into Composite Blocks and Primitive Blocks.
We follow the low-abstraction principle, so each block represents a specific approach to calculation. The simplest Composite Block is a Chain Block, which chains other blocks (oracles) with the same number of qubits together; it is just the mathematical composition of operators of the same size. The blocks are applied in the order listed, so chain(A, B, C) corresponds to the matrix product $C \cdot B \cdot A$.
For example, we can construct an arbitrary rotation block by chaining $Rz$, $Rx$, $Rz$ together:
chain(Rz(0), Rx(0), Rz(0))
Total: 1, DataType: Complex{Float64}
chain
├─ Rot Z gate: 0.0
├─ Rot X gate: 0.0
└─ Rot Z gate: 0.0
chain(X)
Total: 1, DataType: Complex{Float64}
chain
└─ X gate
Rx, Ry and Rz construct new rotation gates; they are just shorthands for rot(X, 0.0), etc.
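For example, the following two lines construct the same gate:
Rx(0.1)        # shorthand
rot(X, 0.1)    # equivalent explicit constructor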
Then, let's pile them up vertically with another method called rollrepeat
layer(x::Symbol) = layer(Val(x))
layer(::Val{:first}) = rollrepeat(chain(Rx(0), Rz(0)))
rollrepeat(chain(Rx(0), Rz(0)))(4)
Total: 4, DataType: Complex{Float64}
roller
├─ chain
│ ├─ Rot X gate: 0.0
│ └─ Rot Z gate: 0.0
├─ chain
│ ├─ Rot X gate: 0.0
│ └─ Rot Z gate: 0.0
├─ chain
│ ├─ Rot X gate: 0.0
│ └─ Rot Z gate: 0.0
└─ chain
├─ Rot X gate: 0.0
└─ Rot Z gate: 0.0
In 幺, the factory method rollrepeat will construct a block called Roller. It is mathematically equivalent to the Kronecker product of all operators in this layer: $\mathrm{rollrepeat}(U)(n) = U \otimes U \otimes \cdots \otimes U$ ($n$ copies of $U$).
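We can check this equivalence on a small example (a sketch; register, apply! and statevec are Yao helpers that also appear later in this tutorial):
r1 = apply!(register(bit"00"), rollrepeat(2, X)) |> statevec
r2 = apply!(register(bit"00"), kron(2, i=>X for i = 1:2)) |> statevec
r1 ≈ r2   # true: the roller acts as the Kronecker product X ⊗ X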
@doc Val
Val(c)
Return Val{c}(), which contains no run-time data. Types like this can be
used to pass the information between functions through the value c, which
must be an isbits value. The intent of this construct is to be able to
dispatch on constants directly (at compile time) without having to test the
value of the constant at run time.
Examples
≡≡≡≡≡≡≡≡≡≡
julia> f(::Val{true}) = "Good"
f (generic function with 1 method)
julia> f(::Val{false}) = "Bad"
f (generic function with 2 methods)
julia> f(Val(true))
"Good"
roll(4, i=>X for i = 1:4)
Total: 4, DataType: Complex{Float64}
roller
├─ X gate
├─ X gate
├─ X gate
└─ X gate
rollrepeat(4, X)
Total: 4, DataType: Complex{Float64}
roller
├─ X gate
├─ X gate
├─ X gate
└─ X gate
kron(4, i=>X for i = 1:4)
Total: 4, DataType: Complex{Float64}
kron
├─ 1=>X gate
├─ 2=>X gate
├─ 3=>X gate
└─ 4=>X gate
However, kron is calculated differently from roll. In principle, a Roller can calculate small blocks of the same size with higher efficiency, but for large blocks a Roller may be slower. In 幺, we offer you the freedom to choose the most suitable solution.
All factory methods lazily evaluate their first argument, which is the number of qubits: when it is omitted, they return a lambda function that takes a single integer input. The desired block instance is only constructed once all the information is filled in.
rollrepeat(X)
#33 (generic function with 1 method)
rollrepeat(X)(4)
Total: 4, DataType: Complex{Float64}
roller
├─ X gate
├─ X gate
├─ X gate
└─ X gate
When you fill in the number of qubits somewhere in the declaration, 幺 will be able to infer it for the other blocks.
chain(4, rollrepeat(X), rollrepeat(Y))
Total: 4, DataType: Complex{Float64}
chain
├─ roller
│ ├─ X gate
│ ├─ X gate
│ ├─ X gate
│ └─ X gate
└─ roller
├─ Y gate
├─ Y gate
├─ Y gate
└─ Y gate
We will now define the rest of the rotation layers:
layer(::Val{:last}) = rollrepeat(chain(Rz(0), Rx(0)))
layer(::Val{:mid}) = rollrepeat(chain(Rz(0), Rx(0), Rz(0)))
layer (generic function with 4 methods)
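As a quick check of these definitions, we can instantiate a middle layer on 4 qubits:
layer(:mid)(4)   # an Rz-Rx-Rz rotation on each of the 4 qubits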
CNOT Entangler
Another component of the quantum circuit Born machine is a layer of CNOT operators applied to different pairs of qubits, which entangles the qubits.
entangler(pairs) = chain(control([ctrl, ], target=>X) for (ctrl, target) in pairs)
entangler (generic function with 1 method)
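For example, a ring of CNOTs over 4 qubits is built from control => target pairs:
entangler([1=>2, 2=>3, 3=>4, 4=>1])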
We can then define such a Born machine:
function QCBM(n, nlayer, pairs)
    circuit = chain(n)
    push!(circuit, layer(:first))
    for i = 1:(nlayer - 1)
        push!(circuit, cache(entangler(pairs)))
        push!(circuit, layer(:mid))
    end
    push!(circuit, cache(entangler(pairs)))
    push!(circuit, layer(:last))
    circuit
end
QCBM (generic function with 1 method)
We use the method cache here to tag the entangler block so that it will be cached after its first run, because it is actually a constant oracle. Let's see what will be constructed:
QCBM(4, 1, [1=>2, 2=>3, 3=>4, 4=>1])
Total: 4, DataType: Complex{Float64}
chain
├─ roller
│ ├─ chain
│ │ ├─ Rot X gate: 0.0
│ │ └─ Rot Z gate: 0.0
│ ├─ chain
│ │ ├─ Rot X gate: 0.0
│ │ └─ Rot Z gate: 0.0
│ ├─ chain
│ │ ├─ Rot X gate: 0.0
│ │ └─ Rot Z gate: 0.0
│ └─ chain
│ ├─ Rot X gate: 0.0
│ └─ Rot Z gate: 0.0
├─ chain (Cached)
│ ├─ control(1)
│ │ └─ (2,)=>X gate
│ ├─ control(2)
│ │ └─ (3,)=>X gate
│ ├─ control(3)
│ │ └─ (4,)=>X gate
│ └─ control(4)
│ └─ (1,)=>X gate
└─ roller
├─ chain
│ ├─ Rot Z gate: 0.0
│ └─ Rot X gate: 0.0
├─ chain
│ ├─ Rot Z gate: 0.0
│ └─ Rot X gate: 0.0
├─ chain
│ ├─ Rot Z gate: 0.0
│ └─ Rot X gate: 0.0
└─ chain
├─ Rot Z gate: 0.0
└─ Rot X gate: 0.0
MMD Loss & Gradients
The MMD (maximum mean discrepancy) loss between the circuit distribution $p_\theta$ and the target distribution $\pi$ is

$\mathcal{L} = \sum_{x, y} \big(p_\theta(x) - \pi(x)\big) K(x, y) \big(p_\theta(y) - \pi(y)\big)$

where $K$ is a kernel function. We will use a squared exponential kernel here, $K(x, y) = e^{-(x - y)^2 / (2\sigma)}$.
struct Kernel
    sigma::Float64
    matrix::Matrix{Float64}
end

function Kernel(nqubits, sigma)
    basis = collect(0:(1<<nqubits - 1))
    Kernel(sigma, kernel_matrix(basis, basis, sigma))
end
expect(kernel::Kernel, px::Vector{Float64}, py::Vector{Float64}) = px' * kernel.matrix * py
loss(qcbm, kernel::Kernel, ptrain) = (p = get_prob(qcbm) - ptrain; expect(kernel, p, p))
loss (generic function with 1 method)
Next, let's define the kernel matrix
function kernel_matrix(x, y, sigma)
    dx2 = (x .- y').^2          # pairwise squared distances
    gamma = 1.0 / (2 * sigma)
    K = exp.(-gamma * dx2)      # squared exponential kernel
    K
end
kernel_matrix (generic function with 1 method)
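As a sanity check of the loss (a sketch using the definitions above): the MMD of a distribution with itself is zero, and it is strictly positive for two distinct distributions, since the squared exponential kernel matrix is positive definite.
k = Kernel(2, 0.25)
p = fill(0.25, 4)            # uniform distribution on 2 qubits
q = [0.7, 0.1, 0.1, 0.1]     # a skewed distribution
expect(k, p - p, p - p)      # 0.0
expect(k, p - q, p - q) > 0  # true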
Gradients
The gradient of the MMD loss with respect to a parameter $\theta_i$ is

$\frac{\partial \mathcal{L}}{\partial \theta_i} = \langle K \rangle_{p_\theta, p_{\theta^+}} - \langle K \rangle_{p_\theta, p_{\theta^-}} - \langle K \rangle_{\pi, p_{\theta^+}} + \langle K \rangle_{\pi, p_{\theta^-}}$

where $\theta^\pm$ denotes the parameter vector with $\theta_i$ shifted by $\pm\pi/2$, and $\langle K \rangle_{p, q} = \sum_{x, y} p(x) K(x, y) q(y)$.

We shift one rotation-gate parameter at a time, calculate its gradient, and collect them all. Since we will frequently need to calculate the probability distribution from the state vector, let's define a shorthand first.
First, you have to define a quantum register. Each run of the QCBM takes a simple $|00\cdots 0\rangle$ state as its input. We provide a string literal, bit"...", to help you define one-hot state vectors like this:
r = register(bit"0000")
DefaultRegister{1, Complex{Float64}}
active qubits: 4/4
circuit = QCBM(6, 10, [1=>2, 3=>4, 5=>6, 2=>3, 4=>5, 6=>1]);
Now, we define its shorthand
get_prob(qcbm) = apply!(register(bit"0"^nqubits(qcbm)), qcbm) |> statevec .|> abs2
get_prob (generic function with 1 method)
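Since the circuit is unitary, get_prob returns a normalized distribution; a quick check:
sum(get_prob(circuit)) ≈ 1.0   # true: probabilities sum to one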
We will first allocate an array to store our gradient, then iterate through each layer that contains rotation gates:
function gradient(n, nlayers, qcbm, kernel, ptrain)
    prob = get_prob(qcbm)
    grad = zeros(real(datatype(qcbm)), nparameters(qcbm))
    idx = 0
    for ilayer = 1:2:(2 * nlayers + 1)
        idx = grad_layer!(grad, idx, prob, qcbm, qcbm[ilayer], kernel, ptrain)
    end
    grad
end
gradient (generic function with 1 method)
Then we iterate through each rotation gate.
function grad_layer!(grad, idx, prob, qcbm, layer, kernel, ptrain)
    count = idx
    for each_line in blocks(layer)
        for each in blocks(each_line)
            gradient!(grad, count+1, prob, qcbm, each, kernel, ptrain)
            count += 1
        end
    end
    count
end
grad_layer! (generic function with 1 method)
We evaluate each parameter's gradient by shifting it by $+\pi/2$ and $-\pi/2$:
function gradient!(grad, idx, prob, qcbm, gate, kernel, ptrain)
    dispatch!(+, gate, pi / 2)    # θ → θ + π/2
    prob_pos = get_prob(qcbm)
    dispatch!(-, gate, pi)        # θ → θ - π/2
    prob_neg = get_prob(qcbm)
    dispatch!(+, gate, pi / 2)    # set back to θ
    grad_pos = expect(kernel, prob, prob_pos) - expect(kernel, prob, prob_neg)
    grad_neg = expect(kernel, ptrain, prob_pos) - expect(kernel, ptrain, prob_neg)
    grad[idx] = grad_pos - grad_neg
    grad
end
gradient! (generic function with 1 method)
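To double-check the shift rule, one can compare an entry of this gradient against a finite-difference estimate of the loss. The helper fd_check below is our own sketch, not part of the tutorial's pipeline, and it assumes parameters(qcbm) enumerates parameters in the same order as the gradient entries:
function fd_check(qcbm, kernel, ptrain; δ=1e-4)
    nlayers = (length(qcbm) - 1) ÷ 2
    g = gradient(nqubits(qcbm), nlayers, qcbm, kernel, ptrain)
    params = collect(parameters(qcbm))
    params[1] += δ; dispatch!(qcbm, params)     # θ₁ + δ
    lp = loss(qcbm, kernel, ptrain)
    params[1] -= 2δ; dispatch!(qcbm, params)    # θ₁ - δ
    lm = loss(qcbm, kernel, ptrain)
    params[1] += δ; dispatch!(qcbm, params)     # restore θ₁
    (lp - lm) / (2δ), g[1]                      # the two values should roughly agree
end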
Optimizer
We will use the Adam optimizer. Since we don't want you to have to install another package for this, the following optimizer code is copied from Knet.jl.
Reference: Kingma, D. P., & Ba, J. L. (2015). Adam: a Method for Stochastic Optimization. International Conference on Learning Representations, 1–13.
using LinearAlgebra
mutable struct Adam
    lr::AbstractFloat      # learning rate
    gclip::AbstractFloat   # gradient clipping threshold (0 = off)
    beta1::AbstractFloat   # decay rate for the first moment estimate
    beta2::AbstractFloat   # decay rate for the second moment estimate
    eps::AbstractFloat
    t::Int                 # time step
    fstm                   # first moment estimate
    scndm                  # second moment estimate
end

Adam(; lr=0.001, gclip=0, beta1=0.9, beta2=0.999, eps=1e-8) = Adam(lr, gclip, beta1, beta2, eps, 0, nothing, nothing)
function update!(w, g, p::Adam)
    gclip!(g, p.gclip)
    if p.fstm === nothing; p.fstm = zero(w); p.scndm = zero(w); end
    p.t += 1
    lmul!(p.beta1, p.fstm)
    BLAS.axpy!(1 - p.beta1, g, p.fstm)              # update biased first moment
    lmul!(p.beta2, p.scndm)
    BLAS.axpy!(1 - p.beta2, g .* g, p.scndm)        # update biased second moment
    fstm_corrected = p.fstm / (1 - p.beta1 ^ p.t)   # bias correction
    scndm_corrected = p.scndm / (1 - p.beta2 ^ p.t)
    BLAS.axpy!(-p.lr, @.(fstm_corrected / (sqrt(scndm_corrected) + p.eps)), w)
end
function gclip!(g, gclip)
    if gclip == 0
        g
    else
        gnorm = norm(g)               # vecnorm was renamed to norm in Julia 1.0
        if gnorm <= gclip
            g
        else
            rmul!(g, gclip / gnorm)   # BLAS.scale! was removed; rmul! rescales in place
        end
    end
end
gclip! (generic function with 1 method)
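As a usage sketch on a toy problem, Adam drives $w$ toward the minimizer of $f(w) = w^\top w$, whose gradient is $2w$:
w = [1.0, -2.0]
opt = Adam(lr=0.1)
for i in 1:200
    update!(w, 2w, opt)   # pass the current gradient of w'w, which is 2w
end
w   # close to [0.0, 0.0]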
The training of the quantum circuit is simple, just iterate through the steps.
function train!(qcbm, ptrain, optim; learning_rate=0.1, niter=50)
    # initialize the parameters
    params = 2pi * rand(nparameters(qcbm))
    dispatch!(qcbm, params)
    kernel = Kernel(nqubits(qcbm), 0.25)
    n, nlayers = nqubits(qcbm), (length(qcbm) - 1) ÷ 2
    history = Float64[]
    for i = 1:niter
        grad = gradient(n, nlayers, qcbm, kernel, ptrain)
        curr_loss = loss(qcbm, kernel, ptrain)
        push!(history, curr_loss)
        params = collect(parameters(qcbm))
        update!(params, grad, optim)
        dispatch!(qcbm, params)
    end
    history
end
train! (generic function with 1 method)
optim = Adam(lr=0.1)
his = train!(circuit, pg, optim, niter=50, learning_rate=0.1)
plot(1:50, his, xlabel="iteration", ylabel="loss")
p = get_prob(circuit)
fig2 = plot(0:1<<n-1, p, xlabel="x", ylabel="p")
plot!(fig2, 0:1<<n-1, pg)
This page was generated using Literate.jl.