Developer guide Grouping QIS

QAT Application: Logic separation

Architecture considerations

In this section, we will discuss some classical architectures and then make considerations regarding quantum architectures.

Motivation: Classical co-processors

Before diving into quantum architectures, we first make a brief note on classical architectures: For decades we have seen classical architectures composed of multiple processing units. For instance, in the nineties it was common to have a co-processor supporting the main x86 architecture with such tasks as floating point operations. Other co-processing architectures were built to enhance the computers sound capabilities and in 2001, we saw graphical processing units (GPUs) emerge. To this day, GPUs are still an essentially co-processor for many computing tasks including gaming, physics simulation, and graphics to mention a few.

While floating point operation co-processors eventually became integrated into the CPU, other co-processors such as GPU remains stand-alone units today. On a high-level, the use of a GPU for simulation roughly follows this diagram:

┌─────────────┐     Programming     ┌─────────────┐
│             │  State preparation  │             │
│             ├─────────────────────▶             │
│             │                     │             │
│   Primary   │                     │  Secondary  │
│ processing  │                     │ processing  │
│    unit     │                     │    unit     │
│             │                     │             │
│             ◀─────────────────────┤             │
│             │   Result readout    │             │
└─────────────┘                     └─────────────┘

First the secondary processing unit (SPU) is programmed and as a part of this process the initial state (i.e. registers and memory) is prepared. The SPU then executes the program and the results are read out by the primary processing unit (PPU).

Classical-quantum processing architecture

While one could imagine a quantum architecture in which everything is integrated into a single processing unit that both can process classical and quantum registers, it is unlikely that this kind of architecture will emerge in the near future due to the vastly different environments in which the two processes take place. Therefore, one feasible architecture is one in which we consider the quantum processing unit (QPU) as SPU and the CPU the PPU much like is the case for CPUs and GPUs today.

Disregarding the more advanced scenarios where PPUs and SPUs work in parallel, the basic flow of a classical-quantum program execution follows a pattern as given below:

      Classical             PPU-SPU             Quantum
   processing unit  │      interface      │ processing unit
                    │                     │
                  │ │                     │
Classical program │ │                     │
        execution │ │                     │
                  │ │  State preparation  │
                  ▼ │     Programming     │
                    ├─────────────────────▶
                    │                     │ │
                    │                     │ │ Quantum program
                    │                     │ │ execution
                    │   Result readout    │ │
                    ◀─────────────────────┤ ▼
                  │ │                     │
Classical program │ │                     │
        execution │ │  State preparation  │
                  │ │     Programming     │
                  ▼ ├─────────────────────▶
                    │                     │ │
                    │                     │ │ Quantum program
                    │                     │ │ execution
                    │   Result readout    │ │
                    ◀─────────────────────┤ ▼
                    │                     │
                    ▼                     ▼
                    Time

We note that in some cases, either the state programming or the readout process may be not contain instructions. In these cases, blocks with state programming and/or readout will simply be empty and therefore it has no influence on the general structure.

Problem definition

The QIR specification does not put any requirements for a generic QIR to provide a separation of instructions intended for the CPU and QPU. However, under the assumption that quantum architectures follow the PPU-SPU pattern described in the previous section such a separation is necessary to perform the program execution. Our aim in this document will be to define how the separation should be performed and what qualifies an entity to be a CPU or QPU element.

We will assume that branching and jumping is not possible within the QPU since this is a classical phenomenon. The consequence is that CPU-QPU separation must happen at the level of BasicBlocks in LLVM rather than at a function or even program level. While it may be that some QPUs have basic branching support, we note that it will be manageable to identify multi-block segments which are compatible with a given QPU target after the separation.

Definitions

Registers

For traditional PPU-SPU architectures, registers are limited in scope to the processing unit on which they reside. That is, a register on a CPU cannot be accessed from the GPU and vice versa. The value of a given register is only available through data transport through the two units. For CPU-QPU architectures, this limitation is the same. While the CPU may hold a reference to a qubit, the physical qubit itself resides on the QPU and can only be manipulated through the CPU-QPU interface.

Quantum register types

LLVM type name	Description
`Qubit*`	Qubit register amounting to a physical qubit or simulation data type
`Result*`	Result register

Classical register types

LLVM type name	Description
`i1`
`i2`
`i8`
`i16`
`i32`
`i64`
`struct`

On the CPU, Qubit* and Result* are treated as integers. This means that they should never be dereferenced nor should the "memory address" be written to.

Instructions and instruction classification

The QIR specification has two high-level classes of instructions: Those which are calls to functions with names starting with __quantum__ and all other instructions. Function calls to functions starting with __quantum__ are expected to follow the QIR specification and these calls are further divided into two groups: Intrinsic functions (__quantum__qis__) and runtime functions (__quantum__rt__). The latter are expected to be provided by the runtime library given by the frontend and/or the target machine, whereas the former are intrinsic functions. The intrinsic functions may either run on the CPU, the QPU or serve as instruction to transfer data between the two processing units.

We classify __quantum__qis__ functions into four categories: Purely CPU, purely QPU, setup/transport from CPU to QPU and result/transport from QPU to CPU. The classification happens on the basis of the function signature: The function arguments together with the return result determine where the call will be executed according to following rules:

Any void function location is purely determined by its arguments:
- Void functions that only have quantum register types as arguments are classified as pure QPU instructions
- Void functions that only have classical register types as arguments are classified as pure CPU instructions
- Void functions that takes a mixture are classified as CPU to QPU setup/transport instructions
Any function that returns a quantum register is:
- A pure quantum instruction if all of the function arguments are quantum registers
- A CPU to QPU setup/transport instruction in any other case
A function that returns a classical register is:
- A pure classical instruction if all of the arguments are classical registers
- A QPU to CPU result/transport instruction in any other case

Pass definition

As an example we will consider following code block:

define i64 @LogicGrouping(i64 %z) local_unnamed_addr #0 {
entry:
  %0 = icmp slt i64 %z, 0
  %a1.i.i = select i1 %0, %Qubit* null, %Qubit* inttoptr (i64 1 to %Qubit*)
  %a2.i.i = select i1 %0, %Qubit* inttoptr (i64 1 to %Qubit*), %Qubit* null
  tail call void @__quantum__qis__x__body(%Qubit* %a1.i.i)
  tail call void @__quantum__qis__z__body(%Qubit* null)
  tail call void @__quantum__qis__cnot__body(%Qubit* %a1.i.i, %Qubit* %a2.i.i)
  tail call void @__quantum__qis__mz__body(%Qubit* %a2.i.i, %Result* null)
  tail call void @__quantum__qis__reset__body(%Qubit* %a2.i.i)
  %1 = tail call i1 @__quantum__qis__read_result__body(%Result* null)
  %2 = mul i64 %z, 45
  %.op.op.i.i = add i64 %2, 13
  %3 = select i1 %1, i64 -7, i64 %.op.op.i.i
  ret i64 %3
}

In the above 2-qubit (0 and 1) circuit, we use a classical parameter to decide which qubit should be the control qubit (a1) and which is the target (a2). We apply an X gate to the control qubit and then apply a Z gate to the qubit with id 0 regardless of whether it is the control or target qubit. This is followed by a CNOT gate and finally a measurement. The result from the measurement is used to compute a classical result. Our goal is now to separate instructions into blocks of pure quantum instructions, pure classical instructions and data transfer. The data transfer blocks can be thought of as a mix of classical and quantum instructions as these may manipulate registers on either the CPU or QPU.

For the above example code, we envision a separation similar to following code snippet:

define i64 @LogicGrouping(i64 %z) local_unnamed_addr #0 {
entry:
  %0 = icmp slt i64 %z, 0
  %1 = mul i64 %z, 45
  %2 = add i64 %1, 13
  br label %load

load:                                             ; preds = %entry
  %3 = select i1 %0, %Qubit* null, %Qubit* inttoptr (i64 1 to %Qubit*)
  %4 = select i1 %0, %Qubit* inttoptr (i64 1 to %Qubit*), %Qubit* null
  br label %quantum

quantum:                                          ; preds = %load
  tail call void @__quantum__qis__x__body(%Qubit* %3)
  tail call void @__quantum__qis__z__body(%Qubit* null)
  tail call void @__quantum__qis__cnot__body(%Qubit* %3, %Qubit* %4)
  tail call void @__quantum__qis__mz__body(%Qubit* %4, %Result* null)
  tail call void @__quantum__qis__reset__body(%Qubit* %4)
  br label %readout

readout:                                          ; preds = %quantum
  %5 = tail call i1 @__quantum__qis__read_result__body(%Result* null)
  br label %post-classical

post-classical:                                   ; preds = %readout
  %6 = select i1 %5, i64 -7, i64 %2
  br label %exit_quantum_grouping

exit_quantum_grouping:                            ; preds = %post-classical
  ret i64 %6
}

In the above program, the original instructions are rearranged into blocks following the pattern previously discussed with an entry block, a load block, a quantum program block, a readout block and a post-classical processing block. Visualized as above, the blocks are executed as follows:

    Classical             PPU-SPU             Quantum
 processing unit  │      interface      │ processing unit
                  │                     │
           entry│ │                     │
                ▼ │        load         │
                  ├─────────────────────▶
                  │       readout       │ │ quantum
                  ◀─────────────────────┤ ▼
 post-classical │ │                     │
                ▼ │                     │
  exit-grouping │ │                     │
                ▼ │                     │
                  ▼                     ▼
                  Time

With this separation, it becomes straight-forward to execute the program parts in accordance with the required processing unit as well as to ensure that proper setup of the QPU was performed before initiating the quantum calculation.

Block Separation Strategy

Our block separation strategy follows a divide-and-conquer approach: First we sort instructions into new blocks according to the source processing. Each of these blocks are then sub-divided based on the destination processing unit. We illustrate this in the following diagram:

            1st split       2nd split

                                 ┌──────────┐
                                 │Classical │     entry
                 ┌──────────┐ ┌─▶│   Dest   │
                 │          │ │  └──────────┘
                 │Classical │ │  ┌──────────┐
              ┌─▶│  Source  ├─┤  │ Quantum  │      load
              │  │          │ └─▶│   Dest   │
              │  └──────────┘    └──────────┘
              │                  ┌──────────┐
┌──────────┐  │  ┌──────────┐    │ Quantum  │    quantum
│          │  │  │          │ ┌─▶│   Dest   │
│ Original │  │  │ Quantum  │ │  └──────────┘
│  block   │──┼─▶│  Source  ├─┤  ┌──────────┐
│          │  │  │          │ │  │Classical │    readout
└──────────┘  │  └──────────┘ └─▶│   Dest   │
              │                  └──────────┘
              │  ┌──────────┐    ┌──────────┐
              │  │          │    │Classical │ post-classical
              │  │Classical │ ┌─▶│   Dest   │
              └─▶│  Source  │─┤  └──────────┘
                 │          │ │  ┌ ─ ─ ─ ─ ─
                 └──────────┘ │   Empty (QD)│
                              └─▶│
                                  ─ ─ ─ ─ ─ ┘

We note the second split is reversed depending type of instructions it contains.

We order instruction blocks (during the second split) such that instructions whose source and destination processing units are the same come first. We note that the second split is technically easier to make as the blocks processed in the first split already have certain guarantees provided:

We are guaranteed that we only need to make one split
We are guaranteed that the order of all dependencies is preserved when splitting instructions into the two blocks

The first split is more involved as we need to identify dependencies between classical and quantum circuits and may end up with an output similar to:

                 ┌──────────┐
                 │Classical │
              ┌─▶│  Source  │
              │  └──────────┘
              │  ┌──────────┐
              │  │ Quantum  │
              ├─▶│  Source  │
              │  └──────────┘
              │  ┌──────────┐
              │  │Classical │
              ├─▶│  Source  │
┌──────────┐  │  └──────────┘
│          │  │  ┌──────────┐
│ Original │  │  │ Quantum  │
│  block   │──┼─▶│  Source  │
│          │  │  └──────────┘
└──────────┘  │  ┌──────────┐
              │  │Classical │
              ├─▶│  Source  │
              │  └──────────┘
              │       .
              │       .
              │       .
              │  ┌──────────┐
              │  │Classical │
              └─▶│  Source  │
                 └──────────┘

The main reason for this is that we may make a measurement and then use that measurement to decide what the next quantum circuit looks like. As an example, consider following circuit:

define i64 @LogicGrouping(i64 %z) local_unnamed_addr #0 {
entry:
  %0 = icmp slt i64 %z, 0
  %a1.i.i = select i1 %0, %Qubit* null, %Qubit* inttoptr (i64 1 to %Qubit*)
  %a2.i.i = select i1 %0, %Qubit* inttoptr (i64 1 to %Qubit*), %Qubit* null
  tail call void @__quantum__qis__x__body(%Qubit* %a1.i.i)
  tail call void @__quantum__qis__z__body(%Qubit* null)
  tail call void @__quantum__qis__cnot__body(%Qubit* %a1.i.i, %Qubit* %a2.i.i)
  tail call void @__quantum__qis__mz__body(%Qubit* %a2.i.i, %Result* null)
  tail call void @__quantum__qis__reset__body(%Qubit* %a2.i.i)
  %1 = tail call i1 @__quantum__qis__read_result__body(%Result* null)
  %a3.i.i = select i1 %1, %Qubit* inttoptr (i64 1 to %Qubit*), %Qubit* null
  %2 = tail call i1 @__quantum__qis__read_result__body(%Result* null)
  %a4.i.i = select i1 %2, %Qubit* null, %Qubit* inttoptr (i64 1 to %Qubit*)
  tail call void @__quantum__qis__x__body(%Qubit* %a3.i.i)
  tail call void @__quantum__qis__z__body(%Qubit* null)
  tail call void @__quantum__qis__cnot__body(%Qubit* %a3.i.i, %Qubit* %a4.i.i)
  tail call void @__quantum__qis__mz__body(%Qubit* %a4.i.i, %Result* nonnull inttoptr (i64 1 to %Result*))
  tail call void @__quantum__qis__reset__body(%Qubit* %a4.i.i)
  %3 = tail call i1 @__quantum__qis__read_result__body(%Result* nonnull inttoptr (i64 1 to %Result*))
  %4 = mul i64 %z, 9
  %5 = shl i64 %z, 1
  %val.i.i = select i1 %3, i64 %5, i64 %4
  %6 = mul i64 %val.i.i, 5
  %7 = add i64 %6, -7
  ret i64 %7
}

After grouping according to the source processing unit, we arrive at

define i64 @LogicGrouping(i64 %z) local_unnamed_addr #0 {
entry:
  %0 = icmp slt i64 %z, 0
  %1 = select i1 %0, %Qubit* null, %Qubit* inttoptr (i64 1 to %Qubit*)
  %2 = select i1 %0, %Qubit* inttoptr (i64 1 to %Qubit*), %Qubit* null
  br label %quantum

quantum:                                          ; preds = %entry
  tail call void @__quantum__qis__x__body(%Qubit* %1)
  tail call void @__quantum__qis__z__body(%Qubit* null)
  tail call void @__quantum__qis__cnot__body(%Qubit* %1, %Qubit* %2)
  tail call void @__quantum__qis__mz__body(%Qubit* %2, %Result* null)
  tail call void @__quantum__qis__reset__body(%Qubit* %2)
  %3 = tail call i1 @__quantum__qis__read_result__body(%Result* null)
  %4 = tail call i1 @__quantum__qis__read_result__body(%Result* null)
  br label %post-classical

post-classical:                                   ; preds = %quantum
  %5 = select i1 %3, %Qubit* inttoptr (i64 1 to %Qubit*), %Qubit* null
  %6 = select i1 %4, %Qubit* null, %Qubit* inttoptr (i64 1 to %Qubit*)
  %7 = mul i64 %z, 9
  %8 = shl i64 %z, 1
  br label %quantum2

quantum2:                                         ; preds = %post-classical
  tail call void @__quantum__qis__x__body(%Qubit* %5)
  tail call void @__quantum__qis__z__body(%Qubit* null)
  tail call void @__quantum__qis__cnot__body(%Qubit* %5, %Qubit* %6)
  tail call void @__quantum__qis__mz__body(%Qubit* %6, %Result* nonnull inttoptr (i64 1 to %Result*))
  tail call void @__quantum__qis__reset__body(%Qubit* %6)
  %9 = tail call i1 @__quantum__qis__read_result__body(%Result* nonnull inttoptr (i64 1 to %Result*))
  br label %post-classical1

post-classical1:                                  ; preds = %quantum2
  %10 = select i1 %9, i64 %8, i64 %7
  %11 = mul i64 %10, 5
  %12 = add i64 %11, -7
  br label %exit_quantum_grouping

exit_quantum_grouping:                            ; preds = %post-classical1
  ret i64 %12
}

which contains two quantum circuits where the second circuit depends on the first one. That is, in the block post-classical we select which qubits to use for the execute, and hence, we would not be able to run the second circuit prior to this classical calculation.

Further expanding each of the source blocks, we get the load and readout blocks:

define i64 @LogicGrouping(i64 %z) local_unnamed_addr #0 {
entry:
  %0 = icmp slt i64 %z, 0
  br label %load

load:                                             ; preds = %entry
  %1 = select i1 %0, %Qubit* null, %Qubit* inttoptr (i64 1 to %Qubit*)
  %2 = select i1 %0, %Qubit* inttoptr (i64 1 to %Qubit*), %Qubit* null
  br label %quantum

quantum:                                          ; preds = %load
  tail call void @__quantum__qis__x__body(%Qubit* %1)
  tail call void @__quantum__qis__z__body(%Qubit* null)
  tail call void @__quantum__qis__cnot__body(%Qubit* %1, %Qubit* %2)
  tail call void @__quantum__qis__mz__body(%Qubit* %2, %Result* null)
  tail call void @__quantum__qis__reset__body(%Qubit* %2)
  br label %readout

readout:                                          ; preds = %quantum
  %3 = tail call i1 @__quantum__qis__read_result__body(%Result* null)
  %4 = tail call i1 @__quantum__qis__read_result__body(%Result* null)
  br label %post-classical

post-classical:                                   ; preds = %readout
  %5 = mul i64 %z, 9
  %6 = shl i64 %z, 1
  br label %load4

load4:                                            ; preds = %post-classical
  %7 = select i1 %3, %Qubit* inttoptr (i64 1 to %Qubit*), %Qubit* null
  %8 = select i1 %4, %Qubit* null, %Qubit* inttoptr (i64 1 to %Qubit*)
  br label %quantum2

quantum2:                                         ; preds = %load4
  tail call void @__quantum__qis__x__body(%Qubit* %7)
  tail call void @__quantum__qis__z__body(%Qubit* null)
  tail call void @__quantum__qis__cnot__body(%Qubit* %7, %Qubit* %8)
  tail call void @__quantum__qis__mz__body(%Qubit* %8, %Result* nonnull inttoptr (i64 1 to %Result*))
  tail call void @__quantum__qis__reset__body(%Qubit* %8)
  br label %readout3

readout3:                                         ; preds = %quantum2
  %9 = tail call i1 @__quantum__qis__read_result__body(%Result* nonnull inttoptr (i64 1 to %Result*))
  br label %post-classical1

post-classical1:                                  ; preds = %readout3
  %10 = select i1 %9, i64 %6, i64 %5
  %11 = mul i64 %10, 5
  %12 = add i64 %11, -7
  br label %exit_quantum_grouping

exit_quantum_grouping:                            ; preds = %post-classical1
  ret i64 %12
}