quark — circuit_explorer.app


◆ embeddings for quantum circuits

Quantum circuits,
as vectors.

A small transformer that turns a quantum circuit into a vector, so circuits computing the same unitary land close and different ones land far apart. For deduplication, search, and fast equivalence filtering. Runs on a CPU.


v1.1.127 tests passing · Apache-2.0 · 647k params · CPU-fast

quickstart.py

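from qiskit import QuantumCircuit
import torch
from quark import CircuitEncoder, embed

m = CircuitEncoder()
m.load_state_dict(torch.load("quark.pt", weights_only=True))
m.eval()  # inference mode (turns off dropout, if the encoder has any)

# a: 3-qubit GHZ preparation. b: the same circuit with H written as H·H·H.
a = QuantumCircuit(3); a.h(0); a.cx(0, 1); a.cx(1, 2)
b = QuantumCircuit(3); b.h(0); b.h(0); b.h(0); b.cx(0, 1); b.cx(1, 2)

ea, eb = embed(m, [a, b])        # one L2-normalised 128-d vector per circuit
print((ea * eb).sum().item())    # cosine similarity: 0.96, same unitary written longer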

the problem

“Which of these circuits are actually the same?”

Repositories fill up with circuits written five different ways. The obvious ways to find the duplicates all break:

hash the gate string · fooled instantly

reorder two commuting gates and the hash changes, even though the unitary doesn't.

sort gates, then hash · misses order

catches simple reshuffles, but gives the same hash to circuits that differ only in the order of non-commuting gates.

count gates · blind to qubits

ignores which qubits gates act on and in what order: CX(0,1) and CX(1,0) count the same, and H X H counts differently from Z even though the two are equal.

compare full unitaries · exact but exponential

the real check, but a 2ⁿ × 2ⁿ matrix has 4ⁿ entries, so it chokes above ~10 qubits.
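A minimal single-qubit sketch of three of these baselines next to the exact check, using NumPy. Circuits here are plain lists of gate names applied left to right; that representation is purely illustrative and is not quark's input format.

```python
import hashlib
from collections import Counter

import numpy as np

# Single-qubit gate matrices.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
GATES = {"h": H, "x": X, "z": Z}

def string_hash(circ):
    """Baseline: hash the gate string exactly as written."""
    return hashlib.sha256(" ".join(circ).encode()).hexdigest()

def sorted_hash(circ):
    """Baseline: sort gate names first, so reorderings collide."""
    return string_hash(sorted(circ))

def gate_counts(circ):
    """Baseline: a multiset of gate names."""
    return Counter(circ)

def unitary(circ):
    """Exact: multiply out the full matrix (first gate applied first)."""
    u = np.eye(2, dtype=complex)
    for g in circ:
        u = GATES[g] @ u
    return u

def same_unitary(u, v, atol=1e-9):
    """Equal up to global phase iff |tr(U^dagger V)| equals the dimension."""
    return abs(abs(np.trace(u.conj().T @ v)) - u.shape[0]) < atol

hxh, hhx, z = ["h", "x", "h"], ["h", "h", "x"], ["z"]

# H X H equals Z, but the string hash calls them different (false negative).
print(string_hash(hxh) == string_hash(z), same_unitary(unitary(hxh), unitary(z)))  # False True
# H X H and H H X share a sorted hash and gate counts (false positive)...
print(sorted_hash(hxh) == sorted_hash(hhx), gate_counts(hxh) == gate_counts(hhx))  # True True
# ...but H X H = Z while H H X = X.
print(same_unitary(unitary(hxh), unitary(hhx)))  # False
```

Only the matrix comparison gets both cases right, and for n qubits it has to build and multiply matrices with 4ⁿ entries.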

quark sits between the hashes and the exact check. It learns a similarity from equivalence-preserving rewrites and hands back a fixed-size vector you can drop into any nearest-neighbour index. Wrong sometimes, fast always: seconds for a thousand circuits where the exact unitary check would take days.

what it's for

A search index for circuits, not a way to run them

dedupe

Deduplicate a folder

Group the QASM files that compute the same unitary, even when they're written differently.

$ quark dedupe ./circuits/
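Grouping can be sketched as connected components over a cosine-similarity threshold, assuming you already have one unit vector per file (for example from `embed` in the quickstart). The function name `dedupe_groups`, the 0.9 threshold and the union-find grouping are illustrative choices, not necessarily what `quark dedupe` does internally.

```python
import numpy as np

def dedupe_groups(names, vecs, threshold=0.9):
    """Group items whose embeddings have cosine >= threshold (hypothetical threshold).

    vecs: (n, d) array of L2-normalised rows, so vecs @ vecs.T is cosine similarity.
    Returns a list of groups (lists of names); singletons are included.
    """
    n = len(names)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    sims = vecs @ vecs.T
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(names[i])
    return list(groups.values())

# Toy 3-d "embeddings" standing in for real 128-d ones.
vecs = np.array([[1.0, 0.0, 0.0], [0.99, 0.14, 0.0], [0.0, 1.0, 0.0]])
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
print(dedupe_groups(["ghz.qasm", "ghz_long.qasm", "qft.qasm"], vecs))
# [['ghz.qasm', 'ghz_long.qasm'], ['qft.qasm']]
```

Union-find makes grouping transitive, so a chain of near-duplicates can merge files that are not directly similar; a stricter threshold, or an exact check inside each group, limits that.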

search

Search by example

Find circuits in a library that compute the same thing as the one you just wrote, ranked by nearest-neighbour similarity.

$ quark eval --weights quark.pt
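Because every embedding is a unit vector, nearest-neighbour search reduces to a dot product. A brute-force NumPy sketch, with random vectors standing in for a real library of embeddings; the `top_k` helper is hypothetical, and a dedicated index such as FAISS would replace it at scale.

```python
import numpy as np

def top_k(query, library, k=10):
    """Return indices and cosine scores of the k library rows closest to query.

    Assumes query (d,) and library (n, d) are L2-normalised, so the dot
    product equals cosine similarity.
    """
    scores = library @ query
    k = min(k, len(scores))
    idx = np.argpartition(-scores, k - 1)[:k]   # the k best, in no particular order
    idx = idx[np.argsort(-scores[idx])]         # sort those k by descending score
    return idx, scores[idx]

rng = np.random.default_rng(0)
library = rng.normal(size=(1000, 128))
library /= np.linalg.norm(library, axis=1, keepdims=True)
query = library[42]  # query with a circuit that is already in the library

idx, scores = top_k(query, library, k=5)
print(idx[0], round(float(scores[0]), 3))  # 42 1.0
```

`argpartition` avoids a full sort of all n scores; only the k winners get ordered.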

verify

Filter before you prove

Use it as a fast pre-filter, then confirm survivors with an exact unitary check.

>>> quark.verify.equiv(a, b)
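A two-stage filter might look like this: cheap cosine scores prune candidate pairs, and only the survivors pay for an exact matrix comparison. This is a NumPy sketch with a hypothetical 0.9 threshold and a hypothetical `prefilter_then_verify` helper; it does not reproduce `quark.verify.equiv` itself.

```python
import numpy as np

def equal_up_to_phase(u, v, atol=1e-9):
    """Exact check: U and V are equivalent iff |tr(U^dagger V)| equals the dimension."""
    return abs(abs(np.trace(u.conj().T @ v)) - u.shape[0]) < atol

def prefilter_then_verify(vecs, unitaries, threshold=0.9):
    """Return index pairs that pass both the embedding filter and the exact check.

    vecs: (n, d) unit-norm embeddings; unitaries: list of n square matrices.
    threshold is a hypothetical cut-off; tune it on your own data.
    """
    sims = vecs @ vecs.T
    confirmed = []
    n = len(vecs)
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] < threshold:
                continue  # cheap rejection: skip the exponential check
            if equal_up_to_phase(unitaries[i], unitaries[j]):
                confirmed.append((i, j))
    return confirmed

S = np.diag([1, 1j])
# S, S with a global phase, and S-dagger, which a weak embedding scores as "same".
unitaries = [S, np.exp(1j * np.pi / 4) * S, S.conj().T]
vecs = np.array([[1.0, 0.0], [1.0, 0.0], [0.995, 0.0998]])
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
print(prefilter_then_verify(vecs, unitaries))  # [(0, 1)]
```

All three pairs pass the cosine filter here, and the exact stage removes the two S vs S† false positives.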

regress

Catch transpiler drift

Tell whether a pass changed a circuit's meaning, or only its syntax.

$ quark show --n 3 --k 4
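One way to catch drift, sketched under the assumption that you can embed each circuit before and after a pass: score each before/after pair and flag the ones whose cosine falls below a threshold. The `flag_drift` name, the 0.9 threshold and the toy vectors are all hypothetical.

```python
import numpy as np

def flag_drift(before, after, threshold=0.9):
    """Return indices where a pass may have changed a circuit's meaning, plus all scores.

    before, after: (n, d) arrays of unit-norm embeddings; row i of each is the
    same circuit before and after the pass. The row-wise dot product gives one
    cosine score per circuit.
    """
    scores = np.einsum("ij,ij->i", before, after)
    return np.flatnonzero(scores < threshold).tolist(), scores

before = np.eye(3)                      # toy 3-d embeddings, one row per circuit
after = np.array([[1.0, 0.0, 0.0],      # unchanged
                  [0.0, 0.98, 0.2],     # syntax-only rewrite, still close
                  [0.0, 0.0, -1.0]])    # meaning changed
after /= np.linalg.norm(after, axis=1, keepdims=True)
flagged, scores = flag_drift(before, after)
print(flagged)  # [2]
```

Flagged circuits are candidates, not proofs; confirming them with an exact unitary check before failing a build keeps false alarms out of CI.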

how it works

Every gate becomes a token. The transformer reads them and emits one vector.

▸ Tokenize: each gate → (type, q1, q2, cosθ, sinθ)
▸ Embed: 20-gate vocab, inverses get their own id
▸ Transformer: 3 layers · d=128 · 4 heads · [CLS]
▸ Normalize: L2 → a 128-d unit vector
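The tokenize step can be sketched in plain Python. The vocabulary below is hypothetical (quark's actual 20-entry vocabulary and its ids are not listed on this page), as is the choice of −1 for an unused second-qubit slot; the sketch only shows the two ideas named above: inverses such as S and S† get separate ids, and a rotation angle θ becomes (cos θ, sin θ), so θ and θ + 2π produce the same features.

```python
import math

# Hypothetical vocabulary: every inverse has its own id ("sdg" is S-dagger).
VOCAB = {name: i for i, name in enumerate(
    ["h", "x", "y", "z", "s", "sdg", "t", "tdg", "sx", "sxdg",
     "rx", "ry", "rz", "cx", "cy", "cz", "swap"])}

def tokenize(gates):
    """Map (name, qubits, theta) triples to (type, q1, q2, cos θ, sin θ) tuples.

    Single-qubit gates get q2 = -1; gates without an angle pass θ = 0.
    """
    tokens = []
    for name, qubits, theta in gates:
        q1 = qubits[0]
        q2 = qubits[1] if len(qubits) > 1 else -1
        tokens.append((VOCAB[name], q1, q2, math.cos(theta), math.sin(theta)))
    return tokens

circ = [("h", [0], 0.0), ("cx", [0, 1], 0.0), ("rz", [1], math.pi / 2), ("sdg", [1], 0.0)]
for tok in tokenize(circ):
    print(tok)
```

Encoding the angle as a point on the unit circle keeps nearby angles close (0.01 and 2π − 0.01 are neighbours), which a raw θ value would not.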

verified equivalence rewrites it trains on

647k parameters · ~1000 circuits/sec on a laptop CPU

128-d unit vector per circuit, ready for any index
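The Embed, Transformer and Normalize stages map onto standard PyTorch modules. This is a sketch under stated assumptions, not quark's `CircuitEncoder`: `SketchEncoder` is hypothetical, and the way the token fields are combined (summed embeddings plus a linear projection of cos θ, sin θ), the 32-qubit limit, the feed-forward width of 512 and the learned [CLS] vector are all guesses. Only 3 layers, d=128, 4 heads, [CLS] pooling and L2 normalisation come from the pipeline above, and the parameter count will not match 647k exactly.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchEncoder(nn.Module):
    """Hypothetical circuit encoder: 3 layers, d=128, 4 heads, [CLS] pooling, L2 output."""

    def __init__(self, vocab=20, max_qubits=32, d=128, heads=4, layers=3, ff=512):
        super().__init__()
        self.type_emb = nn.Embedding(vocab, d)
        # Index 0 means "no qubit in this slot"; real qubit q is stored as q + 1.
        self.q_emb = nn.Embedding(max_qubits + 1, d)
        self.angle = nn.Linear(2, d)                      # (cos θ, sin θ) -> d
        self.cls = nn.Parameter(torch.zeros(1, 1, d))
        layer = nn.TransformerEncoderLayer(d, heads, dim_feedforward=ff, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, gate, q1, q2, angle, pad_mask):
        """gate, q1, q2: (B, L) long; angle: (B, L, 2); pad_mask: (B, L) bool, True = padding."""
        x = self.type_emb(gate) + self.q_emb(q1) + self.q_emb(q2) + self.angle(angle)
        x = torch.cat([self.cls.expand(x.size(0), -1, -1), x], dim=1)
        # The [CLS] position is never padding.
        mask = torch.cat([torch.zeros_like(pad_mask[:, :1]), pad_mask], dim=1)
        h = self.encoder(x, src_key_padding_mask=mask)
        return F.normalize(h[:, 0], dim=-1)               # unit vector read off [CLS]


# Usage with random toy tokens: 2 circuits, 5 single-qubit gates each, no padding.
torch.manual_seed(0)
model = SketchEncoder().eval()
B, L = 2, 5
gate = torch.randint(0, 20, (B, L))
q1 = torch.randint(1, 4, (B, L))                # qubits 0..2, stored as 1..3
q2 = torch.zeros(B, L, dtype=torch.long)        # no second qubit
theta = torch.rand(B, L) * 2 * math.pi
angle = torch.stack([theta.cos(), theta.sin()], dim=-1)
pad = torch.zeros(B, L, dtype=torch.bool)
with torch.no_grad():
    out = model(gate, q1, q2, angle, pad)
print(out.shape, out.norm(dim=-1))              # two 128-d vectors, each of norm 1
```

Pooling through a [CLS] token gives one fixed-size vector regardless of circuit length, and the final `F.normalize` is what makes a plain dot product equal to cosine similarity downstream.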

benchmarks

Real numbers, including the one that isn't flattering

Every figure reproduces from the repo with one command and a fixed seed. Each result compares quark with the strongest non-ML baseline.

In-distribution · Recall@10
quark 1.00 · baseline 0.88
synthetic circuits, the same recipe it trained on

Held-out rewrite · Recall@10
quark 0.98 · baseline 0.55
an equivalence rule the model never saw in training

Gate vs inverse · separated
quark 0.67 · baseline 0.00
telling S from S† apart; this used to be a flat false positive

QASMBench (OOD) · Recall@10
quark 0.17 · baseline 0.17
ties the baseline: matching transpiled forms is the honest weak spot
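For reference, one common definition of Recall@k: for each query, the fraction of its true equivalents that appear among its k nearest neighbours, averaged over queries. Whether the repo uses exactly this convention (rather than, say, a hit rate) is an assumption here, as is the `recall_at_k` helper.

```python
import numpy as np

def recall_at_k(vecs, labels, k=10):
    """Mean Recall@k over queries that have at least one true equivalent.

    vecs: (n, d) unit-norm embeddings. labels: length-n sequence where equal
    labels mean "computes the same unitary". The query itself is never counted.
    """
    labels = np.asarray(labels)
    sims = vecs @ vecs.T
    np.fill_diagonal(sims, -np.inf)          # exclude the query from its own results
    recalls = []
    for i in range(len(vecs)):
        relevant = np.flatnonzero(labels == labels[i])
        relevant = relevant[relevant != i]
        if relevant.size == 0:
            continue
        top = np.argsort(-sims[i])[:k]
        recalls.append(np.isin(relevant, top).mean())
    return float(np.mean(recalls))

vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
print(recall_at_k(vecs, [0, 0, 1, 1], k=1))  # 1.0
```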

new in v0.4.0

It stopped calling S and S† the same gate

Earlier versions packed a gate and its inverse into the same token, so two circuits differing only by S vs S† embedded identically — a confident false positive. v0.4.0 gives every inverse its own token and trains against corrupted twins, so the model now pulls them apart while keeping genuine equivalences high.

▸distinct tokens for S/S†, T/T†, SX/SX†, CX/CY (vocab 16 → 20)
▸four new verified rewrites: gate↔rotation, Clifford conjugation, gate algebra, Pauli-through-CNOT
▸corrupted-twin hard negatives teach the model to separate near-misses
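A corrupted twin can be sketched as: copy a circuit and swap one gate for its inverse. The gate-list format, the `INVERSE` table and the single-swap rule are assumptions for illustration. The twin is guaranteed to be non-equivalent: A·G·B equals A·G⁻¹·B up to phase only if G² is proportional to the identity, which is false for S (S² = Z), T (T² = S) and SX (SX² = X).

```python
import random

# Hypothetical inverse table for the gate pairs that now have separate tokens.
INVERSE = {"s": "sdg", "sdg": "s", "t": "tdg", "tdg": "t", "sx": "sxdg", "sxdg": "sx"}

def corrupted_twin(gates, rng=random):
    """Return a copy of `gates` with one invertible gate replaced by its inverse.

    gates: list of (name, qubits) pairs. Returns None if nothing can be swapped.
    The original list is left unchanged.
    """
    candidates = [i for i, (name, _) in enumerate(gates) if name in INVERSE]
    if not candidates:
        return None
    i = rng.choice(candidates)
    twin = list(gates)
    name, qubits = twin[i]
    twin[i] = (INVERSE[name], qubits)
    return twin

circ = [("h", (0,)), ("s", (0,)), ("cx", (0, 1))]
print(corrupted_twin(circ, random.Random(0)))
# [('h', (0,)), ('sdg', (0,)), ('cx', (0, 1))]
```

In training, such a twin would sit next to its source circuit as a hard negative for a contrastive objective; that loss is not shown here.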

before · S vs S†: cosine 1.0000, a confident false positive

now · S vs S†: cosine −0.42, correctly different
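Why the old score could be exactly 1.0000: if S and S† share a token id, two circuits that differ only in that gate produce identical token sequences, and any deterministic encoder maps identical inputs to identical vectors. A tiny sketch with two hypothetical vocabularies:

```python
OLD_VOCAB = {"h": 0, "s": 1, "sdg": 1}   # hypothetical: inverse shares S's id
NEW_VOCAB = {"h": 0, "s": 1, "sdg": 2}   # hypothetical: inverse has its own id

def ids(gates, vocab):
    """Token ids for a list of gate names."""
    return [vocab[g] for g in gates]

a, b = ["h", "s", "h"], ["h", "sdg", "h"]
print(ids(a, OLD_VOCAB) == ids(b, OLD_VOCAB))  # True: same input, same embedding
print(ids(a, NEW_VOCAB) == ids(b, NEW_VOCAB))  # False: the model can now tell them apart
```

Separate ids only make a difference possible; the corrupted-twin negatives are what push the two embeddings apart.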

playground

Two circuits in, one similarity out

Real embeddings from the released model, precomputed. A high score means the model thinks the two circuits compute the same unitary.

circuit a vs circuit b (example pair)

model cosine +0.959, on a scale from −1 (different) through 0 to +1 (same)

truth: equivalent · model agrees ✓

H·H·H = H. Same unitary, just written longer.
