A small transformer that turns a quantum circuit into a vector, so circuits computing the same unitary land close and different ones land far apart. For deduplication, search, and fast equivalence filtering. Runs on a CPU.
from qiskit import QuantumCircuit
import torch
from quark import CircuitEncoder, embed

# Load the released weights and switch to inference mode
m = CircuitEncoder()
m.load_state_dict(torch.load("quark.pt", weights_only=True))
m.eval()

# The same 3-qubit GHZ circuit, written two ways (H·H·H = H)
a = QuantumCircuit(3); a.h(0); a.cx(0, 1); a.cx(1, 2)
b = QuantumCircuit(3); b.h(0); b.h(0); b.h(0); b.cx(0, 1); b.cx(1, 2)

ea, eb = embed(m, [a, b])      # one unit vector per circuit
print((ea * eb).sum().item())  # 0.96: same unitary, written longer
Repositories fill up with circuits written five different ways. The obvious ways to tell them apart all break:
hash the gate string: fooled instantly. Reorder two commuting gates and the hash changes, even though the unitary doesn't.
sort gates, then hash: misses order. Catches simple reshuffles, but is blind to anything where gate order actually matters.
count gates: blind to qubits. Ignores which qubits gates act on, so it can't tell CX(0,1) from CX(1,0).
compare full unitaries: exact but exponential. The real check, but 2ⁿ × 2ⁿ matrices choke above ~10 qubits.
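The exact check in the last row can be sketched in a few lines of NumPy. This is a minimal illustration, not quark code: the gate set (H, X, Z, S, S†, CX), the `(name, qubits)` gate-list format and the helper names are all hypothetical. It builds each circuit's full 2ⁿ × 2ⁿ unitary and compares two circuits up to a global phase, which is why its cost grows exponentially with the number of qubits.

```python
import numpy as np

# Hypothetical gate set for illustration; qubit 0 is the least significant bit.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
ONE_QUBIT = {
    "h": H,
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "z": np.diag([1, -1]).astype(complex),
    "s": np.diag([1, 1j]),
    "sdg": np.diag([1, -1j]),
}

def gate_matrix(name, qubits, n):
    """Full 2^n x 2^n matrix of one gate acting inside an n-qubit register."""
    if name in ONE_QUBIT:
        (q,) = qubits
        m = np.eye(1, dtype=complex)
        for k in reversed(range(n)):  # highest qubit is the leftmost kron factor
            m = np.kron(m, ONE_QUBIT[name] if k == q else np.eye(2))
        return m
    if name == "cx":
        c, t = qubits
        dim = 2 ** n
        m = np.zeros((dim, dim), dtype=complex)
        for i in range(dim):
            # flip the target bit of basis state i when the control bit is set
            m[i ^ (1 << t) if (i >> c) & 1 else i, i] = 1
        return m
    raise ValueError(f"unsupported gate: {name}")

def unitary(gates, n):
    """Multiply gate matrices in circuit order (later gates act on the left)."""
    u = np.eye(2 ** n, dtype=complex)
    for name, qubits in gates:
        u = gate_matrix(name, qubits, n) @ u
    return u

def equivalent(g1, g2, n, atol=1e-9):
    """Same unitary up to a global phase: |tr(U^dagger V)| == 2^n."""
    u, v = unitary(g1, n), unitary(g2, n)
    return bool(abs(abs(np.trace(u.conj().T @ v)) - 2 ** n) < atol)

a = [("h", (0,)), ("cx", (0, 1)), ("cx", (1, 2))]
b = [("h", (0,))] * 3 + [("cx", (0, 1)), ("cx", (1, 2))]
print(equivalent(a, b, 3))                                # True: H·H·H = H
print(equivalent([("z", (0,)), ("x", (1,))],
                 [("x", (1,)), ("z", (0,))], 2))          # True: commuting gates reordered
print(equivalent([("s", (0,))], [("sdg", (0,))], 1))      # False: S vs S†
print(unitary(a, 3).shape)                                # (8, 8), doubling per qubit
```

The phase-insensitive test works because U†V is itself unitary: |tr(U†V)| reaches 2ⁿ only when every eigenvalue carries the same phase, that is, when V = e^{iφ}U.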
quark sits between these cheap heuristics and the exact check. It learns a similarity from equivalence-preserving rewrites and hands back a fixed-size vector you can drop into any nearest-neighbour index. Wrong sometimes, fast always: seconds for a thousand circuits, where the exact unitary check would take days.
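Because every vector is unit-length, cosine similarity reduces to a dot product, so a brute-force nearest-neighbour lookup is a single matrix-vector product. A minimal NumPy sketch: the random `emb` array is a hypothetical stand-in for vectors returned by `embed`, and `top_k` is an illustrative helper, not part of quark.

```python
import numpy as np

def top_k(query, index, k=5):
    """Indices and cosine scores of the k rows of `index` most similar to `query`.

    Assumes `query` and every row of `index` are already L2-normalized,
    so the dot product equals the cosine similarity.
    """
    scores = index @ query                        # (N,) cosine similarities
    k = min(k, len(scores))
    best = np.argpartition(-scores, k - 1)[:k]    # unordered top-k in O(N)
    best = best[np.argsort(-scores[best])]        # sort only those k
    return best, scores[best]

# Hypothetical stand-in for 1000 quark embeddings (128-d unit vectors)
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 128))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

idx, sc = top_k(emb[42], emb, k=3)
print(idx[0], round(float(sc[0]), 6))  # 42 1.0: a circuit is its own nearest neighbour
```

For large libraries an approximate nearest-neighbour library would replace the brute-force product; the scores it ranks are the same cosine values.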
Group the QASM files that compute the same unitary, even when they're written differently.
Find circuits in a library that look like the one you just wrote, by nearest neighbour.
Use it as a fast pre-filter, then confirm survivors with an exact unitary check.
Tell whether a pass changed a circuit's meaning, or only its syntax.
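The pre-filter-then-confirm workflow can be sketched as follows: only pairs whose cosine clears a threshold reach the slow exact check, and confirmed pairs are merged with union-find. The 0.9 threshold, the `confirm` callback and `group_equivalent` are hypothetical; in practice `confirm` would be an exact unitary comparison.

```python
import numpy as np

def group_equivalent(emb, confirm, threshold=0.9):
    """Group items whose embeddings look equivalent, keeping only confirmed pairs.

    emb:       (N, d) array of L2-normalized embeddings
    confirm:   callable(i, j) -> bool, the slow exact check
    threshold: cosine score a pair must reach to be worth confirming (hypothetical)
    """
    n = len(emb)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sims = emb @ emb.T
    # Upper triangle only, so each unordered pair is considered once.
    mask = np.triu(np.ones_like(sims, dtype=bool), k=1) & (sims >= threshold)
    rows, cols = np.nonzero(mask)
    checked = 0
    for i, j in zip(rows.tolist(), cols.tolist()):
        if find(i) == find(j):
            continue  # already grouped through a chain of confirmed pairs
        checked += 1
        if confirm(i, j):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values()), checked

# Toy example: items 0 and 1 truly match; 2 looks close but is not equivalent.
truth = ["A", "A", "B", "C"]
emb = np.array([[1, 0, 0], [1, 0, 0], [0.95, 0.31, 0], [0, 0, 1]], dtype=float)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
groups, checked = group_equivalent(emb, lambda i, j: truth[i] == truth[j])
print(groups, checked)  # [[0, 1], [2], [3]] 3
```

Unitary equivalence is transitive, so pairs already in the same group are skipped, saving further exact checks. Pairs below the threshold are never checked at all, so raising the threshold trades recall for speed.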
each gate → (type, q1, q2, cosθ, sinθ)
20-gate vocab, inverses get their own id
3 layers · d=128 · 4 heads · [CLS]
L2-normalize → a 128-d unit vector
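The pipeline above can be sketched in PyTorch. Everything here is an assumption for illustration rather than the released `CircuitEncoder`: the exact 20 gate names, `MAX_QUBITS`, the learned position and qubit embeddings, the feed-forward width, and combining the five token fields by summing their embeddings. Only the outline comes from the diagram: one token per gate with separate ids for inverses, 3 layers with d=128 and 4 heads, a [CLS] token (here assumed to be the pooled output), and L2 normalization.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical 20-gate vocabulary (Qiskit gate names); each inverse has its own id.
VOCAB = ["id", "h", "x", "y", "z", "s", "sdg", "t", "tdg", "sx", "sxdg",
         "rx", "ry", "rz", "p", "cx", "cy", "cz", "cp", "swap"]
GATE_ID = {name: i for i, name in enumerate(VOCAB)}
MAX_QUBITS = 16  # hypothetical cap; index MAX_QUBITS means "no second qubit"

def tokenize(gates):
    """(name, qubits, theta) triples -> (type, q1, q2) ids and (cos, sin) features."""
    ids, feats = [], []
    for name, qubits, theta in gates:
        q2 = qubits[1] if len(qubits) > 1 else MAX_QUBITS
        ids.append([GATE_ID[name], qubits[0], q2])
        feats.append([math.cos(theta), math.sin(theta)])  # theta = 0 for fixed gates
    return torch.tensor(ids), torch.tensor(feats, dtype=torch.float32)

class SketchEncoder(nn.Module):
    """3 layers, d=128, 4 heads, [CLS] pooling, L2-normalized output."""

    def __init__(self, d=128, layers=3, heads=4, max_len=512):
        super().__init__()
        self.gate = nn.Embedding(len(VOCAB), d)
        # Separate tables for the two qubit slots so CX(0,1) and CX(1,0) differ.
        self.q1 = nn.Embedding(MAX_QUBITS + 1, d)
        self.q2 = nn.Embedding(MAX_QUBITS + 1, d)
        self.angle = nn.Linear(2, d)
        self.pos = nn.Embedding(max_len + 1, d)  # +1 for the [CLS] position
        self.cls = nn.Parameter(torch.zeros(1, 1, d))
        layer = nn.TransformerEncoderLayer(d, heads, dim_feedforward=4 * d,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, ids, feats):
        # ids: (B, L, 3) long, feats: (B, L, 2) float
        x = (self.gate(ids[..., 0]) + self.q1(ids[..., 1])
             + self.q2(ids[..., 2]) + self.angle(feats))
        x = torch.cat([self.cls.expand(x.size(0), -1, -1), x], dim=1)
        x = x + self.pos(torch.arange(x.size(1), device=x.device))
        h = self.encoder(x)
        return F.normalize(h[:, 0], dim=-1)  # [CLS] output -> unit vector

model = SketchEncoder().eval()
ids, feats = tokenize([("h", (0,), 0.0), ("cx", (0, 1), 0.0), ("rz", (2,), math.pi / 4)])
with torch.no_grad():
    v = model(ids.unsqueeze(0), feats.unsqueeze(0))
print(v.shape)  # torch.Size([1, 128])
```

Feeding the angle as (cos θ, sin θ) rather than θ itself means θ and θ + 2π produce identical features. Padding masks for batching circuits of different lengths are left out to keep the sketch short.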
Every figure reproduces from the repo with one command and a fixed seed. Bars compare quark to the strongest non-ML baseline.
synthetic circuits, built with the same recipe it trained on
an equivalence rule the model never saw in training
telling S and S† apart: this used to be a flat false positive
matching transpiled forms: ties the baseline, the honest weak spot
Earlier versions packed a gate and its inverse into the same token, so two circuits differing only by S vs S† embedded identically: a confident false positive. v0.4.0 gives every inverse its own token and trains against corrupted twins, so the model now pushes such pairs apart while keeping scores for genuine equivalences high.
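One way to picture training against corrupted twins is sketched below: pair each circuit with an equivalence-preserving rewrite (a positive) and with a copy where one gate is swapped for its inverse (a hard negative). This is an assumption about the training setup, not the released recipe; the H → H·H·H rewrite, the `INVERSE` table and the margin objective are all hypothetical.

```python
import random
import numpy as np

INVERSE = {"s": "sdg", "sdg": "s", "t": "tdg", "tdg": "t"}  # hypothetical subset

def equivalent_twin(gates, rng):
    """Positive: rewrite one H as H·H·H, which leaves the unitary unchanged."""
    hs = [i for i, (name, _) in enumerate(gates) if name == "h"]
    if not hs:
        return list(gates)
    i = rng.choice(hs)
    return gates[:i] + [gates[i]] * 3 + gates[i + 1:]

def corrupted_twin(gates, rng):
    """Hard negative: swap one gate for its inverse (e.g. S -> S†)."""
    idx = [i for i, (name, _) in enumerate(gates) if name in INVERSE]
    if not idx:
        return None  # nothing to corrupt
    i = rng.choice(idx)
    name, qubits = gates[i]
    return gates[:i] + [(INVERSE[name], qubits)] + gates[i + 1:]

def margin_loss(anchor, pos, neg, margin=0.5):
    """Hypothetical objective on unit vectors: cos(anchor, pos) should beat
    cos(anchor, neg) by at least `margin`; zero loss once it does."""
    return max(0.0, margin - float(anchor @ pos) + float(anchor @ neg))

rng = random.Random(0)
circ = [("h", (0,)), ("s", (0,)), ("cx", (0, 1))]
print(equivalent_twin(circ, rng))  # H expanded to H·H·H
print(corrupted_twin(circ, rng))   # S replaced by S†
```

Swapping a single gate for its inverse always changes the unitary: A·S†·B = e^{iφ}·A·S·B would force S† = e^{iφ}S, which no phase satisfies. These negatives are therefore guaranteed non-equivalent while looking almost identical to the original, the exact case earlier versions scored as a match.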
Real embeddings from the released model, precomputed. A high score means the model thinks the two circuits compute the same thing.
H·H·H = H. Same unitary, just written longer.