quark — circuit_explorer.app
◆ embeddings for quantum circuits

Quantum circuits,
as vectors.

A small transformer that turns a quantum circuit into a vector, so circuits computing the same unitary land close and different ones land far apart. For deduplication, search, and fast equivalence filtering. Runs on a CPU.

v1.1.127 tests passing · Apache-2.0 · 647k params · CPU-fast
quickstart.py
from qiskit import QuantumCircuit
import torch
from quark import CircuitEncoder, embed
m = CircuitEncoder()
m.load_state_dict(torch.load("quark.pt", weights_only=True))
a = QuantumCircuit(3); a.h(0); a.cx(0,1); a.cx(1,2)
b = QuantumCircuit(3); b.h(0); b.h(0); b.h(0); b.cx(0,1); b.cx(1,2)
ea, eb = embed(m, [a, b])
print((ea * eb).sum().item()) # 0.96 — same unitary, written longer
the problem

“Which of these circuits are actually the same?”

Repositories fill up with circuits written five different ways. The obvious ways to tell them apart all break:

hash the gate string · fooled instantly

reorder two commuting gates and the hash changes, even though the circuit doesn't.

sort gates, then hash · misses order

catches simple reshuffles, blind to anything where gate order actually matters.

count gates · blind to qubits

ignores which qubits gates act on; can't tell H on q0 from H on q1.

compare full unitaries · exact but exponential

the real check, but 2ⁿ × 2ⁿ matrices choke above ~10 qubits.

quark sits in the middle. It learns a similarity from equivalence-preserving rewrites and hands back a fixed-size vector you can drop into any nearest-neighbour index. Wrong sometimes, fast always — seconds for a thousand circuits where the exact unitary check would take days.
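The first three failure modes above can be reproduced in a few lines. This is a minimal sketch, assuming a toy circuit representation (a list of `(gate_name, qubits)` tuples) and hypothetical helper names; it is not quark's own format or API.

```python
import hashlib
from collections import Counter

# A circuit here is just a list of (gate_name, qubits) tuples; this toy
# representation is an assumption for illustration, not quark's format.
Gate = tuple[str, tuple[int, ...]]


def gate_string_hash(circuit: list[Gate]) -> str:
    """Hash the gates in the order they were written."""
    text = ";".join(f"{name}{list(qubits)}" for name, qubits in circuit)
    return hashlib.sha256(text.encode()).hexdigest()


def sorted_gate_hash(circuit: list[Gate]) -> str:
    """Sort the gates first, so the written order no longer matters."""
    return gate_string_hash(sorted(circuit))


def gate_counts(circuit: list[Gate]) -> Counter:
    """Count gate names only, ignoring qubits and order."""
    return Counter(name for name, _ in circuit)


# H on q0 and X on q1 commute, so these two circuits are the same unitary.
same_a = [("h", (0,)), ("x", (1,))]
same_b = [("x", (1,)), ("h", (0,))]

# H then X on one qubit is not X then H, so these two differ.
diff_a = [("h", (0,)), ("x", (0,))]
diff_b = [("x", (0,)), ("h", (0,))]

print(gate_string_hash(same_a) == gate_string_hash(same_b))  # False: missed a match
print(sorted_gate_hash(diff_a) == sorted_gate_hash(diff_b))  # True: false match
print(gate_counts(diff_a) == gate_counts(diff_b))            # True: false match
```

The plain hash misses a true equivalence, while sorting and counting both report a match for two circuits that are not equivalent.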

what it's for

A search index for circuits, not a way to run them

dedupe

Deduplicate a folder

Group the QASM files that compute the same unitary, even when they're written differently.

$ quark dedupe ./circuits/
search

Search by example

Find circuits in a library that look like the one you just wrote, by nearest neighbour.

$ quark eval --weights quark.pt
verify

Filter before you prove

Use it as a fast pre-filter, then confirm survivors with an exact unitary check.

$ quark.verify.equiv(a, b)
regress

Catch transpiler drift

Tell whether a pass changed a circuit's meaning, or only its syntax.

$ quark show --n 3 --k 4
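The "filter before you prove" pattern can be sketched without the library. The example below assumes embeddings are already unit vectors and that some exact check is available as a callback; the function names, the 0.75 threshold, and the toy 2-d vectors are all illustrative assumptions, not part of quark.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(u, v))


def candidate_pairs(embeddings: list[Vector], threshold: float) -> list[tuple[int, int]]:
    """Cheap pass: keep index pairs whose similarity clears the threshold."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs


def confirmed_pairs(
    embeddings: list[Vector],
    threshold: float,
    exact_check: Callable[[int, int], bool],
) -> list[tuple[int, int]]:
    """Expensive pass: run the exact check only on the surviving pairs."""
    return [(i, j) for i, j in candidate_pairs(embeddings, threshold) if exact_check(i, j)]


# Toy 2-d unit vectors standing in for real 128-d embeddings.
vectors = [(1.0, 0.0), (0.6, 0.8), (0.0, 1.0), (0.8, 0.6)]
# Pretend ground truth: only circuits 1 and 3 are truly equivalent.
truth = {(1, 3)}

print(candidate_pairs(vectors, threshold=0.75))                      # [(0, 3), (1, 2), (1, 3)]
print(confirmed_pairs(vectors, 0.75, lambda i, j: (i, j) in truth))  # [(1, 3)]
```

Six possible pairs shrink to three candidates, and only those three reach the exact check. With real data the threshold trades missed matches against wasted exact checks, so it would need tuning on circuits with known answers.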
how it works

Every gate becomes a token. The transformer reads them and emits one vector.

1

Tokenize

each gate → (type, q1, q2, cosθ, sinθ)

2

Embed

20-gate vocab, inverses get their own id

3

Transformer

3 layers · d=128 · 4 heads · [CLS]

4

Normalize

L2 → a 128-d unit vector
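Steps 1 and 4 are simple enough to sketch in plain Python. The gate ids, the `-1` placeholder for a missing second qubit, and the choice of θ = 0 for non-parameterised gates are assumptions made for this example; the page does not list the real vocabulary or conventions.

```python
import math

# Hypothetical gate ids; the real 20-gate vocabulary is not listed here.
VOCAB = {"h": 0, "cx": 1, "s": 2, "sdg": 3, "rz": 4}
NO_QUBIT = -1  # assumed placeholder for the unused second qubit slot


def tokenize(gate: str, qubits: tuple[int, ...], theta: float = 0.0):
    """Map one gate to a (type, q1, q2, cos θ, sin θ) tuple."""
    q1 = qubits[0]
    q2 = qubits[1] if len(qubits) > 1 else NO_QUBIT
    return (VOCAB[gate], q1, q2, math.cos(theta), math.sin(theta))


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a non-zero vector to unit length, as in the final Normalize step."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


circuit = [("h", (0,), 0.0), ("cx", (0, 1), 0.0), ("rz", (1,), math.pi / 2)]
tokens = [tokenize(*g) for g in circuit]
for t in tokens:
    print(t)

unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

Encoding the angle as (cos θ, sin θ) keeps θ and θ + 2π identical, and unit-length outputs make a plain dot product equal to cosine similarity.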

9
verified equivalence rewrites it trains on
647k
parameters — ~1000 circuits/sec on a laptop CPU
128-d
unit vector per circuit, ready for any index
benchmarks

Real numbers, including the one that isn't flattering

Every figure reproduces from the repo with one command and a fixed seed. Bars compare quark to the strongest non-ML baseline.

In-distribution

Recall@10 · quark 1.00 · baseline 0.88

synthetic circuits, the same recipe it trained on

Held-out rewrite

Recall@10 · quark 0.98 · baseline 0.55

an equivalence rule the model never saw in training

Gate vs inverse

separated · quark 0.67 · baseline 0.00

telling S from S† apart; it used to be a flat false positive

QASMBench (OOD)

Recall@10 · quark 0.17 · baseline 0.17

ties baseline · matching transpiled forms, the honest weak spot
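For readers unfamiliar with the metric, here is one common way to compute Recall@k over unit vectors. The page does not say which variant the repo's benchmark uses (some define it as "at least one relevant item in the top k"), so treat this as an illustrative definition with a hypothetical function name and toy data.

```python
def recall_at_k(query_vec, library, relevant_ids, k=10):
    """Fraction of the relevant items that appear among the top-k neighbours.

    library maps an item id to its unit vector; relevant_ids is the set of
    ids that are truly equivalent to the query.
    """
    ranked = sorted(
        library,
        key=lambda item_id: sum(q * x for q, x in zip(query_vec, library[item_id])),
        reverse=True,
    )
    top_k = set(ranked[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


# Toy 2-d unit vectors standing in for real 128-d embeddings.
library = {
    "a": (1.0, 0.0),
    "b": (0.8, 0.6),
    "c": (0.0, 1.0),
    "d": (-1.0, 0.0),
}
query = (0.6, 0.8)

print(recall_at_k(query, library, relevant_ids={"b", "c"}, k=2))  # 1.0
print(recall_at_k(query, library, relevant_ids={"b", "d"}, k=2))  # 0.5
```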

new in v0.4.0

It stopped calling S and S† the same gate

Earlier versions packed a gate and its inverse into the same token, so two circuits differing only by S vs S† embedded identically — a confident false positive. v0.4.0 gives every inverse its own token and trains against corrupted twins, so the model now pulls them apart while keeping genuine equivalences high.

  • distinct tokens for S/S†, T/T†, SX/SX†, CX/CY (vocab 16 → 20)
  • four new verified rewrites: gate↔rotation, Clifford conjugation, gate algebra, Pauli-through-CNOT
  • corrupted-twin hard negatives teach the model to separate near-misses
before · S vs S†
1.0000
confident false positive
now · S vs S†
−0.42
correctly different
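A corrupted twin can be as simple as a copy of a circuit with one gate swapped for its inverse. The page does not describe how the repo builds its hard negatives, so the sketch below is only one plausible construction; the gate names, the tuple format, and the `corrupted_twin` helper are assumptions.

```python
import random

# Assumed gate names for a few gate/inverse pairs (Qiskit-style "dg" suffix).
INVERSE = {"s": "sdg", "sdg": "s", "t": "tdg", "tdg": "t"}


def corrupted_twin(circuit, rng):
    """Return a copy with one gate swapped for its inverse, or None if no gate qualifies."""
    swappable = [i for i, (name, _) in enumerate(circuit) if name in INVERSE]
    if not swappable:
        return None
    i = rng.choice(swappable)
    name, qubits = circuit[i]
    twin = list(circuit)  # shallow copy so the original stays untouched
    twin[i] = (INVERSE[name], qubits)
    return twin


original = [("h", (0,)), ("s", (0,)), ("cx", (0, 1))]
twin = corrupted_twin(original, random.Random(0))
print(twin)  # [('h', (0,)), ('sdg', (0,)), ('cx', (0, 1))]
```

The twin is almost identical as text but is a different unitary, which makes it a useful negative example for a contrastive objective.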
playground

Two circuits in, one similarity out

Real embeddings from the released model, precomputed. It cycles on its own — or pick a pair. A high score means the model thinks they compute the same thing.

circuit a
q0 q1 q2 · H
vs
circuit b
q0 q1 q2 · H H H
model cosine
+0.959
−1 · different   0   +1 · same
truth: equivalent · model agrees ✓

H·H·H = H. Same unitary, just written longer.
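The identity is easy to confirm exactly on a single qubit, and the same check shows why S and S† must not be treated as the same gate. This sketch uses plain Python lists as 2 × 2 matrices with hypothetical helper names, and compares matrices entry by entry (not up to global phase).

```python
import math

Matrix = list[list[complex]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def close(a: Matrix, b: Matrix, tol: float = 1e-9) -> bool:
    """Entry-wise comparison within a tolerance."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
S = [[1, 0], [0, 1j]]
S_DG = [[1, 0], [0, -1j]]
I2 = [[1, 0], [0, 1]]

print(close(matmul(H, H), I2))            # True: H is its own inverse
print(close(matmul(matmul(H, H), H), H))  # True: H·H·H = H
print(close(S, S_DG))                     # False: S and S† differ
print(close(matmul(S, S_DG), I2))         # True: S† undoes S
```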