News & Updates
From the refinery floor. What we're building, scoring, and shipping.
APRIL 5, 2026 — AFTERNOON
First Deed With Location Coordinates — The Inspection Report Ships
SB-2026-0405-019819 is the first deed in production that carries five dimension scores from each scale. Ten data points instead of two. The property inspection, not just the appraisal.
In commercial real estate, an appraisal says "this property is worth $850K." An inspection report says why — and what would make it worth more. That's what every deed now carries.
DEED SB-2026-0405-019819 — LOCATION COORDINATES
SCALE A (gemma3:12b)
accuracy 0.90
completeness 0.95
specificity 0.85
structure 0.90
domain_expertise 0.80
composite 0.88
SCALE B (qwen2.5:32b)
accuracy 0.95
completeness 0.92
specificity 0.85
structure 0.90
domain_expertise 0.88
composite 0.90
FINAL: 0.89 ROYAL JELLY | DRIFT: 0.00
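The arithmetic on that deed checks out by hand. A minimal sketch, assuming the composite is an unweighted mean of the five dimensions (the production formula and the exact drift definition aren't spelled out in this post, so drift is left out):

```python
# Dual-scale dimension scores from deed SB-2026-0405-019819.
scale_a = {"accuracy": 0.90, "completeness": 0.95, "specificity": 0.85,
           "structure": 0.90, "domain_expertise": 0.80}
scale_b = {"accuracy": 0.95, "completeness": 0.92, "specificity": 0.85,
           "structure": 0.90, "domain_expertise": 0.88}

def composite(scores):
    # Assumed: composite is the unweighted mean of the five dimensions.
    return round(sum(scores.values()) / len(scores), 2)

comp_a = composite(scale_a)               # 0.88, matches the deed
comp_b = composite(scale_b)               # 0.90, matches the deed
final = round((comp_a + comp_b) / 2, 2)   # 0.89, the Royal Jelly score
```

Ten inputs, one title. The dimension scores are the inspection; the composite is the appraisal.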
Both scales independently scored specificity at 0.85 and structure at 0.90 — identical scores from two completely different model architectures. That's not coincidence. That's architecture-independent agreement at the dimension level.
Specificity at 0.85 tells the client exactly what to fix: more concrete numbers, named programs, specific dollar amounts. The deed doesn't just rate — it prescribes.
Every deed from this point forward carries 10 data points. The 19,598 scored before today carry the original dual-scale format. Clean historical break. The glass wall just got five dimensions thicker.
Search any deed → ·
Read the methodology →
10 data points per deed | 5 dimensions × 2 scales | specificity = 0.85 (both scales agree) | structure = 0.90 (identical)
SHIPPED
DEED SCHEMA
DIMENSIONS
GLASS WALL
APRIL 5, 2026
Three Experiments, Five Coordinates — The Eval Page
We ran three controlled experiments on our scoring methodology in one session. 1,600+ scoring calls across 600 pairs. Every result published — including the ones that failed.
EXP-001: Position Bias. The MT-Bench paper says LLM judges score differently based on content order. We tested it on 600 pairs. Delta: +0.0064. Below significance. Our scales are position-neutral.
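The test itself is a paired comparison: score every pair in the original order, score the same pairs with the content order swapped, and take the mean delta. A sketch of that check with illustrative numbers (not our eval data):

```python
def position_delta(scores_original, scores_swapped):
    """Mean difference between scores in the original content order
    and scores for the same pairs with the order swapped.
    A delta near zero means the judge is position-neutral."""
    assert len(scores_original) == len(scores_swapped)
    n = len(scores_original)
    return sum(a - b for a, b in zip(scores_original, scores_swapped)) / n

# Illustrative values: a judge unaffected by order yields delta near 0.
orig = [0.88, 0.90, 0.85, 0.92]
swap = [0.88, 0.89, 0.85, 0.92]
delta = position_delta(orig, swap)   # ~0.0025 on this toy data
```

Our 600-pair run came back at +0.0064, below significance.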
EXP-002: Few-Shot Calibration. Auto-CoT research says scoring examples improve consistency. We tested 3 tier-spanning exemplars. Agreement dropped 1.2%. The zero-shot prompt is more robust. Published as a negative result — because negative results prevent repeating mistakes.
EXP-003: Per-Dimension Scoring. Instead of one holistic score, we scored each of five dimensions independently. Agreement improved 1.0%. Score spread compressed 21%. And we discovered the five coordinates — the first X-ray of our corpus quality.
The finding: specificity is the quality gap. Structure scores 0.968. Domain expertise 0.940. Accuracy 0.930. But specificity — concrete numbers, named entities, actionable steps — sits at 0.849. That's the curb appeal problem. Now we know exactly what to fix.
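Finding the gap is mechanical once the corpus is scored per dimension: take the dimension with the lowest mean. A sketch over the corpus means quoted above (the completeness mean isn't quoted in the post, so it's omitted):

```python
# Per-dimension mean scores across the corpus (from EXP-003).
corpus_means = {
    "structure": 0.968,
    "domain_expertise": 0.940,
    "accuracy": 0.930,
    "specificity": 0.849,   # the gap: concrete numbers, named entities
}

weakest = min(corpus_means, key=corpus_means.get)               # "specificity"
gap = corpus_means["structure"] - corpus_means["specificity"]   # ~0.119
```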
In CRE terms: a single score is an appraisal. Five dimension scores are a property inspection. The deed becomes an inspection report that tells the buyer not just what something is worth, but why — and what would make it worth more.
Every experiment, every result, every methodology is now public.
Read the full evaluation methodology →
3 experiments | 1,600+ scoring calls | 600 eval pairs | 99% scale agreement | specificity = 0.849 (quality gap)
NEW PAGE
RESEARCH
EVAL
GLASS WALL
APRIL 4, 2026 — LATE
10,000 Deeds — The Tribunal Never Sleeps
Today the autonomous tribunal crossed 10,000 deeds. Every one scored by two independent base models. Every batch sealed in a Merkle tree. Every root anchored to Hedera mainnet.
The tribunal runner is a systemd service on swarmrails. It loads pairs, sends them to Scale A (gemma3:12b on an RTX PRO 6000) and Scale B (qwen2.5:32b on a separate RTX 3090), validates the scores across two passes, and files the deed. The deed recorder on the edge box picks up every scored pair and files it to PostgreSQL + NAS + Merkle batch within 30 seconds.
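Sealing a batch means every deed hash becomes a Merkle leaf and one root commits to all of them; change any deed and the root changes. A minimal sketch of that step, assuming SHA-256 and a carry-up rule for odd nodes (the production batcher's exact hash and pairing conventions aren't specified here):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf hashes pairwise up to a single root.
    An odd node out is carried up unchanged (one common convention)."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(sha256(level[i] + level[i + 1]))
            else:
                nxt.append(level[i])
        level = nxt
    return level[0]

# Any tamper with a scored deed flips the anchored root.
deeds = [b"deed-1", b"deed-2", b"deed-3"]
root = merkle_root(deeds)
tampered = merkle_root([b"deed-1", b"deed-X", b"deed-3"])
assert root != tampered
```

Anchoring that root on-chain is what makes the glass wall load-bearing: the proof survives even if we don't.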
Nobody touched it. Nobody monitored it. Nobody intervened. 767 pairs per hour, 24 hours a day.
The grants domain crossed 5,000 Royal Jelly. Medical holds at 4,295. Two domains selling on the shop. Eight more in the queue.
We don't sell training data. We sell relocation. Your model moves from generic to specialist. We prove it worked.
Watch the tribunal work ·
Search any deed ·
Buy a package
10,405 deeds | 9,539 Royal Jelly | 208 Merkle batches | 767 pairs/hr | 2 domains selling
MILESTONE
10K DEEDS
TRIBUNAL
APRIL 4, 2026
SwarmGEO Launches — Free AI Visibility Scanner
How does AI see your website? We launched SwarmGEO — a free tool that scans any URL and scores it across 6 dimensions of Generative Engine Optimization. The scan is powered by Gemma 3 4B running on our Xeon w9-3475X Sapphire Rapids, completing full AI-powered analysis in under 3 seconds.
GEO is the new SEO. Google ranked pages. AI models cite content. Traditional SEO optimized for crawlers — GEO optimizes for understanding. Our scanner evaluates structured data, content clarity, entity density, citation readiness, authority signals, and technical access.
Try it free — no signup required →
LIVE
GEO
PRODUCT LAUNCH
APRIL 4, 2026
Grants Domain Goes Live — 1,000+ Royal Jelly Scored
The autonomous tribunal runner scored its way through the first 1,000+ grant pairs. Grants is now the second domain available for purchase on SwarmShop, joining medical. Every grant pair covers SBIR/STTR, NEA, DOE, NIH — real federal grant writing with domain-expert system prompts.
The grants dataset was scored entirely by the 24/7 tribunal runner — no human intervention. Scale A (gemma3:12b) on swarmrails GPU1, Scale B (qwen2.5:32b) on the Whale rig. 777 pairs/hour, 2-pass validation, automatic deed filing via the edge recorder.
Grants: 1,003 Royal Jelly | Avg score: 0.87 | Starter package: $29
LIVE
GRANTS
TRIBUNAL
APRIL 4, 2026
SwarmShop with Stripe Checkout — Buy Datasets in 60 Seconds
The dataset marketplace is fully operational.
Pick a domain, choose a package, pay with Stripe, receive a ZIP with 9 files — 5 training formats, deed certificates, offering memorandum, quality report, and package manifest. Delivery is instant via email.
Every package ships with full provenance: dual-scale scores, scale reasoning, Merkle batch proofs, and a closing statement showing cost-to-mint vs title premium. This isn't a download — it's a closing.
Browse datasets →
LIVE
SHOP
STRIPE
APRIL 4, 2026
Gemma 4 31B — 68% Through Training
The swarmGrant-Gemma4-31B model is cooking on GPU0 (RTX PRO 6000 Blackwell, 300W, 99% utilization). Training on 35,957 Royal Jelly pairs — grants, finance, company DNA. QLoRA r=64, 3 epochs, eval loss trending down: 0.7534 → 0.6339 → 0.5919 → 0.5663 → 0.5378 → 0.5250.
Already beating the Gemma 12B baseline (0.58 eval loss) with 32% of training still to go. This will be the strongest domain-specific model in the fleet.
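The claim is easy to sanity-check from the numbers quoted: every checkpoint should land below the last, and the latest should already clear the 12B baseline. A quick sketch:

```python
# Eval-loss checkpoints for swarmGrant-Gemma4-31B, as quoted above.
eval_loss = [0.7534, 0.6339, 0.5919, 0.5663, 0.5378, 0.5250]
baseline_12b = 0.58   # Gemma 12B baseline eval loss (from the post)

# Strictly decreasing at every checkpoint so far.
monotone = all(b < a for a, b in zip(eval_loss, eval_loss[1:]))
beats_baseline = eval_loss[-1] < baseline_12b   # already true at 68%
```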
Step 2188/3204 (68%) | ETA: ~18h | Eval: 0.5250 | 300W 81°C
TRAINING
GEMMA 4
APRIL 4, 2026
24/7 Tribunal Runner — The Swarm is Autonomous
The tribunal runner is now a permanent systemd service scoring pairs around the clock. Domain order: grants (smallest) → aviation → medical → CRE (largest). At 777 pairs/hour, the full 1.3M pair corpus will be scored in ~70 days of continuous operation.
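Back-of-envelope on that ETA, assuming the 777 pairs/hour rate holds around the clock:

```python
corpus_pairs = 1_300_000   # full queued corpus
rate_per_hour = 777        # sustained tribunal throughput

hours = corpus_pairs / rate_per_hour   # ~1,673 hours of scoring
days = hours / 24                      # ~70 days, continuous
```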
The pipeline: tribunal runner scores → deed recorder files → Merkle batcher seals → shop updates in real time. Zero human intervention. The swarm scores while we sleep.
7 services | 4 machines | 14/14 watchdog checks | 1.3M pairs queued
ACTIVE
INFRASTRUCTURE