Home | Signal | Curator | Morey | SwarmCare | Hedera | Discord
Medical AI Training Data — Platinum Tier

434,882 Verified Medical Pairs
Across 92 Specialties

Textbook-grade clinical reasoning, pharmacology, foundational science, and board preparation data. Every pair quality-gated, SHA-256 sealed, and ready for fine-tuning.

434,882 Verified Pairs
92 Specialties
19 Textbooks Sourced
Platinum Quality Tier

Clinical Medicine

220,000+ pairs
Internal Medicine PLATINUM
99,551 pairs Differential diagnosis, management protocols, Harrison's-grade clinical reasoning. The backbone of clinical AI training.
Surgery PLATINUM
43,051 pairs Perioperative assessment, surgical decision-making, Schwartz's-grade procedures. Pre-op workup through post-op management.
Neurology PLATINUM
39,296 pairs Neurological examination, stroke protocols, Adams-grade neuro reasoning. Localization, differential, and acute management.
Cardiology PLATINUM
14,064 pairs ECG interpretation, hemodynamic assessment, interventional protocols. ACS pathways to structural heart disease.
Emergency Medicine GOLD
1,041 pairs Acute management, trauma protocols, triage decision-making. Time-critical reasoning under clinical uncertainty.
Critical Care GOLD
142 pairs ICU management, ventilation protocols, sepsis pathways. High-acuity physiological reasoning and multi-organ support.

Women's Health & Obstetrics

51,000+ pairs
Obstetrics PLATINUM
27,531 pairs Prenatal care, labor management, Williams-grade obstetric reasoning. Antenatal risk stratification through postpartum care.
Women's Health PLATINUM
23,825 pairs Reproductive health, Novak's-grade gynecological assessment. Hormonal management, screening protocols, and preventive care.
Gynecology GOLD
490 pairs Surgical gynecology, endometriosis management, fertility assessment. Operative planning and reproductive endocrinology.

Pharmacology & Drug Science

22,000+ pairs
Pharmacology PLATINUM
23,432 pairs Drug interactions, pharmacokinetics, Katzung-grade mechanisms. ADME, receptor pharmacology, and therapeutic index analysis.
Drug Safety GOLD
3 pairs Adverse event detection, pharmacovigilance. Seed data for signal detection and FAERS-grade safety reasoning.

Foundational Sciences

84,000+ pairs
Cell Biology PLATINUM
21,130 pairs Molecular mechanisms, Alberts-grade cell biology. Signal transduction, cell cycle regulation, and organelle function.
Pathology PLATINUM
17,402 pairs Histopathological diagnosis, Robbins-grade disease mechanisms. Cellular injury, inflammation, neoplasia, and organ pathology.
Immunology PLATINUM
14,264 pairs Immune response, autoimmunity, Janeway-grade immunological reasoning. Innate and adaptive immunity, hypersensitivity, immunodeficiency.
Histology PLATINUM
13,104 pairs Tissue identification, Ross-grade microscopic anatomy. Epithelial, connective, muscle, and nervous tissue classification.
Physiology PLATINUM
13,064 pairs Organ system function, Levy-grade physiological reasoning. Cardiovascular, renal, respiratory, and neurophysiology.
Anatomy PLATINUM
9,065 pairs Structural anatomy, clinical correlations, Gray's-grade anatomical reasoning. Surface, cross-sectional, and surgical anatomy.
Biochemistry GOLD
5,797 pairs Metabolic pathways, enzyme kinetics, Lippincott-grade biochemistry. Glycolysis, TCA, oxidative phosphorylation, and inborn errors.

Mental Health

12,000+ pairs
Psychiatry PLATINUM
14,513 pairs DSM-5 criteria, psychopharmacology, treatment planning. Mood disorders, psychosis, anxiety, and substance use assessment.

Pediatrics

13,000+ pairs
Pediatrics PLATINUM
14,987 pairs Growth assessment, Nelson's-grade pediatric reasoning. Developmental milestones, neonatal care, and pediatric emergencies.

Radiology & Imaging

5,800+ pairs
Neuroradiology GOLD
6,073 pairs MRI interpretation, lumbar spine pathology, structured radiology reports. Brain and spine imaging with clinical correlation.
Radiology GOLD
744 pairs Chest X-ray, CT interpretation, systematic reporting. Pattern recognition and differential imaging diagnosis.

Oncology

650+ pairs
Oncology GOLD
650 pairs Tumor staging, treatment protocols, clinical trial reasoning. NCCN guidelines, chemo regimens, and survivorship planning.

Exam Preparation

6,600+ pairs
USMLE Step 2 CK GOLD
4,092 pairs Clinical science, Step 2 CK board preparation. Case-based reasoning, diagnosis, and next-best-step questions.
USMLE Step 1 GOLD
2,565 pairs Basic science, Step 1 board preparation. Integrated pathophysiology, pharmacology, and foundational science questions.

30+ Sub-Specialties

Additional coverage

Beyond the major specialty blocks above, the SwarmMed dataset includes verified pairs across 30+ additional clinical and research sub-specialties. Each sub-specialty is labeled, quality-gated, and available individually or as part of the complete medical vertical.

endocrinology (910) infectious-disease (545) geriatrics (531) pulmonology (517) gastroenterology nephrology dermatology rheumatology hematology ophthalmology otolaryngology urology orthopedics anesthesiology palliative-care sports-medicine allergy-immunology vascular-surgery plastic-surgery pain-medicine medical-genetics nuclear-medicine toxicology preventive-medicine rehabilitation occupational-medicine sleep-medicine clinical-informatics forensic-medicine tropical-medicine

Trained Models

Production models fine-tuned on SwarmMed data. Sealed, evaluated, and deployment-ready.

SwarmPharma-35B v1
Qwen3.5-35B-A3B — bf16 LoRA r=64

Clinical pharmacology specialist. 5-step trajectory reasoning: IDENTIFY → MECHANISM → ASSESS → CALCULATE → RECOMMEND. Drug-drug interactions, pharmacokinetic calculations, dosage optimization. Zero quantization loss at Q4_K_M.

sealed v1 25,629 pairs 5-step trajectory GGUF Q4_K_M 22 tok/s (3090 Ti)
SwarmMed-Vision-4B
MedMO-4B (Qwen3-VL) — multimodal

Medical imaging specialist. Trained on 14,474 vision pairs covering spine pathology, brain tumor segmentation, and structured radiology report generation. Multimodal input: DICOM-derived images + clinical context.

training 14,474 vision pairs spine + brain imaging Qwen3-VL base
King v2
Qwen2.5-7B-Instruct — medical adapter

General-purpose medical reasoning adapter. Broad clinical coverage across internal medicine, surgery, and foundational sciences. Lightweight deployment target for edge and mobile inference.

sealed v2 7B params medical adapter edge-ready

Imaging Assets

Raw medical imaging datasets used for vision model training and evaluation. Research-grade, multi-institutional sources.

Brain Tumor (BraTS)
484 NIfTI scans — 7.6 GB

Multi-institutional brain tumor segmentation challenge data. T1, T1-Gd, T2, FLAIR sequences with expert annotations. Glioma grading and volumetric analysis.

VinDr-SpineXR
10,466 images — 8 pathology classes

Lumbar spine X-ray dataset with radiologist annotations. Disc degeneration, spondylolisthesis, foraminal stenosis, vertebral fracture, and 4 additional pathology classes.

ECG Databases (PhysioNet)
15+ databases — multi-lead waveforms

Comprehensive ECG signal collections from PhysioNet. Arrhythmia detection, ST-segment analysis, and QT interval measurement across diverse patient populations.

Delivery

Every order ships in 5 formats with train/eval splits, provenance docs, and tamper-evident verification.

5
Output Formats
95/5
Train / Eval Split
SHA-256
Sealed Guarantee
DATA_CARD
Full Provenance
5 Delivery Formats — All Include Train + Eval Split
1. ChatML — swarmmed_train.chatml.jsonl
OpenAI API, TRL, Unsloth, Axolotl
2. Alpaca — swarmmed_train.alpaca.jsonl
LLaMA-Factory, HuggingFace trainers
3. ShareGPT — swarmmed_train.sharegpt.jsonl
FastChat, Vicuna, multi-turn
4. OpenAI — swarmmed_train.openai.jsonl
Direct gpt-4o fine-tuning upload
5. Completion — swarmmed_train.completion.jsonl
Legacy pipelines, custom loops
DATA_CARD.json
Quality metrics, model lineage, gate pass rates, specialty distribution, generation model ID.
guarantee.json
Merkle root of every pair. SHA-256 sealed. Optional Hedera HCS on-chain timestamp. Tamper-evident.
6 Gates
Deterministic quality: JSON validity, output length, numeric verify, concept presence, dedup, degeneration.
Per-Pair
Every pair carries: source, order_id, specialty, model, quality gate result, content fingerprint.