SwarmPharma -- Pharmaceutical AI Training Data

Methodology

5-Step Trajectory in Every Output

Every trajectory-enhanced pair follows the same clinical reasoning chain. No shortcuts. No hallucinated conclusions. Each step is verified.

Step 1 -- IDENTIFY: Drug, patient, context parameters

Step 2 -- MECHANISM: Receptor, pathway, molecular target

Step 3 -- ASSESS: Risk, severity, clinical significance

Step 4 -- CALCULATE: Dosing, PK params, adjustments

Step 5 -- RECOMMEND: Action, monitoring, alternatives

The trajectory methodology ensures your model doesn't just produce answers -- it produces reasoning chains. Every output traces the clinical logic from drug identification through mechanism analysis to a concrete recommendation with monitoring parameters.
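A minimal sketch of how the five-step structure could be checked on a model output, assuming each step is introduced by a bracketed tag such as [IDENTIFY], as in the sample output on this page. The function name and error handling are illustrative and are not taken from the SwarmPharma pipeline.

```python
import re

# Step names in the order the methodology prescribes.
TRAJECTORY_STEPS = ["IDENTIFY", "MECHANISM", "ASSESS", "CALCULATE", "RECOMMEND"]


def split_trajectory(output: str) -> dict[str, str]:
    """Split an output into its five tagged steps.

    Raises ValueError if a tag is missing, repeated, or out of order.
    """
    pattern = re.compile(r"\[(" + "|".join(TRAJECTORY_STEPS) + r")\]")
    matches = list(pattern.finditer(output))
    found = [m.group(1) for m in matches]
    if found != TRAJECTORY_STEPS:
        raise ValueError(f"expected {TRAJECTORY_STEPS}, found {found}")

    sections = {}
    for i, match in enumerate(matches):
        # A step's text runs until the next tag, or to the end of the output.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(output)
        sections[match.group(1)] = output[match.end():end].strip()
    return sections
```

An output that skips a step, for example one that jumps from [IDENTIFY] straight to [RECOMMEND], raises ValueError instead of returning a partial result.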

Data Provenance

Where the Pairs Come From

Two verified sources. No synthetic-only generation. Textbook ground truth combined with trajectory-enhanced clinical pairs.

Katzung's Basic & Clinical Pharmacology

The gold standard pharmacology textbook. Core drug knowledge, mechanisms, therapeutic principles.

22,083 pairs

Trajectory-Enhanced Pairs (R2: sb-medical/trajectory/)

5-step methodology. 27 shards. 16 pharma task types. Labeled trajectory=true v1. Verified and sealed.

28,624 pairs

Total Pharmacology Pairs

~50,707 pairs

Task Types

16 Pharmaceutical Task Types

Each task type teaches a distinct pharmacological capability. Every pair is trajectory-verified and quality-gated.

Core Pharmacology -- Drug Mechanisms & Interactions

Core

Drug Interaction Analysis

drug_interaction_analysis

Multi-drug interaction assessment, DDI severity grading, contraindication identification, and interaction cascade analysis across polypharmacy regimens.

Why it matters: Drug-drug interactions cause 125,000+ hospitalizations per year. Your model needs to catch what humans miss in complex medication lists.

Core

Mechanism of Action

mechanism_of_action

Receptor binding profiles, signal transduction pathways, molecular target identification, and downstream pharmacodynamic effects at the cellular level.

Why it matters: Understanding MOA is the foundation of all pharmacology. A model that can explain receptor-level mechanics can reason about novel drug combinations.

Core

Drug Metabolism

drug_metabolism

CYP450 enzyme interactions, phase I/II metabolic pathways, genetic polymorphism effects (CYP2D6, CYP2C19), and metabolite activity profiles.

Why it matters: 75% of drugs are metabolized by CYP450 enzymes. Genetic polymorphisms create 10-100x dosing variability. Your model must understand this machinery.

Core

Drug Class Comparison

drug_class_comparison

Therapeutic class analysis, head-to-head efficacy comparison, side effect profiles, cost-effectiveness evaluation, and guideline-based selection criteria.

Why it matters: Clinicians choose between drugs within the same class constantly. Your model needs to articulate the clinical rationale for one agent over another.

Clinical Practice -- Dosing, Monitoring & PK/PD

Clinical

Pharmacokinetic Modeling

pharmacokinetic_modeling

ADME parameter estimation, dose-response curve analysis, PK/PD modeling, compartmental analysis, and bioavailability calculations across patient populations.

Why it matters: PK modeling drives every dosing decision. Teaching your model ADME fundamentals means it can reason about drug behavior in any patient context.

Clinical

Dosing Optimization

dosing_optimization

Weight-based dosing calculations, renal/hepatic dose adjustments (CrCl, Child-Pugh), therapeutic window management, and loading/maintenance dose protocols.

Why it matters: Wrong dosing is the #1 medication error category. Your model must calculate adjustments for organ impairment, body weight, and drug levels.
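As an illustration of the kind of calculation the renal adjustment pairs involve, here is a short sketch of the Cockcroft-Gault estimate of creatinine clearance (CrCl). The formula is the standard published one; the function name and the example patient values are hypothetical and are not drawn from the dataset.

```python
def cockcroft_gault(age_years: float, weight_kg: float,
                    serum_creatinine_mg_dl: float, female: bool = False) -> float:
    """Estimate creatinine clearance in mL/min with the Cockcroft-Gault equation.

    CrCl = (140 - age) * weight / (72 * SCr), multiplied by 0.85 for female patients.
    """
    if serum_creatinine_mg_dl <= 0:
        raise ValueError("serum creatinine must be positive")
    crcl = (140 - age_years) * weight_kg / (72 * serum_creatinine_mg_dl)
    return crcl * 0.85 if female else crcl


# Hypothetical patient: 72 years old, 70 kg, serum creatinine 1.0 mg/dL, male.
print(round(cockcroft_gault(72, 70, 1.0), 1))  # 66.1
```

This is an illustration of the arithmetic only. Which weight to use (actual, ideal, or adjusted) and which dose adjustment follows from the result depend on the drug and the clinical protocol.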

Clinical

Therapeutic Monitoring

therapeutic_monitoring

TDM protocol design, drug level interpretation (trough/peak), dose titration strategies, narrow therapeutic index management, and monitoring frequency protocols.

Why it matters: Drugs like vancomycin, lithium, and warfarin have razor-thin therapeutic windows. Models that interpret levels and titrate doses save lives.

Clinical

Formulation Analysis

formulation_analysis

Drug delivery system comparison, bioavailability profiling, extended-release vs IR analysis, route of administration selection, and formulation-specific pharmacokinetics.

Why it matters: The same drug in different formulations can have dramatically different PK profiles. Your model needs to distinguish ER from IR, IV from PO, patch from tablet.

Safety & Pharmacovigilance -- Risk Assessment & Surveillance

Safety

Adverse Event Detection

adverse_event_detection

Side effect profiling, pharmacovigilance signal detection, adverse reaction severity grading, causality assessment (Naranjo scale), and reporting protocol generation.

Why it matters: Post-market adverse events are the leading cause of drug withdrawals. A pharmacovigilance-aware model catches signals before they become crises.

Safety

Drug Safety Assessment

drug_safety_assessment

Black box warning interpretation, REMS program requirements, risk-benefit analysis frameworks, and contraindication assessment for complex patient scenarios.

Why it matters: 350+ drugs carry black box warnings. Your model must understand REMS obligations and articulate risk-benefit ratios with clinical precision.

Safety

Pregnancy Drug Safety

pregnancy_drug_safety

FDA pregnancy categories, teratogenicity risk assessment, lactation safety evaluation, trimester-specific contraindications, and safer alternative recommendations.

Why it matters: 90% of pregnant women take at least one medication. Teratogenicity assessment requires specialized knowledge that general models consistently get wrong.

Safety

Regulatory Review

regulatory_review

FDA approval pathway analysis (NDA, BLA, 505(b)(2)), labeling requirements, post-market surveillance obligations, and regulatory timeline estimation.

Why it matters: The regulatory pathway determines a drug's market trajectory. Your model needs to navigate NDA vs ANDA vs 505(b)(2) with precision.

Special Populations & Patient Care -- Age-Specific & Education

Patient

Pediatric Dosing

pediatric_dosing

Weight-based dose calculations (mg/kg), age-appropriate formulation selection, developmental pharmacokinetics, and neonatal/infant-specific adjustments.

Why it matters: Children are not small adults. Immature hepatic/renal function, different body composition, and developmental PK changes demand specialized dosing logic.

Patient

Geriatric Pharmacology

geriatric_pharmacology

Beers criteria application, polypharmacy management, age-related PK/PD changes, fall risk assessment from medications, and deprescribing protocols.

Why it matters: Adults 65+ take an average of 5+ medications. The Beers criteria alone flag 30+ drug classes to avoid. Your model must navigate this complexity.

Patient

Patient Counseling

patient_counseling

Medication adherence strategies, patient education content generation, lifestyle-drug interaction guidance, and health literacy-appropriate communication.

Why it matters: 50% of medications are not taken as prescribed. Teaching your model to generate clear, actionable patient guidance directly impacts therapeutic outcomes.

Clinical

Clinical Trial Design

clinical_trial_design

Protocol design methodology, primary/secondary endpoint selection, statistical power calculations, inclusion/exclusion criteria, and adaptive trial frameworks.

Why it matters: 90% of clinical trials fail. Better trial design -- endpoints, power, patient selection -- is the highest-leverage intervention in drug development.

Trained Model

SwarmPharma-35B v1

Sealed February 28, 2026. Trained on RTX PRO 6000 Blackwell. No accuracy loss measured at Q4_K_M on the evaluation set reported under Evaluation Results.

Training Configuration

SEALED

Base Model: Qwen3.5-35B-A3B

Method: bf16 LoRA r=64 alpha=32

Training Pairs: 25,629 pairs

Eval Pairs: 1,348 pairs

Final Train Loss: 0.337

Training Time: 13.56 hours

Steps: 2,402

Step Time: 20.3s/step

Hardware: RTX PRO 6000 Blackwell (96GB)

Sealed Date: 2026-02-28 08:59 UTC
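The figures in the training configuration can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers listed above. The alpha / r scaling is the standard LoRA formulation and is assumed here, since the source does not state which scaling variant was used.

```python
# Figures copied from the Training Configuration table.
NUM_STEPS = 2_402
SECONDS_PER_STEP = 20.3   # rounded to 0.1 s in the source
TRAIN_PAIRS = 25_629
EVAL_PAIRS = 1_348
LORA_R = 64
LORA_ALPHA = 32


def training_hours(steps: int, seconds_per_step: float) -> float:
    """Wall-clock training time implied by step count and step time."""
    return steps * seconds_per_step / 3600


def eval_fraction(train_pairs: int, eval_pairs: int) -> float:
    """Share of the pairs used in this run that were held out for evaluation."""
    return eval_pairs / (train_pairs + eval_pairs)


def lora_scale(alpha: float, r: int) -> float:
    """Scaling factor alpha / r applied to the low-rank update in standard LoRA."""
    return alpha / r


print(round(training_hours(NUM_STEPS, SECONDS_PER_STEP), 2))  # 13.54
print(round(eval_fraction(TRAIN_PAIRS, EVAL_PAIRS), 3))       # 0.05
print(lora_scale(LORA_ALPHA, LORA_R))                         # 0.5
```

The implied 13.54 hours agrees with the reported 13.56 hours to within the rounding of the step time, and the eval set works out to roughly 5% of the pairs used in this run.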

Deployment Artifacts

VERIFIED

Merged Model: 67GB, 14 shards bf16

GGUF: swarmpharma-35b-v1-q4_k_m.gguf

GGUF Size: 20GB

Quantization: Q4_K_M

Inference Speed: 22 tok/s @ -ngl 20

Quantization Loss: Zero

Think Mode: reasoning_content (Qwen3.5)

Trajectory: 5-step in every output

SHA256: 4c522a92e9cd...

Evaluation Results

Tested at both fp16 (Blackwell) and Q4_K_M (3090 Ti via llama-server). Identical accuracy across both precision levels.

DDI Accuracy: 7/7

Off-Domain Rejection: 4/4

JSON Format: 5/5

Quantization Loss: 0

tok/s Q4_K_M: 22

tok/s fp16: 29

Sample Output

What a Pharma Pair Looks Like

Every trajectory-enhanced pair follows the 5-step reasoning chain. This is what your model learns to produce.

swarmpharma-35b-v1 -- drug_interaction_analysis

// User prompt
$ Assess the interaction between warfarin and fluconazole in a 72-year-old patient with atrial fibrillation and recent Candida infection. Current INR: 2.4

// Model output (trajectory-verified)
[IDENTIFY] Patient: 72yo, AF on warfarin (INR 2.4), new fluconazole for Candida. Polypharmacy risk: geriatric + narrow therapeutic index drug + CYP inhibitor.
[MECHANISM] Fluconazole is a potent CYP2C9 inhibitor (Ki = 7.0 uM). Warfarin S-enantiomer (3-5x more potent) is primarily metabolized by CYP2C9. Inhibition reduces S-warfarin clearance by 40-70%, causing INR elevation.
[ASSESS] Severity: MAJOR. Clinical significance: HIGH. Expected INR rise: 1.5-3.0 units over 3-7 days. Bleeding risk: elevated (HAS-BLED score increases). Age >65 compounds risk (reduced hepatic reserve).
[CALCULATE] Recommended warfarin dose reduction: 25-50%. If current dose 5mg/day, reduce to 2.5-3.75mg/day. Monitor INR at day 3, 7, 14 after fluconazole start. Target INR: 2.0-3.0 (AF indication).
[RECOMMEND] 1) Reduce warfarin dose by 30-50% on day 1 of fluconazole. 2) INR check at 72h, then twice weekly. 3) Consider shorter fluconazole course if possible. 4) Alternative: topical antifungal if non-systemic infection. 5) Hold warfarin if INR >4.0.

Delivery

What You Receive

Every pharma data order ships with 5 formats, full provenance, and drug interaction lineage in the DATA_CARD.

Delivery Manifest

SwarmPharma -- 50,000 Pharmacology Pairs

SEALED

5 Delivery Formats -- Train + Eval Split Each

1. ChatML -- swarmpharma_train.chatml.jsonl

OpenAI API, TRL, Unsloth, Axolotl

2. Alpaca -- swarmpharma_train.alpaca.jsonl

LLaMA-Factory, HuggingFace trainers

3. ShareGPT -- swarmpharma_train.sharegpt.jsonl

FastChat, Vicuna, multi-turn trainers

4. OpenAI -- swarmpharma_train.openai.jsonl

Direct upload to gpt-4o fine-tuning

5. Completion -- swarmpharma_train.completion.jsonl

Legacy pipelines, custom training loops
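The formats above differ mainly in how each record is keyed. Below is a minimal sketch of reading a JSONL file and mapping a chat-style record to an Alpaca-style one. The field names (`messages`, `role`, `content`, `instruction`, `input`, `output`) follow the common conventions for these formats and are assumptions here, since the exact schema of the SwarmPharma files is not documented on this page.

```python
import json
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def chat_to_alpaca(record: dict) -> dict:
    """Map a single-turn chat record to Alpaca fields (assumed schema).

    Uses the first user and first assistant message; any system message is dropped.
    """
    messages = record["messages"]
    user = next(m["content"] for m in messages if m["role"] == "user")
    assistant = next(m["content"] for m in messages if m["role"] == "assistant")
    return {"instruction": user, "input": "", "output": assistant}
```

A file object can be passed directly to the reader, for example `with open("swarmpharma_train.chatml.jsonl", encoding="utf-8") as f: records = list(parse_jsonl(f))`.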

Provenance & Verification

DATA_CARD.json

Drug interaction provenance, source textbook references, trajectory verification status, quality gate scores, task type distribution, model lineage.

guarantee.json

Merkle root of every pair. SHA-256 sealed. Tamper-evident provenance chain from source material to final training pair.

README.txt

Quickstart for Python, Unsloth, OpenAI API. Copy, paste, train. Under 5 minutes to first pharma training run.

Per-Pair Metadata

Every pair carries: task_type, trajectory=true, source (Katzung/trajectory), quality gate result, content fingerprint, drug entities.
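For readers unfamiliar with the Merkle root mentioned for guarantee.json, the sketch below shows one common way to compute it over a list of pairs with SHA-256. The canonical JSON serialization, the rule of duplicating the last node on odd levels, and the function names are all assumptions; the actual construction used for guarantee.json is not specified here.

```python
import hashlib
import json


def leaf_hash(pair: dict) -> bytes:
    """SHA-256 of a canonical JSON encoding, so key order does not change the hash."""
    canonical = json.dumps(pair, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(canonical).digest()


def merkle_root(leaves: list[bytes]) -> str:
    """Fold leaf hashes pairwise until a single root remains; return it as hex."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumed convention: duplicate the odd node
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()
```

Changing any field of any pair changes its leaf hash and therefore the root, which is what makes the chain tamper-evident.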

Quality Disclosure

5-Step

Trajectory verified: IDENTIFY, MECHANISM, ASSESS, CALCULATE, RECOMMEND

16 Types

Complete pharmaceutical task coverage from DDI to pediatric dosing

6 Gates

Deterministic: length, trajectory, content, dedup, degeneration, schema

Katzung

Gold standard textbook source. Not synthetic-only. Real pharmacology ground truth.

This is what ships. Every pharma data order. All 16 task types included.
Your team picks the framework -- the data is ready.

Infrastructure

R2 Storage & Build Artifacts

All pharma data is sealed in Cloudflare R2 with SHA-256 verification. Frozen snapshots are immutable.

R2 BUCKET LAYOUT sb-medical

# Trajectory-enhanced pharma pairs
sb-medical/trajectory/
  28,624 pairs · 27 shards · labeled trajectory=true v1

# Core medical + pharma base
sb-medical/
  ~432,196 total pairs (403,572 base + 28,624 trajectory)
  85 specialties

# Build artifacts (swarmrails)
/data2/swarmpharma-35b/frozen-v1/
  adapter + tokenizer + config + logs + SHA256

/data2/swarmpharma-35b/models/
  swarmpharma-35b-v1-merged/   # 67GB, 14 shards bf16
  gguf/swarmpharma-35b-v1-q4_k_m.gguf  # 20GB

# GGUF SHA256
4c522a92e9cda7c67efab2f6af27a6545c4ea174fe2b6bec24f6b9667f144a4b
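
A downloaded GGUF can be checked against the SHA256 above with a short script. This is a minimal sketch that uses only the Python standard library; the local file path passed to `matches_published_hash` is whatever location the file was saved to.

```python
import hashlib

# Digest published above for swarmpharma-35b-v1-q4_k_m.gguf.
EXPECTED_SHA256 = "4c522a92e9cda7c67efab2f6af27a6545c4ea174fe2b6bec24f6b9667f144a4b"


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a 20GB model never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str) -> bool:
    """True if the file's SHA-256 equals the published digest."""
    return sha256_file(path) == EXPECTED_SHA256
```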