MagStack™ Cluster Models

Enterprise AI models optimized for distributed inference across MagStack clusters

Thox.ai provides specialized AI models designed for distributed inference across MagStack™ device clusters. These models leverage tensor parallelism and MoE (Mixture of Experts) architectures to run frontier-class AI on your desk with complete privacy.
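The MoE efficiency claim can be made concrete with a quick back-of-envelope calculation. The sketch below uses the common ~2 FLOPs per parameter per generated token rule of thumb for transformer decoding; the function and constants are illustrative, not part of any Thox.ai API.

```python
# Rough per-token decode compute: dense model vs. MoE model.
# Uses the ~2 FLOPs per parameter per token rule of thumb; real costs
# depend on architecture, batching, and quantization.

def decode_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2.0 * active_params

dense_30b = decode_flops_per_token(30e9)   # all 30B parameters fire
moe_30b = decode_flops_per_token(3.5e9)    # only the 3.5B active experts fire

print(f"dense 30B: {dense_30b:.1e} FLOPs/token")
print(f"MoE 30B:   {moe_30b:.1e} FLOPs/token")
print(f"compute reduction: {dense_30b / moe_30b:.1f}x")
```

This is why a 30B-parameter MoE model like thox-cluster-nano can decode at speeds closer to a ~3.5B dense model while retaining the capacity of the full parameter count.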

  • Healthcare: HIPAA Compliant
  • Legal: Attorney-Client Privilege
  • Enterprise: SOC2 & GDPR
  • Research: FERPA Compliant

Recommended for Most Users

Start with thox-cluster-nano: it offers a 1-million-token context window in just 24GB, making it ideal for processing entire documents, codebases, and datasets on a 2-device cluster.
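As a sketch of how a model's memory footprint maps to a minimum cluster size, the helper below simply divides the footprint across devices. The 12GB usable-per-device figure is an assumption for illustration, not a published MagStack spec, and real sizing also depends on KV-cache and runtime overhead.

```python
import math

# Hypothetical sizing helper: minimum devices needed to hold a model's
# weights in aggregate cluster memory. The usable-memory-per-device
# figure is an assumed example, not a MagStack specification.

USABLE_GB_PER_DEVICE = 12  # assumption for illustration

def min_devices(model_memory_gb: float) -> int:
    return math.ceil(model_memory_gb / USABLE_GB_PER_DEVICE)

print(min_devices(24))  # thox-cluster-nano's 24GB footprint -> 2 devices
```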

Core Cluster Models

Cluster Nano

Recommended · Entry

Thox-ai/thox-cluster-nano

View on Ollama

Long-context model with 1 million token window for processing entire documents, datasets, and complex analyses. MoE architecture with 128 experts.

Parameters: 30B total / 3.5B active (MoE)
Context: 1M tokens
Memory: 24GB
Min Devices: 2x
Speed: 80-120 tok/s
Base Model: Nemotron-3-Nano

Benchmarks:

MMLU-Pro: 78.3% · GPQA: 73.0% · AIME25: 99.2% · LiveCodeBench: 68.3%

Key Features:

  • 1M token context - process entire documents
  • MoE architecture (3.5B active of 30B)
  • 5-15 concurrent professional users
  • HIPAA, GDPR, SOC2, FERPA compliant
HIPAA · GDPR · SOC2 · FERPA

Cluster Scout

Professional

Thox-ai/thox-cluster-scout

View on Ollama

Professional multimodal model with vision capabilities and industry-leading 10M token context. Native image understanding for healthcare, legal, and finance.

Parameters: 109B total / 17B active (MoE)
Context: 10M tokens
Memory: 67GB
Min Devices: 4x
Speed: 60-90 tok/s
Base Model: Llama 4 Scout

Benchmarks:

MMLU-Pro: 74.3% · GPQA-Diamond: 57.2% · MMMU: 69.4% · ChartQA: 88.8% · DocVQA: 94.4%

Key Features:

  • 10M token context - industry-leading
  • Native vision & image understanding
  • Multilingual (12 languages)
  • Medical imaging, chart analysis, OCR
HIPAA · GDPR · SOC2 · FERPA · CCPA

Cluster Maverick

Enterprise

Thox-ai/thox-cluster-maverick

View on Ollama

Enterprise flagship model with frontier multimodal intelligence. For Fortune 500, hospitals, universities, and government.

Parameters: 400B total / 17B active (MoE)
Context: 1M tokens
Memory: 245GB
Min Devices: 12x
Speed: 30-50 tok/s
Base Model: Llama 4 Maverick

Benchmarks:

MMLU-Pro: 80.5% · GPQA-Diamond: 69.8% · MMMU: 73.4% · LiveCodeBench: 43.4% · MGSM: 92.3%

Key Features:

  • Frontier-class multimodal AI
  • 200+ concurrent enterprise users
  • All major compliance frameworks
  • Fortune 500, hospitals, government
HIPAA · GDPR · SOC2 · FERPA · CCPA · FISMA · ITAR

Specialized Cluster Variants

Purpose-built models optimized for specific professional workflows: software engineering, high-speed operations, frontier reasoning, and government/defense security requirements.

Cluster Code

Professional

Thox-ai/thox-cluster-code

View on Ollama

Elite software engineering model with GPT-4o competitive performance. Supports 92 programming languages with repository-level analysis, code generation, debugging, and collaborative code review.

Size: 32B
Context: 128K tokens
Devices: 4+

Key Features:

  • 92 programming languages support
  • GPT-4o competitive on code generation
  • Repository-level analysis (128K context)
  • 73.7 Aider score - elite code repair

Benchmarks:

Aider: 73.7 · HumanEval+: 92.7% · MBPP+: 88.1% · LiveCodeBench: 73.5%
HIPAA · GDPR · SOC2 · FERPA · FISMA

Cluster Swift

Professional

Thox-ai/thox-cluster-swift

View on Ollama

Speed-optimized model for high-volume, real-time applications. Handles 30-50+ concurrent users with <100ms latency. Ideal for customer support, call centers, and interactive applications.

Size: 8B
Context: 32K tokens
Devices: 2+

Key Features:

  • 50+ tokens/sec ultra-fast responses
  • <100ms first token latency
  • 30-50+ concurrent users
  • Real-time chat & customer support
HIPAA · GDPR · SOC2 · FERPA

Cluster Deep

Enterprise

Thox-ai/thox-cluster-deep

View on Ollama

Frontier reasoning model with state-of-the-art capabilities. Largest openly available model for research institutions, strategic consulting, financial modeling, legal research, and complex quantitative analysis.

Size: 405B
Context: 128K tokens
Devices: 12+

Key Features:

  • 405B parameters - largest open model
  • Frontier-class reasoning capabilities
  • Research-grade deep analysis
  • Strategic consulting & financial modeling

Benchmarks:

MMLU-Pro: 84.7% · GPQA-Diamond: 59.1% · MathVista: 67.8% · IFEval: 88.6%
HIPAA · GDPR · SOC2 · FERPA · FISMA

Cluster Secure

Enterprise

Thox-ai/thox-cluster-secure

View on Ollama

Government/defense-grade model with maximum security. Supports UNCLASSIFIED through SECRET workloads with N+2 redundancy, air-gap deployment, ITAR compliance, and FedRAMP High authorization.

Size: 72B
Context: 128K tokens
Devices: 6+

Key Features:

  • ITAR, FedRAMP High, FISMA compliant
  • Air-gapped deployment ready
  • UNCLASSIFIED to SECRET workloads
  • N+2 redundancy for mission assurance
HIPAA · GDPR · SOC2 · FERPA · FISMA · ITAR · FedRAMP · CMMC

Detailed Use Cases & ROI

Cluster Code - Software Engineering Teams

Use Cases:

  • Software Teams (25 engineers): Repository analysis, code reviews, architecture design, testing
  • Startups (15-20 engineers): Rapid prototyping, technical debt reduction, performance optimization
  • DevOps (20 engineers): IaC, CI/CD pipelines, monitoring, incident response
  • QA Engineering (15 engineers): Test automation, coverage analysis, security testing

Engineering Team ROI:

  • Development Velocity: 40-50% faster feature development
  • Code Quality: 80%+ test coverage, reduced vulnerabilities
  • Onboarding: 60% faster new engineer ramp-up
  • Cost Savings: $50,000 - $150,000/year vs cloud alternatives

3-Year TCO (6 devices, 30 engineers):

$29,500 (~$985/user)

vs GitHub Copilot: $54,000 - $67,500

Pays for itself in 12-18 months

IDE Integration:

VS Code · IntelliJ / JetBrains · Vim / Neovim · Emacs

Cluster Swift - High-Speed Operations

Use Cases:

  • Customer Support (40-50 agents): Real-time query assistance, knowledge base search
  • Call Centers (30-40 agents): Live transcription, agent assistance, sentiment analysis
  • Interactive Apps (50+ users): Real-time chatbots, collaboration, dynamic content
  • Healthcare Admissions (20-30 staff): Patient intake, scheduling, HIPAA-compliant support

Enterprise ROI:

  • Customer Support: 50% faster response times, higher satisfaction
  • Call Center: $80,000/year operational efficiency gains
  • Healthcare: 40% faster patient processing
  • Cost Savings: $40,000 - $130,000/year vs cloud APIs

3-Year TCO (3 devices, 50 users):

$13,000 (~$260/user)

vs Cloud AI APIs: $54,000 - $144,000

Pays for itself in 3-9 months
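The payback window quoted above follows from simple arithmetic: treat the cloud's 3-year spend as a monthly run rate and ask how many months of avoided spend cover the local TCO. A sketch using the figures from this section:

```python
# Payback arithmetic behind the "3-9 months" claim, using the figures
# quoted in this section (local 3-year TCO vs. cloud API spend).

LOCAL_TCO = 13_000                             # 3-year cost, 3-device cluster
CLOUD_3YR_LOW, CLOUD_3YR_HIGH = 54_000, 144_000  # quoted cloud range

def payback_months(local_cost: float, cloud_3yr_cost: float) -> float:
    monthly_cloud = cloud_3yr_cost / 36  # cloud spend per month
    return local_cost / monthly_cloud

fast = payback_months(LOCAL_TCO, CLOUD_3YR_HIGH)  # heavy cloud usage
slow = payback_months(LOCAL_TCO, CLOUD_3YR_LOW)   # light cloud usage
print(f"payback: {fast:.1f} to {slow:.1f} months")
```

The same arithmetic applies to the other TCO comparisons in this section; only the inputs change.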

Real-Time Performance:

First Token: <100ms
Throughput: 50+ tok/s
Response Time: <1 second
Concurrent: 50+ users
Uptime: 99.9%
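One way to sanity-check the concurrency figures above is to divide aggregate decode throughput by sustained per-user demand. The 1 tok/s per-user figure below is an assumed average for reading-speed chat (users spend most of a session reading and typing), not a measured MagStack value.

```python
# Back-of-envelope concurrency check: how many chat users a given
# aggregate decode throughput can serve. Per-user demand is an
# illustrative assumption, not a measured figure.

AGGREGATE_TOK_S = 50    # cluster-wide throughput from the table above
PER_USER_TOK_S = 1.0    # assumed sustained demand per chat user

concurrent_users = AGGREGATE_TOK_S / PER_USER_TOK_S
print(int(concurrent_users))  # -> 50
```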

Cluster Deep - Frontier Reasoning

Use Cases:

  • R1 Universities (30-40 faculty): Grant proposals, literature reviews, collaboration
  • Consulting Firms (20-30 consultants): Strategic analysis, market research, forecasting
  • Financial Analysis (25-35 analysts): Quantitative modeling, risk assessment, compliance
  • Legal Research (20-30 attorneys): Case law analysis, litigation strategy, compliance

Research Institution ROI:

  • Grant Success: 4-5x improvement in grant funding
  • Publications: 3-5x increase in peer-reviewed output
  • Research Speed: 60% faster literature review and analysis
  • Cost Savings: $100,000 - $300,000/year vs cloud frontiers

3-Year TCO (14 devices, 40 researchers):

$79,000 (~$1,975/user)

vs Claude Opus/GPT-4: $108,000 - $288,000

Pays for itself in 9-24 months

Frontier Capabilities:

Advanced Chain-of-Thought · Literature Review · Hypothesis Generation · Mathematical Modeling · Strategic Planning

Cluster Secure - Government/Defense

Use Cases:

  • DOD (20-30 analysts): Intelligence analysis, mission planning, threat assessment
  • Intelligence Community (15-25 analysts): All-source analysis, counterintel, OSINT
  • Defense Contractors (20-30 personnel): ITAR-controlled analysis, export control
  • Federal Law Enforcement (15-25 agents): Case investigation, counterterrorism

Government/Defense ROI:

  • Mission Effectiveness: $200,000+/year improvement
  • Compliance: Avoid classification spills and violations
  • Security: Zero external data exposure
  • Availability: 99.99% mission assurance with N+2 redundancy
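The N+2 redundancy figure can be sketched with a standard binomial availability model: the cluster is up whenever at most two devices are down at once. The 99% per-device availability used below is an assumed input for illustration, not a MagStack measurement.

```python
from math import comb

# Availability under N+2 redundancy: the cluster survives as long as
# no more than `spares` devices are simultaneously down. Per-device
# availability is an assumed figure for illustration.

def cluster_availability(n_devices: int, spares: int, p_device: float) -> float:
    """P(at most `spares` of `n_devices` are down), devices independent."""
    q = 1.0 - p_device
    return sum(
        comb(n_devices, k) * (q ** k) * (p_device ** (n_devices - k))
        for k in range(spares + 1)
    )

# 10-device cluster needing 8 devices (N=8 plus 2 spares), 99% per device:
print(f"{cluster_availability(10, 2, 0.99):.6f}")
```

With independent 99%-available devices, tolerating two simultaneous failures pushes the cluster into roughly four-nines territory, consistent with the 99.99% figure above.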

3-Year TCO (10 devices, 30 cleared personnel):

$124,000 (~$4,133/user)

Cloud AI: NOT AUTHORIZED for SECRET

Only authorized solution for classified AI

Security Features:

CAC/PIV Auth · Air-Gap Ready · TEMPEST · Zero Trust · SIEM Integration · Insider Threat Detection

Compliance & Certification:

ITAR · FedRAMP High · FISMA · CMMC Level 3-5 · NIST 800-53 · NIST 800-171 · RMF · DISA STIGs

Additional Cluster Models

Cluster 70B

Professional

Thox-ai/thox-cluster-70b

Enterprise-grade model for complex reasoning, analysis, and professional workflows.

Size: 72B
Context: 64K tokens
Devices: 2x

Cluster 100B

Professional

Thox-ai/thox-cluster-100b

Expert-level model for enterprise, research, healthcare, and legal workloads.

Size: 110B
Context: 96K tokens
Devices: 4x

Cluster 200B

Enterprise

Thox-ai/thox-cluster-200b

Frontier-class model matching cloud AI capabilities for any industry application.

Size: 405B
Context: 128K tokens
Devices: 8x

Cluster Coordinator

Utility

Lightweight cluster orchestration and management model.

Size: 4B
Context: 16K tokens
Memory: 3GB
Latency: <100ms

Quick Start

# Pull and run thox-cluster-nano (recommended)
$ ollama pull Thox-ai/thox-cluster-nano
$ ollama run Thox-ai/thox-cluster-nano

# For software engineering teams (4+ devices)
$ ollama pull Thox-ai/thox-cluster-code
$ ollama run Thox-ai/thox-cluster-code

# For high-volume/real-time apps (2+ devices)
$ ollama pull Thox-ai/thox-cluster-swift
$ ollama run Thox-ai/thox-cluster-swift

# For frontier reasoning research (12+ devices)
$ ollama pull Thox-ai/thox-cluster-deep
$ ollama run Thox-ai/thox-cluster-deep

# For government/defense (6+ devices)
$ ollama pull Thox-ai/thox-cluster-secure
$ ollama run Thox-ai/thox-cluster-secure

# For vision capabilities (4+ devices)
$ ollama pull Thox-ai/thox-cluster-scout
$ ollama run Thox-ai/thox-cluster-scout

# Enterprise flagship (12+ devices)
$ ollama pull Thox-ai/thox-cluster-maverick
$ ollama run Thox-ai/thox-cluster-maverick

# Use clusterctl for automatic model selection
$ clusterctl recommend
# Output: Recommended: thox-cluster-nano (2 devices, 32GB RAM)

Model Selection Guide

| Use Case | Recommended | Devices | Why |
|---|---|---|---|
| Full document/codebase analysis | thox-cluster-nano | 2+ | 1M token context handles entire repos |
| Software engineering teams | thox-cluster-code | 4+ | 92 languages, GPT-4o competitive coding |
| High-volume customer support | thox-cluster-swift | 2+ | 50+ concurrent users, <100ms latency |
| Advanced research & analysis | thox-cluster-deep | 12+ | 405B frontier reasoning model |
| Government/defense classified | thox-cluster-secure | 6+ | ITAR, FedRAMP, air-gap ready |
| Medical imaging & chart analysis | thox-cluster-scout | 4+ | Native vision, 10M context, HIPAA |
| Fortune 500 enterprise AI | thox-cluster-maverick | 12+ | Frontier-class, 200+ concurrent users |
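clusterctl recommend handles model selection automatically; purely as an illustration, the selection guide can be encoded as a small lookup like the one below. The use-case keys and the recommend helper are hypothetical, not part of clusterctl.

```python
# Illustrative selector mirroring the model selection guide.
# `clusterctl recommend` is the real tool; this just encodes the mapping.

RECOMMENDATIONS = {
    "long-context analysis": ("thox-cluster-nano", 2),
    "software engineering": ("thox-cluster-code", 4),
    "customer support": ("thox-cluster-swift", 2),
    "frontier research": ("thox-cluster-deep", 12),
    "classified workloads": ("thox-cluster-secure", 6),
    "vision / document AI": ("thox-cluster-scout", 4),
    "enterprise flagship": ("thox-cluster-maverick", 12),
}

def recommend(use_case: str, devices_available: int):
    model, min_devices = RECOMMENDATIONS[use_case]
    if devices_available < min_devices:
        return None  # cluster too small for this workload
    return model

print(recommend("software engineering", 6))  # -> thox-cluster-code
print(recommend("frontier research", 6))     # -> None (needs 12+)
```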

Complete Privacy & Compliance

All cluster models run entirely on your local MagStack configuration. No data is transmitted to external servers. Your code, documents, medical records, and conversations never leave your devices.

HIPAA · GDPR · SOC2 · FERPA · CCPA

CONFIDENTIAL AND PROPRIETARY INFORMATION

This documentation is provided for informational and operational purposes only. The specifications and technical details herein are subject to change without notice. Thox.ai LLC reserves all rights in the technologies, methods, and implementations described.

Nothing in this documentation shall be construed as granting any license or right to use any patent, trademark, trade secret, or other intellectual property right of Thox.ai LLC, except as expressly provided in a written agreement.

Patent Protection

The MagStack™ magnetic stacking interface technology, including the magnetic alignment system, automatic cluster formation, NFC-based device discovery, and distributed inference me...

Reverse Engineering Prohibited

You may not reverse engineer, disassemble, decompile, decode, or otherwise attempt to derive the source code, algorithms, data structures, or underlying ideas of any Thox.ai hardwa...

Thox.ai™, Thox OS™, MagStack™, and the Thox.ai logo are trademarks or registered trademarks of Thox.ai LLC in the United States and other countries.

NVIDIA, Jetson, TensorRT, and related marks are trademarks of NVIDIA Corporation. Ollama is a trademark of Ollama, Inc. All other trademarks are the property of their respective owners.

© 2026 Thox.ai LLC. All Rights Reserved.