Thox.ai Edge Device
The most powerful edge AI device for professionals. Run any Ollama-compatible model locally with blazing-fast inference. For healthcare, legal, research, development, and beyond.
Available Colors
Midnight Black
Arctic White
Space Gray
Technical Specifications
Enterprise-grade hardware engineered for any AI workload. Built for professionals who demand performance and privacy.
Dimensions
- Height: 6 inches (152.4 mm)
- Width: 4 inches (101.6 mm)
- Depth: 1.2 inches (30.5 mm)
- Weight: 450g (0.99 lbs)
Compute
- CPU: 8-core Arm Cortex-A78AE @ 2.84 GHz
- GPU: NVIDIA Ampere architecture (Jetson Orin NX 16GB module)
- NPU: 100 TOPS AI accelerator
- RAM: 16GB LPDDR5 @ 6400 MT/s
- Storage: 2TB NVMe SSD
AI Performance
- INT8 Inference: 100 TOPS
- FP16 Performance: 25 TFLOPS
- 7B Model (Ollama): 45-72 tokens/s
- 14B Model (TensorRT-LLM): 45-56 tokens/s (+60% vs. Ollama)
- 32B Model (TensorRT-LLM): 20-24 tokens/s (+100% vs. Ollama)
- Max Context: 128K tokens
Connectivity
- Ethernet: 2.5 Gigabit
- Wi-Fi: Wi-Fi 6E (802.11ax)
- Bluetooth: 5.3 LE
- USB: 2x USB-C 3.2, 1x USB-A 3.0
- HDMI: HDMI 2.1 (4K @ 60 Hz)
Power
- Input: 12V DC / USB-C PD 65W
- Typical Load: 25W
- Max Power: 45W
- Idle Power: 5W
MagStack™ Clustering
- Stacking Interface: Magnetic alignment (8x N52 magnets)
- NFC Discovery: ST25DV64K (30mm range)
- Data Connection: 12-pin pogo connector (10 Gbps USB 3.2)
- Power Passthrough: USB-PD up to 100W
- Alignment Accuracy: ±0.5mm, self-centering
- Cluster Formation: ~10 seconds, automatic
- Max Stack Height: 8 devices
- Cluster Interconnect: Wi-Fi 6E / 2.5GbE
- Auto-Discovery: mDNS + NFC handshake
- Combined RAM (8x): Up to 128GB
- Combined Compute (8x): Up to 800 TOPS
MagStack™ Cluster Configurations
Stack multiple devices to combine RAM and compute power. Run larger AI models than ever before.
| Devices | Combined RAM | Total Compute | Max Model Size | Performance |
|---|---|---|---|---|
| 1x | 16GB | 100 TOPS | 32B | 20-72 tok/s |
| 2x | 32GB | 200 TOPS | 70B | 25-45 tok/s |
| 4x | 64GB | 400 TOPS | 100B+ | 15-30 tok/s |
| 8x | 128GB | 800 TOPS | 200B+ | 10-20 tok/s |
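As a rough cross-check of the cluster sizing above, a quantized model's memory footprint can be estimated from its parameter count. This sketch assumes 4-bit quantization (~0.5 bytes/parameter), ~20% overhead for KV cache and activations, and ~14GB usable RAM per 16GB device; these are illustrative assumptions, not vendor figures, so the results are a ballpark and may differ from the table above.

```python
import math

# Assumed headroom per 16GB device after OS and runtime overhead.
USABLE_GB_PER_DEVICE = 14.0

def devices_needed(params_billions: float, bytes_per_param: float = 0.5,
                   overhead: float = 1.2) -> int:
    """Estimate how many stacked devices a quantized model needs.

    bytes_per_param ~0.5 corresponds to 4-bit quantization; overhead
    covers KV cache and activations. Rounds up to whole devices.
    """
    model_gb = params_billions * bytes_per_param * overhead
    return max(1, math.ceil(model_gb / USABLE_GB_PER_DEVICE))
```

For example, a 7B model fits on a single device, while a 405B model at 4-bit needs well over a dozen devices under these assumptions.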
How MagStack™ Works
1. Approach: NFC antennas detect proximity at 30mm and initiate a handshake.
2. Align & Connect: N52 magnets self-align and the pogo pins establish a 10 Gbps data link.
3. Form Cluster: a leader is elected in ~10 seconds and models are auto-partitioned.
4. Run Models: pipeline parallelism splits model layers across devices over the 10 Gbps link.
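The auto-partitioning step can be sketched as follows: for pipeline parallelism, the model's transformer layers are split into contiguous, near-equal blocks, one per device in the stack. This is a minimal illustration of the general technique, not Thox OS's actual partitioner.

```python
# Sketch: assign each device a contiguous block of layer indices,
# balancing block sizes to within one layer.
def partition_layers(n_layers: int, n_devices: int) -> list[range]:
    """Return one contiguous range of layer indices per device."""
    base, extra = divmod(n_layers, n_devices)
    ranges, start = [], 0
    for d in range(n_devices):
        size = base + (1 if d < extra else 0)  # spread the remainder
        ranges.append(range(start, start + size))
        start += size
    return ranges

# e.g. an 80-layer model on a 3-device stack → blocks of 27 / 27 / 26
```

Each device then runs only its own block, streaming activations to the next device over the 10 Gbps link.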
Thox.ai™, Thox OS™, and MagStack™ are trademarks of Thox.ai LLC. MagStack magnetic stacking technology is Patent Pending.
Cluster AI Models
Models designed for distributed inference across MagStack™ clusters. Available on Ollama.
thox-cluster-nano
RECOMMENDED: Our recommended cluster model, featuring a 1 million token context window based on NVIDIA Nemotron-3-Nano. Process entire codebases in a single context, with no chunking or summarization needed.
Cluster Nano
Long-context model with 1 million token window for processing entire documents, datasets, and complex analyses. MoE architecture with 128 experts.
- Parameters: 30B
- Context: 1M tokens
- Min Devices: 2x
- Speed: 80-120 tok/s
- Base Model: Nemotron-3-Nano
Cluster Code
Elite software engineering model with performance competitive with GPT-4o. Supports 92 programming languages with repository-level analysis, code generation, debugging, and collaborative code review.
- Parameters: 32B
- Context: 128K tokens
- Min Devices: 4x
- Speed: 100-150 tok/s
- Base Model: Qwen2.5-Coder
Cluster Swift
Speed-optimized model for high-volume, real-time applications. Handles 30-50+ concurrent users with <100ms latency. Ideal for customer support, call centers, and interactive applications.
- Parameters: 8B
- Context: 32K tokens
- Min Devices: 2x
- Speed: 50+ tok/s
- Base Model: Ministral-3
Cluster Deep
Frontier reasoning model with state-of-the-art capabilities. Largest openly available model for research institutions, strategic consulting, financial modeling, legal research, and complex quantitative analysis.
- Parameters: 405B
- Context: 128K tokens
- Min Devices: 12x
- Speed: 120-180 tok/s
- Base Model: Llama 3.1
Cluster Secure
Government/defense-grade model with maximum security. Supports UNCLASSIFIED through SECRET workloads with N+2 redundancy, air-gap deployment, ITAR compliance, and FedRAMP High authorization.
- Parameters: 72B
- Context: 128K tokens
- Min Devices: 6x
- Speed: 60-90 tok/s
- Base Model: Qwen2.5
Cluster Scout
Professional multimodal model with vision capabilities and industry-leading 10M token context. Native image understanding for healthcare, legal, and finance.
- Parameters: 109B
- Context: 10M tokens
- Min Devices: 4x
- Speed: 60-90 tok/s
- Base Model: Llama 4 Scout
Cluster Maverick
Enterprise flagship model with frontier multimodal intelligence. For Fortune 500, hospitals, universities, and government.
- Parameters: 400B
- Context: 1M tokens
- Min Devices: 12x
- Speed: 30-50 tok/s
- Base Model: Llama 4 Maverick
Cluster 70B
Enterprise-grade model for complex reasoning, analysis, and professional workflows.
- Parameters: 72B
- Context: 64K tokens
- Min Devices: 2x
- Speed: 25-45 tok/s
- Base Model: Qwen 3
Cluster 100B
Expert-level model for enterprise, research, healthcare, and legal workloads.
- Parameters: 110B
- Context: 96K tokens
- Min Devices: 4x
- Speed: 15-30 tok/s
- Base Model: Qwen 3
Cluster 200B
Frontier-class model matching cloud AI capabilities for any industry application.
- Parameters: 405B
- Context: 128K tokens
- Min Devices: 8x
- Speed: 10-20 tok/s
- Base Model: Llama 3.3
Which Model Should I Use?
| Use Case | Recommended Model | Why |
|---|---|---|
| Large document analysis | thox-cluster-nano | 1M context for full documents and datasets |
| Research & complex reasoning | thox-cluster-70b | 72B params for advanced analysis |
| Healthcare, legal, enterprise | thox-cluster-100b | Expert-level professional workloads |
| Frontier-class AI tasks | thox-cluster-200b | Matches cloud AI capabilities locally |
Latest Compatible Models
The newest Ollama models from 2024-2025, optimized for Thox.ai devices. Vision-enabled, multilingual, and professional-grade.
View the complete model catalog and compatibility guide.
Ministral-3 8B
Vision, 32+ languages, edge AI
- Speed: 40-60 tokens/s
- Backend: Ollama
- Context: 256K tokens
Llama 4 Scout
Frontier multimodal, 12 languages
- Speed: 35-50 tokens/s
- Backend: Hybrid
- Context: 10M tokens
- Min Devices: 2x
Qwen 3 14B
Advanced reasoning, vision
- Speed: 30-45 tokens/s
- Backend: TensorRT-LLM
- Context: 128K tokens
Phi-4 Mini (3.8B)
Ultra-fast, multilingual, tools
- Speed: 70-95 tokens/s
- Backend: Ollama
- Context: 128K tokens
Qwen 2.5 Coder 14B
Code specialist, reasoning
- Speed: 28-42 tokens/s
- Backend: TensorRT-LLM
- Context: 128K tokens
Gemma 3 8B
Vision, single GPU optimized
- Speed: 38-55 tokens/s
- Backend: Ollama
- Context: 128K tokens
Latest 2024-2025 models with vision, multilingual (32+ languages), and thinking capabilities. Hybrid Ollama + TensorRT-LLM inference delivers 60-100% faster performance on 14B+ models. Compatible with 100+ Ollama models.
What's in the Box
Everything you need to get started.
- Thox.ai Edge Device
- 65W GaN USB-C Power Adapter
- USB-C to USB-C Cable (1m)
- Quick Start Guide
- Ethernet Cable (CAT6, 1m)
- Mounting Bracket Kit
- Thermal Pad Set
Powered by Thox OS™
A custom operating system purpose-built for AI inference at the edge.
TensorRT-LLM Acceleration
60-100% faster inference on 14B+ models via TensorRT-LLM
Hybrid Smart Routing
Auto-routes to optimal backend: Ollama or TensorRT-LLM
Native Jetson Execution
Runs directly on device with JetPack 6.x integration
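The smart-routing idea can be sketched as a simple rule: since TensorRT-LLM delivers its 60-100% gains on 14B+ models, route larger models there and smaller ones to Ollama. The backend names and the exact 14B threshold in this sketch are illustrative assumptions, not the actual Thox OS router logic.

```python
# Sketch: choose an inference backend from a model's parameter count.
# The 14B cutoff mirrors the "faster on 14B+ models" claim; the real
# router may also weigh quantization, context length, and load.
def pick_backend(params_billions: float) -> str:
    """Return the backend name to route a request to."""
    return "tensorrt-llm" if params_billions >= 14 else "ollama"
```

For example, a 7B model would be served by Ollama, while a 32B model would be routed to TensorRT-LLM.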
Hybrid AI Runtime
- Ollama + TensorRT-LLM backends
- Thox.ai Coder models (7B/14B/32B)
- Smart router with auto-backend
- OpenAI-compatible API
- 60-100% faster on 14B+ models
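Because the runtime exposes an OpenAI-compatible API, any OpenAI client can talk to the device by pointing at its local endpoint. The hostname, port, and path below are assumptions (port 11434 with a `/v1` prefix is Ollama's default OpenAI-compatible endpoint); check your device's dashboard for the real address.

```python
import json

# Assumed local endpoint; replace with your device's actual address.
API_URL = "http://thox.local:11434/v1/chat/completions"

def chat_request(model: str, prompt: str) -> str:
    """Build the JSON body for an OpenAI-style chat completion call."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
```

The returned body can be POSTed to `API_URL` with any HTTP client, or you can skip the manual request entirely and configure an OpenAI SDK with the device URL as its `base_url`.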
Ready for Any Workflow
- Intuitive web dashboard
- API access for any application
- CLI tools for power users
- Automatic updates (OTA)
Thox OS™ is a trademark of Thox.ai LLC. All rights reserved.
Frequently Asked Questions
Got questions? We've got answers.