Now Raising Series Seed

Invest in the Future of
Edge AI Inference

Thox.ai is building the purpose-built edge AI platform for professionals and enterprises across healthcare, legal, finance, and every industry that demands private, local, and scalable AI.

Pre-production, pre-revenue · Active pre-orders · Ships Q3 2026

$30B+ Edge AI Market

The edge AI hardware market is growing rapidly as inference costs eclipse training and privacy constraints tighten across all industries.

Multi-Industry TAM

Healthcare, legal, finance, research, enterprise: every privacy-sensitive sector needs local AI infrastructure.

Purpose-Built Platform

First inference-optimized edge device for professionals, not repurposed consumer or server hardware.

MagStack Clustering

Patent-pending modular scaling technology enables 7B to 200B+ model deployment for any use case.

Investment Thesis

Thox.ai is building a purpose-built edge AI platform that enables professionals and enterprises across all industries to run modern LLM inference locally with predictable cost, low latency, and strong privacy guarantees—without relying on cloud GPUs or oversized general-purpose hardware.

As inference costs eclipse training costs and privacy constraints tighten across healthcare, legal, finance, and enterprise, compute is shifting from centralized cloud to right-sized edge infrastructure. Thox aims to own this transition for every industry that demands data sovereignty.

The Problem

Cloud Inference: expensive, variable pricing, latency-bound, privacy concerns
GPUs: power-hungry, oversized, operationally complex, expensive
Consumer Silicon: limited scalability, poor clustering, not enterprise-grade

There is no simple, modular, inference-first platform designed for on-prem LLM workloads.

Our Solution

100 TOPS accelerator tuned for inference workloads
Low-power, edge-friendly form factor (25W typical)
MagStack clustering for modular horizontal scaling
TensorRT-LLM-based stack for optimized local inference

Run 7B-32B LLMs locally, scale via clustering, maintain full data sovereignty.

Patent Pending

MagStack™ Clustering Technology

Revolutionary magnetic stacking technology that enables multiple Thox.ai devices to combine RAM and compute power for running larger AI models.

| Stack | RAM | Compute | Max Model |
|-------|-----|---------|-----------|
| 1x (Single Device) | 16GB | 100 TOPS | 32B |
| 2x Stack | 32GB | 200 TOPS | 70B |
| 4x Stack | 64GB | 400 TOPS | 100B+ |
| 8x Stack | 128GB | 800 TOPS | 200B+ |

Magnetic Alignment

N52 neodymium magnets with precision alignment pins ensure perfect stacking

Auto-Discovery

Devices automatically form clusters via mDNS when stacked together
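
For readers who want to picture the discovery flow, here is a minimal client-side sketch using the python-zeroconf library. The `_thox._tcp.local.` service type is our placeholder; Thox.ai has not published its actual service name or record format.

```python
# Minimal mDNS discovery sketch (pip install zeroconf).
# "_thox._tcp.local." is a hypothetical service type, not a published spec.
from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

class ThoxListener(ServiceListener):
    def add_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        info = zc.get_service_info(type_, name)
        if info:
            print(f"Found device {name} at {info.parsed_addresses()}:{info.port}")

    def remove_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        print(f"Device left cluster: {name}")

    def update_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        pass  # required by the ServiceListener interface

zc = Zeroconf()
ServiceBrowser(zc, "_thox._tcp.local.", ThoxListener())
try:
    input("Browsing for stacked devices; press Enter to stop...\n")
finally:
    zc.close()
```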

Distributed Inference

Pipeline parallelism splits model layers across devices efficiently
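
Concretely, pipeline parallelism gives each device a contiguous slice of the model's transformer layers and streams activations across slice boundaries. A schematic sketch, with layer and device counts chosen for illustration rather than taken from Thox's actual scheduler:

```python
# Schematic pipeline-parallel layer split: illustrative only, not the
# actual Thox OS scheduler. Each device owns a contiguous slice of layers;
# at inference time, activations are handed from one slice to the next.
def split_layers(num_layers: int, num_devices: int) -> list[range]:
    """Assign contiguous layer ranges to devices, balancing remainders."""
    base, extra = divmod(num_layers, num_devices)
    ranges, start = [], 0
    for d in range(num_devices):
        count = base + (1 if d < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges

# Example: a 64-layer model across a 4x MagStack.
for device, layers in enumerate(split_layers(64, 4)):
    print(f"device {device}: layers {layers.start}-{layers.stop - 1}")
```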

MagStack™ is a trademark of Thox.ai LLC. Patent Pending.

Proprietary Technology Stack

Full-stack platform with custom OS, optimized AI models, and developer tools

Thox OS™

v1.1

Purpose-built for AI inference at the edge with Hybrid TensorRT-LLM

  • Hybrid Ollama + TensorRT-LLM inference
  • Native 100 TOPS NPU with TensorRT optimization
  • Smart model routing (7B → Ollama, 14B+ → TensorRT; sketched below)
  • 60-100% faster than the Ollama backend alone on 14B/32B models
  • OpenAI-compatible REST API with backend info
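
The routing bullet above is simple enough to sketch. A minimal illustration of size-threshold dispatch; the function name and exact threshold are our assumptions, not Thox OS internals:

```python
# Illustrative size-threshold router matching the bullet above:
# small models go to the Ollama backend, 14B+ to TensorRT-LLM.
# Function and backend names are assumptions, not Thox OS internals.
def pick_backend(model_params_b: float) -> str:
    """Route by parameter count (in billions)."""
    return "ollama" if model_params_b < 14 else "tensorrt-llm"

assert pick_backend(7) == "ollama"
assert pick_backend(32) == "tensorrt-llm"
```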

Thox.ai Coder Models

7B / 14B / 32B variants

Custom fine-tuned models for coding assistance

  • Based on Qwen3-Coder architecture
  • 50+ programming languages
  • 45-72 tok/s on 7B models
  • TensorRT-LLM optimized for 14B+
  • INT4/INT8 quantization support (sizing sketch below)
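
Quantization is also what makes the single-device model ceiling plausible. A back-of-envelope weight-sizing sketch (weights only; KV cache and runtime overhead are extra, so treat the numbers as floors):

```python
# Back-of-envelope weight memory for quantized models (weights only).
def weight_gb(params_billions: float, bits: int) -> float:
    return params_billions * 1e9 * bits / 8 / 1e9  # decimal GB

for params in (7, 14, 32):
    print(f"{params}B params: INT8 ~{weight_gb(params, 8):.1f} GB, "
          f"INT4 ~{weight_gb(params, 4):.1f} GB")
# 32B at INT4 is ~16 GB of weights, consistent with the 32B ceiling on a
# single 16GB device; anything larger needs MagStack RAM pooling.
```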

Developer Platform

APIs, SDK & Integrations

Seamless integration with existing workflows

  • OpenAI-compatible REST API (usage example after this list)
  • VS Code extension
  • Model Context Protocol (MCP)
  • Web dashboard & monitoring
  • CLI tools & Ollama compatibility
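
Because the API is OpenAI-compatible, existing client code should need little more than a base-URL change. A sketch with the official openai Python client; the hostname, port, and model id are placeholders, since Thox.ai has not published endpoint defaults:

```python
# Point the standard OpenAI client at a local Thox device instead of the
# cloud. The host/port and model name below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://thox.local:8000/v1",  # hypothetical device endpoint
    api_key="not-needed-locally",          # local servers typically ignore this
)

resp = client.chat.completions.create(
    model="thox-coder-7b",  # placeholder model id
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(resp.choices[0].message.content)
```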

Cluster-Optimized Model Library

Pre-optimized models for MagStack distributed inference

Cluster Nano (30B) · Recommended
Based on Nemotron-3-Nano
Context: 1M tokens | Memory: 24GB | Min Devices: 2x | Speed: 80-120 tok/s

Long-context model with 1 million token window for processing entire documents, datasets, and complex analyses. MoE architecture with 128 experts.

Cluster Code (32B)
Based on Qwen2.5-Coder
Context: 128K tokens | Memory: 19GB | Min Devices: 4x | Speed: 100-150 tok/s

Elite software engineering model with GPT-4o competitive performance. Supports 92 programming languages with repository-level analysis, code generation, debugging, and collaborative code review.

Cluster Swift (8B)
Based on Ministral-3
Context: 32K tokens | Memory: 6GB | Min Devices: 2x | Speed: 50+ tok/s

Speed-optimized model for high-volume, real-time applications. Handles 30-50+ concurrent users with <100ms latency. Ideal for customer support, call centers, and interactive applications.

Cluster Deep (405B)
Based on Llama 3.1
Context: 128K tokens | Memory: 243GB | Min Devices: 12x | Speed: 120-180 tok/s

Frontier reasoning model with state-of-the-art capabilities. Largest openly available model for research institutions, strategic consulting, financial modeling, legal research, and complex quantitative analysis.

Cluster Secure (72B)
Based on Qwen2.5
Context: 128K tokens | Memory: 47GB | Min Devices: 6x | Speed: 60-90 tok/s

Government/defense-grade model with maximum security. Supports UNCLASSIFIED through SECRET workloads with N+2 redundancy, air-gap deployment, ITAR compliance, and FedRAMP High authorization.

Cluster Scout (109B)
Based on Llama 4 Scout
Context: 10M tokens | Memory: 67GB | Min Devices: 4x | Speed: 60-90 tok/s

Professional multimodal model with vision capabilities and industry-leading 10M token context. Native image understanding for healthcare, legal, and finance.

Cluster Maverick (400B)
Based on Llama 4 Maverick
Context: 1M tokens | Memory: 245GB | Min Devices: 12x | Speed: 30-50 tok/s

Enterprise flagship model with frontier multimodal intelligence. For Fortune 500, hospitals, universities, and government.

Cluster 70B (72B)
Based on Qwen 3
Context: 64K tokens | Memory: 140GB | Min Devices: 2x | Speed: 25-45 tok/s

Enterprise-grade model for complex reasoning, analysis, and professional workflows.

Cluster 100B (110B)
Based on Qwen 3
Context: 96K tokens | Memory: 220GB | Min Devices: 4x | Speed: 15-30 tok/s

Expert-level model for enterprise, research, healthcare, and legal workloads.

Cluster 200B (405B)
Based on Llama 3.3
Context: 128K tokens | Memory: 810GB | Min Devices: 8x | Speed: 10-20 tok/s

Frontier-class model matching cloud AI capabilities for any industry application.

Ideal Customer Profile

Who we are building for and why

Primary ICP

Healthcare Organizations

Hospitals, clinics, research institutions needing HIPAA-compliant AI for patient data analysis

Legal & Financial Services

Law firms, banks, compliance teams requiring confidential document processing

Enterprise & Technology

Companies deploying private AI for R&D, customer service, and internal operations

Secondary ICP (Expanding)

Research & Academia

Universities, research labs, and scientific institutions

Government & Defense

Agencies requiring air-gapped, classified AI environments

Creative & Media

Studios, agencies, and creators needing private content generation

Product Status

Target Shipping: Q3 2026

Completed

  • Product design and architecture
  • Hardware specifications finalized
  • Pre-order system with payment processing
  • Device configurator and pricing
  • Thox OS software direction defined
  • TensorRT-LLM integration architecture
  • Cluster software development
  • SDK and developer tooling

In Progress

  • Hardware prototyping (EVT phase)
  • Benchmarking vs. NVIDIA Jetson and Apple Silicon
  • Manufacturing partner selection

Business Model

Phase 1: Hardware Revenue

| Configuration | Ideal For | Price |
|---------------|-----------|-------|
| Single Device | Entry point for professionals and small teams | $799 |
| 2-Device Bundle | Clinics, law offices, small enterprises | $1,499 |
| 4-Device Bundle | Hospitals and enterprises running 100B+ models | $2,899 |

Phase 2: Software & Services (Critical for Scale)

  • Fleet and cluster management software (SaaS)
  • Industry-specific enterprise support and SLAs
  • HIPAA, GDPR, SOC2 compliance packages
  • Custom model fine-tuning and deployment services

Long-term value accrues by owning the AI workflow across every privacy-sensitive industry.

Market Opportunity

$30B+ · Edge AI Hardware TAM (growing 25%+ annually)
$12B · On-Prem Inference SAM (multi-industry segment)
$600M · Year 5 revenue target (realistic capture with execution)

Market Drivers

Inference cost pressure (70%+ of AI compute spend)
HIPAA, GDPR, and regulatory compliance requirements
Privacy demands across healthcare, legal, finance
Enterprise data sovereignty mandates

Why Not Alternatives?

| Alternative | Limitations | Thox Advantage |
|-------------|-------------|----------------|
| Cloud AI (OpenAI, Anthropic) | Variable cost, latency, privacy concerns, rate limits | Predictable $0/month after purchase, <50ms latency, 100% private |
| Apple Silicon Macs | Poor clustering, consumer-grade thermals, limited deployment control | MagStack clustering to 800 TOPS, enterprise-grade, purpose-built |
| NVIDIA Jetson (raw) | Requires integration work, no turnkey solution, limited software | Complete platform with OS, APIs, and developer tools |
| Server GPUs | Expensive ($10K+), high power (300W+), complex deployment | Edge-optimized, 25W typical, desk-friendly form factor |
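
The "$0/month after purchase" cell invites a break-even estimate. A deliberately rough sketch; the blended cloud rate below is our illustrative assumption, not data from this page:

```python
# Rough break-even: $799 device vs. metered cloud inference.
# The $10 per million tokens blended rate is an illustrative assumption;
# real cloud pricing varies widely by model and provider.
DEVICE_COST = 799.0   # single-device price from the pricing table
CLOUD_RATE = 10.0     # assumed $ per million tokens

breakeven_mtok = DEVICE_COST / CLOUD_RATE
print(f"Break-even at ~{breakeven_mtok:.0f}M tokens, "
      f"i.e. ~{breakeven_mtok / 30:.1f}M tokens/day over a 30-day month")
```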

Defensibility

Current Moats

Hardware-Software Integration

Thox OS + TensorRT-LLM + custom models

MagStack Patent Pending

Unique modular scaling approach

First-Mover in Category

Purpose-built developer inference device

Building Toward

Developer Ecosystem

VS Code extension, CLI tools, API compatibility

Model Library

Thox.ai Coder and optimized model collection

Fleet Management

Enterprise software layer for lock-in

Defensibility increases significantly with software layer adoption.


Interested in Investing?

Fill out the form below and our investor relations team will be in touch.

By submitting this form, you agree to our Terms of Service and Privacy Policy. We will never share your information with third parties.