Now Raising Series Seed

Invest in the Future of
Edge AI Inference

Thox.ai is building the purpose-built edge AI platform for professionals and enterprises across healthcare, legal, finance, and every industry that demands private, local, and scalable AI.

Pre-production, pre-revenue · Active pre-orders · Ships Q3 2026

$30B+ Edge AI Market

The edge AI hardware market is growing rapidly as inference costs eclipse training and privacy constraints tighten across all industries.

Multi-Industry TAM

Healthcare, legal, finance, research, enterprise: every privacy-sensitive sector needs local AI infrastructure.

Purpose-Built Platform

First inference-optimized edge device for professionals, not repurposed consumer or server hardware.

MagStack Clustering

Patent-pending modular scaling technology enables 7B to 200B+ model deployment for any use case.

Investment Thesis

Thox.ai is building a purpose-built edge AI platform that enables professionals and enterprises across all industries to run modern LLM inference locally with predictable cost, low latency, and strong privacy guarantees—without relying on cloud GPUs or oversized general-purpose hardware.

As inference costs eclipse training costs and privacy constraints tighten across healthcare, legal, finance, and enterprise, compute is shifting from centralized cloud to right-sized edge infrastructure. Thox aims to own this transition for every industry that demands data sovereignty.

The Problem

Cloud Inference: expensive, variable pricing, latency-bound, privacy concerns
GPUs: power-hungry, oversized, operationally complex, expensive
Consumer Silicon: limited scalability, poor clustering, not enterprise-grade

There is no simple, modular, inference-first platform designed for on-prem LLM workloads.

Our Solution

100 TOPS accelerator tuned for inference workloads
Low-power, edge-friendly form factor (25W typical)
MagStack clustering for modular horizontal scaling
TensorRT-LLM-based stack for optimized local inference

Run 7B-32B LLMs locally, scale via clustering, maintain full data sovereignty.

Patent Pending

MagStack™ Clustering Technology

Revolutionary magnetic stacking technology that enables multiple Thox.ai devices to combine RAM and compute power for running larger AI models.

| Stack | RAM | Compute | Max Model |
|-------|-----|---------|-----------|
| 1x (Single Device) | 16GB | 100 TOPS | 32B |
| 2x Stack | 32GB | 200 TOPS | 70B |
| 4x Stack | 64GB | 400 TOPS | 100B+ |
| 8x Stack | 128GB | 800 TOPS | 200B+ |

Magnetic Alignment

N52 neodymium magnets with precision alignment pins ensure perfect stacking

Auto-Discovery

Devices automatically form clusters via mDNS when stacked together
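
For readers who want to picture the discovery flow, here is a minimal client-side sketch using the python-zeroconf library. The `_thox._tcp.local.` service type is our placeholder; Thox.ai has not published its actual service name or record format.

```python
# Minimal mDNS discovery sketch (pip install zeroconf).
# "_thox._tcp.local." is a hypothetical service type, not a published spec.
from zeroconf import ServiceBrowser, ServiceListener, Zeroconf

class ThoxListener(ServiceListener):
    def add_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        info = zc.get_service_info(type_, name)
        if info:
            print(f"Found device {name} at {info.parsed_addresses()}:{info.port}")

    def remove_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        print(f"Device left cluster: {name}")

    def update_service(self, zc: Zeroconf, type_: str, name: str) -> None:
        pass  # required by the ServiceListener interface

zc = Zeroconf()
ServiceBrowser(zc, "_thox._tcp.local.", ThoxListener())
try:
    input("Browsing for stacked devices; press Enter to stop...\n")
finally:
    zc.close()
```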

Distributed Inference

Pipeline parallelism splits model layers across devices efficiently
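
Concretely, pipeline parallelism gives each device a contiguous slice of the model's transformer layers and streams activations across slice boundaries. A schematic sketch, with layer and device counts chosen for illustration rather than taken from Thox's actual scheduler:

```python
# Schematic pipeline-parallel layer split: illustrative only, not the
# actual Thox OS scheduler. Each device owns a contiguous slice of layers;
# at inference time, activations are handed from one slice to the next.
def split_layers(num_layers: int, num_devices: int) -> list[range]:
    """Assign contiguous layer ranges to devices, balancing remainders."""
    base, extra = divmod(num_layers, num_devices)
    ranges, start = [], 0
    for d in range(num_devices):
        count = base + (1 if d < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges

# Example: a 64-layer model across a 4x MagStack.
for device, layers in enumerate(split_layers(64, 4)):
    print(f"device {device}: layers {layers.start}-{layers.stop - 1}")
```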

MagStack™ is a trademark of Thox.ai LLC. Patent Pending.

Proprietary Technology Stack

Full-stack platform with custom OS, optimized AI models, and developer tools

Thox OS™

v1.1

Purpose-built for AI inference at the edge with Hybrid TensorRT-LLM

  • Hybrid Ollama + TensorRT-LLM inference
  • Native 100 TOPS NPU with TensorRT optimization
  • Smart model routing (7B → Ollama, 14B+ → TensorRT; sketched below)
  • 60-100% faster than the Ollama backend alone on 14B/32B models
  • OpenAI-compatible REST API with backend info
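
The routing bullet above is simple enough to sketch. A minimal illustration of size-threshold dispatch; the function name and exact threshold are our assumptions, not Thox OS internals:

```python
# Illustrative size-threshold router matching the bullet above:
# small models go to the Ollama backend, 14B+ to TensorRT-LLM.
# Function and backend names are assumptions, not Thox OS internals.
def pick_backend(model_params_b: float) -> str:
    """Route by parameter count (in billions)."""
    return "ollama" if model_params_b < 14 else "tensorrt-llm"

assert pick_backend(7) == "ollama"
assert pick_backend(32) == "tensorrt-llm"
```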

Thox.ai Coder Models

7B / 14B / 32B variants

Custom fine-tuned models for coding assistance

  • Based on Qwen3-Coder architecture
  • 50+ programming languages
  • 45-72 tok/s on 7B models
  • TensorRT-LLM optimized for 14B+
  • INT4/INT8 quantization support (sizing sketch below)
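
Quantization is also what makes the single-device model ceiling plausible. A back-of-envelope weight-sizing sketch (weights only; KV cache and runtime overhead are extra, so treat the numbers as floors):

```python
# Back-of-envelope weight memory for quantized models (weights only).
def weight_gb(params_billions: float, bits: int) -> float:
    return params_billions * 1e9 * bits / 8 / 1e9  # decimal GB

for params in (7, 14, 32):
    print(f"{params}B params: INT8 ~{weight_gb(params, 8):.1f} GB, "
          f"INT4 ~{weight_gb(params, 4):.1f} GB")
# 32B at INT4 is ~16 GB of weights, consistent with the 32B ceiling on a
# single 16GB device; anything larger needs MagStack RAM pooling.
```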

Developer Platform

APIs, SDK & Integrations

Seamless integration with existing workflows

  • OpenAI-compatible REST API (usage example after this list)
  • VS Code extension
  • Model Context Protocol (MCP)
  • Web dashboard & monitoring
  • CLI tools & Ollama compatibility
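
Because the API is OpenAI-compatible, existing client code should need little more than a base-URL change. A sketch with the official openai Python client; the hostname, port, and model id are placeholders, since Thox.ai has not published endpoint defaults:

```python
# Point the standard OpenAI client at a local Thox device instead of the
# cloud. The host/port and model name below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://thox.local:8000/v1",  # hypothetical device endpoint
    api_key="not-needed-locally",          # local servers typically ignore this
)

resp = client.chat.completions.create(
    model="thox-coder-7b",  # placeholder model id
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(resp.choices[0].message.content)
```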

Cluster-Optimized Model Library

Pre-optimized models for MagStack distributed inference

Cluster Nano (30B) · Recommended
Based on Nemotron-3-Nano
Context: 1M tokens | Memory: 24GB | Min Devices: 2x | Speed: 80-120 tok/s

Long-context model with 1 million token window for processing entire documents, datasets, and complex analyses. MoE architecture with 128 experts.

Cluster Code (32B)
Based on Qwen2.5-Coder
Context: 128K tokens | Memory: 19GB | Min Devices: 4x | Speed: 100-150 tok/s

Elite software engineering model with GPT-4o competitive performance. Supports 92 programming languages with repository-level analysis, code generation, debugging, and collaborative code review.

Cluster Swift (8B)
Based on Ministral-3
Context: 32K tokens | Memory: 6GB | Min Devices: 2x | Speed: 50+ tok/s

Speed-optimized model for high-volume, real-time applications. Handles 30-50+ concurrent users with <100ms latency. Ideal for customer support, call centers, and interactive applications.

Cluster Deep (405B)
Based on Llama 3.1
Context: 128K tokens | Memory: 243GB | Min Devices: 12x | Speed: 120-180 tok/s

Frontier reasoning model with state-of-the-art capabilities. Largest openly available model for research institutions, strategic consulting, financial modeling, legal research, and complex quantitative analysis.

Cluster Secure (72B)
Based on Qwen2.5
Context: 128K tokens | Memory: 47GB | Min Devices: 6x | Speed: 60-90 tok/s

Government/defense-grade model with maximum security. Supports UNCLASSIFIED through SECRET workloads with N+2 redundancy, air-gap deployment, ITAR compliance, and FedRAMP High authorization.

Cluster Scout (109B)
Based on Llama 4 Scout
Context: 10M tokens | Memory: 67GB | Min Devices: 4x | Speed: 60-90 tok/s

Professional multimodal model with vision capabilities and industry-leading 10M token context. Native image understanding for healthcare, legal, and finance.

Cluster Maverick (400B)
Based on Llama 4 Maverick
Context: 1M tokens | Memory: 245GB | Min Devices: 12x | Speed: 30-50 tok/s

Enterprise flagship model with frontier multimodal intelligence. For Fortune 500, hospitals, universities, and government.

Cluster 70B (72B)
Based on Qwen 3
Context: 64K tokens | Memory: 140GB | Min Devices: 2x | Speed: 25-45 tok/s

Enterprise-grade model for complex reasoning, analysis, and professional workflows.

Cluster 100B (110B)
Based on Qwen 3
Context: 96K tokens | Memory: 220GB | Min Devices: 4x | Speed: 15-30 tok/s

Expert-level model for enterprise, research, healthcare, and legal workloads.

Cluster 200B (405B)
Based on Llama 3.3
Context: 128K tokens | Memory: 810GB | Min Devices: 8x | Speed: 10-20 tok/s

Frontier-class model matching cloud AI capabilities for any industry application.

Ideal Customer Profile

Who we are building for and why

Primary ICP

Healthcare Organizations

Hospitals, clinics, research institutions needing HIPAA-compliant AI for patient data analysis

Legal & Financial Services

Law firms, banks, compliance teams requiring confidential document processing

Enterprise & Technology

Companies deploying private AI for R&D, customer service, and internal operations

Secondary ICP (Expanding)

Research & Academia

Universities, research labs, and scientific institutions

Government & Defense

Agencies requiring air-gapped, classified AI environments

Creative & Media

Studios, agencies, and creators needing private content generation

Product Status

Target Shipping: Q3 2026

Completed

  • Product design and architecture
  • Hardware specifications finalized
  • Pre-order system with payment processing
  • Device configurator and pricing
  • Thox OS software direction defined
  • TensorRT-LLM integration architecture
  • Cluster software development
  • SDK and developer tooling

In Progress

  • Hardware prototyping (EVT phase)
  • Benchmarking vs. NVIDIA Jetson and Apple Silicon
  • Manufacturing partner selection

Business Model

Phase 1: Hardware Revenue

| Configuration | Ideal For | Price |
|---------------|-----------|-------|
| Single Device | Entry point for professionals and small teams | $799 |
| 2-Device Bundle | Clinics, law offices, small enterprises | $1,499 |
| 4-Device Bundle | Hospitals and enterprises running 100B+ models | $2,899 |

Phase 2: Software & Services (Critical for Scale)

  • Fleet and cluster management software (SaaS)
  • Industry-specific enterprise support and SLAs
  • HIPAA, GDPR, SOC2 compliance packages
  • Custom model fine-tuning and deployment services

Long-term value accrues by owning the AI workflow across every privacy-sensitive industry.

Market Opportunity

$30B+ · Edge AI Hardware TAM (growing 25%+ annually)
$12B · On-Prem Inference SAM (multi-industry segment)
$600M · Year 5 revenue target (realistic capture with execution)

Market Drivers

Inference cost pressure (70%+ of AI compute spend)
HIPAA, GDPR, and regulatory compliance requirements
Privacy demands across healthcare, legal, finance
Enterprise data sovereignty mandates

Why Not Alternatives?

| Alternative | Limitations | Thox Advantage |
|-------------|-------------|----------------|
| Cloud AI (OpenAI, Anthropic) | Variable cost, latency, privacy concerns, rate limits | Predictable $0/month after purchase, <50ms latency, 100% private |
| Apple Silicon Macs | Poor clustering, consumer-grade thermals, limited deployment control | MagStack clustering to 800 TOPS, enterprise-grade, purpose-built |
| NVIDIA Jetson (raw) | Requires integration work, no turnkey solution, limited software | Complete platform with OS, APIs, and developer tools |
| Server GPUs | Expensive ($10K+), high power (300W+), complex deployment | Edge-optimized, 25W typical, desk-friendly form factor |
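
The "$0/month after purchase" cell invites a break-even estimate. A deliberately rough sketch; the blended cloud rate below is our illustrative assumption, not data from this page:

```python
# Rough break-even: $799 device vs. metered cloud inference.
# The $10 per million tokens blended rate is an illustrative assumption;
# real cloud pricing varies widely by model and provider.
DEVICE_COST = 799.0   # single-device price from the pricing table
CLOUD_RATE = 10.0     # assumed $ per million tokens

breakeven_mtok = DEVICE_COST / CLOUD_RATE
print(f"Break-even at ~{breakeven_mtok:.0f}M tokens, "
      f"i.e. ~{breakeven_mtok / 30:.1f}M tokens/day over a 30-day month")
```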

Defensibility

Current Moats

Hardware-Software Integration

Thox OS + TensorRT-LLM + custom models

MagStack Patent Pending

Unique modular scaling approach

First-Mover in Category

Purpose-built developer inference device

Building Toward

Developer Ecosystem

VS Code extension, CLI tools, API compatibility

Model Library

Thox.ai Coder and optimized model collection

Fleet Management

Enterprise software layer for lock-in

Defensibility increases significantly with software layer adoption.


Interested in Investing?

Fill out the form below and our investor relations team will be in touch.

By submitting this form, you agree to our Terms of Service and Privacy Policy. We will never share your information with third parties.