Advanced Configuration
Fine-tune your Thox.ai device for optimal performance and security.
Model Management
Listing Models
# List all available models
thox models list --all
# List installed models
thox models list --installed
Installing Models
# Install a model
thox models pull thox-coder-large
# Install with specific quantization
thox models pull thox-coder-large:q4_k_m
Model Priority
Configure model loading priority and memory allocation:
# /etc/thox/models.yaml
models:
  thox-coder:
    priority: high
    memory_limit: 8GB
    auto_load: true
  thox-chat:
    priority: medium
    memory_limit: 4GB
    auto_load: false
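After editing models.yaml, it can be worth confirming that the file still parses before the runtime picks it up. A minimal sketch, assuming python3 and PyYAML are available on the device:
# Validate the edited file before reloading (requires python3 with PyYAML)
python3 -c "import yaml; yaml.safe_load(open('/etc/thox/models.yaml')); print('models.yaml OK')"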
Custom Models
Import GGUF-compatible models:
# Import from local file
thox models import ./my-model.gguf --name custom-model
# Import from URL
thox models import https://example.com/model.gguf --name custom-model
Hybrid Inference
Thox.ai uses a hybrid inference architecture: Ollama serves smaller models while TensorRT-LLM serves larger ones, delivering a 60-100% throughput improvement for models of 14B parameters and above.
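Routing happens behind the device's single API endpoint, so clients do not need to know which backend serves a given model. As an illustration, and assuming a chat completions path sits alongside the /v1/models endpoint shown under Security Settings, a request to a 14B model would reach TensorRT-LLM under the default strategy:
# Request against the unified endpoint; the router picks the backend (illustrative request)
curl http://thox.local:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "thox-coder-pro", "messages": [{"role": "user", "content": "Write a quicksort in Python."}]}'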
Router Configuration
# /opt/thox/configs/router-config.json
{
  "strategy": "model_size",
  "model_size_threshold_b": 10.0,
  "fallback_enabled": true,
  "tensorrt_models": {
    "thox-coder-max": "thox-coder-32b-trt",
    "thox-coder-pro": "thox-coder-14b-trt"
  }
}
Routing Strategies
model_size (default)
Routes based on parameter count; models of 10B parameters or more use TensorRT-LLM.
explicit
Direct model-to-backend mapping for production control (see the sketch after this list).
performance
Routes based on latency requirements for real-time applications.
fallback
Tries TensorRT-LLM first and falls back to Ollama for reliability.
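As an illustration of the explicit strategy, the sketch below reuses the keys from the router-config.json shown above; the assumption that models listed under tensorrt_models go to TensorRT-LLM while everything else stays on Ollama is an assumption, not a documented schema:
# /opt/thox/configs/router-config.json (illustrative sketch only)
{
  "strategy": "explicit",
  "fallback_enabled": true,
  "tensorrt_models": {
    "thox-coder-max": "thox-coder-32b-trt",
    "thox-coder-pro": "thox-coder-14b-trt"
  }
}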
Check Router Status
curl http://thox.local:8080/router/status | jq
TensorRT-LLM
TensorRT-LLM provides high-performance inference for large models on NVIDIA Jetson hardware with custom attention kernels, paged KV caching, and advanced quantization.
Building TensorRT Engines
# List available models
./build-tensorrt-engines.sh --list
# Build recommended models
./build-tensorrt-engines.sh --all
# Build specific model with INT4 quantization
./build-tensorrt-engines.sh --model thox-coder-14b --quant int4_awq
Performance Comparison
| Model | Ollama (tok/s) | TensorRT-LLM (tok/s) | Improvement |
|---|---|---|---|
| 14B (thox-coder-pro) | 28 | 45-56 | +60-100% |
| 32B (thox-coder-max) | 12 | 20-24 | +67-100% |
| 13B (thox-review) | 30 | 48-55 | +60-83% |
Service Management
# Check TensorRT service status
systemctl status thox-tensorrt
# Restart TensorRT service
sudo systemctl restart thox-tensorrt
# View TensorRT logs
journalctl -u thox-tensorrt -f
Performance Tuning
Ollama Inference Settings
# /etc/thox/inference.yaml
inference:
  # Number of threads for CPU operations
  threads: 4
  # Batch size for inference
  batch_size: 512
  # Context window size
  context_length: 8192
  # GPU memory fraction to use
  gpu_memory_fraction: 0.9
  # Enable flash attention
  flash_attention: true
  # KV cache quantization
  kv_cache_type: q8_0
TensorRT-LLM Settings
# Environment variables in thox-tensorrt.service
TRT_MAX_BATCH_SIZE=4
TRT_MAX_INPUT_LEN=4096
TRT_MAX_OUTPUT_LEN=2048
TRT_KV_CACHE_FREE_GPU_MEM_FRACTION=0.4
TRT_ENABLE_PAGED_KV_CACHE=1
TRT_ENABLE_CHUNKED_CONTEXT=1
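These values can be changed without editing the unit file directly by using a standard systemd drop-in override; the variable names come from the list above, while the values shown here are purely illustrative:
# Open a drop-in override for the TensorRT service
sudo systemctl edit thox-tensorrt
# In the editor that opens, add the overrides, e.g.:
[Service]
Environment=TRT_MAX_BATCH_SIZE=8
Environment=TRT_KV_CACHE_FREE_GPU_MEM_FRACTION=0.5
# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart thox-tensorrt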
Memory Optimization
Low Memory Mode
Uses a smaller context window and more aggressive offloading to reduce memory pressure.
thox config set memory_mode low
High Performance Mode
Maximizes speed and uses the full available memory.
thox config set memory_mode high
Benchmarking
# Run performance benchmark
thox benchmark --model thox-coder
# Expected output:
Model: thox-coder
Prompt eval: 125 tokens/s
Generation: 45 tokens/s
Memory usage: 6.2GB
Security Settings
API Authentication
Enable API key authentication for remote access:
# Generate API key
thox auth generate-key --name "my-app"
# Enable authentication
thox config set auth.enabled true
# Use in requests
curl -H "Authorization: Bearer sk-xxx" http://thox.local:8080/v1/models
Network Access Control
# /etc/thox/security.yaml
network:
  # Address to bind to (0.0.0.0 listens on all interfaces)
  bind_address: "0.0.0.0"
  # Allowed IP ranges
  allowed_ips:
    - "192.168.1.0/24"
    - "10.0.0.0/8"
  # Rate limiting
  rate_limit:
    requests_per_minute: 60
    tokens_per_minute: 100000
TLS/HTTPS
Enable HTTPS for secure connections:
# Generate self-signed certificate
thox tls generate --hostname thox.local
# Or use existing certificate
thox tls import --cert /path/to/cert.pem --key /path/to/key.pem
# Enable TLS
thox config set tls.enabled true
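Once TLS is enabled, the endpoint can be checked from another machine on the network. The sketch below assumes the API continues to listen on port 8080 over HTTPS; with a self-signed certificate, pass the certificate explicitly or use -k for a quick, unverified check:
# Quick check against the HTTPS endpoint (-k skips certificate verification)
curl -vk https://thox.local:8080/v1/models
# Inspect the certificate the device presents
openssl s_client -connect thox.local:8080 -servername thox.local </dev/null 2>/dev/null | openssl x509 -noout -subject -dates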
Security Note: When exposing your device to the internet, always enable authentication, use HTTPS, and configure firewall rules to restrict access.
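One way to apply the firewall advice above is a UFW rule set that admits the API only from the local subnet; this assumes ufw is installed, the API listens on port 8080, and the LAN range matches the security.yaml example:
# Allow the API from the local subnet only, deny it elsewhere
sudo ufw allow from 192.168.1.0/24 to any port 8080 proto tcp
sudo ufw deny 8080/tcp
sudo ufw enable
sudo ufw status verbose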
Backup & Restore
Creating Backups
# Full backup (config + models)
thox backup create --output /path/to/backup.tar.gz
# Config only backup
thox backup create --config-only --output /path/to/config-backup.tar.gz
# Automatic scheduled backup
thox backup schedule --daily --keep 7 --output /backups/
Restoring from Backup
# Full restore
thox backup restore /path/to/backup.tar.gz
# Restore config only
thox backup restore --config-only /path/to/backup.tar.gz
What's Included
Configuration
- Device settings
- Model configurations
- Network settings
- API keys
Data
- Installed models
- Custom prompts
- Chat history (optional)
- Usage statistics
Factory Reset
# Reset to factory defaults (keeps models)
thox system reset --keep-models
# Full factory reset
thox system reset --full
Warning: Factory reset is irreversible. Always create a backup before performing a reset.
CONFIDENTIAL AND PROPRIETARY INFORMATION
This documentation is provided for informational and operational purposes only. The specifications and technical details herein are subject to change without notice. Thox.ai LLC reserves all rights in the technologies, methods, and implementations described.
Nothing in this documentation shall be construed as granting any license or right to use any patent, trademark, trade secret, or other intellectual property right of Thox.ai LLC, except as expressly provided in a written agreement.
Patent Protection
The MagStack™ magnetic stacking interface technology, including the magnetic alignment system, automatic cluster formation, NFC-based device discovery, and distributed inference me...
Reverse Engineering Prohibited
You may not reverse engineer, disassemble, decompile, decode, or otherwise attempt to derive the source code, algorithms, data structures, or underlying ideas of any Thox.ai hardwa...
Thox.ai™, Thox OS™, MagStack™, and the Thox.ai logo are trademarks or registered trademarks of Thox.ai LLC in the United States and other countries.
NVIDIA, Jetson, TensorRT, and related marks are trademarks of NVIDIA Corporation. Ollama is a trademark of Ollama, Inc. All other trademarks are the property of their respective owners.
© 2026 Thox.ai LLC. All Rights Reserved.