Advanced Configuration

Fine-tune your Thox.ai device for optimal performance and security.

Model Management

Listing Models

# List all available models

thox models list --all

# List installed models

thox models list --installed

Installing Models

# Install a model

thox models pull thox-coder-large

# Install with specific quantization

thox models pull thox-coder-large:q4_k_m
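The quantization tag largely determines the model's memory footprint. The sketch below estimates weight size from parameter count; the bits-per-weight figures are approximate GGUF averages (an assumption, not a Thox.ai specification):

```python
# Rough memory-footprint estimate for a quantized model.
# Bits-per-weight values are approximate GGUF averages; "q4_k_m"
# averages ~4.5 bits per weight in K-quant schemes (assumption).
BITS_PER_WEIGHT = {"f16": 16.0, "q8_0": 8.5, "q4_k_m": 4.5}

def model_size_gb(params_billions: float, quant: str) -> float:
    """Approximate size of the weights alone, in GB."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * 1e9 * bits / 8 / 1e9

# A 14B model at q4_k_m needs roughly 7.9 GB for weights
# (KV cache and runtime overhead come on top).
print(round(model_size_gb(14, "q4_k_m"), 1))
```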

Model Priority

Configure model loading priority and memory allocation:

# /etc/thox/models.yaml
models:
  thox-coder:
    priority: high
    memory_limit: 8GB
    auto_load: true

  thox-chat:
    priority: medium
    memory_limit: 4GB
    auto_load: false

Custom Models

Import GGUF-compatible models:

# Import from local file

thox models import ./my-model.gguf --name custom-model

# Import from URL

thox models import https://example.com/model.gguf --name custom-model

Hybrid Inference

Thox.ai uses a hybrid inference architecture: Ollama serves smaller models and TensorRT-LLM serves larger ones, yielding a 60-100% throughput improvement for 14B+ models.

Router Configuration

# /opt/thox/configs/router-config.json
{
  "strategy": "model_size",
  "model_size_threshold_b": 10.0,
  "fallback_enabled": true,
  "tensorrt_models": {
    "thox-coder-max": "thox-coder-32b-trt",
    "thox-coder-pro": "thox-coder-14b-trt"
  }
}
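The routing decision implied by this config can be sketched as follows. The field names mirror router-config.json above, but the lookup logic itself is illustrative, not the actual router code:

```python
# Minimal sketch of the "model_size" routing strategy, with explicit
# per-model mappings taking precedence (illustrative only).

def route(model_name: str, size_b: float, config: dict) -> str:
    """Pick a backend for one request."""
    # Explicit model-to-backend mappings always win.
    if model_name in config.get("tensorrt_models", {}):
        return config["tensorrt_models"][model_name]
    # Otherwise route by parameter count.
    if size_b >= config.get("model_size_threshold_b", 10.0):
        return "tensorrt"
    return "ollama"

config = {
    "strategy": "model_size",
    "model_size_threshold_b": 10.0,
    "tensorrt_models": {"thox-coder-max": "thox-coder-32b-trt"},
}
print(route("thox-chat", 7.0, config))        # below threshold
print(route("thox-coder-pro", 14.0, config))  # above threshold
```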

Routing Strategies

model_size (default)

Routes based on parameter count. Models 10B+ use TensorRT-LLM.

explicit

Direct model-to-backend mapping for production control.

performance

Route based on latency requirements for real-time apps.

fallback

TensorRT first, Ollama as backup for reliability.

Check Router Status

curl http://thox.local:8080/router/status | jq

TensorRT-LLM

TensorRT-LLM provides high-performance inference for large models on NVIDIA Jetson hardware with custom attention kernels, paged KV caching, and advanced quantization.

Building TensorRT Engines

# List available models

./build-tensorrt-engines.sh --list

# Build recommended models

./build-tensorrt-engines.sh --all

# Build specific model with INT4 quantization

./build-tensorrt-engines.sh --model thox-coder-14b --quant int4_awq

Performance Comparison

Model                  Ollama (tok/s)  TensorRT-LLM (tok/s)  Improvement
14B (thox-coder-pro)   28              45-56                 +60-100%
32B (thox-coder-max)   12              20-24                 +67-100%
13B (thox-review)      30              48-55                 +60-83%
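As a sanity check, the improvement percentages follow directly from the throughput figures:

```python
# Percentage throughput gain of TensorRT-LLM over Ollama.
def improvement_pct(ollama_tps: float, trt_tps: float) -> int:
    return round((trt_tps / ollama_tps - 1) * 100)

# 14B: 28 tok/s -> 45-56 tok/s gives +61% to +100%
print(improvement_pct(28, 45), improvement_pct(28, 56))  # 61 100
```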

Service Management

# Check TensorRT service status

systemctl status thox-tensorrt

# Restart TensorRT service

sudo systemctl restart thox-tensorrt

# View TensorRT logs

journalctl -u thox-tensorrt -f

Performance Tuning

Ollama Inference Settings

# /etc/thox/inference.yaml
inference:
  # Number of threads for CPU operations
  threads: 4

  # Batch size for inference
  batch_size: 512

  # Context window size
  context_length: 8192

  # GPU memory fraction to use
  gpu_memory_fraction: 0.9

  # Enable flash attention
  flash_attention: true

  # KV cache quantization
  kv_cache_type: q8_0
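The context_length and kv_cache_type settings interact: KV-cache memory grows linearly with context length, and q8_0 roughly halves it versus f16. A back-of-envelope estimate, using hypothetical layer/head dimensions (not Thox.ai model specs) and ~1 byte per element for q8_0:

```python
# Back-of-envelope KV-cache size for a given context window.
# Layer/head dimensions below are example values, not model specs.
def kv_cache_gb(context_length: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: float) -> float:
    # Two tensors (K and V) per layer, one vector per token.
    elems = 2 * n_layers * context_length * n_kv_heads * head_dim
    return elems * bytes_per_elem / 1e9

# 8192-token context, 40 layers, 8 KV heads of dim 128, q8_0 (~1 B/elem)
print(round(kv_cache_gb(8192, 40, 8, 128, 1.0), 2))
```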

TensorRT-LLM Settings

# Environment variables in thox-tensorrt.service
TRT_MAX_BATCH_SIZE=4
TRT_MAX_INPUT_LEN=4096
TRT_MAX_OUTPUT_LEN=2048
TRT_KV_CACHE_FREE_GPU_MEM_FRACTION=0.4
TRT_ENABLE_PAGED_KV_CACHE=1
TRT_ENABLE_CHUNKED_CONTEXT=1

Memory Optimization

Low Memory Mode

Uses a smaller context window and aggressive offloading to minimize memory usage.

thox config set memory_mode low

High Performance Mode

Maximizes throughput by using the full available memory.

thox config set memory_mode high

Benchmarking

# Run performance benchmark

thox benchmark --model thox-coder

# Expected output:

Model: thox-coder

Prompt eval: 125 tokens/s

Generation: 45 tokens/s

Memory usage: 6.2GB

Security Settings

API Authentication

Enable API key authentication for remote access:

# Generate API key

thox auth generate-key --name "my-app"

# Enable authentication

thox config set auth.enabled true

# Use in requests

curl -H "Authorization: Bearer sk-xxx" http://thox.local:8080/v1/models
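The same call can be made from Python using only the standard library. The URL and key below are placeholders from the curl example, and the request is built but not sent:

```python
# Build an authenticated request against the OpenAI-compatible API.
# Base URL and API key are placeholders (hedged example).
import urllib.request

def list_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated /v1/models request."""
    return urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

req = list_models_request("http://thox.local:8080", "sk-xxx")
print(req.get_header("Authorization"))  # Bearer sk-xxx
```

Send it with `urllib.request.urlopen(req)` once the device is reachable.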

Network Access Control

# /etc/thox/security.yaml
network:
  # Interface to bind (0.0.0.0 listens on all interfaces)
  bind_address: "0.0.0.0"

  # Allowed IP ranges
  allowed_ips:
    - "192.168.1.0/24"
    - "10.0.0.0/8"

  # Rate limiting
  rate_limit:
    requests_per_minute: 60
    tokens_per_minute: 100000

TLS/HTTPS

Enable HTTPS for secure connections:

# Generate self-signed certificate

thox tls generate --hostname thox.local

# Or use existing certificate

thox tls import --cert /path/to/cert.pem --key /path/to/key.pem

# Enable TLS

thox config set tls.enabled true

Security Note: When exposing your device to the internet, always enable authentication, use HTTPS, and configure firewall rules to restrict access.

Backup & Restore

Creating Backups

# Full backup (config + models)

thox backup create --output /path/to/backup.tar.gz

# Config only backup

thox backup create --config-only --output /path/to/config-backup.tar.gz

# Automatic scheduled backup

thox backup schedule --daily --keep 7 --output /backups/
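The --keep 7 flag implies a rotation policy: keep the N newest archives and prune the rest. A minimal sketch of that selection logic (filenames and the pure-function shape are illustrative):

```python
# Sketch of "--keep N" retention: given (filename, mtime) pairs,
# return the backups that would be pruned, oldest first.
def select_for_deletion(backups, keep):
    newest_first = sorted(backups, key=lambda b: b[1], reverse=True)
    # Everything past the N newest gets deleted, oldest first.
    return [name for name, _ in reversed(newest_first[keep:])]

daily = [("thox-2026-01-01.tar.gz", 1.0),
         ("thox-2026-01-02.tar.gz", 2.0),
         ("thox-2026-01-03.tar.gz", 3.0)]
print(select_for_deletion(daily, keep=2))  # ['thox-2026-01-01.tar.gz']
```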

Restoring from Backup

# Full restore

thox backup restore /path/to/backup.tar.gz

# Restore config only

thox backup restore --config-only /path/to/backup.tar.gz

What's Included

Configuration

  • Device settings
  • Model configurations
  • Network settings
  • API keys

Data

  • Installed models
  • Custom prompts
  • Chat history (optional)
  • Usage statistics

Factory Reset

# Reset to factory defaults (keeps models)

thox system reset --keep-models

# Full factory reset

thox system reset --full

Warning: Factory reset is irreversible. Always create a backup before performing a reset.

CONFIDENTIAL AND PROPRIETARY INFORMATION

This documentation is provided for informational and operational purposes only. The specifications and technical details herein are subject to change without notice. Thox.ai LLC reserves all rights in the technologies, methods, and implementations described.

Nothing in this documentation shall be construed as granting any license or right to use any patent, trademark, trade secret, or other intellectual property right of Thox.ai LLC, except as expressly provided in a written agreement.

Patent Protection

The MagStack™ magnetic stacking interface technology, including the magnetic alignment system, automatic cluster formation, NFC-based device discovery, and distributed inference me...

Reverse Engineering Prohibited

You may not reverse engineer, disassemble, decompile, decode, or otherwise attempt to derive the source code, algorithms, data structures, or underlying ideas of any Thox.ai hardwa...

Thox.ai™, Thox OS™, MagStack™, and the Thox.ai logo are trademarks or registered trademarks of Thox.ai LLC in the United States and other countries.

NVIDIA, Jetson, TensorRT, and related marks are trademarks of NVIDIA Corporation. Ollama is a trademark of Ollama, Inc. All other trademarks are the property of their respective owners.

© 2026 Thox.ai LLC. All Rights Reserved.