Performance Optimization

Get the best speed and quality from your Thox.ai device.


Thox.ai is optimized out of the box, but you can fine-tune performance for your specific workflow. This guide covers hardware, network, and software optimizations to get the fastest responses with the best quality.

Expected Performance

With default settings and the thox-coder model, you should expect:

First token latency: 50-100 ms

Throughput: 30-50 tokens per second

End-to-end completion: under 200 ms

Results vary based on model size, context length, and network conditions.
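As a rough sanity check, these figures compose: end-to-end time is approximately first-token latency plus generated tokens divided by throughput. A minimal sketch (the helper function is illustrative, not part of the thox CLI):

```python
def estimate_completion_ms(first_token_ms: float, tokens: int, tokens_per_sec: float) -> float:
    """Rough end-to-end estimate: first-token latency plus decode time."""
    return first_token_ms + (tokens / tokens_per_sec) * 1000.0

# Best case from the figures above: 50 ms first token, 50 tok/s, a short 5-token completion
print(estimate_completion_ms(50, 5, 50))  # 150.0 ms, within the sub-200 ms target
```

Longer completions or larger contexts push the total past that target, which is why the context-optimization tips below matter.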

Model Selection

Use the right model for the task

Use thox-coder-fast for quick completions, thox-coder for balanced quality, and larger models for complex generation.

Consider model quantization

Quantized models (Q4, Q5) are faster and use less memory with minimal quality loss for most tasks.

Pre-load your primary model

Keep your most-used model loaded to avoid cold-start latency. Use thox models switch only when needed.

Network Configuration

Use Ethernet for lowest latency

Wired connections add roughly 5 ms of latency, versus 20-50 ms for Wi-Fi. This difference matters for real-time completions.

Optimize network path

Place the device on the same network segment as your development machine. Avoid routing through VPNs.

Use local DNS

Configure your router to resolve thox.local locally, or use the IP address directly in IDE settings.
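If router-level DNS isn't an option, a static hosts entry on your development machine avoids the lookup as well. A sketch, assuming your device sits at 192.168.1.50 (a placeholder; substitute your device's actual IP):

```
# /etc/hosts (Windows: C:\Windows\System32\drivers\etc\hosts)
# IP below is a placeholder — use your device's actual address
192.168.1.50    thox.local
```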

Thermal Management

Ensure proper ventilation

Leave at least 2 inches (5 cm) of clearance on all sides, don't stack or enclose the device, and place it on a hard, flat surface.

Monitor thermal status

Run thox thermal status to check temperatures. Throttling begins at 80°C sustained.

Consider ambient temperature

Best performance at 0-35°C (32-95°F). In warm environments, a small fan can help.

Context Optimization

Minimize context size

Close unnecessary files in your IDE; a smaller context means faster processing.

Use .thoxignore

Exclude build directories, node_modules, and large files from indexing.

Target specific files

Use @filename references in chat instead of project-wide context when possible.
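The exclusions above can live in a .thoxignore file at your project root. A minimal example, assuming gitignore-style patterns (the exact syntax is an assumption; the paths are the ones called out above):

```
# .thoxignore — keep bulky, low-signal paths out of the index
node_modules/
build/
dist/
*.min.js
*.log
```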

Useful Commands

thox status

View overall system status and resource usage

thox thermal status

Check current temperatures and throttle state

thox models status

See loaded model and memory usage

thox benchmark

Run performance benchmark

thox cache clear

Clear inference cache to free memory

thox service restart

Restart the inference service

Advanced Tuning

Adjust Thread Count

By default, the device uses all available cores. Reduce threads if you need to reserve CPU for other tasks:

thox config set inference.threads 4

Adjust Context Length

Reduce context length for faster processing if you don't need full context:

thox config set inference.context_length 2048

Enable Flash Attention

Faster attention mechanism for compatible models (enabled by default):

thox config set inference.flash_attention true

Benchmarking Your Device

Run the built-in benchmark to measure your device's performance:

thox benchmark --full

This tests inference speed, memory bandwidth, and network latency. Results are compared to expected baselines and saved to /var/log/thox/benchmark.log.
