AI Runtime
Learn about the AI runtime, APIs, models, and integrations available on your Thox.ai device.
OpenAI-Compatible API
API Compatibility
Thox.ai provides a fully OpenAI-compatible API endpoint, so existing OpenAI SDKs and tools work with your local device. Point your API calls at http://thox.local:8080/v1 instead of https://api.openai.com/v1.
Supported Endpoints
The following endpoints are fully supported: /v1/chat/completions, /v1/completions, /v1/models, and /v1/embeddings. The two completion endpoints support streaming responses for real-time output.
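For example, a minimal streaming request with the OpenAI Python SDK (the prompt and the placeholder api_key are illustrative; local access ignores the key):

from openai import OpenAI

client = OpenAI(base_url="http://thox.local:8080/v1", api_key="not-needed")

# Request a streamed chat completion and print tokens as they arrive.
stream = client.chat.completions.create(
    model="thox-coder",
    messages=[{"role": "user", "content": "Explain Python list comprehensions."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)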
Authentication
By default, no API key is required for local access. For remote access, generate an API key using: thox api-key generate. Use the key in your Authorization header as "Bearer YOUR_API_KEY".
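When you pass the key to the OpenAI SDK, it sets the Bearer header for you; a minimal sketch, where the hostname is a placeholder for however you reach the device remotely:

from openai import OpenAI

# The SDK sends the key as "Authorization: Bearer YOUR_API_KEY" on every request.
client = OpenAI(
    base_url="http://your-thox-host:8080/v1",  # placeholder remote address
    api_key="YOUR_API_KEY",                    # from `thox api-key generate`
)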
Example Usage
Using the OpenAI Python SDK: client = OpenAI(base_url="http://thox.local:8080/v1", api_key="optional"). Then use client.chat.completions.create() as you normally would with OpenAI.
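Putting it together, a complete non-streaming request (the prompt is illustrative):

from openai import OpenAI

client = OpenAI(base_url="http://thox.local:8080/v1", api_key="optional")

# Send a single chat request and print the model's reply.
response = client.chat.completions.create(
    model="thox-coder",
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(response.choices[0].message.content)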
Rate Limits
Local requests have no rate limits. For remote access, the default limit is 100 requests per minute, configurable via thox config set api.rate_limit <value>.
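If the device reports an exceeded limit the way OpenAI does, requests over the limit return HTTP 429 and the Python SDK raises RateLimitError; both of those details are assumptions here, as is the placeholder remote hostname. A minimal backoff sketch:

import time
from openai import OpenAI, RateLimitError

client = OpenAI(base_url="http://your-thox-host:8080/v1", api_key="YOUR_API_KEY")

def create_with_retry(messages, retries=3):
    # Retry with exponential backoff when the rate limit is hit.
    for attempt in range(retries):
        try:
            return client.chat.completions.create(model="thox-coder", messages=messages)
        except RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("rate limit retries exhausted")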
Coder Models
Pre-installed Models
Your Thox.ai device ships with thox-coder pre-installed, a 7B-parameter model optimized for code completion, explanation, and refactoring. It supports 50+ programming languages.
Available Models
Additional models include: thox-coder-large (13B, more capable), thox-coder-fast (3B, faster responses), and specialized models for Python, JavaScript, Rust, and Go.
Downloading Models
List available models: thox models list --remote. Download a model: thox models pull thox-coder-large. Check download progress: thox models status.
Switching Models
Set the default model: thox config set default_model thox-coder-large. Or specify per-request using the model parameter in your API calls.
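For example, overriding the configured default for a single request:

from openai import OpenAI

client = OpenAI(base_url="http://thox.local:8080/v1", api_key="optional")

# The model parameter overrides the configured default for this request only.
response = client.chat.completions.create(
    model="thox-coder-large",
    messages=[{"role": "user", "content": "Refactor this function for readability: ..."}],
)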
Model Performance
thox-coder-fast: ~50 tokens/sec, best for quick completions.
thox-coder: ~30 tokens/sec, balanced performance.
thox-coder-large: ~15 tokens/sec, highest quality.
Custom Fine-tuning
Fine-tune models on your codebase: thox finetune --model thox-coder --data ./my-code --epochs 3. This creates a personalized model that understands your coding patterns.
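The identifier the fine-tuned model is registered under isn't specified here; thox-coder-custom below is a hypothetical placeholder, so query the /v1/models endpoint for the name your device actually assigned:

from openai import OpenAI

client = OpenAI(base_url="http://thox.local:8080/v1", api_key="optional")

# "thox-coder-custom" is a hypothetical name for the fine-tuned model;
# list /v1/models to find the identifier your device registered.
response = client.chat.completions.create(
    model="thox-coder-custom",
    messages=[{"role": "user", "content": "Add a config loader in our usual style."}],
)
print(response.choices[0].message.content)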
MCP (Model Context Protocol) Support
What is MCP?
Model Context Protocol (MCP) is an open standard for connecting AI models to external tools and data sources. Thox.ai natively supports MCP, enabling rich integrations with your development workflow.
Built-in MCP Servers
Pre-configured MCP servers include: filesystem (read/write local files), git (repository operations), shell (execute commands), and browser (web fetching). Enable them via thox mcp enable <server>.
Connecting MCP Clients
Your Thox.ai device acts as an MCP server. Connect from Claude Desktop, VS Code, or any MCP-compatible client using the endpoint: http://thox.local:8080/mcp.
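A minimal client sketch using the official mcp Python package; the SSE transport is an assumption, since the transport behind /mcp isn't specified above:

import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    # Connect over SSE (transport assumed) and list the tools the device exposes.
    async with sse_client("http://thox.local:8080/mcp") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())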
Custom MCP Servers
Add custom MCP servers by placing configuration in ~/.thox/mcp/servers.json. Thox.ai will automatically discover and load compatible servers on startup.
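The schema of servers.json isn't documented here; the sketch below follows the mcpServers convention used by other MCP clients, and every field name should be treated as an assumption until verified against your device:

{
  "mcpServers": {
    "my-database": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
    }
  }
}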
Security Considerations
MCP servers can execute code and access files. Review permissions carefully: thox mcp permissions list. Restrict access: thox mcp permissions set filesystem --read-only.
Debugging MCP
Enable MCP debug logging: thox config set mcp.debug true. View logs: thox logs --filter mcp. Test server connectivity: thox mcp test <server-name>.