Features & Usage

Learn how to make the most of your Thox.ai device's capabilities.

Popular Guides

AI-powered code completion

Get intelligent suggestions as you type.

How it works

Thox.ai analyzes your code context in real time to provide relevant completions. It considers your current file, open files, and project structure to suggest accurate code.

Triggering completions

Completions appear automatically as you type. Press Tab to accept, Escape to dismiss. In VS Code, you can also use Ctrl+Space to manually trigger suggestions.

Multi-line completions

For longer suggestions, Thox.ai can complete entire functions or code blocks. These appear with a preview showing what will be inserted.

Language support

Best results with Python, JavaScript, TypeScript, Go, Rust, Java, and C++. Other languages are supported but may have reduced accuracy.

Customization

Adjust completion behavior in settings: delay before suggestions, maximum suggestion length, and languages to enable/disable.

Choosing the right model

Select optimal models for your use case.

thox-coder (default)

Optimized for code completion and generation. 7B parameters, balanced speed and quality. Best for most development workflows.

thox-coder-fast

3B parameter variant for faster responses. Ideal for quick completions and lower latency requirements. Slightly reduced quality on complex tasks.

codestral

Mistral's code-focused model. Excellent for code review and refactoring suggestions. 22B parameters, requires more memory.

llama3-8b

General-purpose model good for documentation and explanations. It also handles code, but the code-optimized models perform better for pure coding tasks.

Switching models

Change the active model via the web interface (/admin/models) or the CLI: "thox models switch [name]". Changes take effect immediately for new requests.

Interactive chat and Q&A

Ask questions and get explanations.

Accessing chat

Use the web interface at /chat or the chat panel in the IDE extensions. Send questions about code, ask for explanations, or request help with debugging.

Context-aware responses

The chat understands your codebase. Reference files with @filename and it will include them in context. Ask about specific functions or classes.

Code generation

Request new code, for example "Write a function that validates email addresses", and receive complete, ready-to-use code blocks.
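
Output varies by model and context; an illustrative result for that prompt might look like the following Python (the pattern is a pragmatic check, not a full RFC validator):

    import re

    # Pragmatic email pattern: local part, "@", domain with a TLD of 2+ letters.
    EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

    def is_valid_email(address: str) -> bool:
        """Return True if the address looks like a valid email."""
        return bool(EMAIL_PATTERN.match(address))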

Conversation history

Chat maintains context within a session. Follow up on previous responses without repeating context. Start a new session to reset.

System prompts

Customize behavior with system prompts in settings. Define coding style preferences, language preferences, or specialized instructions.
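
Because the device also exposes an OpenAI-compatible chat endpoint (see API and integrations below), a system prompt can be supplied per request as well. A minimal sketch, with placeholder hostname and key:

    from openai import OpenAI

    client = OpenAI(base_url="http://thox.local/v1", api_key="your-api-key")

    # The system message plays the same role as a settings-defined system prompt.
    response = client.chat.completions.create(
        model="thox-coder",
        messages=[
            {"role": "system", "content": "Prefer TypeScript and keep answers under 100 words."},
            {"role": "user", "content": "How do I debounce an input handler?"},
        ],
    )
    print(response.choices[0].message.content)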

Context and project understanding

How Thox.ai understands your codebase.

Automatic indexing

On first connection, Thox.ai indexes your project structure. This enables smart completions that reference other files and understand project layout.

Context window

The model can process thousands of tokens of context. It automatically selects relevant code from open files, imports, and related files.

Project configuration

Add a .thoxignore file to exclude files from indexing (similar to .gitignore). Exclude build directories, node_modules, and large binary files.
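
For example, a minimal .thoxignore, assuming gitignore-style glob patterns as the comparison suggests:

    # Build output and dependencies
    build/
    dist/
    node_modules/

    # Large binary artifacts
    *.bin
    *.zip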

Re-indexing

Trigger manual re-index after major project changes: "thox index refresh" or via web interface at /admin/index.

API and integrations

Integrate Thox.ai with your tools.

OpenAI-compatible API

Thox.ai exposes an OpenAI-compatible API at /v1. Use existing OpenAI client libraries by pointing them to your Thox.ai device.
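
For instance, with the official openai Python package; the hostname thox.local stands in for your device's address:

    from openai import OpenAI

    # Point the standard OpenAI client at the Thox.ai device instead of api.openai.com.
    client = OpenAI(
        base_url="http://thox.local/v1",  # replace with your device's address
        api_key="your-api-key",           # generated in /admin/api-keys
    )

    response = client.chat.completions.create(
        model="thox-coder",
        messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    )
    print(response.choices[0].message.content)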

Endpoints

/v1/completions for text completion, /v1/chat/completions for chat, /v1/embeddings for vector embeddings. Full API reference at /docs/api-reference.
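
As a sketch, calling the embeddings endpoint with the same client setup; the model name thox-embed is a placeholder, so check your device's model list for what it actually serves:

    from openai import OpenAI

    client = OpenAI(base_url="http://thox.local/v1", api_key="your-api-key")

    # /v1/embeddings returns one vector per input string.
    result = client.embeddings.create(
        model="thox-embed",  # placeholder embedding model name
        input=["def add(a, b): return a + b"],
    )
    print(len(result.data[0].embedding))  # vector dimensionality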

Authentication

Generate API keys in /admin/api-keys and pass them via the Authorization header: "Bearer your-api-key". Keys can carry scopes and rate limits.
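
The same header works over plain HTTP, for example with Python's requests (hostname and key are placeholders):

    import requests

    resp = requests.post(
        "http://thox.local/v1/chat/completions",  # placeholder device address
        headers={"Authorization": "Bearer your-api-key"},
        json={
            "model": "thox-coder",
            "messages": [{"role": "user", "content": "Hello"}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])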

Rate limits

Defaults are 60 requests/minute and 100k tokens/hour. Adjust per-key limits in the admin interface. Requests from the local network can be exempted from limits.
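
On the client side, it is worth backing off when a limit is hit; a sketch assuming the server signals exhausted limits with HTTP 429, as OpenAI-compatible APIs conventionally do:

    import time

    import requests

    def post_with_retry(url: str, payload: dict, api_key: str, retries: int = 3) -> dict:
        """POST with exponential backoff on assumed HTTP 429 rate-limit responses."""
        for attempt in range(retries):
            resp = requests.post(
                url,
                headers={"Authorization": f"Bearer {api_key}"},
                json=payload,
                timeout=30,
            )
            if resp.status_code != 429:
                resp.raise_for_status()
                return resp.json()
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s between attempts
        raise RuntimeError("rate limit still exceeded after retries")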

Webhooks

Configure webhooks in /admin/webhooks to receive notifications on completion events, errors, or model changes.
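
The event payload schema is not documented here, so a receiver can start by logging whatever JSON arrives. A minimal sketch using Flask, with a hypothetical /thox-events route:

    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/thox-events", methods=["POST"])
    def thox_events():
        # Payload shape is device-defined; log the raw JSON for inspection.
        event = request.get_json(silent=True)
        print("received event:", event)
        return "", 204  # acknowledge with no content

    if __name__ == "__main__":
        app.run(port=8080)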

Getting the best performance

Optimize speed and quality of responses.

Use Ethernet

Wired connections provide the lowest latency. Wi-Fi adds 20–50 ms per request. For real-time completions, Ethernet is strongly recommended.

Model sizing

Smaller models respond faster. Use thox-coder-fast for quick completions during active coding, larger models for complex generation or review.

Batch requests

When generating multiple completions, use streaming or batch endpoints. This is more efficient than individual requests.
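
A streaming sketch against the OpenAI-compatible chat endpoint (placeholder hostname and key); tokens print as they arrive instead of waiting for the full response:

    from openai import OpenAI

    client = OpenAI(base_url="http://thox.local/v1", api_key="your-api-key")

    # stream=True yields chunks as the model generates them.
    stream = client.chat.completions.create(
        model="thox-coder-fast",
        messages=[{"role": "user", "content": "Write a haiku about latency."}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()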

Reduce context

Smaller context windows process faster. Close unneeded files and use specific file references instead of project-wide context when possible.

Thermal management

Keep the device cool for sustained performance. Heavy workloads cause thermal throttling. Allow cool-down periods during intensive sessions.

Explore More