Features & Usage
Learn how to make the most of your Thox.ai device's capabilities.
Popular Guides
AI-powered code completion
Get intelligent suggestions as you type.
How it works
Thox.ai analyzes your code context in real-time to provide relevant completions. It considers your current file, open files, and project structure to suggest accurate code.
Triggering completions
Completions appear automatically as you type. Press Tab to accept, Escape to dismiss. In VS Code, you can also use Ctrl+Space to manually trigger suggestions.
Multi-line completions
For longer suggestions, Thox.ai can complete entire functions or code blocks. These appear with a preview showing what will be inserted.
Language support
Best results with Python, JavaScript, TypeScript, Go, Rust, Java, and C++. Other languages are supported but may have reduced accuracy.
Customization
Adjust completion behavior in settings: delay before suggestions, maximum suggestion length, and languages to enable/disable.
Choosing the right model
Select optimal models for your use case.
thox-coder (default)
Optimized for code completion and generation. 7B parameters, balanced speed and quality. Best for most development workflows.
thox-coder-fast
3B parameter variant for faster responses. Ideal for quick completions and lower latency requirements. Slightly reduced quality on complex tasks.
codestral
Mistral's code-focused model. Excellent for code review and refactoring suggestions. 22B parameters, requires more memory.
llama3-8b
General-purpose model suited to documentation and explanations. It also handles code, but the code-optimized models perform better for pure coding tasks.
Switching models
Change the active model via the web interface (/admin/models) or the CLI: "thox models switch [name]". Changes take effect immediately for new requests.
Interactive chat and Q&A
Ask questions and get explanations.
Accessing chat
Use the web interface at /chat or the chat panel in the IDE extensions. Send questions about code, ask for explanations, or request help with debugging.
Context-aware responses
The chat understands your codebase. Reference files with @filename and it will include them in context. Ask about specific functions or classes.
Code generation
Request new code: "Write a function that validates email addresses" and receive complete, ready-to-use code blocks.
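As an illustration, a prompt like the one above might return a block along these lines (the exact output will vary between requests and models; this validator is deliberately loose):

```python
import re

# A permissive syntactic check: one "@", a non-empty local part,
# and a domain containing a dot. Not a full RFC 5322 validator.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_valid_email(address: str) -> bool:
    """Return True if `address` looks like an email address."""
    return bool(_EMAIL_RE.match(address))

print(is_valid_email("dev@example.com"))  # True
print(is_valid_email("not-an-email"))     # False
```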
Conversation history
Chat maintains context within a session. Follow up on previous responses without repeating context. Start a new session to reset.
System prompts
Customize behavior with system prompts in settings. Define coding style preferences, language preferences, or specialized instructions.
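Over the API, the same customization can be expressed per request. Assuming the standard OpenAI chat convention of a leading "system" role message (the prompt wording and model name below are examples only), a request body might look like:

```python
# Sketch of a chat request body carrying a system prompt, assuming
# the OpenAI-style message format. The instructions themselves are
# illustrative -- put your own style preferences here.
body = {
    "model": "thox-coder",
    "messages": [
        {"role": "system",
         "content": "Prefer Python with type hints and Google-style docstrings."},
        {"role": "user",
         "content": "Write a function that parses ISO 8601 dates."},
    ],
}
print(body["messages"][0]["role"])  # system
```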
Context and project understanding
How Thox.ai understands your codebase.
Automatic indexing
On first connection, Thox.ai indexes your project structure. This enables smart completions that reference other files and understand project layout.
Context window
The model can process thousands of tokens of context. It automatically selects relevant code from open files, imports, and related files.
Project configuration
Add a .thoxignore file to exclude files from indexing (similar to .gitignore). Exclude build directories, node_modules, and large binary files.
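A minimal .thoxignore might look like the following, assuming it accepts the same comment and pattern syntax as .gitignore (adjust the entries for your project):

```
# Example .thoxignore -- gitignore-style patterns
node_modules/
dist/
build/
.venv/
*.bin
*.onnx
```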
Re-indexing
Trigger a manual re-index after major project changes with "thox index refresh" or via the web interface at /admin/index.
API and integrations
Integrate Thox.ai with your tools.
OpenAI-compatible API
Thox.ai exposes an OpenAI-compatible API at /v1. Use existing OpenAI client libraries by pointing them to your Thox.ai device.
Endpoints
/v1/completions for text completion, /v1/chat/completions for chat, /v1/embeddings for vector embeddings. Full API reference at /docs/api-reference.
Authentication
Generate API keys in /admin/api-keys. Pass via Authorization header: "Bearer your-api-key". Keys can have scopes and rate limits.
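As a sketch using only the Python standard library (the hostname thox.local and the key are placeholders; substitute your device address and a key from /admin/api-keys), a chat request with the Bearer header is assembled like this, with the network call itself left commented out:

```python
import json
import urllib.request

def build_chat_request(host, api_key, model, messages):
    """Assemble an OpenAI-style chat completion request (not sent)."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"http://{host}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # key from /admin/api-keys
        },
    )

req = build_chat_request(
    "thox.local", "your-api-key", "thox-coder",
    [{"role": "user", "content": "Write a hello-world in Go."}],
)
# On a real device, send it and read the reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any OpenAI client library can be pointed at the same base URL (/v1) with the same key instead of building requests by hand.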
Rate limits
Defaults are 60 requests/minute and 100k tokens/hour. Adjust per-key limits in the admin interface. Requests from the local network can be exempted from limits.
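If you expect to hit these limits, a client-side backoff on HTTP 429 responses is a common pattern. The helper below is hypothetical (not part of Thox.ai) and just computes an exponential retry schedule:

```python
def retry_delays(attempts: int, base: float = 1.0, cap: float = 30.0):
    """Exponential backoff schedule in seconds: 1, 2, 4, ... capped at `cap`.

    On an HTTP 429 response, sleep for the next delay and retry.
    """
    return [min(base * (2 ** i), cap) for i in range(attempts)]

print(retry_delays(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```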
Webhooks
Configure webhooks in /admin/webhooks to receive notifications on completion events, errors, or model changes.
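A minimal receiver can be built with the standard library alone. The payload shape assumed here (a JSON body with a "type" field) is a guess; check the deliveries your device actually sends:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    """Accept webhook POSTs and acknowledge them with a 200."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        # Payload field names are assumptions -- inspect real deliveries.
        print("received event:", event.get("type", "unknown"))
        self.send_response(200)
        self.end_headers()

# To listen on your network (this call blocks forever), uncomment:
# HTTPServer(("0.0.0.0", 8080), WebhookHandler).serve_forever()
```

Point the webhook URL in /admin/webhooks at wherever this listener runs.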
Getting the best performance
Optimize speed and quality of responses.
Use Ethernet
Wired connections provide the lowest latency. Wi-Fi adds 20-50ms per request. For real-time completions, Ethernet is strongly recommended.
Model sizing
Smaller models respond faster. Use thox-coder-fast for quick completions during active coding, larger models for complex generation or review.
Batch requests
When generating multiple completions, use streaming or batch endpoints. This is more efficient than individual requests.
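Streamed responses from OpenAI-compatible endpoints typically arrive as server-sent-event lines ("data: {...}" chunks, ending with "data: [DONE]"); confirm the exact chunk shape against /docs/api-reference. A small parser sketch for that convention:

```python
import json

def stream_tokens(lines):
    """Yield content deltas from OpenAI-style SSE chat chunks."""
    for line in lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Sample chunks in the assumed format:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(stream_tokens(sample)))  # Hello
```

Feeding the raw response lines of a request made with "stream": true through this generator prints tokens as they arrive instead of waiting for the full completion.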
Reduce context
Smaller context windows process faster. Close unneeded files, and use specific file references instead of project-wide context where possible.
Thermal management
Keep the device cool for sustained performance. Heavy workloads cause thermal throttling. Allow cool-down periods during intensive sessions.