# Self-Hosted Models
The LIT Platform allows you to run large language models (LLMs) directly on your own infrastructure, ensuring data privacy, reducing latency, and eliminating dependency on external API services.
## Benefits of Self-Hosted Models
- Data Privacy: All data stays within your environment
- Cost Control: No per-token charges or subscription fees
- Customization: Fine-tune models for your specific use cases
- Offline Operation: Run models without internet connectivity
- Predictable Performance: Consistent response times
## Supported Models
The LIT Platform supports a wide range of open-source language models including:
- Llama 3 (8B, 70B)
- Qwen 2 (7B, 72B)
- Mistral (7B)
- Gemma (7B, 27B)
- Phi-3 (mini, small)
- DeepSeek (7B, 67B)
- Falcon (7B, 40B)
- And many more...
## Hardware Requirements
The hardware requirements depend on the model size:
| Model Size | Minimum RAM | Recommended GPU | Approximate Speed |
|---|---|---|---|
| 7-8B | 16GB | 8GB VRAM | 15-30 tokens/sec |
| 13-14B | 24GB | 16GB VRAM | 10-20 tokens/sec |
| 30-40B | 64GB | 24GB VRAM | 5-10 tokens/sec |
| 65-70B | 128GB | 48GB VRAM | 3-8 tokens/sec |
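As a rough sanity check before downloading, you can estimate a model's weight memory from its parameter count and the bytes used per parameter at a given quantization level. This is a minimal sketch; the bytes-per-parameter values and the 1.2x overhead factor are assumptions, and the figures ignore activation memory and the KV cache.

```python
# Rough estimate of weight memory for a model at a given quantization level.
# Assumption: the overhead factor (~1.2x) and bytes-per-parameter figures are
# ballpark values; actual usage also depends on context length and KV cache.

BYTES_PER_PARAM = {
    "fp16": 2.0,   # half precision
    "q8": 1.0,     # 8-bit quantization
    "q5": 0.625,   # 5-bit quantization
    "q4": 0.5,     # 4-bit quantization
}

def estimate_weight_memory_gb(n_params_billion: float, quant: str = "q4",
                              overhead: float = 1.2) -> float:
    """Approximate GB needed to hold the model weights alone."""
    bytes_total = n_params_billion * 1e9 * BYTES_PER_PARAM[quant] * overhead
    return bytes_total / (1024 ** 3)

if __name__ == "__main__":
    for size in (8, 14, 40, 70):
        print(f"{size}B @ q4: ~{estimate_weight_memory_gb(size, 'q4'):.1f} GB")
```

For example, a 70B model at 4-bit quantization works out to roughly 39 GB of weights, which is consistent with the 48GB VRAM recommendation in the table above once runtime overhead is included.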
## Setting Up a Model
1. Navigate to the Models section in the LIT interface
2. Click "Add New Model"
3. Select from available model options or provide a custom download URL (see the download sketch after these steps)
4. Choose hardware configuration (CPU/GPU, quantization level)
5. Click "Download and Configure"
6. Wait for the model to download and initialize
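If you take the custom-URL route in step 3, one common source of quantized model files is the Hugging Face Hub. The snippet below is a sketch using the `huggingface_hub` library; the repository and file names are placeholders, not models shipped with the LIT Platform.

```python
# Sketch: fetch a quantized model file from the Hugging Face Hub so you can
# point the LIT Platform at a local path or copy it to your model directory.
# The repo_id and filename below are placeholders — substitute the model you
# actually want.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="your-org/your-model-GGUF",   # placeholder repository
    filename="your-model.Q4_K_M.gguf",    # placeholder 4-bit GGUF file
)
print(f"Model downloaded to: {local_path}")
```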
## Quantization Options
To run larger models on less powerful hardware, the LIT Platform offers various quantization options:
- GGUF Format: 4-bit, 5-bit, and 8-bit quantization
- GPTQ Format: 4-bit and 8-bit quantization with optional groupsize settings
- AWQ Format: Advanced weight quantization for better quality/performance balance
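For reference, this is roughly what loading a 4-bit GGUF file looks like with the open-source `llama-cpp-python` library; this is only an illustration of what the formats above map to in practice, not a documented LIT Platform internal. The model path and parameter values are placeholders.

```python
# Sketch: loading a 4-bit GGUF model with llama-cpp-python (shown for
# illustration only — not an assertion about how the LIT Platform loads models).
from llama_cpp import Llama

llm = Llama(
    model_path="/models/example.Q4_K_M.gguf",  # placeholder path to a 4-bit GGUF file
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload all layers to GPU if VRAM allows
)

output = llm("Explain quantization in one sentence.", max_tokens=64)
print(output["choices"][0]["text"])
```

Lower bit widths shrink memory use and usually raise throughput at some cost in output quality, which is why 4-bit and 5-bit variants are the usual choice for consumer GPUs.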
## Monitoring and Management
The Models dashboard provides:
- Real-time usage statistics
- Memory consumption metrics
- Token generation speed
- Current model status
- Model versioning and updates
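If you want to double-check the dashboard's token-speed figure, a minimal measurement is the number of tokens generated divided by elapsed wall-clock time. The sketch below assumes a hypothetical `generate` callable that returns the generated text and a token count; adapt it to however your deployment exposes generation.

```python
import time
from typing import Callable, Tuple

def measure_tokens_per_second(
    generate: Callable[[str], Tuple[str, int]],  # hypothetical: returns (text, n_tokens)
    prompt: str,
) -> float:
    """Time a single generation call and return tokens per second."""
    start = time.perf_counter()
    _text, n_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed if elapsed > 0 else 0.0
```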
## Troubleshooting
If you encounter issues with self-hosted models:
- Check system resources (memory, GPU utilization)
- Verify model compatibility with your hardware
- Adjust quantization settings for better performance
- Restart the model server if performance degrades
- Check logs for specific error messages
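For the first step above, you can check RAM and GPU headroom with a short script. This sketch uses `psutil` for system memory and shells out to `nvidia-smi` for NVIDIA GPUs; both are standard tools, but neither is bundled with the LIT Platform.

```python
# Sketch: quick resource check (assumes psutil is installed and an NVIDIA GPU
# with nvidia-smi on PATH; skip the GPU part on other hardware).
import subprocess
import psutil

mem = psutil.virtual_memory()
print(f"RAM: {mem.used / 1024**3:.1f} / {mem.total / 1024**3:.1f} GB used")

try:
    gpu = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total,utilization.gpu",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(f"GPU: {gpu.stdout.strip()}")
except (FileNotFoundError, subprocess.CalledProcessError):
    print("nvidia-smi not available — no NVIDIA GPU detected or driver missing")
```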
For models that exceed your local hardware capabilities, consider using the LIT Platform's Ollama integration to access more efficient model serving.
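If your deployment exposes a standard Ollama server (by default on port 11434), you can exercise it directly with a request like the one below. The host, port, and model name are assumptions to adjust for your setup.

```python
# Sketch: calling a standard Ollama server directly. The URL and model name
# are assumptions — adjust them to match your deployment.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Say hello in five words.", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```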