Self-Hosted Models#

The LIT Platform allows you to run large language models (LLMs) directly on your own infrastructure, ensuring data privacy, reducing latency, and eliminating dependency on external API services.

Benefits of Self-Hosted Models#

  • Data Privacy: All data stays within your environment
  • Cost Control: No per-token charges or subscription fees
  • Customization: Fine-tune models for your specific use cases
  • Offline Operation: Run models without internet connectivity
  • Predictable Performance: Consistent response times

Supported Models#

The LIT Platform supports a wide range of open-source language models, including:

  • Llama 3 (8B, 70B)
  • Qwen 2 (7B, 72B)
  • Mistral (7B)
  • Gemma (7B, 27B)
  • Phi-3 (mini, small)
  • DeepSeek (7B, 67B)
  • Falcon (7B, 40B)
  • And many more...

Hardware Requirements#

The hardware requirements depend on the model size:

Model Size | Minimum RAM | Recommended GPU | Approximate Speed
---------- | ----------- | --------------- | -----------------
7-8B       | 16GB        | 8GB VRAM        | 15-30 tokens/sec
13-14B     | 24GB        | 16GB VRAM       | 10-20 tokens/sec
30-40B     | 64GB        | 24GB VRAM       | 5-10 tokens/sec
65-70B     | 128GB       | 48GB VRAM       | 3-8 tokens/sec
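
As a rough rule of thumb, the memory needed for a model's weights is the parameter count multiplied by the bytes per parameter at the chosen precision, plus runtime overhead for the KV cache and buffers. The sketch below illustrates that arithmetic; the 20% overhead factor is an assumption for illustration, not a measured value.

```python
def estimate_weight_memory_gb(params_billion: float, bits_per_param: float,
                              overhead: float = 0.20) -> float:
    """Rough estimate of memory needed to serve a model's weights.

    The overhead factor (assumed ~20%) stands in for the KV cache and
    runtime buffers; real usage varies with context length and batch size.
    """
    weight_bytes = params_billion * 1e9 * (bits_per_param / 8)
    return weight_bytes * (1 + overhead) / 1e9

# 7B weights alone are ~14 GB at fp16; 4-bit quantization brings them under 5 GB
print(f"7B @ fp16 : {estimate_weight_memory_gb(7, 16):.1f} GB")
print(f"7B @ 4-bit: {estimate_weight_memory_gb(7, 4.5):.1f} GB")

# A 70B model fits the table's 48GB VRAM row only after ~4-bit quantization
print(f"70B @ 4-bit: {estimate_weight_memory_gb(70, 4.5):.1f} GB")
```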

Setting Up a Model#

  1. Navigate to the Models section in the LIT interface
  2. Click "Add New Model"
  3. Select from available model options or provide a custom download URL (see the sketch after these steps)
  4. Choose hardware configuration (CPU/GPU, quantization level)
  5. Click "Download and Configure"
  6. Wait for the model to download and initialize
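
A custom download URL (step 3) typically points at a single quantized model file, such as a GGUF. As an illustration only, and not a LIT-specific API, the snippet below uses the huggingface_hub library to list the GGUF variants published in a repository and build a direct download link. The repository name is an example; substitute the model you actually want to host.

```python
from huggingface_hub import list_repo_files

# Example repository containing pre-quantized GGUF files
repo_id = "TheBloke/Llama-2-7B-GGUF"

# Find the quantized variants published in the repo
gguf_files = [f for f in list_repo_files(repo_id) if f.endswith(".gguf")]
print(gguf_files)

# Direct download URL for one variant (Hugging Face "resolve" link pattern)
filename = gguf_files[0]
url = f"https://huggingface.co/{repo_id}/resolve/main/{filename}"
print(url)
```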

Quantization Options#

To run larger models on less powerful hardware, the LIT Platform offers various quantization options:

  • GGUF Format: 4-bit, 5-bit, and 8-bit quantization
  • GPTQ Format: 4-bit and 8-bit quantization with optional groupsize settings
  • AWQ Format: Activation-aware weight quantization for a better quality/performance balance
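
The LIT Platform handles quantized loading for you, but to give a sense of what these options mean in practice, here is a minimal sketch that loads a 4-bit GGUF file with GPU offloading using the llama-cpp-python library. The library, file path, and settings are illustrative assumptions, not the platform's internal loader.

```python
from llama_cpp import Llama

# Load a 4-bit quantized GGUF file; path and settings are illustrative
llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # 4-bit K-quant variant
    n_gpu_layers=-1,   # offload all layers to the GPU if VRAM allows
    n_ctx=4096,        # context window
)

out = llm("Summarize the benefits of self-hosted models in one sentence.",
          max_tokens=64)
print(out["choices"][0]["text"])
```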

Monitoring and Management#

The Models dashboard provides:

  • Real-time usage statistics
  • Memory consumption metrics
  • Token generation speed
  • Current model status
  • Model versioning and updates
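
The dashboard surfaces these metrics in the interface. If you also want to watch GPU memory from a script or the command line, one simple approach (an assumption, not a LIT feature) is to poll nvidia-smi, as sketched below.

```python
import subprocess

def gpu_memory_snapshot():
    """Return (used_MiB, total_MiB, utilization_%) for each GPU via nvidia-smi."""
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=memory.used,memory.total,utilization.gpu",
        "--format=csv,noheader,nounits",
    ], text=True)
    return [tuple(int(v) for v in line.split(", "))
            for line in out.strip().splitlines()]

for used, total, util in gpu_memory_snapshot():
    print(f"VRAM {used}/{total} MiB, GPU utilization {util}%")
```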

Troubleshooting#

If you encounter issues with self-hosted models:

  1. Check system resources (memory, GPU utilization); a quick preflight check is sketched after this list
  2. Verify model compatibility with your hardware
  3. Adjust quantization settings for better performance
  4. Restart the model server if performance degrades
  5. Check logs for specific error messages
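
For step 1, a quick resource check before (re)loading a model can save guesswork. The sketch below uses the psutil library to compare available RAM against the minimums from the hardware table above; the thresholds mirror that table and the model size classes are examples, not platform requirements.

```python
import psutil

# Minimum RAM from the hardware requirements table (GB), keyed by model size class
MIN_RAM_GB = {"7-8B": 16, "13-14B": 24, "30-40B": 64, "65-70B": 128}

def check_ram(model_class: str) -> None:
    available_gb = psutil.virtual_memory().available / 1e9
    needed = MIN_RAM_GB[model_class]
    status = "OK" if available_gb >= needed else "INSUFFICIENT"
    print(f"{model_class}: need {needed} GB, {available_gb:.1f} GB available -> {status}")

check_ram("13-14B")
```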

For models that exceed your local hardware capabilities, consider using the LIT Platform's Ollama integration to access more efficient model serving.
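
Ollama exposes an HTTP API on port 11434 by default. The sketch below sends a single non-streaming generation request to a locally running Ollama server; the model name is an example and must already be pulled (for instance with `ollama pull llama3`).

```python
import requests

# Ollama's default local endpoint; adjust host/port if you run it elsewhere
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",                  # must already be pulled
        "prompt": "Explain quantization in one paragraph.",
        "stream": False,                    # return a single JSON object
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```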