
AI Features Setup Guide#

Piler's AI features provide intelligent email summarization, conversational search, and thread analysis using a local LLM (Large Language Model). This guide walks you through setting up the AI infrastructure.


Overview#

AI Features Available:

  • ✨ Email Summarization: instant AI-generated summaries
  • 🔍 Conversational Search: natural language queries
  • 📋 Thread Summarization: structured thread intelligence

Key Advantage: 100% on-premise. Your data NEVER leaves your infrastructure.


Prerequisites#

Hardware Requirements#

| Deployment Size | Minimum | Recommended | Notes |
|---|---|---|---|
| Small (100-500 users) | 4 CPU cores, 8GB RAM | RTX 3060 (12GB VRAM) | CPU-only works but slow (10-30s per query) |
| Medium (500-5K users) | 8 CPU cores, 16GB RAM | RTX 4090 (24GB VRAM) | GPU highly recommended (2-5s per query) |
| Large (5K+ users) | 16 CPU cores, 32GB RAM | A100 (40GB VRAM) | GPU required for acceptable performance |

GPU strongly recommended for production (20-50x faster than CPU).

Software Requirements#

  • Linux: Ubuntu 22.04+, Debian 12+, RHEL 9+, or similar
  • macOS: macOS 12+ (Apple Silicon or Intel)
  • Docker (optional): For containerized deployment

Installation Methods#

Method 1: Ollama (Recommended)#

Why Ollama:

  • ✅ Easiest installation (one command)
  • ✅ Automatic GPU detection
  • ✅ Model management built-in
  • ✅ Works on Linux, macOS, Windows
  • ✅ Production-ready
  • ✅ Free and open source

Install Ollama#

Linux/macOS:

# One-line install
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version
# Should show: ollama version X.X.X

Or with Docker:

docker pull ollama/ollama:latest

# GPU support (NVIDIA)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# CPU only
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Pull the LLM Model#

Recommended model: Llama 3.1 (8B)

# Pull model (will download ~4.7GB)
ollama pull llama3.1:8b

# Verify model is available
ollama list
# Should show: llama3.1:8b ... 4.7 GB

Alternative models:

| Model | Size | VRAM | Speed | Quality | Use Case |
|---|---|---|---|---|---|
| llama3.1:8b | 4.7GB | 6GB | Fast | Excellent | Recommended |
| llama3.1:70b | 40GB | 48GB | Slow | Best | Large deployments only |
| mistral:7b | 4.1GB | 5GB | Very fast | Good | Budget/CPU setups |
| qwen2.5:7b | 4.7GB | 6GB | Fast | Excellent | Multilingual |
| phi3:mini | 2.3GB | 3GB | Very fast | Good | Small/demo setups |

Test Ollama#

# Test generation
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Summarize: This is a test email about quarterly budget approval.",
  "stream": false
}'

# Should return JSON with summary
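
The response is a single JSON object; if you have jq installed, you can pull out just the generated text to confirm the model is producing sensible output:

curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Summarize: This is a test email about quarterly budget approval.",
  "stream": false
}' | jq -r '.response'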

Start Ollama as Service#

Linux (systemd):

# Ollama installer creates service automatically
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama

macOS (launchd):

# Ollama runs as service automatically after install
# Check status:
ps aux | grep ollama

Docker:

# Already running if you used docker run -d
docker ps | grep ollama

Method 2: LM Studio (Alternative)#

Why LM Studio:

  • GUI for model management
  • Good for macOS/Windows users
  • Easy testing and experimentation

Install:

  1. Download from https://lmstudio.ai/
  2. Install and open LM Studio
  3. Download a model (llama3.1:8b recommended)
  4. Start the local server (LM Studio defaults to port 1234)

Configure Piler:

LLM_BASE_URL=http://localhost:1234  # LM Studio default port
LLM_MODEL=llama3.1:8b

Method 3: Custom LLM Server (Advanced)#

For organizations with specific requirements:

vLLM (High performance):

pip install vllm
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 11434

Ollama on remote server:

# On GPU server
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# On Piler server
LLM_BASE_URL=http://gpu-server.internal:11434
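
Before pointing Piler at a remote server, it is worth confirming the Piler host can actually reach it (the hostname below just follows the example above):

# From the Piler server
curl http://gpu-server.internal:11434/api/tags
# Should return the list of pulled models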

Piler Configuration#

1. Enable AI Features#

Edit your Piler .env file:

# Enable AI features
LLM_ENABLED=true

# Ollama connection
LLM_BASE_URL=http://localhost:11434
LLM_MODEL=llama3.1:8b

# Cache settings (optional tuning)
LLM_CACHE_EXPIRY_MINUTES=15

# Advanced (optional)
NL2QUERY_PROMPT_FILE=  # Empty = use embedded default

2. Restart Piler#

# Systemd
sudo systemctl restart piler

# Docker
docker-compose restart piler-ui

# Manual
pkill piler-ui
./piler-ui

3. Verify AI Features#

Check LLM connectivity:

curl http://localhost:3000/api/v1/llm/ping \
  -H "Authorization: Bearer YOUR_TOKEN"

# Should return:
{"status":"ok","model":"llama3.1:8b"}

In the UI:

  1. Log in to Piler
  2. View any email
  3. Click "AI Tools" dropdown
  4. Click "AI Summary"
  5. Should see summary within 2-5 seconds ✅

Network Setup#

Same Server (Simplest)#

Piler and Ollama on same machine:

┌─────────────────────────────────┐
│  Server                         │
│  ┌──────────┐    ┌───────────┐  │
│  │ Piler UI │───→│  Ollama   │  │
│  │ :3000    │    │  :11434   │  │
│  └──────────┘    └───────────┘  │
└─────────────────────────────────┘

Config:
LLM_BASE_URL=http://localhost:11434

Firewall: No changes needed (local connection)


Separate Servers (Remote GPU)#

Piler on one server, Ollama on a GPU server:

┌─────────────┐         ┌───────────────────┐
│ Piler Server│         │  GPU Server       │
│             │         │                   │
│  Piler UI   │────────→│  Ollama :11434    │
│  :3000      │ Network │  (GPU-accelerated)│
└─────────────┘         └───────────────────┘

Config:
LLM_BASE_URL=http://gpu-server.internal:11434

Firewall rules:

# On GPU server
sudo ufw allow from PILER_SERVER_IP to any port 11434

# Or open to network (if trusted)
sudo ufw allow 11434/tcp

Ollama config (GPU server):

# Allow network access
export OLLAMA_HOST=0.0.0.0:11434

# Start Ollama
ollama serve

Docker Compose Setup#

Full stack with Ollama:

version: '3.8'

services:
  piler-ui:
    image: sutoj/piler-ui:latest
    environment:
      LLM_ENABLED: "true"
      LLM_BASE_URL: "http://ollama:11434"
      LLM_MODEL: "llama3.1:8b"
    depends_on:
      - ollama
    networks:
      - piler-network

  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama-data:/root/.ollama
    ports:
      - "11434:11434"
    # GPU support (uncomment if you have NVIDIA GPU)
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]
    networks:
      - piler-network

networks:
  piler-network:
    driver: bridge

volumes:
  ollama-data:

Pull model:

docker-compose up -d
docker exec -it <ollama-container> ollama pull llama3.1:8b
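
To confirm the model landed in the ollama-data volume and is reachable (container name is whatever docker ps shows; port 11434 is published to the host in the compose file above):

# Inside the Ollama container
docker exec -it <ollama-container> ollama list

# From the host, via the published port
curl http://localhost:11434/api/tags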

Performance Tuning#

GPU Acceleration (NVIDIA)#

Verify GPU is detected:

# Check NVIDIA driver
nvidia-smi

# Ollama should detect GPU automatically
# Check logs:
journalctl -u ollama -f | grep -i gpu
# Should show: "Using GPU: NVIDIA GeForce RTX 4090"

If GPU not detected:

# Install CUDA toolkit
# Ubuntu/Debian:
sudo apt install nvidia-cuda-toolkit

# Restart Ollama
sudo systemctl restart ollama

CPU-Only Optimization#

If running without GPU:

# Use smaller model for faster responses
ollama pull phi3:mini  # 2.3GB, faster on CPU

# Update Piler config
LLM_MODEL=phi3:mini

Tune thread count:

# Set CPU threads (default: auto)
export OLLAMA_NUM_THREADS=8  # Match your CPU cores

ollama serve

Memory Management#

Limit model memory:

# Unload models after 5 minutes of inactivity (default)
export OLLAMA_KEEP_ALIVE=5m

# Or keep loaded always (faster but uses RAM)
export OLLAMA_KEEP_ALIVE=-1

ollama serve
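
The exports above only affect an Ollama you start by hand. If Ollama runs under systemd (as in the service section earlier), a drop-in like this sketch persists the setting across restarts (the unit name ollama matches the commands above):

# Persist Ollama environment settings for the systemd-managed service
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null << 'EOF'
[Service]
Environment="OLLAMA_KEEP_ALIVE=5m"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama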

Troubleshooting#

"LLM service not configured" Error#

Cause: Piler can't reach Ollama

Solutions:

  1. Check Ollama is running:

curl http://localhost:11434/api/tags
# Should return list of models

  2. Check Piler config:

grep LLM_BASE_URL .env
# Should match Ollama address

  3. Check firewall (if separate servers):

# From Piler server
curl http://gpu-server:11434/api/tags
# Should connect

Slow AI Responses (>10 seconds)#

Cause: Running on CPU without GPU

Solutions:

  1. Add GPU: Install an NVIDIA GPU, verify with nvidia-smi

  2. Use smaller model:

ollama pull phi3:mini
# Update LLM_MODEL=phi3:mini in .env

  3. Upgrade hardware:

  • Minimum: 8 CPU cores, 16GB RAM
  • Recommended: RTX 3060 or better

"Model not found" Error#

Cause: Model not pulled

Solution:

# Pull the model
ollama pull llama3.1:8b

# Verify
ollama list

Out of Memory (OOM) Errors#

Cause: Model too large for available VRAM/RAM

Solutions:

  1. Use smaller model:

ollama pull llama3.1:8b  # Instead of 70b

  2. Increase swap (Linux):

sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

  3. Reduce concurrent requests:

# Piler handles this automatically with rate limiting
# No config needed

Connection Timeout#

Cause: LLM taking too long to respond

Solution: Piler has built-in timeouts:

  • Email summary: 60 seconds
  • Thread summary: 120 seconds
  • Search translation: 10 seconds

These limits are generous. If you are still hitting them, the LLM backend is too slow for your workload (add a GPU or switch to a smaller model).


Security Considerations#

Network Security#

Ollama has NO authentication!

If running on separate server:

# Option 1: Firewall (recommended)
sudo ufw allow from PILER_IP to any port 11434
sudo ufw deny 11434  # Block others

# Option 2: VPN/Private network
# Run Ollama on private network only

# Option 3: Reverse proxy with auth (advanced)
# Use nginx with basic auth in front of Ollama

Never expose Ollama directly to the internet!
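
If you choose the reverse-proxy option, the sketch below is one way to do it, assuming nginx and apache2-utils are installed; the listen port 11435 and the piler username are arbitrary examples, not Piler requirements:

# Create a credentials file (htpasswd comes from apache2-utils)
sudo htpasswd -c /etc/nginx/.htpasswd piler

# nginx listens on 11435 with basic auth and forwards to Ollama on localhost
sudo tee /etc/nginx/conf.d/ollama-proxy.conf > /dev/null << 'EOF'
server {
    listen 11435;
    location / {
        auth_basic           "Ollama";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://127.0.0.1:11434;
        proxy_read_timeout   300s;   # generations can take a while
    }
}
EOF
sudo nginx -t && sudo systemctl reload nginx

Whether Piler's LLM client can present credentials to such a proxy depends on your version; if it cannot, prefer the firewall or private-network options.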

Data Privacy#

What gets sent to Ollama:

  • Email subject
  • Email body (truncated to 50KB for summaries)
  • Thread messages (up to 50 messages)

What NEVER leaves your server:

  • Nothing! Ollama runs locally
  • No external API calls
  • No telemetry (unless you enable it)

Compliance:

  • ✅ GDPR compliant (data stays on-premise)
  • ✅ HIPAA compatible (no third-party processing)
  • ✅ SOC 2 friendly (complete audit trail)

Production Deployment#

For small deployments (<500 users):

┌────────────────────────────┐
│  Single Server             │
│  • Piler UI                │
│  • Ollama (CPU/small GPU)  │
│  • MySQL                   │
│  • Manticore               │
└────────────────────────────┘

Hardware: 8 cores, 16GB RAM, optional RTX 3060

For medium deployments (500-5K users):

┌─────────────┐      ┌──────────────┐
│ Piler Server│      │  GPU Server  │
│ • UI        │─────→│  • Ollama    │
│ • MySQL     │      │  • RTX 4090  │
│ • Manticore │      └──────────────┘
└─────────────┘

For large deployments (5K+ users):

┌─────────────┐      ┌────────────────────┐
│ Piler Master│      │  GPU Cluster       │
│ • UI        │─────→│  • Ollama (node 1) │
│             │      │  • Ollama (node 2) │
└─────────────┘      └────────────────────┘
       │
       ├──→ Worker 1 (emails)
       └──→ Worker 2 (emails)

Use load balancer for Ollama nodes

High Availability#

Option 1: Multiple Ollama instances

# Run Ollama on 2+ servers
# Use nginx/HAProxy load balancer

# Piler config:
LLM_BASE_URL=http://llm-loadbalancer:11434
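
A minimal nginx sketch for the llm-loadbalancer host (node hostnames and the timeout are placeholders; HAProxy works just as well):

sudo tee /etc/nginx/conf.d/llm-loadbalancer.conf > /dev/null << 'EOF'
upstream ollama_pool {
    server gpu-node-1.internal:11434;
    server gpu-node-2.internal:11434;
}
server {
    listen 11434;
    location / {
        proxy_pass         http://ollama_pool;
        proxy_read_timeout 300s;   # allow long generations
    }
}
EOF
sudo nginx -t && sudo systemctl reload nginx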

Option 2: Failover

# Primary Ollama on GPU server
# Backup Ollama on CPU (slower but works)

# Application handles failover automatically via retries

Cost Analysis#

On-Premise (Ollama)#

One-time costs:

  • RTX 4090 GPU: $1,600-2,000
  • Server: $2,000-4,000 (if needed)
  • Setup: $500-1,000 (engineering time)

Ongoing:

  • Power: ~$30-50/month (300W GPU)
  • Maintenance: Minimal

Total first year: ~$4,000-7,000

Unlimited queries after initial investment.

Cloud APIs (For Comparison)#

OpenAI/Anthropic pricing:

  • ~$0.01-0.05 per summary
  • 1,000 summaries/day = $10-50/day = $3,650-18,250/year
  • 10,000 summaries/day = $36,500-182,500/year

Drawbacks:

  • ❌ Data leaves your infrastructure (compliance risk)
  • ❌ Ongoing per-query costs
  • ❌ Vendor lock-in
  • ❌ Potential downtime (external dependency)

Recommendation: On-premise Ollama for enterprise. Cloud APIs only for trials/demos.


Configuration Reference#

All Available Options#

# Core AI Settings
LLM_ENABLED=true|false
# Enable/disable all AI features
# Default: false

LLM_BASE_URL=http://localhost:11434
# Ollama/LLM server URL
# Default: http://localhost:11434

LLM_MODEL=llama3.1:8b
# Model name (must be pulled in Ollama)
# Default: llama3.1:8b
# Alternatives: mistral:7b, qwen2.5:7b, phi3:mini

LLM_CACHE_EXPIRY_MINUTES=15
# Redis cache TTL for summaries
# Default: 15 minutes
# Threads cached for 60 minutes (hardcoded)

NL2QUERY_PROMPT_FILE=
# Custom prompt template for conversational search
# Empty = use embedded default
# Example: /etc/piler/prompts/custom_nl2query.txt

Multi-Tenant Configuration#

Same LLM for all tenants:

# Global config in .env
LLM_ENABLED=true
LLM_BASE_URL=http://localhost:11434

Per-tenant enable/disable:

-- In tenant_settings table
UPDATE piler.tenant_settings
SET settings_json = JSON_SET(settings_json, '$.llm_enabled', true)
WHERE tenant_id = 'tenant1';
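
To confirm the flag was applied, a quick check with the mysql CLI (the credentials are placeholders for your environment):

mysql -u root -p -e "SELECT tenant_id, JSON_EXTRACT(settings_json, '\$.llm_enabled') AS llm_enabled
  FROM piler.tenant_settings WHERE tenant_id = 'tenant1';"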

Customizing AI Behavior#

Default Conversational Search Prompt#

The conversational search uses this prompt template (embedded in binary):

Location: internal/llm/prompts/nl2query.txt

Full default prompt template:
You are a search query translator for an email archiving system. Convert natural language questions into Piler search syntax.

SEARCH SYNTAX:
- from:email@domain.com - Filter by sender
- to:email@domain.com - Filter by recipient
- subject:keyword - Search in subject (can include multiple words: subject:word1 word2)
- subject:"exact phrase" - Search for exact phrase in subject
- body:keyword - Search in body text (can include multiple words: body:word1 word2)
- body:"exact phrase" - Search for exact phrase in body
- date1:YYYY-MM-DD - Start date (use with date2 for range)
- date2:YYYY-MM-DD - End date (use with date1 for range)
- a:any - Has any attachment
- a:pdf - Has PDF attachments ONLY
- a:word - Has Word document attachments ONLY
- a:excel - Has Excel spreadsheet attachments ONLY
- a:image - Has image attachments ONLY (jpg, png, gif, etc. - use a:image, NOT a:jpg)
- a:zip - Has zip/archive attachments ONLY
- attachment:filename.pdf - Has specific attachment filename
- size:>5M - Size greater than 5MB (use M for megabytes, K for kilobytes)
- direction:inbound - Received emails (also: direction:outbound, direction:internal)
- category:name - Filter by category (short form: cat:name)
- tag:tagname - Search by tag
- note:text - Search in notes
- id:123 - Specific message ID

OPERATORS:
- Multiple terms separated by spaces are implicitly AND (no parentheses needed)
- OR - Either term (use uppercase: urgent OR important)
- NOT - Exclude term (use uppercase: NOT spam)
- "exact phrase" - Phrase search in quotes
- * - Wildcard (john* matches john, johnny, johns)
- ( ) - ONLY use parentheses for grouping OR/NOT operations, NOT for simple AND queries

IMPORTANT NOTES:
- For date ranges, use date1: and date2: together (NOT date:X..Y)
- For "has any attachment" use a:any (NOT has:attachment)
- Valid attachment types: a:pdf, a:word, a:excel, a:image, a:zip, a:any
- For images, ALWAYS use a:image (NOT a:jpg, a:png, etc.)
- For size, use M for megabytes, K for kilobytes (e.g., size:>5M NOT size:>5MB)
- Keep queries SIMPLE - only include meaningful search terms, skip filler words
- Skip generic words like "with", "about", "problems", "issues", "emails", "messages"
- Focus on specific entities: names, subjects, dates, attachment types
- Example: "virtualfax problems with pdf attachments" β†’ "subject:virtualfax a:pdf" (skip "problems with")
- There is NO is:unread filter (this is an archive, not a mailbox)
- Direction must be: inbound, outbound, or internal

RELATIVE DATES (calculate from today: {{CURRENT_DATE}}):
- "today" β†’ date1:{{TODAY}} date2:{{TODAY}}
- "yesterday" β†’ date1:{{YESTERDAY}} date2:{{YESTERDAY}}
- "last week" β†’ date1:{{LAST_WEEK}} (7 days ago)
- "last month" β†’ date1:{{LAST_MONTH}} (30 days ago)
- "last quarter" β†’ date1:{{LAST_QUARTER}} (90 days ago)
- "this month" β†’ date1:{{MONTH_START}} date2:{{TODAY}}
- "this year" β†’ date1:{{YEAR_START}}

EXAMPLES:
Input: "emails from sarah last week"
Output: {"query":"from:sarah date1:{{LAST_WEEK}}","explanation":"Emails from Sarah sent since {{LAST_WEEK}}","confidence":0.95}

Input: "urgent messages about project"
Output: {"query":"subject:urgent subject:project","explanation":"Messages about urgent project","confidence":0.90}

Input: "employment invitation"
Output: {"query":"subject:employment invitation","explanation":"Emails about employment invitation","confidence":0.92}

Input: "virtualfax problems with pdf attachments"
Output: {"query":"subject:virtualfax a:pdf","explanation":"Virtualfax emails with PDF attachments","confidence":0.93}

Input: "large PDF attachments from vendors"
Output: {"query":"from:*vendor* size:>5M a:pdf","explanation":"PDF attachments larger than 5MB from vendor domains","confidence":0.88}

Input: "images from operator"
Output: {"query":"a:image from:operator","explanation":"Image attachments from operator","confidence":0.95}

Input: "invoices in December"
Output: {"query":"subject:invoice date1:2024-12-01 date2:2024-12-31","explanation":"Emails with 'invoice' in subject from December 2024","confidence":0.92}

Input: "emails with any attachments from john"
Output: {"query":"from:john a:any","explanation":"Emails from John that have attachments","confidence":0.95}

Input: "emails about budget OR finance from sarah"
Output: {"query":"from:sarah (subject:budget OR subject:finance)","explanation":"Emails from Sarah about budget or finance","confidence":0.93}

Input: "attachments sent in the last 5 years"
Output: {"query":"date1:2020-01-20 a:any","explanation":"Emails with attachments since January 2020","confidence":0.94}

CONTEXT-AWARE EXAMPLES (showing how to handle follow-ups):

Context: Previous query was "virtualfax problems with pdf attachments" β†’ "subject:virtualfax a:pdf"
Input: "after 2015-10-21"
Output: {"query":"subject:virtualfax a:pdf date1:2015-10-21","explanation":"Virtualfax emails with PDF attachments after October 21, 2015","confidence":0.96}

Context: Previous query was "emails from john" β†’ "from:john"
Input: "just PDFs"
Output: {"query":"from:john a:pdf","explanation":"PDF emails from John","confidence":0.95}

Context: Previous query was "invoices last month" β†’ "subject:invoice date1:2024-12-20"
Input: "over $10,000"
Output: {"query":"subject:invoice date1:2024-12-20 body:$10,000 OR body:10000","explanation":"Invoices from last month over $10,000","confidence":0.88}

{{CONVERSATION_CONTEXT}}

IMPORTANT RULES:
1. MUST return valid JSON with exactly these fields: query, explanation, confidence
2. If there is CONTEXT from previous queries, you MUST include those search terms in the new query
3. For follow-up/refinement queries (like "just PDFs" or "after 2020"), ALWAYS combine with previous context
4. Use confidence < 0.7 if the question is ambiguous
5. For ambiguous queries, explain what clarification is needed in the explanation field
6. Calculate today's date from context (assume current date: {{CURRENT_DATE}})
7. NEVER change dates provided by the user - if user says "2015" use 2015, NOT 2025 (this is an archive with old emails)
8. Use EXACT dates from user input - do not assume typos or correct dates
9. Always escape special characters in email addresses
10. Use wildcards (*) for partial matches when appropriate
11. Prefer (subject:X OR body:X) over just subject:X unless explicitly about subject only

RESPONSE FORMAT (copy this structure exactly):
{
  "query": "your translated search query here",
  "explanation": "brief explanation of what you're searching for",
  "confidence": 0.95
}

Respond with ONLY the JSON object above. No markdown, no code blocks, no extra text.

Customizing the Prompt#

Why customize:

  • Add industry-specific examples (legal, healthcare, finance)
  • Improve accuracy for your organization's terminology
  • Adjust for different LLM models

How to customize:

  1. Create a custom prompt file:

# Copy embedded template (extract from binary or docs)
cat > /etc/piler/custom_nl2query.txt << 'EOF'
[Paste default template above]

# Add your custom examples:
Input: "discovery documents from opposing counsel"
Output: {"query":"from:*@opposing-firm.com category:discovery","explanation":"...","confidence":0.94}

Input: "patient records with consent forms"
Output: {"query":"subject:patient subject:consent a:pdf","explanation":"...","confidence":0.93}
EOF

  2. Point Piler to the custom prompt:

# In .env
NL2QUERY_PROMPT_FILE=/etc/piler/custom_nl2query.txt

  3. Restart Piler:

sudo systemctl restart piler-ui

Template variables available:

  • {{CURRENT_DATE}}: today's date (auto-calculated)
  • {{LAST_WEEK}}, {{LAST_MONTH}}, {{LAST_QUARTER}}: relative dates
  • {{CONVERSATION_CONTEXT}}: previous queries (auto-injected)

Testing your prompt:

Try queries in the UI and check if translations match expectations. Iterate based on real usage patterns.
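
One way to sanity-check a prompt edit outside Piler is to substitute the date placeholder yourself and send a sample question straight to Ollama (a rough sketch; Piler normally fills in all placeholders and the conversation context for you):

# Fill in {{CURRENT_DATE}} and append a test question
TODAY=$(date +%F)
sed "s/{{CURRENT_DATE}}/$TODAY/g" /etc/piler/custom_nl2query.txt > /tmp/prompt.txt
printf '\nInput: "invoices from acme last month"\nOutput:' >> /tmp/prompt.txt

# Ask the model to translate it and print only the generated JSON
curl -s http://localhost:11434/api/generate \
  -d "$(jq -n --rawfile p /tmp/prompt.txt '{model: "llama3.1:8b", prompt: $p, stream: false}')" \
  | jq -r '.response'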


Monitoring#

Check AI Feature Health#

1. Test LLM connectivity:

curl http://localhost:3000/api/v1/llm/ping \
  -H "Cookie: session=YOUR_SESSION"

# Should return:
{"status":"ok","model":"llama3.1:8b"}

2. Monitor Ollama:

# Check running models
curl http://localhost:11434/api/tags

# Monitor logs
journalctl -u ollama -f

# Check resource usage
htop  # Watch CPU/RAM
nvidia-smi -l 1  # Watch GPU (if NVIDIA)

3. Check Piler logs:

tail -f /var/log/piler/app.log | grep -i llm

# Look for:
# "LLM summarization failed" - problems
# "Cache hit for email summary" - working well

Performance Metrics#

Target metrics:

  • Email summary: <5 seconds (P95)
  • Thread summary: <30 seconds for 20-message thread (P95)
  • Search translation: <3 seconds (P95)
  • LLM uptime: >99%

If slower:

  • Add GPU
  • Use smaller model
  • Check network latency (if separate servers)
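
For a quick spot check against these targets, time a single generation directly against Ollama (use the model you configured; note the first request after idle also pays model load time):

time curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Summarize in one sentence: the quarterly budget was approved by finance.",
  "stream": false
}' > /dev/null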

Upgrading#

Update Ollama#

# Linux/macOS
curl -fsSL https://ollama.com/install.sh | sh

# Docker
docker pull ollama/ollama:latest
docker-compose up -d

Update Model#

# Pull newer version
ollama pull llama3.1:8b

# Old version removed automatically if space needed

Piler Updates#

AI features are part of Piler UI - update Piler as normal:

# Binary update
sudo systemctl stop piler-ui
sudo cp new-piler-ui /var/piler/ui/app
sudo systemctl start piler-ui

# Docker update
# Update the piler-ui image tag in docker-compose.yml (e.g. sutoj/piler-ui:2.1.0), then:
docker-compose pull piler-ui
docker-compose up -d

FAQ#

Q: Do I need a GPU?#

A: Highly recommended but not required.

  • Without GPU: 10-30 seconds per summary (acceptable for light use)
  • With GPU: 2-5 seconds per summary (production quality)

Q: Can I use GPT-4/Claude instead of Ollama?#

A: Technically yes, but not recommended:

  • ❌ Data leaves your infrastructure
  • ❌ Ongoing costs
  • ❌ Requires code changes (different API format)
  • ❌ Compliance risks

Ollama is designed for on-premise use.

Q: How much disk space for models?#

A: ~5-10GB per model

  • llama3.1:8b: 4.7GB
  • Keep 2-3 models for testing: ~15GB
  • Models stored in ~/.ollama/models/

Q: Can multiple Piler instances share one Ollama?#

A: Yes! Ollama handles concurrent requests.

  • Single RTX 4090: ~10-20 concurrent requests
  • Add more GPUs for higher concurrency

Q: What if Ollama crashes?#

A: AI features gracefully degrade:

  1. User gets "LLM service unavailable" error
  2. Can still use traditional search
  3. Cached summaries still served (if available)
  4. No impact on email viewing/searching

Ollama auto-restarts via systemd.

Q: Can I customize AI prompts?#

A: Yes! See the Customizing AI Behavior section above.

Q: Does this work offline/air-gapped?#

A: Yes!

  1. Download Ollama installer on internet-connected machine
  2. Pull model: ollama pull llama3.1:8b
  3. Copy model files to the air-gapped server (see the sketch below)
  4. Works completely offline
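
A rough sketch of step 3, assuming the default model path mentioned above (~/.ollama/models); note that the Linux systemd install may keep models under the ollama service user's home instead:

# On the internet-connected machine
tar czf ollama-models.tar.gz -C ~/.ollama models

# Transfer via approved media, then on the air-gapped server:
sudo systemctl stop ollama
tar xzf ollama-models.tar.gz -C ~/.ollama
sudo systemctl start ollama
ollama list   # the model should appear without any network access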

Support#

Getting Help#

  1. Check logs: journalctl -u piler-ui and journalctl -u ollama
  2. Test Ollama directly: curl http://localhost:11434/api/tags
  3. Verify config: grep LLM .env
  4. Contact support: support@mailpiler.com

Reporting Issues#

Include:

  • Piler version: ./piler-ui --version
  • Ollama version: ollama --version
  • Model: ollama list
  • Error logs (last 50 lines)
  • Hardware: CPU, RAM, GPU (if any)

Next Steps#

  1. ✅ Install Ollama
  2. ✅ Pull llama3.1:8b model
  3. ✅ Configure Piler (LLM_ENABLED=true)
  4. ✅ Restart Piler
  5. ✅ Test AI Summary feature
  6. ✅ Train users on AI features


Last Update: November 22, 2025

Piler Version: 2.1.0+

Status: Production Ready