
AI Features Setup Guide#

Piler's AI features provide intelligent email summarization, conversational search, and thread analysis using a local LLM (Large Language Model). This guide walks you through setting up the AI infrastructure.


Overview#

AI Features Available:

  • ✨ Email Summarization: instant AI-generated summaries
  • 🔍 Conversational Search: natural language queries
  • 📋 Thread Summarization: structured thread intelligence

Key Advantage: 100% on-premise. Your data NEVER leaves your infrastructure.


Prerequisites#

Hardware Requirements#

| Deployment Size | Minimum | Recommended | Notes |
|---|---|---|---|
| Small (100-500 users) | 4 CPU cores, 8GB RAM | RTX 3060 (12GB VRAM) | CPU-only works but slow (10-30s per query) |
| Medium (500-5K users) | 8 CPU cores, 16GB RAM | RTX 4090 (24GB VRAM) | GPU highly recommended (2-5s per query) |
| Large (5K+ users) | 16 CPU cores, 32GB RAM | A100 (40GB VRAM) | GPU required for acceptable performance |

GPU strongly recommended for production (20-50x faster than CPU).

Software Requirements#

  • Linux: Ubuntu 22.04+, Debian 12+, RHEL 9+, or similar
  • macOS: macOS 12+ (Apple Silicon or Intel)
  • Docker (optional): For containerized deployment

Installation Methods#

Method 1: Ollama (Recommended)#

Why Ollama:

  • ✅ Easiest installation (one command)
  • ✅ Automatic GPU detection
  • ✅ Model management built-in
  • ✅ Works on Linux, macOS, Windows
  • ✅ Production-ready
  • ✅ Free and open source

Install Ollama#

Linux/macOS:

# One-line install
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version
# Should show: ollama version X.X.X

Or with Docker:

docker pull ollama/ollama:latest

# GPU support (NVIDIA)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# CPU only
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Pull the LLM Model#

Recommended model: Llama 3.1 (8B)

# Pull model (will download ~4.7GB)
ollama pull llama3.1:8b

# Verify model is available
ollama list
# Should show: llama3.1:8b ... 4.7 GB

Alternative models:

| Model | Size | VRAM | Speed | Quality | Use Case |
|---|---|---|---|---|---|
| llama3.1:8b | 4.7GB | 6GB | Fast | Excellent | Recommended |
| llama3.1:70b | 40GB | 48GB | Slow | Best | Large deployments only |
| mistral:7b | 4.1GB | 5GB | Very fast | Good | Budget/CPU setups |
| qwen2.5:7b | 4.7GB | 6GB | Fast | Excellent | Multilingual |
| phi3:mini | 2.3GB | 3GB | Very fast | Good | Small/demo setups |

Test Ollama#

# Test generation
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Summarize: This is a test email about quarterly budget approval.",
  "stream": false
}'

# Should return JSON with summary
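
The response is a single JSON object; if you have jq installed, you can pull out just the generated text to confirm the model is producing sensible output:

curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Summarize: This is a test email about quarterly budget approval.",
  "stream": false
}' | jq -r '.response'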

Start Ollama as Service#

Linux (systemd):

# Ollama installer creates service automatically
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama

macOS (launchd):

# Ollama runs as service automatically after install
# Check status:
ps aux | grep ollama

Docker:

# Already running if you used docker run -d
docker ps | grep ollama

Method 2: LM Studio (Alternative)#

Why LM Studio:

  • GUI for model management
  • Good for macOS/Windows users
  • Easy testing and experimentation

Install:

  1. Download from https://lmstudio.ai/
  2. Install and open LM Studio
  3. Download a model (llama3.1:8b recommended)
  4. Start the local server (LM Studio defaults to port 1234)

Configure Piler:

LLM_BASE_URL=http://localhost:1234  # LM Studio default port
LLM_MODEL=llama3.1:8b

Method 3: Custom LLM Server (Advanced)#

For organizations with specific requirements:

vLLM (High performance):

pip install vllm
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 11434

Ollama on remote server:

# On GPU server
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# On Piler server
LLM_BASE_URL=http://gpu-server.internal:11434
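
Before pointing Piler at a remote server, it is worth confirming the Piler host can actually reach it (the hostname below just follows the example above):

# From the Piler server
curl http://gpu-server.internal:11434/api/tags
# Should return the list of pulled models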

Piler Configuration#

1. Enable AI Features#

Edit your Piler .env file:

# Enable AI features
LLM_ENABLED=true

# Ollama connection
LLM_BASE_URL=http://localhost:11434
LLM_MODEL=llama3.1:8b

# Cache settings (optional tuning)
LLM_CACHE_EXPIRY_MINUTES=15

# Advanced (optional)
NL2QUERY_PROMPT_FILE=  # Empty = use embedded default

2. Restart Piler#

# Systemd
sudo systemctl restart piler

# Docker
docker-compose restart piler-ui

# Manual
pkill piler-ui
./piler-ui

3. Verify AI Features#

Check LLM connectivity:

curl http://localhost:3000/api/v1/llm/ping \
  -H "Authorization: Bearer YOUR_TOKEN"

# Should return:
{"status":"ok","model":"llama3.1:8b"}

In the UI:

  1. Log in to Piler
  2. View any email
  3. Click "AI Tools" dropdown
  4. Click "AI Summary"
  5. Should see summary within 2-5 seconds ✅

Network Setup#

Same Server (Simplest)#

Piler and Ollama on same machine:

┌─────────────────────────────────┐
│  Server                         │
│  ┌──────────┐    ┌───────────┐  │
│  │ Piler UI │───→│  Ollama   │  │
│  │ :3000    │    │  :11434   │  │
│  └──────────┘    └───────────┘  │
└─────────────────────────────────┘

Config:
LLM_BASE_URL=http://localhost:11434

Firewall: No changes needed (local connection)


Separate Servers (Remote GPU)#

Piler on one server, Ollama on a GPU server:

┌─────────────┐         ┌───────────────────┐
│ Piler Server│         │  GPU Server       │
│             │         │                   │
│  Piler UI   │────────→│  Ollama :11434    │
│  :3000      │ Network │  (GPU-accelerated)│
└─────────────┘         └───────────────────┘

Config:
LLM_BASE_URL=http://gpu-server.internal:11434

Firewall rules:

# On GPU server
sudo ufw allow from PILER_SERVER_IP to any port 11434

# Or open to network (if trusted)
sudo ufw allow 11434/tcp

Ollama config (GPU server):

# Allow network access
export OLLAMA_HOST=0.0.0.0:11434

# Start Ollama
ollama serve

Docker Compose Setup#

Full stack with Ollama:

version: '3.8'

services:
  piler-ui:
    image: sutoj/piler-ui:latest
    environment:
      LLM_ENABLED: "true"
      LLM_BASE_URL: "http://ollama:11434"
      LLM_MODEL: "llama3.1:8b"
    depends_on:
      - ollama
    networks:
      - piler-network

  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama-data:/root/.ollama
    ports:
      - "11434:11434"
    # GPU support (uncomment if you have NVIDIA GPU)
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]
    networks:
      - piler-network

networks:
  piler-network:
    driver: bridge

volumes:
  ollama-data:

Pull model:

docker-compose up -d
docker exec -it <ollama-container> ollama pull llama3.1:8b
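
To confirm the model landed in the ollama-data volume and is reachable (container name is whatever docker ps shows; port 11434 is published to the host in the compose file above):

# Inside the Ollama container
docker exec -it <ollama-container> ollama list

# From the host, via the published port
curl http://localhost:11434/api/tags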

Performance Tuning#

GPU Acceleration (NVIDIA)#

Verify GPU is detected:

# Check NVIDIA driver
nvidia-smi

# Ollama should detect GPU automatically
# Check logs:
journalctl -u ollama -f | grep -i gpu
# Should show: "Using GPU: NVIDIA GeForce RTX 4090"

If GPU not detected:

# Install CUDA toolkit
# Ubuntu/Debian:
sudo apt install nvidia-cuda-toolkit

# Restart Ollama
sudo systemctl restart ollama

CPU-Only Optimization#

If running without GPU:

# Use smaller model for faster responses
ollama pull phi3:mini  # 2.3GB, faster on CPU

# Update Piler config
LLM_MODEL=phi3:mini

Tune thread count:

# Set CPU threads (default: auto)
export OLLAMA_NUM_THREADS=8  # Match your CPU cores

ollama serve

Memory Management#

Limit model memory:

# Unload models after 5 minutes of inactivity (default)
export OLLAMA_KEEP_ALIVE=5m

# Or keep loaded always (faster but uses RAM)
export OLLAMA_KEEP_ALIVE=-1

ollama serve
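
The exports above only affect an Ollama you start by hand. If Ollama runs under systemd (as in the service section earlier), a drop-in like this sketch persists the setting across restarts (the unit name ollama matches the commands above):

# Persist Ollama environment settings for the systemd-managed service
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null << 'EOF'
[Service]
Environment="OLLAMA_KEEP_ALIVE=5m"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama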

Troubleshooting#

"LLM service not configured" Error#

Cause: Piler can't reach Ollama

Solutions:

  1. Check Ollama is running:

curl http://localhost:11434/api/tags
# Should return list of models

  2. Check Piler config:

grep LLM_BASE_URL .env
# Should match Ollama address

  3. Check firewall (if separate servers):

# From Piler server
curl http://gpu-server:11434/api/tags
# Should connect

Slow AI Responses (>10 seconds)#

Cause: Running on CPU without GPU

Solutions:

  1. Add GPU: Install an NVIDIA GPU, verify with nvidia-smi

  2. Use smaller model:

ollama pull phi3:mini
# Update LLM_MODEL=phi3:mini in .env

  3. Upgrade hardware:

  • Minimum: 8 CPU cores, 16GB RAM
  • Recommended: RTX 3060 or better

"Model not found" Error#

Cause: Model not pulled

Solution:

# Pull the model
ollama pull llama3.1:8b

# Verify
ollama list

Out of Memory (OOM) Errors#

Cause: Model too large for available VRAM/RAM

Solutions:

  1. Use smaller model:

ollama pull llama3.1:8b  # Instead of 70b

  2. Increase swap (Linux):

sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

  3. Reduce concurrent requests:

# Piler handles this automatically with rate limiting
# No config needed

Connection Timeout#

Cause: LLM taking too long to respond

Solution: Piler has built-in timeouts:

  • Email summary: 60 seconds
  • Thread summary: 120 seconds
  • Search translation: 10 seconds

These limits are generous. If you are still hitting them, the LLM backend is too slow for your workload (add a GPU or switch to a smaller model).


Security Considerations#

Network Security#

Ollama has NO authentication!

If running on separate server:

# Option 1: Firewall (recommended)
sudo ufw allow from PILER_IP to any port 11434
sudo ufw deny 11434  # Block others

# Option 2: VPN/Private network
# Run Ollama on private network only

# Option 3: Reverse proxy with auth (advanced)
# Use nginx with basic auth in front of Ollama

Never expose Ollama directly to the internet!
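
If you choose the reverse-proxy option, the sketch below is one way to do it, assuming nginx and apache2-utils are installed; the listen port 11435 and the piler username are arbitrary examples, not Piler requirements:

# Create a credentials file (htpasswd comes from apache2-utils)
sudo htpasswd -c /etc/nginx/.htpasswd piler

# nginx listens on 11435 with basic auth and forwards to Ollama on localhost
sudo tee /etc/nginx/conf.d/ollama-proxy.conf > /dev/null << 'EOF'
server {
    listen 11435;
    location / {
        auth_basic           "Ollama";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass           http://127.0.0.1:11434;
        proxy_read_timeout   300s;   # generations can take a while
    }
}
EOF
sudo nginx -t && sudo systemctl reload nginx

Whether Piler's LLM client can present credentials to such a proxy depends on your version; if it cannot, prefer the firewall or private-network options.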

Data Privacy#

What gets sent to Ollama:

  • Email subject
  • Email body (truncated to 50KB for summaries)
  • Thread messages (up to 50 messages)

What NEVER leaves your server:

  • Nothing! Ollama runs locally
  • No external API calls
  • No telemetry (unless you enable it)

Compliance:

  • ✅ GDPR compliant (data stays on-premise)
  • ✅ HIPAA compatible (no third-party processing)
  • ✅ SOC 2 friendly (complete audit trail)

Production Deployment#

For small deployments (<500 users):

┌────────────────────────────┐
│  Single Server             │
│  • Piler UI                │
│  • Ollama (CPU/small GPU)  │
│  • MySQL                   │
│  • Manticore               │
└────────────────────────────┘

Hardware: 8 cores, 16GB RAM, optional RTX 3060

For medium deployments (500-5K users):

┌─────────────┐      ┌──────────────┐
│ Piler Server│      │  GPU Server  │
│ • UI        │─────→│  • Ollama    │
│ • MySQL     │      │  • RTX 4090  │
│ • Manticore │      └──────────────┘
└─────────────┘

For large deployments (5K+ users):

┌─────────────┐      ┌────────────────────┐
│ Piler Master│      │  GPU Cluster       │
│ • UI        │─────→│  • Ollama (node 1) │
│             │      │  • Ollama (node 2) │
└─────────────┘      └────────────────────┘
       │
       ├──→ Worker 1 (emails)
       └──→ Worker 2 (emails)

Use load balancer for Ollama nodes

High Availability#

Option 1: Multiple Ollama instances

# Run Ollama on 2+ servers
# Use nginx/HAProxy load balancer

# Piler config:
LLM_BASE_URL=http://llm-loadbalancer:11434
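
A minimal nginx sketch for the llm-loadbalancer host (node hostnames and the timeout are placeholders; HAProxy works just as well):

sudo tee /etc/nginx/conf.d/llm-loadbalancer.conf > /dev/null << 'EOF'
upstream ollama_pool {
    server gpu-node-1.internal:11434;
    server gpu-node-2.internal:11434;
}
server {
    listen 11434;
    location / {
        proxy_pass         http://ollama_pool;
        proxy_read_timeout 300s;   # allow long generations
    }
}
EOF
sudo nginx -t && sudo systemctl reload nginx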

Option 2: Failover

# Primary Ollama on GPU server
# Backup Ollama on CPU (slower but works)

# Application handles failover automatically via retries

Cost Analysis#

On-Premise (Ollama)#

One-time costs:

  • RTX 4090 GPU: $1,600-2,000
  • Server: $2,000-4,000 (if needed)
  • Setup: $500-1,000 (engineering time)

Ongoing:

  • Power: ~$30-50/month (300W GPU)
  • Maintenance: Minimal

Total first year: ~$4,000-7,000

Unlimited queries after initial investment.

Cloud APIs (For Comparison)#

OpenAI/Anthropic pricing:

  • ~$0.01-0.05 per summary
  • 1,000 summaries/day = $10-50/day = $3,650-18,250/year
  • 10,000 summaries/day = $36,500-182,500/year

Drawbacks:

  • ❌ Data leaves your infrastructure (compliance risk)
  • ❌ Ongoing per-query costs
  • ❌ Vendor lock-in
  • ❌ Potential downtime (external dependency)

Recommendation: On-premise Ollama for enterprise. Cloud APIs only for trials/demos.


Configuration Reference#

All Available Options#

# Core AI Settings
LLM_ENABLED=true|false
# Enable/disable all AI features
# Default: false

LLM_BASE_URL=http://localhost:11434
# Ollama/LLM server URL
# Default: http://localhost:11434

LLM_MODEL=llama3.1:8b
# Model name (must be pulled in Ollama)
# Default: llama3.1:8b
# Alternatives: mistral:7b, qwen2.5:7b, phi3:mini

LLM_CACHE_EXPIRY_MINUTES=15
# Redis cache TTL for summaries
# Default: 15 minutes
# Threads cached for 60 minutes (hardcoded)

NL2QUERY_PROMPT_FILE=
# Custom prompt template for conversational search
# Empty = use embedded default
# Example: /etc/piler/prompts/custom_nl2query.txt

Multi-Tenant Configuration#

Same LLM for all tenants:

# Global config in .env
LLM_ENABLED=true
LLM_BASE_URL=http://localhost:11434

Per-tenant enable/disable:

-- In tenant_settings table
UPDATE piler.tenant_settings
SET settings_json = JSON_SET(settings_json, '$.llm_enabled', true)
WHERE tenant_id = 'tenant1';
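
To confirm the flag was applied, a quick check with the mysql CLI (the credentials are placeholders for your environment):

mysql -u root -p -e "SELECT tenant_id, JSON_EXTRACT(settings_json, '\$.llm_enabled') AS llm_enabled
  FROM piler.tenant_settings WHERE tenant_id = 'tenant1';"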

Customizing AI Behavior#

Default Conversational Search Prompt#

The conversational search uses this prompt template (embedded in binary):

Location: internal/llm/prompts/nl2query.txt

Full default prompt template:
You are a search query translator for an email archiving system. Convert natural language questions into Piler search syntax.

SEARCH SYNTAX:
- from:email@domain.com - Filter by sender
- to:email@domain.com - Filter by recipient
- subject:keyword - Search in subject (can include multiple words: subject:word1 word2)
- subject:"exact phrase" - Search for exact phrase in subject
- body:keyword - Search in body text (can include multiple words: body:word1 word2)
- body:"exact phrase" - Search for exact phrase in body
- date1:YYYY-MM-DD - Start date (use with date2 for range)
- date2:YYYY-MM-DD - End date (use with date1 for range)
- a:any - Has any attachment
- a:pdf - Has PDF attachments ONLY
- a:word - Has Word document attachments ONLY
- a:excel - Has Excel spreadsheet attachments ONLY
- a:image - Has image attachments ONLY (jpg, png, gif, etc. - use a:image, NOT a:jpg)
- a:zip - Has zip/archive attachments ONLY
- attachment:filename.pdf - Has specific attachment filename
- size:>5M - Size greater than 5MB (use M for megabytes, K for kilobytes)
- direction:inbound - Received emails (also: direction:outbound, direction:internal)
- category:name - Filter by category (short form: cat:name)
- tag:tagname - Search by tag
- note:text - Search in notes
- id:123 - Specific message ID

OPERATORS:
- Multiple terms separated by spaces are implicitly AND (no parentheses needed)
- OR - Either term (use uppercase: urgent OR important)
- NOT - Exclude term (use uppercase: NOT spam)
- "exact phrase" - Phrase search in quotes
- * - Wildcard (john* matches john, johnny, johns)
- ( ) - ONLY use parentheses for grouping OR/NOT operations, NOT for simple AND queries

IMPORTANT NOTES:
- For date ranges, use date1: and date2: together (NOT date:X..Y)
- For "has any attachment" use a:any (NOT has:attachment)
- Valid attachment types: a:pdf, a:word, a:excel, a:image, a:zip, a:any
- For images, ALWAYS use a:image (NOT a:jpg, a:png, etc.)
- For size, use M for megabytes, K for kilobytes (e.g., size:>5M NOT size:>5MB)
- Keep queries SIMPLE - only include meaningful search terms, skip filler words
- Skip generic words like "with", "about", "problems", "issues", "emails", "messages"
- Focus on specific entities: names, subjects, dates, attachment types
- Example: "virtualfax problems with pdf attachments" β†’ "subject:virtualfax a:pdf" (skip "problems with")
- There is NO is:unread filter (this is an archive, not a mailbox)
- Direction must be: inbound, outbound, or internal

RELATIVE DATES (calculate from today: {{CURRENT_DATE}}):
- "today" β†’ date1:{{TODAY}} date2:{{TODAY}}
- "yesterday" β†’ date1:{{YESTERDAY}} date2:{{YESTERDAY}}
- "last week" β†’ date1:{{LAST_WEEK}} (7 days ago)
- "last month" β†’ date1:{{LAST_MONTH}} (30 days ago)
- "last quarter" β†’ date1:{{LAST_QUARTER}} (90 days ago)
- "this month" β†’ date1:{{MONTH_START}} date2:{{TODAY}}
- "this year" β†’ date1:{{YEAR_START}}

EXAMPLES:
Input: "emails from sarah last week"
Output: {"query":"from:sarah date1:{{LAST_WEEK}}","explanation":"Emails from Sarah sent since {{LAST_WEEK}}","confidence":0.95}

Input: "urgent messages about project"
Output: {"query":"subject:urgent subject:project","explanation":"Messages about urgent project","confidence":0.90}

Input: "employment invitation"
Output: {"query":"subject:employment invitation","explanation":"Emails about employment invitation","confidence":0.92}

Input: "virtualfax problems with pdf attachments"
Output: {"query":"subject:virtualfax a:pdf","explanation":"Virtualfax emails with PDF attachments","confidence":0.93}

Input: "large PDF attachments from vendors"
Output: {"query":"from:*vendor* size:>5M a:pdf","explanation":"PDF attachments larger than 5MB from vendor domains","confidence":0.88}

Input: "images from operator"
Output: {"query":"a:image from:operator","explanation":"Image attachments from operator","confidence":0.95}

Input: "invoices in December"
Output: {"query":"subject:invoice date1:2024-12-01 date2:2024-12-31","explanation":"Emails with 'invoice' in subject from December 2024","confidence":0.92}

Input: "emails with any attachments from john"
Output: {"query":"from:john a:any","explanation":"Emails from John that have attachments","confidence":0.95}

Input: "emails about budget OR finance from sarah"
Output: {"query":"from:sarah (subject:budget OR subject:finance)","explanation":"Emails from Sarah about budget or finance","confidence":0.93}

Input: "attachments sent in the last 5 years"
Output: {"query":"date1:2020-01-20 a:any","explanation":"Emails with attachments since January 2020","confidence":0.94}

CONTEXT-AWARE EXAMPLES (showing how to handle follow-ups):

Context: Previous query was "virtualfax problems with pdf attachments" β†’ "subject:virtualfax a:pdf"
Input: "after 2015-10-21"
Output: {"query":"subject:virtualfax a:pdf date1:2015-10-21","explanation":"Virtualfax emails with PDF attachments after October 21, 2015","confidence":0.96}

Context: Previous query was "emails from john" β†’ "from:john"
Input: "just PDFs"
Output: {"query":"from:john a:pdf","explanation":"PDF emails from John","confidence":0.95}

Context: Previous query was "invoices last month" β†’ "subject:invoice date1:2024-12-20"
Input: "over $10,000"
Output: {"query":"subject:invoice date1:2024-12-20 body:$10,000 OR body:10000","explanation":"Invoices from last month over $10,000","confidence":0.88}

{{CONVERSATION_CONTEXT}}

IMPORTANT RULES:
1. MUST return valid JSON with exactly these fields: query, explanation, confidence
2. If there is CONTEXT from previous queries, you MUST include those search terms in the new query
3. For follow-up/refinement queries (like "just PDFs" or "after 2020"), ALWAYS combine with previous context
4. Use confidence < 0.7 if the question is ambiguous
5. For ambiguous queries, explain what clarification is needed in the explanation field
6. Calculate today's date from context (assume current date: {{CURRENT_DATE}})
7. NEVER change dates provided by the user - if user says "2015" use 2015, NOT 2025 (this is an archive with old emails)
8. Use EXACT dates from user input - do not assume typos or correct dates
9. Always escape special characters in email addresses
10. Use wildcards (*) for partial matches when appropriate
11. Prefer (subject:X OR body:X) over just subject:X unless explicitly about subject only

RESPONSE FORMAT (copy this structure exactly):
{
  "query": "your translated search query here",
  "explanation": "brief explanation of what you're searching for",
  "confidence": 0.95
}

Respond with ONLY the JSON object above. No markdown, no code blocks, no extra text.

Customizing the Prompt#

Why customize:

  • Add industry-specific examples (legal, healthcare, finance)
  • Improve accuracy for your organization's terminology
  • Adjust for different LLM models

How to customize:

  1. Create a custom prompt file:

# Copy embedded template (extract from binary or docs)
cat > /etc/piler/custom_nl2query.txt << 'EOF'
[Paste default template above]

# Add your custom examples:
Input: "discovery documents from opposing counsel"
Output: {"query":"from:*@opposing-firm.com category:discovery","explanation":"...","confidence":0.94}

Input: "patient records with consent forms"
Output: {"query":"subject:patient subject:consent a:pdf","explanation":"...","confidence":0.93}
EOF

  2. Point Piler to the custom prompt:

# In .env
NL2QUERY_PROMPT_FILE=/etc/piler/custom_nl2query.txt

  3. Restart Piler:

sudo systemctl restart piler-ui

Template variables available:

  • {{CURRENT_DATE}}: today's date (auto-calculated)
  • {{LAST_WEEK}}, {{LAST_MONTH}}, {{LAST_QUARTER}}: relative dates
  • {{CONVERSATION_CONTEXT}}: previous queries (auto-injected)

Testing your prompt:

Try queries in the UI and check if translations match expectations. Iterate based on real usage patterns.
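
One way to sanity-check a prompt edit outside Piler is to substitute the date placeholder yourself and send a sample question straight to Ollama (a rough sketch; Piler normally fills in all placeholders and the conversation context for you):

# Fill in {{CURRENT_DATE}} and append a test question
TODAY=$(date +%F)
sed "s/{{CURRENT_DATE}}/$TODAY/g" /etc/piler/custom_nl2query.txt > /tmp/prompt.txt
printf '\nInput: "invoices from acme last month"\nOutput:' >> /tmp/prompt.txt

# Ask the model to translate it and print only the generated JSON
curl -s http://localhost:11434/api/generate \
  -d "$(jq -n --rawfile p /tmp/prompt.txt '{model: "llama3.1:8b", prompt: $p, stream: false}')" \
  | jq -r '.response'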


Monitoring#

Check AI Feature Health#

1. Test LLM connectivity:

curl http://localhost:3000/api/v1/llm/ping \
  -H "Cookie: session=YOUR_SESSION"

# Should return:
{"status":"ok","model":"llama3.1:8b"}

2. Monitor Ollama:

# Check running models
curl http://localhost:11434/api/tags

# Monitor logs
journalctl -u ollama -f

# Check resource usage
htop  # Watch CPU/RAM
nvidia-smi -l 1  # Watch GPU (if NVIDIA)

3. Check Piler logs:

tail -f /var/log/piler/app.log | grep -i llm

# Look for:
# "LLM summarization failed" - problems
# "Cache hit for email summary" - working well

Performance Metrics#

Target metrics:

  • Email summary: <5 seconds (P95)
  • Thread summary: <30 seconds for 20-message thread (P95)
  • Search translation: <3 seconds (P95)
  • LLM uptime: >99%

If slower:

  • Add GPU
  • Use smaller model
  • Check network latency (if separate servers)
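
For a quick spot check against these targets, time a single generation directly against Ollama (use the model you configured; note the first request after idle also pays model load time):

time curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Summarize in one sentence: the quarterly budget was approved by finance.",
  "stream": false
}' > /dev/null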

Upgrading#

Update Ollama#

# Linux/macOS
curl -fsSL https://ollama.com/install.sh | sh

# Docker
docker pull ollama/ollama:latest
docker-compose up -d

Update Model#

# Pull newer version
ollama pull llama3.1:8b

# Old version removed automatically if space needed

Piler Updates#

AI features are part of Piler UI - update Piler as normal:

# Binary update
sudo systemctl stop piler-ui
sudo cp new-piler-ui /var/piler/ui/app
sudo systemctl start piler-ui

# Docker update
# Update the piler-ui image tag in docker-compose.yml (e.g. sutoj/piler-ui:2.1.0), then:
docker-compose pull piler-ui
docker-compose up -d

FAQ#

Q: Do I need a GPU?#

A: Highly recommended but not required.

  • Without GPU: 10-30 seconds per summary (acceptable for light use)
  • With GPU: 2-5 seconds per summary (production quality)

Q: Can I use GPT-4/Claude instead of Ollama?#

A: Technically yes, but not recommended:

  • ❌ Data leaves your infrastructure
  • ❌ Ongoing costs
  • ❌ Requires code changes (different API format)
  • ❌ Compliance risks

Ollama is designed for on-premise use.

Q: How much disk space for models?#

A: ~5-10GB per model

  • llama3.1:8b: 4.7GB
  • Keep 2-3 models for testing: ~15GB
  • Models stored in ~/.ollama/models/

Q: Can multiple Piler instances share one Ollama?#

A: Yes! Ollama handles concurrent requests.

  • Single RTX 4090: ~10-20 concurrent requests
  • Add more GPUs for higher concurrency

Q: What if Ollama crashes?#

A: AI features gracefully degrade:

  1. User gets "LLM service unavailable" error
  2. Can still use traditional search
  3. Cached summaries still served (if available)
  4. No impact on email viewing/searching

Ollama auto-restarts via systemd.

Q: Can I customize AI prompts?#

A: Yes! See the Customizing AI Behavior section above.

Q: Does this work offline/air-gapped?#

A: Yes!

  1. Download Ollama installer on internet-connected machine
  2. Pull model: ollama pull llama3.1:8b
  3. Copy model files to the air-gapped server (see the sketch below)
  4. Works completely offline
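
A rough sketch of step 3, assuming the default model path mentioned above (~/.ollama/models); note that the Linux systemd install may keep models under the ollama service user's home instead:

# On the internet-connected machine
tar czf ollama-models.tar.gz -C ~/.ollama models

# Transfer via approved media, then on the air-gapped server:
sudo systemctl stop ollama
tar xzf ollama-models.tar.gz -C ~/.ollama
sudo systemctl start ollama
ollama list   # the model should appear without any network access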

Support#

Getting Help#

  1. Check logs: journalctl -u piler-ui and journalctl -u ollama
  2. Test Ollama directly: curl http://localhost:11434/api/tags
  3. Verify config: grep LLM .env
  4. Contact support: support@mailpiler.com

Reporting Issues#

Include:

  • Piler version: ./piler-ui --version
  • Ollama version: ollama --version
  • Model: ollama list
  • Error logs (last 50 lines)
  • Hardware: CPU, RAM, GPU (if any)

Next Steps#

  1. ✅ Install Ollama
  2. ✅ Pull llama3.1:8b model
  3. ✅ Configure Piler (LLM_ENABLED=true)
  4. ✅ Restart Piler
  5. ✅ Test AI Summary feature
  6. ✅ Train users on AI features


Last Update: November 22, 2025

Piler Version: 2.1.0+

Status: Production Ready