Procedural documentation of piler enterprise edition#

This documentation applies to Piler enterprise edition 1.7.2

Revision #2

Publication date: Apr 22, 2023

What’s piler enterprise?#

Piler enterprise is an email archiving product. This document also applies to the open source edition.

Receiving emails#

Piler receives emails via the smtp protocol. The piler-smtp daemon listens on port 25 by default (it’s a configurable value) and accepts all emails from any remote smtp client that is able to reach port 25.

You may use the smtp acl feature or appropriate firewall rules to ensure no malicious host can send emails to the archive.

Note that even though the piler-smtp daemon accepts all connections on its own, it’s not an open relay. It only receives emails, but doesn’t send any.

The piler-smtp daemon logs details of each smtp transaction to syslog, eg. the remote host IP address, date and time, envelope sender, message size and a unique ID assigned to the smtp transaction.

Some typical syslog entries for an smtp transaction (the open source edition doesn’t have the customer label):

Apr 18 07:45:08 myarchive piler-smtp[481]: connected from 172.21.0.8:40910 on fd=7 (active connections: 2)
Apr 18 07:45:08 myarchive piler-smtp[481]: received: 06L4YG5CJVVYXVK0, customer=aaa, from=sender@aaa.fu, size=29083, client=172.21.0.8, fd=7
Apr 18 07:45:08 myarchive piler-smtp[481]: disconnected from 172.21.0.8 on fd=7, reason=finished (3 active connections)
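For log analysis, the "received:" entry can be split into its fields. The following is a minimal sketch, not part of piler: the pattern is inferred only from the sample entry above, and the names RECEIVED_RE and parse_received are made up for this illustration. The customer field is treated as optional because the open source edition doesn't log it.

```python
import re

# Pattern inferred from the sample "received:" entry; other piler versions
# or verbosity levels may log a different layout.
RECEIVED_RE = re.compile(
    r"piler-smtp\[(?P<pid>\d+)\]: received: (?P<id>\w+), "
    r"(?:customer=(?P<customer>[^,]*), )?"
    r"from=(?P<sender>[^,]*), size=(?P<size>\d+), "
    r"client=(?P<client>[^,]+), fd=(?P<fd>\d+)"
)

def parse_received(line):
    """Return a dict of fields for a 'received:' entry, or None if no match."""
    match = RECEIVED_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["size"] = int(fields["size"])
    return fields

sample = (
    "Apr 18 07:45:08 myarchive piler-smtp[481]: received: 06L4YG5CJVVYXVK0, "
    "customer=aaa, from=sender@aaa.fu, size=29083, client=172.21.0.8, fd=7"
)
record = parse_received(sample)
```
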

Note that more verbose logging is possible by increasing the verbosity value in piler.conf.

During the smtp transaction the piler-smtp daemon writes the email to a temporary file in the /var/piler/tmp directory. The file name is the same as the internal ID (06L4YG5CJVVYXVK0 in the above example). Both the directory and the file permissions allow only the user piler to access this file. (Note that the root user, by definition, has access to the whole system.)

After safely writing the email to the disk the piler-smtp daemon sends back the internal ID to the smtp client with the 250 OK reply, eg. 250 OK <06L4YG5CJVVYXVK0>.
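A sending system that wants to record the archive's internal ID can extract it from the final reply. This is a small sketch under the assumption that the reply always has the shape shown above and that IDs consist of digits and uppercase letters, as in the sample; the function name is hypothetical.

```python
import re

# Assumed reply shape: "250 OK <ID>", based only on the example above.
REPLY_RE = re.compile(r"^250 OK <([0-9A-Z]+)>\s*$")

def archive_id_from_reply(reply):
    """Return the internal ID from a '250 OK <ID>' reply, or None."""
    match = REPLY_RE.match(reply)
    return match.group(1) if match else None
```

Any other reply, including error codes, yields None, so the caller can tell an accepted message from a rejected one.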

If there’s an error during the smtp transaction, then the piler-smtp daemon returns an appropriate error code to the smtp client and syslogs the issue, thus making sure no email is lost.

The piler-smtp daemon supports the STARTTLS SMTP extension to encrypt emails while they are transmitted over the network.

Processing emails#

The piler-smtp daemon drops new emails inside the piler tmp folder (/var/piler/tmp). The piler daemon reads this directory for new emails, and processes them as soon as possible. After processing a new email, its temporary file is removed from disk.

Parsing the email#

Piler parses the email, and extracts its metadata, eg. sender, recipients, date, size, message id, etc. as well as the attachment metadata (eg. filename, size, etc). All metadata is written to the piler mysql database.

Single instance copy#

Piler stores an email in a single copy only, even if it’s sent to the archive several times. According to RFC 2822, “every message SHOULD have a "Message-ID:" field”. Piler relies on the Message-ID header: because its value is meant to be unique for every sent email, piler uses it to determine whether the message is a duplicate, ie. whether it’s already stored in the archive. If that’s the case, then piler discards the duplicated email and syslogs that it’s a duplicate.
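The idea of Message-ID based duplicate detection can be sketched as follows. This is an illustration only, not piler's actual code: the in-memory set stands in for the lookup against the archive's metadata, and the handling of a missing Message-ID header is an assumption made for this sketch.

```python
from email import message_from_bytes

def is_duplicate(raw_message, seen_ids):
    """Return True if the message's Message-ID was already seen.

    seen_ids is a set standing in for the archive's metadata lookup.
    """
    msg = message_from_bytes(raw_message)
    message_id = (msg.get("Message-ID") or "").strip()
    if not message_id:
        # Assumption for this sketch: without a Message-ID the message
        # cannot be compared, so it is not treated as a duplicate.
        return False
    if message_id in seen_ids:
        return True
    seen_ids.add(message_id)
    return False

seen = set()
raw = b"Message-ID: <abc@aaa.fu>\r\nSubject: hello\r\n\r\nbody\r\n"
first = is_duplicate(raw, seen)    # first delivery
second = is_duplicate(raw, seen)   # same message sent to the archive again
```
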

Piler also deduplicates the attachments. To do so it extracts the attachment from the email body, and stores it separately. The email is reassembled transparently before presenting it to the user. Eg. if the employees of a company send the company logo in their signatures, then the company logo is stored only once in the archive.
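A common way to store identical attachments only once is to key them by a content digest. The sketch below shows that general technique; whether piler identifies attachments exactly this way is an assumption, and the AttachmentStore class is hypothetical.

```python
import hashlib

class AttachmentStore:
    """Toy content-addressed store: identical attachments are kept once."""

    def __init__(self):
        self.blobs = {}     # digest -> attachment bytes
        self.refcount = {}  # digest -> number of emails referencing it

    def put(self, data):
        """Store data if it is new and return its digest as a reference."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

store = AttachmentStore()
logo = b"\x89PNG...company logo bytes..."
ref_a = store.put(logo)  # logo in the first employee's email
ref_b = store.put(logo)  # the same logo in a second email
```

The reference count reflects how many emails share the attachment, so a stored attachment can be kept as long as at least one email still refers to it.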

Storing emails and attachments#

To save even more disk space, piler compresses each stored file (both emails and attachments) using zlib. After compression, the files are also encrypted using the AES-256 algorithm. The encryption key is accessible only by user piler. Without the encryption key even a privileged user is not able to retrieve the contents of the emails.

Note that piler can store emails and attachments in an S3 compatible object store. When doing so only the encrypted data is sent to the S3 object store.
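The compression step alone can be illustrated with Python's zlib module. This sketch deliberately leaves out the AES-256 encryption step, because the Python standard library has no AES implementation; the function names and the sample data are made up for the example.

```python
import zlib

def pack(data):
    """Compress a message or attachment with zlib (encryption not shown)."""
    return zlib.compress(data)

def unpack(blob):
    """Reverse of pack(): restore the original bytes."""
    return zlib.decompress(blob)

original = b"Subject: report\r\n\r\n" + b"quarterly figures " * 200
stored = pack(original)
restored = unpack(stored)
```

Decompression returns the original bytes unchanged, which is consistent with the statement that no format or encoding transformation is applied to the stored data.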

Other than the compression and encryption no format transformation is applied, eg. a Word document is not converted to a text or PDF file. Attachments are stored in their original format. Also there’s no encoding transformation, eg. a base64 encoded attachment is stored as base64 encoded.

After storing the email and its attachments any temporary files are removed. In case of any internal error the temporary file piler-smtp created is saved to the error directory (/var/piler/error) for later inspection. The GUI displays the number of error emails for administrators.

Piler also syslogs the result of its action. The status can be ‘stored’, ‘discarded’, ‘duplicate’ or ‘error’. The log entry for a successfully stored message:

Apr 13 18:11:36 a455f3977b50 piler[836]: 1/aaa-ABFPBO56SDC73TB6: 400000005e9ab00e034d67d400832338dd28, size=29080/12352, attachments=2, reference=, message-id=<6YLOQLOLF7R07WVJE1FS64QTW1U7FF5OMM0Y67XE@myhost.aaa.fu>, retention=2557, delay=0.04, delays=0.01/0.01/0.00/0.00/0.02/0.00, status=stored
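The key=value layout of this entry is easy to process with a script. The sketch below is not a piler tool; the function name is hypothetical, and reading size=A/B as received size and stored size is an interpretation of the sample, not something stated in this document.

```python
import re

def parse_store_entry(line):
    """Split the key=value pairs of a piler store log entry into a dict."""
    fields = dict(re.findall(r"([\w-]+)=([^,]*)", line))
    # Assumption: size=A/B is read here as received size / stored size.
    received, stored = fields["size"].split("/")
    fields["received_size"] = int(received)
    fields["stored_size"] = int(stored)
    fields["retention"] = int(fields["retention"])
    return fields

sample = (
    "Apr 13 18:11:36 a455f3977b50 piler[836]: 1/aaa-ABFPBO56SDC73TB6: "
    "400000005e9ab00e034d67d400832338dd28, size=29080/12352, attachments=2, "
    "reference=, message-id=<6YLOQLOLF7R07WVJE1FS64QTW1U7FF5OMM0Y67XE@myhost.aaa.fu>, "
    "retention=2557, delay=0.04, delays=0.01/0.01/0.00/0.00/0.02/0.00, status=stored"
)
entry = parse_store_entry(sample)
```
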

Indexing emails#

Piler relies on third-party software (manticore or sphinx search) to provide a searchable archive. The parser extracts the textual information of the email including the attachments, and some information from the email headers, eg. subject, sender, recipients, date and the message id. The textual data to be indexed are written to a mysql table (sph_index).

The sphinx or manticore indexer is called periodically to read the sph_index table and update the index files. After processing the sph_index table the indexer removes all processed rows from the table. The cleartext data stays in the sph_index table for up to 30 minutes.

The index data directories have 0700 permissions to make sure only the piler user can access the index data. Note that the data in the index database are not encrypted.

Starting from version 1.7.2, piler supports manticore realtime (rt) indexes as well. In that case the index data is written to the manticore daemon directly.

The installer script (starting from version 1.7.2) by default enables encryption at rest for the mysql data.

Rules#

Exclusion rules (formerly archiving rules)#

Administrators may set up rules to discard specific emails based on subject, size, sender, recipient, etc. After parsing the email, piler iterates over the exclusion rules, and checks if the given message matches any of them. If so, then piler discards the email, and syslogs the event as well as the matching policy rule, eg.

Apr 18 07:43:41 myarchive piler[836]: 1/aaa-S22SZU3URP71GW9T: discarding: archiving policy: *customer=fictive,domain=,from=newsletters@aaa.fu,to=,subject=,body=,size0,att.name=,att.type=,att.size0,spam=-1,days=0*
Apr 18 07:43:41 myarchive piler[836]: 1/aaa-S22SZU3URP71GW9T: 400000005e9aafb70dd6464400fe5a46aa1d, size=110529/0, attachments=1, reference=, message-id=<20151120041635.1D16E68BB67DD947@aaa.fu>, retention=0, delay=0.00, delays=0.00/0.00/0.00/0.00/0.00/0.00, status=discarded
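The "iterate over the rules and discard on a match" logic can be sketched as below. This is a simplified illustration, not piler's rule engine: the ExclusionRule class, its three fields and the regex matching semantics are assumptions, loosely modelled on some of the fields visible in the logged policy rule.

```python
import re
from dataclasses import dataclass

@dataclass
class ExclusionRule:
    sender: str = ""   # regex for the sender; empty means "any sender"
    subject: str = ""  # regex for the subject; empty means "any subject"
    min_size: int = 0  # match only messages larger than this; 0 means "any size"

    def matches(self, sender, subject, size):
        """Every non-empty condition of the rule must hold for a match."""
        if self.sender and not re.search(self.sender, sender, re.IGNORECASE):
            return False
        if self.subject and not re.search(self.subject, subject, re.IGNORECASE):
            return False
        if self.min_size and size <= self.min_size:
            return False
        return True

def first_matching_rule(rules, sender, subject, size):
    """Return the first rule that matches the message, or None to archive it."""
    for rule in rules:
        if rule.matches(sender, subject, size):
            return rule
    return None

rules = [ExclusionRule(sender=r"^newsletters@aaa\.fu$")]
hit = first_matching_rule(rules, "newsletters@aaa.fu", "Weekly offers", 110529)
miss = first_matching_rule(rules, "sender@aaa.fu", "Invoice", 29083)
```

Note that in this sketch a rule with no conditions set matches every message, so an empty rule would discard everything.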

Retention rules#

Piler assigns a retention value to each archived message when the message is being stored. The retention timestamp is stored in the metadata table. The default value is set in the default_retention_days parameter in piler.conf. However this value can be overridden by using the retention rules.

It’s important to know that changing the retention value, either in the piler config file or in the retention rules, doesn’t affect the retention values of already stored messages. The new retention values apply to new messages only.

The calculated retention value (in days) is also syslogged.
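The relationship between the default value, an overriding rule and the resulting retention timestamp can be sketched like this. It is an illustration under assumptions: rules are reduced to (sender domain, days) pairs with first-match semantics, the function names are hypothetical, and 2557 is simply the value seen in the sample log entry earlier in this document.

```python
from datetime import datetime, timedelta, timezone

# Stands in for default_retention_days in piler.conf; 2557 is the value
# that appears in the sample log entry, used here only as an example.
DEFAULT_RETENTION_DAYS = 2557

def retention_days(sender_domain, rules, default=DEFAULT_RETENTION_DAYS):
    """Return the days from the first rule matching the domain, else the default."""
    for domain, days in rules:
        if domain == sender_domain:
            return days
    return default

def retained_until(arrived, days):
    """Compute the retention timestamp from the arrival time."""
    return arrived + timedelta(days=days)

rules = [("aaa.fu", 365)]
arrived = datetime(2023, 4, 18, 7, 45, 8, tzinfo=timezone.utc)
expires = retained_until(arrived, retention_days("aaa.fu", rules))
```

Because the timestamp is computed once, when the message is stored, later changes to the default or to the rules leave already stored messages untouched.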

Administrators should follow both company and industry standards when setting the retention policy for the archive.

Data integrity#

Besides the AES-256 encryption the file and directory permissions ensure that only the user piler can access the stored emails and attachments.

Also, while storing the emails and files, a SHA-256 hash value is computed and stored for the given message and its attachments. When the email is retrieved from the archive, its SHA-256 hash value is computed again and compared against the hash value stored in the mysql database. If they don’t match, then the end user is notified that the retrieved message is not the same as the one that was stored.
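The store-time and retrieval-time hash comparison can be sketched with Python's hashlib. The function names are made up for this example, and the sketch covers only the comparison itself, not how piler stores the digest in the database.

```python
import hashlib
import hmac

def digest(data):
    """SHA-256 digest of a message, as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_intact(data, stored_digest):
    """Recompute the digest at retrieval time and compare it to the stored one."""
    return hmac.compare_digest(digest(data), stored_digest)

message = b"Subject: contract\r\n\r\nsigned copy attached\r\n"
stored_digest = digest(message)          # computed when the email is archived
ok = is_intact(message, stored_digest)   # unchanged message
tampered = is_intact(message + b"x", stored_digest)  # modified message
```
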

Piler stores various timestamps in the metadata table, eg. when the message was sent, when it arrived and was archived, as well as the retention time.

Note that piler enterprise doesn’t allow any user including the administrators to alter any archived message.

Purging emails#

Piler retains messages until their retention value expires. Piler runs a daily task to get rid of aged or otherwise unwanted emails (see the GDPR related notes below).

The purge tool queries the metadata table to find out what emails to remove from the archive based on their retention timestamp. Then it physically removes them from the system, and updates the ‘deleted’ column in the metadata. The ‘deleted’ column instructs sphinx search to exclude the deleted emails from any search results. When a realtime index is used, the email is deleted from manticore's own database.

Note that if the attachment in the email to be removed is present in other emails, too, then the given attachment is not removed to ensure the integrity of the remaining emails.

Also there might be cases when you want to preserve all of someone’s emails even if the retention of some of them has expired. To do so administrators may add those email addresses to the legal hold table on the GUI, and the purging tool will exclude such emails whether or not they have expired.
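The selection logic described above, expired retention minus anything under legal hold, can be sketched as follows. The record layout (dicts with id, retained_until and addresses) and the function name are assumptions for this illustration and do not reflect piler's database schema.

```python
def select_for_purge(messages, legal_hold, now):
    """Return IDs of messages whose retention expired and that are not on hold.

    messages: iterable of dicts with 'id', 'retained_until' (epoch seconds)
              and 'addresses' (sender and recipient addresses).
    legal_hold: iterable of email addresses whose mail must be preserved.
    """
    hold = {address.lower() for address in legal_hold}
    purge = []
    for message in messages:
        if message["retained_until"] > now:
            continue  # retention has not expired yet
        if any(address.lower() in hold for address in message["addresses"]):
            continue  # an involved address is under legal hold
        purge.append(message["id"])
    return purge

messages = [
    {"id": "A", "retained_until": 100, "addresses": ["jim@fictive.com"]},
    {"id": "B", "retained_until": 100, "addresses": ["ann@fictive.com"]},
    {"id": "C", "retained_until": 900, "addresses": ["ann@fictive.com"]},
]
to_purge = select_for_purge(messages, legal_hold=["jim@fictive.com"], now=500)
```
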

Authentication#

The GUI allows access to emails only after authentication. The archive administrator may set up several authentication methods, eg. local database, LDAP, Active Directory, Azure AD, Single Sign-On (SSO), etc. Two factor authentication is also supported to enhance security. In that case the recovery codes are stored in the piler mysql database.

All login attempts are syslogged. Below is a successful login against an LDAP database:

Apr 18 10:31:50 myarchive piler-webui[432]: ldap query: base dn='ou=usersF,dc=nodomain', filter='(&(objectClass=inetOrgPerson)(mail=jim@fictive.com))', attr='', 1 hits
Apr 18 10:31:50 myarchive piler-webui[432]: ldap auth against 'ldap.fictive.com', dn: 'cn=Jim Jones,ou=usersF,dc=nodomain', result: 1
Apr 18 10:31:50 myarchive piler-webui[432]: ldap query: base dn='ou=usersF,dc=nodomain', filter='(|(&(objectClass=inetOrgPerson)(mail=jim@fictive.com))(&(objectClass=posixGroup)(memberuid=jim@fictive.com))(&(objectClass=posixGroup)(memberuid=cn=Jim Jones,ou=usersF,dc=nodomain)))', attr='', 2 hits
Apr 18 10:31:50 myarchive piler-webui[432]: ldap auth result against ldap.fictive.com / generic_ldap: 1
Apr 18 10:31:50 myarchive piler-webui[432]: username=jim@fictive.com, customer=fictive, event='logged in', ip=172.20.0.1

This is a failed login attempt:

Apr 18 10:38:07 myarchive piler-webui[431]: ldap query: base dn='ou=usersF,dc=nodomain', filter='(&(objectClass=inetOrgPerson)(mail=jim@fictive.com))', attr='', 1 hits
Apr 18 10:38:07 myarchive piler-webui[431]: ldap auth against 'ldap.fictive.com', dn: 'cn=Jim Jones,ou=usersF,dc=nodomain', result: 0
Apr 18 10:38:07 myarchive piler-webui[431]: ldap auth result against ldap.fictive.com / generic_ldap: 0
Apr 18 10:38:09 myarchive piler-webui[431]: username=jim@fictive.com, customer=fictive, event='login failed', ip=172.20.0.1
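Since every login attempt is syslogged, failed attempts can be counted per client IP address, eg. to feed a monitoring system. This is a minimal sketch, not a piler feature: the pattern is inferred from the two sample event entries above, and the customer field is treated as optional.

```python
import re
from collections import Counter

# Pattern inferred from the sample "event=" entries of piler-webui.
EVENT_RE = re.compile(
    r"piler-webui\[\d+\]: username=(?P<username>[^,]+), "
    r"(?:customer=(?P<customer>[^,]*), )?"
    r"event='(?P<event>[^']*)', ip=(?P<ip>\S+)"
)

def failed_logins_by_ip(lines):
    """Count 'login failed' events per client IP address."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match and match.group("event") == "login failed":
            counts[match.group("ip")] += 1
    return counts

log = [
    "Apr 18 10:31:50 myarchive piler-webui[432]: username=jim@fictive.com, "
    "customer=fictive, event='logged in', ip=172.20.0.1",
    "Apr 18 10:38:09 myarchive piler-webui[431]: username=jim@fictive.com, "
    "customer=fictive, event='login failed', ip=172.20.0.1",
]
failures = failed_logins_by_ip(log)
```
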

The GUI supports CAPTCHA to slow down a brute force login attack. Also single sign-on (SSO) is supported for passwordless access.

Accessing the emails#

Piler provides a web based GUI to users accessing their emails. The search engine provides the search results in seconds, provided it’s allocated enough resources to serve requests.

To ensure maximum security administrators should set up TLS for the virtual host serving the archive. They may also apply other restrictions to the GUI, eg. limit the access to a range of IP addresses only.

The GUI has built-in access control to prevent a regular user from accessing others' messages. Auditors can see every archived email within the same organization. If no auditor user is needed, then remove it. From version 1.4.9 no default auditor user is created when the archive is installed.

Note that administrators may create groups in the piler GUI. The groups can be used to grant access to emails belonging to other email addresses. A typical usage of groups is to provide access to emails sent to a mailing list or distribution list address.

Users may perform arbitrary search queries, however a filter is applied to the search query automatically by the GUI to limit access to their own emails only. This filter is compiled from the given user’s email addresses. All search queries are logged to syslog, eg.

Apr 18 10:51:42 myarchive piler-webui[433]: sphinx query: 'SELECT id FROM fictive_main1,fictive_dailydelta1,fictive_delta1 WHERE MATCH(' (@from jimXfictiveXcom | @to jimXfictiveXcom) ') ORDER BY `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.00 s, 15 hits, 15 total found
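How such a filter might be compiled from a user's email addresses is sketched below. Everything here is inferred from the single logged query above: the replacement of '@' and '.' with 'X', and the way several addresses are combined, are assumptions, and the function names are hypothetical.

```python
import re

def index_token(address):
    """Rewrite an address the way it appears in the logged query.

    Inferred from the sample only: '@' and '.' show up as 'X'.
    """
    return re.sub(r"[@.]", "X", address)

def own_mail_filter(addresses):
    """Build a MATCH() filter limiting results to the given addresses.

    Assumption: one parenthesised group per address, joined with ' | '.
    """
    groups = []
    for address in addresses:
        token = index_token(address)
        groups.append(f"(@from {token} | @to {token})")
    return " | ".join(groups)

single = own_mail_filter(["jim@fictive.com"])
```

For the user in the sample log entry this yields the same expression that appears inside MATCH(...).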

Additional notes on the GUI#

A user may assign tags and notes to their own emails. This metadata is stored in the mysql database and indexed, so it can be searched. Each user can search and see their own tags and notes only.

Users can save search queries. The saved queries are stored in the mysql database, and in memcached if memcached support is enabled.

The administrator account (eg. admin@local) is used only to administer piler. It's not an all-powerful account that can see anyone's emails, which is why an admin user can't see the search menu at all.

The administrator role may see system statistics, accounting summary, edit user / group settings, policies, see audit logs, etc.

Users are able to restore their own emails to their own mailboxes. To do so the GUI restores the selected emails by sending them to the smarthost. Currently this traffic is not encrypted. However, administrators may deploy a local smtp relay on the archive host and use 127.0.0.1 as the smarthost, so the restored email won’t leave the archive host in cleartext, provided that the local relay is configured to send using encryption.

Users may download their emails in a zip or eml file. These files are created in the /var/piler/www/tmp directory during the process, and removed immediately after being sent to the user for download. These zip and eml files are not encrypted.

Auditing#

The GUI keeps track of what users do and when they do it. Every user action produces an audit record that the GUI stores in the audit mysql table, creating an audit trail of every user activity, eg. searching for some messages, viewing a message, downloading another one, etc. The GUI also records when a user logs in or logs out.

The following information is logged:

Administrators and auditors are able to search within the audit logs, and even export the audit trail as a CSV file.

Auditing is enabled by default, however administrators may turn it off if it’s not necessary.

Piler can be configured by setting the ENABLE_DELETE flag in the config-site.php file to allow auditors to remove certain emails from the archive, eg. when a user receives an email with sensitive personal data.

If the Data Officer feature is also enabled, then auditors may only mark messages for removal, and the data officer must either accept the removal request and delete the given message, or reject the removal request.

An auditor must provide a short explanation of why the given message should be removed. Likewise, the data officer must provide a reason when rejecting a removal request. Both actions are logged to the mysql database table called “deleted”. Note that the data officer has permission to see any email marked for removal before removing it, to ensure it’s a valid request.

If the message is removed in the GUI, then it is grayed out to indicate the removal. However, the message is still in the archive. Only the purge utility removes the stored message and its attachments physically from the disk. See the notes for purging emails above for more.

Disaster recovery#

It’s the archive administrators' responsibility to perform regular backups of the archived data in a safe manner following the regulations of both the company and the relevant industry.

The legal basis for mail archiving varies from country to country and from industry to industry. Please consult your attorney if you are subject to mandatory email archiving.

You may find below some laws, acts and regulations that require an audit-proof email archive.