Solr for search in dovecot
I’ve been searching through various issues on GitHub and on this forum, and discovered that Lucene-based FTS (full-text search) was dropped some time ago because there was no maintained package in the Ubuntu repository. So I decided to set up Solr-backed FTS for my MIAB instance. It was actually very easy to add without having to modify much, so this is a very basic guide for anyone interested.
Why do I need FTS?
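Without FTS, searching mail is extremely slow, since Dovecot has no index of message bodies and has to read through the messages themselves on each search, and the results are very poor. Dovecot can interact with various FTS backends; the currently preferred one is Apache Solr.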
How do I install?
You will need Docker installed for this guide. If you’re on Ubuntu 18.04 then the following guide will work: https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-18-04
Once Docker is installed you can work through the following steps.
Install the dovecot-solr package so that dovecot and solr will be able to interact:
sudo apt-get install dovecot-solr
Pull the version of solr that we need:
sudo docker pull solr:7
Create the directory that we’ll use for our solr config and that will be mounted into the Docker container:
sudo mkdir -p /srv/solr
The directory needs to be owned and writable by the user Solr runs as inside the Docker container (UID and GID 8983), so we need to change ownership:
sudo chown 8983:8983 /srv/solr
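If you want to confirm the ownership change took effect before carrying on, something like the sketch below should do it. The function names `owner_of` and `owned_by` are my own helpers, not part of any tool, and `stat -c` assumes GNU coreutils as shipped with Ubuntu.

```shell
# Print "uid:gid" for a path (GNU stat, as found on Ubuntu).
owner_of() {
    stat -c '%u:%g' "$1"
}

# Succeed only if the path is owned by the expected uid:gid.
# In this guide the expected value is 8983:8983.
owned_by() {
    [ "$(owner_of "$1")" = "$2" ]
}

# Example (run on the host):
#   owned_by /srv/solr 8983:8983 && echo "ownership looks right"
```

The same check can be reused later for the downloaded XML files.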
The next command will open a new shell session inside the solr docker container:
sudo docker run --rm -it -v /srv/solr:/var/solr -e SOLR_HOME=/var/solr -e INIT_SOLR_HOME=yes solr:7 bash
Now you will be inside the container. You need to run these commands:
init-solr-home
precreate-core dovecot
exit
After exiting you will be back in your host’s shell session.
Now we’ll prepare the solr config:
cd /srv/solr/dovecot/conf
First we need to remove the existing files:
sudo rm -f schema.xml managed-schema solrconfig.xml
And then download the Dovecot ones. First the solrconfig.xml file:
sudo curl https://raw.githubusercontent.com/dovecot/core/master/doc/solr-config-7.7.0.xml -o solrconfig.xml
Next the schema.xml file:
sudo curl https://raw.githubusercontent.com/dovecot/core/master/doc/solr-schema-7.7.0.xml -o schema.xml
Finally we need to change ownership of the files we just downloaded so that the docker container can read them:
sudo chown 8983:8983 /srv/solr/dovecot/conf/*.xml
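One thing worth checking here: curl as used above does not fail on an HTTP error, so if one of those URLs ever moves you could end up with an error page saved as your config. A rough sanity check, assuming the usual root elements (`<config>` for solrconfig.xml and `<schema>` for schema.xml), might look like this. `has_root_element` is a helper name I made up, and this is not real XML validation.

```shell
# Succeed if the file is non-empty and mentions the expected root element.
# This is only a sanity check, not XML validation.
has_root_element() {
    file=$1
    root=$2
    [ -s "$file" ] && grep -q "<$root" "$file"
}

# Example (run in /srv/solr/dovecot/conf):
#   has_root_element solrconfig.xml config || echo "solrconfig.xml looks wrong"
#   has_root_element schema.xml schema     || echo "schema.xml looks wrong"
```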
We’re now at the point where we can start the Docker container. The following command starts the container and tells Docker to keep it running, even after reboots. The memory limit is adjustable, so feel free to tweak it as needed. Here it is set to 2 gigabytes, which should work well enough for a small server with few users.
sudo docker run -d --name solr --restart unless-stopped \
--log-driver json-file --log-opt max-size=10m --log-opt max-file=3 \
-m 2G -v /srv/solr:/var/solr \
-e SOLR_HOME=/var/solr \
-p 127.0.0.1:8983:8983 \
solr:7
The Solr container will now be running locally on port 8983. You can double check everything is working correctly by checking the docker logs:
sudo docker logs solr
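Besides reading the logs, you can ask Solr directly whether the dovecot core loaded, using its CoreAdmin STATUS call. This is only a sketch: `solr_core_loaded` and the `SOLR_URL` variable are names I introduced, and it assumes Solr is listening on 127.0.0.1:8983 as set up above.

```shell
# Base URL of the local Solr instance (assumption: port mapping from this guide).
SOLR_URL=${SOLR_URL:-http://127.0.0.1:8983/solr}

# Ask the CoreAdmin API for the status of one core and succeed only if
# the response contains that core's name, i.e. Solr has it loaded.
solr_core_loaded() {
    core=$1
    curl -fsS "$SOLR_URL/admin/cores?action=STATUS&core=$core" \
        | grep -Eq "\"name\" *: *\"$core\""
}

# Example:
#   solr_core_loaded dovecot && echo "dovecot core is loaded"
```

If the core failed to start, the status entry for it comes back empty and the check fails, which is a hint to go and read the Docker logs.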
Now that Solr is working we need to tell Dovecot about it. Instead of modifying files that Mail-in-a-Box might manage, we can add new ones, which will be picked up automatically when Dovecot restarts.
Create a new file (below we’re using vi, but feel free to use nano instead):
sudo vi /etc/dovecot/conf.d/11-mail.conf
Paste this into the file and save it:
mail_plugins = $mail_plugins fts fts_solr
Now create another file:
sudo vi /etc/dovecot/conf.d/91-plugin.conf
With the following content:
plugin {
  fts = solr
  # Fall back to built in search.
  #fts_enforced = no
  fts_solr = url=http://127.0.0.1:8983/solr/dovecot/

  # Detected languages. Languages that are not recognized, default to the
  # first enumerated language, i.e. en.
  fts_languages = en

  # This chain of filters first normalizes and lower cases the text, then
  # stems the words and lastly removes stopwords.
  fts_filters = normalizer-icu snowball stopwords

  # This chain of filters will first lowercase all text, stem the words,
  # remove possessive suffixes, and remove stopwords.
  fts_filters_en = lowercase snowball english-possessive stopwords

  # These tokenizers will preserve addresses as complete search tokens, but
  # otherwise tokenize the text into "words".
  fts_tokenizers = generic email-address
  fts_tokenizer_generic = algorithm=simple

  # Proactively index mail as it is delivered or appended, not only when
  # searching.
  fts_autoindex = yes

  # How many \Recent flagged mails a mailbox is allowed to have, before it
  # is not autoindexed.
  # This setting can be used to exclude mailboxes that are seldom accessed
  # from automatic indexing.
  fts_autoindex_max_recent_msgs = 99

  # Exclude mailboxes we do not wish to index automatically.
  # These will be indexed on demand, if they are used in a search.
  fts_autoindex_exclude = \Junk
  fts_autoindex_exclude2 = \Trash
  fts_autoindex_exclude3 = .DUMPSTER
}
With those two files in place we just need to restart dovecot:
sudo service dovecot restart
And now we should have dovecot and solr working together to provide extremely fast full text searching capabilities.
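To double check that Dovecot actually picked up the new files, you can look at its effective configuration with doveconf. The sketch below assumes the global mail_plugins setting is the one that matters on your box; `plugin_enabled` is just a helper name I picked.

```shell
# Succeed if the given plugin name appears in Dovecot's effective
# mail_plugins setting. `doveconf -h` prints the value without the name.
plugin_enabled() {
    doveconf -h mail_plugins | tr ' ' '\n' | grep -qx "$1"
}

# Example:
#   plugin_enabled fts && plugin_enabled fts_solr && echo "FTS plugins enabled"
```

Running `doveconf -n` and looking for the fts lines by eye works just as well.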
You may want to index your content in advance of searching, in which case you can do the following, changing username@domain.com as appropriate:
sudo -u mail -g mail doveadm -v fts rescan -u username@domain.com
sudo -u mail -g mail doveadm -D -v index -u username@domain.com -q Inbo
This will take a bit of time, depending on your server’s resources. On my server with a few gigabytes of emails it took around 5 minutes. I monitored progress by looking at the output of top, and once the processor calmed down I knew it was done.
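Another way to watch progress, rather than top, is to ask Solr how many documents it holds in the dovecot core and see the number grow. This is a sketch under a few assumptions: Solr is on 127.0.0.1:8983 as above, `solr_doc_count` and `SOLR_URL` are names I made up, and the count may lag behind because documents only show up once Dovecot has committed them to Solr.

```shell
# Base URL of the local Solr instance (assumption: port mapping from this guide).
SOLR_URL=${SOLR_URL:-http://127.0.0.1:8983/solr}

# Print the number of documents Solr currently reports for a core.
# rows=0 asks for the count only, without returning any documents.
solr_doc_count() {
    core=$1
    curl -fsS "$SOLR_URL/$core/select?q=*:*&rows=0" \
        | grep -Eo '"numFound" *: *[0-9]+' \
        | head -n 1 \
        | grep -Eo '[0-9]+$'
}

# Example:
#   solr_doc_count dovecot
```

Run it a few times while indexing; once the number stops changing, the index is most likely complete.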
Sources
I followed this guide: https://blog.onee3.org/2020/03/apache-solr-as-dovecot-full-text-search-backend-with-improved-cjk-support/
I modified it slightly to leave out CJK support, using the default Dovecot config instead.
Troubleshooting
I had an error starting the Solr core, which was down to not changing ownership of the downloaded XML files to 8983:8983.
If you find things are not working, check the Docker logs: docker logs solr
You can also telnet into Solr; see the testing section: https://wiki.dovecot.org/Plugins/FTS/Solr