Email Spam Filtering

Hi Josh,

Before I write anything else, I just wanted to say a huge thanks for the work you have put into the project, and also for all the responses on the threads on this forum. Mail in a Box is absolutely brilliant.

I just have a question regarding the built-in spam filtering. Some of my users have been receiving a lot of spam since I migrated from Google Apps. This spam was always being sent to them, but Google’s spam filter caught it while we were on GA.

I understand that it is hard to get spam filtering as good as Google’s, but I have been looking at this file: https://github.com/mail-in-a-box/mailinabox/blob/master/setup/spamassassin.sh. It describes how SpamAssassin should learn spam when a user manually moves a message to the spam folder. I have been experimenting with this and I am fairly certain that it is not learning from reported messages. I tested this by sending a variety of emails from a particular email address and then reporting them all as spam.

When I then send further emails from this address, they still pass the spam test. Running sudo tail -f /var/log/mail.log, I can see that each email is scanned and reported as good.

Is there any way to verify that the mentioned sa-learn is being run when a user moves a message into the spam folder manually?

That way I could verify that this is happening, because at the moment I am not 100% sure it is.

Thanks in advance, and thanks for the great project,

Ben.

Yeah I don’t think it’s learning. Might be a permissions problem.

See https://github.com/mail-in-a-box/mailinabox/issues/357 for current discussion.
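One rough way to test the permissions theory is to scan the mail log for Bayes errors from spampd. The sketch below is only an illustration: the pattern is modeled on the spampd error quoted further down this thread, other versions may word the message differently, and the function name is hypothetical.

```python
import re

# Assumption: the error looks like the spampd line quoted in this thread,
# e.g. "spampd[8583]: bayes: cannot open bayes databases ... Permission denied".
BAYES_ERROR = re.compile(
    r"spampd\[\d+\]: bayes: cannot open bayes databases .*Permission denied"
)


def find_bayes_permission_errors(lines):
    """Return the log lines that report a Bayes database permission failure."""
    return [line.rstrip("\n") for line in lines if BAYES_ERROR.search(line)]


# Example use (reading the log usually needs root):
#
#     with open("/var/log/mail.log", errors="replace") as log:
#         for hit in find_bayes_permission_errors(log):
#             print(hit)
```

An empty result does not prove that learning works; it only means this particular error was not logged.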

Thanks!

Hi, I had problems with spam learning too.

Log message:

spampd[8583]: bayes: cannot open bayes databases /home/user-data/mail/spamassassin/bayes_* R/O: tie failed: Permission denied

Solution:

chown spampd:spampd /home/user-data/mail/spamassassin/bayes_toks

After the last update I see the wrong owner and group again. I am not sure when the wrong owner was set: maybe after the update, maybe after clearing the SpamAssassin rules, …
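Since the ownership seems to change back after updates, it may help to re-check it with a small script. This is a minimal sketch, assuming the expected owner and group are spampd:spampd (as in the chown command above) and that the database files match bayes_* in the directory named in the log message. The function names are made up for illustration.

```python
import grp
import os
import pwd
from pathlib import Path


def owner_of(path):
    """Return (user, group) names for path, falling back to numeric ids."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return user, group


def wrong_owners(directory, expected=("spampd", "spampd")):
    """List (filename, user, group) for bayes_* files not owned by `expected`."""
    bad = []
    for path in sorted(Path(directory).glob("bayes_*")):
        user, group = owner_of(path)
        if (user, group) != tuple(expected):
            bad.append((path.name, user, group))
    return bad


# Example use (run as root so the directory is readable):
#
#     for name, user, group in wrong_owners("/home/user-data/mail/spamassassin"):
#         print(f"{name} is owned by {user}:{group}")
```

Running this after each update would show whether the owner really reverts, and to which user.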

Recent versions of Mail-in-a-Box should have those permissions set. They are on my machine. What were the permissions before you changed them, do you know?

See https://github.com/mail-in-a-box/mailinabox/blob/master/setup/spamassassin.sh#L47 for what we’re aiming for.

If I remember right, it was “mail” owner/group.

Thanks for getting back to me guys.

Glad it is known.

FYI, I checked my permissions as they are by default, without changing them.

login as: benmaynard
benmaynard@productionmail.london.uk.vpn’s password:
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-43-generic x86_64)

System information as of Sat Mar 21 16:16:42 EDT 2015

System load: 0.0 Users logged in: 0
Usage of /: 37.2% of 29.40GB IP address for eth0: XXX.XXX.XXX.XXX
Memory usage: 40% IP address for eth1: XXX.XXX.XXX.XXX
Swap usage: 0% IP address for vpn0: XXX.XXX.XXX.XXX
Processes: 145

Graph this data and manage this system at:
https://landscape.canonical.com/

0 packages can be updated.
0 updates are security updates.

Last login: Sat Mar 21 16:16:42 2015 from 25.148.205.168
benmaynard@pms01:~$ su
Password:
root@pms01:/home/benmaynard# cd /home/user-data/mail/spamassassin/
root@pms01:/home/user-data/mail/spamassassin# ls -l
total 552
-rw-rw---- 1 spampd spampd 24576 Mar 21 10:48 bayes_seen
-rw-rw---- 1 spampd spampd 659456 Mar 21 10:48 bayes_toks
root@pms01:/home/user-data/mail/spamassassin#

Thanks for this. I had the same issue.

This is still the case: bayes_toks is owned by mail:mail.

I have just started using MIAB v.26b and checked that it has the right user:group ownership.

Does it learn only when you use the web interface (Roundcube)?
Or does it also work when you use a mail client app and move an email to junk?
It seems that the Bayes database is in a binary format and only index info can be displayed.
The only way I have found, skimming the SpamAssassin manual, is:

$ sudo sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 0 0 non-token data: nspam
0.000 0 148 0 non-token data: nham
0.000 0 33345 0 non-token data: ntokens
0.000 0 1516481926 0 non-token data: oldest atime
0.000 0 1517294794 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 0 0 non-token data: last expiry atime
0.000 0 0 0 non-token data: last expire atime delta
0.000 0 0 0 non-token data: last expire reduction count

Is there a way to see what is inside a binary file in a human-readable format?
Or would the only other way be to grep all the logs for spam and analyze that?

Regards,
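The counters printed by sa-learn --dump magic are probably the simplest human-readable view of the database, and they can be compared before and after moving a message to the spam folder. The sketch below assumes the output has the shape posted above, with the value in the third column; the function names are hypothetical.

```python
import re

# Matches entries such as "0.000 0 148 0 non-token data: nham" and captures
# the third column (the value) together with the counter name.
COUNTER = re.compile(r"(\d+)\s+\d+\s+non-token data:\s+(nspam|nham|ntokens)\b")


def bayes_counters(dump_text):
    """Extract nspam, nham and ntokens from `sa-learn --dump magic` output."""
    return {name: int(value) for value, name in COUNTER.findall(dump_text)}


def spam_learned_between(before, after):
    """Number of spam messages learned between two counter snapshots."""
    return after["nspam"] - before["nspam"]


# Example use: save the command output before and after reporting a message,
# then compare the two snapshots.
#
#     before = bayes_counters(open("before.txt").read())
#     after = bayes_counters(open("after.txt").read())
#     print(spam_learned_between(before, after))
```

Applied to the output posted above, this gives nspam 0, nham 148 and ntokens 33345, which suggests that database had learned ham but no spam at all. If nspam does not increase after a message is moved to the spam folder, the learning step is probably not running or is writing to a different database.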

Has this been addressed in 0.26, i.e. does SA now actually learn?

If memory serves, SpamAssassin learns when you put spam in your spam/junk folder.

Yes, that’s the official “should work like that” explanation, but having read prior threads about spam filtering, I have seen several statements that this didn’t work in Mail-in-a-Box.

So my question is whether this has been fixed and is now working.

A post was split to a new topic: How to train Spamassassin?