Spamassassin learning SPAM2HAM and HAM2SPAM. Purging spam folder

robertpopa22 · February 15, 2022, 4:47pm

Hello,

It is pretty clear that by moving an email to the spam folder, spamassassin learns that this is spam.
But how does spamassassin learn that a false positive (a message wrongly marked as spam) is actually HAM?
For example, is it ok if I move the false positives straight to the archive folder? Or should I move them to the inbox first and then to the archive?
When can I purge the spam folder? I do not want to keep a very long spam history…
If I delete the contents of the spam folder, does spamassassin learn that the messages are no longer spam?

Where can I check details on this issue?

Thanks!

openletter · February 15, 2022, 4:50pm

When I looked into this, what I determined is that sa-learn is the process doing the work. A simple way to answer these questions is to run top and then move a message. You will see sa-learn appear at the top of the process list for about a second and then disappear. Or at least this works on a small server.

You don’t need to keep the messages forever. IIRC, sa-learn does its learning when the message is moved unless you use the CLI options to analyze a file or directory (these exist for the purpose of training new installations).
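For the manual CLI training mentioned above, a minimal sketch could look like the following. It assumes sa-learn is on the PATH and is run as a user that can write to the Bayes database (the thread uses sudo). The train_bayes function name, the SA_LEARN override and the example paths are made up for illustration; --spam and --ham are the standard sa-learn options.

```shell
#!/bin/sh
# Hypothetical helper: train the Bayes database from a message file or a
# directory of messages. SA_LEARN can be overridden (e.g. SA_LEARN=echo)
# to do a dry run that only prints the arguments.
train_bayes() {
    kind=$1    # "spam" or "ham"
    path=$2    # message file or directory of messages
    case "$kind" in
        spam|ham) ;;
        *) echo "usage: train_bayes spam|ham PATH" >&2; return 2 ;;
    esac
    "${SA_LEARN:-sa-learn}" "--$kind" "$path"
}

# Example calls (placeholder paths, adjust to your mail storage):
#   train_bayes spam /path/to/Maildir/.Spam/cur
#   train_bayes ham  /path/to/Maildir/cur
```

Running it once with SA_LEARN=echo first is a cheap way to check which arguments would be passed before touching the real database.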

robertpopa22 · February 15, 2022, 7:34pm

Ok, after some testing with sudo sa-learn --dump magic I have found that:

1. Moving from Inbox to SPAM increases nspam and decreases nham.
2. Moving from SPAM to Inbox increases nham and decreases nspam, so it works in reverse too.
3. Deleting from the SPAM folder does not decrease nspam (so once an email has been moved to the spam folder and automatically learned, you can safely delete it).
4. Moving from the SPAM folder to ARCHIVE and back behaves like INBOX-SPAM (steps 1 and 2).
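To repeat this kind of test, it helps to pull just the two counters out of the dump. The sketch below reads the output of sa-learn --dump magic on stdin and prints nspam and nham; the bayes_counts name is made up here, and it assumes the usual dump layout where the value is the third column and the counter name is the last word on the line.

```shell
#!/bin/sh
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# print the learned spam/ham message counters on one line.
bayes_counts() {
    awk '
        $NF == "nspam" { nspam = $3 }   # number of messages learned as spam
        $NF == "nham"  { nham  = $3 }   # number of messages learned as ham
        END { printf "nspam=%s nham=%s\n", nspam, nham }
    '
}

# Usage: run before and after moving a message, then compare the lines.
#   sudo sa-learn --dump magic | bayes_counts
```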

So, for a normal setup, e.g. with the Thunderbird client, you can let your users decide what to do with their spam/ham, and the system learns on each move! Great functionality!

NOTE! For larger emails sa-learn will not work. Please check your limit in /etc/default/spampd, ADDOPTS="--maxsize=2000". My server has a limit of 2000 (the spampd documentation gives --maxsize in kilobytes, so about 2 MB rather than 2000 bytes)…
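A quick way to check a given message against that limit is sketched below. It assumes the value is in kilobytes (which is how the spampd documentation describes --maxsize) and that 1 KB means 1024 bytes; adjust the multiplier if your setup differs. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the --maxsize value found in a spampd
# defaults file read on stdin (first ADDOPTS line that sets it).
spampd_maxsize() {
    sed -n 's/^ADDOPTS=.*--maxsize=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if FILE is larger than LIMIT_KB kilobytes.
# Assumption: the limit is in KB and 1 KB = 1024 bytes.
exceeds_limit() {
    file=$1
    limit_kb=$2
    size_bytes=$(($(wc -c < "$file")))   # arithmetic strips any padding
    [ "$size_bytes" -gt $((limit_kb * 1024)) ]
}

# Usage (the message path is a placeholder):
#   limit=$(spampd_maxsize < /etc/default/spampd)
#   exceeds_limit /path/to/message.eml "$limit" && echo "over the limit"
```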

For now I am trying not to change any defaults, as I am not yet familiar with how updates work…