A long-standing speed issue when many messages are processed at once has been the sequential feed of messages to SpamAssassin's sa-learn by /usr/local/bin/sa-learn-pipe.sh.
The default Mail-in-a-Box (MiaB) version of /usr/local/bin/sa-learn-pipe.sh is:
```bash
cat<&0 >> /tmp/sendmail-msg-$$.txt
/usr/bin/sa-learn $* /tmp/sendmail-msg-$$.txt > /dev/null
rm -f /tmp/sendmail-msg-$$.txt
exit 0
```
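To make the cost of the default concrete: each invocation waits for sa-learn to finish before returning, so if messages are handed over one after another, N messages take roughly N times the per-message learning time. The sketch below mimics the shape of the default script with a hypothetical stand-in, `fake_sa_learn`, that just sleeps 200 ms; it assumes GNU `date` (for `%N`) and is only an illustration, not a benchmark of real sa-learn.

```bash
#!/bin/bash
# Hypothetical stand-in for /usr/bin/sa-learn: pretend learning takes 200 ms
fake_sa_learn() { sleep 0.2; }

# Same shape as the default sa-learn-pipe.sh: save stdin, learn, clean up.
# The caller is blocked until the learner has finished.
default_pipe() {
    local tmp
    tmp=$(mktemp)
    cat > "$tmp"
    fake_sa_learn "$@" "$tmp" > /dev/null
    rm -f "$tmp"
}

# Feed five messages one after another, as a sequential caller would
start=$(date +%s%N)
for n in 1 2 3 4 5; do
    echo "message $n" | default_pipe --spam
done
elapsed_ms=$(( ($(date +%s%N) - start) / 1000000 ))
echo "5 messages took ${elapsed_ms} ms"
```

Because every call blocks, the total can never be less than 5 × 200 ms; with thousands of messages and a real sa-learn run, this adds up quickly.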
But what if your mail server has multiple cores? How can the training make use of them?
The best solution I have found so far is the script below, and I suggest it become the default for MiaB installations. When many messages need to be learned at once, it runs several sa-learn processes in parallel, up to the number of CPU cores minus two (minimum one).
It handles batches of 100+ emails well; on a 56-core machine it copes with 10,000+ emails in one go.
Basically, it is the default script, but it makes use of the available resources.
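The number of parallel jobs comes from a single arithmetic expression in the script, `MAX_PARALLEL=$(( NPROC > 2 ? NPROC - 2 : 1 ))`. The sketch below wraps that same expression in a hypothetical helper, `max_parallel`, so you can see what it yields for a few core counts.

```bash
#!/bin/bash
# Hypothetical helper wrapping the script's MAX_PARALLEL expression:
# total cores minus 2, but never fewer than 1 job.
max_parallel() {
    local ncpu=$1
    echo $(( ncpu > 2 ? ncpu - 2 : 1 ))
}

for cores in 1 2 4 8 56; do
    echo "cores=$cores -> parallel sa-learn jobs=$(max_parallel "$cores")"
done
```

On the 56-core machine mentioned above this allows 54 concurrent sa-learn jobs, while a 1- or 2-core VPS still gets one background job.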
```bash
#!/bin/bash
################################################################################
# SpamAssassin Parallel Learning Script
################################################################################
# This script is called by Dovecot's antispam plugin when users move emails
# to/from spam folders. Instead of processing emails one by one, it runs
# several sa-learn processes in parallel to speed up training.
#
# Called with: --spam (when moving TO spam) or --ham (when moving FROM spam)
################################################################################

# Calculate the maximum number of parallel processes.
# We use (total CPUs - 2) to leave some resources for the system.
# The ternary expression guarantees at least 1 process even on
# very small servers (1-2 CPUs).
NPROC=$(nproc)                                 # Total number of CPU cores
MAX_PARALLEL=$(( NPROC > 2 ? NPROC - 2 : 1 ))  # Subtract 2, minimum 1

# Directory holding the lock directories that limit parallel execution.
# Each parallel job creates one lock directory to claim a "slot".
LOCK_DIR="/tmp/sa-learn-locks"
mkdir -p "$LOCK_DIR"                           # Create it if it doesn't exist

################################################################################
# Save the incoming email to a temporary file
################################################################################
# The email content arrives on stdin. mktemp creates a new, uniquely named
# file. A name built from the PID ($$) is not safe here: this script exits
# before its background job finishes, so the PID can be reused by a later
# invocation while the old file is still in use.
tmpfile=$(mktemp /tmp/sendmail-msg-XXXXXX) || exit 0  # Never fail the move
cat > "$tmpfile"                               # Copy stdin into the temp file

################################################################################
# Wait for an available processing slot
################################################################################
# This section implements a semaphore-like mechanism using directory creation.
# mkdir is atomic: if several processes try to create the same directory,
# exactly one succeeds, so a slot can never be claimed twice.
slot=-1                                        # -1 means "no slot yet"
while [ "$slot" -lt 0 ]; do
    # Try to claim one of the slots 0 .. MAX_PARALLEL-1
    for i in $(seq 0 $((MAX_PARALLEL - 1))); do
        # mkdir succeeds only if the directory doesn't exist (slot is free)
        if mkdir "$LOCK_DIR/slot-$i" 2>/dev/null; then
            slot=$i                            # Successfully claimed this slot
            break                              # Exit the for loop
        fi
    done
    # If no slot was free, wait 0.01 s (10 ms) and try again
    [ "$slot" -lt 0 ] && sleep 0.01
done

################################################################################
# Process the email in the background
################################################################################
# Running the block in the background (&) lets the script return to Dovecot
# immediately, so Dovecot can hand over the next email.
(
    # "$@" holds either --spam or --ham, depending on how we were called.
    # Quoting it passes each argument through unchanged.
    # Both output and errors are discarded.
    /usr/bin/sa-learn "$@" "$tmpfile" > /dev/null 2>&1

    rm -f "$tmpfile"                           # Clean up the temporary email file

    # Release the slot by removing the lock directory,
    # so another process can claim it
    rmdir "$LOCK_DIR/slot-$slot"
) < /dev/null > /dev/null 2>&1 &
# The redirections above detach the background job from the caller's
# stdin/stdout/stderr, so a caller waiting for those to close is not held up.

################################################################################
# Exit immediately
################################################################################
# Exit 0 (success) right away while sa-learn continues in the background.
# Like the default script, this always reports success to Dovecot.
exit 0
```
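If you want to convince yourself that the mkdir-based slots really cap concurrency before putting the script on a live server, a harness like the one below can help. It is a sketch under stated assumptions: `LIMIT` stands in for `MAX_PARALLEL`, a 50 ms `sleep` stands in for sa-learn, and the names `LIMIT`, `JOBS`, `WORK` and `claim_slot` are hypothetical. Unlike the real script, it calls `wait` at the end so the result can be checked.

```bash
#!/bin/bash
LIMIT=3                  # stands in for MAX_PARALLEL
JOBS=12                  # number of simulated messages
WORK=$(mktemp -d)        # private scratch dir instead of /tmp/sa-learn-locks
LOCK_DIR="$WORK/locks"
mkdir -p "$LOCK_DIR"

# Claim a free slot the same way the script does: mkdir is atomic, so only
# one caller can create a given slot directory. Prints the slot number.
claim_slot() {
    local i
    while :; do
        for i in $(seq 0 $((LIMIT - 1))); do
            if mkdir "$LOCK_DIR/slot-$i" 2>/dev/null; then
                echo "$i"
                return
            fi
        done
        sleep 0.01
    done
}

for n in $(seq 1 "$JOBS"); do
    slot=$(claim_slot)
    (
        # Record how many slots are held right now, then "learn" for 50 ms
        ls "$LOCK_DIR" | wc -l >> "$WORK/samples"
        sleep 0.05
        echo done >> "$WORK/finished"
        rmdir "$LOCK_DIR/slot-$slot"       # release the slot
    ) &
done
wait                     # the real script does not wait; the demo does

peak=$(sort -n "$WORK/samples" | tail -n 1)
finished=$(wc -l < "$WORK/finished")
echo "peak concurrent jobs: $peak (limit $LIMIT), finished: $finished/$JOBS"
rm -rf "$WORK"
```

Since only `slot-0` to `slot-$((LIMIT-1))` can ever exist, the recorded peak cannot exceed the limit. On a real server, `ls /tmp/sa-learn-locks` shows the slots currently in use (an empty directory means all background learning has finished), and `sa-learn --dump magic` shows whether the nspam/nham counters have increased.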
Nice one. I'm certainly going to try it.
This is also discussed on GitHub.