Postgrey efficacy

MarthinL · April 9, 2024, 1:07pm

Hi,

Postgrey is one of the packages MiaB sets up for maximum protection against spam.

Postgrey’s approach is to reject the first email from any source it has not seen before on the assumption that legitimate email servers will retry whereas spamming servers are compelled by their volumes to give up on rejected addresses to save on resource usage.

In my personal experience I get as much spam after implementing MiaB with postgrey as before, and when I look at the headers of the spam emails I find that they were indeed delayed by postgrey but ultimately came through because the originating server retried anyway.

While trying to get a handle on how effective others have found the postgrey approach to be, I came across quite a debate on the topic between detractors and what I assume to be loyalists and/or maintainers using a lot of vague words, fear mongering and “just-keep-it-on-no-big-deal-just-for-safety” sentiments.

Which is weird, because by rights there is an extremely simple, direct and cheap way to measure exactly how effective postgrey (still) is. A simple count of how many emails came from senders/servers that never retried, per mailbox, domain or MiaB server, would be an objective measure of its efficacy.

If anyone has devised a way to extract such a number from the logs or postgrey’s working files I’d love to learn how to do that myself. I’m aware that the postgrey community would be a far more reliable source for such enabling insights, but I don’t expect to make many instant friends there by bringing it up.

Since (the delays caused by) postgrey filtering has a marked impact on user experience, I propose that as a MiaB community we do what it takes to factually determine if the protection provided by postgrey is worth the impact, and if it doesn’t pass muster, to work towards removing it from the solution.

My aim isn’t to make enemies of whoever is involved in postgrey or to discredit their work. If it has quantifiable value, let it be known, but if it doesn’t we should just quietly move forward without it.

alento · April 9, 2024, 1:23pm

My initial thought would be to grep the mail log for the postgrey deferral entries, then grep the deferred addresses for later successful deliveries.

It should be an easy bash script for someone to write.
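The two-pass grep described above can also be sketched in Python. This is a minimal sketch, not a tested tool: it assumes postgrey writes lines of the form `action=..., reason=..., client_address=..., sender=..., recipient=...` to the mail log, which should be checked against your own log before trusting the numbers. The function names are made up for illustration.

```python
def parse_postgrey_line(line):
    """Return the key=value fields of a postgrey log line, or None for other lines.

    Assumed line shape (verify against your own mail.log):
      ... postgrey[811]: action=greylist, reason=new, client_name=...,
          client_address=192.0.2.7, sender=a@example.org, recipient=b@example.net
    """
    if "postgrey" not in line or "action=" not in line:
        return None
    fields = {}
    for part in line[line.index("action="):].strip().split(", "):
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    return fields


def count_unretried(lines):
    """Return (greylisted, never_passed) counts of distinct triplets.

    A triplet is (client address, sender, recipient). A triplet counts as
    never passed when it has a greylist entry but no pass entry anywhere
    in the supplied lines, regardless of order.
    """
    greylisted = set()
    passed = set()
    for line in lines:
        fields = parse_postgrey_line(line)
        if fields is None:
            continue
        triplet = (
            # Drop a "/32"-style suffix if the log happens to include one.
            fields.get("client_address", "").split("/")[0],
            fields.get("sender", ""),
            fields.get("recipient", ""),
        )
        if fields.get("action") == "greylist":
            greylisted.add(triplet)
        elif fields.get("action") == "pass":
            passed.add(triplet)
    return len(greylisted), len(greylisted - passed)
```

Calling `count_unretried(open(path))` on a mail log would give the two numbers needed for a ratio. A triplet that retried from a different address would still be counted as never passed here, so the result is an upper bound on what postgrey actually stopped.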

JoshData · April 9, 2024, 3:37pm

I think it used to be more effective than it is today. Always happy to consider a data-driven suggestion (although keep in mind everyone’s spam experience may be different).

aroundmyroom · April 9, 2024, 4:06pm

Just for that and the remarks of my users: why does it take so long to receive an e-mail (especially when resetting a password, or when you need a time-limited number for SSO purposes)? I used to get this question once or twice a week … but not anymore …

That is why I have disabled the greylisting in postfix

MarthinL · April 10, 2024, 7:00am

Can you recommend a way to disable greylisting in MiaB’s postfix settings that isn’t subject to being undone by upgrading and MiaB’s practice of overwriting settings for the sake of safety, i.e. using only .local config files?

MarthinL · April 10, 2024, 7:43am

It works, in principle, but in practice the mail log is rotated too often (weekly, it seems) so that even with the most recent log files kept as zipped archives it still only keeps about a month’s worth of data.

Committing the time to develop a bash (as suggested), awk (as I’d expect would be easier) or python (as an absolute last resort) script to get at most a one-month view doesn’t strike me as a great investment. Not when postgrey is a database-driven application and would in all likelihood already have all the required data in tables without having to scrape log files for it.

There are some complications though.

Postgrey tries to keep its databases clean so that it can run long term with minimal intervention, which means that records of individual emails received without retries from individual sources would eventually age out of its database too.
Based on assumptions similar to “spamming servers don’t retry”, postgrey actually identifies email sources based on the first three bytes of the originating server’s IP address rather than the name. I suspect that assumption could also be tested and subsequently confirmed or refuted.
For greylisted email the logs suggest that postgrey also considers servers that retry sooner than 2 minutes as suspicious, but I’ve seen several largely reputable email sources such as LinkedIn’s acting on a retry schedule that first retries within two or three seconds (which gets rejected for retrying too soon) and then again two minutes later, which then passes. This has several implications. Firstly, it suggests that postgrey’s author David Schweikert is aware that some servers defeat his assumptions; secondly, that some legitimate servers have already adopted a retry policy which breaks his assumptions; and thirdly, that it is just a matter of time before spamming servers also adopt retry schedules which defeat the remainder of his assumptions too.
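The behaviour described in the last two points can be modelled in a few lines. This is a simplified sketch of greylisting in general, not postgrey’s actual code: the 120-second minimum delay is the two-minute figure inferred from the logs above, and the class and function names are hypothetical.

```python
import ipaddress

MIN_DELAY = 120  # seconds; assumption taken from the two-minute figure above


def source_key(client_ip, sender, recipient):
    """Identify a source by the first three bytes of its IPv4 address.

    Collapsing to the /24 network means a retry from a neighbouring
    address in the same block still matches the original attempt.
    """
    network = ipaddress.ip_network(f"{client_ip}/24", strict=False)
    return (str(network.network_address), sender.lower(), recipient.lower())


class Greylist:
    """Toy greylist: remember when each source was first seen."""

    def __init__(self, min_delay=MIN_DELAY):
        self.min_delay = min_delay
        self.first_seen = {}

    def check(self, client_ip, sender, recipient, now):
        """Return the decision for a delivery attempt at time `now` (seconds)."""
        key = source_key(client_ip, sender, recipient)
        first = self.first_seen.get(key)
        if first is None:
            self.first_seen[key] = now
            return "greylist (new)"
        if now - first < self.min_delay:
            return "greylist (early retry)"
        return "pass"


if __name__ == "__main__":
    # The retry schedule described above: a few seconds, then two minutes.
    greylist = Greylist()
    for t in (0, 3, 123):
        print(t, greylist.check("192.0.2.10", "a@example.org", "b@example.net", t))
```

Any sender that retries once after the minimum delay passes, which is the weakness being discussed: the check only measures patience, not intent.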

The GitHub project for postgrey reveals that it’s actually fairly old, fairly stable (or perhaps stagnant), and has a single primary author (David Schweikert) who has many other projects as well. That makes me wonder if David wouldn’t perhaps be approachable on the matter after all. Perhaps he’s so sick and tired of the arms-race of trying to keep postgrey effective that he’d actually be keen to show that his original and subsequent strategies have since been defeated.

@JoshData, have you been in touch with David Schweikert? Would you (or anyone else for that matter) care to join me in reaching out to David as a group rather than just one user? It might even be that as a group we can offer David some alternative strategies to implement if he’s still keen on getting ahead in the arms-race against spam.

MarthinL · April 10, 2024, 8:31am

@JoshData well, sure, I guess. Do we need to be sympathetic to those who send out spam? Would any spammer choose MiaB? Would you want them to?

I think that in order to do what it does postgrey has done the tough bit of inserting itself into the mail chain where it can monitor incoming messages and the behaviour patterns of its sources. I don’t wish to see that disappear completely, but I do think it needs to move with the times and start using its position in the MTA toolchain to better effect.

The MiaB community might be relatively small and completely averse to complicated setups, but I hold that as a group under your guidance we offer unique value of our own. We don’t control what gets agreed to behind the closed doors of the major players in email, but we do control what we agree to do ourselves or in MiaB. As such, and if we condemn the entire practice of spamming, we can pool meta-data from all MiaB servers to identify legitimate senders, spam senders and their emails far quicker and more accurately. If we do it right, with the help of the right people, we can expect to make it quite difficult for anyone to get a spam message delivered to more than a handful of MiaB users before all our servers know to reject those hard and permanently.

How I see it working is that we use postgrey or a derivative of it to gather meta-data about senders and their emails. But, instead of reacting to it by rejecting and later accepting in an autonomous and isolated manner, all MiaB servers send that data to a central pool. At the pool we’d then have a much broader view of who is sending out bulk email (to MiaB users). We’d then arrange for MiaB to show a communal whitelist of legitimate bulk email sources and a collective greylist of bulk senders which have not been approved by the MiaB community. The MiaB community, through that user interface, would then have the opportunity to point to a suspected bulk mail sender and suggest that the sender be treated as legitimate. If nobody objects, the source is moved to the communal whitelist. But as soon as a new entry is added to the communal greylist, each MiaB server would have cause to either reject emails from that source permanently or mark them very clearly as SPAM.

I understand there are reputation-based efforts similar to this underway on a much larger scale. I also understand that spam originators have cause to make life difficult for those efforts on the grounds of “hey, we’ve got the right to make a living too”. Doing this as a community will achieve two things. Firstly it will ensure better results quicker and secondly it will expose any community member with any sympathy for spammers.

JoshData · April 10, 2024, 10:22am

I haven’t. My empathy for other solo maintainers says that you shouldn’t reach out just to brainstorm. If he’s not actively working on postgrey, he probably has other priorities.

Haha no I meant that postgrey may work better for some people than for others, and the statistics you gather from one machine are not likely to be representative of all Mail-in-a-Box users. Everyone receives different spam.

Building a reputation system from scratch is probably something I don’t have time to get involved in, which means I probably would not accept the change.

In earlier versions of Mail-in-a-Box I modified postgrey to use an existing reputation service. Postgrey would query DNSWL.org for the sender’s IP address and skip greylisting if they are in DNSWL’s list. I dropped those changes from Mail-in-a-Box at some point because of the complexity of integrating a modified package. Those changes are here:

I would be open to bringing that idea back in some way.
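For readers unfamiliar with how such a lookup works, here is a minimal sketch of a DNS whitelist query, not the dropped Mail-in-a-Box patch itself. It assumes DNSWL’s public zone is `list.dnswl.org`, that listed addresses answer with a `127.0.x.y` record, and that `127.0.0.255` signals a refused query rather than a listing; all three should be confirmed against DNSWL’s own documentation. The function names are hypothetical.

```python
import ipaddress
import socket

DNSWL_ZONE = "list.dnswl.org"  # assumption: DNSWL's public query zone


def dnswl_query_name(client_ip, zone=DNSWL_ZONE):
    """Build the DNS name to look up: reversed IPv4 octets plus the zone."""
    ip = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{zone}"


def system_resolve(name):
    """Return the A records for `name`, or an empty list if it does not resolve."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:
        return []


def skip_greylisting(client_ip, resolve=system_resolve):
    """True when the sender's address appears to be on the whitelist.

    `resolve` is injectable so the decision logic can be exercised
    without network access.
    """
    answers = resolve(dnswl_query_name(client_ip))
    return any(
        answer.startswith("127.0.") and answer != "127.0.0.255"
        for answer in answers
    )
```

For example, `dnswl_query_name("192.0.2.7")` yields `7.2.0.192.list.dnswl.org`. A greylisting policy would call `skip_greylisting` before deciding to defer a first-time sender.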

MarthinL · April 10, 2024, 2:51pm

I’d never suggest it.

Agreed. I’d put something more concrete on the agenda.

I accept that spam enablers (bulk mailers et al), along with wising up to the resend trick, have also traded in their old indiscriminate mailing practices to sell more targeted campaigns. But if everyone really got different spam it would by some definitions no longer be spam, now would it? We decide what email we want and what we don’t want, not some corporation, government agency, coalition of service providers or internet task force. If we act in concert as a community we can make and enforce our own decisions. Anyone sending out legitimate emails would know to avoid using bulk email providers who allow spam or sell targeted email campaigns themselves. As such we’d be within our rights to identify and blacklist shady email practitioners wholesale. The internet’s formal overseers cannot do the same since it hinges on guilt by association which is easily overturned.

Accepted. I wasn’t aiming to burden you with the implementation as such. My plan was, and for the time being remains, to strike up a constructive conversation with David Schweikert with the intent of enabling his postgrey solution with communal whitelist and greylist capabilities. I don’t want to approach him empty handed though. I’d like to be able to say to him that you as MiaB maintainer have the support of the MiaB community to serve as anchor tenants for the new communal facilities. I’d also be saying that I am able and willing to help design and implement the solution I am proposing to run as a distributed system so that the load isn’t on any central server but spread over the participating MiaB servers.

In light of the above, I’m not actually asking that you accept the change in the sense of taking responsibility for writing or even reviewing the code involved. What I am asking is that you consult or lend your support when I consult the MiaB community to test if they will be in favour of asserting such control over spamming practices and practitioners as a community of self-hosting email providers and its users. If there is general consensus that it would be desirable, then I’d simply ask that you’d allow me to tell David that you’re committed to integrating the new version or derivative of postgrey into MiaB as it becomes available, as first adopters.

Like you, and for the exact same reasons, I am not about to commit any of my time or attention to something unwanted or ineffective. It’s presumably easy enough to disable postgrey, and unless someone makes a significant move the spam scourge will keep growing. But there are things we can do, like actively choosing to work together in communities like this. When I happen to be the one who spots the opportunity and knows how to make it happen, I will always aim towards wanting everyone to benefit from it. Don’t ask me too much or too soon about the economics of it all, because I believe that finances are only a problem when something costs too much to produce for what value it adds.

I’ve looked at dnswl.org. Their product is probably valid but from my perspective completely upside down. They’re compelled (it’s impossible to say whether it’s their choices, circumstance or client base that compel them) to enable senders, including bulk email providers, which in turn include nefarious bulk email providers, to get whitelisted with the intent of getting emails from those providers through the defences people put up to reduce unsolicited email. I tend to read email headers of unsolicited emails to see why my filters did not reject them. Quite often I end up just shaking my head in disbelief because of how easily the DKIM-, SPF-, DMARC- and whatever else -based rules are getting defeated. Blatant stuff, yet by the time any one of those rules and weights is considered fit for general consumption it is utterly toothless. That’s the dilemma and it’s ever apparent in dnswl.org’s solution.

The opportunity we have, since we have the makings of a community of relatively like-minded email users in the sense that none of us are bulk email enablers and all of us are impacted by unsolicited email being sent from bulk email providers, is to take a much harsher and more direct approach to identifying and dealing with unsolicited email. Because we can agree to work together towards shared objectives, and because we’re generally speaking “on-it” in the sense that each one of us more or less actively manages and supports a number of users directly, the odd false negative or positive will soon be discovered and specifically addressed as users report on expected messages not arriving or spam received anyway. Unlike public facilities we’re not obliged to recognise every bulk email provider’s right to make a living selling targeted email campaigns.

As far as a workable (re)definition of spam is concerned, I will add this 2c’s worth of opinion into the mix. Legislation in most jurisdictions today requires that emails sent to mailing lists offer an unsubscribe option, and by and large most do. One trouble is that fake emails usually break the unsubscribe link in one way or another. The bigger problem though is mailing lists that offer unsubscribe options that are not adhered to, meaning you unsubscribe but they keep sending like nothing happened. My 2c would be that within this community of email users I’m proposing, we adopt a specific position. In general we’d let people vouch for email senders they consider legitimate. But if someone reports that such a provider fails to adhere to a request to unsubscribe, that becomes grounds for taking the provider off the whitelist.

Generally speaking I’m convinced that the existence of some insanely valuable email coming from a source you’ve never interacted with by way of a bulk email provider is pure myth. Not all email from people you’ve never heard from is spam. Not all email sent from well-known sources sending out millions of individually addressed mails each day is spam. But it’s highly probable that every email sent by servers being used for indiscriminate or targeted email campaigns is unwanted or at least deserving of landing in a junk mail folder.

If we end up going through with this idea, the result would be that email from your average individual or company would land in your inbox without delay and unmarked. If you get an email from a mailing list you don’t approve of, and either unsubscribe doesn’t work or you don’t even want to click on that link in case it’s a trap, then you get to tell your email admin and they get to declare that email as unwanted. If enough people (we can decide on a number) call out emails coming from servers that clearly defeat all the protection we’ve put up as being unwanted, that entire server gets identified as a spam facilitator and either black- or greylisted.
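The “enough people” rule above could look something like the following. It is only a sketch of the idea as described, with no claim about how a real shared service would store or exchange reports; the class name, the threshold of three and the identifiers are all placeholders.

```python
from collections import defaultdict


class CommunalReports:
    """Collect 'unwanted' reports per sending server across participating admins."""

    def __init__(self, threshold=3):
        # Number of distinct admins required before a server is listed.
        self.threshold = threshold
        self._reporters = defaultdict(set)  # server -> admins who reported it

    def report(self, server, admin):
        """Record that `admin` called out mail from `server` as unwanted."""
        self._reporters[server].add(admin)

    def is_listed(self, server):
        """True once enough distinct admins have reported the server."""
        return len(self._reporters.get(server, ())) >= self.threshold


if __name__ == "__main__":
    pool = CommunalReports(threshold=2)
    pool.report("203.0.113.0/24", "admin-a")
    pool.report("203.0.113.0/24", "admin-a")  # repeat report, counted once
    print(pool.is_listed("203.0.113.0/24"))   # one distinct admin so far
    pool.report("203.0.113.0/24", "admin-b")
    print(pool.is_listed("203.0.113.0/24"))
```

Counting distinct admins rather than raw reports means a single participant cannot list a server alone, which matters if the list is meant to reflect community consensus.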

Of course the question of Gmail and the like will come up. Will it be feasible and fair to consider all email sent from Google as the same? Well, that remains to be seen, but I suspect the answer has to do with whether Google is able and willing to guard against people using their services to send bulk email. I believe they have the mechanisms for that in place but I haven’t personally seen evidence of them being engaged. I do check both Microsoft’s and Google’s TLS reports from time to time though, and I’m yet to see any indication that some of the spam I get had been sent from either source. I’d say on that evidence it’s probably a fair initial assumption that whitelisting Google’s servers would not open any flood gates for spam to rush through. If it happens, it seems both Google and Microsoft have prepared themselves to disallow their free or paid services from being abused for sending unsolicited bulk email. Neither is keen on giving away something they could charge for. The only worry is if either of them is complicit in sending unsolicited bulk email as a paid service, or more specifically if the servers involved with that can be identified as such without impacting free mail customers.

Anyway, this post has become far longer than I intended. Sorry. I’ll keep trying to shorten things in future but I did have a lot to get off my chest on this topic.

JoshData · April 10, 2024, 6:48pm

You are welcome to post on this forum to get input from the community about it.

stylnchris · April 10, 2024, 11:44pm

I’m still failing to see how this is an issue either way.

If you don’t like it, then make an entry for yourself in /etc/postgrey/whitelist_recipients

root@mail:/etc/postgrey# cat whitelist_recipients
# postgrey whitelist for mail recipients
# --------------------------------------
# put this file in /etc/postgrey or specify its path
# with --whitelist-recipients=xxx
postmaster@
abuse@
admin@
ADD-YOUR-EMAIL-PREFIX-HERE@

then restart postgrey

service postgrey restart

End of discussion…

MarthinL · April 11, 2024, 8:11am

I’m sorry if that’s my fault. If you’re actually interested in understanding just say so and I will make another attempt at explaining the situation.

Liking it or not is irrelevant here.

Not even close.

Defeating or disabling postgrey is an option of last resort which will at best negate its negative impact if it proves pointless and unwilling to evolve. Bypassing postgrey will not prevent or reduce spam, which is ultimately the objective, right?

Unless you rely on, cooperate with, participate in or otherwise sympathise with any spam enabling operation (if you were, would you openly admit it?) I can’t imagine why you wouldn’t wish to see bulk email campaigns for unsolicited advertising or disruptive (phishing, viruses, etc) purposes disappear from our lives. That’s what the discussion is about, and it hasn’t ended, it has barely begun.

MarthinL · April 11, 2024, 8:32am

Thank you, I’ll probably do that and hope you’d voice your support for what I’m proposing when I do.

Since you’re likely most familiar with the numbers pertaining to MiaB, what percentage of your user base (counting administrators rather than the users and mailboxes they manage) are active enough on this forum to be reached through such a post? i.e. if I manage to get consensus on this forum from those who’d read and respond to something I post, how much work would be left to get all MiaB users to agree to support what I propose?

MarthinL · April 11, 2024, 9:48am

Case in point, I quote from the source of an email I keep getting, keep listing as spam and whose sender I keep blocking.

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on box.xxxxxxx
X-Spam-Level: ***
X-Spam-Status: No, score=3.0 required=5.0 tests=DMARC_NONE,FSL_BULK_SIG,
HTML_IMAGE_ONLY_16,HTML_IMAGE_RATIO_02,HTML_MESSAGE,PYZOR_CHECK,
SPF_HELO_NONE,SPF_PASS,T_TVD_MIME_EPI autolearn=no autolearn_force=no
version=3.4.6
X-Spam-Report:
* -0.1 SPF_PASS SPF check passed
* 0.1 DMARC_NONE DMARC record not found
* 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
* 0.0 T_TVD_MIME_EPI BODY: No description available.
* 0.0 HTML_IMAGE_RATIO_02 BODY: HTML has a low ratio of text to image
* area
* 0.0 HTML_MESSAGE BODY: HTML included in message
* 1.0 HTML_IMAGE_ONLY_16 BODY: HTML: images with 1200-1600 bytes of
* words
* 2.0 PYZOR_CHECK Listed in Pyzor
* (Welcome to Pyzor’s documentation! — Pyzor 1.0 documentation)
* 0.0 FSL_BULK_SIG Bulk signature with no Unsubscribe
X-Spam-Score: 3.0

I’ve so many questions and issues with that, but for the moment let’s just consider this:

I have no doubt whatsoever that this is an unsolicited and unwanted email. I’ve received the same message hundreds of times. It is listed in Pyzor and recognised as a bulk email with no unsubscribe. The sending domain publishes no DMARC record and its HELO host publishes no SPF record. Yet somehow it manages to accumulate a spam score that allows it through every filter we’ve put up, every time.
How it comes about that such dead giveaways result in +0.0 spam scores is the million dollar question. Specifically, is SpamAssassin somehow complicitly funded by spam facilitators or do they find themselves defenceless against the bureaucracy and the social justice lobby forcing them to compromise on their intended standards?
Today’s bulk mail providers offering email campaigns (spam) are resourceful and influential. They can fend for themselves and don’t need our protection or sympathy. What we need is a way to enable ourselves so that as a community of people we can be as resourceful and influential as we need to be to balance out these companies who think abusing us is their birthright.
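To see exactly where the 3.0 comes from, the X-Spam-Report lines quoted above can be totalled per rule. The sketch below copies the report from the headers in this post (minus the Pyzor link line); the regular expression is an assumption about the report layout based on that one sample.

```python
import re

# A rule line looks like "* 2.0 PYZOR_CHECK Listed in Pyzor".
# Continuation lines such as "* area" carry no score and are skipped.
RULE_LINE = re.compile(r"^\*\s+(-?\d+\.\d+)\s+([A-Z0-9_]+)\b")

REPORT = """\
* -0.1 SPF_PASS SPF check passed
* 0.1 DMARC_NONE DMARC record not found
* 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
* 0.0 T_TVD_MIME_EPI BODY: No description available.
* 0.0 HTML_IMAGE_RATIO_02 BODY: HTML has a low ratio of text to image
* area
* 0.0 HTML_MESSAGE BODY: HTML included in message
* 1.0 HTML_IMAGE_ONLY_16 BODY: HTML: images with 1200-1600 bytes of
* words
* 2.0 PYZOR_CHECK Listed in Pyzor
* 0.0 FSL_BULK_SIG Bulk signature with no Unsubscribe
"""


def rule_scores(report_lines):
    """Map each rule name in an X-Spam-Report to its printed score."""
    scores = {}
    for line in report_lines:
        match = RULE_LINE.match(line.strip())
        if match:
            scores[match.group(2)] = float(match.group(1))
    return scores


if __name__ == "__main__":
    scores = rule_scores(REPORT.splitlines())
    for rule, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(f"{score:5.1f}  {rule}")
    print(f"total {sum(scores.values()):.1f} against a required 5.0")
```

Only two rules carry real weight here, PYZOR_CHECK (2.0) and HTML_IMAGE_ONLY_16 (1.0); the bulk-signature and SPF/DMARC findings contribute essentially nothing, which is why the message stays 2.0 points under the threshold.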

vele · April 11, 2024, 10:30am

Now you can block the email address forever with a single click. No need to waste time. Remember the old days of junk mail that you got in your actual house mailbox? How did you defend yourself against that? And all those salesmen selling you encyclopedias door-to-door?

MarthinL · April 11, 2024, 11:50am

In what universe does that still work? You know as well as I do that the spammers are not stupid enough to send a second spam message from the same address. Really?

I do remember those, yes, addressed to “Home Owner” or “Occupant” etc. They don’t happen anymore for two reasons. Email campaigns took over as cheaper, and it became very clear to those who’d still try to use those snail mail options that they all go straight to the bin.

Spam will also endure for as long as it makes sense to the sender, i.e. as long as some of it gets through. Now we have few actual options (seeing that they are clearly clever enough to never use the same source address and make any individual mail look legit enough for the filters): We could just accept it as an unavoidable nuisance. We could lobby the powers that be to better enforce the laws they already made against spam. We could stir up a global public outcry against spam. Or, by my suggestion, we can form a community of people with similar interests and take action against spam for ourselves and anyone else who feels about it the same way we do. It won’t break the international spam providers, not yet anyway, but it will be effective within the confines of our collective. Perhaps in time the collective could grow to such proportions that it does start to render the bulk email providers selling spam campaigns so ineffective they’re forced to reconsider their harmful ways, but that possibility is a long way down the track and not really in focus now.

Still, I’m by no means trying to convince anyone to see spam as more annoying than they already do. If you are able and willing to live with whatever spam the world at large throws at you, fine, just say so. But if you are keen on radically reducing the amount of spam stealing your time, then please realise that there are measures we can take as a relatively homogeneous group of email admins and users that can make the problem go away for us. Those outside our circle of MiaB users are not our concern at the moment.

JoshData · April 11, 2024, 6:39pm

It’s hard to know how many Mail-in-a-Boxes are in production and actively used (excluding malicious/spam uses, of which there are probably many). Probably thousands, or more. Shodan.io says there are 26,000 currently running. I don’t know how reliable that is or how many are not spammers and are truly active, and some may be maintained by the same administrator.

We probably have about a hundred people participating in the forum in some capacity. The average daily engaged users is… 7.

So, not much, as a percent.

Fezzy · April 12, 2024, 8:37pm

Great thread and conversation. I too have wondered about the efficacy of Postgrey and personally been frustrated by the wait time for expected emails. I’ll usually click the “resend” link to hasten delivery.

This thread has given me a couple of ideas, especially re: individual prefix whitelisting. Lately I’ve been on a kick to just create filters in the Roundcube webmail interface for repeated listservs / newsletters I can’t seem to unsubscribe from, or unwanted spam. It’s annoying to create the filters but what else to do? Especially for all those 1x spammer email addresses, knowing it’s mostly wasted effort. However, just this focused effort for the past couple of weeks has significantly cleaned up my inbox.

MarthinL · April 13, 2024, 9:19am

Seems like someone did (tbaker@bakerfl.org, 14 years ago!), and it is being installed with postgrey as the command (perl script) postgreyreport.

It does run (provided you give the correct path to the database: --dbdir=/home/user-data/mail/postgrey/db) but I can’t seem to get the results I’m looking for.

I’ve also manually processed (the combined current and the previous versions of) mail.log using awk, grep and Excel.

My findings from all three analysis paths support each other, yet they don’t add up.

Looking through the list of emails that postgrey supposedly blocked on the basis we’ve been discussing (i.e. for which there is an action=greylist, reason=new log entry but no action=pass, reason=triplet found log entry) I recognise many of the entries destined for mailboxes I monitor as mails that I’ve seen come through anyway. Several of those are from known sources I would not consider spam.
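One way to probe this discrepancy is to tally every action/reason pair postgrey logs, rather than only the two named above. If passes are also logged under other reasons (an assumption to verify on your own system), any triplet admitted that way would look blocked under the filter described. This sketch assumes the `action=..., reason=...` layout quoted above; the function name is made up.

```python
import re
from collections import Counter

# Captures e.g. ("greylist", "new") or ("pass", "triplet found").
OUTCOME = re.compile(r"action=(\w+), reason=([^,\n]+)")


def tally_outcomes(lines):
    """Count postgrey log lines per (action, reason) pair."""
    counts = Counter()
    for line in lines:
        if "postgrey" not in line:
            continue
        match = OUTCOME.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts
```

Printing `tally_outcomes(open(path)).most_common()` for a combined mail log lists every outcome with its frequency, so any pass reason other than “triplet found” shows up immediately.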

I don’t yet know how to correctly interpret the results I’m seeing. It could be that postgrey no longer works in the original way, or that it had been compromised at some point, or, most likely, that what the spam enablers are doing today habitually defeats or confuses postgrey. Either way postgrey’s “own” analysis overstates its actual efficacy by counting emails that came through anyway as having been successfully blocked (the term used in the report script is fatally rejected).

If anyone would like to check their own systems and report back we might get a better picture of what is happening. The commands I’ve used so far to see what postgrey supposedly blocked were:

Firstly, to get all the old and current mail.log file entries in one place,

sudo zcat -f /var/log/mail.log.4.gz > /tmp/mail.log
sudo zcat -f /var/log/mail.log.3.gz >> /tmp/mail.log
sudo zcat -f /var/log/mail.log.2.gz >> /tmp/mail.log
sudo cat /var/log/mail.log.1 >> /tmp/mail.log
sudo cat /var/log/mail.log >> /tmp/mail.log

Depending on how long your system has been running, the first few commands might fail because those files don’t exist yet,

and then,

sudo postgreyreport --show_tries --dbdir=/home/user-data/mail/postgrey/db < /tmp/mail.log | grep ^1
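For anyone who would rather feed the logs into their own analysis than into postgreyreport, the concatenation step can be done without failing on missing rotations. This is a sketch that assumes the usual logrotate naming shown above (mail.log, mail.log.1, mail.log.2.gz and so on); the function names are hypothetical, and reading /var/log normally needs elevated privileges.

```python
import glob
import gzip
import os
import re


def rotation_index(path):
    """mail.log -> 0, mail.log.1 -> 1, mail.log.2.gz -> 2."""
    match = re.search(r"\.log\.(\d+)", os.path.basename(path))
    return int(match.group(1)) if match else 0


def iter_log_lines(pattern="/var/log/mail.log*"):
    """Yield lines from every existing rotation, oldest file first."""
    paths = sorted(glob.glob(pattern), key=rotation_index, reverse=True)
    for path in paths:
        # Compressed rotations are read through gzip, the rest as plain text.
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as handle:
            yield from handle
```

Because the glob only returns files that exist, a system with fewer rotations simply yields fewer lines instead of raising an error.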

P.S.
I’ve tried to contact David Schweikert (postgrey) but haven’t heard back from him.

vele · April 13, 2024, 10:42am

Since March 17 my logs show that, out of a TOTAL of 60, 45 are real spam messages and 15 are false positives.
How do I check whether the false positives ended up in either Junk or Inbox? See the Excel file I sent you in a private message. I replaced the final recipient with XX@somedomain.com