What is SPAM?
SPAM is unsolicited mail that arrives in your Inbox. It often contains offensive content or crude promises, and it costs the Internet community millions every year. Only a few years ago, email marketing was seen as a low-cost and very effective marketing tool, but in reality it turned out to be something quite different. Today, SPAM is seen as one of the largest problems on the Internet, second only to viruses. Ocean Mail Server provides several means of reducing SPAM. These are described below.
Real-time Black Lists (RBLs)
Real-time black lists are a simple but effective method of SPAM filtering. They are publicly accessible lists of known bad IPs (usually IPs that have been associated with SPAM and unsolicited bulk email (UBE) in the past). Because the mail server knows in advance which IPs are blacklisted, mail from those IPs can be flagged for further content filtering or rejected before it enters the mail server.
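As an illustration of how such a lookup commonly works, the sketch below follows the usual DNS-based blacklist convention: the octets of the connecting IPv4 address are reversed, the list's zone is appended, and an answer in 127.0.0.0/8 means the address is listed. The zone name rbl.example.org and the function names are assumptions made for this example; they are not Ocean Mail Server settings or APIs.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 address in a blacklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on invalid input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolver=socket.gethostbyname) -> bool:
    """Return True if the blacklist zone answers for this IP.

    The resolver is injectable so the logic can be exercised without
    network access; by default a real DNS lookup is performed.
    """
    try:
        answer = resolver(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No record for this name: the IP is not on the list.
        return False
    # Listed entries conventionally resolve into 127.0.0.0/8.
    return answer.startswith("127.")


if __name__ == "__main__":
    # rbl.example.org is a placeholder zone, not a real blacklist.
    print(dnsbl_query_name("192.0.2.99", "rbl.example.org"))
```

A server would typically run a check like is_listed() against the connecting IP before accepting the message, then either reject the mail or flag it for later filtering.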
Sender Policy Framework (SPF)
The sender policy framework is a relatively new SPAM identification technology. Domains can publish SPF records that describe which IPs are allowed to send mail from email addresses on that domain. Ocean Mail Server retrieves the SPF record for the domain the mail claims to come from and verifies whether the sending IP is authorized to send mail for that domain. If the IP is not authorized, the mail can be rejected or another action taken, depending on the severity of the SPF result.
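To make the check concrete, here is a deliberately simplified sketch of evaluating an SPF record against a sending IP. It handles only the ip4 and all mechanisms and skips the others that the real specification defines (include, a, mx and so on), so it is not a complete SPF implementation. The record text is passed in as a string; fetching it from DNS is left out. The function name and the result-to-action table are assumptions for illustration, not Ocean Mail Server behaviour.

```python
import ipaddress

# SPF qualifiers and the result each one produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

# Hypothetical policy mapping SPF results to actions, by severity.
ACTION_FOR_RESULT = {
    "pass": "accept",
    "neutral": "accept",
    "none": "accept",
    "softfail": "flag",
    "fail": "reject",
}


def check_spf(record: str, sender_ip: str) -> str:
    """Evaluate a simplified SPF record (ip4 and all mechanisms only)."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # no usable SPF record
    ip = ipaddress.IPv4Address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        mechanism = term.lower()
        if mechanism == "all":
            return QUALIFIERS[qualifier]
        if mechanism.startswith("ip4:"):
            network = ipaddress.ip_network(mechanism[4:], strict=False)
            if ip in network:
                return QUALIFIERS[qualifier]
        # Any other mechanism is ignored in this sketch.
    return "neutral"  # nothing matched


if __name__ == "__main__":
    record = "v=spf1 ip4:192.0.2.0/24 -all"
    result = check_spf(record, "198.51.100.7")
    print(result, ACTION_FOR_RESULT[result])
```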
Bayesian Filtering
Bayesian filtering is probably one of the best anti-SPAM technologies currently available. The system works by learning from mail passing through your mail server to help determine the probability of future mails being SPAM. Once trained, it is possible to catch over 99% of all SPAM with less than 0.1% false positives (non-SPAM mails filtered as SPAM). The more mails the Bayesian filter trains from, the more accurate it will become. We recommend that you train from at least 1000 of each type (non-SPAM and SPAM) if possible. The Bayesian filter can be trained either manually or automatically.
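The learning idea can be sketched with a small naive Bayes classifier. This is a minimal illustration under stated assumptions, not the filter that Ocean Mail Server ships: the class name, the tokenizer, the add-one smoothing and the clamping of the score are all choices made for this example. The filter counts how many SPAM and non-SPAM mails each word has appeared in, then combines the per-word evidence into a probability that a new mail is SPAM.

```python
import math
import re
from collections import Counter


class BayesFilter:
    """Toy naive Bayes SPAM filter (illustrative only)."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.mail_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str) -> set:
        # Each distinct lowercase word counts once per mail.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text: str, label: str) -> None:
        """Learn from one mail; label is 'spam' or 'ham' (non-SPAM)."""
        self.mail_counts[label] += 1
        self.token_counts[label].update(self.tokenize(text))

    def spam_probability(self, text: str) -> float:
        """Return an estimated probability (0..1) that the mail is SPAM."""
        spam_n = self.mail_counts["spam"]
        ham_n = self.mail_counts["ham"]
        # Start from the smoothed prior odds of SPAM versus non-SPAM.
        log_odds = math.log((spam_n + 1) / (ham_n + 1))
        for token in self.tokenize(text):
            # Add-one smoothing keeps unseen words from giving zero.
            p_spam = (self.token_counts["spam"][token] + 1) / (spam_n + 2)
            p_ham = (self.token_counts["ham"][token] + 1) / (ham_n + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on very long mails.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    f = BayesFilter()
    f.train("cheap pills buy now", "spam")
    f.train("win money now", "spam")
    f.train("meeting agenda attached", "ham")
    f.train("lunch tomorrow at noon", "ham")
    print(f.spam_probability("buy cheap pills"))
```

With only four training mails the scores are crude; as the text above notes, accuracy improves as the filter trains from more mail of each type.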
Manual Training
If you already have some non-SPAM and SPAM mails available in your user accounts, you can use these to train the Bayesian filter. All you need to do is sort them into 2 folders (non-SPAM and SPAM) and then feed these into the Bayesian filter for training via the admin interface. You can also specify a physical directory that contains non-SPAM or SPAM mails to train the Bayesian filter. The difference between the two is that when training from user directories, the mails are flagged to show that they have been learnt from, which avoids training on the same mails again in the future. Manual training from user directories is therefore preferred to manual training from physical directories. Another way to improve training is to provide an administrative user account to which other users can forward SPAM mails that got through the SPAM filtering systems. This allows the administrator to check through these mails and trigger a manual training of the Bayesian filter to help stop this kind of mail from getting through in the future.
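The reason flagging matters can be shown with a short sketch: if the trainer remembers which mails it has already learnt from, feeding the same folder twice does not count any mail twice. How Ocean Mail Server actually stores this flag is not described here, so the content-hash approach and the names below are assumptions for illustration only.

```python
import hashlib


class TrainingLog:
    """Remembers which mails have already been used for training."""

    def __init__(self):
        self._learnt = set()

    def train_once(self, raw_mail: bytes, label: str, train) -> bool:
        """Call train(raw_mail, label) unless this mail was seen before.

        Returns True if the mail was trained from, False if it was skipped.
        """
        digest = hashlib.sha256(raw_mail).hexdigest()
        if digest in self._learnt:
            return False  # already learnt from: skip to avoid double counting
        train(raw_mail, label)
        self._learnt.add(digest)  # flag the mail as learnt
        return True


if __name__ == "__main__":
    log = TrainingLog()
    seen = []
    record = lambda mail, label: seen.append((mail, label))
    print(log.train_once(b"Subject: cheap pills", "spam", record))
    print(log.train_once(b"Subject: cheap pills", "spam", record))
```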
Auto-Training
It is possible to automatically train the Bayesian filter based on the score given to each mail as it passes through the filter. If the mail scores higher than the auto-train SPAM threshold, it can be automatically trained from as a SPAM mail. Similarly, if the mail scores lower than the auto-train non-SPAM threshold, it can be automatically trained from as a non-SPAM mail. Automatic training is often the preferable method, as it allows you to simply let the Bayesian filter train from mails over time without any human intervention. As the Bayesian database grows, the filter becomes more accurate at scoring the mails passing through, so more of them score clearly above or below the thresholds. For this reason, auto-training will be applied to an increasing percentage of mails passing through the system over time.
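The two-threshold rule can be sketched as follows. The score is assumed here to be a probability between 0 and 1, and the default threshold values are made up for the example; the actual score scale and defaults in Ocean Mail Server may differ.

```python
from typing import Optional


def auto_train_label(
    score: float,
    spam_threshold: float = 0.95,     # hypothetical auto-train SPAM threshold
    nonspam_threshold: float = 0.05,  # hypothetical auto-train non-SPAM threshold
) -> Optional[str]:
    """Decide whether a scored mail should be used for auto-training.

    Returns 'spam' or 'non-spam' when the score is beyond a threshold,
    or None when the score is too uncertain to train from.
    """
    if nonspam_threshold >= spam_threshold:
        raise ValueError("non-SPAM threshold must be below SPAM threshold")
    if score > spam_threshold:
        return "spam"
    if score < nonspam_threshold:
        return "non-spam"
    return None  # uncertain mails are left for manual training


if __name__ == "__main__":
    for score in (0.99, 0.50, 0.01):
        print(score, auto_train_label(score))
```

Mails in the uncertain middle band are not trained from automatically, which keeps doubtful classifications out of the database.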
Content Filtering Auto-Training
Another method of auto-training is to use content filtering to train the Bayesian filter. There are 2 preset content filter rules available for Bayesian training, and both are very effective. One preset uses outgoing mail as a source of non-SPAM mails to train the Bayesian database. The other preset uses the other SPAM systems (RBLs and SPF) to identify mails as SPAM and trains the Bayesian filter with these mails accordingly. For these content filter rules to work, you need to configure RBLs, SPF and Bayesian filtering to set the SPAM flag and/or set custom events instead of rejecting the mail, so that the mail can reach the content filtering part of the mail server. After content filtering auto-training, you can set up content filtering to delete or redirect the mail as necessary. It is advisable to store non-SPAM and SPAM emails in directories so that they can be used later to retrain the Bayesian database if needed. We also recommend that you make regular backups of your Bayesian database, along with your configuration and account data, via the mail server's export facility so that they can be restored if lost due to hard drive failure or other problems. As long as your database is safely backed up, the mails used to train it can be discarded if needed.