Spam Confidence Level

Topic Last Modified: 2006-06-12

The spam confidence level (SCL) is a normalized value assigned to a message. Based on characteristics of the message, such as its content and message header, the SCL indicates the likelihood that the message is spam.

SCL Values

There are eleven values available to categorize spam, as outlined in the following table.

SCL Value    Spam Categorization
-1           Reserved by Microsoft® Exchange Server 2003 for messages submitted internally. A value of -1 should not be overwritten, because this value is used to eliminate false positives for internally submitted e-mail.
0            Assigned to messages that are not spam.
1 to 9       Ranges from an extremely low likelihood that the message is spam (1) to an extremely high likelihood that the message is spam (9).

This range of values lets you choose how aggressive or conservative you want your spam filtering to be by selecting a threshold value above which you consider a message to be spam. To filter spam aggressively, choose a fairly low threshold, such as an SCL value of 5. A low threshold catches more spam messages, but it also produces more false positives. To filter spam more conservatively, choose a higher threshold, such as an SCL value of 8. A high threshold catches fewer spam messages, but it also produces fewer false positives.
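The effect of the threshold can be illustrated with a minimal Python sketch. The function name is_spam and the sample SCL values are hypothetical and are not part of Exchange Server; the sketch only assumes that a message counts as spam when its SCL value is strictly above the chosen threshold.

```python
def is_spam(scl: int, threshold: int) -> bool:
    """Return True when the SCL value is strictly above the threshold.

    Hypothetical helper for illustration only, not an Exchange API.
    """
    if scl == -1:
        # -1 is reserved for internally submitted mail; never flag it.
        return False
    return scl > threshold


# Hypothetical SCL values stamped on six messages.
scl_values = [-1, 0, 3, 6, 7, 9]

# Aggressive filtering: a low threshold flags more messages.
aggressive = [s for s in scl_values if is_spam(s, 5)]

# Conservative filtering: a high threshold flags fewer messages.
conservative = [s for s in scl_values if is_spam(s, 8)]

print(aggressive)    # [6, 7, 9]
print(conservative)  # [9]
```

With a threshold of 5, three of the six sample messages are treated as spam. With a threshold of 8, only the message rated 9 is.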

Normalization

Spam filtering algorithms assign a spam rating, score, or probability to each message. This value is referred to as the algorithm's raw score. The spam filtering algorithm then normalizes the raw score to one of the standard SCL values and assigns that value to the message. Raw scores are normalized to a set of standard SCL values for the following reasons:

  • Configuration settings for the handling of spam are based on the SCL value. Actions performed on messages will typically be determined by thresholds, for example, "move all messages with an SCL value greater than x to the Junk E-mail folder."
  • As algorithms evolve, the raw scores they produce may change in meaning. Normalizing the raw scores ensures that the user experience stays relatively constant throughout the evolution of an algorithm.
  • Developers will create different spam filtering algorithms, each of which assigns raw scores in its own way. Normalizing these varying raw scores presents a standard value to the end user.

Mapping

Because different filters will have unique methods of rating messages, the precise mapping of a raw score to an SCL value will vary. The following are general guidelines for mapping raw scores to SCL values:

  • Binary results. If an algorithm produces a binary result where the message is determined to be either not spam or spam, a rating of 0 should be used for not spam and a rating of 9 for spam.
  • Distribution. If an algorithm produces a distribution of raw scores, a rating of 0 should be assigned to messages determined to not be spam. The remaining raw scores should be mapped in the range of 1 to 9, determined by the probability that the message is spam.
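The two guidelines above might be sketched in Python as follows. The function names, the assumption that the raw score is a probability between 0 and 1, the not_spam_cutoff value of 0.1, and the linear spread over 1 to 9 are all hypothetical choices made for illustration. An actual filter would choose a mapping that suits its own raw scores.

```python
def scl_from_binary(is_spam: bool) -> int:
    """Binary result: 0 for not spam, 9 for spam."""
    return 9 if is_spam else 0


def scl_from_probability(p: float, not_spam_cutoff: float = 0.1) -> int:
    """Map a raw probability in [0, 1] to an SCL value from 0 to 9.

    Assumption: raw scores below not_spam_cutoff mean "not spam".
    The remaining scores are spread linearly over 1 to 9.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("raw score must be a probability in [0, 1]")
    if p < not_spam_cutoff:
        return 0
    fraction = (p - not_spam_cutoff) / (1.0 - not_spam_cutoff)
    # int() truncates, so clamp the top of the range to 9.
    return min(9, 1 + int(fraction * 9))


print(scl_from_binary(True))       # 9
print(scl_from_probability(0.05))  # 0
print(scl_from_probability(0.55))  # 5
print(scl_from_probability(1.0))   # 9
```

Only the end points follow directly from the guidelines: a rating of 0 for messages determined to not be spam, and ratings of 1 to 9 for everything else in order of increasing probability.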

Multiple Content Filters

Multiple content filters can be installed on a single server. When multiple filters are implemented to rate a message, one filter may block the message, which prevents the other filters from seeing it. If the message is not blocked, it is stamped with an SCL value and sent on. Because there is only a single SCL property on a message, the last filter to run generally controls the setting of the SCL property, which, in turn, affects how the message is handled relative to the Store Action Threshold. To make the setting of the SCL value deterministic, it is recommended that a filter not set the SCL value if an earlier filter has already set it to a higher value in the same range. A value of -1 should not be overwritten.
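The recommendation above can be sketched as a small Python function. The name stamp_scl is hypothetical, and None is used here to represent a message that has no SCL value yet. The sketch also assumes that "the same range" means the values 0 to 9 that a filter may assign. It is an illustration of the rule, not the Exchange Server implementation.

```python
from typing import Optional


def stamp_scl(current: Optional[int], proposed: int) -> int:
    """Return the SCL value a message should carry after a filter runs.

    current  -- SCL already on the message, or None if not yet set
    proposed -- SCL the current filter wants to assign (0 to 9)
    """
    if not 0 <= proposed <= 9:
        raise ValueError("a filter may only propose values from 0 to 9")
    if current == -1:
        # Reserved for internally submitted mail; never overwrite.
        return -1
    if current is None:
        return proposed
    # Keep the higher value so the result does not depend on filter order.
    return max(current, proposed)


# Three hypothetical filters rate the same message 4, 7, and 2.
scl = None
for rating in (4, 7, 2):
    scl = stamp_scl(scl, rating)
print(scl)  # 7
```

Because the higher value always wins, running the same three filters in any other order also leaves the message with an SCL value of 7.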