## $Id: dspam.conf.in,v 1.63 2005/11/25 17:47:04 jonz Exp $ ## dspam.conf -- DSPAM configuration file ## # # DSPAM Home: Specifies the base directory to be used for DSPAM storage # Home /opt/zimbra/data/dspam # # StorageDriver: Specifies the storage driver backend (library) to use. # You'll only need to set this if you are using dynamic storage driver plugins. # The default when one storage driver is specified is to statically link. Be # sure to include the path to the library if necessary, and some systems may # use an extension other than .so. # # Options include: # # libmysql_drv.so libpgsql_drv.so libsqlite_drv.so # libsqlite3_drv.so libora_drv.so libdb4_drv.so # libdb3_drv.so libhash_drv.so # # IMPORTANT: Switching storage drivers requires more than merely changing # this option. If you do not wish to lose all of your data, you will need to # migrate it to the new backend before making this change. # StorageDriver /opt/zimbra/dspam/lib/dspam/libhash_drv.so # # Trusted Delivery Agent: Specifies the local delivery agent DSPAM should call # when delivering mail as a trusted user. Use %u to specify the user DSPAM is # processing mail for. It is generally a good idea to allow the MTA to specify # the pass-through arguments at run-time, but they may also be specified here. # # Most operating system defaults: #TrustedDeliveryAgent "/usr/bin/procmail" # Linux #TrustedDeliveryAgent "/usr/bin/mail" # Solaris #TrustedDeliveryAgent "/usr/libexec/mail.local" # FreeBSD #TrustedDeliveryAgent "/usr/bin/procmail" # Cygwin # # Other popular configurations: #TrustedDeliveryAgent "/usr/cyrus/bin/deliver" # Cyrus #TrustedDeliveryAgent "/bin/maildrop" # Maildrop #TrustedDeliveryAgent "/usr/local/sbin/exim -oMr spam-scanned" # Exim # TrustedDeliveryAgent "/usr/bin/procmail" # # Untrusted Delivery Agent: Specifies the local delivery agent and arguments # DSPAM should use when delivering mail and running in untrusted user mode. # Because DSPAM will not allow pass-through arguments to be specified to # untrusted users, all arguments should be specified here. Use %u to specify # the user DSPAM is processing mail for. This configuration parameter is only # necessary if you plan on allowing untrusted processing. # #UntrustedDeliveryAgent "/usr/bin/procmail -d %u" # # SMTP or LMTP Delivery: Alternatively, you may wish to use SMTP or LMTP # delivery to deliver your message to the mail server. You will need to # configure with --enable-daemon to use host delivery, however you do not need # to operate in daemon mode. Specify an IP address or UNIX path to a domain # socket below as a host. # #DeliveryHost 127.0.0.1 #DeliveryPort 24 #DeliveryIdent localhost #DeliveryProto LMTP # # Quarantine Agent: DSPAM's default behavior is to quarantine all mail it # thinks is spam. If you wish to override this behavior, you may specify # a quarantine agent which will be called with all messages DSPAM thinks is # spam. Use %u to specify the user DSPAM is processing mail for. # #QuarantineAgent "/usr/bin/procmail -d spam" # # DSPAM can optionally process "plused users" (addresses in the user+detail # form) by truncating the username just before the "+", so all internal # processing occurs for "user", but delivery will be performed for # "user+detail". This is only useful if the LDA can handle "plused users" # (for example Cyrus IMAP) and when configured for LMTP delivery above # # NOTE: Plused detail presently only works when usernames are provided and # not fully qualified email address (@domain). # #EnablePlusedDetail on # # Quarantine Mailbox: DSPAM's LMTP code can send spam mail using LMTP to a # "plused" mailbox (such as user+quarantine) leaving quarantine processing # for retraining or deletion to be performed by the LDA and the mail client. # "plused" mailboxes are supported by Cyrus IMAP and possibly other LDAs. # The mailbox name must have the + # #QuarantineMailbox +quarantine # # OnFail: What to do if local delivery or quarantine should fail. If set # to "unlearn", DSPAM will unlearn the message prior to exiting with an # un successful return code. The default option, "error" will not unlearn # the message but return the appropriate error code. The unlearn option # is use-ful on some systems where local delivery failures will cause the # message to be requeued for delivery, and could result in the message # being processed multiple times. During a very large failure, however, # this could cause a significant load increase. # OnFail error # Trusted Users: Only the users specified below will be allowed to perform # administrative functions in DSPAM such as setting the active user and # accessing tools. All other users attempting to run DSPAM will be restricted; # their uids will be forced to match the active username and they will not be # able to specify delivery agent privileges or use tools. # Trust root Trust zimbra #Trust nobody #Trust majordomo # # Debugging: Enables debugging for some or all users. IMPORTANT: DSPAM must # be compiled with debug support in order to use this option. DSPAM should # never be running in production with debug active unless you are # troubleshooting problems. # # DebugOpt: One or more of: process, classify, spam, fp, inoculation, corpus # process standard message processing # classify message classification using --classify # spam error correction of missed spam # fp error correction of false positives # inoculation message inoculations (source=inoculation) # corpus corpusfed messages (source=corpus) # #Debug * #Debug bob bill # #DebugOpt process spam fp # # Training Mode: The default training mode to use for all operations, when # one has not been specified on the commandline or in the user's preferences. # Acceptable values are: toe, tum, teft, notrain # TrainingMode teft # # TestConditionalTraining: By default, dspam will retrain certain errors # until the condition is no longer met. This usually accelerates learning. # Some people argue that this can increase the risk of errors, however. # TestConditionalTraining on # # Features: Specify features to activate by default; can also be specified # on the commandline. See the documentation for a list of available features. # If _any_ features are specified on the commandline, these are ignored. # # NOTE: For standard "CRM114" Markovian weighting, use sbph # #Feature sbph #Feature noise Feature tb=5 Feature whitelist # # Algorithms: Specify the statistical algorithms to use, overriding any # defaults configured in the build. The options are: # naive Naive-Bayesian (All Tokens) # graham Graham-Bayesian ("A Plan for Spam") # burton Burton-Bayesian (SpamProbe) # robinson Robinson's Geometric Mean Test (Obsolete) # chi-square Fisher-Robinson's Chi-Square Algorithm # # You may have multiple algorithms active simultaneously, but it is strongly # recommended that you group Bayesian algorithms with other Bayesian # algorithms, and any use of Chi-Square remain exclusive. # # NOTE: For standard "CRM114" Markovian weighting, use 'naive', or consider # using 'burton' for slightly better accuracy # # Don't mess with this unless you know what you're doing # #Algorithm chi-square #Algorithm naive Algorithm graham burton # # Tokenizer: Specify the tokenizer to use. The tokenizer is the piece # responsible for parsing the message into individual tokens. Depending on # how many resources you are willing to trade off vs. accuracy, you may # choose to use a less or more detailed tokenizer: # word uniGram (single word) tokenizer # Tokenizes message into single individual words/tokens # example: "free" and "viagra" # chain biGram (chained tokens) tokenizer (default) # Single words + chains adjacent tokens together # example: "free" and "viagra" and "free viagra" # sbph Sparse Binary Polynomial Hashing tokenizer # Creates sparse token patterns across sliding window of 5-tokens # example: "the quick * fox jumped" and "the * * fox jumped" # osb Orthogonal Sparse biGram # Similar to SBPH, but only uses the biGrams # example: "the * * fox" and "the * * * jumped" # Tokenizer chain # # PValue: Specify the technique used for calculating PValues, overriding any # defaults configured in the build. These options are: # graham Graham's Technique ("A Plan for Spam") # robinson Robinson's Technique # markov Markovian Weighted Technique # # Unlike algorithms, you may only have one of these defined. Use of the # chi-square algorithm automatically changes this to robinson. # # Don't mess with this unless you know what you're doing. # #PValue robinson #PValue markov PValue graham # # WebStats: Enable this if you are using the CGI, which writes .stats files WebStats off # # ImprobabilityDrive: Calculate odds-ratios for ham/spam, and add to # X-DSPAM-Improbability headers #ImprobabilityDrive on # # Preferences: Specify any preferences to set by default, unless otherwise # overridden by the user (see next section) or a default.prefs file. # If user or default.prefs are found, the user's preferences will override any # defaults. # #Preference "spamAction=quarantine" Preference "signatureLocation=headers" # 'message' or 'headers' Preference "showFactors=on" Preference "spamAction=tag" #Preference "spamSubject=SPAM" # # Overrides: Specifies the user preferences which may override configuration # and commandline defaults. Any other preferences supplied by an untrusted user # will be ignored. # AllowOverride trainingMode AllowOverride spamAction spamSubject AllowOverride statisticalSedation AllowOverride enableBNR AllowOverride enableWhitelist AllowOverride signatureLocation AllowOverride showFactors AllowOverride optIn optOut AllowOverride whitelistThreshold # --- MySQL --- # # Storage driver settings: Specific to a particular storage driver. Uncomment # the configuration specific to your installation, if applicable. # #MySQLServer /var/lib/mysql/mysql.sock #MySQLPort #MySQLUser dspam #MySQLPass changeme #MySQLDb dspam #MySQLCompress true # Use this if you have the 4.1 quote bug (see doc/mysql.txt) #MySQLSupressQuote on # If you're running DSPAM in client/server (daemon) mode, uncomment the # setting below to override the default connection cache size (the number # of connections the server pools between all clients). The connection cache # represents the maximum number of database connections *available* and should # be set based on the maximum number of concurrent connections you're likely # to have. Each connection may be used by only one thread at a time, so all # other threads _will block_ until another connection becomes available. # #MySQLConnectionCache 10 # If you're using vpopmail or some other type of virtual setup and wish to # change the table dspam uses to perform username/uid lookups, you can over- # ride it below #MySQLVirtualTable dspam_virtual_uids #MySQLVirtualUIDField uid #MySQLVirtualUsernameField username # UIDInSignature: MySQL supports the insertion of the user id into the DSPAM # signature. This allows you to create one single spam or fp alias # (pointing to some arbitrary user), and the uid in the signature will # switch to the correct user. Result: you need only one spam alias #MySQLUIDInSignature on # --- PostgreSQL --- #PgSQLServer 127.0.0.1 #PgSQLPort 5432 #PgSQLUser dspam #PgSQLPass changeme #PgSQLDb dspam # If you're running DSPAM in client/server (daemon) mode, uncomment the # setting below to override the default connection cache size (the number # of connections the server pools between all clients). # #PgSQLConnectionCache 3 # UIDInSignature: PgSQL supports the insertion of the user id into the DSPAM # signature. This allows you to create one single spam or fp alias # (pointing to some arbitrary user), and the uid in the signature will # switch to the correct user. Result: you need only one spam alias #PgSQLUIDInSignature on # If you're using vpopmail or some other type of virtual setup and wish to # change the table dspam uses to perform username/uid lookups, you can over- # ride it below #PgSQLVirtualTable dspam_virtual_uids #PgSQLVirtualUIDField uid #PgSQLVirtualUsernameField username # --- Oracle --- #OraServer "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=127.0.0.1)(PORT=1521))(CONNECT_DATA=(SID=PROD)))" #OraUser dspam #OraPass changeme #OraSchema dspam # --- SQLite --- #SQLitePragma "synchronous = OFF" # --- Hash --- # # HashRecMax: Default number of records to create in the initial segment when # building hash files. 100,000 yields files 1.6MB in size, but can fill up # fast, so be sure to increase this (to a million or more) if you're not using # autoextend. # # NOTE: If you're using a heavy-weight tokenizer, such as SBPH, you should be # looking for settings in the 'millions' of records. # # Primes List: # 53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157, 98317, 196613, # 393241, 786433, 1572869, 3145739, 6291469, 12582917, 25165843, 50331653, # 100663319, 201326611, 402653189, 805306457, 1610612741, 3221225473, # 4294967291 # HashRecMax 98317 # HashAutoExtend: Autoextend hash databases when they fill up. This allows # them to continue to train by adding extents (extensions) to the file. There # will be a small delay during the growth process, as everything needs to be # closed and remapped. # HashAutoExtend on # HashMaxExtents: The maximum number of extents that may be created in a single # hash file. Set this to zero for unlimited # HashMaxExtents 0 # # HashExtentSize: The initial record size for newly created extents. Creating # this too small could result in many extents being created. Creating this too # large could result in excessive disk space usage. Typically, a value close # to half of the HashRecMax size is good. # HashExtentSize 49157 # # HashPctIncrease: Increase the next extent size by n% from the size of the # last extent. This is useful in accommodating systems where the default # HashExtentSize can be too small for certain high-volume users, and can also # help keep seeks nice and speedy and/or prevent too many unnecessary extents # from being created when using a low HashMaxSeek. The default behavior, when # HashPctIncrease is not used, is to always use # HashExtentSize with no # increase. # HashPctIncrease 10 # HashMaxSeek: The maximum number of records to seek to insert a new record # before failing or adding a new extent. Setting this too high will exhaustively # scan each segment and kill performance. Typically, a low value is acceptable # as even older extents will continue to fill over time. # HashMaxSeek 100 # HashConcurrentUser: If you are using a single, stateful hash database in # daemon mode, specifying a concurrent user will cause the user to be # permanently mapped into memory and shared via rwlocks. # #HashConcurrentUser user # HashConnectionCache: If running in daemon mode, this is the max # of # concurrent connections that will be supported. NOTE: If you are using # HashConcurrentUser, this option is ignored, as all connections are read- # write locked instead of mutex locked. HashConnectionCache 10 # LDAP: Perform various LDAP functions depending on LDAPMode variable. # Presently, the only mode supported is 'verify', which will verify the existence # of an unknown user in LDAP prior to creating them as a new user in the system. # This is useful on some systems acting as gateway machines. # #LDAPMode verify #LDAPHost ldaphost.mydomain.com #LDAPFilter "(mail=%u)" #LDAPBase ou=people,dc=domain,dc=com # Optionally, you can specify storage profiles, and specify the server to # use on the commandline with --profile. For example: # #Profile DECAlpha #MySQLServer.DECAlpha 10.0.0.1 #MySQLPort.DECAlpha 3306 #MySQLUser.DECAlpha dspam #MySQLPass.DECAlpha changeme #MySQLDb.DECAlpha dspam #MySQLCompress.DECAlpha true # #Profile Sun420R #MySQLServer.Sun420R 10.0.0.2 #MySQLPort.Sun420R 3306 #MySQLUser.Sun420R dspam #MySQLPass.Sun420R changeme #MySQLDb.Sun420R dspam #MySQLCompress.Sun420R false # #DefaultProfile DECAlpha # # If you're using storage profiles, you can set failovers for each profile. # Of course, if you'll be failing over to another database, that database # must have the same information as the first. If you're using a global # database with no training, this should be relatively simple. If you're # configuring per-user data, however, you'll need to set up some type of # replication between databases. # #Failover.DECAlpha SUN420R #Failover.Sun420R DECAlpha # If the storage fails, the agent will follow each profile's failover up to # a maximum number of failover attempts. This should be set to a maximum of # the number of profiles you have, otherwise the agent could loop and try # the same profile multiple times (unless this is your desired behavior). # #FailoverAttempts 1 # # Ignored headers: If DSPAM is behind other tools which may add a header to # incoming emails, it may be beneficial to ignore these headers - especially # if they are coming from another spam filter. If you are _not_ using one of # these tools, however, leaving the appropriate headers commented out will # allow DSPAM to use them as telltale signs of forged email. # #IgnoreHeader X-Spam-Status #IgnoreHeader X-Spam-Scanned #IgnoreHeader X-Virus-Scanner-Result # # Lookup: Perform lookups on streamlined blackhole list servers (see # http://www.nuclearelephant.com/projects/sbl/). The streamlined blacklist # server is machine-automated, unsupervised blacklisting system designed to # provide real-time and highly accurate blacklisting based on network spread. # When performing a lookup, DSPAM will automatically learn the inbound message # as spam if the source IP is listed. Until an official public RABL server is # available, this feature is only useful if you are running your own # streamlined blackhole list server for internal reporting among multiple mail # servers. Provide the name of the lookup zone below to use. # # This function performs standard reverse-octet.domain lookups, and while it # will function with many RBLs, it's strongly discouraged to use those # maintained by humans as they're often inaccurate and could hurt filter # learning and accuracy. # #Lookup "sbl.yourdomain.com" # # RBLInoculate: If you want to inoculate the user from RBL'd messages it would # have otherwise missed, set this to on. # #RBLInoculate off # # Notifications: Enable the sending of notification emails to users (first # message, quarantine full, etc.) # Notifications off # # Purge configuration: Set dspam_clean purge default options, if not otherwise # specified on the commandline # PurgeSignatures 14 # Stale signatures PurgeNeutral 90 # Tokens with neutralish probabilities PurgeUnused 90 # Unused tokens PurgeHapaxes 30 # Tokens with less than 5 hits (hapaxes) PurgeHits1S 15 # Tokens with only 1 spam hit PurgeHits1I 15 # Tokens with only 1 innocent hit # # Purge configuration for SQL-based installations using purge.sql # #PurgeSignature off # Specified in purge.sql #PurgeNeutral 90 #PurgeUnused off # Specified in purge.sql #PurgeHapaxes off # Specified in purge.sql #PurgeHits1S off # Specified in purge.sql #PurgeHits1I off # Specified in purge.sql # # Local Mail Exchangers: Used for source address tracking, tells DSPAM which # mail exchangers are local and therefore should be ignored in the Received: # header when tracking the source of an email. Note: you should use the address # of the host as appears between brackets [ ] in the Received header. # LocalMX %%zimbraLocalBindAddress%% # # Logging: Disabling logging for users will make usage graphs unavailable to # them. Disabling system logging will make admin graphs unavailable. # SystemLog on UserLog on # # TrainPristine: for systems where the original message remains server side # and can therefore be presented in pristine format for retraining. This option # will cause DSPAM to cease all writing of signatures and DSPAM headers to the # message, and deliver the message in as pristine format as possible. This mode # REQUIRES that the original message in its pristine format (as of delivery) # be presented for retraining, as in the case of webmail, imap, or other # applications where the message is actually kept server-side during reading, # and is preserved. DO NOT use this switch unless the original message can be # presented for retraining with the ORIGINAL HEADERS and NO MODIFICATIONS. # #TrainPristine on # # Opt: in or out; determines DSPAM's default filtering behavior. If this value # is set to in, users must opt-in to filtering by dropping a .dspam file in # /var/dspam/opt-in/user.dspam (or if you have homedirs configured, a .dspam # folder in their home directory). The default is opt-out, which means all # users will be filtered unless a .nodspam file is dropped in # /var/dspam/opt-out/user.nodspam # Opt out # # TrackSources: specify which (if any) source addresses to track and report # them to syslog (mail.info). This is useful if you're running a firewall or # blacklist and would like to use this information. Spam reporting also drops # RABL blacklist files (see http://www.nuclearelephant.com/projects/rabl/). # #TrackSources spam nonspam # # ParseToHeaders: In lieu of setting up individual aliases for each user, # DSPAM can be configured to automatically parse the To: address for spam and # false positive forwards. From there, it can be configured to either set the # DSPAM user based on the username specified in the header and/or change the # training class and source accordingly. The options below can be used to # customize most common types of header parsing behavior to avoid the need for # multiple aliases, or if using LMTP, aliases entirely.. # # ParseToHeader: Parse the To: headers of an incoming message. This must be # set to 'on' to use either of the following features. # # ChangeModeOnParse: Automatically change the class (to spam or innocent) # depending on whether spam- or notspam- was specified, and change the source # to 'error'. This is convenient if you're not using aliases at all, but # are delivering via LMTP. # # ChangeUserOnParse: Automatically change the username to match that specified # in the To: header. For example, spam-bob@domain.tld will set the username # to bob, ignoring any --user passed in. This may not always be desirable if # you are using virtual email addresses as usernames. Options: # on or user take the portion before the @ sign only # full take everything after the initial {spam,notspam}-. # #ParseToHeaders on #ChangeModeOnParse on #ChangeUserOnParse on # # Broken MTA Options: Some MTAs don't support the proper functionality # necessary. In these cases you can activate certain features in DSPAM to # compensate. 'returnCodes' causes DSPAM to return an exit code of 99 if # the message is spam, 0 if not, or a negative code if an error has occured. # Specifying 'case' causes DSPAM to force the input usernames to lowercase. # Spceifying 'lineStripping' causes DSPAM to strip ^M's from messages passed # in. # #Broken returnCodes #Broken case #Broken lineStripping # # MaxMessageSize: You may specify a maximum message size for DSPAM to process. # If the message is larger than the maximum size, it will be delivered # without processing. Value is in bytes. # #MaxMessageSize 4194304 # # Virus Checking: If you are running clamd, DSPAM can perform stream-based # virus checking using TCP. Uncomment the values below to enable virus # checking. # # ClamAVResponse: reject (reject or drop the message with a permanent failure) # accept (accept the message and quietly drop the message) # spam (treat as spam and quarantine/tag/whatever) # #ClamAVPort 3310 #ClamAVHost 127.0.0.1 #ClamAVResponse accept # # Daemonized Server: If you are running DSPAM as a daemonized server using # --daemon, the following parameters will override the default. Use the # ServerPass option to set up accounts for each client machine. The DSPAM # server will process and deliver the message based on the parameters # specified. If you want the client machine to perform delivery, use # the --stdout option in conjunction with a local setup. # #ServerPort 24 #ServerQueueSize 32 #ServerPID /var/run/dspam.pid # # ServerMode specifies the type of LMTP server to start. This can be one of: # dspam: DSPAM-proprietary DLMTP server, for communicating with dspamc # standard: Standard LMTP server, for communicating with Postfix or other MTA # auto: Speak both DLMTP and LMTP; auto-detect by ServerPass.IDENT # #ServerMode dspam # If supporting DLMTP (dspam) mode, dspam clients will require authentication # as they will be passing in parameters. The idents below will be used to # determine which clients will be speaking DLMTP, so if you will be using # both LMTP and DLMTP from the same host, be sure to use something other # than the server's hostname below (which will be sent by the MTA during a # standard LMTP LHLO). # #ServerPass.Relay1 "secret" #ServerPass.Relay2 "password" # If supporting standard LMTP mode, server parameters will need to be specified # here, as they will not be passed in by the mail server. The ServerIdent # specifies the 250 response code ident sent back to connecting clients and # should be set to the hostname of your server, or an alias. # # NOTE: If you specify --user in ServerParameters, the RCPT TO will be # used only for delivery, and not set as the active user for processing. # #ServerParameters "--deliver=innocent -d %u" #ServerIdent "localhost.localdomain" # If you wish to use a local domain socket instead of a TCP socket, uncomment # the following. It is strongly recommended you use local domain sockets if # you are running the client and server on the same machine, as it eliminates # much of the bandwidth overhead. # #ServerDomainSocketPath "/tmp/dspam.sock" # # Client Mode: If you are running DSPAM in client/server mode, uncomment and # set these variables. A ClientHost beginning with a / will be treated as # a domain socket. # #ClientHost /tmp/dspam.sock #ClientIdent "secret@Relay1" # #ClientHost 127.0.0.1 #ClientPort 24 #ClientIdent "secret@Relay1" # RABLQueue: Touch files in the RABL queue # If you are a reporting streamlined blackhole list participant, you can # touch ip addresses within the directory the rabl_client process is watching. # #RABLQueue /var/spool/rabl # DataSource: If you are using any type of data source that does not include # email-like headers (such as documents), uncomment the line below. This # will cause the entire input to be treated like a message "body" # #DataSource document # ProcessorWordFrequency: By default, words are only counted once per message. # If you are classifying large documents, however, you may wish to count once # per occurrence instead. # #ProcessorWordFrequency occurrence # ProcessorURLContext: By default, a URL context is generated for URLs, which # records their tokens as separate from words found in documents. To use # URL tokens in the same context as words, turn this feature off. # ProcessorURLContext on # ProcessorBias: Bias causes the filter to lean more toward 'innocent', and # usually greatly reduces false positives. It is the default behavior of # most Bayesian filters (including dspam). # # NOTE: You probably DONT want this if you're using Markovian Weighting, unless # you are paranoid about false positives. # ProcessorBias on ## EOF