Wednesday, September 22, 2004

Suri, a Spamvertised URIs filter using SURBL

I am making a pair of Postfix (and related) tools in Perl that could be useful to some people, in some cases.

One is a "SURBL" technique filter to be plugged into amavis, which checks the spamvertised URIs in a message against a "SURBL" server. It acts like an antivirus, inspecting the content of the message. If it's configured to deny, it could lead to false positives when the SURBL list in use is not very precise.
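
As a rough illustration of that lookup (in Python rather than Perl, and not the script itself), the sketch below pulls hostnames out of a piece of text and builds the DNS names that would be queried, walking from the full host down to the two-label domain. The zone `surbl.example` and the function name `query_names` are made up for the example, and lowercasing the host is my own addition.

```python
import re

# Hypothetical zone name, used only for illustration.
EXAMPLE_ZONE = "surbl.example"

# Same shape as the hostname pattern in the script:
# protocol, optional user/pass, then the hostname.
URI_HOST = re.compile(
    r"\w{3,16}:/+"
    r"(?:\S+@)?"
    r"([\w\-.]+\.[a-zA-Z]{2,8})"
)


def query_names(text, zone=EXAMPLE_ZONE):
    """Return the DNS names to look up for every URI host found in text."""
    names = []
    for host in URI_HOST.findall(text):
        labels = host.lower().split(".")
        # Check the full host first, then each parent domain,
        # stopping once only two labels (domain.tld) remain.
        while len(labels) >= 2:
            name = ".".join(labels) + "." + zone + "."
            if name not in names:
                names.append(name)
            labels = labels[1:]
    return names


if __name__ == "__main__":
    for name in query_names("see http://www.shop.example.com/buy now"):
        print(name)
```

A TXT answer for any of these names would mean the host is listed. The actual DNS query is left out here on purpose.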

It was written based on a qpsmtpd plugin developed by Devin Carraway.

The other, which is in the early stages of development, has the same objective, but is supposed to be used as a transparent (or not) SMTP proxy for Postfix. Messages will be filtered and content will be DENIED in real time, so the sender will know the message was not delivered. Spammers don't care about SMTP error codes, and real senders will be notified of the error.

I am aware that using a transparent proxy is a bad idea on heavily loaded servers, so I am making different tools for different needs.

These tools aren't of much use to most people; they're mostly a lab for learning Perl. Thus, I am not willing to open a project at SourceForge just for them. So, I'll just paste the code here. If these tools prove useful, please leave a feedback comment. Thanks.

#!/usr/bin/perl -w
# Suri, a Spamvertised URIs filter.
# Suri Copyright 2004, Yves Junqueira <yves.junqueira at>
# uribl Copyright 2004, Devin Carraway <>
# Distributed under the GNU General Public License
# or the Perl Artistic License
# Suri is a SpamVertisedURIs check script that should be called
# from amavisd-new or any other antivirus frontend that can
# use TCP sockets to connect to the daemon and trigger the checks.
# When a client (in this case, amavis) connects to it and issues
# a "SCAN /dir" command, it scans that dir's files, looking
# for messages containing SpamVertisedURIs.
# When it finds a "(http|ftp|etc)://sub.domain.tld/" it asks
# the SURBL server whether that domain or subdomain is blacklisted.
# Suri runs as a pre-forked daemon, so it's supposed to perform
# well and not use many resources.
# As with any other RBL check, you shouldn't rely on remote
# RBL servers if you do many checks. Grab a SURBL zone if you
# can, and check against a local rbldnsd.
# Beware that mail flagged by Suri is treated as a VIRUS,
# so you should be careful with false positives. I suggest
# using only trustworthy SURBL databases and quarantining
# viruses. Or you can set $action to a non-"deny" value to just
# see WARNING logs in syslog.
# New versions probably in
# Usage:

# 1) Insert this after your main antiviruses
# in @av_scanners, at amavisd.conf
# ['Suri', \&ask_daemon,
# ["SCAN {}/../email.txt\n", ''],
# qr/^OK/, qr/^DENY/, qr/^SPAMMEDURL:.*[(](.+)[)]/
# ],
# 2) Run suri.
# 3) Restart amavisd

# Options
my $version = '0.8';
my $min_servers = 2;
my $max_spare_servers = 2;
my $user = 'clamav';
my $group = 'clamav';
my $port = 20098;
my @uribl_zones = ('');
my $action = "deny";


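use Net::DNS::Resolver;
my $dir;
my %sockets;

# "return DECLINED" is a qpsmtpd idiom and is not valid at file scope.
my $res = Net::DNS::Resolver->new
  or die "Suri: could not create a DNS resolver\n";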


package Suri;

use Unix::Syslog qw(:macros);
use Unix::Syslog qw(:subs);
use strict;
use vars qw(@ISA);
use Net::Server::PreFork;

@ISA = qw(Net::Server::PreFork);
my $self = bless(
    {
        'server' => {
            'user'              => $user,
            'group'             => $group,
            'min_servers'       => $min_servers,
            'max_spare_servers' => $max_spare_servers,
            'background'        => 1,
        },
    },
    'Suri'
);

openlog "suri", LOG_PID | LOG_PERROR, LOG_INFO;
syslog LOG_INFO, "Suri v$version starting.";

$self->run( port => $port );

sub process_request {
    my $self = shift;

    eval {
        openlog "suri", LOG_PID | LOG_PERROR, LOG_INFO;

        local $SIG{ALRM} = sub { die "Timed Out!\n" };
        my $timeout = 30;

        my $previous_alarm = alarm($timeout);
        while (<STDIN>) {
            $_ =~ s/\r//g;
            if ( $_ =~ /SCAN (\/.*)/ ) {
                $dir = $1;
                print "Scanning $dir\n";

                syslog LOG_INFO, "Scanning %s", $dir;

                my $result = check();
                if ( $result =~ /DENY/ ) { print "DENY\n"; last; }
                else { print "OK. $result\n"; last; }
            }
            else { print "Oops\n"; }

            # Give the client a fresh timeout for its next command.
            alarm($timeout);
        }
        alarm($previous_alarm);
    };

    if ( $@ =~ /timed out/i ) {
        print STDOUT "Timed Out.\r\n";
        return;
    }
}

sub check {
    openlog "suri", LOG_PID | LOG_PERROR, LOG_INFO;

    while ( defined( my $arquivo = glob( $dir . "*" ) ) ) {

        # Only plain, readable files are scanned.
        next unless -f $arquivo && -r $arquivo;
        print "Checking $arquivo\n";
        open( my $fh, '<', $arquivo ) or do {
            print "Couldn't open file: $arquivo: $!";
            next;
        };
        while ( my $line = <$fh> ) {

            # Undo URI escape munging
            $line =~ s/%([0-9A-Fa-f]{2,2})/chr(hex($1))/ge;

            # Undo HTML entity munging (e.g. in parameterized redirects)
            $line =~ s/&#(\d{2,3});?/chr($1)/ge;

            while (
                $line =~ m{
                    \w{3,16}:/+             # protocol
                    (?:\S+@)?               # user/pass
                    (\d{7,})                # raw-numeric IP
                    (?::\d+)?([/?\s]|$)     # port, slash
                                            # or EOL
                }gx
              )
            {
                my @octets = (
                    ( ( $1 >> 24 ) & 0xff ),
                    ( ( $1 >> 16 ) & 0xff ),
                    ( ( $1 >> 8 ) & 0xff ),
                    ( $1 & 0xff )
                );
                my $fwd = join( '.', @octets );
                my $rev = join( '.', reverse @octets );

                # print("uribl: matched pure-integer ipaddr $1 ($fwd)" );

                $sockets{"$rev\t$_"} ||= $res->bgsend( "$rev.$_.", 'txt' )
                  for @uribl_zones;
            }
            while (
                $line =~ m{
                    \w{3,16}:/+                 # protocol
                    (?:\S+@)?                   # user/pass
                    (\d+|0[xX][0-9A-Fa-f]+)\.   # IP address
                    (\d+|0[xX][0-9A-Fa-f]+)\.
                    (\d+|0[xX][0-9A-Fa-f]+)\.
                    (\d+|0[xX][0-9A-Fa-f]+)
                }gx
              )
            {
                my @octets = ( $1, $2, $3, $4 );

                # return any octal/hex octets in the IP addr back
                # to decimal form (e.g. http://0x7f.0.0.00001)
                for ( 0 .. $#octets ) {
                    $octets[$_] =~ s/^0([0-7]+)$/oct($1)/e;
                    $octets[$_] =~ s/^0[xX]([0-9a-fA-F]+)$/hex($1)/e;
                }
                my $fwd = join( '.', @octets );
                my $rev = join( '.', reverse @octets );

                #print( 8, "uribl: matched URI ipaddr $fwd" );
                $sockets{"$rev\t$_"} ||= $res->bgsend( "$rev.$_.", 'txt' )
                  for @uribl_zones;
            }
            while (
                $line =~ m{
                    \w{3,16}:/+                 # protocol
                    (?:\S+@)?                   # user/pass
                    ([\w\-.]+\.[a-zA-Z]{2,8})   # hostname
                }gx
              )
            {
                my $host         = $1;
                my @host_domains = split /\./, $host;

                #print( 8, "uribl: matched URI hostname $host" );

                while ( @host_domains >= 2 ) {
                    my $subhost = join( '.', @host_domains );

                    #print("URIBL: checking sub-host $subhost\n" );
                    $sockets{"$subhost\t$_"} ||=
                      $res->bgsend( "$subhost.$_.", 'txt' )
                      for @uribl_zones;
                    shift @host_domains;
                }
            }
        }
        close($fh);
    }

    # Collect the answers of all pending background queries.
    my %matches;
    while ( keys %sockets ) {
        my $c = 0;
        for my $s ( keys %sockets ) {
            unless ( $sockets{$s} ) {
                delete $sockets{$s};
                next;
            }
            next unless $res->bgisready( $sockets{$s} );
            my $packet = $res->bgread( $sockets{$s} );
            unless ($packet) {
                delete $sockets{$s};
                next;
            }
            for my $rr ( $packet->answer ) {
                $matches{$s} = $rr->txtdata
                  if $rr->type eq 'TXT';
            }
            delete $sockets{$s};
            $c++;
        }

        # The builtin sleep truncates 0.1 to 0, so use select to pause.
        select( undef, undef, undef, 0.1 ) if keys %sockets and !$c;
    }
    for ( keys %matches ) {
        my ( $host, $uribl ) = split /\t/, $_;
        my $note = "SPAMMEDURL: $host in $uribl ($matches{$_})";
        print "\n$note\nAction: $action\n";

        if ( $action eq 'deny' ) {
            syslog LOG_INFO, "%s Action: deny", $note;
            return ("DENY $note");
        }
        else {
            syslog LOG_INFO, "%s Action: %s", $note, $action;
            return ("WARN: $note");
        }
    }
    return "OK";

    # return "WARN";
}
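
To poke at a running daemon by hand, without going through amavis, something like the following Python sketch could be used. It assumes the daemon listens on the port set in the options above (20098), answers a single `SCAN` line and then closes the connection. The helper names `parse_verdict` and `scan` are invented for this example and are not part of Suri.

```python
import socket


def parse_verdict(response):
    """Classify the daemon's reply by its first line starting with OK or DENY."""
    for line in response.splitlines():
        if line.startswith("DENY"):
            return "deny"
        if line.startswith("OK"):
            return "ok"
    return "unknown"


def scan(path, host="127.0.0.1", port=20098, timeout=30.0):
    """Send 'SCAN <path>' to the daemon and return 'ok', 'deny' or 'unknown'."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(("SCAN %s\n" % path).encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the daemon closed the connection
                break
            chunks.append(data)
    return parse_verdict(b"".join(chunks).decode("ascii", "replace"))


if __name__ == "__main__":
    # Hypothetical path; point it at a message file the daemon can read.
    print(scan("/tmp/suri-test/email.txt"))
```

The daemon also prints progress lines ("Scanning ...", "Checking ...") before the verdict, which is why the parser looks at every line instead of only the first one.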


Sunday, September 19, 2004

Wikipedia daily articles: pills of knowledge

I subscribed to the Wikipedia daily articles list a few weeks ago.

It's a nice way to relax and have a good read while gathering the courage to read all those bugtraq or postfix-users messages.

Friday, September 10, 2004

Fedora Core with MySQL 4

David Martínez was kind enough to put up a repository of MySQL 4 and further dependencies compiled for FC2.

You may get the files directly from:

These packages fixed a very ugly behaviour I was getting with my mixed MySQL libs.
Every time I ran a Perl script that used the MySQL DBI driver, it ended with a Segmentation Fault.
Now it's fine. Thanks David!

Thursday, September 9, 2004

Another GMAIL INVITATION - gmail account

Follow this:

Everybody deserves a GMAIL account!

Tuesday, September 7, 2004

There were errors

Weeee. Blogger couldn't publish anything for several hours.

001 Connection timed out

It still can't.

To my usual thousands of daily readers, I can only apologize.

Please stop commenting on every post. I can't read all your comments!

Domain hijackers

While testing a better layout for the Google ad you see on the left, I became interested in one very evil service.

They call it Expired Domains Traffic. Most of the time they are a disservice to internet users, but I'm linking to them in the public interest.

The idea is quite simple. They have a bot that watches for domains expiring. When a domain expires, they buy it and then make it redirect the traffic to their customers' sites.

What's the point there? Let them explain it:

About "Expired DomTraffic"

Every day 1000s of previously registered domains expire, because the owner did not extend domain registration. If the owner does not pay the annual fee, the domain registrar will put the name on hold. With most registrars, an "on hold" domain stops working. Most registrars allow an additional grace period of 30-90 days for the domain owner to pay the annual fee. During this period, the registrar will generally contact the domain owner many times with attempts to get them to pay the fee and reactivate the domain name. If the domain owner fails to pay on time, and fails to respond during the 30-90 day hold period, the registrar will drop the domain name. At this point, anyone can register the name. We assume that the previous owner no longer wants a dropped name and we will register the name if we feel that it will generate traffic. After we own the name, we direct it to our server and send out the expired domain traffic (= guaranteed visitors) to the campaigns we serve.

These expired domains have traffic on them, and the previous owners marketed them so you now reap their hard work by having us direct this traffic to your website. This type of redirection is better than popups or popunders. AOL browsers now completely block pop-ups and about 60% of all internet surfers have installed pop-up blocker software anyway. This is why expired domain traffic will generate much better results than pop-up or pop-unders!

It's interesting to get a glimpse of how things really work on (under) the internet.

Dedicated Servers

I've started researching dedicated servers, looking for good prices and interesting service.

The first I knew about was ServerMatrix. They have a nice site, the company is big (a subsidiary of The Planet) and the service seems fine. They even publish a live cam picture of their data center. Quite impressive.

The problem is that they raised their prices recently. There was a promotional no-setup-fee offer for one of the server options, and that is gone. Also, I believe they removed the cheapest server option. Finally, even the setup fee is now USD199.00. AFAIR, it used to be USD149.

Then I found HiVelocity, I believe through either a Google banner or a plain Google search result. As I was looking for a cheap server, the one that fit was a 2.0 GHz Celeron with 1000 GB of metered bandwidth.

What impressed me was their very good use of PHP Live. I talked to a salesperson there, Drew Adams, who was very competent at his job. It's obvious that a sales team is very important, but I'm still very impressed with the quality of sales departments in the US.

ServerMatrix, whose PHP Live didn't work for me when I tried it, was very fast in its pre-sales support by mail. Superb Servers also uses PHP Live, and through it I was told that they charge 10% extra on payments made using PayPal. That was bad.

Now, what was new to me was that PHP Live allows the sales guys to ACTIVELY talk to the visitor. That was a nice experience with HiVelocity. After you have contacted them once for sales support, the next time you enter their site they can, if they want to, start a new conversation THEMSELVES (like "Hey, any other doubts?"). That can bother some people, but I wasn't bothered. It is technologically simple, but commercially amazing!

Let's get back to the subject. Superb Servers, another service I found, is OK. One thing that I didn't like, and you will probably agree, is that their site is VERY ugly, hehe. After 5 minutes there, I thought, "argh, let me out!!".

Finally, some other competitive guys are They have a nice looking site, and good prices. One very important point to pay attention to is that their entry-level service, the cheapest one, does not include remote reboots. I don't know what would happen if I needed a reboot (say, after getting locked out while setting up the firewall or changing the SSH daemon). That is very strange. You have to pay for extra reboots.

The coolest thing about is that they have ALIEN MODELS posing for their site pictures.


Isn't the girl from the right's very BIG? They should hire a new Photoshop guy...

Sunday, September 5, 2004

Another Antispam Solution, or anti-spam solution

The amount of spam targeted to our servers is huge.

Sometimes, as much as 80% of the e-mail that would be delivered to the servers is either spam or viruses.

Dealing with that is part of my job, as the mail system admin. It's interesting, because it's challenging and results are fast and noticeable, if you apply the right techniques.

On our servers, we have some very old mailboxes that are on ALL the spam lists. So we have a very valuable tool in our hands. We can use these accounts to test current tools and to train whatever other tool we deploy.

In the last few days, I've been developing a new antispam solution that would be amazingly easy to manage and would give us dozens of possibilities for what to do with the information generated by the logs.

In a usual mail content scanning solution, even one as powerful as DSPAM, you can't be sure whether you will have false positives, so you can't use it for blacklisting sources or anything like that.

The technique I am now using, which I read is used by some RBL providers, requires daily maintenance, but scales very well. The more you maintain it, the better the results. My bet is that what stands out in this particular case is that I've made it very quick to maintain. Just a few minutes every day, and it's all right.

Also, deploying it on other servers would be very easy, unlike DSPAM, which could be a pain for new sysadmins.

I won't discuss it further because there isn't anybody reading anyway. This is for historic record :-)

Saturday, September 4, 2004

Free Gmail Invitation

I have some spare Gmail invitations.
Follow this link. If you're lucky, you will get yourself a gmail account.