Virtual OS/2 International Consumer Education
VOICE Home Page: http://www.os2voice.org
December 2003


editor@os2voice.org


Letters, Addenda, Errata

Translation: Christian Hennecke

If you have any comments regarding articles or tips in this or any previous issue of the VOICE Newsletter, please send them to editor@os2voice.org. We are always interested in what our readers have to say.

Last month, I wrote about SPAM and the problems that it causes, and I asked you, the readers, to write to me with your experiences and solutions.  Interestingly, users are employing every type of filter, from the network level all the way down to simple word-list filtering at the application level, with widely varying success.  Several people, it seems, have implemented a combination of filters at different levels to keep their mailboxes clean.

First, I'd like to start with a letter from Phil Parker, who gives us an interesting report on what makes us susceptible to SPAM.

I thought this might be worth an announcement. The bit in braces {} below
is the required attribution for duplication according to their policies.

17. Why Am I Getting All This Spam?: Unsolicited Commercial E-mail
Research Six Month Report
http://www.cdt.org/speech/spam/030319spamreport.shtml

The Center for Democracy and Technology (CDT) presents this insightful
report, which discusses some of the key factors that make email users
susceptible to spam. Based upon a study conducted by CDT, the report
outlines the primary sources of spam and how the spammers obtain email
addresses. By setting up 100 different email addresses and using each in a
particular manner, the study's administrators documented the kinds of
online behavior that result in the most spam. The report dispels some of
the common myths about spam and provides suggestions for staying off the
mailing lists of advertisers. A note at the end of the report recommends a
free online tool that can be used to obscure email addresses when posted
online. [CL]

{From The NSDL Scout Report for Math, Engineering, and Technology,
Copyright Internet Scout Project 1994-2003.  http://scout.wisc.edu/ }

Main recommendation: encode publicly posted email addresses in numerical
HTML equivalents. A free encoder is at:
http://www.wbwip.com/wbw/emailencoder.html

Phil Parker
philDESPAM@math.twsu.edu
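
The recommendation above, writing a posted address as numerical HTML equivalents, can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the linked encoder; the function name and the example address are hypothetical.

```python
def encode_email(address: str) -> str:
    """Write every character of the address as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


if __name__ == "__main__":
    # Hypothetical address, used only for demonstration.
    print(encode_email("user@example.org"))
```

A browser renders the encoded string as the original address, while a harvester that simply scans the page source for text containing "@" does not find it. A harvester that decodes character references first would still see the address, so this reduces exposure rather than removing it.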

Thomas Müller takes the manual approach, something that's certainly accurate, but probably won't help users who get a few hundred SPAM messages a day.

Here is my answer to your article:
I am using PMMail with its remote control to delete spam on my ISP's host.
That's it.
Thank you for the job you are doing on os2voice.org.
Carry on, please.
Thomas
(from Germany)

Thomas Müller
th.muellerDESPAM@nwn.de

Jim Arnold and Christoph Kloeters both use a web-based service approach to the problem.

Sirs,

        For my personal email I use the Netaddress.com mail service. They
have implemented a very good SPAM filtering service that allows me to
forward all SPAM to a junk folder on their server. I occasionally look
through this folder to ensure that there is no mail that I want to see and
then delete all the mail in the folder. This is a web based approach. I
only download the messages that stay in my Netaddress inbox to my local
mail client via POP access.

        While this relieves me of the burden of SPAM, it doesn't do anything
for the congestion on the Internet that SPAM produces.

        Jim
jim.arnoldDESPAM@ec.gc.ca

Christoph Kloeters writes:

You have asked for SPAM filtering solutions in the Newsletter.
http://www.eleven.de offers free filters for end-users. These are based on an
email address to which the mail is forwarded. Then you either need
another mail account to which the filtered mail is forwarded, or a
mail account with good filters (gmx recommended!) that won't send the
already filtered mail to Eleven again. Eleven only sets a mark in the
headers of filtered mails and doesn't delete them itself; this can be
done on the client side, or you can, like me, use the convenient
filters of gmx.

So far Eleven has not had a single (!) false positive, and out of 70-120
SPAMs per day, only 1-5 get through (some of those are mails to which
no criterion applies, such as empty address verification mails).

For commercial use, Eleven is not free - I have no idea about the costs
though.
Kind regards, Christoph

ch.kloetersDESPAM@gmx.de

Matt Walsh uses his provider's content-based filter with good results, but is having trouble keeping up on his other accounts using a simple word-based filter.

As you mentioned, with OS/2 it's not deadly, just a pain.  My ATT Global
account has great spam filtering.  Apparently they use Brightmail, which
got the best report in PC Mag's latest review.  For my home accounts I
use PMMail and have made filters, but it's getting ahead of me and I'm
going to have to buy JunkSpy soon, I guess.

PS: I agree and wish we could knee cap the boogers or at least fry their
machines.

Matt
mwalsh1DESPAM@elp.rr.com

Fritz Schori uses a network filter to block out viruses, plus a combination of simple word and statistical content-based filtering.

In InJoy Firewall, I am filtering the first 16 bytes of Swen. That
particular binary format is used by Swen viruses only.

For my normal account that runs via my provider, I use do-it-yourself
filters in PMMail. But since I also have my own server, I use Weasel
together with Weaselfilter. I like that you can specify a limit for
Weaselfilter. There is a certain value for each filter. For each email,
the values are added, and if the sum exceeds the limit, the mail is
rejected.

Regards, Fritz
fritzDESPAM@os2force.ch
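
The scoring scheme Fritz describes (each filter carries a value, the values of all matching filters are added, and the mail is rejected once the sum exceeds a limit) can be sketched as follows. This is not Weaselfilter's code or configuration format; the patterns, values, and limit are hypothetical and chosen only to show the mechanism.

```python
import re

# Hypothetical rules as (pattern, value) pairs. These are NOT Weaselfilter's rules.
RULES = [
    (re.compile(r"viagra", re.IGNORECASE), 60),
    (re.compile(r"free money", re.IGNORECASE), 50),
    (re.compile(r"click here", re.IGNORECASE), 30),
]

LIMIT = 80  # hypothetical rejection limit


def score(message: str) -> int:
    """Add up the values of all rules whose pattern occurs in the message."""
    return sum(value for pattern, value in RULES if pattern.search(message))


def is_rejected(message: str, limit: int = LIMIT) -> bool:
    """Reject the message only if its total score exceeds the limit."""
    return score(message) > limit


if __name__ == "__main__":
    for text in ("Meeting at noon", "Viagra! Click here"):
        print(text, "->", score(text), is_rejected(text))
```

The advantage over a plain word list is that no single weak indicator rejects a message by itself; only an accumulation of indicators does.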

Walter Metcalf uses a similar approach with Sundial Systems' JunkSpy.

I use Sundial's JunkSpy 2.0, but I've enhanced it with several filters
of my own.  I get relatively few false positives, but far too many SPAM
messages still slip through.  I'm looking for a better tool.

Walter F. Metcalf
walter.os2DESPAM@rogers.com

James Moe also uses JunkSpy, together with Bogofilter and Mike Fry's Proxyc.

Hello,

  I use two programs: JunkSpy and bogofilter.
  JunkSpy is a commercial POP3 proxy that scans incoming
email using a content matching database. It is fairly effective, especially when using
the "blackhole" option. It correctly tags 90 - 95% of the spam. It tends to be
conservative so false positives happen. It also provides a "whitelist" option (they
call it Global Exceptions). The database is regularly updated with new and improved
entries.

  Bogofilter is a Bayesian classifier.
(OS/2 versions are available on Hobbes.) It is a standalone program that must be called
by a mail agent to evaluate a message file. It returns Yes, No or (optionally) Unsure.
It can also add this info to the message header. Like other Bayesian filters it becomes
very accurate after its initial training period. Its only downside is its CPU-intensive
nature: on my Athlon running at 1800 MHz it can process only 4 - 7 messages per second.
  Between the two programs no spam gets through. Literally. The occasional (2 - 3 per
week) Unsure is easily handled.

  To make bogofilter more useful, Mike Fry is developing Proxyc (also available on
Hobbes), a clever POP3 proxy that runs bogofilter as a daemon. This keeps the program in
memory, which doubles the standalone classification speed. He is very responsive to user
feedback.

James Moe
jimoeDESPAM@sohnen-moe.com

Five others rely mainly on Bayesian filters.

Bernd Hohmann writes:
Hi,

http://www.zerotoaster.net contains a Bayesian Filter, a stringfilter and an AV Plugin.

Together with its POP3 Fetchmail module it can keep the inbox clean.

Bernd
hohmannDESPAM@harddiskcafe.de

Per Johansson and Niels Jensen both use Polarbar Mailer, with its built-in Bayesian filter.

Per Johansson writes:
Hello.

I'm using Polarbar Mailer with Bayesian filtering, so bad messages are
put in a special folder. It usually works fine, with a few errors. I'm
on an ADSL connection, so spam download time isn't a big problem.

Per Johansson
perDESPAM@johansson.name

From Niels Jensen:
Hi,

Both at home and at work I use the Bayesian statistics-based filters
included in Polarbar Mailer. This filter looks for good and bad words
based on a set of user-defined good and bad e-mail (the corpus).

At work the ratio of spam to useful e-mail is about 1 to 10. Currently
the filter catches about 80% of the spam. The filter moves the spam mail
to a separate folder, and about once a month I delete the content of
this folder. I don't inspect the content of the spam folder, but rely on
another couple of filters to act on non-spam e-mail.

The setup is to filter all mail known to be good, i.e. mail from
co-workers (known domain), mail from mailing lists (known from address),
and mail from persons I co-operate with (known from address). This
approach moves important mail away from the inbox to other
folders, and leaves in the inbox only mail from new correspondents, mail
from persons who have changed e-mail address, and spam missed by the
spam filter.

The filtering significantly slows downloads at work - I use a TP600 with
a 233 MHz processor. I believe that the spam filter especially takes
CPU power.

At home I receive significantly less spam than at work. This is probably
because my work e-mail address can be harvested from the work web-site,
while my private e-mail address is not posted in clear text on web-sites
- as far as I know!

The SPAM problem has increased significantly in the last 6-12 months,
and many of the colleagues at work - especially the ladies - really
object to the many porn and viagra related mails.  There is also an
ethics issue here: can one require a co-worker to deal personally with
e-mail they find objectionable?

With kind regards from Niels Jensen, Slangerup, Denmark
njensenDESPAM@get2net.dk

Finally, Steven Zaveloff and Tom Brown use the newly implemented Bayesian filter built into Mozilla.

Steven writes:
I use the Mozilla mail client and have the junk mail controls enabled.
I still have to check that non-junk mail is not filtered out but
that is a simple process of going to the Junk folder and quickly
scanning through the messages--maybe a couple of minutes a day. The
system, of course, misses some but I just mark them as junk and they
are automatically moved to the Junk folder. All in all a fairly good
balance, I think.

HTH
Steven H. Zaveloff
zaveloffDESPAM@earthlink.net

Last, but certainly not least, Tom Brown writes:
Responding to your request for information in the November newsletter...

I get very little SPAM on my RoadRunner accounts. I use Mozilla 1.5 for
my e-mail client, and its Bayesian filter catches about half of what
comes in, about 5 or so per week. What I do get, I mark as such and
delete it, thus training the Bayesian filter, or so they say. I also
have my own filters, relics from prior times when I had other ISPs.

I used JunkSpy several years ago (not the current version). It did a
fair job, but let some slip through and gave false positives based on
word fragments in the header. I stopped using it when I switched to
Earthlink, and haven't needed it with RR. I checked Earthlink's
Spaminator occasionally, and never found a false positive. I don't know
of a way to check RR's spam filter.

HTH
 

Tom Brown
thombrownDESPAM@san.rr.com

After reading these emails and seeing the different approaches to a common problem, I wanted to know more about the subject of filtering.  The filtering programs mentioned do a reasonably good job of stopping SPAM, and all of them do a better job than simple word lists (something SPAMmers have adapted to).  Specifically, I was curious about Bayesian filters and why they are so successful, rapidly overtaking other types of filtering as users' favorite weapon against SPAM.  About.com has a good layman's explanation of what Bayesian filters do.  For a more in-depth explanation that would appeal to technicians and programmers, I recommend a visit to Paul Graham's site.
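
To give a feel for why these filters adapt so well, here is a minimal sketch of a naive Bayes classifier: it learns word frequencies from a corpus of good and bad mail and combines them into a probability for a new message. It is a textbook simplification under my own assumptions (simple word tokens, add-one smoothing), not the algorithm used by Bogofilter, Polarbar Mailer, or Mozilla, and the class and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy Bayesian filter trained on user-supplied spam and good mail."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "good": Counter()}
        self.message_counts = {"spam": 0, "good": 0}

    def train(self, text, label):
        """Add one message to the corpus; label is 'spam' or 'good'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that the message is spam."""
        spam, good = self.word_counts["spam"], self.word_counts["good"]
        # Add-one smoothing, with one extra slot reserved for unseen words.
        slots = len(set(spam) | set(good)) + 1
        spam_total = sum(spam.values()) + slots
        good_total = sum(good.values()) + slots
        # Start from the (smoothed) ratio of spam to good messages seen so far.
        log_odds = math.log(
            (self.message_counts["spam"] + 1) / (self.message_counts["good"] + 1)
        )
        # Each word shifts the odds according to how typical it is of each class.
        for word in tokenize(text):
            log_odds += math.log((spam[word] + 1) / spam_total)
            log_odds -= math.log((good[word] + 1) / good_total)
        # Clamp to keep math.exp within floating-point range.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    mail_filter = NaiveBayesFilter()
    mail_filter.train("buy cheap pills now", "spam")
    mail_filter.train("meeting agenda for monday", "good")
    print(mail_filter.spam_probability("cheap pills"))
```

Because the word statistics come from the user's own mail, every message marked as junk or not junk refines them, which is why several readers mention a training period before the filter becomes accurate.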

Believe it or not, even Microsoft has looked into the problem of SPAM, although they've taken very little action in the real world to help stop it.  Thankfully, there are filters available in every category for OS/2 users.

Related links

   A Plan for Spam:    http://www.paulgraham.com/spam.html
   Better Bayesian Filtering:    http://www.paulgraham.com/better.html
   Mozilla Spam Filtering:    http://www.mozilla.org/mailnews/spam.html
   What You Need to Know About Bayesian Spam Filtering:    http://email.about.com/library/weekly/aa100702a.htm
   How Bayesian Spam Filtering Works:    http://email.about.com/cs/bayesianfilters/
   A Bayesian Approach to Filtering Junk E-mail:    http://research.microsoft.com/~horvitz/junkfilter.htm
   Bogofilter Homepage:    http://sourceforge.net/projects/bogofilter/
   Bogofilter on Hobbes:    http://hobbes.nmsu.edu/cgi-bin/h-search?key=bogofilter
   Weaselfilter:    http://hobbes.nmsu.edu/cgi-bin/h-search?key=weaselfilter
   Using Weaselfilter Against Email Attachments:    http://www.os2voice.org/VNL/past_issues/VNL0902H/vnewsf3.htm
   PMMail:    http://www.pmmail2000.com
   InJoy Firewall:    http://www.fx.dk/
   Sundial Systems' JunkSpy:    http://www.junkspy.com/
   Address encoder:    http://www.wbwip.com/wbw/emailencoder.html
   Eleven filtering service:    http://www.eleven.de
   Netaddress service:    http://www.netaddress.com/
   POPFile service:    http://popfile.sourceforge.net/


After Lothar Frommhold interviewed Walter A. Schmidt about VTeX, we asked Lothar to write a few words about his experience with Ian Hutchinson's TeX to HTML (tth) converter, which he used to format the text of the interview.

Thanks, Lothar.


Conversion of TeX files to HTML

Recently I needed a document in HTML format for web publishing. That particular document was previously created using VTeX/2. It is a multi-page, text-only, plain article without any figures, tables, or mathematical formulas, using some standard font types (boldface, roman, italics). The document is somewhat structured in paragraphs and lists and contains URL references to certain web pages. A universally applicable, robust conversion program from TeX to HTML apparently does not yet exist, but there are several beta versions of such conversion programs available which will work for some, but not for all types of documents (see M. Goossens and S. Rahtz, The LaTeX Web Companion, Addison-Wesley 1999, for details). A friend who is a TeX expert suggested that I try Ian Hutchinson's TeX to HTML (tth) converter. Since a binary executable to run tth on OS/2-eCS is not available [Editor's note: An OS/2 version (though requiring the EMX runtime) of tth has actually been available from Alexander Mai for a while: http://www.lesstif.org/~amai/os2/html ], I downloaded the source file tth_C.tgz from http://hutchinson.belmont.ma.us/tth/. Unzipping the file using InfoZip (type unzip tth_C.tgz on the eCS command line) generated (among other files) tth.c, which I compiled with the Watcom C compiler, which comes with eCS, by typing

wcc386 -k2048k tth.c
on the command line. The k switch (-k2048k) was necessary to reserve a sufficient stack size for execution. The compilation took several minutes and produced an executable tth.exe, which must then be copied to a directory that is included in the PATH statement in config.sys. After that I changed to the directory with the VTeX file and executed tth "name".tex, where "name" is the name of the VTeX (or TeX, LaTeX) file. This produced the "name".html file, which looked quite good in the browser window (in my case, the IBM web browser); clicking on the URLs connected me instantly to the desired web pages. It took me less than 30 minutes to set up tth.exe from scratch on my computer; the conversion of the 6-page VTeX file to HTML took only a fraction of a second.

Admittedly, my document was relatively simple and it was perhaps not much of a challenge to convert it to the HTML format. However, tth.exe handles complex tables and color quite well, among several other things, and for mathematical formulas a companion program mth is available from the same source. It might be worthwhile to try these packages also with more complex TeX documents.

Lothar Frommhold
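
For readers who want to script the conversion step described above, the following is a minimal sketch in Python. It assumes only what the article states: that tth is on the PATH, that it is run as tth name.tex from the directory containing the file, and that it writes name.html there. The helper names are hypothetical and not part of tth.

```python
import shutil
import subprocess
from pathlib import Path


def tth_command(tex_file, executable="tth"):
    """Build the command line described in the article: tth name.tex"""
    return [executable, Path(tex_file).name]


def convert(tex_file, executable="tth"):
    """Run tth in the directory of the TeX file and return the expected HTML path."""
    tex_path = Path(tex_file).resolve()
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} was not found on the PATH")
    # check=True raises CalledProcessError if tth exits with an error code.
    subprocess.run(tth_command(tex_path, executable), cwd=tex_path.parent, check=True)
    # Assumption taken from the article: the output is written as name.html.
    return tex_path.with_suffix(".html")
```

Calling convert("interview.tex") with a hypothetical file name would then correspond to changing to the file's directory and typing tth interview.tex by hand.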

