Newsletters - Spring 2004

C# on the Rise

by John Stout

At Stout Systems, we create custom systems for our clients, including developing the specifications when that has not already been done. One of the most frequent requests we have had in the past two years is for the migration of old custom systems, including databases, to new platforms using current technologies.

If the number of requests we have received for .NET people and projects in the past year is any indication, .NET adoption--and C# in particular--has risen sharply (C-sharply?).

As with all new technologies, there are many things to learn—and C# has more than its fair share.

In our work with C# developers, we have found several concepts that are pretty uniformly missed, overlooked, devalued or just plain misunderstood. I thought it might be interesting to the C# community to know what they are. That way, you can bone up on them before your deficiency is noticed. Or you can pat yourself on the back for already knowing them!

Given my space constraints, this article assumes you are already familiar with C# and fundamental Windows and .NET concepts.

With any new language, there is always the question of whether to convert working code. If you have stable algorithmic code, there is probably no compelling reason to convert it; just call it from C# through platform invoke (P/Invoke), the same mechanism used to call Windows API functions. I have been surprised at the percentage of C# developers who did not know how to make such calls into DLLs from C#.

Let’s say you have a function called ShowTitle in your DLL declared as follows:

  int ShowTitle(LPCSTR msg, int id);

You’ll need to use the DllImport attribute to show that your function is found in a particular DLL, in this case mydll.dll.

  // DllImport is defined in System.Runtime.InteropServices
  [DllImport("mydll.dll")]
  public static extern int ShowTitle(string msg, int id);

DllImport also has other parameters you may use to handle the interaction between your C# code and the function in the unmanaged DLL.
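As a rough sketch of those other parameters, the declaration below sets three of the named fields that DllImport accepts. The DLL name and function come from the example above; the chosen values are assumptions (in particular, StdCall assumes the DLL exports the function as __stdcall, and Cdecl would be needed if it was built with the C default), and TryShowTitle is a hypothetical wrapper added only so the sample runs on a machine without mydll.dll.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeTitle
{
    // The named fields below are illustrative choices, not requirements
    // of mydll.dll.
    [DllImport("mydll.dll",
               EntryPoint = "ShowTitle",      // exported name inside the DLL
               CharSet = CharSet.Ansi,        // marshal the string as LPCSTR
               CallingConvention = CallingConvention.StdCall)] // assumed
    private static extern int ShowTitle(string msg, int id);

    // Hypothetical wrapper: returns null instead of throwing when the
    // DLL or the export cannot be found at call time.
    public static int? TryShowTitle(string msg, int id)
    {
        try
        {
            return ShowTitle(msg, id);
        }
        catch (DllNotFoundException)
        {
            return null;
        }
        catch (EntryPointNotFoundException)
        {
            return null;
        }
    }

    public static void Main()
    {
        int? result = TryShowTitle("Hello from C#", 1);
        if (result.HasValue)
            Console.WriteLine("ShowTitle returned " + result.Value);
        else
            Console.WriteLine("mydll.dll is not available on this machine");
    }
}
```

The DLL is only loaded when the function is first called, which is why a missing DLL surfaces as an exception at the call rather than at startup.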

Another point of confusion is the ref, out and params keywords and what they mean when used in function parameter definitions. Briefly:

ref indicates that a variable is passed by reference. When a parameter is declared ref, the function can change it and can even make the caller’s variable point to an entirely different object (which differs from C++ usage). A variable passed as ref must be initialized before the call, and the ref keyword must appear at the call site as well as in the declaration.

out is essentially a ref that doesn’t have to be initialized by the caller. You use it to return a value from a function, and the function must assign it before returning.

params is used to indicate that a function takes a variable number of arguments. Two rules: params must be used with an array argument and has to be the final argument in a function’s argument list.
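The three keywords can be seen side by side in a minimal sketch. The class and method names here (ParameterDemo, Rename, TryHalve, Sum) are made up for illustration.

```csharp
using System;

public static class ParameterDemo
{
    // ref: the caller's variable itself is passed, so the method can
    // replace the object it refers to.
    public static void Rename(ref string name)
    {
        name = name + " (renamed)";
    }

    // out: the caller need not initialize the variable, but the method
    // must assign it before returning.
    public static bool TryHalve(int value, out int half)
    {
        half = value / 2;
        return value % 2 == 0;
    }

    // params: callers may pass any number of ints. The params array
    // must be the last parameter, here after the ordinary 'label'.
    public static int Sum(string label, params int[] values)
    {
        int total = 0;
        foreach (int v in values) total += v;
        return total;
    }

    public static void Main()
    {
        string title = "report";              // ref arguments must be initialized
        Rename(ref title);
        Console.WriteLine(title);             // report (renamed)

        int half;                             // no initial value needed for out
        bool even = TryHalve(10, out half);
        Console.WriteLine(half + " " + even); // 5 True

        Console.WriteLine(Sum("three", 1, 2, 3)); // 6
        Console.WriteLine(Sum("none"));           // 0
    }
}
```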

One last concept that I frequently discover C# developers haven’t yet grasped is the difference between const and readonly member variables.

const values are set inline in the declaration and fixed at compile time. The compiler substitutes the value wherever the const is used, leaving no reference to the variable itself. One consequence is that if you use a const from another assembly, your compiled code contains the value itself and no reference to its variable; if the other assembly later changes its const value, your code keeps using the original value until you recompile it.

readonly variables can be set inline or in class constructors, but cannot be changed in any other context. The value can be dynamically determined at runtime.
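A short sketch of the difference, using a hypothetical Settings class:

```csharp
using System;

public class Settings
{
    // const: fixed at compile time; the value is copied into any code
    // that uses it, including code in other assemblies.
    public const int MaxRetries = 3;

    // readonly: may be set inline...
    public readonly string Name = "default";

    // ...or in a constructor, from a value known only at run time.
    public readonly DateTime CreatedAt;

    public Settings(string name)
    {
        Name = name;
        CreatedAt = DateTime.Now;
    }

    // Assigning a readonly field anywhere else does not compile
    // (compiler error CS0191):
    // public void Reset() { Name = "x"; }
}

public static class ConstDemo
{
    public static void Main()
    {
        Settings s = new Settings("nightly");
        Console.WriteLine(Settings.MaxRetries); // 3
        Console.WriteLine(s.Name);              // nightly
    }
}
```

If a value might change between releases of a shared assembly, a static readonly field is often the safer choice, since callers then read it at run time instead of carrying a compiled-in copy.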

These new features of the C# language are analogous to features in C++, so the concepts aren’t new--just the way in which they are used. If you have any other interesting challenges in learning or using C# and .NET, please feel free to write them up and send them our way for a future newsletter article.

John W. Stout is the founder and president of Stout Systems. He has twenty-four years of experience in the software development industry, consulting for many companies on a wide variety of projects. He is also sought after as a technology speaker, presenting sessions at developer conferences and user groups.


Fighting Spam

by David Relson

The amount of spam has been increasing over recent months. In early 2003 my mail server received 100 spam messages a day. By December 2003 the average was 300. By April 2004 it exceeded 500.

What’s a person to do?

One option is to head for your nearest computer store and buy one of the many shrink-wrapped products. Another option is to Google for "spam filters" and see what comes up. Interestingly, the second hit is an article titled "A Plan for Spam" by Paul Graham, and it’s that article (along with all the junk in my in-box) that got me interested in spam filtering.

In his article, Paul wrote about a statistical approach to recognizing spam. The basic idea was to build two lists of words--one from spam messages and the other from non-spam "ham" messages. To classify a new message, compare the words it uses to the words in the two lists and compute a score indicating where on the "ham ... spam" scale the new message appears.

This is the well-known Bayesian technique, named after 18th-century mathematician Thomas Bayes. Graham’s article brought its application to spam filtering to wide attention. The article has spawned many spam filters--both commercial and open source.

The Graham article appeared in August 2002. The well-known open source advocate Eric S. Raymond took an interest and started implementing the technique in C, naming his program bogofilter. I learned about it while it was in its infancy, took an immediate interest in it, made suggestions and criticisms, and contributed modifications (patches) to further it. Within a few weeks it had become an official SourceForge project with an international team headed by Adrian in California, with major contributions by Matthias in North Rhine-Westphalia (Germany), Gyepi in Boston, Greg in Toronto, and me in Ann Arbor.

How does a Bayesian filter work?

To start with, the program needs to be told what is spam and what is ham. This process is called training and involves parsing messages identified as spam or ham and saving the tokens and a count of how often they occur. This produces a word list with information like "david occurs in the subject line of 30 spam messages and 2 ham messages" and "training has been done with 1000 spam and 800 ham".
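The training step might be sketched as follows. This is an illustration only, not bogofilter's code: the WordList class, its tokenizer (lowercase, split on whitespace and a few punctuation marks), and the sample messages are all assumptions.

```csharp
using System;
using System.Collections.Generic;

public class WordList
{
    // Per-token counts: [0] = spam messages containing the token,
    // [1] = ham messages containing it.
    private readonly Dictionary<string, int[]> counts =
        new Dictionary<string, int[]>();

    public int SpamMessages { get; private set; }
    public int HamMessages { get; private set; }

    // Assumed tokenizer: returns the distinct lowercase words of a message.
    public static HashSet<string> Tokenize(string text)
    {
        char[] separators = { ' ', '\t', '\r', '\n', ',', '.', '!', '?' };
        return new HashSet<string>(text.ToLowerInvariant().Split(
            separators, StringSplitOptions.RemoveEmptyEntries));
    }

    // Record one message that a person has labeled as spam or ham.
    public void Train(string message, bool isSpam)
    {
        if (isSpam) SpamMessages++; else HamMessages++;
        foreach (string token in Tokenize(message))
        {
            int[] c;
            if (!counts.TryGetValue(token, out c))
            {
                c = new int[2];
                counts[token] = c;
            }
            c[isSpam ? 0 : 1]++;
        }
    }

    public int SpamCount(string token)
    {
        int[] c;
        return counts.TryGetValue(token, out c) ? c[0] : 0;
    }

    public int HamCount(string token)
    {
        int[] c;
        return counts.TryGetValue(token, out c) ? c[1] : 0;
    }

    public static void Main()
    {
        WordList list = new WordList();
        list.Train("david time to refinance", true);
        list.Train("Refinance now, david!", true);
        list.Train("lunch with david today", false);

        // prints: david: 2 spam, 1 ham (of 2 spam, 1 ham messages)
        Console.WriteLine("david: " + list.SpamCount("david") + " spam, " +
            list.HamCount("david") + " ham (of " + list.SpamMessages +
            " spam, " + list.HamMessages + " ham messages)");
    }
}
```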

Given a word list, when a new message arrives it is parsed and its tokens are looked up. The counts stored with each token allow a "spamicity" calculation, i.e. a calculation of how often the token occurs in spam versus how often it occurs in ham. The above example results in a score of 0.922853 (likely spam) for david when a message arrives with "david time to refinance" as its subject.
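The arithmetic behind that score can be sketched as below. The article does not give the formula, so this is an assumption: the raw ratio of spam rate to combined rate gives about 0.923, and blending it with a neutral prior (a smoothing step of the kind proposed by Gary Robinson) reproduces the quoted 0.922853 when the strength is 0.0178 and the prior is 0.52. Those two constants are assumed here because they match the figure in the text.

```csharp
using System;
using System.Globalization;

public static class Spamicity
{
    // Assumed smoothing constants (not stated in the article).
    public const double Strength = 0.0178; // weight given to the prior
    public const double Prior = 0.52;      // score for a never-seen token

    // Fraction of the token's occurrence rate that comes from spam.
    public static double RawRatio(int spamCount, int hamCount,
                                  int spamMsgs, int hamMsgs)
    {
        double spamRate = (double)spamCount / spamMsgs;
        double hamRate = (double)hamCount / hamMsgs;
        return spamRate / (spamRate + hamRate);
    }

    // Raw ratio pulled slightly toward the prior; the pull is weaker
    // the more often the token has been seen.
    public static double Score(int spamCount, int hamCount,
                               int spamMsgs, int hamMsgs)
    {
        int n = spamCount + hamCount;
        if (n == 0) return Prior; // unseen token: fall back to the prior
        double p = RawRatio(spamCount, hamCount, spamMsgs, hamMsgs);
        return (Strength * Prior + n * p) / (Strength + n);
    }

    public static void Main()
    {
        // "david" in 30 of 1000 spam and 2 of 800 ham messages.
        double raw = RawRatio(30, 2, 1000, 800);
        double score = Score(30, 2, 1000, 800);
        Console.WriteLine(raw.ToString("F6", CultureInfo.InvariantCulture));   // 0.923077
        Console.WriteLine(score.ToString("F6", CultureInfo.InvariantCulture)); // 0.922853
    }
}
```

A full filter then combines the scores of many tokens into one score for the message; that step is not shown here.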

As part of parsing a message, a good spam filter needs to understand the format of email messages--headers, body, multi-part MIME, plain text versus HTML, encoding (base64, quoted-printable, uuencoded), etc. To improve its ability to discriminate, bogofilter keeps separate counts for tokens in the headers of a message and in the message body. Thus david in a message body has a different score than david in the message subject.
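One simple way to keep such counts separate is to tag each token with the part of the message it came from before counting it. The sketch below assumes a "section:word" prefix scheme and a hypothetical Tag helper; it is not meant to describe bogofilter's internal format.

```csharp
using System;
using System.Collections.Generic;

public static class TaggedTokens
{
    // Hypothetical helper: prefix each token with its message section,
    // so "subj:david" and "body:david" are counted as different tokens.
    public static List<string> Tag(string section, string text)
    {
        List<string> tagged = new List<string>();
        string[] words = text.ToLowerInvariant().Split(
            new char[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
            tagged.Add(section + ":" + word);
        return tagged;
    }

    public static void Main()
    {
        List<string> tokens = Tag("subj", "david time to refinance");
        tokens.AddRange(Tag("body", "Hi David"));
        // prints: subj:david subj:time subj:to subj:refinance body:hi body:david
        Console.WriteLine(string.Join(" ", tokens));
    }
}
```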

Where/how do Bayesian filters run?

The answer to that question depends on the particular program. For example, bogofilter is written in C, is generally run on a mail server (though it can be run on a workstation), and runs under Linux, FreeBSD, Solaris, OS X, HP-UX, AIX, RISC OS, SunOS, OS/2, etc. Another open source spam filter, SpamBayes, is written in Python. It is more oriented towards the end user and runs on Windows, UNIX/Linux and Mac OS. There is a multitude of other Bayesian spam filters written in a variety of languages for a variety of operating systems, and each offers its own set of features.

David Relson is the owner and lead consultant of Osage Software Systems, Inc. He started programming at about the time Dartmouth Basic was released and has spent many years since using a variety of languages, hardware architectures and operating systems. His current preferences are object-oriented languages--such as Smalltalk and Objective-C--and the open source GNU/Linux operating system. His fluency includes C and Perl, web and embedded programming--and whatever else may be needed. David has collaborated with Stout Systems on several projects.

Stout Systems, P.O. Box 2934, Ann Arbor, MI 48106 · Voice 734-663-0877 · Fax 734-663-7659
Copyright © 1995-2008 Stout Systems Development Inc. All Rights Reserved.