
Zaplets

Zaplets are small rule files that describe what to block and what not to block, on a per-website basis. They are easy to construct using adzapper's web user interface (web UI), or if you prefer editing them directly, you can: they are simple, human-readable XML-based text files.

The way adzapper works is this: adzapper examines each URL before downloading the requested file. If the URL's site matches the site described in the zaplet's 'host' statement, adzapper will apply the blocking rules contained in the zaplet.

If the URL matches an 'allow_url' statement in the zaplet, the object referred to by the URL is downloaded. If the URL matches a 'block_url' statement in the zaplet, adzapper sends a single-pixel transparent GIF instead of whatever file the URL really refers to.

Each zaplet must have at least one 'allow_url' or 'block_url' statement.

Because a blocked URL is always answered with a GIF, adzapper currently treats every blocked URL as a picture. Since most blocked URLs are ads, which are GIF or JPEG files, this works pretty well.

You can also build content filters for HTML that let you alter the HTML before it is displayed in your browser. This functionality is not accessible through the web UI yet, but you can directly edit the zaplet files to use it.

Zaplets usually live in the zaplets/ directory in the directory where you installed adzapper, or in your ~/.adzapper/zaplets directory (under Unix), unless you change this directory in the adzapper.conf file or with the command-line options.

The Zaplet Wizard

It's easy to make zaplets with adzapper's web UI. You can also make zaplets with a text editor, if you want; this is detailed in a later section.

The easiest way of all is to use the Zaplet Wizard, accessible from the web UI. I have this link bookmarked in my web browser, so I can easily get to it when I want to block a new ad. To use the Zaplet Wizard, load the image of the ad in your browser (usually you can do this by right-clicking on the image and selecting 'View Image'). Then hit the Reload button on your browser; you need to do this so that adzapper knows the image URL.

Now click on the bookmark for the Zaplet Wizard, and follow the directions. The Zaplet Wizard will try to guess the best format for the zaplet, and lets you change its guesses if you want.

When you are finished, hit the Back button on your browser until you get back to the image. Then hit Reload again. If the zaplet is working, the image should disappear!

You can use the Zaplet Wizard to add new rules to existing zaplets; if the zaplet already exists, the Zaplet Wizard will just add the new rules.

 

Making New Zaplets

If you don't want to use the Zaplet Wizard (for example, because the URL is too complex), you can create new zaplets from scratch using the web UI.

First you have to know the format of an ad URL. This will probably require you to look at the HTML for the web page you are viewing (View Source), or else to look at the URL of the ad picture by viewing the image. (Under Netscape/Linux, right-click the image and select View Image. Then look at the URL in the Location: box.)

Zaplets describe how to block URLs. The thing to know about ads is that their URLs usually change, but according to a pattern, since ad banners are mostly GIF files placed in a particular directory on the ad company's web server.

The goal is to find the most general expression that blocks the ad by matching its URL, but doesn't block (match) anything that isn't an ad. It may sound complicated, but once you've looked at a few ad URLs, it's pretty straightforward.

For instance, an ad URL might look like this:

http://adforce.imgis.com/?adserv|135|52407|1|1|MISC=276177797;

Everything from the server adforce.imgis.com is an ad, so we want to make a zaplet that blocks everything. That's the simplest zaplet to make.

From the adzapper GUI homepage, go to "Make a new zaplet". From this page, just enter the DNS address of the adserver in the 'host' box; in this case, it is 'adforce.imgis.com' (without the quotes). Now click "Save".

When you click "Save", the zaplet takes effect immediately, and is also saved to disk into the site_zaplet_dir that you specified in adzapper.conf.
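For reference, here is a minimal sketch of what such a block-everything zaplet could look like on disk. It builds the XML with Python's xml.etree.ElementTree, using only the tags described in the file format section. The helper name make_block_all_zaplet is hypothetical, and the exact text the web UI writes is an assumption.

```python
import xml.etree.ElementTree as ET

def make_block_all_zaplet(host, version="0.9"):
    """Build a zaplet that blocks every URL from `host` (hypothetical helper)."""
    zaplet = ET.Element("zaplet")
    ET.SubElement(zaplet, "version").text = version
    ET.SubElement(zaplet, "host").text = host
    # type="everything" means: block all URLs on this host
    ET.SubElement(zaplet, "block_url", {"type": "everything"})
    return zaplet

zaplet = make_block_all_zaplet("adforce.imgis.com")
xml_text = ET.tostring(zaplet, encoding="unicode")
print(xml_text)
```

The printed XML has the same shape as the raw zaplet shown in the file format section, just without the indentation.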

 

Allows, Blocks, Literals, and Regexes

Often, you only want to block certain URLs from a site, not everything. To do this, you need to use the allow and block statements. The simplest forms are Allow Literal and Block Literal.

For instance, if a site serves their own ads, and keeps all their ads in a directory called bannerads, the ad URLs might look like this:

http://foo.bar.com/bannerads/someadd345.gif

In this case, you don't want to block everything from this site; but you would want to block anything that came from the 'bannerads' directory. To do this, simply put 'bannerads/' (without the quotes) in the Block Literal box, and click "Save".

Sometimes, sites keep their ad images in the same directory as the rest of the images for their site. You don't want to block all the images from this directory, only the ads, and this can get a bit more complicated. If most of the images are ads, and there are only a few buttons or logos that you want to allow, you can use a combination of Allow Literals and Block Literals to get what you want. The Allows are always checked before the Blocks, so URLs that you Allow will always be let through.
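The order of evaluation can be sketched in a few lines of Python. This is only an illustration, not adzapper's actual code: the function name decide, the sample file names, and the choice to treat a literal as a plain substring of the URL are all assumptions.

```python
def decide(url, allow_literals, block_literals):
    """Return 'allow' or 'block' for a URL on the zaplet's host.

    Allows are checked before blocks. Treating a literal as a substring
    of the URL is an assumption of this sketch.
    """
    if any(lit in url for lit in allow_literals):
        return "allow"
    if any(lit in url for lit in block_literals):
        return "block"
    return "allow"  # nothing matched, so the URL is downloaded normally

# Hypothetical site: a few navigation images are allowed, the rest of images/ is blocked.
allow = ["images/logo.gif", "images/home_button.gif"]
block = ["images/"]

print(decide("http://foo.bar.com/images/logo.gif", allow, block))
print(decide("http://foo.bar.com/images/promo_468x60.gif", allow, block))
print(decide("http://foo.bar.com/index.html", allow, block))
```

The logo is let through even though it also matches the Block Literal, because the Allow list is consulted first.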

If the ads are all in the same directory with the images, but follow a pattern, you can try using a regular expression to match the URL. (Zaplets use Perl-style regular expressions, as detailed by the Python re module; for more information, see Python's regular expression HOWTO or Python's regular expression documentation.)

For instance, suppose all the images for the www.fubar.com site are in the 'images' directory, and all ads have the suffix '_ad.gif', like this sample ad URL:

http://www.fubar.com/images/BigCo_ad.gif

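In this case, the following Block Regex would do the trick: 'images/.*_ad\.gif' (without the quotes). The backslash makes the final dot match only a literal period; an unescaped '.' would match any character.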
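A pattern like this can be tried out with Python's re module before you save it. The sketch below assumes that adzapper searches for the pattern anywhere in the URL (re.search); the extra URLs are made up for the test. It also compares an unescaped final dot with an escaped one.

```python
import re

unescaped = re.compile(r"images/.*_ad.gif")   # '.' matches any character
escaped = re.compile(r"images/.*_ad\.gif")    # '\.' matches only a literal dot

ad_url = "http://www.fubar.com/images/BigCo_ad.gif"
logo_url = "http://www.fubar.com/images/logo.gif"          # hypothetical non-ad image
odd_url = "http://www.fubar.com/images/head_adagif.html"   # hypothetical near miss

for url in (ad_url, logo_url, odd_url):
    print(url, bool(unescaped.search(url)), bool(escaped.search(url)))
```

Both patterns block the ad and leave the logo alone; only the unescaped one also matches the near miss.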

 

Zaplet file format

Zaplets use XML as their structure. The zaplet file format is described by an XML Document Type Definition (DTD).

You can create and edit zaplets using the web UI, or you can use a text editor, if you understand XML, the Zaplet DTD, and the meaning of the tags and fields.

Here's what a zaplet looks like in 'raw' form:


<zaplet>
    <version>0.9</version>
    <host>yimg.com</host>
    <allow_url type="literal">/yahoo.gif</allow_url>
    <allow_url type="regex">/main.*\.gif</allow_url>
    <allow_url type="regex">store.\.yimg.com</allow_url>
    <block_url type="everything"/>
</zaplet>

The 'version' tag sets the version of the zaplet file format that this zaplet uses. This must be a single-place decimal number, like '0.9'. In the future, the version tag will help adzapper remain compatible with past versions of the zaplet file format.

The 'host' tag sets the host that this zaplet is for. This can be a host in conventional Internet notation, like 'foo.bar.com', or an IP number in dotted notation, like '234.56.78.9'. If there is more than one zaplet that has the same host specified, the last one that is read in wins. Host matches go from most specific ('www.foo.bar.com' or '234.56.78.9') to least specific ('bar.com' also matches 'www.foo.bar.com', and '234' also matches '234.56.78.9'). The zaplet with the most specific host match wins.

There can only be one host per 'host' tag, and one host tag per zaplet.

If there is no 'host' tag, the zaplet is not valid: the 'host' statement is required. (Note that this is a change from previous versions of the zaplet file format!)

If there is no 'allow_url' tag or if it is empty, no URLs are specifically allowed.

If there is no 'block_url' tag or if it is empty, no URLs are specifically blocked.

There must be at least one 'block_url' or 'allow_url' per zaplet.
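The requirements above can be checked mechanically before you drop a hand-edited file into the zaplets directory. The following is a rough sketch, not adzapper's own validation: zaplet_problems is a hypothetical helper, and checking the version only when the tag is present is an assumption.

```python
import re
import xml.etree.ElementTree as ET

def zaplet_problems(xml_text):
    """Return a list of rule violations found in a zaplet (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    # 'version' must be a single-place decimal number such as 0.9
    version = root.findtext("version")
    if version is not None and not re.fullmatch(r"\d+\.\d", version.strip()):
        problems.append("version must be a single-place decimal such as 0.9")
    # exactly one 'host' tag, holding one host
    hosts = root.findall("host")
    if len(hosts) != 1 or not (hosts[0].text or "").strip():
        problems.append("exactly one non-empty host tag is required")
    # at least one allow_url or block_url
    if not root.findall("allow_url") and not root.findall("block_url"):
        problems.append("at least one allow_url or block_url is required")
    return problems

good = '<zaplet><version>0.9</version><host>yimg.com</host><block_url type="everything"/></zaplet>'
bad = "<zaplet><version>1.25</version></zaplet>"
print(zaplet_problems(good))
print(zaplet_problems(bad))
```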

If it exists, the zaplet 'default-numeric' is checked if the host is numeric and there is no numeric host matched.

If it exists, the zaplet 'default' is checked if there is no host matched.
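Putting the host rules together, zaplet selection might work roughly like the sketch below. Several details are assumptions made for illustration: that host names match on their trailing labels and IP numbers on their leading octets, that specificity is measured by the number of dot-separated parts, that the fallback zaplets are identified by the host values 'default' and 'default-numeric', and that a numeric host falls back to 'default' when no 'default-numeric' exists. The function names are hypothetical.

```python
import re

def is_numeric_host(host):
    """True for hosts made only of digits and dots, like '234.56.78.9' or '234'."""
    return re.fullmatch(r"\d+(\.\d+)*", host) is not None

def host_matches(zaplet_host, request_host):
    """Names match on trailing labels, IP numbers on leading octets (assumed)."""
    numeric = is_numeric_host(request_host)
    if is_numeric_host(zaplet_host) != numeric:
        return False
    z, r = zaplet_host.split("."), request_host.split(".")
    if len(z) > len(r):
        return False
    return r[:len(z)] == z if numeric else r[-len(z):] == z

def pick_zaplet(request_host, zaplet_hosts):
    """Return the host of the zaplet that should handle request_host, or None."""
    candidates = [h for h in zaplet_hosts
                  if h not in ("default", "default-numeric")
                  and host_matches(h, request_host)]
    if candidates:
        # the most specific match (most dot-separated parts) wins
        return max(candidates, key=lambda h: len(h.split(".")))
    if is_numeric_host(request_host) and "default-numeric" in zaplet_hosts:
        return "default-numeric"
    if "default" in zaplet_hosts:
        return "default"
    return None

hosts = ["bar.com", "foo.bar.com", "234", "default", "default-numeric"]
for request in ("www.foo.bar.com", "www.bar.com", "234.56.78.9", "10.0.0.1", "example.org"):
    print(request, "->", pick_zaplet(request, hosts))
```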

The 'allow_url' tag says that certain URLs are to be allowed through, despite whatever block tags follow. This allows for certain navigational images to be displayed even when a site puts all its images on a single server or in a single directory, for example.

Both literal strings and Perl-style regular expressions can be used in block or allow tags. To specify a literal string, use a type='literal' attribute; to specify a regular expression, use type='regex'. The entire URL of the object is used as the target for the regular expression or literal match.
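To make the three rule types concrete, here is a small sketch that reads rules from a zaplet with xml.etree.ElementTree and tests each one against a full URL. It is an illustration under assumptions, not adzapper's implementation: rule_matches is a hypothetical helper, literals are treated as substrings, regexes are applied with re.search, and the sample URL is made up.

```python
import re
import xml.etree.ElementTree as ET

ZAPLET = r"""<zaplet>
    <version>0.9</version>
    <host>yimg.com</host>
    <allow_url type="literal">/yahoo.gif</allow_url>
    <allow_url type="regex">/main.*\.gif</allow_url>
    <block_url type="everything"/>
</zaplet>"""

def rule_matches(rule, url):
    """Test one allow_url or block_url element against the entire URL."""
    kind = rule.get("type")
    pattern = rule.text or ""
    if kind == "everything":
        return True
    if kind == "literal":
        return pattern in url
    if kind == "regex":
        return re.search(pattern, url) is not None
    raise ValueError(f"unknown rule type: {kind!r}")

root = ET.fromstring(ZAPLET)
url = "http://us.yimg.com/i/main_nav.gif"  # hypothetical navigation image
for rule in root:
    if rule.tag in ("allow_url", "block_url"):
        print(rule.tag, rule.get("type"), rule_matches(rule, url))
```

Here the second allow rule matches, so the image would be let through even though the block rule also matches.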

 

Blocking Popups

adzapper now has the ability to block popup windows. It does this by filtering HTML, removing JavaScript window.open calls. You can easily access this from the web UI by clicking the 'block all popups from this site' button in the Edit Zaplet screen, or by entering a literal string or regular expression into the appropriate boxes. All pages that match the expression will be filtered.

The tag to do this is called 'block_popups' and follows the same syntax as 'block_url':

<?xml version="1.0"?>
<zaplet>
    <version>0.9</version>
    <host>fool.com</host>
    <allow_url type="everything"/>
    <block_popups type="everything"/>
    <block_popups type="literal">offendingPage</block_popups>
</zaplet>
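The kind of HTML filtering involved can be illustrated with a short Python sketch. This is not adzapper's actual filter: the regular expression, the function name strip_popups, and the sample page are assumptions, and the pattern only handles window.open calls whose arguments contain no nested parentheses.

```python
import re

# Matches window.open(...) when the argument list has no nested parentheses,
# plus an optional trailing semicolon.
WINDOW_OPEN = re.compile(r"window\.open\s*\([^()]*\)\s*;?")

def strip_popups(html):
    """Remove simple JavaScript window.open calls from a page."""
    return WINDOW_OPEN.sub("", html)

page = """<body onload="window.open('http://ads.example.com/pop.html', 'ad');"><p>Hello</p></body>"""
print(strip_popups(page))
```

A call used inside a larger expression (for example, assigned to a variable) would need more careful handling than this sketch provides.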

 

Content Filtering

adzapper now has the ability to filter HTML files. This functionality is not currently accessible from the web UI, so you will have to edit zaplet files directly to use it.

Filters are composed of three parts: the 'filter_match_url' tag, the 'filter_match_text' tag, and the 'filter_replace_text' tag, grouped inside a 'filter' tag. 'filter_match_url' specifies the range of URLs to apply the filter to. It takes a 'type' attribute like the 'allow_url' and 'block_url' tags.

'filter_match_text' specifies the range of text inside the matched document. It takes a 'type' attribute which must be either 'regex' or 'literal'. 'filter_replace_text' is a literal string to replace the matched text with.

Note: you must use CDATA sections if you use reserved XML characters in your content filter!

Here's what a content filter zaplet looks like:

<zaplet>
    <version>0.9</version>
    <host>dir.yahoo.com</host>
    <allow_url type="everything"/>

    <filter>
      <filter_match_url type="everything"/>
      <filter_match_text type="regex"><![CDATA[<center><a href="http://rd.yahoo.com/.*</a></center>]]></filter_match_text>
      <filter_replace_text><![CDATA[<ADVERTISEMENT/>]]></filter_replace_text>
    </filter>
</zaplet>
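To see what such a filter does to a page, the sketch below parses a zaplet like this one with xml.etree.ElementTree (which returns the CDATA contents as ordinary text) and applies each filter with Python's re module. It is a simplified illustration, not adzapper's code: apply_filters is a hypothetical helper, the sample page is made up, and 'filter_match_url' is ignored because it is type="everything" here.

```python
import re
import xml.etree.ElementTree as ET

ZAPLET = r"""<zaplet>
    <version>0.9</version>
    <host>dir.yahoo.com</host>
    <allow_url type="everything"/>
    <filter>
      <filter_match_url type="everything"/>
      <filter_match_text type="regex"><![CDATA[<center><a href="http://rd.yahoo.com/.*</a></center>]]></filter_match_text>
      <filter_replace_text><![CDATA[<ADVERTISEMENT/>]]></filter_replace_text>
    </filter>
</zaplet>"""

def apply_filters(zaplet_xml, html):
    """Apply every filter in the zaplet to the HTML text."""
    root = ET.fromstring(zaplet_xml)
    for flt in root.findall("filter"):
        match = flt.find("filter_match_text")
        replacement = flt.findtext("filter_replace_text", default="")
        if match.get("type") == "regex":
            # A function replacement keeps the replacement text literal.
            html = re.sub(match.text, lambda m: replacement, html)
        else:
            html = html.replace(match.text, replacement)
    return html

page = '<p>Top</p><center><a href="http://rd.yahoo.com/ad/123">Buy now</a></center><p>Links</p>'
print(apply_filters(ZAPLET, page))  # prints <p>Top</p><ADVERTISEMENT/><p>Links</p>
```

Note that '.*' is greedy: if a line contains several matching blocks, everything from the first opening tag to the last closing tag is replaced.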

Compatibility

I don't guarantee that the zaplet file format will remain stable over the 0.x.x releases -- this is alpha software right now! :-)

However, if you contribute zaplets, I will make sure to convert them to whatever new format I use in the future, if there is a file format change.

The idea, however, is for the file format to evolve, but still be backwards compatible with older zaplets.

 

Adding a zaplet

If you use the adzapper GUI to make a zaplet, it takes effect immediately, and is automatically saved in your site zaplet directory.

If you use a text editor to write the zaplet, save it to your site zaplet directory. Then stop adzapper and restart it. (On Unix you can send adzapper a SIGHUP, which accomplishes the same thing.)

 

Philosophy

The idea behind zaplets is to allow easy configurability without changing the adzapper program; and to allow people to share zaplets easily.

When you find yourself often visiting a web site that has ads, you can quickly make up a zaplet using the web UI. Voila! No more ads.

When you're happy with the results you are getting from a zaplet, if you send it to me, I will add it to the repository that is posted to the web and that gets distributed with adzapper.

When submitting zaplets, please just include them in the body of your message. I use a script to filter the zaplets out of the email message and save them in the right place.

Eventually I hope to have an automated or semi-automated way of submitting or updating zaplets, and an easy way to search an archive of zaplets. For now I'll keep an updated copy of the zaplets directory and a tar.gz archive of the directory available at:

http://www.zaplet.org/adzapper/updates.html

Coming soon: automatic checking of remote zaplet repositories and one-click posting of zaplets to the repository.

 



Adam Feuer
adamf at pobox.com (replace the 'at' with '@' to contact me via email)
 
