[Esd-l] Why are urls in html decoded?
John D. Hardin
jhardin at impsec.org
Mon Mar 24 07:11:27 PST 2003
On 24 Mar 2003, Anders Nielsen wrote:
> I am using revision 1.138 of the sanitizer. I have noticed that it
> URL-decodes the links in the HTML part of a message. Is this
> correct? I don't understand why it does that. Aren't URLs in HTML
> supposed to have these encodings?
The encodings are quite often used by spammers to confuse
string-matching antispam filters. The sanitizer decodes printable
characters (alphanumerics and certain punctuation marks) so that
something like "%46%52%45%45", which has no legitimate reason to be
encoded, becomes "FREE" and thus might contribute to the
classification of a message as spam.
This is about the only nod to spam filtering that the sanitizer makes.
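To illustrate the idea, here is a rough Python sketch of that kind of
selective decoding. It is not the sanitizer's actual code: the function
name decode_printable and the DECODABLE character set are assumptions
made up for this example, and the real list of punctuation marks may
well differ.

```python
import re
import string

# ASSUMPTION: illustrative set of characters considered safe to decode.
# The sanitizer's real list of "certain punctuation marks" is not given here.
DECODABLE = set(string.ascii_letters + string.digits + ":/.-_")

# Matches one percent-encoded byte such as %46 or %3a.
_PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_printable(text: str) -> str:
    """Decode %XX escapes only when they stand for a character in DECODABLE.

    Every other escape (space, '&', '%', control characters, ...) is left
    exactly as it was found. The text is scanned once, so an escape that
    decodes to '%' can never be combined with following digits and decoded
    a second time.
    """

    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        return char if char in DECODABLE else match.group(0)

    return _PERCENT_ESCAPE.sub(replace, text)


if __name__ == "__main__":
    print(decode_printable("%46%52%45%45"))
    print(decode_printable("&q=http%3A%2F%2Fwww.jobindex.dk"))
    print(decode_printable("a%20b%26c"))
```

With this assumed character set, the obfuscated word becomes plain
text that a string-matching filter can see, while escapes for
characters such as space or "&" stay encoded.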
> In other words: Why is &q=http%3A%2F%2Fwww.jobindex.dk turned into
> &q=http://www.jobindex.dk?
Did that break the link?
--
John Hardin KA7OHZ ICQ#15735746 http://www.impsec.org/~jhardin/
jhardin at impsec.org pgpk -a jhardin at impsec.org
key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
...voice or no voice, the people can always be brought to the bidding
of the leaders. That is easy. All you have to do is tell them they
are being attacked and denounce the pacifists for lack of patriotism
and exposing the country to danger. It works the same way in any
country.
-- Hermann Goering
-----------------------------------------------------------------------
59 days until The Matrix Reloaded