[Esd-l] Has anyone tried removing HTML "code" via a sanitizer process??
Joaquin Ferrero
atari at pucela.net
Fri Oct 24 04:23:41 PDT 2003
On Thu, 23 Oct 2003 at 19:55, Jim Bucks wrote:
> Hello All,
>
> I was wondering if anyone has tried removing HTML code via a sanitizer
> process. I know the resulting text is going to be extremely ugly - and
> probably unreadable.
>
We use this solution (from Randal Schwartz, www.stonehenge.com/merlyn,
with a small patch of mine). It strips the HTML part of a
multipart/alternative message and keeps the plain-text part.
Insert this in your .procmailrc file:
--8<--
# Remove the stupid duplicate HTML version
:0 Hfw
* ! ^From:.*alerta
* ^Content-type:.*multipart/alternative;
| $HOME/bin/Strip-HTML.pl
--8<--
and put this in $HOME/bin/Strip-HTML.pl (the path the recipe pipes to):
--8<--
#!/usr/bin/perl -w
#
# Filter messages to remove the HTML part.
#
# Randal L. Schwartz. 2000
# Joaquin Ferrero 2002
#
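use strict;
use MIME::Parser;
use MIME::Entity;

$|++;

# procmail passes the mbox "From " envelope line first; set it aside so the
# MIME parser only sees the real headers.
my $envelope = <STDIN>;

my $parser = MIME::Parser->new;
$parser->output_to_core(1);   # keep message bodies in memory
$parser->tmp_to_core(1);      # keep temporary data in memory too
my $ent = $parser->parse(\*STDIN);
#$ent->dump_skeleton(\*STDERR); exit 1;   #DEBUG

# Only touch messages that are exactly text/plain followed by text/html.
if ($ent->effective_type eq "multipart/alternative"
    and $ent->parts == 2
    and $ent->parts(0)->effective_type eq "text/plain"
    and $ent->parts(1)->effective_type eq "text/html") {

    my $plain    = $ent->parts(0);
    my $charset  = $plain->head->mime_attr('content-type.charset');
    my $encoding = $plain->head->get('Content-Transfer-Encoding');
    # head->get may leave a trailing newline on the value; strip whitespace.
    $encoding =~ s/\s+//g if defined $encoding;

    # The parser has already decoded the part, so take the decoded text from
    # the body handle instead of quoted-printable decoding it by hand (which
    # would mangle parts sent as 7bit or base64).
    my $newbody = $plain->bodyhandle->as_string
                . "\n\n[[HTML Version removed]]\n";
    # . "Charset:$charset\nEncoding:$encoding\n";   #DEBUG

    my $newent = MIME::Entity->build(
        Data     => $newbody,
        Charset  => $charset,
        Encoding => defined($encoding) ? $encoding : '-SUGGEST',
    );
    $ent->parts([$newent]);
    $ent->make_singlepart;
    $ent->sync_headers(Length => 'COMPUTE', Nonstandard => 'ERASE');
}

# Always print the message, changed or not: procmail replaces the mail with
# whatever this filter writes, so printing nothing would empty it.
print $envelope;
$ent->print;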
--8<--
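
It is worth trying the filter by hand before trusting it with real mail.
The following is only a minimal sketch for that purpose, not part of the
original script: the file name make-sample.pl, the addresses and the
boundary string are all made up for illustration. It uses no modules and
simply prints a small multipart/alternative message in mbox form (a
"From " envelope line, a text/plain part, then a text/html part), which
is the shape the filter looks for.

```perl
#!/usr/bin/perl
# make-sample.pl (hypothetical name): print a small multipart/alternative
# message in mbox form, for testing the filter by hand.
use strict;
use warnings;

# Build the sample message. Addresses and boundary are invented test values.
sub sample_message {
    my $boundary = "sample-boundary-42";
    my @lines = (
        # mbox envelope line, the one the filter reads before parsing
        "From sender\@example.com Fri Oct 24 04:23:41 2003",
        "From: sender\@example.com",
        "To: you\@example.com",
        "Subject: test",
        "MIME-Version: 1.0",
        "Content-Type: multipart/alternative; boundary=\"${boundary}\"",
        "",
        "--${boundary}",
        "Content-Type: text/plain; charset=iso-8859-1",
        "Content-Transfer-Encoding: quoted-printable",
        "",
        "Meet at the caf=E9 at noon?",
        "--${boundary}",
        "Content-Type: text/html; charset=iso-8859-1",
        "Content-Transfer-Encoding: quoted-printable",
        "",
        "<html><body><b>Meet at the caf=E9 at noon?</b></body></html>",
        "--${boundary}--",
    );
    # One header or body line per element, newline-terminated.
    return join("\n", @lines) . "\n";
}

print sample_message();
```

Run it as: perl make-sample.pl | $HOME/bin/Strip-HTML.pl
If the filter matches, the output should be a single-part text/plain
message whose body ends with the [[HTML Version removed]] marker. Changing
the sample (for example adding a third part) is a quick way to see what
the filter does with mail it is not meant to touch.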
--
Joaquin Ferrero <atari at pucela.net>