You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
207 lines
7.3 KiB
207 lines
7.3 KiB
18 years ago
|
kses 0.2.2 README [kses strips evil scripts!]
|
||
|
=================
|
||
|
|
||
|
|
||
|
* INTRODUCTION *
|
||
|
|
||
|
|
||
|
Welcome to kses - an HTML/XHTML filter written in PHP. It removes all unwanted
|
||
|
HTML elements and attributes, no matter how malformed HTML input you give it.
|
||
|
It also does several checks on attribute values. kses can be used to avoid
|
||
|
Cross-Site Scripting (XSS), Buffer Overflows and Denial of Service attacks,
|
||
|
among other things.
|
||
|
|
||
|
The program is released under the terms of the GNU General Public License. You
|
||
|
should look into what that means, before using kses in your programs. You can
|
||
|
find the full text of the license in the file COPYING.
|
||
|
|
||
|
|
||
|
* FEATURES *
|
||
|
|
||
|
|
||
|
Some of kses' current features are:
|
||
|
|
||
|
* It will only allow the HTML elements and attributes that it was explicitly
|
||
|
told to allow.
|
||
|
|
||
|
* Element and attribute names are case-insensitive (a href vs A HREF).
|
||
|
|
||
|
* It will understand and process whitespace correctly.
|
||
|
|
||
|
* Attribute values can be surrounded with quotes, apostrophes or nothing.
|
||
|
|
||
|
* It will accept valueless attributes with just names and no values (selected).
|
||
|
|
||
|
* It will accept XHTML's closing " /" marks.
|
||
|
|
||
|
* Attribute values that are surrounded with nothing will get quotes to avoid
|
||
|
producing non-W3C conforming HTML
|
||
|
(<a href=http://sourceforge.net/projects/kses> works but isn't valid HTML).
|
||
|
|
||
|
* It handles lots of types of malformed HTML, by interpreting the existing
|
||
|
code the best it can and then rebuilding new code from it. That's a better
|
||
|
approach than trying to process existing code, as you're bound to forget about
|
||
|
some weird special case somewhere. It handles problems like never-ending
|
||
|
quotes and tags gracefully.
|
||
|
|
||
|
* It will remove additional "<" and ">" characters that people may try to
|
||
|
sneak in somewhere.
|
||
|
|
||
|
* It supports checking attribute values for minimum/maximum length and
|
||
|
minimum/maximum value, to protect against Buffer Overflows and Denial of
|
||
|
Service attacks against WWW clients and various servers. You can stop
|
||
|
<iframe src= width= height=> from having too high values for width and height,
|
||
|
for instance.
|
||
|
|
||
|
* It has got a system for whitelisting URL protocols. You can say that
|
||
|
attribute values may only start with http:, https:, ftp: and gopher:, but no
|
||
|
other URL protocols (javascript:, java:, about:, telnet:..). The functions that
|
||
|
do this work handle whitespace, upper/lower case, HTML entities
|
||
|
("javascript:") and repeated entries ("javascript:javascript:alert(57)").
|
||
|
It also normalizes HTML entities as a nice side effect.
|
||
|
|
||
|
* It removes Netscape 4's JavaScript entities ("&{alert(57)};").
|
||
|
|
||
|
* It handles NULL bytes and Opera's chr(173) whitespace characters.
|
||
|
|
||
|
* There is a procedural version and two object-oriented versions (for PHP 4
|
||
|
and PHP 5) of kses.
|
||
|
|
||
|
|
||
|
* USE IT *
|
||
|
|
||
|
|
||
|
It's very easy to use kses in your own PHP web application! Basic usage looks
|
||
|
like this:
|
||
|
|
||
|
|
||
|
<?php
|
||
|
|
||
|
include 'kses.php';
|
||
|
|
||
|
$allowed = array('b' => array(),
|
||
|
'i' => array(),
|
||
|
'a' => array('href' => 1, 'title' => 1),
|
||
|
'p' => array('align' => 1),
|
||
|
'br' => array());
|
||
|
|
||
|
$val = $_POST['val'];
|
||
|
if (get_magic_quotes_gpc())
|
||
|
$val = stripslashes($val);
|
||
|
# You must strip slashes from magic quotes, or kses will get confused.
|
||
|
|
||
|
$val = kses($val, $allowed); # The filtering takes place here.
|
||
|
|
||
|
# Do something with $val.
|
||
|
|
||
|
?>
|
||
|
|
||
|
|
||
|
This definition of $allowed means that only the elements B, I, A, P and BR are
|
||
|
allowed (along with their closing tags /B, /I, /A, /P and /BR). B, I and BR
|
||
|
may not have any attributes. A may only have the attributes HREF and TITLE,
|
||
|
while P may only have the attribute ALIGN. You can list the elements and
|
||
|
attributes in the array in any mixture of upper and lower case. kses will also
|
||
|
recognize HTML code that uses both lower and upper case.
|
||
|
|
||
|
It's important to select the right allowed attributes, so you won't open up
|
||
|
an XSS hole by mistake. Some important attributes that you mustn't allow
|
||
|
include but are not limited to: 1) style, and 2) all intrinsic events
|
||
|
attributes (onMouseOver and so on, on* really). I'll write more about this in
|
||
|
the documentation that will be distributed with future versions of kses.
|
||
|
|
||
|
It's also important to note that kses' HTML input must be cleaned of all
|
||
|
slashes coming from magic quotes. If the rest of your code requires these
|
||
|
slashes to be present, you can always add them again after calling kses with
|
||
|
a simple addslashes() call.
|
||
|
|
||
|
You should take a look at the documentation in the docs/ directory and the
|
||
|
examples in the examples/ directory, to get more information on how to use
|
||
|
kses. The object-oriented versions of kses are also worth checking out, and
|
||
|
they're included in the oop/ directory.
|
||
|
|
||
|
|
||
|
* UPGRADING TO 0.2.2 *
|
||
|
|
||
|
|
||
|
kses 0.2.2 is backwards compatible with all previous releases, so upgrading
|
||
|
should just be a matter of using a new version of kses.php instead of an old
|
||
|
one.
|
||
|
|
||
|
|
||
|
* NEW VERSIONS, MAILING LISTS AND BUG REPORTS *
|
||
|
|
||
|
|
||
|
If you want to download new versions, subscribe to the kses-general mailing
|
||
|
list or even take part in the development of kses, we refer you to its
|
||
|
homepage at http://sourceforge.net/projects/kses . New developers and beta
|
||
|
testers are more than welcome!
|
||
|
|
||
|
If you have any bug reports, suggestions for improvement or simply want to tell
|
||
|
us that you use kses for some project, feel free to post to the kses-general
|
||
|
mailing list. If you have found any security problems (particularly XSS,
|
||
|
naturally) in kses, please contact Ulf privately at metaur at users dot
|
||
|
sourceforge dot net so he can correct it before you or someone else tells the
|
||
|
public about it.
|
||
|
|
||
|
(No, it's not a security problem in kses if some program that uses it allows a
|
||
|
bad attribute, silly. If kses is told to accept the element body with the
|
||
|
attributes style and onLoad, it will accept them, even if that's a really bad
|
||
|
idea, securitywise.)
|
||
|
|
||
|
|
||
|
* OTHER HTML FILTERS *
|
||
|
|
||
|
|
||
|
Here are the other stand-alone, open source HTML filters that we currently know
|
||
|
of:
|
||
|
|
||
|
* Htmlfilter for PHP - the filter from Squirrelmail
|
||
|
PHP
|
||
|
Konstantin Riabitsev
|
||
|
http://linux.duke.edu/projects/mini/htmlfilter/
|
||
|
|
||
|
* HTML::StripScripts and related CPAN modules
|
||
|
Perl
|
||
|
Nick Cleaton
|
||
|
http://search.cpan.org/perldoc?HTML%3A%3AStripScripts
|
||
|
|
||
|
* SafeHtmlChecker [is this really open source?]
|
||
|
PHP
|
||
|
Simon Willison
|
||
|
http://simon.incutio.com/archive/2003/02/23/safeHtmlChecker
|
||
|
|
||
|
There are also a lot of HTML filters that were written specifically for some
|
||
|
program. Some of them are better than others.
|
||
|
|
||
|
Please write to the kses-general mailing list if you know of any other
|
||
|
stand-alone, open-source filters.
|
||
|
|
||
|
|
||
|
* DEDICATION *
|
||
|
|
||
|
|
||
|
kses 0.2.2 is dedicated to Audrey Tautou and Jean-Pierre Jeunet.
|
||
|
|
||
|
|
||
|
* MISC *
|
||
|
|
||
|
|
||
|
The kses code is based on an HTML filter that Ulf wrote on his own back in 2002
|
||
|
for the open-source project Gnuheter ( http://savannah.nongnu.org/projects/
|
||
|
gnuheter ). Gnuheter is a fork from PHP-Nuke. The HTML filter has been
|
||
|
improved a lot since then.
|
||
|
|
||
|
To stop people from having sleepless nights, we feel the urgent need to state
|
||
|
that kses doesn't have anything to do with the KDE project, despite having a
|
||
|
name that starts with a K.
|
||
|
|
||
|
In case someone was wondering, Ulf is available for kses-related consulting.
|
||
|
|
||
|
Finally, the name kses comes from the terms XSS and access. It's also a
|
||
|
recursive acronym (every open-source project should have one!) for "kses
|
||
|
strips evil scripts".
|
||
|
|
||
|
|
||
|
// Ulf and the kses development group, February 2005
|