PHP Stripping Certain Characters.

Aug 19 2011 11:31 AM
Remove everything but alphanumerics, spaces, dashes, and underscores.

Choosing the best method for filtering input is often a big question. Typically it's a matter of what exactly your needs are. Asking these few questions can resolve the problem:

  1. Does the input need more than the letters a-z and the digits 0-9?
  2. Does the input need to allow some tags but not others?
  3. Can you filter everything out and not worry about what it returns?
  4. Can you convert HTML characters to their entities so the text still displays properly?
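For question 4, PHP's built-in htmlspecialchars() converts characters that have special meaning in HTML (&, <, >, and with ENT_QUOTES both kinds of quotes) into entities, so markup is displayed as text instead of being rendered. A minimal sketch, assuming UTF-8 input; the helper name escape_for_html is hypothetical:

```php
<?php
// Hypothetical helper: escape a string for safe display inside HTML.
// ENT_QUOTES converts both double and single quotes; UTF-8 input is assumed.
function escape_for_html(string $input): string
{
    return htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
}

// The tags are shown literally on the page rather than rendered.
echo escape_for_html('Say "hi" to <b>Bob</b> & co.');
// Say &quot;hi&quot; to &lt;b&gt;Bob&lt;/b&gt; &amp; co.
```

Nothing is thrown away here; the original text is preserved and only its presentation in the browser changes.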

Oftentimes it's necessary to filter out basically everything in a string for security reasons. It's usually a good idea to do this to every input except for items such as WYSIWYG editor content. There are many ways to handle this, but one of the ways I like is below.


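<?php

   // Keep only letters, digits, whitespace (\s), underscores and dashes.
   // The dash goes last in the character class so it is a literal "-", not a range.
   $string = "Hi! This is a string that does not allow #$#&@ or any <b>tags </b>";
   $string = preg_replace('/[^a-zA-Z0-9\s_-]/', '', $string);
   echo $string;
?>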

This method works well; however, as stated above, there are times when we want to allow other characters and symbols. A preg_replace whitelist like this is blunt: anything not in the character class is removed, with no exceptions. Another issue is that it only removes the angle brackets and slashes of a tag, so the tag names are left behind as stray "b"s.
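One way to avoid the leftover "b"s is to remove the tags themselves with PHP's strip_tags() before applying the same character whitelist. A minimal sketch; the function name clean_text is hypothetical:

```php
<?php
// Hypothetical helper: drop HTML tags first, then keep only letters,
// digits, whitespace, underscores and dashes.
function clean_text(string $input): string
{
    // strip_tags() removes "<b>" and "</b>" entirely, so no stray "b"s remain.
    $withoutTags = strip_tags($input);

    // Dash placed last in the class so it is matched literally.
    return preg_replace('/[^a-zA-Z0-9\s_-]/', '', $withoutTags);
}

echo clean_text('Hi! This is a string that does not allow #$#&@ or any <b>tags </b>');
```

The order matters: running the whitelist first would destroy the angle brackets that strip_tags() relies on to find the tags.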


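<?php

$string = "Hi! This is a string that does not allow #$#&@ or any <b>tags </b>";

// Strips tags and encodes single and double quotes.
// Note: FILTER_SANITIZE_STRING is deprecated as of PHP 8.1.
$string = filter_var($string, FILTER_SANITIZE_STRING);

echo $string;

?>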

This works well; however, it's not always the best route, since it strips every tag and you may want only certain tags to be stripped.
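When the requirement is the other way around (question 2: keep a few tags and strip the rest), strip_tags() accepts an allowlist as its second argument. A minimal sketch; be aware that strip_tags() does not remove attributes from the tags it keeps, so on its own it is no defense against attributes such as onclick:

```php
<?php
// Keep <b> and <i>; every other tag is removed, but its inner text stays.
$input = '<p>Hello <b>bold</b>, <i>italic</i> and <u>underlined</u></p>';
$kept  = strip_tags($input, '<b><i>');
echo $kept . "\n";

// Attributes on allowed tags survive untouched.
$withAttr = strip_tags('<b onclick="alert(1)">hi</b>', '<b>');
echo $withAttr . "\n";
```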

Another script I recently found on PHP.net:


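<?php

// Strip only the listed tags from $str, leaving all other markup intact.
// $tags may be an array such as array('b', 'i') or a string such as '<b><i>'.
// When $stripContent is true, everything between the opening and closing tag is removed too.
function strip_only_tags($str, $tags, $stripContent = false) {
    $content = '';
    if (!is_array($tags)) {
        // Turn '<b><i>' into array('b', 'i'); a bare 'b' becomes array('b').
        $tags = (strpos($tags, '>') !== false ? explode('>', str_replace('<', '', $tags)) : array($tags));
        if (end($tags) == '') {
            array_pop($tags);
        }
    }
    foreach ($tags as $tag) {
        $tag = preg_quote($tag, '#');
        if ($stripContent) {
            // Non-greedy, so it stops at the nearest closing tag.
            $content = '(.*?</' . $tag . '(>|\s[^>]*>)|)';
        }
        // Matches <tag>, </tag> and <tag with="attributes">, case-insensitively.
        $str = preg_replace('#</?' . $tag . '(>|\s[^>]*>)' . $content . '#is', '', $str);
    }
    return $str;
}

// Example: strip_only_tags('<p>Hello <b>world</b></p>', 'b') returns '<p>Hello world</p>'
?>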

This lets you pass in exactly which tags you want stripped. The only issue is that you have to list each tag you want removed to get exactly what you need.

HTMLPurifier is another tool, but it is heavier than the methods above.