A while ago the Clean Options post got a trackback comment from a Turkish blog. After upgrading to the new WordPress version I validated the page.
The W3C validator complained:
Sorry! This document can not be checked.
…..
Sorry, I am unable to validate this document because on line 292 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
The error was: utf8 "\xFC" does not map to Unicode
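To illustrate what the validator is objecting to: a lone 0xFC byte is not a valid UTF-8 sequence, although in single-byte encodings such as ISO-8859-9 (Turkish) it stands for "ü". The sketch below assumes the mbstring extension is available and that the trackback excerpt was sent in ISO-8859-9, which the error message does not confirm. The sample word is made up for illustration.

```php
<?php
// Assumption: the excerpt arrived as ISO-8859-9; "T\xFCrk" is an illustrative sample only.
$excerpt = "T\xFCrk";

// A lone 0xFC byte is not valid UTF-8, which is what the validator reports.
$is_valid_utf8 = mb_check_encoding($excerpt, 'UTF-8');

// Converting from the assumed source encoding yields a valid UTF-8 string.
$converted = mb_convert_encoding($excerpt, 'UTF-8', 'ISO-8859-9');
$converted_is_valid = mb_check_encoding($converted, 'UTF-8');
```

If the source encoding really were known, converting like this would keep the Turkish characters instead of discarding them.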
Ideally, if I knew Turkish, I would replace the offending characters in the excerpt with similar-looking ones. But I don't, so I wrote a new function and added it to the wp-includes/comment-template.php file:
/**
 * replace_non_unicode_chars($comment_string = '')
 *
 * Hack.
 * Replaces every byte above 127 with a placeholder, which removes the bytes
 * that are not valid UTF-8 and cause validation problems.
 * Bytes 0-127 (ASCII) are "universal" and are kept as is.
 * Bytes 128-255 are replaced, so valid multibyte UTF-8 characters are lost too.
 * Another replacement character (or none) can be used instead of the question mark.
 * TO DO: look into the possibility of using Multibyte String Functions instead.
 *
 * @param string $comment_string Passed from get_comment_author_link() and get_comment_text()
 *
 * Usage in get_comment_author_link():
 * $author = get_comment_author();
 * $author = replace_non_unicode_chars($author); //new line
 *
 * Usage in get_comment_text():
 * global $comment;
 * $comment->comment_content = replace_non_unicode_chars($comment->comment_content); //new line
 *
 * @return string The string with every byte above 127 replaced.
 */
function replace_non_unicode_chars($comment_string = '')
{
    $cslen = strlen($comment_string);
    $valid_str = "";
    $upper_decimal = 127;
    $replacement_character = "?";
    for($ci = 0; $ci < $cslen; $ci++)
    {
        if ( ord(substr($comment_string, $ci, 1)) > $upper_decimal )
        {
            $valid_str .= $replacement_character;
        }
        else
        {
            $valid_str .= substr($comment_string, $ci, 1);
        }
    }
    return $valid_str;
}
Then I filtered the strings that were causing the validation problems by adding lines to the get_comment_author_link and get_comment_text functions in the same file:
function get_comment_author_link() {
    /** @todo Only call these functions when they are needed. Include in if... else blocks */
    $url = get_comment_author_url();
    $author = get_comment_author();
    /* begin non-unicode hack */
    $author = replace_non_unicode_chars($author);
    /* end non-unicode hack */
    if ( empty( $url ) || 'http://' == $url )
        $return = $author;
    else
        $return = "<a href='$url' rel='external nofollow'>$author</a>";
    return apply_filters('get_comment_author_link', $return);
}
and
function get_comment_text() {
    global $comment;
    /* begin non-unicode hack */
    $comment->comment_content = replace_non_unicode_chars($comment->comment_content);
    /* end non-unicode hack */
    return apply_filters('get_comment_text', $comment->comment_content);
}
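Since get_comment_text() already passes its result through apply_filters(), the same filtering could probably be attached as filter callbacks, so that a WordPress upgrade does not overwrite the edits to the core file. This is only a sketch under assumptions: that the code lives in a plugin or the theme's functions.php (neither is part of the setup described here), and that the get_comment_author filter is available in the WordPress version in use. The function body is a compact equivalent of replace_non_unicode_chars(), repeated so the example stands alone.

```php
<?php
// Compact equivalent of replace_non_unicode_chars(), guarded against redeclaration.
if ( ! function_exists('replace_non_unicode_chars') ) {
    function replace_non_unicode_chars($comment_string = '')
    {
        // Without the /u modifier the pattern matches single bytes,
        // so every byte in the 128-255 range becomes a question mark.
        return preg_replace('/[\x80-\xFF]/', '?', $comment_string);
    }
}

// add_filter() only exists inside WordPress; the guard lets the file load elsewhere.
if ( function_exists('add_filter') ) {
    add_filter('get_comment_author', 'replace_non_unicode_chars');
    add_filter('get_comment_text', 'replace_non_unicode_chars');
}
```

Hooking get_comment_author rather than get_comment_author_link cleans only the author name, so the link markup built around it is not touched.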
Not the best solution perhaps, but it's the best I can do for now.
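One way the TO DO about Multibyte String Functions might be approached is to check the string first and only fall back to the lossy replacement when it is not valid UTF-8. The sketch below assumes the mbstring extension is installed; replace_invalid_utf8() is a hypothetical name, not a WordPress function.

```php
<?php
/**
 * Hypothetical variant of replace_non_unicode_chars(): valid UTF-8 is left alone,
 * and only strings that fail the check get the byte-by-byte replacement.
 */
function replace_invalid_utf8($comment_string = '', $replacement_character = '?')
{
    // Valid UTF-8, including multibyte characters, passes through untouched.
    if ( mb_check_encoding($comment_string, 'UTF-8') ) {
        return $comment_string;
    }

    // Otherwise replace every byte above 127, as the original hack does.
    $valid_str = '';
    $cslen = strlen($comment_string);
    for ( $ci = 0; $ci < $cslen; $ci++ ) {
        $byte = $comment_string[$ci];
        $valid_str .= ( ord($byte) > 127 ) ? $replacement_character : $byte;
    }
    return $valid_str;
}

$kept    = replace_invalid_utf8("T\xC3\xBCrk"); // valid UTF-8 "Türk", returned unchanged
$cleaned = replace_invalid_utf8("T\xFCrk");     // stray 0xFC byte, becomes "T?rk"
```

A string that mixes valid multibyte characters with a stray byte still loses all of its non-ASCII content here, so this narrows the damage rather than removing it.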