Special characters show up as a question mark inside of a black diamond
Posted on June 6th, 2009. 43 comments.
Almost every web developer has run into the problem of character sets and character encoding. Joel On Software has the most succinct post on the topic of Unicode.
Here’s the problem. Your web page has certain characters that cannot be displayed properly. Instead of typographer’s quotes (“curly quotes” instead of foot ' and inch " marks), ‘e acute’ (as in the word résumé), the copyright symbol (©), registered symbol (®), etc., usually copied from a program like Microsoft Word, your webpage renders with the dreaded “black diamond question mark” symbol: �
Since the earliest days of the web, we’ve been using HTML Entities to create these characters. HTML Entities are escape sequences to represent special characters in your web page markup. For example, the syntax
&copy;
renders as © in a webpage. I realize I can simply use these escape codes to get special characters to display correctly, but why? What if I have hundreds of pages of content with curly quotes in them and I just want to be able to render a page without using HTML entities?
When I develop websites, I run WAMPServer, which uses PHP 5, MySQL 5, and Apache 2 on Windows XP. I’ve been confused by this topic off and on for over 2 years now. And I’m not the only one.
I’ve tried trouble-shooting the character encoding and serving problem from the top down, starting with the web server software on down the line.
I have edited my Apache httpd.conf file with
AddDefaultCharset UTF-8
I have edited my PHP.ini file with
default_charset = "utf-8"
I have also made sure that MySQL is using UTF-8. This includes both the MySQL Database itself…
…the MySQL connection, the MySQL table, and the MySQL field where my data is stored.
As you can see here, I even have gone into Firefox and set it to accept UTF-8 and receive UTF-8.
Still, I get unrenderable characters. WHY!?

I’m using Firebug to display the HTML Headers, and I’ve verified this is not a bug in Firefox. I’m seeing the dastardly � character whether I use Firefox 2, Firefox 3, Opera, Safari, or Chrome.
I’m sure there’s a character encoding guru out there somewhere who can tell me what I’m missing. I know, I know, I can just switch to iso-8859-1 (Latin-1) anywhere along the chain of encoding, and everything will be fine. And indeed, this is true. But it seems almost unfathomable that I’ve checked every possible setting related to the character set of the content type of the page I am trying to serve, and still get � everywhere.
Still, I thought the whole idea behind the move to UTF-8 was to prevent me from having to worry about all this stuff. I’d love to just happily store pages, create pages and serve pages in UTF-8 so all my characters look like they’re supposed to and I don’t have to escape them at all. Isn’t that the point?
I’m not convinced that I fixed the issue, but I have found a workaround. I decided to turn off the charset handling in both httpd.conf and php.ini, and added…
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
…to my page template. It works, but I still want to know why it works, or more accurately, why declaring everything UTF-8 doesn’t.
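A likely explanation, sketched below in Python purely for illustration: if the bytes of the page (or the database content) were actually saved as ISO-8859-1, then declaring UTF-8 at every layer only changes the label, not the bytes. The sample text and the `render` helper are assumptions made up for this sketch; they mimic a browser that substitutes � (U+FFFD) for anything it cannot decode.

```python
# Hypothetical page content, saved by an editor as ISO-8859-1 (an assumption).
PAGE_BYTES = "résumé ©".encode("iso-8859-1")  # b'r\xe9sum\xe9 \xa9'


def render(raw: bytes, declared_charset: str) -> str:
    """Decode bytes the way a browser would, using U+FFFD for undecodable bytes."""
    return raw.decode(declared_charset, errors="replace")


# Declared as UTF-8: the lone bytes 0xE9 and 0xA9 are not valid UTF-8 sequences.
as_utf8 = render(PAGE_BYTES, "utf-8")

# Declared as ISO-8859-1: the label matches the bytes, so the text survives.
as_latin1 = render(PAGE_BYTES, "iso-8859-1")

print(as_utf8)
print(as_latin1)
```

If that is what is happening, the workaround works because the declared charset finally matches the stored bytes, and the lasting fix would be to re-save or convert the content as UTF-8 rather than to change more settings.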
Update Oct. 2011
I’ve just discovered yet another issue that is not easy to figure out. It turns out that you can get the dreaded question-mark-in-diamond characters even in UTF-8 encoded files, if the file is written with a BOM (byte order mark). We had a PHP application including several files, one of which was encoded with a BOM. Special characters such as ç and õ were showing up fine on one part of the page, and as � on other parts of the page. We removed the BOM from that include file with Notepad++ on Windows and everything was fine again.
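For anyone who wants to check files for a BOM without opening each one in an editor, here is a minimal sketch in Python. The UTF-8 BOM is the three bytes EF BB BF at the very start of a file; the function name `strip_utf8_bom` is hypothetical and only illustrates the idea.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):  # codecs.BOM_UTF8 == b'\xef\xbb\xbf'
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + b"<?php echo 'hi';"
print(strip_utf8_bom(with_bom))
```

The same check (do the first three bytes equal EF BB BF?) can be run over every include file to find the one that was saved with a BOM.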
43 responses to “Special characters show up as a question mark inside of a black diamond”

-
I was just looking for the same thing and found an answer: it’s happening because your text has been written to the database in iso-8859-1 format, so you just need to convert the data from iso-8859-1 to utf8 before outputting it. E.g.
$text = "some iso-8859-1 string from database";
$text = utf8_encode($text);
echo $text;
Hope that helps.
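To make the conversion concrete, here is a rough Python equivalent of an ISO-8859-1 to UTF-8 conversion. The helper name `latin1_to_utf8` and the sample word are assumptions for this sketch. It also shows why the conversion should be applied only once: running it on data that is already UTF-8 produces "Ã©"-style garbage instead of diamonds.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Reinterpret ISO-8859-1 bytes as text and re-encode that text as UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


stored = "café".encode("iso-8859-1")   # b'caf\xe9', one byte for é
converted = latin1_to_utf8(stored)      # b'caf\xc3\xa9', two bytes for é

# Converting a second time treats each UTF-8 byte as its own Latin-1 character.
double_converted = latin1_to_utf8(converted).decode("utf-8")

print(converted.decode("utf-8"))
print(double_converted)
```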
-
Santosh Joshi October 28th, 2009 at 03:08
UTF-8 can only encode characters in the range U+0000 to U+10FFFF.
Anything that does not decode to a character in this range is treated as invalid, and the UTF-8 decoder simply inserts a replacement character for it, which is �.
See:
1) http://en.wikipedia.org/wiki/UTF-8
2) http://en.wikipedia.org/wiki/Replacement_character
The above two links explain this nicely.
-
Make sure you set the client to UTF as well (e.g. in PHP):
mysqli_set_charset($dbc, 'utf8');
-
vikram February 26th, 2010 at 00:30
Hi All,
I followed the discussion, and I have a valid UTF-8 character in the database (Séguin, Daniel). php.ini and httpd.conf do not have any default character set defined.
Current character set in the page is
However, in Chrome/FFox the data is seen as S�guin, and in IE as Sguin, Daniel.
Is there something I am missing, given that I still see the �?
Any workaround, please?
Vikram.
-
phpguru, are you by chance performing any type of character manipulation using a PHP function? For example, strtolower() and ucwords() are not designed for operation on multi-byte characters, and if one uses such functions on multi-byte characters, the result will contain � wherever a multi-byte character would otherwise appear.
Most string manipulation functions have a multi-byte version. In the case of the examples cited above, one should instead use
mb_convert_case($str, MB_CASE_TITLE, 'UTF-8')
and
mb_strtolower($str, 'utf-8')
respectively.
You did not provide code samples from your MySQL queries, and as such, I assume that you have already done the following before executing the MySQL queries that return UTF-8 characters in the result set:
mysql_query("SET NAMES 'UTF8'");
mysql_query("SET CHARACTER SET 'UTF8'");
While I have not had to employ this particular measure myself, others have stated that
mysql_set_charset('utf8');
may also be necessary in some cases.
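The multi-byte point can be illustrated with a small Python simulation. This is only a sketch under an assumption: it models a byte-oriented lowercase function running under a single-byte (ISO-8859-1) locale, which is one way such functions can damage UTF-8 text. The helper name `single_byte_lower` is made up for the example.

```python
def single_byte_lower(utf8_bytes: bytes) -> bytes:
    """Lowercase byte by byte, treating every byte as an ISO-8859-1 character."""
    return utf8_bytes.decode("iso-8859-1").lower().encode("iso-8859-1")


original = "ÉCOLE".encode("utf-8")    # b'\xc3\x89COLE'
mangled = single_byte_lower(original)  # lead byte 0xC3 ('Ã') becomes 0xE3 ('ã')

# The altered lead byte no longer starts a valid sequence, so decoding yields U+FFFD.
shown = mangled.decode("utf-8", errors="replace")

# A character-aware lowercase (what the mb_* functions provide) keeps the text intact.
correct = "ÉCOLE".lower()

print(shown)
print(correct)
```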
-
Logan May 19th, 2010 at 10:43
We’re having the same problem on our website. Almost all of our products have the “degree” character in them (tool website)… the problem is that we are seeing the black diamond EVERYYYWHERE on all 5,000+ pages. All of our UTF-8 stuff is set correctly… the ‘?’ actually appears IN the database, but if I use htmlentities($value) when inserting, Magento doesn’t decode it on the other side…
any thoughts??
thanks
-
Loragnor July 27th, 2010 at 16:53
I see this is an old discussion, but…
This can be caused by something else that has nothing to do with encoding, casts, conversions, etc. I ran across it while developing an error handling class. Part of the process was to create a backtrace. During development, I was printing it to the screen (using ‘pre’ tags) and to a log. The black diamond was there whenever an object’s private properties were being enumerated. This did not happen with public properties. In a browser it would look like this: �ObjectName�propName. Printed to the log, it looked like this: NULObjectNameNULpropName.
Come to find out the diamonds were actually the ASCII NUL character (written as a backslash followed by a zero, 0x00 in hex). I used this to clear it out:
/* Replace ASCII NULL with empty string */
$var = str_replace("\0",'',$var);
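The same idea as a tiny Python sketch, in case it helps to see the before and after. The sample string is modelled on the log output described above, and `strip_nul` is a hypothetical helper name.

```python
def strip_nul(text: str) -> str:
    """Remove NUL (0x00) characters, which browsers may show as a replacement glyph."""
    return text.replace("\0", "")


dumped = "\0ObjectName\0propName"  # assumed shape of the dumped private property name
print(strip_nul(dumped))
```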
Hope this helps someone. Took me several hours and half a pack of smokes to figure it out.
-
Aeronya Arai November 18th, 2010 at 01:37
Is there a solution for when this happens with a simple character such as a ‘space’, the one you get when you press the spacebar on your keyboard? I’m asking because I’ve recently begun to see random spaces being replaced with that question mark inside a diamond symbol in text I entered in my MySpace blog using the built-in text editor. I didn’t use any escape sequences or any special characters, just the regular space from the spacebar. It’s also not happening to every space I insert, only in a few locations where I used 2 spaces next to each other, as proper typing etiquette dictates you should when starting a new sentence. Even then it’s not happening to all of those instances, only a couple in random places in the text I type, but only where 2 spaces have been placed. Does anyone have any idea how to fix this? I have tried editing it and removing the offending symbol, but every time I submit the changes it comes back. Thanks for any help you can provide.
-
Alex Jones December 14th, 2010 at 05:58
The code from phpguru worked for me – no more diamonds!! I basically had a bunch of html code in a bunch of database fields that was “diamond infested” – phpguru’s template code above worked. Thanks again phpguru.
-
Javier Mosquera December 14th, 2010 at 12:12
Hey there. I had the EXACT same problem with those black diamonds.
I read a tutorial (in Spanish, sadly for you), but it all comes down to one simple line you have to add JUST after you have selected your database.
This is the line:
mysql_query("SET NAMES 'utf8'");
Example:
Do not decode nor encode the text. Do not do that, ok?
What else……??? Oh! you also have to set your charset as UTF-8 like this:
And also this…. you’ll have to set all your database stuff to the charset:
utf8_unicode_ci
Having done all that, you should be OK.
The line of code…
… you’ll have to use it on ALL the pages involving the text you want to decode, that means it has to be on the “edit recordset page” (if there is one).
OK… that’s the solution I found. Hope you can use it same way I did. Cheers!
-
You can also have this issue if you have copy and pasted text directly from MS Word or another text editor.
I had the same issue and had to delete and re-enter the suspect characters like “ , ” and ’ manually.
-
I’ve had to deal with UTF-8 from a slightly different perspective, but it may help the situation here.
UTF-8 is not the same as “allow all characters to go through.”
UTF-8 actually encodes most of the Unicode character set into multi-byte characters. The kicker here is that it means many bytes are not valid characters, unless they’re preceded by the proper prefix byte.
If memory serves, the Wikipedia article has a table of the UTF-8 character set and how it’s encoded. Any textual description of UTF-8 is hard to understand … and I’ve been a software engineer for over 30 years!
ISO8859-1, a.k.a. ISO-Latin1, doesn’t perform this extra encoding, allowing any 8-bit character to go through.
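A short Python sketch of that difference, offered as an illustration rather than a definitive test suite. The helper name `is_valid_utf8` is made up here; it simply asks the decoder whether a byte string is well-formed UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# é in UTF-8 is a prefix byte (0xC3) followed by a continuation byte (0xA9).
print(is_valid_utf8(b"\xc3\xa9"))  # both bytes together
print(is_valid_utf8(b"\xa9"))      # continuation byte without its prefix
print(is_valid_utf8(b"\xe9"))      # the single ISO-8859-1 byte for é

# ISO-8859-1 assigns a character to every one of the 256 byte values.
all_bytes_as_latin1 = bytes(range(256)).decode("iso-8859-1")
print(len(all_bytes_as_latin1))
```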
-
Angelo February 15th, 2011 at 09:48
Well, it looks like I’m not the only one having that issue, but mine is now slightly different.
I’m migrating server from FreeBSD to CentOS, so copied the DB, obviously files are exactly the same and magic happens, diamonds everywhere.
The database is set to latin1_swedish_ci, my HTML document is set to iso-8859-1, obviously this can be an issue (I think), but still how can I explain it works fine on the other server? the DB and files are exactly the same, only thing that has changed is the server.
Any ideas?
-
Bill Getas February 23rd, 2011 at 22:50
This is a very good thread and more useful than the Joel On Software page (which is also informative, but not as practical IMHO).
I’ve not yet found my answer — I’m copy/pasting “curly quotes” and other “special typographics” from EditPadPro, SlickEdit, Word and other apps and pasting directly into SlickEdit (which displays the spec chars properly), then saving to html file served by apache (using adddefaultcharset utf-8), but still getting the diamond question box…
It appears a function is necessary since the “curly quotes” exist as single bytes in WINDOWS-1252 but not in ISO-8859-1, and those bytes are not valid on their own in UTF-8, so THANK YOU (AGAIN) MICROSOFT. Let’s be sure to correctly identify the culprit, and, once again, it’s Micro$oft.
http://shiflett.org/blog/2005/oct/convert-smart-quotes-with-php
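As a rough illustration of what such a conversion does, here is a Python sketch that decodes pasted bytes as Windows-1252 and re-encodes them as UTF-8. The sample bytes and the helper name `cp1252_to_utf8` are assumptions for the example; it is not the code from the linked post.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Decode Windows-1252 bytes (smart quotes included) and re-encode as UTF-8."""
    return raw.decode("cp1252").encode("utf-8")


# Hypothetical text pasted from Word: 0x93/0x94 are curly double quotes, 0x92 an apostrophe.
pasted = b"\x93curly quotes\x94 and it\x92s"

fixed = cp1252_to_utf8(pasted).decode("utf-8")
broken = pasted.decode("utf-8", errors="replace")  # what a UTF-8 page shows without conversion

print(fixed)
print(broken)
```

In UTF-8 the curly quotes still exist, but each one takes three bytes instead of the single byte Windows-1252 uses, which is why unconverted pasted bytes show up as diamonds.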
-
Faye Harriet February 25th, 2011 at 06:08
Hi, I found this page because I am having this issue on my computer. Certain forums I go on show this character for me and no one else.
In layman’s terms, what do I need to do?
-
Loragnor, your solution worked miracles,
well done!
-
@Loragnor, thanks, that solution really worked for me.
-
Al McNicoll June 12th, 2011 at 16:11
As this is high on Google results for the subject, thought I’d feed back that adding the $mysqli->set_charset('utf8'); line worked for me. To get to the point where that was the only remaining problem, I had already added the following into my standard all-pages header code:
@setlocale(LC_ALL, 'en_GB');
@define('CHARSET', 'utf-8');
header('Content-type: text/html; charset=utf-8');
And thus ends 20 minutes of frustrated searching! Thanks for initiating the discussion.
Al
-
I bought an osCommerce template and the helpdesk didn’t give me an answer; they only sent me to this forum.
So maybe someone here can tell me how I can change the font, or what I should do to get the webshop to show the characters æ, å, ø.
Is that possible in an easy way?
-
I am having this same issue with a client’s website @ http://5loaves2fishes.net The strange thing is I am using (at the client’s request) a third-party JavaScript. The CMS is Drupal 6 and the script is placed in the body of its own content type, so the content is not actually a node. I suppose the way to correct this would be to create a parser to store the content in the database, where the proper UTF-8 characters would be stored. Does anybody know of a way around this issue? In a simple way? The site has already consumed too much of my time.
-
I too came across the dreaded question mark in a black diamond, and found my cause!…
I couldn’t fathom why to begin with, as I was simply trying to store a description as a PHP variable to then be echo’d by PHP within a page. It turned out I was using a “closing apostrophe” (not supported) rather than a “regular one”.
The cause of this “closing apostrophe” was simple: it had appeared because I was copying and pasting content from MS Word into my PHP files, rather than typing it.
Try going through your text and retyping the apostrophes and other special characters.
-
Pawel Reszka September 18th, 2011 at 12:36
Hey, I am having this problem on my blog right now. Cannot figure this out. Talked to Hostgator but they told me I would need to go through all posts manually to fix this LOL (hundreds of posts and thousands of comments).
Can you please help me with a solution?
My blog url is under my name. All I see is diamonds all over my content at this time.
-
Hi,
Even though this is an old thread, I hope somebody can help me.
I don’t use PHP but ASP.NET and MySQL. I have read the whole thread and made all the changes regarding encodings, but I still see only question marks on my site. I enter Arabic letters.
I have set all my database stuff to the charset:
utf8_unicode_ci (except the MySQL database server itself, because I don’t have access to the web host’s server).
I have tried all the above combinations, but still only question marks are saved in the database table.
Thank you for your help.
-
You’re missing a step:
Your forms must have accept-charset="utf-8" or the browser will convert on POST.
-
I ran into this problem while scraping product information from a supplier website and pasting it to the update form on our windows xp application that uses a mssql database. No obvious problems doing that.
However, I subsequently retrieve this data and display on our website (LAMP) and there the little blighters are.
Fixed this for now by doing the following to the content before I display it.
$bad = array("’","—","é","”"); // I hope these show up in the post!!!!
$good = array("'","-","é","\"");
$m_long = str_replace($bad,$good,$row['Description']);
Where did I get those bad characters? Why, I copied and pasted them into my code from the original supplier website.
Now I know that this is a bandaid. But it serves my purpose for now by editing the most frequent offenders!
-
Thank you for all this help. I had two pages which persisted with question marks and black diamonds, the words of a hymn were being used in a novel in quotes and with apostrophes and whenever I used Mozilla Firefox as browser, the dreaded black diamonds appeared.
As it was only two pages and as I haven’t got a clue about programming, I tried mmck’s useful tip about retyping them in… and YES !!! they disappeared and the right symbols reappeared. As my website is pretty DIY anyway, I am especially grateful as it made it look really terrible. Thank you everyone and especially mmck.
-
If you are working with c# and find these characters popping up in your controls… See this link.
-
alex allotey December 25th, 2011 at 06:20
Javier Mosquera your solution worked like magic
thanks a million and one
-
Javier Mosquera, exactly as alex noted – everything worked like magic. Been struggling with this for hours. Thanks very much!
-
Can someone please help me figure this out? On my computer at work Facebook shows up as all characters only. No letters whatsoever. I think it’s some kind of encoding thing, but Facebook is the only page that does this. I wouldn’t even be worried, but since it’s at work I’m afraid it’s a virus and I don’t want to get in trouble. Yesterday my tool bar completely disappeared, so I was hitting F1-F12 randomly and changing everything I could on the tool bar, thinking it would fix it (I figured out how to make the tool bar reappear), but sometime later I noticed Facebook looked like this: ����0�gs΄��;�. The whole page looks like that, and at the top of the tab it has the whole http://www.facebook.com instead of just Facebook. This website seems a little in depth for my problem, but I’ve been looking for a couple of days for how to fix this, and this is the only one I’ve come across that has a spot to ask questions and isn’t from 2007. If anyone AT ALL could help I would REALLY appreciate it. Thanks!
-
I found it really easy to correct this issue once I learned the character code was 160. For me, I was accessing content from a MySQL database that was saved by WordPress’ TinyMCE editor.
A PHP line of code like the following takes care of the issue. I wanted to correct it within PHP and not via a MySQL query:
$clean_content = str_replace(chr(160), " ", $bad_content);
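Character code 160 is the non-breaking space. A Python sketch of the same replacement follows, with one caveat worth stating as an assumption: replacing the single byte 0xA0 is only safe when the content is ISO-8859-1 or Windows-1252. In real UTF-8 the non-breaking space is the two bytes C2 A0, so it should be replaced at the character level instead. Both helper names are hypothetical.

```python
def replace_latin1_nbsp(raw: bytes) -> bytes:
    """Byte-level replacement, appropriate for ISO-8859-1 or Windows-1252 content."""
    return raw.replace(b"\xa0", b" ")


def replace_nbsp_in_text(text: str) -> str:
    """Character-level replacement, appropriate once the content is decoded text."""
    return text.replace("\u00a0", " ")


latin1_content = b"two\xa0 spaces"  # assumed editor output: NBSP followed by a space

print(latin1_content.decode("utf-8", errors="replace"))  # how a UTF-8 page shows it
print(replace_latin1_nbsp(latin1_content))
print(replace_nbsp_in_text("two\u00a0 spaces"))
```

This would also explain diamonds appearing only where two spaces were typed in a row, since some rich text editors store one of the pair as a non-breaking space.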
-






Ian October 15th, 2009 at 15:53