Thread: Weird characters; using UTF-8

  1. #1

    Weird characters; using UTF-8

    Hi everyone,

    I'm building a website and although I'm using UTF-8, I'm getting strange characters on some words.

    Does anyone have any idea what could be causing this?

    Thanks in advance.

    Kind regards.

  2. #2
    jwilliam
    If you send UTF-8 headers then the text you serve up must be encoded as UTF-8. In other words, save your text documents in UTF-8 (instead of Latin-1... or whatever). If the text is coming from a database, make sure it is encoded as UTF-8.

  3. #3
    This is almost always due to text being decoded with the wrong character encoding, most commonly UTF-8-encoded text being interpreted as ISO-8859-1 or vice versa. If you're not familiar with character encodings or Unicode, read this article first.
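
    To make the mismatch concrete, here is a minimal Python sketch of both directions. The sample word "café" is my own illustration, not something from the original question.

```python
# Illustrative only: the sample word is an assumption, not taken from the thread.
text = "café"

utf8_bytes = text.encode("utf-8")         # 'é' becomes two bytes: C3 A9
latin1_bytes = text.encode("iso-8859-1")  # 'é' becomes one byte: E9

# UTF-8 bytes read as ISO-8859-1: each of the two bytes turns into its own character.
as_latin1 = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 bytes read as UTF-8: E9 starts a multi-byte sequence that never
# completes, so the decoder substitutes U+FFFD (the replacement character).
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(as_latin1)  # cafÃ©
print(as_utf8)    # caf followed by the U+FFFD replacement character
```

    If you see pairs like "Ã©" on your page, UTF-8 text is probably being read as ISO-8859-1; if you see the replacement character (often drawn as a question mark in a diamond), it is probably the other way round.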

    For websites in particular, you must always specify the character encoding of your HTML output. There are two ways to do this: through the HTTP Content-type header and through an HTML <meta> tag, but the HTTP header is pretty much the only one that really matters. In particular, you should send the following header:

    Code:
    Content-type: text/html; charset=UTF-8
    Or, if you're an XHTML purist:

    Code:
    Content-type: application/xhtml+xml; charset=UTF-8
    The point is to always append the "; charset=xxx" part for text-based content types. In Firefox, you can check whether you are serving the correct headers by right-clicking anywhere on the page and choosing View Page Info.
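
    If you would rather check a header value programmatically, something along these lines should work. It is only a sketch: declared_charset is a hypothetical helper name, and it relies on Python's standard email.message module to parse the header value.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value (lower-cased), or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


print(declared_charset("text/html; charset=UTF-8"))  # utf-8
print(declared_charset("text/html"))                 # None
```

    A result of None means the header leaves the encoding unstated, so the browser has to guess.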

    Next, as jwilliam pointed out, you must make sure that everything you output is in fact UTF-8. This is usually not a problem for static text as UTF-8 is backwards-compatible with old and trusty ASCII, but if you're displaying database data, it's easy to run into character encoding issues. In particular, PHP's connection to MySQL uses the ISO-8859-1 encoding by default. This means that any text incoming from MySQL will be encoded as ISO-8859-1, and thus that any text from MySQL you're displaying on your page will be ISO-8859-1-encoded but UTF-8-decoded by your browser.
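
    The same effect can be simulated without a database. In this Python sketch the template and the name "José" are made-up stand-ins for a UTF-8 page and a row that arrives over an ISO-8859-1 connection; re-encoding the value by hand here only stands in for fixing the connection encoding.

```python
# Hypothetical values standing in for a page template and a database row.
template = "<p>Name: {}</p>"
db_value = "José"

# Bytes as they would arrive over a connection that uses ISO-8859-1.
from_db = db_value.encode("iso-8859-1")

# The page is served as UTF-8, but the database bytes are pasted in unchanged.
page = template.encode("utf-8").replace(b"{}", from_db)

# The browser decodes the whole page as UTF-8: the ASCII markup survives,
# while the lone E9 byte is invalid and shows up as U+FFFD.
shown = page.decode("utf-8", errors="replace")

# If the database text is UTF-8 as well, the page decodes cleanly.
fixed_page = template.encode("utf-8").replace(
    b"{}", from_db.decode("iso-8859-1").encode("utf-8")
)
fixed = fixed_page.decode("utf-8")

print(shown)
print(fixed)  # <p>Name: José</p>
```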

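    You can change the default connection character encoding in your MySQL configuration file (my.ini on Windows, my.cnf elsewhere), or alternatively you can send a SET NAMES 'utf8' query after establishing your connection. Note that MySQL spells the character set name without a hyphen; 'utf-8' is rejected as an unknown character set. On MySQL 5.5.3 or later, 'utf8mb4' additionally covers characters outside the Basic Multilingual Plane. Either way, this will cause any incoming text from MySQL to be UTF-8-encoded.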

    Additionally, you must make sure to specify UTF-8 as the character set when using htmlspecialchars or htmlentities (which you should be using at all times when outputting dynamic data to the page), as ISO-8859-1 is assumed by default in PHP versions before 5.4 (later versions default to UTF-8).
