MySQL Character Set

Summary: in this tutorial, you will learn about the MySQL character set. After the tutorial, you will know how to get all character sets in MySQL, how to convert strings between character sets, and how to configure proper character sets for client connections.

Introduction to MySQL character set

A MySQL character set is a set of characters that are legal in a string. For example, suppose we have an alphabet with letters from a to z, and we assign each letter a number: a = 1, b = 2, and so on. The letter a is a symbol, and the number 1 associated with it is its encoding. The combination of all letters from a to z and their corresponding encodings is a character set.

Each character set has one or more collations, which define the rules for comparing characters within the character set. Check out the MySQL collation tutorial to learn about collations in MySQL.
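To make the link between a character set and its collations concrete, here is a minimal sketch using latin1, chosen only for illustration. It lists the collations of that character set and then compares the same two strings under two of them.

```sql
-- List the collations that belong to the latin1 character set
SHOW COLLATION WHERE Charset = 'latin1';

-- Compare the same two strings under two collations of one character set
SELECT
    _latin1'a' = _latin1'A' COLLATE latin1_swedish_ci AS case_insensitive,
    _latin1'a' = _latin1'A' COLLATE latin1_bin AS binary_compare;
```

The first comparison should return 1 because latin1_swedish_ci ignores case, and the second should return 0 because latin1_bin compares the raw byte values.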

MySQL supports various character sets that allow you to store almost any character in a string. To get all available character sets in the MySQL database server, you use the SHOW CHARACTER SET statement as follows:

SHOW CHARACTER SET;
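If the full list is too long to scan, the output can be narrowed down. The sketch below uses the pattern 'utf8%' purely as an illustration. It relies on the LIKE clause of SHOW CHARACTER SET and on the information_schema.CHARACTER_SETS table.

```sql
-- List only the character sets whose name starts with utf8
SHOW CHARACTER SET LIKE 'utf8%';

-- List every multi-byte character set, widest first
SELECT CHARACTER_SET_NAME, DEFAULT_COLLATE_NAME, MAXLEN
FROM information_schema.CHARACTER_SETS
WHERE MAXLEN > 1
ORDER BY MAXLEN DESC, CHARACTER_SET_NAME;
```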

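The default character set is latin1 in MySQL 5.7 and earlier, and utf8mb4 as of MySQL 8.0. If you want to store characters from multiple languages in a single column, you can use a Unicode character set such as utf8 or ucs2. Note that utf8 in MySQL is an alias for utf8mb3, which stores at most three bytes per character; utf8mb4 covers the full Unicode range, including emoji.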
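A character set can be chosen per table and per column. The following is a minimal sketch, assuming a hypothetical table named greetings with made-up column names. It uses utf8mb4, the four-byte Unicode character set in MySQL, as the table default and overrides it for one column.

```sql
-- Hypothetical table: the table default is utf8mb4, one column overrides it
CREATE TABLE greetings (
    id INT AUTO_INCREMENT PRIMARY KEY,
    message VARCHAR(255),
    legacy_code VARCHAR(10) CHARACTER SET latin1
) DEFAULT CHARACTER SET utf8mb4;

-- Inspect the character set assigned to each column
SELECT COLUMN_NAME, CHARACTER_SET_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'greetings';
```

The message column inherits the table default, while legacy_code keeps latin1. The id column shows NULL because numeric columns have no character set.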

The values in the Maxlen column specify the maximum number of bytes that a character in the character set can occupy. Some character sets contain only single-byte characters, e.g., latin1, latin2, and cp850, whereas others contain multi-byte characters.

MySQL provides the LENGTH function to get the length of a string in bytes and the CHAR_LENGTH function to get the length of a string in characters. If a string contains multi-byte characters, the result of the LENGTH function is greater than the result of the CHAR_LENGTH function. See the following example:

SET @str = CONVERT('MySQL Character Set' USING ucs2);
SELECT LENGTH(@str), CHAR_LENGTH(@str);

The CONVERT function converts a string into a specific character set. In this example, it converts the MySQL Character Set string into ucs2. Because every character in the ucs2 character set occupies 2 bytes, the length of the @str string in bytes (38) is twice its length in characters (19).

Notice that some character sets contain multi-byte characters, but a particular string may consist only of single-byte characters. This is the case for utf8, as shown in the following statements, where both functions return 19:

SET @str = CONVERT('MySQL Character Set' USING utf8);
SELECT LENGTH(@str), CHAR_LENGTH(@str);

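However, if a utf8 string contains a non-ASCII character, e.g., ü in the pingüino string, its length in bytes differs from its length in characters. In the following example, LENGTH returns 9 and CHAR_LENGTH returns 8 because ü occupies two bytes: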

SET @str = CONVERT('pingüino' USING utf8);
SELECT LENGTH(@str), CHAR_LENGTH(@str);
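To see where the extra byte comes from, it can help to look at the stored bytes with the HEX function. The sketch below writes ü as the latin1 byte 0xFC using a character set introducer, so the result does not depend on how the client encodes the literal. The column aliases are illustrative only.

```sql
-- The same character, ü, encoded in three character sets
SELECT
    HEX(_latin1 X'FC') AS latin1_bytes,                      -- FC
    HEX(CONVERT(_latin1 X'FC' USING utf8)) AS utf8_bytes,    -- C3BC
    HEX(CONVERT(_latin1 X'FC' USING ucs2)) AS ucs2_bytes;    -- 00FC
```

One character takes one byte in latin1 and two bytes in both utf8 and ucs2, which is why LENGTH and CHAR_LENGTH disagree for pingüino in utf8.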

Converting between different character sets

MySQL provides two functions that allow you to convert strings between different character sets: CONVERT and CAST. We have used the CONVERT function several times in the above examples.

The syntax of the CONVERT function is as follows:

CONVERT(expression USING character_set_name)

The CAST function is similar to the CONVERT function. It converts a string to a different character set:

CAST(string AS character_type CHARACTER SET character_set_name)

Take a look at the following example of using the CAST function:

SELECT CAST(_latin1'MySQL character set' AS CHAR CHARACTER SET utf8);
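One way to confirm that a conversion took effect is to wrap the expression in the CHARSET function, which returns the character set of its argument. This is a minimal sketch; the column aliases are illustrative only.

```sql
-- Check the character set before and after each conversion
SELECT
    CHARSET(_latin1'MySQL character set') AS before_cast,
    CHARSET(CAST(_latin1'MySQL character set' AS CHAR CHARACTER SET utf8)) AS after_cast,
    CHARSET(CONVERT(_latin1'MySQL character set' USING ucs2)) AS after_convert;
```

The first column reports latin1 and the third reports ucs2. The second reports utf8 on older servers, while recent MySQL 8.0 releases display the same character set under the name utf8mb3.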

Setting character sets for client connections

When an application exchanges data with a MySQL database server, the connection uses a default character set, typically latin1 in MySQL 5.7 and earlier and utf8mb4 as of MySQL 8.0. If the database stores Unicode strings in the utf8 character set, a latin1 connection cannot represent every stored character, so characters outside latin1 may be lost or garbled. Therefore, the application needs to specify a proper character set when it connects to the MySQL database server.

To configure a character set for a client connection, you can use one of the following methods:

  • Issue the SET NAMES statement after the client is connected to the MySQL database server. For example, to set the Unicode character set utf8, you use the following statement:
SET NAMES 'utf8';
  • If the application supports the --default-character-set option, you can use it to set the character set. For example, the mysql client tool supports --default-character-set, and you can set it in the configuration file as follows:
[mysql]
default-character-set=utf8
  • Some MySQL connectors allow you to set the character set. For example, if you use PHP PDO, you can set the character set in the data source name as follows:
$dsn = "mysql:host=$host;dbname=$db;charset=utf8";

Whichever method you use, make sure that the character set used by the application matches the character set of the data stored in the MySQL database server.
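To check what a session is actually using, you can inspect the character set system variables. The sketch below shows the session settings, applies SET NAMES, and then reads back the three variables that SET NAMES changes.

```sql
-- Character sets currently in effect for this session
SHOW VARIABLES LIKE 'character_set_%';

-- Switch the connection to utf8
SET NAMES 'utf8';

-- SET NAMES updates these three session variables together
SELECT @@character_set_client,
       @@character_set_connection,
       @@character_set_results;
```

After SET NAMES, all three variables should report the same character set. Recent MySQL 8.0 releases display utf8 as utf8mb3.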

In this tutorial, you have learned about MySQL character sets, how to convert strings between character sets, and how to configure proper character sets for client connections.
