Wednesday, May 2, 2012

MySQL UTF8 varchar column size

The MySQL documentation says that since 5.0, varchar lengths are measured in characters, not bytes. However, I recently ran into "Data truncated" warnings when inserting values that should have fit into their varchar column.



I reproduced the issue with a simple table in MySQL 5.1:



mysql> show create table test\G
*************************** 1. row ***************************
       Table: test
Create Table: CREATE TABLE `test` (
  `string` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)


I then inserted several 10-character values, each containing a different number of multi-byte UTF-8 characters:



mysql> insert into test (string) values 
-> ('abcdefghij'),
-> ('ãáéíçãáéíç'),
-> ('ãáéíç67890'),
-> ('éíç4567890'),
-> ('íç34567890');
Query OK, 5 rows affected, 4 warnings (0.06 sec)
Records: 5 Duplicates: 0 Warnings: 4

mysql> show warnings;
+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1265 | Data truncated for column 'string' at row 2 |
| Warning | 1265 | Data truncated for column 'string' at row 3 |
| Warning | 1265 | Data truncated for column 'string' at row 4 |
| Warning | 1265 | Data truncated for column 'string' at row 5 |
+---------+------+---------------------------------------------+

mysql> select * from test;
+------------+
| string     |
+------------+
| abcdefghij |
| ãáéíç      |
| ãáéíç      |
| éíç4567    |
| íç345678   |
+------------+
5 rows in set (0.00 sec)


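I think this shows that the varchar size is still being applied in bytes, or at least not in character units: every accented character here takes 2 bytes in UTF-8, and each stored value was cut off at exactly 10 bytes rather than 10 characters.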
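To double-check the byte arithmetic outside of MySQL, here is a minimal Python sketch. It only models the observation: it cuts each inserted value to 10 UTF-8 bytes and compares character counts with byte counts. The helper name truncate_utf8_bytes is my own, and the sketch assumes the accented letters are precomposed (NFC) characters, which is why it normalizes them first. It says nothing about why the server behaves this way.

```python
import unicodedata

# The five values inserted in the test above, normalized to precomposed (NFC)
# form so that each accented letter is a single 2-byte UTF-8 character.
inserted = [
    unicodedata.normalize("NFC", value)
    for value in ["abcdefghij", "ãáéíçãáéíç", "ãáéíç67890", "éíç4567890", "íç34567890"]
]


def truncate_utf8_bytes(text, limit):
    """Cut text to at most `limit` UTF-8 bytes, dropping any partial character."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


for value in inserted:
    stored = truncate_utf8_bytes(value, 10)
    # Columns: character count, UTF-8 byte count, value after a 10-byte cut.
    print(len(value), len(value.encode("utf-8")), stored)
```

Every input is 10 characters long, but only the pure ASCII one is also 10 bytes long, and the 10-byte cut gives the same five strings that the select returned.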



The question is: am I understanding the documentation correctly, and is this a bug? Or am I misinterpreting the documentation?




