UTF-8 in JBoss/Tomcat + MySQL + Hibernate + JavaMail
Posted on October 3rd, 2007, 23 comments
While most web applications communicate with the end user in English, many use native languages, which often have special characters (not to look too far for an example, the Polish alphabet has ą, ę, ś, etc.). A widely accepted standard for encoding such characters is UTF-8. However, it is not quite trivial to get full UTF-8 support in a Tomcat + MySQL + Hibernate + JavaMail combination: in the database, web forms, JSPs and e-mails.
Part I. Preliminaries
On every request, you have to set the encoding of characters manually; it is best to create a filter, with the following body:
public void doFilter(ServletRequest request,
        ServletResponse response, FilterChain chain)
        throws IOException, ServletException {
    // Encoding used for the generated response
    response.setCharacterEncoding("UTF-8");
    // Must be set before any request parameter is read
    request.setCharacterEncoding("UTF-8");
    chain.doFilter(request, response);
}
Almost all of the following parts rely on this filter.
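To see why the filter matters, here is a minimal sketch that imitates, outside of any container, what happens to a submitted form value. It is an illustration only: the class FormDecodingDemo and its method names are made up for this example, and java.net.URLDecoder merely stands in for the container's own parameter parsing, which falls back to ISO-8859-1 when no request encoding has been set.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of any servlet API.
final class FormDecodingDemo {

    // Percent-encodes a value the way a browser submits it from a UTF-8 page.
    static String encodeAsBrowser(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decodes the raw value with the given charset, standing in for the container.
    static String decodeAs(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Polish a-ogonek and e-ogonek, written as escapes so that the
        // example does not depend on the encoding of this source file.
        String typed = "\u0105\u0119";
        String onTheWire = encodeAsBrowser(typed);
        System.out.println(onTheWire);                                        // %C4%85%C4%99
        System.out.println(decodeAs(onTheWire, "UTF-8").equals(typed));       // true
        System.out.println(decodeAs(onTheWire, "ISO-8859-1").length());       // 4
    }
}
```

With the wrong charset every UTF-8 byte becomes a separate character, which is exactly the garbage you see in a form field when the filter is missing.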
Part II. JSPs
If you want to display native characters on a JSP page, you have to:
- at the top of the page, add <%@page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
- in the head section, add <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
- and, of course, you have to edit the .jsp file using the UTF-8 encoding (to set this in Eclipse, right-click a project, go to the “Resource” tab, and set the “Text file encoding” value to “UTF-8”)
Part III. Java Strings
It may also be the case that you have some strings in your code that contain native characters and, for example, you would like to pass them to a .jsp page using request.setAttribute(String, String) or send them as an e-mail subject/body. To have them properly handled:
- set the encoding of the Java source files to UTF-8 (just as with .jsp files) in your favorite editor
- compile the sources using the -encoding UTF-8 option
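A quick way to check that both steps worked is to print the code points of a string literal. The sketch below is only an illustration (SourceEncodingCheck and codePoints are hypothetical names). It writes the expected characters as Unicode escapes, which the compiler reads the same way whatever -encoding is in effect.

```java
// Hypothetical helper for checking how a string literal was compiled.
final class SourceEncodingCheck {

    // Renders each code point of the string in U+XXXX form, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // a-ogonek and e-ogonek as escapes: always compiled correctly.
        String reference = "\u0105\u0119";
        System.out.println(codePoints(reference)); // U+0105 U+0119
    }
}
```

Pass one of your own literals to codePoints: if a literal typed as ą prints as U+00C4 U+0085 instead of U+0105, the file was saved as UTF-8 but compiled as ISO-8859-1.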
Part IV. Forms
After displaying native characters, you may want to have some forms where users can input text values using native characters. To have them properly handled by Tomcat, you need to edit the server.xml file, which is located:
- in JBoss 4.0.x: $JBOSS_HOME/server/<conf>/deploy/jbossweb-tomcat55.sar/server.xml
- in JBoss 4.2: $JBOSS_HOME/server/<conf>/deploy/jboss-web.deployer/server.xml
and add to the appropriate <Connector ...> (usually the first one) the following attribute: URIEncoding="UTF-8".
Part V. MySQL and Hibernate
Storing strings in a database in UTF-8 is a bit more tricky. First of all, you have to tell MySQL that your varchar/text fields will be using UTF-8.
If you already have a database, or if your database was created by Hibernate (using hibernate.hbm2ddl.auto), you will have to run this statement for each column:
ALTER TABLE `<database>`.`<table_name>` MODIFY COLUMN `<column_name>` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci;
(MySQL Administrator can help you with that.)
If you are creating a database, you can set a default encoding for all text fields:
CREATE TABLE `<database>`.`<table_name>` (<column_list>) DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
There are other possibilities as well, for example compiling mysql with UTF-8 support set as default. For the complete list of options, see here.
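One caveat worth knowing when choosing and sizing columns: MySQL's utf8 character set stores at most three bytes per character, so characters above U+FFFF do not fit (later MySQL versions added utf8mb4 for those). The following sketch, with the hypothetical names Utf8Width, utf8Length and fitsThreeByteUtf8, is just one possible way to check a string in Java before it reaches the database.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of MySQL Connector/J or Hibernate.
final class Utf8Width {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if every code point is at most U+FFFF, i.e. needs at most
    // three bytes in UTF-8.
    static boolean fitsThreeByteUtf8(String s) {
        return s.codePoints().allMatch(cp -> cp <= 0xFFFF);
    }

    public static void main(String[] args) {
        String polish = "\u0105\u0119";      // a-ogonek, e-ogonek
        String emoji = "\uD83D\uDE00";       // U+1F600 as a surrogate pair
        System.out.println(utf8Length(polish));        // 4
        System.out.println(fitsThreeByteUtf8(polish)); // true
        System.out.println(utf8Length(emoji));         // 4
        System.out.println(fitsThreeByteUtf8(emoji));  // false
    }
}
```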
But configuring your database is not all; you also have to tell hibernate that in your connection to MySQL, you will be using the UTF-8 encoding. To do this:
- if you are using a data source, add the parameters to the connection URL as in this example (inside an XML file, the & separating the parameters has to be written as &amp;):
<connection-url>jdbc:mysql://localhost:3306/<my_database>?useUnicode=true&amp;characterEncoding=UTF-8</connection-url>
- if you are using EJB3/JPA, add the following properties to persistence.xml (in the appropriate <persistence-unit>):
<property name="hibernate.connection.useUnicode" value="true" />
<property name="hibernate.connection.characterEncoding" value="UTF-8" />
- in case of “plain” Hibernate, just specify the above properties in your configuration file (hibernate.properties or hibernate.cfg.xml)
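If you build the connection URL in code instead, the same two parameters apply. The sketch below is only an illustration: JdbcUrlSketch, mysqlUrl and escapeForXml are hypothetical names, and the host, port and database name are placeholders. It also shows the difference between the URL as the driver sees it and the form it has to take inside an XML file.

```java
// Hypothetical helper; only string handling, no database access.
final class JdbcUrlSketch {

    // Builds a MySQL JDBC URL that asks the driver to talk UTF-8.
    static String mysqlUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Escapes the ampersand so that the URL can be pasted into an XML file.
    static String escapeForXml(String url) {
        return url.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String url = mysqlUrl("localhost", 3306, "my_database");
        // jdbc:mysql://localhost:3306/my_database?useUnicode=true&characterEncoding=UTF-8
        System.out.println(url);
        // jdbc:mysql://localhost:3306/my_database?useUnicode=true&amp;characterEncoding=UTF-8
        System.out.println(escapeForXml(url));
    }
}
```

The unescaped form is what you would pass to DriverManager.getConnection, assuming the MySQL driver is on the classpath.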
Part VI. Java Mail
Finally comes the easiest part: sending e-mails with the subject and body in UTF-8. The only thing you have to do here is use MimeMessage and pass the charset as an additional parameter when setting the subject and text of your message:
(…)
MimeMessage msg = new MimeMessage(session);
msg.setFrom(InternetAddress.parse(from, false)[0]);
msg.setSentDate(new Date());
msg.setRecipients(Message.RecipientType.TO, InternetAddress.parse(to, false));
msg.setSubject(subject, "UTF-8");
msg.setText(body, "UTF-8");
transport.sendMessage(msg, msg.getAllRecipients());
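Mail headers may only contain ASCII, so a subject with native characters has to travel as an RFC 2047 "encoded word". The sketch below shows roughly what such a header value looks like, using Base64 ("B") encoding from the standard library. It is a simplified illustration, not JavaMail's implementation: EncodedWordSketch and its methods are hypothetical names, and a real mailer may also choose the "Q" encoding and split long subjects into several encoded words.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of an RFC 2047 encoded word with B encoding.
final class EncodedWordSketch {

    private static final String PREFIX = "=?UTF-8?B?";
    private static final String SUFFIX = "?=";

    // Wraps the UTF-8 bytes of the subject in a single Base64 encoded word.
    static String encodeSubject(String subject) {
        byte[] utf8 = subject.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    // Reverses encodeSubject for a single encoded word produced by it.
    static String decodeSubject(String encodedWord) {
        String payload = encodedWord.substring(
                PREFIX.length(), encodedWord.length() - SUFFIX.length());
        return new String(Base64.getDecoder().decode(payload), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encodeSubject("abc")); // =?UTF-8?B?YWJj?=
        // A subject with Polish letters, written as escapes.
        String subject = "Za\u017C\u00F3\u0142\u0107";
        System.out.println(decodeSubject(encodeSubject(subject)).equals(subject)); // true
    }
}
```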
Do you know any other areas of Java which you have to configure to have full support for UTF8?
Thanks to Tomek Szymański for helping me in finding the above information.
22 responses to “UTF-8 in JBoss/Tomcat + MySQL + Hibernate + JavaMail”

-
That was a useful article for anyone interested in writing multilingual applications in Java.
Nonetheless, based on my experience I think there are easier ways to set the character encoding for MySql. You could set the character encoding during the installation and configuration and also while creating tables:
CREATE TABLE TsExceptionLevelTab (
…..) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Also, while writing multilingual JSF applications for my clients, I found the following very useful in my Java classes:
(The assumption is that your database is set up to handle UTF-8 encoding: e.g. as described above for MySql)
String utfValue = new String(dataBaseValue, "UTF-8");
where “dataBaseValue” is the value read from a database. Java guarantees that utfValue will contain the correct string (no matter the locale).
Moreover, don’t forget that if you estimate that the size of your variable in English is say 12 characters long, in your database you should have a size of 12*4 + 2 to guarantee that all locales can be accommodated.
If you are using JSF with facelets, then all you need to do next is add:
to the tag of your template.
Easy, isn’t it!
-
Jonathan Ekwempu October 7th, 2007 at 02:21
If you are using JSF with facelets, then all you need to do next is add:
“”
to the tag of your template.
-
Jonathan Ekwempu October 7th, 2007 at 02:24
This is what is missing from my reply:
If you are using JSF with facelets, then all you need to do next is add:
meta http-equiv="Content-Type" content="text/html; charset=utf-8"
to the head tag of your template.
-
I couldn’t understand some parts of this article UTF-8 in JBoss/Tomcat + MySQL + Hibernate + JavaMail, but I guess I just need to check some more resources regarding this, because it sounds interesting.
-
I’m using a DBCP connection pool configured in Tomcat (common/lib) and I ran into a problem with UTF-8 encoding. When form data are submitted, UTF-8 characters are “well encoded” and well received in the servlet. But the JDBC driver does not receive a UTF-8 encoded string but an ISO-8859-1 encoded string.
I have to run Tomcat with the JVM option -Dfile.encoding=UTF-8. That forces all applications to use UTF-8, which is unacceptable.
Don’t you use a DBCP connection pool configured in Tomcat?
-
I’m sure that everything is OK in MySQL.
I tested direct use of the driver with DriverManager in my application like this:
Connection co = DriverManager.getConnection(…,…,…);
co.createStatement().executeUpdate(query);
and it’s OK, UTF-8 characters are written correctly to the database.
When I use DBCP directly in my application to obtain a connection, characters are well encoded in UTF-8.
When I use the datasource which is set up in Tomcat, characters are damaged with ISO 8859-1 encoding. It’s more than strange, because I can read UTF-8 characters from the database, but writing to the database through the datasource/pool damages UTF-8 characters.
It seems that using the datasource is the cause of the problems…
If I run Tomcat with the -Dfile.encoding=UTF-8 JVM option, everything is OK; but I can’t use this workaround because other applications use ISO 8859-1.
-
Thanks for such good tips. I struggled a lot before I found this page.
But you saved my weekend.
-
Thanks for these great tips! They saved me a lot of time. The only part I had a small problem with was the JBoss datasource configuration, but using
…. jdbc:mysql://localhost:3306/test
true
utf8
….
instead of your format did the trick for me!
Thanks again!
-
Hi, I use Tomcat + MySQL + JSP. If my form is in English, I can insert Chinese characters into my DB, but if my JSP page also contains Chinese characters, I can’t get it right. What can I do?
-
I use Hibernate/MySQL on my website http://www.toupil.fr and I appreciate this technology. Thank you for your article.
-
badam571 August 19th, 2008 at 23:40
Hi Adam,
I am using Sybase and JBoss, and I am using a datasource to connect to the database. I ran a select statement, but I get an error saying:
Invalid column name ' '.
Invalid column name ':'.
Invalid column name '00'
When I ran the SQL statement in ISQL it ran fine. When I accessed it through the datasource, I got the above error.
Here is my select statement:
SELECT convert(varchar(12), pdl_site.last_updt_tmsp, 101) + " " + convert(varchar(8), pdl_site.last_updt_tmsp, 108) + ":" + right("00" + datename(ms,pdl_site.last_updt_tmsp), 3) last_updt_tmsp, pdl_site.last_updt_usr, pdl_site.appl_version, pdl_site.default_archive_days, pdl_site.default_fund_type, pdl_site.email_protocol, pdl_site.general_info, pdl_site.license_key, pdl_site.id, pdl_site.password_expir_days, pdl_site.reg_instr, pdl_site.smtp_account, pdl_site.smtp_host, pdl_site.site_ip_addr, pdl_site.pd_version FROM pdl_site
What do you think is the error?
Best Regards
badam571
-
badam571 August 20th, 2008 at 15:54
Hi Adam
Thank you for your response. Why did you rule out the encoding issue?
The SQL statement was not read correctly when I sent it through the data connection. But when I ran the same statement directly in the SQL tool, it worked.
If it is a username/pwd issue, I shouldn’t have gotten the connection in the first place?
or
if it is a config files issue, how do I get the connection ..
Here is my question Adam .. do you think using the global jdbc/jndi binding has any effect in relation to using the java:jdbc binding? I am using the global binding.
Best Regards
badam571
-
badam571 August 20th, 2008 at 18:19
Hi Adam
Please disregard my earlier questions — it is just getting interesting.
I followed your advice and used a simple select statement:
select * from pdl_st
and the news: I got connected successfully.
So the issue is the presence of the 00, ", : in the field names.
But the question is why, when those characters are passed through the datasource, the DB does not interpret them correctly.
Still hunting it down ..
Your earlier comment was a big help.
Best Regards
badam571
-
Miss 'B' January 16th, 2009 at 13:35
That is a great article, you saved my day, man.
Thanks a lot
-
Magnus March 8th, 2009 at 18:28
Thank you very much. It was most helpful.
To change the encoding for the Java compiler when using Maven, add UTF-8 to the configuration section for the Maven compiler plugin.
-
First, thanks a lot for this article. It really saves my life.
But I have a little problem with the filter.
Can someone explain to me how to integrate it?
-
Does anyone know how I can configure UTF-8 encoding for embedded HSQLDB in JBoss 4.2.3? I am now working on a proof of concept for our customer using EJB3, Hibernate, Spring WS and JBoss.
-
RafalD January 27th, 2010 at 20:42
Thank you, very much
1 Trackbacks / Pingbacks
-
[...] by Piotr Gabryanczyk on May 7, 2008 I struggled a lot with encoding in Tomcat before I found this post on Adam’s Warski [...]

Jonathan Ekwempu October 7th, 2007 at 02:18