UTF-8 in JBoss/Tomcat + MySQL + Hibernate + JavaMail

While most (web) applications communicate with the end user in English, many of them use native languages, which often have special characters (without looking too far for an example, the Polish alphabet has ą, ę, ś, etc.). A widely accepted standard for encoding such characters is UTF-8. However, it is not quite trivial to use UTF-8 in a Tomcat + MySQL + Hibernate + JavaMail combination and have full UTF-8 support in the database, web forms, JSPs and e-mails.

Part I. Preliminaries

On every request, you have to set the character encoding manually. The easiest way is a servlet filter (a class implementing javax.servlet.Filter) whose doFilter method looks like this:

public void doFilter(ServletRequest request,
        ServletResponse response, FilterChain chain)
        throws IOException, ServletException {
    // Must run before any request parameter is read; once parameters are parsed,
    // changing the request encoding has no effect.
    request.setCharacterEncoding("UTF-8");
    response.setCharacterEncoding("UTF-8");
    chain.doFilter(request, response);
}

Remember to declare the filter in web.xml and map it to all URLs (/*). This is needed by almost all of the following parts.
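To see why the filter matters, the following plain-JDK sketch decodes the same URL-encoded form body twice: once as ISO-8859-1, which is what a servlet container typically falls back to when no request encoding has been set, and once as UTF-8, which is what request.setCharacterEncoding("UTF-8") asks for. The class name and the sample field are made up for illustration; no servlet container is involved.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows what request.setCharacterEncoding changes.
final class FormBodyDecodingDemo {

    // A browser posting "name=" plus the Polish word "zazolc" with diacritics
    // from a UTF-8 page sends the UTF-8 bytes percent-encoded, like this.
    static final String BODY = "name=za%C5%BC%C3%B3%C5%82%C4%87";

    // Decodes the value part of a single key=value pair with the given charset.
    static String decodeValue(String body, Charset charset) {
        String value = body.substring(body.indexOf('=') + 1);
        return URLDecoder.decode(value, charset);
    }

    public static void main(String[] args) {
        String asLatin1 = decodeValue(BODY, StandardCharsets.ISO_8859_1);
        String asUtf8 = decodeValue(BODY, StandardCharsets.UTF_8);
        // Each two-byte Polish letter became two separate characters.
        System.out.println(asLatin1.length() + " chars: garbled");
        System.out.println(asUtf8.equals("za\u017C\u00F3\u0142\u0107")); // true
    }
}
```

The first line prints 10 instead of 6 characters, which is the kind of mojibake you see in the application when the filter is missing or runs after the parameters have already been read.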

Part II. JSPs

If you want to display native characters on a JSP page, you have to:

  • at the top of the page, add <%@page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
  • in the head section, add <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  • and, of course, you have to save the .jsp file using the UTF-8 encoding (to set this in Eclipse, right-click the project, choose Properties, go to the “Resource” page, and set “Text file encoding” to “UTF-8”)
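A common cause of broken JSPs is a file saved by an editor still set to a legacy code page (for Polish, often ISO-8859-2 or windows-1250). One quick check is to decode the file strictly as UTF-8. The plain-JDK sketch below (class name hypothetical) reports files that are not well-formed UTF-8; pure ASCII files also pass, which is harmless, since ASCII is a subset of UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: checks whether a file's bytes are well-formed UTF-8.
final class Utf8FileCheck {

    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently inserting replacement characters.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the .jsp (or .java) paths to check as command-line arguments.
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Path.of(name));
            System.out.println(name + (isValidUtf8(bytes) ? ": UTF-8 OK" : ": NOT valid UTF-8"));
        }
    }
}
```

For example, "ą" saved as ISO-8859-2 is the single byte 0xB1, which can never appear on its own in valid UTF-8, so such a file is reported immediately.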

Part III. Java Strings

You may also have strings in your code that contain native characters and that you would like, for example, to pass to a .jsp page using request.setAttribute(String, Object), or to send as an e-mail subject/body. To have them handled properly:

  • set the encoding of the Java source files to UTF-8 (just as with .jsp files) in your favorite editor
  • compile the sources using javac's -encoding UTF-8 option
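The sketch below simulates what goes wrong when the -encoding option does not match the file: the UTF-8 bytes of a string literal are read back as ISO-8859-1, one byte per character. It also uses Unicode escapes (a backslash, "u" and four hex digits) for the Polish letters; javac resolves those before anything else, so escaped literals come out right whatever -encoding is used. This escape approach is offered as an alternative, not as part of the setup above; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: simulates compiling a UTF-8 source file with the wrong -encoding.
final class SourceEncodingDemo {

    // What a compiler assuming ISO-8859-1 would turn a UTF-8 literal into.
    static String readWithWrongEncoding(String literal) {
        byte[] utf8Bytes = literal.getBytes(StandardCharsets.UTF_8); // bytes as stored on disk
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);    // misread one byte per char
    }

    public static void main(String[] args) {
        // Unicode escapes are resolved by javac independently of the -encoding option.
        // Polish for "order accepted", containing two non-ASCII letters.
        String subject = "Zam\u00F3wienie przyj\u0119te";
        String garbled = readWithWrongEncoding(subject);
        System.out.println(subject.length()); // 19
        System.out.println(garbled.length()); // 21: each non-ASCII letter became two chars
    }
}
```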

Part IV. Forms

Once you can display native characters, you will probably want forms in which users can enter text containing native characters. For Tomcat to handle them properly, you need to edit the server.xml file, which is located:

  • in JBoss 4.0.x: $JBOSS_HOME/server/<conf>/deploy/jbossweb-tomcat55.sar/server.xml
  • in JBoss 4.2: $JBOSS_HOME/server/<conf>/deploy/jboss-web.deployer/server.xml

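and add the following attribute to the appropriate <Connector ...> (usually the first one): URIEncoding="UTF-8". This attribute controls how parameters in the URL (for example, those of GET forms) are decoded; POST bodies are decoded using the encoding set by the filter from Part I.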
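If you cannot edit server.xml (on shared hosting, for example), a widely used workaround, which is not part of the setup described here, is to undo the connector's ISO-8859-1 decoding of URL parameters by hand. Re-encoding the parameter to ISO-8859-1 recovers the original bytes, because ISO-8859-1 maps every byte value to exactly one character, and those bytes can then be decoded as UTF-8. A sketch with a hypothetical helper name:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for connectors whose URIEncoding is still ISO-8859-1.
final class ParameterRepair {

    // Reverses a UTF-8 value that the container decoded as ISO-8859-1.
    static String latin1ToUtf8(String misdecoded) {
        if (misdecoded == null) {
            return null; // request.getParameter returns null for missing parameters
        }
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1); // lossless: one byte per char
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "\u015Bwieca"; // the Polish word for "candle", as typed by the user
        // Simulate the connector decoding the UTF-8 query string bytes as ISO-8859-1.
        String received = new String(sent.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(received.equals(sent));               // false
        System.out.println(latin1ToUtf8(received).equals(sent)); // true
    }
}
```

Apply it only where URIEncoding is not set. On a correctly configured connector the value is already right, and this conversion would garble it.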

Part V. MySQL and Hibernate

Storing strings in a database in UTF-8 is a bit trickier. First of all, you have to tell MySQL that your varchar/text columns will use UTF-8.

If you already have a database, or if it was created by Hibernate (using hibernate.hbm2ddl.auto), you will have to run this statement for each text column:

ALTER TABLE `<database>`.`<table_name>` MODIFY COLUMN `<column_name>` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci;

(MySQL Administrator can help you with that.)
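Because the statement has to be repeated for every text column, it can be convenient to generate the statements instead of typing them. A minimal sketch, assuming you already know each column's name and type; the helper, schema and column names below are hypothetical (in practice you could read them from MySQL's information_schema.COLUMNS table).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generator for the per-column ALTER TABLE statements shown above.
final class Utf8AlterGenerator {

    // columnTypes maps a column name to its full type definition, e.g. "VARCHAR(255)".
    static List<String> alterStatements(String database, String table, Map<String, String> columnTypes) {
        List<String> statements = new ArrayList<>();
        for (Map.Entry<String, String> column : columnTypes.entrySet()) {
            statements.add(String.format(
                    "ALTER TABLE `%s`.`%s` MODIFY COLUMN `%s` %s CHARACTER SET utf8 COLLATE utf8_general_ci;",
                    database, table, column.getKey(), column.getValue()));
        }
        return statements;
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>(); // keeps the columns in insertion order
        columns.put("title", "VARCHAR(255)");
        columns.put("body", "TEXT");
        alterStatements("shop", "product", columns).forEach(System.out::println);
    }
}
```

Keep in mind that MODIFY COLUMN replaces the whole column definition, so attributes such as NOT NULL or DEFAULT must be repeated in the type string, or they are dropped.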

If you are creating a table, you can set a default encoding for all of its text columns:

CREATE TABLE `<database>`.`<table_name>` (<column_list>) DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

There are other possibilities as well, for example compiling MySQL with UTF-8 as the default character set. For the complete list of options, see here.
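One caveat worth knowing: MySQL's utf8 character set stores at most three bytes per character, so it only covers the Basic Multilingual Plane. Characters outside it, such as many emoji, need utf8mb4 (available since MySQL 5.5.3); Polish and most other scripts fit in utf8. The following plain-JDK sketch (hypothetical helper) checks whether a string can be stored in a utf8 column. In Java, the problematic characters are exactly the code points above U+FFFF.

```java
// Hypothetical check: can this string go into a MySQL column declared with the 3-byte utf8 charset?
final class MySqlUtf8Check {

    static boolean fitsInMySqlUtf8(String s) {
        // Code points above U+FFFF take four bytes in UTF-8 and cannot be stored in utf8 (utf8mb3) columns.
        return s.codePoints().allMatch(cp -> cp <= 0xFFFF);
    }

    public static void main(String[] args) {
        System.out.println(fitsInMySqlUtf8("Za\u017C\u00F3\u0142\u0107")); // true
        System.out.println(fitsInMySqlUtf8("Hi \uD83D\uDE00"));             // false: U+1F600 needs 4 bytes
    }
}
```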

But configuring the database is not all; you also have to tell Hibernate that the connection to MySQL uses the UTF-8 encoding. To do this:

  • if you are using a data source, add the parameters to the connection URL as in this example (inside the XML file, the & has to be written as &amp;):
    <connection-url>jdbc:mysql://localhost:3306/<my_database>?useUnicode=true&amp;characterEncoding=UTF-8</connection-url>
  • if you are using EJB3/JPA, add the following properties to persistence.xml (in the appropriate <persistence-unit>):

    <property name="hibernate.connection.useUnicode" value="true" />
    <property name="hibernate.connection.characterEncoding" value="UTF-8" />
  • in the case of “plain” Hibernate, just specify the above properties in your configuration file (hibernate.properties or hibernate.cfg.xml)
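If you configure plain Hibernate in code rather than in a file, the same settings can be assembled programmatically. The sketch below only builds a JDBC URL carrying the two Connector/J parameters and a java.util.Properties object with the two Hibernate properties. Handing the Properties to Hibernate (for example via Configuration.addProperties) is left out so the example runs without Hibernate on the classpath. The helper and database names are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper collecting the UTF-8 connection settings described above.
final class Utf8ConnectionSettings {

    // Appends the MySQL Connector/J parameters, using '&' if the URL already has a query string.
    static String withUtf8(String jdbcUrl) {
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "useUnicode=true&characterEncoding=UTF-8";
    }

    // The same properties as in persistence.xml / hibernate.cfg.xml.
    static Properties hibernateProperties(String jdbcUrl) {
        Properties props = new Properties();
        props.setProperty("hibernate.connection.url", jdbcUrl);
        props.setProperty("hibernate.connection.useUnicode", "true");
        props.setProperty("hibernate.connection.characterEncoding", "UTF-8");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(withUtf8("jdbc:mysql://localhost:3306/my_database"));
        hibernateProperties("jdbc:mysql://localhost:3306/my_database")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Note that the plain "&" is correct in Java code; only inside an XML file does it have to be escaped.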

Part VI. Java Mail

Finally comes the easiest part: sending e-mails with the subject and body in UTF-8. All you have to do here is use MimeMessage and pass the charset as an additional parameter when setting the subject and text of your message:

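import java.util.Date;
import javax.mail.Message;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

(...)
MimeMessage msg = new MimeMessage(session);
msg.setFrom(InternetAddress.parse(from, false)[0]);
msg.setSentDate(new Date());
msg.setRecipients(Message.RecipientType.TO, InternetAddress.parse(to, false));
// The second argument is the charset used to encode the subject header and the text body.
msg.setSubject(subject, "UTF-8");
msg.setText(body, "UTF-8");
transport.sendMessage(msg, msg.getAllRecipients());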
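A non-ASCII subject cannot go into the message header as raw bytes. When the charset is given, JavaMail encodes it as an RFC 2047 "encoded-word", choosing between the B (Base64) and Q encodings. The plain-JDK sketch below builds the B form by hand to show roughly what ends up in the header. It is an illustration only: it ignores the 75-character limit per encoded-word and header folding, which JavaMail handles for you.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration only: the B-encoded form of an RFC 2047 encoded-word for a UTF-8 subject.
final class EncodedWordDemo {

    static String encodeSubjectB(String subject) {
        String base64 = Base64.getEncoder()
                .encodeToString(subject.getBytes(StandardCharsets.UTF_8));
        return "=?UTF-8?B?" + base64 + "?=";
    }

    public static void main(String[] args) {
        System.out.println(encodeSubjectB("Za\u017C\u00F3\u0142\u0107")); // =?UTF-8?B?WmHFvMOzxYLEhw==?=
    }
}
```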

Do you know any other areas of Java that you have to configure to get full UTF-8 support?

Thanks to Tomek Szymański for helping me in finding the above information.

  • Jonathan Ekwempu

    That was a useful article for anyone interested in writing multilingual applications in Java.

    Nonetheless, based on my experience I think there are easier ways to set the character encoding for MySql. You could set the character encoding during the installation and configuration and also while creating tables:

    CREATE TABLE TsExceptionLevelTab (
    …..

    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    Also, while writing multilingual JSF applications for my clients, I found the following very useful in my Java classes:

    (The assumption is that your database is set up to handle UTF-8 encoding: e.g. as described above for MySql)

    String utfValue = new String(dataBaseValue, "UTF-8");

    where dataBaseValue is the raw byte[] read from the database. Java guarantees that utfValue will contain the correct string (no matter the locale).

    Moreover, don’t forget that if you estimate that the size of your variable in English is say 12 characters long, in your database you should have a size of 12*4 + 2 to guarantee that all locales can be accommodated.

    If you are using JSF with facelets, then all you need to do next is add:

    to the tag of your template.

    Easy, isn’t it!

  • Jonathan Ekwempu

    This is what is missing from my reply:

    If you are using JSF with facelets, then all you need to do next is add:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

    to the head tag of your template.

  • http://www.cars.mywise.net Daniel

    I couldn’t understand some parts of this article UTF-8 in JBoss/Tomcat + MySQL + Hibernate + JavaMail, but I guess I just need to check some more resources regarding this, because it sounds interesting.

  • http://ociensa.com Olivier Debas

    I’m using a DBCP connection pool configured in Tomcat (common/lib) and I ran into a problem with UTF-8 encoding. When form data is submitted, UTF-8 characters are encoded correctly and received correctly in the servlet. But the JDBC driver does not receive a UTF-8 encoded string; it receives an ISO-8859-1 encoded one.
    I have to run Tomcat with the JVM option -Dfile.encoding=UTF-8. That forces all applications to use UTF-8, which is unacceptable.
    Don’t you use a DBCP connection pool configured in Tomcat?

  • http://www.warski.org Adam Warski

    I haven’t tried that setup, but did you configure the connection URL in the pool definition correctly? (with ?useUnicode=true&characterEncoding=UTF-8 as the query string). Also, make sure that your database encoding is UTF8.

  • http://ociensa.com Olivier Debas

    I’m sure that everything is OK in MySQL.

    I tested using the driver directly with DriverManager in my application, like this:
    Connection co = DriverManager.getConnection(…,…,…);
    co.createStatement().executeUpdate(query);
    and it’s OK: UTF-8 characters are written correctly to the database.

    When I use DBCP directly in my application to obtain a connection, characters are correctly encoded in UTF-8.

    When I use the data source that is set up in Tomcat, characters are damaged with ISO 8859-1 encoding. It’s more than strange, because I can read UTF-8 characters from the database, but writing to the database through the data source/pool damages UTF-8 characters.
    It seems that using the data source is the cause of the problem…

    If I run Tomcat with the -Dfile.encoding=UTF-8 JVM option, everything is OK, but I can’t use this workaround because other applications use ISO 8859-1.

  • Vivek

    Thanks for such good tips. I struggled a lot before I found this page,
    but you saved my weekend.

  • http://- Jesse Huurre

    Thanks for these great tips! They saved me a lot of time. The only part I had a small problem with was the JBoss data source configuration, but using
    …. jdbc:mysql://localhost:3306/test
    true
    utf8
    ….
    instead of your format did the trick for me!

    Thanks again!

  • Hjava

    Hi, I use Tomcat + MySQL + JSP. If my form is in English, I can insert Chinese characters into my DB, but if my JSP page also contains Chinese characters, I can’t get it right. What can I do?

  • http://www.warski.org Adam Warski

    Well, I have only tested it using Polish characters; are the Chinese characters also in UTF-8?
    If so, it “should” work :)

    Adam

  • http://www.toupil.fr Aurélien

    I use Hibernate/MySQL on my website http://www.toupil.fr and I appreciate this technology. Thank you for your article.

  • badam571

    Hi Adam,

    I am using Sybase and JBoss, and I am using a data source to connect to the database. I ran a select statement, but I got an error saying:
    Invalid column name ' '.
    Invalid column name ':'.
    Invalid column name '00'.

    When I ran the SQL statement in ISQL, it ran fine. When I ran it through the data source, I got the above error.

    Here is my select statement:

    SELECT convert(varchar(12), pdl_site.last_updt_tmsp, 101) + " " + convert(varchar(8), pdl_site.last_updt_tmsp, 108) + ":" + right("00" + datename(ms,pdl_site.last_updt_tmsp), 3) last_updt_tmsp, pdl_site.last_updt_usr, pdl_site.appl_version, pdl_site.default_archive_days, pdl_site.default_fund_type, pdl_site.email_protocol, pdl_site.general_info, pdl_site.license_key, pdl_site.id, pdl_site.password_expir_days, pdl_site.reg_instr, pdl_site.smtp_account, pdl_site.smtp_host, pdl_site.site_ip_addr, pdl_site.pd_version FROM pdl_site


    What do you think the error is?

    Best Regards
    badam571

  • http://www.warski.org Adam Warski

    Hello,

    Maybe try simpler select expressions first and check whether they work. However, I don’t think it’s related to UTF-8. Also, I don’t have much experience with Sybase.

    Adam

  • badam571

    Hi Adam

    Thank you for your response. Why did you rule out an encoding issue?
    The SQL statement was not read correctly when I sent it through the data connection, but when I ran the same statement directly in the SQL tool, it worked.

    If it were a username/password issue, I shouldn’t have gotten a connection in the first place?

    or

    if it were a config file issue, how would I get the connection?

    Here is my question, Adam: do you think using the global jdbc/JNDI binding has any effect compared to using the java:jdbc binding? I am using the global binding.

    Best Regards
    badam571

  • badam571

    Hi Adam

    Please disregard my earlier questions; it is just getting interesting.

    I followed your advice and used a simple select statement:

    select * from pdl_st

    and the news: I got connected successfully.

    So the issue is the presence of "00", " " and ":" in the field names.

    But the question is why, when those characters are passed through the data source, the database does not interpret them correctly.

    I am still hunting it down…

    Your earlier comment was a big help.

    Best Regards
    badam571

  • Miss ‘B’

    That is a great article; you saved my day, man.
    Thanks a lot

  • Magnus

    Thank you very much. It was most helpful.
    To change the encoding for the Java compiler when using Maven, add <encoding>UTF-8</encoding> to the configuration section of the Maven compiler plugin.

  • aymen

    First, thanks a lot for this article. It really saved my life.
    But I have a little problem with the filter.
    Can someone explain to me how to integrate it?

  • http://www.warski.org Adam Warski

    But what’s the problem? :)

  • Serge

    Does anyone know how I can configure UTF-8 encoding for the embedded HSQLDB in JBoss 4.2.3? I am currently working on a proof of concept for our customer using EJB3, Hibernate, Spring WS and JBoss.

  • RafalD

    Thank you, very much

  • Emad Al-Bloushi

    Great job, easy and straightforward steps. I have done these steps in JBoss 6.0 and the result was fabulous.

    Thank you

    ;-)

  • szakal

    If you use spring in your application, you can use

    org.springframework.web.filter.CharacterEncodingFilter

    instead of implementing your own filter.

  • ralph

    Many thanks, this helped me so much! It works perfectly with Tomcat 6 and MySQL 5.1.

  • :onder

    Thanks for the MySQL Administrator trick in the section “MySQL and Hibernate”.

    Setting default encoding solved the problem easily. This way, Hibernate exports the schema with UTF-8 encoded tables and columns.

  • amir

    Thank you for your great article.
    I am using JSF with servlets; I need to auto-validate a form and show an alert.
    I am using: notify="form1:alert1"
    and I throw a validation exception in my validator…
    When I throw a faces message with English characters, it works, but when I use a faces message with UTF-8 characters, it shows "?????".

    By the way, everywhere else is OK and shows UTF-8 characters thanks to your article.
    Thank you!

  • http://www.warski.org Adam Warski

    Make sure the encoding of your Java source files is UTF-8 and that you compile with UTF-8 support enabled.
    I don’t have many other ideas, sorry.

    Adam

  • Peter

    Perfect Article.
    My problem was solved quickly.

  • http://www.warski.org Adam Warski

    On how to set the encoding in mysql settings, see: http://recursive-design.com/blog/2008/06/23/force-mysql-encoding-to-utf8/

    Adam

  • http://www.facebook.com/vigneshkcs Vignesh Vicky

    Thank you so much, Warski. I suffered a lot with UTF-8, but now I have fixed all the bugs.