What is the problem with this code?


final byte[] bytes = someString.getBytes();


There are, in fact, two problems:

  • the code relies on the default Charset of the JVM;
  • it supposes that this default Charset can handle all characters.

The second problem is rarely a concern in practice; the first one certainly is.

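For instance, on many Windows installations (Western European and US locales in particular), the default charset is CP1252 (windows-1252), whereas on most Linux installations it is UTF-8. Since Java 18, the JVM defaults to UTF-8 on every platform unless configured otherwise, but code that has to run on older JVMs cannot rely on that.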

As a result, even a string as simple as “é” yields different bytes for this operation depending on whether the code runs on Windows or Linux: a single byte (0xE9) in CP1252, but two bytes (0xC3 0xA9) in UTF-8.
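The difference is easy to observe without switching operating systems by encoding the same string with both charsets explicitly. The following is a minimal sketch; the class name `CharsetDemo` and the helper `hex` are made up for illustration, and it assumes the JVM ships the windows-1252 charset (mainstream JDKs do, but only a handful of charsets such as UTF-8 and ISO-8859-1 are guaranteed).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class CharsetDemo {
    // Hypothetical helper: encodes the text with the given charset and
    // returns the resulting bytes as space-separated uppercase hex.
    static String hex(final String text, final Charset charset) {
        final StringBuilder sb = new StringBuilder();
        for (final byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(final String[] args) {
        // "é", written as an escape so the source file encoding does not matter
        final String text = "\u00E9";
        // Assumption: windows-1252 is available on this JVM
        System.out.println("windows-1252: " + hex(text, Charset.forName("windows-1252"))); // E9
        System.out.println("UTF-8:        " + hex(text, StandardCharsets.UTF_8));          // C3 A9
        // What a bare getBytes() would use on this machine
        System.out.println("JVM default:  " + Charset.defaultCharset());
    }
}
```

The last line prints whatever charset a bare `getBytes()` call would pick up on the machine running it, which is exactly the value the original code silently depends on.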

The solution is to always specify a Charset explicitly, for instance:

final byte[] bytes = someString.getBytes(StandardCharsets.UTF_8);
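As for the second problem in the list above: when a charset cannot represent a character, `getBytes()` does not fail; it silently substitutes the charset's replacement bytes (a `?` for US-ASCII). UTF-8 can encode every character, so the issue does not arise with it. Where another charset is required and silent substitution is unacceptable, one possible approach is to go through a `CharsetEncoder`, which reports unmappable characters by default. The sketch below assumes that approach; the class `StrictEncoding` and the helper `encodeStrictly` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class StrictEncoding {
    // Hypothetical helper: encodes the text, throwing instead of silently
    // substituting replacement bytes for characters the charset cannot map.
    static byte[] encodeStrictly(final String text, final Charset charset)
            throws CharacterCodingException {
        // A fresh encoder reports malformed input and unmappable characters
        final ByteBuffer buffer = charset.newEncoder().encode(CharBuffer.wrap(text));
        final byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(final String[] args) {
        final String text = "\u00E9"; // "é", which US-ASCII cannot represent

        // getBytes() quietly replaces the character with '?' (0x3F)
        final byte[] lossy = text.getBytes(StandardCharsets.US_ASCII);
        System.out.println("lossy byte is '?': " + (lossy[0] == '?'));

        try {
            encodeStrictly(text, StandardCharsets.US_ASCII);
        } catch (CharacterCodingException e) {
            System.out.println("cannot encode: " + e);
        }
    }
}
```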
