Java provides a relatively convenient recoding mechanism through the String class:
Converting bytes from a specific encoding:
String str = new String (bytes, encoding);
Converting a String to a specific encoding:
byte[] bytes = str.getBytes(encoding);
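Chaining the two calls gives a complete recoding step: decode the bytes into a String, then encode that String in the target encoding. This is a minimal sketch; the class `Recode` and its `recode` method are hypothetical helpers, not part of any library. Because it passes encoding names as Strings, it has to deal with the checked `UnsupportedEncodingException`.

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

// Hypothetical helper: recodes a byte array from one encoding to another
// by decoding it into a String and re-encoding that String.
final class Recode {
    static byte[] recode(byte[] input, String from, String to)
            throws UnsupportedEncodingException {
        String text = new String(input, from);  // bytes in `from` -> String
        return text.getBytes(to);               // String -> bytes in `to`
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "café" in ISO-8859-1: 'é' is the single byte 0xE9
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        byte[] utf8 = recode(latin1, "ISO-8859-1", "UTF-8");
        // In UTF-8, 'é' takes two bytes (0xC3 0xA9)
        System.out.println(Arrays.toString(utf8));  // prints [99, 97, 102, -61, -87]
    }
}
```

Printed byte values are negative above 0x7F because Java's `byte` is signed.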
Unfortunately, the list of supported encodings is rather difficult to
find. The Java Specification requires US-ASCII, ISO-8859-1, UTF-8, and
UTF-16 in big-endian, little-endian, and byte-order-mark-specified order
(UTF-16BE, UTF-16LE, UTF-16).
Sun's JDK contains many more encodings in the international version.
For EBCDIC, use Cp037, though you need to be careful: EBCDIC can mean
any number of codepages. If in doubt (and in an English-speaking
country), codepage 037 is probably your best bet.
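On newer JVMs you can also ask the runtime itself which encodings it supports. This sketch relies on `java.nio.charset.Charset`, which appeared in Java 1.4, so it does not apply to JDK 1.3. The class name `ListCharsets` is just for illustration, and whether `Cp037` shows up depends on the runtime shipping the extended charsets.

```java
import java.nio.charset.Charset;
import java.util.Map;

// Lists the charsets this JVM supports and checks the ones the
// Java platform requires every implementation to provide.
final class ListCharsets {
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    static boolean allRequiredSupported() {
        for (String name : REQUIRED) {
            if (!Charset.isSupported(name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Canonical name followed by the aliases this JVM accepts
        for (Map.Entry<String, Charset> e : Charset.availableCharsets().entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue().aliases());
        }
        // Cp037 may be missing from stripped-down runtimes
        System.out.println("Cp037 supported: " + Charset.isSupported("Cp037"));
        System.out.println("required charsets present: " + allRequiredSupported());
    }
}
```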
Since I can never find the list of supported encodings and their names when I'm looking, here is the link for Sun's JDK 1.3.
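From the command line, GNU recode can convert a file in place, for example from Latin-1 (lat1) to codepage 037:
recode lat1..cp037 $FILENAME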
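If recode isn't installed, the same in-place conversion takes a few lines of Java. This is a sketch, not a drop-in replacement for recode: the class `RecodeFile` is hypothetical, it reads the whole file into memory, and it assumes the JVM provides `Cp037`. Characters that can't be mapped would be silently replaced by `getBytes`, though codepage 037 covers the full Latin-1 repertoire.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical Java counterpart of `recode lat1..cp037 FILE`:
// reads FILE as ISO-8859-1 and overwrites it with EBCDIC codepage 037 bytes.
final class RecodeFile {
    static void recodeFile(Path file, String from, String to) throws IOException {
        byte[] input = Files.readAllBytes(file);
        String text = new String(input, Charset.forName(from));  // decode
        Files.write(file, text.getBytes(Charset.forName(to)));   // encode, overwrite
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RecodeFile FILE");
            return;
        }
        recodeFile(Paths.get(args[0]), "ISO-8859-1", "Cp037");
    }
}
```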
For conversion inside a C program, Unix provides the iconv facility
(man 3 iconv), though the Unix specification doesn't seem to list any
charsets that implementations are required to support.
This discussion of GNU's implementation is the closest I've been
able to find.
IANA maintains a list of canonical charset names here, and not at
ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets as
stated in RFC 2278, IANA Charset Registration Procedures. The IANA
list is useful because it provides official names as well as aliases.
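Java accepts many of these aliases too and maps them to a single canonical name; for charsets in the IANA registry, the `Charset` documentation says the canonical name must be the registry name. The sketch below resolves a few aliases I'm confident current JDKs recognize; the `CanonicalName` class and its `canonical` helper are illustrative only.

```java
import java.nio.charset.Charset;

// Resolves charset aliases (lookups are case-insensitive) to the
// canonical names Java reports.
final class CanonicalName {
    static String canonical(String alias) {
        return Charset.forName(alias).name();
    }

    public static void main(String[] args) {
        System.out.println(canonical("latin1"));  // ISO-8859-1
        System.out.println(canonical("UTF8"));    // UTF-8 (the old Java name)
        System.out.println(canonical("ascii"));   // US-ASCII
    }
}
```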
EBCDIC stands for Extended Binary Coded Decimal Interchange
Code. Chances are, you're lucky enough never to have to deal with it.
EBCDIC is a character set IBM uses on mainframes, so you'll probably
only come into contact with it when dealing with legacy business systems.
EBCDIC characters are 8 bits wide. They're table-based
because this corresponds roughly to positioning on punchcards. This also
explains why characters in EBCDIC aren't consecutive, i.e. 'i'
isn't followed by 'j'. Characters are basically sorted into a table:
the high four bits of the value (the first hex digit) specify the column,
while the low four bits (the second hex digit) specify the row. For example,
the character 't' has the hex value 0xA3, so you can find 't' in column A,
row 3 of the EBCDIC codepage 037 table.
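The column/row rule is easy to check by encoding a few characters and splitting each byte into its two nibbles. This sketch assumes the JVM provides `Cp037`; `EbcdicTable` and `code` are hypothetical names. It also shows the gap between 'i' (0x89) and 'j' (0x91).

```java
import java.nio.charset.Charset;

// Locates characters in the EBCDIC codepage 037 table: the high four
// bits of the code select the column, the low four bits the row.
final class EbcdicTable {
    static final Charset CP037 = Charset.forName("Cp037");

    static int code(char c) {
        return String.valueOf(c).getBytes(CP037)[0] & 0xFF;  // unsigned byte value
    }

    public static void main(String[] args) {
        for (char c : new char[] {'i', 'j', 't'}) {
            int b = code(c);
            System.out.printf("'%c' = 0x%02X -> column %X, row %X%n", c, b, b >> 4, b & 0x0F);
        }
        // 'i' = 0x89 -> column 8, row 9
        // 'j' = 0x91 -> column 9, row 1
        // 't' = 0xA3 -> column A, row 3
    }
}
```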
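In case you're wondering: line breaks in EBCDIC text are usually the
NL (new line) character 0x15, used where ASCII text has a line feed (\n),
and carriage return is 0x0D, which is the same as in
ASCII! The space character is 0x40.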
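A quick way to compare such codes is to encode single characters with both charsets. This sketch assumes `Cp037` is available; `ControlChars` is a hypothetical class name. It deliberately leaves out '\n': how a JVM maps it depends on the mapping table (it may come out as EBCDIC LF, 0x25, rather than NL, 0x15), so check your own runtime before relying on either.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Compares single-character codes in US-ASCII and EBCDIC codepage 037.
final class ControlChars {
    static int code(String s, Charset cs) {
        return s.getBytes(cs)[0] & 0xFF;  // first byte, as an unsigned value
    }

    public static void main(String[] args) {
        Charset cp037 = Charset.forName("Cp037");
        // Carriage return: 0x0D in both
        System.out.printf("CR    ASCII 0x%02X  EBCDIC 0x%02X%n",
                code("\r", StandardCharsets.US_ASCII), code("\r", cp037));
        // Space: 0x20 in ASCII, 0x40 in EBCDIC
        System.out.printf("space ASCII 0x%02X  EBCDIC 0x%02X%n",
                code(" ", StandardCharsets.US_ASCII), code(" ", cp037));
    }
}
```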
If you're looking at a hex dump of EBCDIC characters, it's fairly
easy to single out the digits: they're the bytes starting with 'F', so
0123456789 is F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 in hex. The same applies in
ASCII, by the way, only with '3': 30 31 32 33 34 35 36 37 38 39.
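To produce such dumps yourself, a few lines that print each encoded byte as two hex digits are enough. `HexDump` and `hex` are hypothetical names, and the EBCDIC line again assumes the JVM provides `Cp037`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Minimal hex dump: encodes the text and prints each byte as two hex digits.
final class HexDump {
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));  // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("0123456789", Charset.forName("Cp037")));
        // F0 F1 F2 F3 F4 F5 F6 F7 F8 F9
        System.out.println(hex("0123456789", StandardCharsets.US_ASCII));
        // 30 31 32 33 34 35 36 37 38 39
    }
}
```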