Handling of UnicodeΒΆ

This package presents all text strings as Python unicode objects. From Excel 97 onwards, text in Excel spreadsheets has been stored as Unicode. Older files (Excel 95 and earlier) don’t keep strings in Unicode; a CODEPAGE record provides a codepage number (for example, 1252) which is used by xlrd to derive the encoding (for same example: “cp1252”) which is used to translate to Unicode.

If the CODEPAGE record is missing (possible if the file was created by third-party software), xlrd will assume that the encoding is ascii, and keep going. If the actual encoding is not ascii, a UnicodeDecodeError exception will be raised and you will need to determine the encoding yourself, and tell xlrd:

book = xlrd.open_workbook(..., encoding_override="cp1252")

If the CODEPAGE record exists but is wrong (for example, the codepage number is 1251, but the strings are actually encoded in koi8_r), it can be overridden using the same mechanism.

The supplied runxlrd.py has a corresponding command-line argument, which may be used for experimentation:

runxlrd.py -e koi8_r 3rows myfile.xls

The first place to look for an encoding, the “codec name”, is the Python documentation.