December 20, 2014

quote in urllib

quote accepts two arguments: a string and a set of safe characters. If you pass an empty string as the second argument, every character except the always-safe ones (ASCII letters, digits, and _.-) is replaced by a %-escape; for example, a space becomes %20.
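To make the difference concrete, here is a small sketch comparing the default safe value with an empty one. The try/except import is only there so the snippet runs on both Python 2 and Python 3; the variable names are just for illustration.

```python
try:
    from urllib import quote        # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3

# By default '/' is treated as safe and left alone.
default_safe = quote(b'a b/c')       # 'a%20b/c'

# With an empty safe value, '/' is escaped as well.
nothing_safe = quote(b'a b/c', b'')  # 'a%20b%2Fc'

print(default_safe)
print(nothing_safe)
```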

In my project, I call quote as quote(params, ''), and I have imported unicode_literals, so the safe argument I pass is a unicode string rather than a byte string.

What will happen?

(My project is a web project built on the Tornado framework; the Python process runs as a backend service.)

The next time quote is called with the same safe value, you may get an error like:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128) 

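The error is raised by rstrip. quote calls rstrip on the input string (and urlencode reaches it through quote), but the argument passed to rstrip is a unicode string. When the input is a byte string, Python 2 tries to decode it as ASCII first, and that fails on any non-ASCII byte.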


urllib caches the safe parameter (line 1277). It expects a byte string, but you passed a unicode one, and the library neither checks nor converts it before using it.
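A minimal reproduction, assuming Python 2.7 and a fresh interpreter in which nothing has yet called quote with an empty safe value. The function name reproduce is only for illustration, and the version check makes it a no-op on Python 3, where str and bytes are never mixed implicitly.

```python
from __future__ import unicode_literals
import sys


def reproduce():
    """Return the name of the error raised, or a short status string."""
    if sys.version_info[0] != 2:
        # The cache behaviour described here is specific to Python 2's urllib.
        return 'not applicable'

    import urllib

    # First call: safe is u'' because of unicode_literals.
    # urllib stores a unicode "safe" string in its internal cache.
    urllib.quote(b'warm up', '')

    try:
        # Second call: safe is a byte string, but b'' == u'' in Python 2,
        # so the cached unicode entry is reused. rstrip then tries to
        # decode the non-ASCII input as ASCII.
        urllib.quote(b'\xe4\xb8\xad', b'')
    except UnicodeDecodeError:
        return 'UnicodeDecodeError'
    return 'no error'


print(reproduce())
```

Note that the second call fails even though it passes a byte string: once the cache holds the unicode entry, every later caller in the same process is affected.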

Best practice: if you use unicode_literals as I do, pass a byte string such as b'' as the safe argument to quote.
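One way to enforce this in a single place is a thin wrapper. This is only a sketch: safe_quote is a hypothetical helper name, not part of urllib, and encoding text as UTF-8 is an assumption about what your service expects. The import fallback lets it run on Python 3 as well.

```python
from __future__ import unicode_literals

try:
    from urllib import quote        # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3


def safe_quote(value, safe=b''):
    """Percent-encode value, always handing byte strings to quote."""
    if not isinstance(safe, bytes):
        # Safe characters are ASCII by definition.
        safe = safe.encode('ascii')
    if not isinstance(value, bytes):
        # Assumption: text should be sent as UTF-8.
        value = value.encode('utf-8')
    return quote(value, safe)


print(safe_quote('a b'))      # a%20b
print(safe_quote('\u4e2d'))   # %E4%B8%AD
```

Because the wrapper converts safe before calling quote, a stray unicode literal can no longer end up in urllib's cache through this code path.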
