Intermittent Pickle Problems in Jython

The previous post mentioned "integrating" Jython and CPython by transmitting a stream of pickles between the two. I encountered one intermittent problem with this approach, and I'm unsure of its cause. (Hm, and I should probably post this to a Jython mailing list...)


In Jython I'd pickled the str() of an object into which I'd just written the SD representation of a CDK molecule. Jython created the pickle without complaint. But when I tried to unpickle it in CPython, I got a traceback for some molecules:

File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/", line 970,  in load_string
    raise ValueError, "insecure string pickle"
ValueError: insecure string pickle

The error occurred consistently in my application code, always on the same input structure. But I couldn't derive a simple test script to demonstrate the problem.


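Examination of the problematic pickle data showed that a Python unicode string literal prefix (u) had somehow been inserted, while the type code for the item was S (for string) rather than V (for unicode). CPython's unpickler expects the argument of S to be a quoted string, so the stray u ahead of the opening quote triggers the "insecure string pickle" error: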

Su'ZINC00000181\n  CDK...
 ^ What the... ?
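
For anyone who wants to see the failure in isolation, here is a minimal sketch that feeds CPython two hand-written protocol 0 pickles: one well-formed, and one with the stray u prefix after the S type code. The payloads and the try_unpickle helper are my own illustrations, not data from the original application. Newer CPython versions report the problem as pickle.UnpicklingError rather than ValueError, so the sketch catches both.

```python
import pickle

def try_unpickle(payload):
    """Hypothetical helper: return ("ok", value) or ("error", exception name)."""
    try:
        return ("ok", pickle.loads(payload))
    except (ValueError, pickle.UnpicklingError) as exc:
        return ("error", type(exc).__name__)

# Hand-written protocol 0 pickles (illustrative, not taken from the real data).
good = b"S'ZINC00000181'\n."   # S followed by a properly quoted string
bad = b"Su'ZINC00000181'\n."   # S followed by a unicode literal: not quoted

print(try_unpickle(good))
print(try_unpickle(bad))
```

The second payload is rejected because the text after S does not begin with a quote character.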


Google turned up a usable workaround: encode the offending string as utf-8 before trying to pickle it.

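import codecs

# getencoder() returns a function that yields an (encoded, length) tuple
enc = codecs.getencoder('utf8')
# keep only the encoded string, not the whole tuple
sdf = enc(sdf)[0]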
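
A rough sketch of how the workaround might look end to end, assuming the sending side encodes to UTF-8 before pickling and the receiving side decodes after unpickling. The names dump_text and load_text are hypothetical, and I haven't verified this against the Jython version that produced the bad pickles.

```python
import pickle

def dump_text(text):
    """Hypothetical sender side: encode to UTF-8 bytes, then pickle."""
    return pickle.dumps(text.encode("utf-8"), protocol=0)

def load_text(payload):
    """Hypothetical receiver side: unpickle, then decode back to text."""
    return pickle.loads(payload).decode("utf-8")

# Illustrative stand-in for the SD text of a molecule.
sdf = u"ZINC00000181\n  CDK"
restored = load_text(dump_text(sdf))
print(restored == sdf)
```

Pickling an explicitly encoded byte string means the pickler never has to choose how to represent a unicode object, which is where the bad pickles seemed to go wrong.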