In this post I want to share some thoughts about embedding Python into C/C++ applications. This is not yet another Python tutorial, just my personal impressions of some of the mechanisms I encountered while working on dirtyJOE. I’ll describe three completely different things:
- Use of the FILE* structure by the Python runtime
- Small differences between different Python versions
- Reference counting
These three topics are just a small part of the whole Python embedding subject, but they caught my attention enough to write about them. So let’s start.
FILE* problem (or not)
The easiest way to run a Python script inside your application is to call one of the high-level functions whose names generally match this pattern: PyRun_*File*(FILE* fp, …). These functions are very convenient, because they can read and execute a whole script directly from a file. There is (at least) one gotcha, though. The Python documentation says:
Note also that several of these functions take FILE* parameters. One particular issue which needs to be handled carefully is that the FILE structure for different C libraries can be different and incompatible. Under Windows (at least), it is possible for dynamically linked extensions to actually use different libraries, so care should be taken that FILE* parameters are only passed to these functions if it is certain that they were created by the same library that the Python runtime is using.
From my point of view this sounds a bit ridiculous: expecting that someone will use the exact same version of the C runtime just to open a file… I checked a few Python versions (compiled win32 binaries from python.org) and the situation looks like this:
- python25.dll uses msvcr71.dll
- python26.dll uses msvcr90.dll
- python27.dll uses msvcr90.dll
So you need to use fopen() from that specific msvcr library (thank god Python is not using the static version of the C runtime). Of course, you can always compile your own version of the Python DLL (one that uses the msvcr of your choice) and ship it with your product.
When I was integrating Python into dirtyJOE, I decided to use the Py_CompileString() function. It works on a memory buffer holding the script instead of a FILE*, and it offers many more possibilities than just running simple scripts. It also needs some additional code, especially if you want to run only a specific function from the script, but that is out of the scope of this post.
Small differences between versions
I’ll describe only one peculiarity, because I encountered only one that was serious enough to investigate. dirtyJOE can use three different Python versions (2.5, 2.6, 2.7), and my C++ code was identical for all three until I discovered a small bug. I was testing against v2.7 and everything worked fine, but when I switched to v2.6 or v2.5 I started receiving errors from the Py_CompileString() function. I downloaded the Python sources for all three versions and started diffing (2.5 vs 2.7), beginning at Py_CompileString():
- Py_CompileString() – no differences
- Py_CompileStringFlags() – no differences
- PyParser_ASTFromString() – small differences
- PyParser_ParseStringFlagsFilename(Ex)() – small differences
- PyTokenizer_FromString() – small differences
- decode_str() – small difference… but…
…but not that small: v2.7 includes a call to a mysterious function called translate_newlines(), which normalizes line endings so that ‘\r’ characters never reach the parser. The older versions have no such step, so a script with Windows line endings fails to compile there. Adding code that strips all ‘\r’ from the input script was a rather easy task, and now everything works fine (and the code is still the same for every Python version!).
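The stripping step can be sketched in a few lines of C++. This is an illustration, not the actual dirtyJOE code, and the function name strip_carriage_returns is hypothetical.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns a copy of the script with every '\r' removed,
// so "\r\n" line endings become plain "\n".
static std::string strip_carriage_returns(std::string script)
{
    // std::remove shifts the kept characters to the front and returns the
    // new logical end; erase() then drops the leftover tail.
    script.erase(std::remove(script.begin(), script.end(), '\r'), script.end());
    return script;
}
```

One caveat: a script that uses a lone ‘\r’ as its line terminator would end up with its lines joined together, so this sketch assumes the input uses either ‘\n’ or ‘\r\n’ line endings.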
Reference counting
All C/C++ code that works with the Python API needs to take care of references to PyObjects. PyObject* is the basic type for all Python objects that you’ll be working with in your C/C++ code. Keeping track of all references can be a tedious and error-prone task. The Python SDK gives us two basic macros for incrementing and decrementing the reference count:
- Py_INCREF(o)
- Py_DECREF(o)
There are a few more macros, described here: http://www.python.org/doc//current/c-api/refcounting.html
Another thing worth mentioning is reference stealing:
(…)when a calling function passes in a reference to an object, there are two possibilities: the function steals a reference to the object, or it does not. Stealing a reference means that when you pass a reference to a function, that function assumes that it now owns that reference, and you are not responsible for it any longer.
Most functions do not steal references, so when a function does steal one, it is explicitly stated in the documentation (“This function “steals” a reference (…)”).
PyObjects returned from a function can also behave in two ways: the return value can be a new reference or a borrowed reference. A borrowed reference means that you are not the owner, and you don’t need to call Py_DECREF on the object unless you have called Py_INCREF on it somewhere. If a function returns a PyObject*, the documentation states whether it is a new or a borrowed reference:
- “Return value: Borrowed reference.”
- “Return value: New reference.”
All mentioned mechanisms (and more) are described in the original python documentation that can be found here: http://www.python.org/doc//current/c-api/intro.html#objects-types-and-reference-counts