Converting C strings to Python



A bytes object can be built with Py_BuildValue() as follows:

// Pointer to C string data
char *s;

// Data length (Py_ssize_t; define PY_SSIZE_T_CLEAN before including Python.h)
Py_ssize_t len;

// Make a bytes object
PyObject *obj = Py_BuildValue("y#", s, len);

To create a Unicode string when s is known to point to UTF-8 encoded data, use the "s#" format:

PyObject *obj = Py_BuildValue("s#", s, len);

If s is encoded in some other known encoding, create the string with PyUnicode_Decode():

PyObject *obj = PyUnicode_Decode(s, len, "encoding", "errors");

// Examples
obj = PyUnicode_Decode(s, len, "latin-1", "strict");
obj = PyUnicode_Decode(s, len, "ascii", "ignore");

If the string is a wide string represented as a (wchar_t *, len) pair, there are several options:

// Wide character string
wchar_t *w;

// Length, counted in wchar_t units (not bytes)
Py_ssize_t len;

// Option 1 - use Py_BuildValue()
PyObject *obj = Py_BuildValue("u#", w, len);

// Option 2 (alternative to option 1) - use PyUnicode_FromWideChar()
PyObject *obj = PyUnicode_FromWideChar(w, len);

  • Data from C must be explicitly decoded into a string according to some codec.
  • Common encodings include ASCII, Latin-1, and UTF-8.
  • If the encoding is unknown, it is better to pass the data to Python as bytes instead.
  • Python always copies the string data you provide when creating an object, so the C buffer can be freed or reused afterward.
  • For robustness, strings should be constructed from both a pointer and a size rather than relying on NUL-terminated data.