Passing null-terminated strings to C libraries

The code below defines a C function that we will use for illustration and testing. The C function (Code #1) simply prints the hexadecimal representation of each character so that the passed strings can be easily debugged.

Code #1:

#include <stdio.h>

/* Print each character of a null-terminated string as hexadecimal. */
void print_chars(char *s)
{
    while (*s)
    {
        printf("%2x ", (unsigned char) *s);
        s++;
    }
    printf("\n");
}

/* Example call */
print_chars("Hello");

Output:

 48 65 6c 6c 6f 
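As a rough cross-check, the behaviour of print_chars can be modelled in pure Python. The sketch below is not part of the extension: hex_chars is a hypothetical helper name, and it assumes the C format string is "%2x ", so that each byte is printed as hexadecimal in a field at least two characters wide.

```python
def hex_chars(data: bytes) -> str:
    """Model of the C loop: format bytes as hex until the first null byte."""
    parts = []
    for byte in data:
        if byte == 0:  # `while (*s)` stops at the terminating null
            break
        parts.append(format(byte, "2x"))
    return " ".join(parts)


print(hex_chars(b"Hello"))
```

Because the loop stops at the first zero byte, anything after an embedded null would never be printed by the C function.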

There are several ways to call such a C function from Python. First, the function can be restricted to work with bytes only, using the conversion code "y" to PyArg_ParseTuple(), as shown in the code below.

Code #2:

static PyObject *py_print_chars(PyObject *self, PyObject *args)
{
    char *s;

    /* "y" accepts bytes only and rejects embedded null bytes */
    if (!PyArg_ParseTuple(args, "y", &s))
    {
        return NULL;
    }
    print_chars(s);
    Py_RETURN_NONE;
}

Let's see how the resulting function works, and how bytes with embedded null bytes and Unicode strings are rejected.

Code #3:

print_chars(b'Hello World')

# Embedded null byte: raises TypeError
print_chars(b'Hello\x00World')

# Unicode string: raises TypeError
print_chars('Hello World')

Output:

48 65 6c 6c 6f 20 57 6f 72 6c 64
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be bytes without null bytes, not bytes
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface
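The reason embedded null bytes are rejected can be seen with the standard ctypes module, used here only for illustration: code that receives a char * treats the first null byte as the end of the string. A minimal sketch:

```python
import ctypes

data = b"Hello\x00World"

# create_string_buffer copies the bytes and appends a terminating null
buf = ctypes.create_string_buffer(data)

# .raw is the whole buffer; .value reads it as a C string,
# that is, only up to the first null byte
print(len(buf.raw))
print(buf.value)
```

The buffer holds all twelve bytes (eleven data bytes plus the terminator), but the C-string view ends after "Hello". The exact exception type and message raised for such input can differ between Python versions.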

If you want to pass Unicode strings instead, use format code "s" for PyArg_ParseTuple() as below.

Code #4:

static PyObject *py_print_chars(PyObject *self, PyObject *args)
{
    char *s;

    /* "s" accepts str and converts it to null-terminated UTF-8 */
    if (!PyArg_ParseTuple(args, "s", &s))
    {
        return NULL;
    }
    print_chars(s);
    Py_RETURN_NONE;
}

The code above (Code #4) automatically converts every string to a null-terminated UTF-8 encoding, as shown in the code below.

Code #5:

print_chars('Hello World')

# UTF-8 encoding
print_chars('Spicy Jalape\u00f1o')

# Embedded null character: raises TypeError
print_chars('Hello\x00World')

# Bytes: raises TypeError
print_chars(b'Hello World')

Output:

48 65 6c 6c 6f 20 57 6f 72 6c 64
53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str without null characters, not str
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str, not bytes
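The bytes that the "s" conversion hands to C can be previewed from Python by encoding the string explicitly. This sketch uses only str.encode and is independent of the extension module; the variable names are illustrative.

```python
text = "Spicy Jalape\u00f1o"
encoded = text.encode("utf-8")

# The non-ASCII character U+00F1 becomes the two bytes c3 b1
hex_line = " ".join(format(b, "02x") for b in encoded)
print(hex_line)

# Character count versus UTF-8 byte count
print(len(text), len(encoded))
```

The string has 14 characters but encodes to 15 bytes, which is why the C function prints one more value than there are characters.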

If you are working with a PyObject * and cannot use PyArg_ParseTuple(), the code below shows how to check for and extract a suitable char * reference from both a bytes object and a string object.

Code #6: Converting from bytes

/* Some Python object (obtained elsewhere) */
PyObject *obj;

/* Convert from bytes */
{
    char *s;

    s = PyBytes_AsString(obj);
    if (!s)
    {
        /* TypeError has already been raised */
        return NULL;
    }
    print_chars(s);
}

Code #7: Converting a string to UTF-8 bytes

{
    PyObject *bytes;
    char *s;

    if (!PyUnicode_Check(obj))
    {
        PyErr_SetString(PyExc_TypeError, "Expected string");
        return NULL;
    }
    bytes = PyUnicode_AsUTF8String(obj);
    if (!bytes)
    {
        /* Encoding failed; an exception has already been set */
        return NULL;
    }
    s = PyBytes_AsString(bytes);
    print_chars(s);
    Py_DECREF(bytes);
}

Both conversions guarantee null-terminated data, but neither checks for embedded null bytes elsewhere in the string. If that matters for your application, you must check for them yourself.
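One way to make that check explicit is sketched below in Python, mirroring the two conversions: bytes are used as they are, strings are encoded to UTF-8, and any null byte in the result is treated as an error. The helper name to_c_bytes and the choice of ValueError are assumptions made for illustration, not part of the original code.

```python
def to_c_bytes(obj) -> bytes:
    """Return bytes that are safe to pass as a null-terminated C string."""
    if isinstance(obj, str):
        data = obj.encode("utf-8")
    elif isinstance(obj, bytes):
        data = obj
    else:
        raise TypeError("Expected bytes or str")
    # A null byte anywhere in the data would truncate the string in C
    if b"\x00" in data:
        raise ValueError("embedded null byte")
    return data


print(to_c_bytes("Hello World"))
try:
    to_c_bytes("Hello\x00World")
except ValueError as exc:
    print("rejected:", exc)
```

In C, the same idea amounts to comparing the object's full length with the length up to the first null byte.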

Note: Using the "s" format code with PyArg_ParseTuple() carries a hidden memory overhead that is easy to miss. This conversion creates a UTF-8 copy of the string and attaches it permanently to the original string object. If the string contains non-ASCII characters, the object therefore grows in size and stays that size until it is garbage collected.

Code #8:

import sys

s = 'Spicy Jalape\u00f1o'
print("Size:", sys.getsizeof(s))

# Pass the string to the extension function
print_chars(s)

# The size has increased
print("Size:", sys.getsizeof(s))

Output:

Size: 87
53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
Size: 103
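The 16-byte difference in this output is consistent with a cached UTF-8 copy of the string plus its terminating null byte. The arithmetic can be checked as follows; the sizes 87 and 103 are taken from the output above, and the absolute values reported by sys.getsizeof differ between Python versions.

```python
s = "Spicy Jalape\u00f1o"

# Size of the UTF-8 copy: encoded bytes plus one terminating null byte
utf8_copy = len(s.encode("utf-8")) + 1

# Growth reported in the output above
reported_growth = 103 - 87

print(utf8_copy, reported_growth)
```

Both values come out as 16, which supports reading the growth as the cost of the attached UTF-8 data.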
