
Passing NULL-terminated strings to C libraries


Suppose you need to pass NULL-terminated strings from an extension module to a C library. The C function below (Code # 1) is used for illustration and testing. It simply prints the hexadecimal representation of each character, so that the strings being passed can be easily debugged.

Code # 1:

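#include <stdio.h>

/* Print each byte of a NUL-terminated string as hex, then a newline */
void print_chars(char *s) {
    while (*s) {
        printf("%2x ", (unsigned char) *s);
        s++;
    }
    printf("\n");
}

/* Quick standalone test */
int main(void) {
    print_chars("Hello");
    return 0;
}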

Output:

 48 65 6c 6c 6f 
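Because print_chars() stops at the first NUL byte, it can help to know in advance what it should print for a given input. The pure-Python model below is a sketch for cross-checking the C output; expected_print_chars is a hypothetical helper (not part of any library), and it leaves out the trailing newline that the C function prints.

```python
def expected_print_chars(data: bytes) -> str:
    """Return the text print_chars() would print for `data` (hypothetical helper).

    The C loop stops at the first NUL byte, so anything after it is never seen.
    The trailing newline printed by the C function is omitted.
    """
    visible = data.split(b"\x00", 1)[0]          # C sees only up to the first NUL
    return "".join(f"{b:2x} " for b in visible)   # mirrors printf("%2x ", ...)


print(repr(expected_print_chars(b"Hello")))           # '48 65 6c 6c 6f '
print(repr(expected_print_chars(b"Hello\x00World")))  # '48 65 6c 6c 6f '
```

The second call shows the core problem of this recipe: from C's point of view, b'Hello\x00World' and b'Hello' are indistinguishable.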

There are a few options for calling such a C function from Python. First, you can restrict it to operate on bytes only, using the "y" conversion code with PyArg_ParseTuple(), as shown below.

Code # 2:

static PyObject *py_print_chars(PyObject *self, PyObject *args) {
    char *s;

    /* "y": accept bytes only; the buffer must not contain NUL bytes */
    if (!PyArg_ParseTuple(args, "y", &s)) {
        return NULL;
    }
    print_chars(s);
    Py_RETURN_NONE;
}

Here is how the resulting function behaves, and how bytes with embedded NULL bytes and Unicode strings are rejected.

Code # 3:

# Each call is run separately, e.g. in the interactive interpreter
print_chars(b'Hello World')
print_chars(b'Hello\x00World')   # embedded NULL byte
print_chars('Hello World')       # Unicode string, not bytes

Output:

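48 65 6c 6c 6f 20 57 6f 72 6c 64
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be bytes without null bytes, not bytes
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface

These tracebacks come from an early Python 3 release. Since Python 3.5, an embedded NULL byte raises ValueError ("embedded null byte") instead of TypeError, and the exact wording of the error messages varies between versions.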
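With a bytes-only function, callers have to do any text encoding themselves. One option, sketched below under the assumption that the C library expects UTF-8, is a small Python wrapper that encodes str arguments and rejects embedded NULL bytes before the call. The name to_c_bytes is hypothetical.

```python
def to_c_bytes(value):
    """Hypothetical helper: prepare an argument for a "y"-based extension function.

    str is encoded to UTF-8 (an assumption about what the C side expects);
    embedded NUL bytes are rejected up front, as "y" itself would do.
    """
    if isinstance(value, str):
        value = value.encode("utf-8")
    elif not isinstance(value, bytes):
        raise TypeError(f"expected str or bytes, not {type(value).__name__}")
    if b"\x00" in value:
        raise ValueError("embedded null byte")
    return value


print(to_c_bytes("Spicy Jalape\u00f1o"))   # b'Spicy Jalape\xc3\xb1o'
```

With the extension from Code # 2 built, a call would then look like print_chars(to_c_bytes('Hello World')).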

If you want to pass Unicode strings instead, use the "s" format code with PyArg_ParseTuple(), like this:

Code # 4:

static PyObject *py_print_chars(PyObject *self, PyObject *args) {
    char *s;

    /* "s": accept str; it is passed to C as NUL-terminated UTF-8 */
    if (!PyArg_ParseTuple(args, "s", &s)) {
        return NULL;
    }
    print_chars(s);
    Py_RETURN_NONE;
}

When used this way (Code # 4), all strings are automatically converted to a NULL-terminated UTF-8 encoding. For example:

Code # 5:

# Each call is run separately, e.g. in the interactive interpreter
print_chars('Hello World')
print_chars('Spicy Jalape\u00f1o')   # UTF-8 encoding
print_chars('Hello\x00World')        # embedded NULL character
print_chars(b'Hello World')          # bytes, not str

Output:

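48 65 6c 6c 6f 20 57 6f 72 6c 64
53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str without null characters, not str
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str, not bytes

These tracebacks come from an early Python 3 release. Since Python 3.5, an embedded NULL character raises ValueError ("embedded null character") instead of TypeError.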
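The c3 b1 pair in the output is the UTF-8 encoding of the single character ñ (U+00F1). Encoding the same string from Python reproduces the bytes that the "s" conversion hands to C (minus the terminating NUL), which is a quick way to check the hex dump:

```python
text = "Spicy Jalape\u00f1o"
utf8 = text.encode("utf-8")   # the bytes C receives, apart from the trailing NUL

# One code point, U+00F1, becomes two bytes in UTF-8
print(len(text), len(utf8))                # 14 15
print(" ".join(f"{b:2x}" for b in utf8))
# 53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
```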

If you are working with a PyObject * directly and can't use PyArg_ParseTuple(), the following code samples show how to check for and extract a suitable char * reference from both a bytes object and a string object.

Code # 6: Conversion from bytes

// Some Python object
PyObject *obj;

// Conversion from bytes
{
    char *s;

    s = PyBytes_AsString(obj);
    if (!s) {
        return NULL;   /* TypeError has already been raised */
    }
    print_chars(s);
}

Code # 7: Conversion to UTF-8 bytes from a string

{
    PyObject *bytes;
    char *s;

    if (!PyUnicode_Check(obj)) {
        PyErr_SetString(PyExc_TypeError, "Expected string");
        return NULL;
    }

    bytes = PyUnicode_AsUTF8String(obj);
    if (!bytes) {
        return NULL;   /* encoding failed; exception already set */
    }
    s = PyBytes_AsString(bytes);
    print_chars(s);
    Py_DECREF(bytes);
}

Both conversions guarantee NULL-terminated data, but neither checks for NULL bytes embedded elsewhere in the string. If that matters, you need to check for it yourself.
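To see why the missing check matters, the sketch below uses ctypes (a different mechanism from the C API calls above, used here only for illustration) to view a bytes object through a C char *, the way print_chars() reads it. Everything after the first NUL silently disappears. The final comparison mirrors the check an extension could make in C by comparing strlen(s) with PyBytes_Size(obj).

```python
import ctypes

data = b"Hello\x00World"
as_c_string = ctypes.c_char_p(data)   # view the bytes as a C char *

print(len(data))            # 11: the Python object holds every byte
print(as_c_string.value)    # b'Hello': reading stops at the first NUL

# The same test an extension can make: strlen() versus the real size
has_embedded_null = len(as_c_string.value) != len(data)
print(has_embedded_null)    # True
```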

Note: There is a hidden memory overhead associated with the "s" format code of PyArg_ParseTuple() that is easy to overlook. When you use this conversion, a UTF-8 string is created and permanently attached to the original string object. If the original string contains non-ASCII characters, this increases the size of the string object until it is garbage collected.

Code # 8:

import sys

s = 'Spicy Jalape\u00f1o'
print("Size:", sys.getsizeof(s))

# Pass the string to the "s"-based print_chars()
print_chars(s)

# Notice the increased size
print("Size:", sys.getsizeof(s))

Output:

Size: 87
53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
Size: 103

The exact sizes depend on the Python version and platform.
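The jump from 87 to 103 bytes matches the size of a UTF-8 copy of the string plus its NUL terminator. The arithmetic below checks that, assuming CPython's sys.getsizeof() includes the cached UTF-8 buffer once it exists. The figure 87 is taken from the output above and will differ across Python versions and platforms.

```python
s = "Spicy Jalape\u00f1o"

# Assumption: once the UTF-8 copy exists, CPython counts it, plus its
# NUL terminator, in sys.getsizeof().
cached_utf8 = len(s.encode("utf-8")) + 1
print(cached_utf8)        # 16
print(87 + cached_utf8)   # 103, the second size reported above
```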
