Unicode strings passed to C libraries

To illustrate the solution below, consider two C functions that operate on string data and print it for debugging and experimentation.

Code # 1: Operates on bytes, passed as a char * and an int length

    #include <stdio.h>

    /* Print each byte of s as a two-digit hex value */
    void print_chars(char *s, int len)
    {
        int n = 0;
        while (n < len)
        {
            printf("%2x ", (unsigned char) s[n]);
            n++;
        }
        printf("\n");
    }

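Code # 2: Operates on wide characters, passed as a wchar_t * and an int length

    #include <stdio.h>
    #include <wchar.h>

    /* Print each wide character of s as a hex code point value */
    void print_wchars(wchar_t *s, int len)
    {
        int n = 0;
        while (n < len)
        {
            printf("%x ", s[n]);
            n++;
        }
        printf("\n");
    }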

Python strings must be converted to a suitable byte encoding, such as UTF-8, before they can be passed to the byte-oriented function print_chars(). The code below is a simple extension function that does this.

Code # 3:

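    /* Assumes PY_SSIZE_T_CLEAN is defined before #include <Python.h>,
       so that "#" lengths are Py_ssize_t. */
    static PyObject *py_print_chars(PyObject *self, PyObject *args)
    {
        char *s;
        Py_ssize_t len;

        /* "s#" converts a str argument to UTF-8 and yields pointer + byte length */
        if (!PyArg_ParseTuple(args, "s#", &s, &len))
        {
            return NULL;
        }
        print_chars(s, (int) len);
        Py_RETURN_NONE;
    }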
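One detail worth keeping in mind is that the length delivered by s# counts bytes of the UTF-8 encoding, not characters. The following is a minimal pure-Python sketch of that distinction; str.encode() is used here only as a stand-in for the conversion that s# performs, and the variable names are illustrative.

```python
s = "Spicy Jalape\u00f1o"

# Stand-in for the UTF-8 conversion done by the "s#" format code.
data = s.encode("utf-8")

char_count = len(s)      # number of code points in the Python string
byte_count = len(data)   # number of bytes the C function would receive

print(char_count, byte_count)  # prints "14 15"
```

The extra byte comes from the single non-ASCII character, which UTF-8 encodes as two bytes.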

For library functions that work with the machine-native wchar_t type, the C extension code can be written as follows:

Code # 4:

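    /* Note: the "u#" format relies on the legacy Py_UNICODE representation;
       it was deprecated in Python 3.3 and removed in Python 3.12. */
    static PyObject *py_print_wchars(PyObject *self, PyObject *args)
    {
        wchar_t *s;
        Py_ssize_t len;

        /* "u#" yields a wchar_t pointer + length in characters */
        if (!PyArg_ParseTuple(args, "u#", &s, &len))
        {
            return NULL;
        }
        print_wchars(s, (int) len);
        Py_RETURN_NONE;
    }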

The code below checks how the extension functions work.

Observe how the byte-oriented function print_chars() receives the data encoded as UTF-8, while print_wchars() receives the Unicode code point values.

Code # 5:

    s = 'Spicy Jalape\u00f1o'

    # Each function prints its own output and returns None
    print_chars(s)
    print_wchars(s)

Output:

    53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
    53 70 69 63 79 20 4a 61 6c 61 70 65 f1 6f
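The output above can be reproduced without the extension module, which makes the difference between the two views easier to see. This is a minimal pure-Python sketch; the helper names utf8_hex and code_point_hex are hypothetical and are not part of the extension.

```python
def utf8_hex(s):
    """Hex dump of the UTF-8 bytes, mirroring what print_chars() is given."""
    return " ".join(f"{b:2x}" for b in s.encode("utf-8"))


def code_point_hex(s):
    """Hex dump of the code points, mirroring what print_wchars() is given."""
    return " ".join(f"{ord(ch):x}" for ch in s)


s = "Spicy Jalape\u00f1o"
print(utf8_hex(s))        # ... 65 c3 b1 6f
print(code_point_hex(s))  # ... 65 f1 6f
```

The only difference is the character U+00F1: it appears as the two bytes c3 b1 in UTF-8 and as the single value f1 when viewed as a code point.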

Before relying on these conversions, consider the nature of the C library being accessed. For many C libraries, it makes more sense to pass bytes instead of a string, so that the caller controls the encoding. The conversion code below does this.

Code # 6:

    static PyObject *py_print_chars(PyObject *self, PyObject *args)
    {
        char *s;
        Py_ssize_t len;

        /* "y#" accepts bytes and other read-only bytes-like objects, but not str */
        if (!PyArg_ParseTuple(args, "y#", &s, &len))
        {
            return NULL;
        }
        print_chars(s, (int) len);
        Py_RETURN_NONE;
    }
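With a bytes-only interface, the choice of encoding moves to the Python caller. The sketch below illustrates the calling side under stated assumptions: print_chars_bytes_only is a hypothetical pure-Python stand-in for the bytes-only extension function, written only to show that str is rejected and that different encodings produce different byte sequences.

```python
def print_chars_bytes_only(data):
    """Hypothetical stand-in for the bytes-only py_print_chars().

    Like the "y#" format code, it rejects str; it returns the hex dump
    instead of printing it so the result is easy to inspect.
    """
    if isinstance(data, str):
        raise TypeError("a bytes-like object is required, not 'str'")
    return " ".join(f"{b:2x}" for b in bytes(data))


s = "Spicy Jalape\u00f1o"

# The caller encodes explicitly before handing the data over.
as_utf8 = print_chars_bytes_only(s.encode("utf-8"))
as_latin1 = print_chars_bytes_only(s.encode("latin-1"))

print(as_utf8)    # ends with: c3 b1 6f
print(as_latin1)  # ends with: f1 6f
```

Passing s directly, without encoding it first, raises TypeError in this sketch, which is the behavior a bytes-only format code is meant to enforce.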

If you still want to pass strings, be aware that Python 3 uses an adaptable string representation that does not map directly onto the standard C types char * or wchar_t *. Representing string data in C therefore almost always requires some kind of conversion. The format codes s# and u# for PyArg_ParseTuple() perform such conversions safely.
Whenever a conversion is performed, a copy of the converted data is attached to the original string object so that it can be reused later. As a result, the size of the string object grows, as shown in the code below.

Code # 7:

    import sys

    s = 'Spicy Jalape\u00f1o'
    print("Size:", sys.getsizeof(s))

    print_chars(s)    # attaches a UTF-8 copy to s
    print("Size:", sys.getsizeof(s))

    print_wchars(s)   # attaches a wchar_t copy to s
    print("Size:", sys.getsizeof(s))

Output:

    Size: 87
    53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
    Size: 103
    53 70 69 63 79 20 4a 61 6c 61 70 65 f1 6f
    Size: 163
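The growth caused by the UTF-8 conversion can also be observed without building the extension. The sketch below assumes CPython and uses ctypes.pythonapi to call PyUnicode_AsUTF8AndSize, the C API function that produces the cached UTF-8 copy; the helper name utf8_cache_growth is hypothetical, and the exact sizes depend on the Python version and platform, so none are shown.

```python
import ctypes
import sys

# CPython-only: bind the C API function that creates the cached UTF-8 copy.
_as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
_as_utf8.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
_as_utf8.restype = ctypes.c_char_p


def utf8_cache_growth(s):
    """Return (size_before, size_after, utf8_bytes) for the string s."""
    before = sys.getsizeof(s)
    length = ctypes.c_ssize_t()
    data = _as_utf8(s, ctypes.byref(length))  # triggers the conversion
    after = sys.getsizeof(s)
    return before, after, data


# Build the string at run time so that a fresh object is measured.
s = "".join(["Spicy Jalape", "\u00f1", "o"])
before, after, data = utf8_cache_growth(s)
print("Size before:", before)
print("Size after: ", after)
```

On CPython builds where the cache is counted by sys.getsizeof(), the second number is larger than the first for a non-ASCII string such as this one.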
