
Unicode strings passed to C libraries


To illustrate the solution, the two C functions below operate on string data and print it as hexadecimal values, which is handy for debugging and experimentation.

Code #1: Uses bytes, represented as char * and int

#include <stdio.h>

void print_chars(char *s, int len)
{
    int n = 0;
    while (n < len)
    {
        /* print each byte as a two-digit hex value */
        printf("%2x ", (unsigned char) s[n]);
        n++;
    }
    printf("\n");
}

Code #2: Uses wide characters, represented as wchar_t * and int

#include <stdio.h>
#include <wchar.h>

void print_wchars(wchar_t *s, int len)
{
    int n = 0;
    while (n < len)
    {
        /* print each wchar_t value in hex */
        printf("%x ", (unsigned int) s[n]);
        n++;
    }
    printf("\n");
}
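Before building any extension, it can help to preview what each C function should print. The following pure-Python sketch mimics both functions; the helper names `hex_bytes` and `hex_code_points` are hypothetical, and the second one assumes a platform where `wchar_t` is 32 bits, so each element is one full code point.

```python
def hex_bytes(s: str, encoding: str = "utf-8") -> str:
    """Mimic print_chars(): hex value of every byte after encoding s."""
    return " ".join(f"{b:2x}" for b in s.encode(encoding))


def hex_code_points(s: str) -> str:
    """Mimic print_wchars() with a 32-bit wchar_t: one hex value per code point."""
    return " ".join(f"{ord(ch):x}" for ch in s)


s = "Spicy Jalape\u00f1o"
print(hex_bytes(s))        # 53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
print(hex_code_points(s))  # 53 70 69 63 79 20 4a 61 6c 61 70 65 f1 6f
```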

For the byte-oriented print_chars() function, Python strings must be converted to a suitable byte encoding such as UTF-8. Code #3 is a simple extension function that does this.

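Code #3:

#define PY_SSIZE_T_CLEAN  /* make the # format codes use Py_ssize_t lengths */
#include <Python.h>

static PyObject *py_print_chars(PyObject *self, PyObject *args)
{
    char *s;
    Py_ssize_t len;

    /* "s#" encodes a str argument as UTF-8 and returns a pointer and a byte length */
    if (!PyArg_ParseTuple(args, "s#", &s, &len))
    {
        return NULL;
    }
    print_chars(s, len);
    Py_RETURN_NONE;
}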
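On the Python side, the s# conversion behaves much like `str.encode('utf-8')` with strict error handling, which is a convenient way to reason about the `len` that print_chars() receives: it is a byte count, not a character count. A sketch under that assumption (the `fits_utf8` helper is hypothetical); strings containing lone surrogates cannot be encoded strictly, so such a conversion fails.

```python
s = "Spicy Jalape\u00f1o"
encoded = s.encode("utf-8")  # roughly the bytes that s# hands to print_chars()

# 14 characters, but 15 bytes: U+00F1 takes two bytes (c3 b1) in UTF-8.
print(len(s), len(encoded))  # 14 15


def fits_utf8(text: str) -> bool:
    """Return True if text can be strictly encoded as UTF-8 (no lone surrogates)."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


print(fits_utf8(s))             # True
print(fits_utf8("bad \udcff"))  # False: a lone surrogate has no UTF-8 form
```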

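For library functions that work with the machine type wchar_t, the C extension code can be written as follows. Note that the u# format code was deprecated in Python 3.3 and removed in Python 3.12, so this version only works on older interpreters; on newer versions, a common alternative is to accept the argument with U and convert it with PyUnicode_AsWideCharString().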

Code #4:

static PyObject *py_print_wchars(PyObject *self, PyObject *args)
{
    wchar_t *s;
    Py_ssize_t len;

    /* "u#" returns the string as a wchar_t buffer plus its length (Python < 3.12 only) */
    if (!PyArg_ParseTuple(args, "u#", &s, &len))
    {
        return NULL;
    }
    print_wchars(s, len);
    Py_RETURN_NONE;
}
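What print_wchars() receives also depends on the platform's `wchar_t` width: typically 4 bytes on Linux and macOS and 2 bytes on Windows, where characters outside the Basic Multilingual Plane become UTF-16 surrogate pairs. The sketch below (the `wchar_units` helper is hypothetical) reports the local width with `ctypes.sizeof(ctypes.c_wchar)` and computes the code units that each width would produce.

```python
import ctypes


def wchar_units(s: str, wchar_size: int) -> list[int]:
    """Return the wchar_t code units for s, given a wchar_t size of 2 or 4 bytes."""
    codecs_by_size = {2: "utf-16-le", 4: "utf-32-le"}
    if wchar_size not in codecs_by_size:
        raise ValueError(f"unsupported wchar_t size: {wchar_size}")
    data = s.encode(codecs_by_size[wchar_size])
    # Split the little-endian byte string into fixed-size code units.
    return [
        int.from_bytes(data[i:i + wchar_size], "little")
        for i in range(0, len(data), wchar_size)
    ]


print("local wchar_t size:", ctypes.sizeof(ctypes.c_wchar))
pepper = "\U0001F336"  # HOT PEPPER, outside the BMP
print([hex(u) for u in wchar_units(pepper, 4)])  # ['0x1f336']
print([hex(u) for u in wchar_units(pepper, 2)])  # ['0xd83c', '0xdf36']
```

For a string like 'Spicy Jalape\u00f1o', both widths give the same values, which is why the output of print_wchars() below looks like plain code points.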

The following code checks how the extension functions work.

Observe how the byte-oriented function print_chars() receives UTF-8 encoded data, while print_wchars() receives Unicode code point values.

Code #5:

s = 'Spicy Jalape\u00f1o'
print_chars(s)
print_wchars(s)

Output:

53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
53 70 69 63 79 20 4a 61 6c 61 70 65 f1 6f

Before choosing an approach, consider the nature of the C library being accessed. For many C libraries, it may make more sense to pass bytes instead of a string. In that case, use the following conversion code instead.

Code #6:

static PyObject *py_print_chars(PyObject *self, PyObject *args)
{
    char *s;
    Py_ssize_t len;

    /* "y#" accepts bytes or another read-only bytes-like object; str is rejected */
    if (!PyArg_ParseTuple(args, "y#", &s, &len))
    {
        return NULL;
    }
    print_chars(s, len);
    Py_RETURN_NONE;
}
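With the bytes-only version, the caller chooses the encoding before making the call, for example print_chars(s.encode('utf-8')), and passing a str directly raises TypeError. Because that choice changes what the C code sees, here is a small pure-Python comparison of two encodings of the same string; the `encoded_views` helper is hypothetical and no extension module is needed.

```python
def encoded_views(s: str, encodings=("utf-8", "latin-1")) -> dict[str, bytes]:
    """Map each encoding name to the bytes a bytes-only C function would receive."""
    return {enc: s.encode(enc) for enc in encodings}


views = encoded_views("Spicy Jalape\u00f1o")
for enc, data in views.items():
    # Same text, different bytes: the bytes alone do not say which encoding was used.
    print(f"{enc:8}", " ".join(f"{b:02x}" for b in data))
```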

If you still want to pass strings, be aware that Python 3 uses an adaptable string representation (see PEP 393) that does not map directly onto C libraries using the standard char * or wchar_t * types. Thus, to present string data to C, some kind of conversion is almost always necessary. The s# and u# format codes of PyArg_ParseTuple() perform such conversions safely.
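One way to see the adaptable representation from Python is to compare `sys.getsizeof()` for strings that differ by one character. On CPython (an implementation detail described in PEP 393), the per-character cost is 1, 2, or 4 bytes depending on the widest code point in the string. The `bytes_per_char` helper below is hypothetical.

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Storage cost of one extra character in a string made only of ch (CPython)."""
    return sys.getsizeof(ch * 11) - sys.getsizeof(ch * 10)


for ch in ("a", "\u00f1", "\u0101", "\U0001F336"):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
```

Neither char * nor wchar_t * matches all three layouts, which is why a conversion step is needed.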
One downside is that such conversions permanently increase the size of the original string object: whenever a conversion is performed, a copy of the converted data is attached to the string object so that it can be reused later. Code #7 shows this effect.

Code #7:

import sys

s = 'Spicy Jalape\u00f1o'
print("Size:", sys.getsizeof(s))
print_chars(s)
print("Size:", sys.getsizeof(s))
print_wchars(s)
print("Size:", sys.getsizeof(s))

Output:

Size: 87
53 70 69 63 79 20 4a 61 6c 61 70 65 c3 b1 6f
Size: 103
53 70 69 63 79 20 4a 61 6c 61 70 65 f1 6f
Size: 163
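To observe the UTF-8 cache without compiling anything, the sketch below calls CPython's PyUnicode_AsUTF8AndSize() (the C API function that the s# conversion uses for str arguments) through `ctypes.pythonapi`. This is CPython-specific and meant only for experimentation; the string is built at runtime so that it has no cached UTF-8 copy beforehand, and exact sizes differ between Python versions.

```python
import ctypes
import sys

# CPython-only: bind the C API function that s# calls for str arguments.
as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
as_utf8.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
as_utf8.restype = ctypes.c_char_p

# Build the string at runtime so no UTF-8 copy is cached on it yet.
s = "Spicy Jalape" + chr(0xF1) + "o"
before = sys.getsizeof(s)

size = ctypes.c_ssize_t()
utf8 = as_utf8(s, ctypes.byref(size))  # creates and caches the UTF-8 copy on s
after = sys.getsizeof(s)

print(utf8, size.value)  # b'Spicy Jalape\xc3\xb1o' 15
print("Size before:", before, "after:", after)
```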
