casefold() string method in Python


The string casefold() method is used to implement case-insensitive string matching. It is similar to the string lower() method, but casefolding is more aggressive: it removes all case distinctions present in the string, so that case can be ignored when comparing strings.
Syntax:

  string.casefold()

Parameters: casefold() doesn't take any parameters.

Return value: a casefolded copy of the string, i.e. the string converted to a caseless, lowercase-like form suitable for caseless matching.
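To make the return value concrete, here is a minimal sketch with illustrative inputs (they are not from the article): the original string is left untouched because casefold() returns a new string, and casefold() also maps the Greek final sigma ς to the ordinary σ, which lower() does not.

```python
# Illustrative inputs, not taken from the article
original = "PythonEngineering"
folded = original.casefold()

print(original)   # the original string is unchanged
print(folded)     # a new, casefolded copy

# lower() leaves the final sigma alone; casefold() maps it to the plain sigma
final_sigma = "\u03c2"                      # ς
print(final_sigma.lower() == final_sigma)   # True
print(final_sigma.casefold() == "\u03c3")   # True (σ)
```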

Examples

  1. Convert string to lowercase

    # Python program to convert a string to lowercase

    string = "PYTHONENGINEERING"

    # print the casefolded (lowercase) string
    print("lowercase string:", string.casefold())

    Output:

     lowercase string: pythonengineering
  2. Check if the string is a palindrome

    # A program to check whether a string
    # is a palindrome or not

    # change this value for another output
    my_str = "pythonengineering"

    # make it suitable for case-insensitive comparison
    my_str = my_str.casefold()

    # reverse the string
    rev_str = reversed(my_str)

    # check whether the string equals its reverse
    if list(my_str) == list(rev_str):
        print("palindrome")
    else:
        print("not palindrome")

    Output:

     not palindrome
  3. Count the vowels in a string

    # Program to count the number of each
    # vowel in a string

    # string of vowels
    vowels = "aeiou"

    # change this value for a different result
    my_str = "Hello, have you tried pythonengineering?"

    # user input
    # my_str = input("Enter the string: ")

    # make it suitable for case-insensitive comparison
    my_str = my_str.casefold()

    # make a dictionary with each vowel as a key and 0 as its value
    count = dict.fromkeys(vowels, 0)

    # count the vowels
    for char in my_str:
        if char in count:
            count[char] += 1

    print(count)

    Output:

     {'a': 1, 'e': 6, 'i': 3, 'o': 3, 'u': 1}
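The palindrome and vowel-counting programs can also be written as small reusable functions. This is a sketch, not the article's method: the names is_palindrome and count_vowels are hypothetical, the palindrome check uses slicing instead of reversed(), and collections.Counter replaces the manual dictionary. Like the programs above, it treats spaces and punctuation as ordinary characters.

```python
from collections import Counter

VOWELS = "aeiou"

def is_palindrome(text: str) -> bool:
    """Return True if text reads the same backwards, ignoring case."""
    folded = text.casefold()
    # slicing with a step of -1 produces the reversed string
    return folded == folded[::-1]

def count_vowels(text: str) -> dict:
    """Count each vowel in text, ignoring case."""
    counts = Counter(ch for ch in text.casefold() if ch in VOWELS)
    # report every vowel, including those that never occur, in 'aeiou' order
    return {v: counts[v] for v in VOWELS}

print(is_palindrome("Malayalam"))          # True
print(is_palindrome("pythonengineering"))  # False
print(count_vowels("Hello, have you tried pythonengineering?"))
# {'a': 1, 'e': 6, 'i': 3, 'o': 3, 'u': 1}
```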

This article is contributed by Shivani Bagel.






casefold() string method in Python: StackOverflow Questions

Answer #1

Comparing strings in a case-insensitive way seems trivial, but it's not. I will be using Python 3, since Python 2 is underdeveloped here.

The first thing to note is that case-removing conversions in Unicode aren't trivial. There is text for which text.lower() != text.upper().lower(), such as "ß":

"ß".lower()
#>>> "ß"

"ß".upper().lower()
#>>> "ss"

But let's say you wanted to caselessly compare "BUSSE" and "Buße". Heck, you probably also want "BUSSE" and "BUẞE" to compare equal; that's the newer capital form. The recommended way is to use casefold:

str.casefold()

Return a casefolded copy of the string. Casefolded strings may be used for caseless matching.

Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. [...]

Do not just use lower. If casefold is not available, doing .upper().lower() helps (but only somewhat).
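To see the difference concretely, here is a small check on the three spellings mentioned above (a sketch; the variable names are only illustrative). casefold() maps all three to the same string, while lower() still leaves "ß" in two of them:

```python
# The three spellings discussed above; \u1e9e is the capital sharp s (ẞ)
words = ["BUSSE", "Bu\u00dfe", "BU\u1e9eE"]

folded = {w.casefold() for w in words}
lowered = {w.lower() for w in words}

print(folded)    # {'busse'}: casefold() makes all three equal
print(lowered)   # two distinct values: 'busse' and 'buße'
```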

Then you should consider accents. If your font renderer is good, you probably think "ê" == "ê", but they are not equal (written with escapes here so the difference is visible):

"\u00ea" == "e\u0302"
#>>> False

This is because the accent on the latter is a combining character.

import unicodedata

[unicodedata.name(char) for char in "ê"]
#>>> ["LATIN SMALL LETTER E WITH CIRCUMFLEX"]

[unicodedata.name(char) for char in "e\u0302"]
#>>> ["LATIN SMALL LETTER E", "COMBINING CIRCUMFLEX ACCENT"]

The simplest way to deal with this is unicodedata.normalize. You probably want to use NFKD normalization, but feel free to check the documentation. Then one does

unicodedata.normalize("NFKD", "\u00ea") == unicodedata.normalize("NFKD", "e\u0302")
#>>> True

To finish up, here it is expressed as functions:

import unicodedata

def normalize_caseless(text):
    return unicodedata.normalize("NFKD", text.casefold())

def caseless_equal(left, right):
    return normalize_caseless(left) == normalize_caseless(right)
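Section 3.13 of the Unicode Standard defines a stricter "compatibility caseless match" that normalizes before folding as well as after, because case folding and normalization do not always commute. Below is a sketch of that definition, NFKD(casefold(NFKD(casefold(NFD(X))))), assuming Python's str.casefold() corresponds to the standard's full case folding; the function names are hypothetical and this has not been checked against every edge case.

```python
import unicodedata

def compat_caseless_key(text: str) -> str:
    """Hypothetical helper: NFKD(casefold(NFKD(casefold(NFD(text)))))."""
    step = unicodedata.normalize("NFD", text)
    step = unicodedata.normalize("NFKD", step.casefold())
    return unicodedata.normalize("NFKD", step.casefold())

def compat_caseless_equal(left: str, right: str) -> bool:
    """Compare two strings using the key above."""
    return compat_caseless_key(left) == compat_caseless_key(right)

print(compat_caseless_equal("Bu\u00dfe", "BUSSE"))   # True: ß folds to ss
print(compat_caseless_equal("\u00ea", "E\u0302"))    # True: precomposed vs combining accent
print(compat_caseless_equal("\ufb01le", "FILE"))     # True: the 'ﬁ' ligature folds to 'fi'
```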

Answer #2

How to convert string to lowercase in Python?

Is there any way to convert an entire user inputted string from uppercase, or even part uppercase to lowercase?

E.g. Kilometers --> kilometers

The canonical Pythonic way of doing this is

>>> "Kilometers".lower()
'kilometers'

However, if the purpose is to do case-insensitive matching, you should use case-folding:

>>> "Kilometers".casefold()
'kilometers'

Here's why:

>>> "Maße".casefold()
'masse'
>>> "Maße".lower()
'maße'
>>> "MASSE" == "Maße"
False
>>> "MASSE".lower() == "Maße".lower()
False
>>> "MASSE".casefold() == "Maße".casefold()
True

This is a str method in Python 3, but in Python 2 you'll want to look at PyICU or py2casefold; several answers address this here.

Unicode Python 3

Python 3 handles plain string literals as unicode:

>>> string = "Километр"
>>> string
'Километр'
>>> string.lower()
'километр'

Python 2, plain string literals are bytes

In Python 2, the below, pasted into a shell, encodes the literal as a string of bytes, using utf-8.

And lower doesn't map any of these bytes to anything different, so we get the same string back.

>>> string = "Километр"
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.lower()
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.lower()
Километр

In scripts, Python will object to non-ASCII bytes in a string when the source file declares no encoding (an error as of Python 2.5, a warning in Python 2.4), since the intended encoding would be ambiguous. For more on that, see the Unicode HOWTO in the docs and PEP 263.
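In Python 3 the same distinction is explicit in the types: bytes also has a lower() method, but it only changes the ASCII letters A-Z, so UTF-8 encoded Cyrillic passes through unchanged until it is decoded to str. A small sketch:

```python
word = "Километр"
data = word.encode("utf-8")   # UTF-8 bytes, like the Python 2 str literal above

# bytes.lower() only maps ASCII letters, so these bytes come back unchanged
print(data.lower() == data)          # True
print(b"ABC".lower())                # b'abc'

# decoding to str first lets lower() work on the Cyrillic letters
print(data.decode("utf-8").lower())  # километр
```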

Use Unicode literals, not str literals

So we need a unicode string to handle this conversion, accomplished easily with a unicode string literal, which disambiguates with a u prefix (and note the u prefix also works in Python 3):

>>> unicode_literal = u"Километр"
>>> print(unicode_literal.lower())
километр

Note that the bytes are completely different from the str bytes: the escape sequence is \u followed by the 2-byte-wide (16-bit) representation of these Unicode letters:

>>> unicode_literal
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> unicode_literal.lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'

Now if we only have it in the form of a str, we need to convert it to unicode. Python's Unicode type is a universal representation of text that has many advantages over most byte encodings. We can either use the unicode constructor or the str.decode method with the codec to convert the str to unicode:

>>> unicode_from_string = unicode(string, "utf-8") # decode the UTF-8 bytes into unicode
>>> print(unicode_from_string.lower())
километр
>>> string_to_unicode = string.decode("utf-8")
>>> print(string_to_unicode.lower())
километр
>>> unicode_from_string == string_to_unicode == unicode_literal
True

Both methods convert to the unicode type, and the result is the same as unicode_literal.

Best Practice, use Unicode

It is recommended that you always work with text in Unicode.

Software should only work with Unicode strings internally, converting to a particular encoding on output.

Can encode back when necessary

However, to get the lowercase back as type str, encode the unicode string to UTF-8 again:

>>> print string
Километр
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.decode("utf-8")
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode("utf-8").lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode("utf-8").lower().encode("utf-8")
'\xd0\xba\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.decode("utf-8").lower().encode("utf-8")
километр

So in Python 2, Unicode can encode into Python strings, and Python strings can decode into the Unicode type.
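The same decode, lowercase, encode round trip in Python 3 terms, as a sketch: bytes must be decoded to str before lowercasing and can be encoded back afterwards. The helper name lower_utf8 is hypothetical.

```python
def lower_utf8(data: bytes) -> bytes:
    """Hypothetical helper: decode UTF-8, lowercase the text, encode back to UTF-8."""
    return data.decode("utf-8").lower().encode("utf-8")

raw = "Километр".encode("utf-8")
lowered = lower_utf8(raw)

print(lowered)                  # b'\xd0\xba\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
print(lowered.decode("utf-8"))  # километр
```

For caseless matching rather than display, casefold() could be used in place of lower() in the helper.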

Answer #3

This is because strings are immutable in Python.

Which means that X.replace("hello", "goodbye") returns a copy of X with the replacements made. Because of that, you need to replace this line:

X.replace("hello", "goodbye")

with this line:

X = X.replace("hello", "goodbye")

More broadly, this is true for all Python string methods that would change a string's content "in place", e.g. replace, strip, translate, lower/upper, join, ...

You must assign their output to something if you want to use it and not throw it away, e.g.

X  = X.strip(" \t")
X2 = X.translate(...)
Y  = X.lower()
Z  = X.upper()
A  = X.join(":")
B  = X.capitalize()
C  = X.casefold()

and so on.
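A short sketch of the point above, with an illustrative value for X: calling replace() without assigning the result leaves X unchanged, and only rebinding X to the returned copy keeps the change.

```python
X = "hello world"   # illustrative value

X.replace("hello", "goodbye")      # the new string is discarded
print(X)                           # hello world

X = X.replace("hello", "goodbye")  # keep the returned copy
print(X)                           # goodbye world

C = X.casefold()                   # casefold() likewise returns a new string
```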

Answer #4

In Python 3.3+ there is the str.casefold method that's specifically designed for caseless matching:

sorted_list = sorted(unsorted_list, key=str.casefold)

In Python 2 use lower():

sorted_list = sorted(unsorted_list, key=lambda s: s.lower())

It works for both normal and unicode strings, since they both have a lower method.

In Python 2 it works for a mix of normal and unicode strings, since values of the two types can be compared with each other. Python 3 doesn't work like that, though: you can't compare a byte string and a unicode string, so in Python 3 you should do the sane thing and only sort lists of one type of string.

>>> lst = ['Aden', u'abe1']
>>> sorted(lst)
['Aden', u'abe1']
>>> sorted(lst, key=lambda s: s.lower())
[u'abe1', 'Aden']
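A Python 3 sketch of both points, using an illustrative list: str.casefold as the sort key gives a caseless ordering (including "ß", which sorts as "ss"), while a list mixing bytes and str cannot be sorted at all.

```python
words = ["Zed", "abe1", "Aden", "\u00dfa"]   # illustrative list; \u00df is 'ß'

print(sorted(words))                    # ['Aden', 'Zed', 'abe1', 'ßa'] (code-point order)
print(sorted(words, key=str.casefold))  # ['abe1', 'Aden', 'ßa', 'Zed'] (caseless order)

mixed = ["Aden", b"abe1"]
try:
    sorted(mixed)
    mixed_error = None
except TypeError as exc:   # Python 3 refuses to compare bytes with str
    mixed_error = exc
    print("cannot sort str and bytes together:", exc)
```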
