Python Lowercase String

Introduction
What is a String?
Common String Operations
Python uses Unicode
Unicode encodings
The Python lower method on strings
The Python islower method
Summary
Next Steps

Introduction

In this article, we discuss some general aspects of Python strings that will help us understand the str.lower() method, which makes a Python string lowercase, and the str.islower() method, which checks whether a Python string is lowercase.

What is a String?

A string in Python is a sequence of characters. It is surrounded by single quotes, double quotes or triple quotes. For example, 'laptop', "mobile phone", and '''television''' are valid strings in a Python program.

We can use square brackets to index a Python string. Indexes start at 0, as shown below:

programming_language = "Python"
print(programming_language[0]) # prints P
print(programming_language[1]) # prints y
print(programming_language[2]) # prints t
print(programming_language[3]) # prints h
print(programming_language[4]) # prints o
print(programming_language[5]) # prints n
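The same indexing can also be written as a loop. The sketch below uses the built-in enumerate() to pair each index with its character, which reproduces the six print statements above. It also shows negative indexing (counting from the end of the string), which is standard Python behavior that the example above does not cover.

```python
programming_language = "Python"

# enumerate() yields (index, character) pairs, starting at index 0
for index, character in enumerate(programming_language):
    print(index, character)

# Negative indices count backwards from the end of the string
last_character = programming_language[-1]   # 'n'
first_character = programming_language[-6]  # 'P'
print(last_character, first_character)
```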

Common String Operations

Since a Python string is a sequence of characters, Python lets us perform common sequence operations on it. Some of these operations are shown below:

# The in and not in operators
>>> print('x' in 'xyz')
True
>>> print('p' not in 'abcpqr')
False

# The + (concatenation) operator
>>> print('Hello '+'World')
Hello World

# The * (repetition) operator
>>> print('ABC ' * 4)
ABC ABC ABC ABC

# The slice operator
>>> print('ABCDEFGH'[0:3])
ABC
>>> print('ABCDEFGH'[::-1])
HGFEDCBA

We can also use the built-in len() function to find the number of characters in a string, the min() function to get the character with the smallest Unicode code point, and the max() function to get the character with the largest Unicode code point.

The Unicode Standard assigns a unique number, called a code point, to every character, regardless of platform or language.
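To make the len(), min() and max() description concrete, here is a small sketch on the string "Python". It uses ord() (covered later in this article) only to print the code points being compared. Because uppercase ASCII letters have smaller code points than lowercase ones, min() picks the capital P.

```python
word = "Python"

# len() counts characters (code points), not bytes
print(len(word))  # 6

# min() and max() compare characters by their Unicode code points
smallest = min(word)
largest = max(word)
print(smallest, ord(smallest))  # P 80
print(largest, ord(largest))    # y 121
```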

Python uses Unicode

Python is a human-friendly programming language, so when programming in Python it is easy to forget that computers understand only numbers. Whenever we work with a string, we are really working with a mapping between numbers and the characters displayed on screen; what is stored in the computer's memory is still a sequence of numbers. Unicode is one standard for this kind of mapping. An earlier and very popular encoding standard, ASCII, was designed for English speakers and covers only 128 characters: the unaccented Latin letters, digits, punctuation, and control characters. Unicode, by contrast, is an international standard that can represent the characters of virtually all of the world's languages, as well as emojis.

A code point is a number that represents a character. A code unit is the basic unit of bits that a particular encoding works with (8 bits in UTF-8, 16 bits in UTF-16), and each code point is encoded as a sequence of one or more code units. Unicode supports several encodings whose code units differ in length. It is also worth noting that the first 128 Unicode code points are identical to the 128 ASCII characters.
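The claim that the first 128 code points match ASCII can be checked directly. The sketch below decodes every byte value from 0 to 127 with Python's built-in "ascii" codec and compares the result with chr() of the same number; it then shows that a value above 127 is not ASCII at all.

```python
# Every ASCII byte value 0-127 decodes to the character with the same code point
ascii_matches_unicode = all(bytes([i]).decode("ascii") == chr(i) for i in range(128))
print(ascii_matches_unicode)  # True

# ASCII stops at 127, so the ascii codec refuses to decode byte 223
try:
    bytes([223]).decode("ascii")
    byte_223_is_ascii = True
except UnicodeDecodeError:
    byte_223_is_ascii = False
print(byte_223_is_ascii)  # False
```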

In Python, we can use the ord() function to get the code point of a character:

>>> help(ord)
Help on built-in function ord in module builtins:
ord(c, /)
    Return the Unicode code point for a one-character string.
# Some examples:
>>> ord('A')
65
>>> ord('a')
97

The chr() function does the reverse: given a code point, it returns the corresponding character:

# Some examples:
>>> chr(65)
'A'
>>> chr(97)
'a'
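Since ord() and chr() are inverses, converting a character to its code point and back returns the original character. The sketch below also prints each code point in the conventional "U+" hexadecimal notation used in Unicode charts; the f-string format spec `04X` pads to at least four uppercase hex digits.

```python
for character in ["A", "a", "\u00df"]:  # "\u00df" is ß
    code_point = ord(character)
    # Code points are conventionally written as "U+" plus at least four hex digits
    print(character, code_point, f"U+{code_point:04X}")

# chr() reverses ord(), so converting there and back returns the original text
round_trip_ok = all(chr(ord(c)) == c for c in "Aa\u00df")
print(round_trip_ok)  # True
```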

Let’s find out what numbers can be given to the chr() function:

>>> help(chr)
Help on built-in function chr in module builtins:
chr(i, /)
    Return a Unicode string of one character with ordinal i; 0 <= i <= 0x10ffff.

Notice the upper bound, 0x10ffff. To make sense of it, we need to understand how numbers are represented in the binary, octal, and hexadecimal number systems. As a starting point, let’s break down the following decimal number:

$$789 = 700 + 80 + 9 = 7 \times 100 + 8 \times 10 + 9 \times 1 = 7 \times 10^2 + 8 \times 10^1 + 9 \times 10^0$$

The number above is written in decimal, which has base 10: each digit is multiplied by a power of 10. Applying the same rule with base 2 converts a binary number into its decimal form. In base 2, every digit is less than 2, so only the digits 0 and 1 appear. Consider the following example:

$$(101)_2 = 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 4 + 0 + 1 = 5 \text{ in decimal}$$

Let's try the same thing on octal (base 8) and hexadecimal (base 16) numbers:

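$$(45)_8 = 4 \times 8^1 + 5 \times 8^0 = 32 + 5 = 37 \text{ in decimal}$$

$$(1995)_{16} = 1 \times 16^3 + 9 \times 16^2 + 9 \times 16^1 + 5 \times 16^0 = 4096 + 2304 + 144 + 5 = 6549 \text{ in decimal}$$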
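The positional rule used in these conversions can be written as a short function. `to_decimal` below is a hypothetical helper for illustration (not part of Python), limited to bases up to 16; it maps hexadecimal letters a to f onto 10 to 15. Python's built-in int() already accepts a base argument and gives the same results.

```python
def to_decimal(digits: str, base: int) -> int:
    """Hypothetical helper: expand a digit string as sum(digit * base**position)."""
    value = 0
    # The rightmost digit has position 0, the next one position 1, and so on
    for position, digit in enumerate(reversed(digits)):
        # int(digit, 16) maps '0'-'9' to 0-9 and 'a'-'f' / 'A'-'F' to 10-15
        digit_value = int(digit, 16)
        if digit_value >= base:
            raise ValueError(f"digit {digit!r} is not valid in base {base}")
        value += digit_value * base ** position
    return value

print(to_decimal("101", 2))    # 5
print(to_decimal("45", 8))     # 37
print(to_decimal("1995", 16))  # 6549

# The built-in int() accepts a base and agrees with the helper
print(int("101", 2), int("45", 8), int("1995", 16))  # 5 37 6549
```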

Note also that hexadecimal digits run from 0 to 9 and then A to F (case-insensitive), where A is 10, B is 11, and so on up to F, which is 15. Hexadecimal values are also easier to write than binary values, because each hexadecimal digit corresponds to exactly 4 binary digits.

# Example:
>>> bin(165)
'0b10100101'
>>> hex(165)
'0xa5'

Here, the prefix 0x marks a hexadecimal value and 0b marks a binary value. Looking closely, 165 is a5 in hexadecimal and 10100101 in binary, and the binary form is simply the 4 binary digits of a (1010) followed by the 4 binary digits of 5 (0101).
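The 4-bits-per-hex-digit grouping can be reproduced with the built-in format() function. The sketch expands each hexadecimal digit of 165 into a zero-padded 4-bit group and joins the groups. For 165 this matches format(165, "b") exactly; for numbers whose leading hex digit is below 8, the joined form would carry extra leading zeros that format() drops.

```python
number = 165
hex_digits = format(number, "x")  # 'a5'

# Expand each hexadecimal digit into exactly 4 binary digits, padding with zeros
nibbles = [format(int(d, 16), "04b") for d in hex_digits]
print(nibbles)              # ['1010', '0101']
print("".join(nibbles))     # 10100101
print(format(number, "b"))  # 10100101
```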

Let’s come back to the 0x10ffff value and convert it to a decimal number:

$$(\mathrm{10ffff})_{16} = 1 \times 16^5 + 0 \times 16^4 + 15 \times 16^3 + 15 \times 16^2 + 15 \times 16^1 + 15 \times 16^0 = 1114111$$

This means that Unicode code points range from 0 to 1114111 (just over a million values). Only a fraction of them, roughly 150,000 in recent versions of the standard, have been assigned to characters so far, which leaves plenty of room for future needs.
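Python exposes this upper bound as sys.maxunicode, which has been 1114111 on every build since Python 3.3. The sketch below checks it and shows that chr() accepts 0x10FFFF but raises ValueError for the next value.

```python
import sys

# sys.maxunicode holds the largest code point Python supports
print(sys.maxunicode)              # 1114111
print(sys.maxunicode == 0x10FFFF)  # True

# 0x10FFFF itself is accepted; one past it is rejected with ValueError
highest = chr(0x10FFFF)
rejected = False
try:
    chr(0x110000)
except ValueError:
    rejected = True
print(rejected)  # True
```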

Unicode encodings

Unicode code points are conventionally written in hexadecimal (for example, U+00DF for ß) because that notation is compact and easy to read. To store or transmit text, however, we need a way to represent code points as bytes.

The Unicode Standard defines several ways to represent code points as bytes, known as Unicode encodings. UTF-8 is the most popular encoding for storing and transmitting Unicode text. UTF stands for Unicode Transformation Format, and UTF-8 uses a variable number of bytes (one to four) for each code point: the higher the code point value, the more bytes it needs. The first 128 code points, which are the ASCII characters, are encoded in UTF-8 as single bytes with the same values ASCII uses, so any valid ASCII text is also valid UTF-8. In that sense, ASCII is a subset of UTF-8.

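UTF-16, on the other hand, uses 16-bit code units. Characters with code points up to 0xFFFF take two bytes, while characters above 0xFFFF (such as most emojis) take four bytes, stored as a pair of code units called a surrogate pair. So UTF-16 is also a variable-width encoding; the fixed-width alternative is UTF-32, which always uses four bytes per code point.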

# Examples:
>>> 'ß'.encode() # default utf-8 encoding
b'\xc3\x9f'
>>> 'ß'.encode(encoding='utf-16') # leading b'\xff\xfe' is a byte order mark (little-endian machine)
b'\xff\xfe\xdf\x00'

The codec used for decoding must match the one used for encoding. For example, if b'\xc3\x9f' (ß encoded as UTF-8) is decoded as UTF-16, we get the wrong text.

>>> b'\xc3\x9f'.decode() # default utf-8 encoding
'ß'
>>> b'\xc3\x9f'.decode(encoding='utf-16') == 'ß' # wrong codec: a different character comes back
False
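The variable widths of both encodings can be measured directly. The sketch below encodes four sample characters (chosen for illustration) with UTF-8 and with "utf-16-le", a UTF-16 variant with a fixed byte order that adds no byte order mark, so only the character bytes are counted. It ends by showing that decoding with a mismatched codec (Latin-1 here) yields different text.

```python
samples = ["A", "\u00df", "\u20ac", "\U0001F600"]  # A, ß, €, grinning-face emoji

for character in samples:
    utf8 = character.encode("utf-8")
    # "utf-16-le" fixes the byte order, so no byte order mark is prepended
    utf16 = character.encode("utf-16-le")
    print(f"U+{ord(character):04X}", len(utf8), len(utf16))

utf8_lengths = [len(c.encode("utf-8")) for c in samples]       # [1, 2, 3, 4]
utf16_lengths = [len(c.encode("utf-16-le")) for c in samples]  # [2, 2, 2, 4]: U+1F600 needs a surrogate pair

# Decoding must use the same codec as encoding
data = "Stra\u00dfe".encode("utf-8")
print(data.decode("utf-8"))                   # Straße
print(data.decode("latin-1") == "Stra\u00dfe")  # False: wrong codec, wrong text
```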

This section showed how Python uses encodings to store and transmit Unicode code points. Now let’s look at how Python converts a character to its lowercase form.

The Python lower method on strings

Python's string methods give us fine-grained control over strings. One of them is str.lower(), which we usually call on a string value, as in word.lower(). It returns a new string in which all cased characters have been converted to lowercase. An example:

>>> word = "COMPUTER"
>>> print(word.lower())
computer

Keep in mind that strings are immutable in Python: the lower method returns a new string rather than modifying the original. So the code above does not change the value of word.

>>> print(word)
COMPUTER
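Because strings are immutable, the usual way to "update" a variable is to rebind its name to the string that lower() returns. A minimal sketch:

```python
word = "COMPUTER"
lowered = word.lower()

# The original string is untouched; lower() built a new string
print(word, lowered)  # COMPUTER computer

# Rebinding the name makes word refer to the new lowercase string
word = word.lower()
print(word)  # computer
```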

Using the help function, we can understand what the lower method does.

>>> help(str.lower)
Help on method_descriptor:
lower(self, /)
    Return a copy of the string converted to lowercase.

Let’s have a look at another example:

>>> sentence = "This Will Get Converted To Lowercase."
>>> sentence_in_lowercase = sentence.lower()
>>> print(sentence)
This Will Get Converted To Lowercase.
>>> print(sentence_in_lowercase)
this will get converted to lowercase.
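A common use of lower() is case-insensitive comparison. `equals_ignoring_case` below is a hypothetical helper written for illustration, not a built-in; it simply lowercases both strings before comparing them.

```python
def equals_ignoring_case(first: str, second: str) -> bool:
    """Hypothetical helper: compare two strings after lowercasing both."""
    return first.lower() == second.lower()

print(equals_ignoring_case("Python", "PYTHON"))  # True
print(equals_ignoring_case("Python", "Jython"))  # False
```

For general caseless matching, Python also provides str.casefold(), which is more aggressive than lower(): for example, 'ß'.casefold() returns 'ss', while 'ß'.lower() returns 'ß'.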

The lower method also works on strings that contain symbols, digits, and non-English letters:

# For example:
>>> my_string = 'AbCdßen10'
>>> my_string_lower = my_string.lower()
>>> print(my_string_lower)
abcdßen10

Here, ß (code point 0xdf) is unchanged because it is already a lowercase letter, so its lowercase mapping is the character itself. The digits 1 and 0 have no case, so they also remain unchanged.
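Unicode special casing does matter for ß in the other direction: its uppercase form is the two-letter string "SS". The sketch below shows that lowercasing leaves ß alone, while uppercasing and then lowercasing again does not bring ß back.

```python
sharp_s = "\u00df"  # ß, LATIN SMALL LETTER SHARP S

print(sharp_s.lower() == sharp_s)  # True: already lowercase
print(sharp_s.upper())             # SS: one character maps to two
print(sharp_s.upper().lower())     # ss: the round trip does not give back ß
```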

Python uses the Unicode Character Database and the lowercasing algorithm described in section 3.13 of the Unicode Standard. Python 3 also provides the str.upper() method to convert strings to uppercase. The str.capitalize() method returns a copy of the string with its first character uppercased and the rest lowercased; to capitalize the first letter of every word, use str.title() instead. Finally, str.swapcase() converts lowercase characters to uppercase and vice versa.
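The related case methods are easiest to tell apart side by side. The sketch applies str.upper(), str.capitalize(), str.title() (which uppercases the first letter of each word and lowercases the rest) and str.swapcase() to one mixed-case sample string.

```python
text = "hello WORLD from python"

print(text.upper())       # HELLO WORLD FROM PYTHON
print(text.capitalize())  # Hello world from python
print(text.title())       # Hello World From Python
print(text.swapcase())    # HELLO world FROM PYTHON
```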

The Python islower method

Python also provides the str.islower() method, which checks whether a string is lowercase. As its help text explains, this is not the same as checking that the string contains only lowercase letters:

>>> help(str.islower)
Help on method_descriptor:
islower(self, /)
    Return True if the string is a lowercase string, False otherwise.
    A string is lowercase if all cased characters in the string are lowercase and
    there is at least one cased character in the string.

The Python documentation says, "Cased characters are those with general category property being one of Lu (Letter, uppercase), Ll (Letter, lowercase), or Lt (Letter, titlecase)."

The following examples will give us more clarity on this method and cased characters:

>>> 'ß'.islower()
True
>>> '10'.islower()
False
>>> 'a10'.islower()
True

The str.islower() method returns False for the string '10' because neither 1 nor 0 is a cased character, and, as stated in the definition of the method, it returns True only when the string contains at least one cased character. For 'a10', the only cased character is the lowercase a, so the result is True. Similarly, Python 3 provides the str.isupper() method to check whether all cased characters in a string are uppercase.
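If what you actually need is "only lowercase letters", islower() alone is not enough, since 'a10' passes. `has_only_lowercase_letters` below is a hypothetical helper that adds str.isalpha(). Note that isalpha() also accepts letters without case (for example, CJK ideographs), so such a string still passes as long as it contains at least one lowercase letter.

```python
def has_only_lowercase_letters(text: str) -> bool:
    """Hypothetical helper: True if text is all alphabetic and str.islower() holds."""
    return text.isalpha() and text.islower()

# Compare islower(), isupper() and the helper on a few samples
for sample in ["abc", "a10", "10", "ABC", "\u00df"]:
    print(repr(sample), sample.islower(), sample.isupper(), has_only_lowercase_letters(sample))
```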

Summary

In this article, we discussed some Python string concepts including how strings are represented in Python, Unicode, how to make strings lowercase in Python, and how to check if a string is lowercase in Python.

Next Steps

The lower and islower methods are built into Python's str type, which is documented as part of the Python standard library: the collection of functions, types, and methods that come built-in with your Python installation and are ready for you to use. To see some other operations you can perform out of the box, check out our tutorials on the floor() and sum() functions.

You may also be interested in learning how to check if a Python string contains a substring.

If you're interested in learning more about the basics of Python, coding, and software development, check out our Coding Essentials Guidebook for Developers, where we cover the essential languages, concepts, and tools that you'll need to become a professional developer.

Another way to boost your coding skills is to check out a platform like YoungWonks, a top after-school coding program for kids and teens. This is a WASC-accredited program that offers kids a world-class introduction to Python. Students here get started with real syntactical computer programming and gain conceptual depth of a college level program with a kid-friendly curriculum.

Thanks and happy coding! We hope you enjoyed this article. If you have any questions or comments, feel free to reach out to jacob@initialcommit.io.