Remove non-alphanumeric characters from a Python string | bobbyhadz (2024)

# Table of Contents

  1. Remove non-alphanumeric characters from a Python string
  2. Remove all non-alphabetic characters from String in Python

# Remove non-alphanumeric characters from a Python string

Use the re.sub() method to remove all non-alphanumeric characters from a string.

The re.sub() method will remove all non-alphanumeric characters from the string by replacing them with empty strings.

```python
import re

my_str = 'bobby !hadz@ com 123'

# ✅ Remove all non-alphanumeric characters from string
new_str = re.sub(r'[\W_]', '', my_str)
print(new_str)  # 👉️ 'bobbyhadzcom123'

# -----------------------------------------------

# ✅ Remove all non-alphanumeric characters from string,
# preserving whitespace
new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com 123'
```

The code for this article is available on GitHub

If you need to remove the non-alphabetic characters from a string, click on the following subheading.

  • Remove all non-alphabetic characters from String in Python

The example uses the re.sub() method to remove all non-alphanumeric characters from a string.

The re.sub() method returns a new string that is obtained by replacing the occurrences of the pattern with the provided replacement.

If the pattern isn't found, the string is returned as is.
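A quick way to confirm this: when nothing in the string matches the pattern, re.sub() gives back an equal string. The related re.subn() function additionally reports how many replacements were made. The sample strings are made up for illustration.

```python
import re

# Nothing in this string matches [\W_], so no replacement happens
clean = re.sub(r'[\W_]', '', 'bobby123')
print(clean)  # 👉️ 'bobby123'

# re.subn() returns a (new_string, number_of_replacements) tuple
result, count = re.subn(r'[\W_]', '', 'bobby!hadz@')
print(result, count)  # 👉️ bobbyhadz 2
```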

The first argument we passed to the re.sub() method is a regular expression.

The square brackets [] are used to indicate a set of characters.

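The \W (capital W) special character matches any character that is not a word character. Python counts the underscore as a word character, so we also add _ to the set to remove underscores.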

We remove all non-alphanumeric characters by replacing each with an empty string.
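Comparing \W on its own with [\W_] on a made-up string that contains an underscore shows why the extra _ is part of the set.

```python
import re

my_str = 'bobby_hadz!com 123'

# \W alone: '!' and the space are removed, the underscore survives
only_w = re.sub(r'\W', '', my_str)
print(only_w)  # 👉️ 'bobby_hadzcom123'

# [\W_]: the underscore is removed as well
w_and_underscore = re.sub(r'[\W_]', '', my_str)
print(w_and_underscore)  # 👉️ 'bobbyhadzcom123'
```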

# Remove non-alphanumeric characters but preserve the whitespace

If you want to preserve the whitespace and remove all non-alphanumeric characters, use the following regular expression.

```python
import re

my_str = 'bobby !hadz@ com 123'

new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com 123'
```


The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are NOT word characters (letters, digits and underscores) or whitespace characters.
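To check what a negated set will actually remove, you can pass the same pattern to re.findall(), which returns every match as a list.

```python
import re

my_str = 'bobby !hadz@ com 123'

# Characters that are neither word characters nor whitespace
removed = re.findall(r'[^\w\s]', my_str)
print(removed)  # 👉️ ['!', '@']
```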

The \w character is the opposite of the \W character and matches:

  • characters that can be part of a word in any language
  • numbers
  • the underscore character
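Because \w is Unicode-aware for str patterns in Python 3, accented letters are kept. If you only want ASCII letters and digits to count, the re.ASCII flag restricts \w (and \s) to their ASCII meanings. The sample string is made up.

```python
import re

my_str = 'café! naïve? 123'

# Default (Unicode) matching keeps the accented letters
unicode_result = re.sub(r'[^\w\s]', '', my_str)
print(unicode_result)  # 👉️ 'café naïve 123'

# With re.ASCII, \w only matches [a-zA-Z0-9_], so 'é' and 'ï' are removed
ascii_result = re.sub(r'[^\w\s]', '', my_str, flags=re.ASCII)
print(ascii_result)  # 👉️ 'caf nave 123'
```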

The \s character matches Unicode whitespace characters like [ \t\n\r\f\v].
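Since \s covers more than the space character, tabs and newlines also survive [^\w\s]. A made-up string containing both shows this.

```python
import re

my_str = 'bobby\t!hadz@\ncom 123'

# Tabs and newlines are whitespace, so they are preserved
new_str = re.sub(r'[^\w\s]', '', my_str)
print(repr(new_str))  # 👉️ 'bobby\thadz\ncom 123'
```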

If you ever need help reading or writing a regular expression, consult the regular expression syntax subheading in the official docs.

The page contains a list of all of the special characters with many useful examples.

If your string has multiple spaces next to one another, you might have to replace multiple consecutive spaces with a single space.

```python
import re

my_str = 'bobby   !hadz@  com   123'

new_str = re.sub(r'[^\w\s]', '', my_str)
print(new_str)  # 👉️ 'bobby   hadz  com   123'

# Collapse runs of whitespace into a single space
result = ' '.join(new_str.split())
print(result)  # 👉️ 'bobby hadz com 123'
```

When called without arguments, the str.split() method splits the string on runs of one or more whitespace characters, and we join the list of strings with a single space separator.
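Calling str.split() without arguments also discards leading and trailing whitespace, so the joined result never starts or ends with a space. If you prefer to stay with regular expressions, re.sub(r'\s+', ' ', ...) followed by str.strip() is a comparable alternative. The sample string is made up.

```python
import re

my_str = '  bobby   hadz \t com  '

# split() with no separator drops empty strings, including at both ends
with_split = ' '.join(my_str.split())
print(repr(with_split))  # 👉️ 'bobby hadz com'

# Regex alternative: collapse whitespace runs, then trim the ends
with_regex = re.sub(r'\s+', ' ', my_str).strip()
print(repr(with_regex))  # 👉️ 'bobby hadz com'
```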

Alternatively, you can use a generator expression.

# Remove non-alphanumeric characters from a string using a generator expression

This is a three-step process:

  1. Use a generator expression to iterate over the string.
  2. Use the str.isalnum() method to check if each character is alphanumeric.
  3. Use the str.join() method to join the alphanumeric characters.

```python
my_str = 'bobby !hadz@ com 123'

new_str = ''.join(char for char in my_str if char.isalnum())
print(new_str)  # 👉️ 'bobbyhadzcom123'

new_str = ''.join(char for char in my_str if char.isalnum() or char == ' ')
print(new_str)  # 👉️ 'bobby hadz com 123'
```


We used a generator expression to iterate over the string.

Generator expressions are used to perform some operation for every element or select a subset of elements that meet a condition.
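The generator expression in the example above is shorthand for a loop that yields each character passing the test. The explicit version below only illustrates what happens on each iteration; the helper name only_alnum is hypothetical.

```python
def only_alnum(text):
    """Hypothetical helper: yields the alphanumeric characters of text."""
    for char in text:
        if char.isalnum():
            yield char


my_str = 'bobby !hadz@ com 123'

# Both produce the same characters in the same order
from_loop = ''.join(only_alnum(my_str))
from_genexp = ''.join(char for char in my_str if char.isalnum())
print(from_loop == from_genexp)  # 👉️ True
```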

On each iteration, we use the str.isalnum() method to check if the current character is alphanumeric and return the result.

The str.isalnum() method returns True if all characters in the string are alphanumeric and the string contains at least one character; otherwise, the method returns False.

```python
print('A'.isalnum())  # 👉️ True
print('!'.isalnum())  # 👉️ False
print('5'.isalnum())  # 👉️ True
```
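Because str.isalnum() checks the whole string, a single failing character makes it return False, and it also returns False for an empty string. It follows Unicode rules, so accented letters and some non-ASCII numeric characters count as alphanumeric.

```python
print('abc123'.isalnum())  # 👉️ True
print('abc 123'.isalnum())  # 👉️ False (the space fails)
print(''.isalnum())  # 👉️ False (no characters)
print('é'.isalnum())  # 👉️ True
print('²'.isalnum())  # 👉️ True (superscript two is numeric)
```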

The generator object contains only alphanumeric characters.

The last step is to use the str.join() method to join the alphanumeric characters into a string.

```python
my_str = 'bobby !hadz@ com 123'

new_str = ''.join(char for char in my_str if char.isalnum())
print(new_str)  # 👉️ 'bobbyhadzcom123'
```

The str.join() method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.

The string the method is called on is used as the separator between the elements.

For our purposes, we call the join() method on an empty string to join the alphanumeric characters without a separator.
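The separator can be any string. Calling join() on a non-empty string shows where it is placed. join() also requires every item to be a string, which is why the numbers below are converted first.

```python
chars = ['b', 'o', 'b']

joined = ''.join(chars)
dashed = '-'.join(chars)
print(joined)  # 👉️ 'bob'
print(dashed)  # 👉️ 'b-o-b'

numbers = [1, 2, 3]
# ''.join(numbers) would raise TypeError, so convert each item to str first
digits = ''.join(str(n) for n in numbers)
print(digits)  # 👉️ '123'
```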

If you want to remove the non-alphanumeric characters and preserve the spaces, use the boolean or operator.

```python
my_str = 'bobby !hadz@ com 123'

new_str = ''.join(
    char for char in my_str if char.isalnum() or char == ' '
)
print(new_str)  # 👉️ 'bobby hadz com 123'
```

We used the boolean or operator, so a character is yielded by the generator if at least one of the conditions is met.

The character has to be alphanumeric or it has to be a space.
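Note that char == ' ' keeps only the regular space character. If the string can contain tabs or newlines that you also want to keep, str.isspace() is one way to cover all whitespace. The sample string is made up.

```python
my_str = 'bobby\t!hadz@\ncom 123'

# Keeps only ' ': the tab and the newline are dropped
spaces_only = ''.join(c for c in my_str if c.isalnum() or c == ' ')
print(repr(spaces_only))  # 👉️ 'bobbyhadzcom 123'

# Keeps every whitespace character
all_whitespace = ''.join(c for c in my_str if c.isalnum() or c.isspace())
print(repr(all_whitespace))  # 👉️ 'bobby\thadz\ncom 123'
```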

# Remove non-alphanumeric characters from a string using filter()

You can also use the filter() function to remove all non-alphanumeric characters from a string.

```python
my_str = 'bobby !hadz@ com 123'

new_str = ''.join(filter(str.isalnum, my_str))
print(new_str)  # 👉️ bobbyhadzcom123
```


The filter() function takes a function and an iterable as arguments and constructs an iterator from the elements of the iterable for which the function returns a truthy value.

We passed the str.isalnum method to filter(), so the method gets called with each character in the string.

The filter() function returns an iterator containing only the characters for which the str.isalnum() method returned True.

The last step is to use the str.join() method to join the filter object into a string.
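filter() accepts any function that returns a truthy or falsy value, so a lambda can combine conditions the same way the generator expression did, for example to keep spaces as well. This is a minimal sketch using the same sample string.

```python
my_str = 'bobby !hadz@ com 123'

# Keep a character if it is alphanumeric or a space
new_str = ''.join(filter(lambda char: char.isalnum() or char == ' ', my_str))
print(new_str)  # 👉️ 'bobby hadz com 123'
```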

# Remove all non-alphabetic characters from String in Python

The re.sub() method can also be used to remove all non-alphabetic characters from a string.

```python
import re

my_str = 'bobby! hadz@ com'

# ✅ Remove all non-alphabetic characters from string (re.sub())
new_str = re.sub(r'[^a-zA-Z]', '', my_str)
print(new_str)  # 👉️ 'bobbyhadzcom'

# -----------------------------------------------------

# ✅ Remove all non-alphabetic characters from string, preserving whitespace
new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com'
```


The example uses the re.sub() method to remove all non-alphabetic characters from a string.

The re.sub() method returns a new string that is obtained by replacing the occurrences of the pattern with the provided replacement.

```python
import re

my_str = 'bobby! hadz@ com'

new_str = re.sub(r'[^a-zA-Z]', '', my_str)
print(new_str)  # 👉️ 'bobbyhadzcom'

new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com'
```

If the pattern isn't found, the string is returned as is.

The first argument we passed to the re.sub() method is a regular expression.

The square brackets [] are used to indicate a set of characters.

The caret ^ at the beginning of the set means "NOT". In other words, match all characters that are NOT letters.

The a-z and A-Z ranges match the lowercase and uppercase ASCII letters, so accented and other non-ASCII letters are treated as non-alphabetic and removed.
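To see the practical effect of these ASCII ranges, compare them with str.isalpha(), which follows Unicode and is covered later in this article. The sample string is made up.

```python
import re

my_str = 'café 123!'

# The ASCII ranges drop 'é' along with the space, the digits and '!'
ascii_only = re.sub(r'[^a-zA-Z]', '', my_str)
print(ascii_only)  # 👉️ 'caf'

# str.isalpha() treats 'é' as a letter
unicode_letters = ''.join(filter(str.isalpha, my_str))
print(unicode_letters)  # 👉️ 'café'
```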

# Remove all non-alphabetic characters, but preserve the whitespace

If you need to remove all non-alphabetic characters and preserve the whitespace, use the following regular expression.

```python
import re

my_str = 'bobby! hadz@ com'

new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 👉️ 'bobby hadz com'
```


The \s character matches Unicode whitespace characters like [ \t\n\r\f\v].

In its entirety, the regular expression matches every character that is neither a letter nor whitespace, and re.sub() replaces each of those characters with an empty string.
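If you apply the same pattern to many strings, you can compile it once with re.compile() and reuse the pattern object's sub() method. The helper name letters_and_whitespace and the sample strings are hypothetical.

```python
import re

# Compiled once: matches anything that is not an ASCII letter or whitespace
NON_LETTER_OR_SPACE = re.compile(r'[^a-zA-Z\s]')


def letters_and_whitespace(text):
    """Hypothetical helper that keeps only ASCII letters and whitespace."""
    return NON_LETTER_OR_SPACE.sub('', text)


results = [letters_and_whitespace(s) for s in ['bobby! hadz@ com', 'abc 123']]
print(results)  # 👉️ ['bobby hadz com', 'abc ']
```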

If you ever need help reading or writing a regular expression, consult the regular expression syntax subheading in the official docs.

The page contains a list of all of the special characters with many useful examples.

If your string has multiple spaces next to one another, you might have to replace multiple consecutive spaces with a single space.

```python
import re

my_str = 'bobby!   hadz@   com'

new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str)  # 👉️ 'bobby   hadz   com'

# Collapse runs of whitespace into a single space
result = ' '.join(new_str.split())
print(result)  # 👉️ 'bobby hadz com'
```

When called without arguments, the str.split() method splits the string on runs of one or more whitespace characters, and we join the list of strings with a single space separator.

Alternatively, you can use a generator expression.

# Remove all non-alphabetic characters from String using a generator expression

This is a three-step process:

  1. Use a generator expression to iterate over the string.
  2. Use the str.isalpha() method to check if each character is alphabetic.
  3. Use the str.join() method to join the alphabetic characters.

```python
my_str = 'bobby! hadz@ com'

new_str = ''.join(char for char in my_str if char.isalpha())
print(new_str)  # 👉️ 'bobbyhadzcom'

new_str = ''.join(char for char in my_str if char.isalpha() or char == ' ')
print(new_str)  # 👉️ 'bobby hadz com'
```


We used a generator expression to iterate over the string.

Generator expressions are used to perform some operation for every element or select a subset of elements that meet a condition.

On each iteration, we use the str.isalpha() method to check if the current character is alphabetic and we return the result.

The str.isalpha() method returns True if all characters in the string are alphabetic and there is at least one character; otherwise, the method returns False.

```python
print('H'.isalpha())  # 👉️ True
print('@'.isalpha())  # 👉️ False
```
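Like str.isalnum(), str.isalpha() looks at the whole string: a digit, a space or an empty string makes it return False, while non-ASCII letters return True.

```python
print('bobby'.isalpha())  # 👉️ True
print('bobby1'.isalpha())  # 👉️ False (contains a digit)
print('bo bby'.isalpha())  # 👉️ False (contains a space)
print(''.isalpha())  # 👉️ False (no characters)
print('ß'.isalpha())  # 👉️ True
```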

The generator object contains only alphabetic characters.

```python
my_str = 'bobby! hadz@ com'

new_str = ''.join(char for char in my_str if char.isalpha())
print(new_str)  # 👉️ 'bobbyhadzcom'
```

The last step is to use the str.join() method to join the alphabetic characters into a string.

The str.join() method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.

The string the method is called on is used as the separator between the elements.

For our purposes, we call the join() method on an empty string to join the alphabetic characters without a separator.

If you want to remove the non-alphabetic characters and preserve the spaces, use the boolean or operator.

```python
my_str = 'bobby! hadz@ com'

new_str = ''.join(
    char for char in my_str if char.isalpha() or char == ' '
)
print(new_str)  # 👉️ 'bobby hadz com'
```

We used the boolean or operator, so a character is yielded by the generator if at least one of the conditions is met.

The character has to be alphabetic or it has to be a space.

# Remove all non-alphabetic characters from String using filter()

This is a three-step process:

  1. Pass the str.isalpha() method and the string to the filter() function.
  2. The str.isalpha() method will filter out all non-letter characters.
  3. Use the str.join() method to join the result into a string.

```python
a_string = 'bobby123hadz456.com'

only_letters = ''.join(filter(str.isalpha, a_string))
print(only_letters)  # 👉️ bobbyhadzcom
```


The filter() function takes a function and an iterable as arguments and constructs an iterator from the elements of the iterable for which the function returns a truthy value.

We passed the str.isalpha() method to the filter() function.

The str.isalpha() method gets called with each character in the string and returns True if the character is a letter.
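filter() returns a lazy iterator rather than a list, so it can be consumed only once. Joining the same filter object a second time yields an empty string.

```python
a_string = 'bobby123hadz456.com'

letters = filter(str.isalpha, a_string)
first = ''.join(letters)
second = ''.join(letters)  # the iterator is already exhausted

print(first)  # 👉️ bobbyhadzcom
print(repr(second))  # 👉️ ''
```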

The last step is to use the str.join() method to join all matching characters into a string.

Which approach you pick is a matter of personal preference. I'd use the str.isalpha() method with a generator expression because the approach is quite direct and intuitive.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

  • Remove non-ASCII characters from a string in Python
  • Remove the non utf-8 characters from a String in Python
  • Remove punctuation from a List of strings in Python
  • How to remove Quotes from a List of Strings in Python
  • Remove characters matching Regex from a String in Python
  • Remove special characters except Space from String in Python
  • Remove square brackets from a List or a String in Python
  • How to Remove the Tabs from a String in Python
  • Remove Newline characters from a List or a String in Python
