- Python regular expression cheat sheet (borrowed from pythex). Special characters: \ escapes special characters. . matches any character. ^ matches the beginning of the string. $ matches the end of the string. [5b-d] matches any of the characters '5', 'b', 'c' or 'd'. [^a-c6] matches any character except 'a', 'b', 'c' or '6'.
- A regular expression (or RE) specifies a set of strings that matches it; the functions in this module let you check if a particular string matches a given regular expression. This blog post gives an overview and examples of regular expression syntax as implemented by the re built-in module (Python 3.8+).
- Example 1: Extract all characters from the paragraph using a Python regular expression. import re; pattern = r'.'; pattern_regex = re.compile(pattern); result = pattern_regex.findall(para); print(result) # output: ['G', 'a', 'm', 'e', ' ', 'o', 'f', ' ', 'T', 'h', 'r', 'o', 'n', 'e', 's', ' ', 'i', ...]
- Match: re.match() takes two arguments, a pattern and a string. If the pattern matches at the beginning of the string, it returns a match object. Otherwise, it returns None. For example, print(re.match('center', 'centre')) prints None, because 'centre' does not start with 'center'. Pythex is a quick way to test your Python regular expressions.
I’m sitting in front of my computer refactoring Python code and have just thought of the following question:
Can You Use a Regular Expression with the Python string.startswith() Method?
The short answer is no. The string.startswith() method doesn’t allow regular expression inputs. And you don’t need it because regular expressions can already check if a string starts with a pattern using the re.match(pattern, string) function from the re module.
In fact, shortly after asking the question, I realized that using a regex with the startswith() method doesn’t make sense. Why? If you want to use regular expressions, use the re module. Regular expressions are far more expressive than the startswith() method!
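For example, to check whether a string starts with 'hello', you’d use the regex 'hello.*' with re.match(). Because re.match() only matches at the beginning of the string, the shorter pattern 'hello' works just as well. Now you don’t need the startswith() method anymore because the regex already takes care of that.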
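As a minimal sketch of that comparison (the sample string is made up for illustration), the two prefix checks can be placed side by side:

```python
import re

text = "hello world"  # illustrative sample string

# Plain prefix check with the string method.
starts_plain = text.startswith("hello")

# re.match() only matches at the beginning of the string,
# so the pattern 'hello' on its own already acts as a prefix test.
starts_regex = re.match("hello", text) is not None

# 'world' occurs in the string, but not at the start, so re.match() returns None.
not_at_start = re.match("world", text) is None

print(starts_plain, starts_regex, not_at_start)
```

Both prefix checks agree here; the regex version only pays off once the prefix is a pattern rather than a fixed string.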
How Does the Python startswith() Method Work?
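Here’s an overview of the string.startswith() method: string.startswith(prefix, start, end) returns True if the string begins with prefix and False otherwise. The start and end arguments are optional indices.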
Let’s look at some examples using the Python startswith() method. In each one, I will modify the code to show different use cases. Let’s start with the most basic scenario.
Python startswith() — Most Basic Example
Suppose you have a list of strings where each string is a tweet.
Let’s say you work in the coffee industry and you want to get all tweets that start with the string 'coffee'. We’ll use the startswith() method with a single argument:
There is only one tweet in our dataset that starts with the string 'coffee'. So that is the only one printed out.
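The original snippet is not reproduced here, so the following is a sketch under assumptions: the tweets list is illustrative (only 'i like coffee' and 'to thine own self be true' are mentioned in the text), and exactly one entry starts with 'coffee'.

```python
# Hypothetical dataset; the middle tweet is an assumption for illustration.
tweets = [
    "to thine own self be true",
    "coffee break python",
    "i like coffee",
]

# Keep only the tweets whose text begins with 'coffee'.
coffee_tweets = [tweet for tweet in tweets if tweet.startswith("coffee")]

for tweet in coffee_tweets:
    print(tweet)
```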
Python startswith() — Optional Arguments
The startswith() method has two optional arguments: start and end. You can use these to define a range of indices to check. By default startswith checks the entire string. Let’s look at some examples.
The start argument tells startswith() where to begin searching. The default value is 0, i.e. it begins at the start of the string. So, the following code outputs the same result as above:
What happens if we set start=7?
Why does it print 'i like coffee'? By calling the find() method, we see that the substring 'coffee' begins at index 7.
Hence, when checking tweet.startswith('coffee', 7) for the tweet 'i like coffee', the result is True.
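The start=7 case can be reproduced with a short sketch. The tweets list is again an assumption for illustration; only 'i like coffee' comes from the text.

```python
# Hypothetical dataset used for illustration.
tweets = [
    "to thine own self be true",
    "coffee break python",
    "i like coffee",
]

# find() returns the index at which the substring begins.
index_of_coffee = "i like coffee".find("coffee")

# With start=7, the prefix test is applied to tweet[7:] instead of the whole tweet.
from_index_7 = [tweet for tweet in tweets if tweet.startswith("coffee", 7)]

print(index_of_coffee)
print(from_index_7)
```

Note that 'coffee break python' no longer qualifies, because from index 7 onwards it reads 'break python'.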
Let’s add another argument – the end index – to the last snippet:
Nothing is printed to the console. This is because we are only searching over 2 characters, beginning from index 7 (inclusive) and ending at index 9 (exclusive). But we are searching for 'coffee' and it is 6 characters long. As 6 > 2, startswith() returns False for every tweet, so nothing is printed.
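A small sketch of the start and end combination, using the same illustrative (assumed) tweets list:

```python
# Hypothetical dataset used for illustration.
tweets = [
    "to thine own self be true",
    "coffee break python",
    "i like coffee",
]

# Only the slice tweet[7:9] is inspected, which is two characters long.
window = "i like coffee"[7:9]

# A six-character prefix can never fit into a two-character window.
in_window = [tweet for tweet in tweets if tweet.startswith("coffee", 7, 9)]

print(repr(window))
print(in_window)
```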
Now that you know everything about Python’s startswith method, let’s go back to our original question:
Can You Use a Regular Expression with the Python startswith() Method?
No. The startswith method does not accept regular expressions. You can only search for a string (or, as shown later, a tuple of strings).
A regular expression can describe an infinite set of matching strings. For example, 'A.*' matches every string starting with 'A'. Matching against such a pattern can be more expensive than a plain prefix comparison. So, for performance reasons, it is plausible that startswith() doesn’t accept regular expressions.
Instead, you can use the re.match() method:
re.match()
The re.match(pattern, string) method returns a match object if the pattern matches at the beginning of the string. The match object contains useful information such as the matching groups and the matching positions. An optional argument flags allows you to customize the regex engine, for example to ignore capitalization.
Specification: re.match(pattern, string, flags=0)
The re.match() method has up to three arguments.
- pattern: the regular expression pattern that you want to match.
- string: the string which you want to search for the pattern.
- flags (optional argument): a more advanced modifier that allows you to customize the behavior of the function.
Return Value:
The re.match() method returns a match object if the pattern matches at the beginning of the string. Otherwise, it returns None.
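To make the specification concrete, here is a hedged sketch that calls re.match() with all three arguments. The pattern and the sample strings are made up for illustration.

```python
import re

# flags=re.IGNORECASE lets the lowercase pattern match the capital 'C'.
match = re.match(r"coff\w*", "Coffee break python", flags=re.IGNORECASE)

matched_text = match.group()   # the substring that matched
matched_span = match.span()    # (start, end) positions of the match

# 'break' appears in the string, but not at the beginning, so the result is None.
no_match = re.match("break", "coffee break python")

print(matched_text, matched_span, no_match)
```

Because a failed match is None, it is usually safest to test the result before calling group() on it.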
But is it also true that startswith only accepts a single string as argument? Not at all. It is possible to do the following:
Python startswith() Tuple – Check For Multiple Strings
This snippet prints all strings that start with either 'coffee' or 'i'. It is pretty efficient too. Unfortunately, you can only check a finite set of arguments. If you need to check an infinite set, you cannot use this method.
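The tuple call described above might look like the following sketch. The tweets list is an assumption for illustration.

```python
# Hypothetical dataset used for illustration.
tweets = [
    "to thine own self be true",
    "coffee break python",
    "i like coffee",
]

# Passing a tuple checks each prefix in turn and returns True if any of them fits.
selected = [tweet for tweet in tweets if tweet.startswith(("coffee", "i"))]

for tweet in selected:
    print(tweet)
```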
What Happens If I Pass A Regular Expression To startswith()?
Let’s check whether a tweet starts with any version of the 'coffee' string. In other words, we want to apply the regex 'coff*' so that we match strings like 'coffee', 'coffees' and 'coffe'.
This doesn’t work. In regular expressions, * is a quantifier that means zero or more repetitions of the preceding character. But in the startswith() method, it is just the literal star character '*'. Since none of the tweets start with the literal string 'coff*', Python prints nothing to the screen.
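A sketch of that failed attempt, with a tweets list assumed for illustration (chosen to contain the 'coffee', 'coffees' and 'coffe' variants named above):

```python
# Hypothetical dataset used for illustration.
tweets = [
    "to thine own self be true",
    "coffee break python",
    "coffees are awesome",
    "coffe is cool",
]

# startswith() treats 'coff*' as five literal characters, including the star.
literal_hits = [tweet for tweet in tweets if tweet.startswith("coff*")]

print(literal_hits)
```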
So you might ask:
What Are The Alternatives to Using Regular Expressions in startswith()?
There is one alternative that is simple and clean: use the re module. This is Python’s built-in module built to work with regular expressions.
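A minimal sketch of the re-based version follows. The tweets list is an assumption for illustration, not necessarily the original dataset.

```python
import re

# Hypothetical dataset used for illustration.
tweets = [
    "to thine own self be true",
    "coffee break python",
    "coffees are awesome",
    "coffe is cool",
]

# re.match() returns a match object (truthy) or None (falsy),
# so it can be used directly as a filter condition.
regex_hits = [tweet for tweet in tweets if re.match("coff*", tweet)]

for tweet in regex_hits:
    print(tweet)
```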
Success! We’ve now printed all the tweets we expected. That is, all tweets that start with “cof” followed by any number of “f” characters. Whatever comes after that prefix is ignored by re.match().
Note that this approach is generally slower than startswith(), because evaluating a regular expression is more expensive than a plain prefix comparison. But the clarity of the code has improved and we got the result we wanted. Slow and successful is better than fast and unsuccessful.
The function re.match() takes two required arguments. First, the regular expression to be matched. Second, the string you want to search. If the pattern matches at the beginning of the string, it returns a match object, which is truthy. If not, it returns None, which is falsy. In this case, it returns None for “to thine own self be true” and a match object for the rest.
So let’s summarize the article.
Summary: Can You Use a Regular Expression with the Python startswith Method?
No, you cannot use a regular expression with the Python startswith function. But you can use the Python regular expression module re instead. It’s as simple as calling the function re.match(s1, s2). This looks for the regular expression s1 at the beginning of the string s2.
Python Startswith() List
Given that we can pass a tuple to startswith(), what happens if we pass a list?
Python raises a TypeError. We can only pass a string or a tuple of strings to startswith(). So if we have a list of prefixes we want to check, we can call tuple() before passing it to startswith.
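The following sketch shows both the error and the tuple() workaround. The prefixes and the sample tweet are illustrative assumptions.

```python
prefixes = ["coffee", "i"]          # a list, not a tuple
tweet = "coffee break python"       # illustrative sample string

# Passing the list directly raises a TypeError.
try:
    tweet.startswith(prefixes)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Converting the list to a tuple first makes the call valid.
works_with_tuple = tweet.startswith(tuple(prefixes))

print(raised_type_error, works_with_tuple)
```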
This works well and is fine performance wise. Yet, one of Python’s key features is its flexibility. So is it possible to get the same outcome without changing our list of letters to a tuple? Of course it is!
We have two options:
- any + list comprehension
- any + map
The any() function is a way to combine logical or statements together. It takes one argument: an iterable of conditions. So instead of writing a chain such as cond1 or cond2 or cond3, we write any([cond1, cond2, cond3]).
This is much nicer to read and is especially useful if you are combining many conditions. We can improve this by first creating a list of conditions and passing this to any().
Alternatively, we can use map instead of a list comprehension
Both have the same outcome. We personally prefer list comprehensions and think they are more readable. But choose whichever you prefer.
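Both options can be sketched as follows. The tweets and prefixes are assumptions for illustration.

```python
# Hypothetical data used for illustration.
tweets = [
    "to thine own self be true",
    "coffee break python",
    "i like coffee",
]
prefixes = ["coffee", "i"]  # stays a list, no tuple() conversion needed

# Option 1: any() over a list comprehension of per-prefix checks.
with_comprehension = [
    tweet for tweet in tweets
    if any([tweet.startswith(prefix) for prefix in prefixes])
]

# Option 2: any() over map(), applying the bound method to each prefix.
with_map = [
    tweet for tweet in tweets
    if any(map(tweet.startswith, prefixes))
]

print(with_comprehension)
print(with_map)
```

The map() variant is lazy, so any() can stop at the first prefix that fits, whereas the list comprehension evaluates every prefix first.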
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He’s author of the popular programming book Python One-Liners (NoStarch 2020), coauthor of the Coffee Break Python series of self-published books, computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.
By Alex Yang, Dataquest
This cheat sheet is based on Python 3’s documentation on regular expressions. If you're interested in learning Python, we have a free Python Programming: Beginner course for you to try out.
Special Characters
^
| Matches the expression to its right at the start of a string. In multi-line mode, it also matches at the start of each line, that is, right after each \n in the string.
$
| Matches the expression to its left at the end of a string (or just before a trailing \n). In multi-line mode, it also matches before each \n in the string.
.
| Matches any character except line terminators like \n.
\
| Escapes special characters or denotes character classes.
A|B
| Matches expression A
or B
. If A
is matched first, B
is left untried.
+
| Greedily matches the expression to its left 1 or more times.
*
| Greedily matches the expression to its left 0 or more times.
?
| Greedily matches the expression to its left 0 or 1 times. But if ? is added to a qualifier (+, *, or ? itself), that qualifier matches in a non-greedy manner.
{m}
| Matches the expression to its left exactly m times.
{m,n}
| Matches the expression to its left m to n times, and not less, preferring as many repetitions as possible.
{m,n}?
| Matches the expression to its left m to n times, preferring as few repetitions as possible (non-greedy). See ? above.
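The difference between greedy and non-greedy qualifiers is easiest to see side by side. This is a small sketch with made-up sample strings:

```python
import re

html = "<a><b>"   # illustrative sample string

# Greedy: .+ grabs as much as possible, so the whole string is one match.
greedy = re.findall(r"<.+>", html)

# Non-greedy: .+? stops at the first closing bracket it can.
lazy = re.findall(r"<.+?>", html)

# The same idea applies to {m,n} and {m,n}?.
greedy_range = re.findall(r"a{2,3}", "aaaa")
lazy_range = re.findall(r"a{2,3}?", "aaaa")

print(greedy, lazy, greedy_range, lazy_range)
```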
Character Classes (a.k.a. Special Sequences)
\w
| Matches alphanumeric characters, which means a-z, A-Z, and 0-9. It also matches the underscore, _.
\d
| Matches digits, which means 0-9.
\D
| Matches any non-digits.
\s
| Matches whitespace characters, which include the \t, \n, \r, and space characters.
\S
| Matches non-whitespace characters.
\b
| Matches the boundary (or empty string) at the start and end of a word, that is, between \w and \W.
\B
| Matches where \b does not, that is, at positions that are not a word boundary.
\A
| Matches the expression to its right at the absolute start of a string whether in single or multi-line mode.
\Z
| Matches the expression to its left at the absolute end of a string whether in single or multi-line mode.
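A few of these special sequences in action, as a sketch with made-up input strings:

```python
import re

# \d+ collects runs of digits.
digits = re.findall(r"\d+", "Order 66 shipped 2 items")

# \b restricts the match to whole words, so 'concat' is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# \s+ splits on any run of whitespace (spaces, tabs, newlines).
tokens = re.split(r"\s+", "a b\t c\nd")

# In multi-line mode ^ matches at every line start, while \A stays at the very beginning.
line_starts = re.findall(r"^\w+", "one\ntwo", flags=re.MULTILINE)
string_start = re.findall(r"\A\w+", "one\ntwo", flags=re.MULTILINE)

print(digits, whole_words, tokens, line_starts, string_start)
```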
Sets
[ ]
| Contains a set of characters to match.
[amk]
| Matches either a
, m
, or k
. It does not match amk
.
[a-z]
| Matches any lowercase letter from a to z.
[a\-z]
| Matches a, -, or z. It matches - because \ escapes it.
[a-]
| Matches a
or -
, because -
is not being used to indicate a series of characters.
[-a]
| As above, matches a
or -
.
[a-z0-9]
| Matches characters from a
to z
and also from 0
to 9
.
[(+*)]
| Special characters become literal inside a set, so this matches (
, +
, *
, and )
.
[^ab5]
| Adding ^
excludes any character in the set. Here, it matches characters that are not a
, b
, or 5
.
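The set rules above can be checked with a short sketch. The input strings are made up for illustration.

```python
import re

# A range: every lowercase letter, one character at a time.
letters = re.findall(r"[a-z]", "a-Z b")

# An escaped hyphen is a literal, so only 'a', '-' and 'z' qualify.
escaped_hyphen = re.findall(r"[a\-z]", "a-z b")

# Special characters lose their meaning inside a set.
literals = re.findall(r"[(+*)]", "(1+2)*3")

# A leading ^ negates the set.
negated = re.findall(r"[^ab5]", "ab5c")

print(letters, escaped_hyphen, literals, negated)
```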
Groups
( )
| Matches the expression inside the parentheses and groups it.
(? )
| Inside parentheses like this, ?
acts as an extension notation. Its meaning depends on the character immediately to its right.
(?P<name>AB)
| Matches the expression AB, and it can be accessed with the group name.
(?aiLmsux)
| Here, a, i, L, m, s, u, and x are flags:
a: Matches ASCII only
i: Ignore case
L: Locale dependent
m: Multi-line
s: Matches all (the dot also matches newlines)
u: Matches unicode
x: Verbose
(?:A)
| Matches the expression as represented by A, but unlike (?P<name>AB), it cannot be retrieved afterwards.
(?#...)
| A comment. Contents are for us to read, not for matching.
A(?=B)
| Lookahead assertion. This matches the expression A
only if it is followed by B
.
A(?!B)
| Negative lookahead assertion. This matches the expression A
only if it is not followed by B
.
(?<=B)A
| Positive lookbehind assertion. This matches the expression A only if B is immediately to its left. B must be a fixed-length expression.
(?<!B)A
| Negative lookbehind assertion. This matches the expression A only if B is not immediately to its left. B must be a fixed-length expression.
(?P=name)
| Matches the expression matched by an earlier group named “name”.
(...)\1
| The number 1 corresponds to the first group to be matched. If we want to match another instance of the same text that the group matched, we simply use \1 instead of writing out the whole expression again. We can use from 1 up to 99 such groups and their corresponding numbers.
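Here is a hedged sketch of named groups, lookarounds, and backreferences. The group names and sample strings are chosen only for illustration.

```python
import re

# Named groups can be read back by name.
date = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-05-17")
year, month = date.group("year"), date.group("month")

# Lookahead: words that are followed by '!', without consuming the '!'.
excited = re.findall(r"\w+(?=!)", "hi there! wow!")

# Lookbehind: digits that come directly after a dollar sign.
prices = re.findall(r"(?<=\$)\d+", "cost $40 or 50")

# Backreference: \1 must repeat exactly what group 1 matched.
repeated = re.search(r"(\w+) \1", "it is is fine").group()

print(year, month, excited, prices, repeated)
```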
Popular Python re module Functions
re.findall(A, B)
| Matches all instances of an expression A
in a string B
and returns them in a list.
re.search(A, B)
| Matches the first instance of an expression A
in a string B
, and returns it as a re match object.
re.split(A, B)
| Split a string B into a list using the delimiter A
.
re.sub(A, B, C)
| Replace A
with B
in the string C
.
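The four functions can be tried out with a short sketch. The patterns and input strings are illustrative.

```python
import re

text = "a1b22c333"  # illustrative sample string

# findall: every non-overlapping match, as a list of strings.
all_numbers = re.findall(r"\d+", text)

# search: the first match anywhere in the string, as a match object.
first_number = re.search(r"\d+", text).group()

# split: the pattern acts as the delimiter.
parts = re.split(r",\s*", "a, b,c")

# sub: replace every match of the pattern with the replacement string.
squeezed = re.sub(r"\s+", " ", "a   b \n c")

print(all_numbers, first_number, parts, squeezed)
```

Unlike re.match(), re.search() scans the whole string, so it finds '1' even though the string starts with a letter.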
Bio: Alex Yang is a writer fascinated by the things code can do. He also enjoys citizen science and new media art.
Original. Reposted with permission.