REGEXEXTRACT: Google Sheets formulas explained

Look, I'll be honest with you. I used to be a super organized person until I started dabbling with spreadsheets. I'd get lost in the cells, lost in the taxonomies, lost in the functions. But then, I met REGEXEXTRACT in Google Sheets, and everything changed.

REGEXEXTRACT is like a spreadsheet superhero: it can pull a specific piece of text out of a cell, no matter how messy the surrounding data may be, as long as you can describe what you're looking for as a pattern. And it's all done using one formula that I am about to explain to you.

The basics of REGEXEXTRACT

The REGEXEXTRACT function looks at a string of text and extracts the first part that matches a regular expression, which is a pattern describing the text you're after. REGEXEXTRACT is a Google Sheets formula, so let's get our hands on a spreadsheet and get started.

Here's a cute little example I came up with: maybe you have a column of movie entries written as "Genre: Title", but you only want to display the genre. It's easy to use the formula to extract the desired information. Here is what it would look like if you had a list of movies in column A and wanted to isolate everything before the colon:

 
    =REGEXEXTRACT(A2, "(.*)\:")

The first part of the formula (A2) is the cell in which the text we want to extract resides. The second part is the extraction criteria, written as a regular expression. It's like telling the function what data to look for and where to stop extracting it. The ".*" tells the formula to look for any character (the period stands for "any character") and the asterisk ("*") indicates that this character can appear zero or more times. The parentheses around ".*" form a capture group: when a pattern contains one, REGEXEXTRACT returns only the text inside the group rather than the whole match.

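The "\:" is a literal colon, and it marks the end of the match. Essentially, if REGEXEXTRACT finds text that fits "(.*)\:" in your data, it extracts everything from the start of the text up to the colon, leaving the colon itself out. If the text contains more than one colon, ".*" is greedy and runs up to the last one. If there is no colon at all, the formula returns an #N/A error.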

Here is an example: imagine that cell A2 contains "Avengers: Endgame". The formula collects all of the text from the start of the cell and stops when it reaches the colon. Ergo, you're left with "Avengers", the part before the colon. The formula can only return text that is actually in the cell, so to end up with a genre, the genre has to be what sits before the colon: a cell containing "Action/Adventure: Avengers" returns "Action/Adventure".
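If you'd like to trace the pattern outside a spreadsheet, here is a minimal sketch in Python. It uses Python's built-in re module as a stand-in for the regex engine in Google Sheets, which is an assumption that holds for a pattern this simple but not necessarily for every pattern. The function name extract_before_colon is made up for this illustration.

```python
import re

# Same pattern as in the Sheets formula. Python's re module stands in
# for the Sheets regex engine here (an assumption for illustration).
PATTERN = r"(.*)\:"

def extract_before_colon(text):
    """Return the text captured before the last colon, or None if there is no colon."""
    match = re.search(PATTERN, text)
    # group(1) is the part inside the parentheses, so the colon is left out.
    return match.group(1) if match else None

print(extract_before_colon("Avengers: Endgame"))           # Avengers
print(extract_before_colon("Action/Adventure: Avengers"))  # Action/Adventure
# ".*" is greedy, so with two colons the capture runs to the last one.
print(extract_before_colon("Action/Adventure: Avengers: Endgame"))  # Action/Adventure: Avengers
# No colon means no match. Sheets shows an error in that case; this sketch returns None.
print(extract_before_colon("Endgame"))                     # None
```

The third call is the one to watch for in real data: if a title contains its own colon, the greedy ".*" swallows part of the title along with the genre.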

Complex REGEXEXTRACT functions

Of course, things don't have to be this simple or standard. You might have more complex data in your cells, which means that you will need to apply more complex REGEXEXTRACT functions. Confusing, right?

But that's okay. REGEXEXTRACT is very powerful and versatile, and there are countless ways to apply it. It's the perfect tool for extracting substrings according to user-defined patterns. Just play around with it until you get the hang of it!

For instance, let's say you are running a literature club and you want to pull the names of the various authors out of longer lines of text. With REGEXEXTRACT, you can create the following formula:

    =REGEXEXTRACT(B3, "\b[A-Z][a-z]+,?\s[A-Z][a-z]{1,9}\b")

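Here, "\b[A-Z][a-z]+,?\s[A-Z][a-z]{1,9}\b" is our extraction criteria. \b is a word boundary, which anchors the match to the start and end of whole words. [A-Z][a-z]+ selects the first word of the name: one capital letter followed by one or more lowercase letters. The comma is optional, which is what the "?" signifies, so both "Jane Austen" and "Austen, Jane" fit. \s selects a single white space character between the two words. [A-Z][a-z]{1,9} selects the second word: one capital letter followed by between 1 and 9 lowercase letters, which is what {1,9} signifies (the count applies only to the [a-z] just before it). Once all these criteria are met, REGEXEXTRACT returns the whole two-word match, not the last name alone. To return only the second word, wrap that part of the pattern in parentheses.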
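To see what this pattern does and doesn't catch, here is a small Python sketch. Again, Python's re module is only a stand-in for the Sheets regex engine, and the helper first_match, the LAST_ONLY variant with a capture group, and the sample strings are all made up for this illustration.

```python
import re

# The pattern from the formula above, unchanged.
NAME = r"\b[A-Z][a-z]+,?\s[A-Z][a-z]{1,9}\b"
# Hypothetical variant: parentheses around the second word, so only it is returned.
LAST_ONLY = r"\b[A-Z][a-z]+,?\s([A-Z][a-z]{1,9})\b"

def first_match(pattern, text):
    """Rough model of REGEXEXTRACT: first match, or the capture group if the pattern has one."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(1) if m.re.groups else m.group(0)

print(first_match(NAME, "Pride and Prejudice by Jane Austen"))       # Jane Austen
print(first_match(NAME, "Austen, Jane"))                             # Austen, Jane
print(first_match(LAST_ONLY, "Pride and Prejudice by Jane Austen"))  # Austen
# The second word may have at most 9 lowercase letters, so long surnames are missed.
print(first_match(NAME, "Christopher Featherstonehaugh"))            # None
```

Note that the pattern has no idea which word is the surname. For "Austen, Jane", the capture group in LAST_ONLY would return "Jane", so it's worth checking how your list is formatted before relying on it.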

The beauty of REGEXEXTRACT

The possibilities for REGEXEXTRACT are practically endless – it's a versatile and powerful tool. It can be a little tricky to learn, but it's well worth it in the end.

As technical as it may seem, the formula itself is one of the simplest you can use in Google Sheets: a cell and a pattern. The patterns are the part that takes practice. Once you get the hang of them, everything will be a piece of cake.

I hope you found this article helpful. And remember, if you're working with messy data, don't fret. REGEXEXTRACT has your back.
