Get started with Regex: Here’s how regular expressions work
Short for regular expression, regex is a handy way to create patterns that help match, find, and manage text. When you want to learn regex, it's best to start simply, and expand your knowledge as you find yourself needing more powerful expressions.
At first, regex examples will seem like a foreign language. Just looking at a regular expressions cheat sheet won't help; you first have to understand where to use regex and why you want to use it.
We'll provide you with a beginner's regex tutorial, a handy regex cheat sheet, and tell you about some apps to help you along the way.
What Does Regex Do?
The purpose of regex is not to write full programs. Instead, it's a way to find, extract, and manipulate data in large bodies of text. It's supported in many of today's most popular programming languages, like Java, JavaScript, C-based languages, Perl, Python, Delphi, Ruby, R, and many more.
If you've used HTML before, regex may feel a little familiar: like markup, it's a compact syntax where special symbols carry specific meanings. Instead of tags, it uses anchors, quantifiers, operators, character classes, and flags to describe what you want to find in the text you're searching.
A regex tries to find exactly what you've asked it to search for. A match tells you the pattern found text that fits it, but it isn't proof the pattern is right: it may also match text you didn't intend. You could simply type 'set' into a regex parser, and it would find every occurrence of the letters "set", including the one inside a word like "upset". You could also use 's.t', where the dot stands for any single character; that finds 'sat', 'set', 'sit', and any other sequence of 's', one character, and 't', even inside longer words. If you have to deal with a massive amount of text, this is a life-saver.
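To make the 'set' and 's.t' examples concrete, here's a small sketch using Python's built-in re module (our choice of language; the article itself isn't tied to one). The sample sentence is made up. The last pattern shows one way to limit matching to whole words that start with 's' and end with 't', using \b and \w from the cheat sheet below.

```python
import re

# Hypothetical sample sentence for illustration.
text = "We sat, then set the table, and upset the cat."

# A plain literal matches the letters "set" anywhere, even inside "upset".
literal = re.findall(r"set", text)

# "." stands for any single character, so s.t matches "sat", "set", and so on.
dotted = re.findall(r"s.t", text)

# Whole words that start with s and end with t: \b marks a word boundary, \w* any word characters.
words = re.findall(r"\bs\w*t\b", text)

print(literal)  # ['set', 'set']
print(dotted)   # ['sat', 'set', 'set']
print(words)    # ['sat', 'set']
```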
Getting Started with Regex
Keep in mind that a regex is an expression, not a programming language. You write it as a pattern inside your code and pass it to your language's regex functions, which use it to search or transform text.
A great tool for getting started with regex is Expressions, a Mac app that gives you a standalone, sandboxed environment for working with regular expressions. It has highlighting to show your matches, a minimalist interface, and a handy reference chart at your fingertips. There's a really sharp live preview for regex matching, too.
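Here's a minimal sketch, again in Python's re module, of what using regex "in your code as an expression" looks like: the pattern is compiled into an ordinary object and handed to functions that search and replace. The date pattern and log line are hypothetical examples.

```python
import re

# Hypothetical pattern for dates written like 2024-03-15.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

log_line = "Backup finished on 2024-03-15 after 42 minutes"

# search() scans the string and returns a match object, or None if nothing fits.
match = date_pattern.search(log_line)
if match:
    print(match.group())  # 2024-03-15

# The same compiled pattern can drive other operations, such as replacement.
redacted = date_pattern.sub("YYYY-MM-DD", log_line)
print(redacted)  # Backup finished on YYYY-MM-DD after 42 minutes
```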
Now let's get into the regular expression cheat sheet!
Regex Cheat Sheet
Here's a very simple cheat sheet for regex:
Anchors
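- \A - Start of string
- \Z - End of string (in Perl and PCRE, \Z also matches just before a final newline; \z is the strict end)
- \b - Word boundary
- \B - Not a word boundary
- \< - Start of word (supported in tools like GNU grep and Vim; in Python and PCRE, use \b instead)
- \> - End of word (same support as \<)
- | - Alternation: matches either everything before it or everything after it, so cat|dog matches 'cat' or 'dog'
- ^Here - Matches any string that begins with 'Here'
- finish$ - Matches any string that ends with 'finish'
- ^Here.*finish$ - Matches any string that begins with 'Here' and ends with 'finish' (.* allows anything in between; ^Here finish$ would only match the exact string 'Here finish')
- here - Matches any string that contains 'here' (matching is case-sensitive by default, so 'Here' doesn't count)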
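The anchors above can be tried out with a short Python sketch. It assumes Python's re module, where \A and ^ both mean start of string unless re.MULTILINE is set; \< and \> aren't supported there, so the sketch uses \b. The sample strings are invented for illustration.

```python
import re

samples = ["Here we go", "We finish", "Here is the finish", "Almost there", "Here finish"]

def matching(pattern):
    """Return the samples in which re.search finds the pattern."""
    return [s for s in samples if re.search(pattern, s)]

starts = matching(r"^Here")           # begins with "Here"
ends = matching(r"finish$")           # ends with "finish"
both = matching(r"^Here.*finish$")    # begins with "Here" and ends with "finish"
exact = matching(r"^Here finish$")    # only the exact string "Here finish"
contains = matching(r"here")          # lowercase "here" anywhere (case-sensitive)

# \b marks a word boundary, so "cat" inside "concat" or "cats" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")

# | chooses between whole alternatives: "cat" or "dog".
pets = re.findall(r"cat|dog", "cat, dog, cow")

# With re.MULTILINE, ^ matches at every line start, while \A still means start of string.
text = "Here one\nHere two"
line_starts = re.findall(r"^Here", text, re.MULTILINE)
string_start = re.findall(r"\AHere", text, re.MULTILINE)

print(starts, ends, both, exact, contains, whole_words, pets, line_starts, string_start, sep="\n")
```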
Quantifiers
- xyz* - Matches strings which have xy followed by zero or more z (so the z is optional).
- xyz+ - Matches strings which have xy followed by at least one z.
- xyz? - Matches strings which have xy followed by zero or one z.
- xyz{2} - Matches strings which have xy followed by exactly two z.
- xyz{2,} - Matches strings which have xy followed by two or more z (no space after the comma; many engines treat {2, } as literal text).
- xyz{2,8} - Matches strings which have xy followed by at least two z and at most eight z.
- x(yz)* - Matches strings which have x followed by zero or more repetitions of yz.
- x(yz){2,8} - Matches strings which have x followed by two to eight repetitions of the sequence yz.
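A quick way to see each quantifier's limits is to test the patterns against strings with different numbers of z. This Python sketch uses re.fullmatch, which only succeeds when the entire string fits the pattern. With re.search, which looks for the pattern anywhere, 'xyzzz' would also count as a match for xyz{2}, since it contains 'xyzz'. The sample strings are made up.

```python
import re

z_samples = ["xy", "xyz", "xyzz", "xyzzz", "xy" + "z" * 9]

def full_matches(pattern, samples):
    """Keep only the samples that the pattern matches from first to last character."""
    return [s for s in samples if re.fullmatch(pattern, s)]

# Each quantifier applies to the single character before it: the z.
results = {
    pattern: full_matches(pattern, z_samples)
    for pattern in [r"xyz*", r"xyz+", r"xyz?", r"xyz{2}", r"xyz{2,}", r"xyz{2,8}"]
}

# After a group, the quantifier repeats the whole sequence "yz".
yz_samples = ["x", "xyz", "xyzyz", "xyy"]
group_star = full_matches(r"x(yz)*", yz_samples)        # ['x', 'xyz', 'xyzyz']
group_range = full_matches(r"x(yz){2,8}", yz_samples)   # ['xyzyz']

# re.search looks anywhere in the string, so "xyzzz" contains a match for xyz{2}.
anywhere = re.search(r"xyz{2}", "xyzzz") is not None    # True

for pattern, matched in results.items():
    print(f"{pattern:10} {matched}")
```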
As you can see, our regex examples are starting to get a little more complex, almost mathematical! Now let's get into operators, which can expand your regex parsing quite a bit.
Operators
- x(y|z) - Matches strings where x is followed by either y or z. The | inside the parentheses is the 'or' operator; without it, x(yz) matches x followed by the sequence yz.
With the 'or' operator, you can start to catch sequences that may be slightly off. Let's say in a body of text, you were discussing desserts. Your fingers were moving too fast, and you were typing 'dessetrs' half the time; instead of reading through it all, you could use the 'or' operator to discover your mistakes: e(r|t).
The problem here is you'd also find a ton of other words. In the paragraph above, you'd get 'operator' and 'were' along with many other words. We can solve that in just a minute.
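Here's a sketch of that problem in Python, using a sentence adapted from the paragraph above. Two details are specific to Python's re.findall: when a pattern contains a capturing group like (r|t), findall returns only the captured text, so the sketch uses the non-capturing form (?:r|t); and \w* on each side (covered under character classes below) stretches each match to the whole word.

```python
import re

# Sentence adapted from the paragraph above.
sample = "Your fingers were moving too fast, and you kept typing dessetrs instead of desserts."

# e(?:r|t): an "e" followed by either "r" or "t"; \w* on both sides grabs the rest of the word.
hits = re.findall(r"\w*e(?:r|t)\w*", sample)
print(hits)  # ['fingers', 'were', 'dessetrs', 'desserts']

# With a capturing group, findall returns only what the group captured.
captured = re.findall(r"e(r|t)", sample)
print(captured)  # ['r', 'r', 't', 'r']
```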
First, another regex operator:
- x[yz] - Matches strings where x is followed by a single character that is either y or z. The square brackets define a character set.
Using e[rt] would return a lot of matches, too. It will find everything in the aforementioned paragraph which includes 'et' or 'er': that covers your 'dessetrs' error word, but also 'desserts' and other words like 'discover.'
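In Python, a sketch of the character-set version might look like this. The sample sentence is hypothetical but reuses words from the paragraph above, and \w* again only widens each hit to the surrounding word.

```python
import re

# Hypothetical sentence that reuses words from the paragraph above.
sample = "Instead of reading it all, you could discover your mistakes in dessetrs and desserts."

# [rt] matches exactly one character, either "r" or "t".
hits = re.findall(r"\w*e[rt]\w*", sample)
print(hits)  # ['discover', 'dessetrs', 'desserts']

# For single characters, a set and an alternation find the same text.
same_as_alternation = hits == re.findall(r"\w*e(?:r|t)\w*", sample)
print(same_as_alternation)  # True
```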
So how can we find the error word, and block the rest out?
Character Classes
- \d - Matches a single character that is a digit.
- \w - Matches a single word character: a letter, a digit, or an underscore.
- \s - Matches any whitespace character.
- . - Matches any single character except a newline (by default).
- \t - Matches a tab character.
- \r - Matches a carriage return.
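The character classes can be checked against a made-up line of text in Python. Note the two behaviors that commonly trip people up: \w includes digits and the underscore, and . skips newlines unless you pass re.DOTALL.

```python
import re

# Made-up line: a tab after the order ID and a Windows-style line ending.
line = "Order_66\tshipped 3 items\r\n"

digits = re.findall(r"\d", line)           # ['6', '6', '3']
words = re.findall(r"\w+", line)           # ['Order_66', 'shipped', '3', 'items']
whitespace = re.findall(r"\s", line)       # ['\t', ' ', ' ', '\r', '\n']
tabs = re.findall(r"\t", line)             # ['\t']
returns = re.findall(r"\r", line)          # ['\r']

# "." matches any character except a newline by default; re.DOTALL lifts that rule.
default_dot = re.findall(r".", "a\nb")              # ['a', 'b']
dotall_dot = re.findall(r".", "a\nb", re.DOTALL)    # ['a', '\n', 'b']
```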
In a regex engine, you could enter 'et' and find your 'dessetrs' error word, but it would also show 'let's'. If this were a massive body of text, who knows how many times you'd find 'et' used similarly.
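But you can also use character classes. If we entered 'et\w' into the regex parser, it would return our error word, and only our error word, from that paragraph: \w requires a letter, digit, or underscore right after 'et', and the apostrophe in 'let's' doesn't qualify. For our sample, that's the right regex. In a larger body of text, 'et\w' would also catch ordinary words like 'better' or 'between', so you'd still want to check the results.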
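Here's that final step as a Python sketch. The first string paraphrases the article's sample paragraph; the second is an invented sentence showing that et\w alone isn't enough once the text gets bigger. As before, \w* in front only widens each match to the whole word so the results are readable, and \w+ after 'et' is et\w extended to the rest of the word.

```python
import re

# Paraphrase of the article's sample paragraph.
paragraph = ("Let's say in a body of text, you were discussing desserts. "
             "You were typing 'dessetrs' half the time.")

# Plain "et" (widened to whole words) also catches the "et" in "Let's".
plain = re.findall(r"\w*et\w*", paragraph)       # ['Let', 'dessetrs']

# et\w requires a word character right after "et"; the apostrophe in "Let's" isn't one.
with_class = re.findall(r"\w*et\w+", paragraph)  # ['dessetrs']

# In other text, et\w also matches perfectly ordinary words.
other_text = "Better letters arrive between dessetrs."
broader = re.findall(r"\w*et\w+", other_text)    # ['Better', 'letters', 'between', 'dessetrs']
```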
Tools to learn, build, and test RegEx
Regex is handy for beginners, and really useful when you start to tinker with its broad set of features and functionality.
This is why we really suggest a regex app like Expressions. It provides a safe environment to learn regex without worrying about breaking anything. We're also big fans of TeaCode and CodeRunner; all three make for a solid coding environment.
You can speed up your coding with TeaCode, a text expander for Mac that also has plugins for IDEs like Atom, Visual Studio Code, JetBrains IDEs, and Sublime Text. It has over 80 ready-to-use shortcodes that expand into code you can compile within your IDE.
If you want a handy, lightweight IDE for Mac, CodeRunner may be just what you're looking for. It supports over 25 languages and 230 syntax highlighters, and arrives in a familiar format with sidebars and customization options to suit anyone.
All three apps are also available for free as part of a seven-day trial of Setapp, which is just $9.99 per month after the trial period ends. So give it a try!