bb|[^b]{2}
You’ve come a long way already. You know how to use the terminal to start and stop processes. You know how to work with files: list, move, copy and view them. You can redirect command output to files. And you know how to build chains of commands, feeding the output of one command into the input of the next.
One of the advantages of a text-based environment is its simplicity. Text is much simpler than graphical output: maybe not for the human in front of the machine, but for the computer. There’s a reason why text-based systems evolved before graphical ones! Text means less data and more structure. It’s very hard for a computer to organize an image and almost impossible for it to make sense of the image’s content. With text, this is much easier. The computer still cannot understand the meaning of a text (how you feel when saying “I love you”), but it can process it very efficiently. You cannot tell the computer “Look for all cups” in an image. But it will easily find all occurrences of the word “cups” in any file on your disk.
In fact, we have some tools which do exactly that.
grep 'ssh' /etc/services
The grep command searches for a string (ssh) in a file (/etc/services) and prints all matching lines. The quotes are optional but good practice. You’ll need them whenever your search text contains whitespace.
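Since /etc/services looks different from system to system, here is a self-contained sketch that writes a small sample file to a temporary location and searches it. The sample lines only imitate the style of /etc/services; they are made up. The second search contains a space, which is exactly the case where the quotes become necessary.

```shell
#!/bin/sh
# Create a small sample file in a temporary location.
# The lines only imitate the style of /etc/services; they are made up.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
printf '%s\n' 'ssh 22/tcp' 'telnet 23/tcp' 'http 80/tcp' 'ssh 22/udp' > "$file"

# Print every line that contains "ssh".
grep 'ssh' "$file"

# This search text contains a space, so the quotes are required.
# Unquoted, the shell would split it and grep would take 22/tcp as a file name.
grep 'ssh 22/tcp' "$file"
```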
This example was very simple, trivial even. Quite handy but we can do more. Much more. Very often you’ll use the following form of the command.
cat /etc/services | grep 'ssh'
Still not extremely interesting in this case, but let’s be clear about the implication: you can easily and quickly search for text in any command’s output. That’s much faster and more comfortable than using less or scanning through the output yourself. Need the PID of your bash? No problem:
ps -ef | grep bash
But read on to see why grep is such a popular tool.
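The following sketch shows that the two forms are interchangeable: grep either reads the file you name, or it reads whatever arrives on its standard input. The sample file and its contents are made up for illustration.

```shell
#!/bin/sh
# Sample file with made-up content.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
printf '%s\n' 'alpha' 'beta' 'gamma' 'alphabet' > "$file"

# grep can read the file itself, or read whatever arrives on standard input.
direct=$(grep 'alpha' "$file")
piped=$(cat "$file" | grep 'alpha')
echo "$piped"                                   # alpha and alphabet
[ "$direct" = "$piped" ] && echo "both forms give the same result"

# The output of any other command can be searched the same way.
printf '%s\n' 'one' 'two' 'three' | grep 't'    # two and three
```

When you try ps -ef | grep bash, don’t be surprised if the grep command itself shows up in the list: its own command line contains the word bash.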
In fact, there are several commands that focus on text processing and text manipulation. Each of them provides only some very basic functionality and won’t do much on its own. But combined with input/output redirection and pipelines, they can be very powerful.
Command | Description |
---|---|
echo | Print some text |
cat | Output a file |
cut | Remove sections from a line |
tr | Change or delete characters |
tee | Write to the standard output as well as a file |
uniq | Remove repeated lines |
sort | Sort text |
diff | Show the differences between two files, or between two directories. |
wc | Word count. Also counts lines, bytes and more. |
Just for fun, maybe you can appreciate the sequence below.
echo "Hello world" | cut -d ' ' -f 1 | tee /tmp/teaout | tr -c -d '\ne[:upper:]' | cat -n | tr -d ' \t' | tr 1H By | cat - /tmp/teaout | sort -r | cat -n
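If you want to check your reading of that sequence, here is a sketch that rebuilds it one stage at a time and keeps each intermediate result in a variable. The only change is the tee stage: instead of tee /tmp/teaout, it writes the same line to a temporary file created with mktemp.

```shell
#!/bin/sh
# Rebuild the pipeline stage by stage; a temp file stands in for /tmp/teaout.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

s1=$(echo "Hello world" | cut -d ' ' -f 1)   # first space-separated field: Hello
echo "$s1" > "$tmp"                          # what tee would have stored
s2=$(echo "$s1" | tr -c -d '\ne[:upper:]')   # keep only newlines, e and capitals: He
s3=$(echo "$s2" | cat -n | tr -d ' \t')      # number the line, drop blanks and tabs: 1He
s4=$(echo "$s3" | tr 1H By)                  # replace 1 with B and H with y: Bye
s5=$(echo "$s4" | cat - "$tmp" | sort -r | cat -n)  # Bye plus Hello, reversed, numbered

printf '%s\n' "$s1" "$s2" "$s3" "$s4" "$s5"
```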
The man pages of these commands explain the exact meaning of their various options. Some of the tools are mainly useful in scripts (cut, tr, tee), where you cannot (or don’t want to) process a program’s output by hand. Others are very handy for manipulating text files (uniq, sort), including files that have nothing to do with commands or program output. In particular, sort helps you scan through unsorted program output more quickly.
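Here is a small sketch of sort and uniq working together on a made-up word list. One detail from the uniq man page matters: uniq only removes repeated lines that are adjacent, which is why its input is usually sorted first.

```shell
#!/bin/sh
# Made-up word list; no two equal lines are next to each other.
words='pear
apple
pear
banana
apple
pear'

# uniq alone keeps all six lines, because none of the duplicates are adjacent.
printf '%s\n' "$words" | uniq | wc -l

# Sorted first, the duplicates line up and uniq removes them.
unique=$(printf '%s\n' "$words" | sort | uniq)
echo "$unique"                                  # apple, banana, pear

# Count each word (uniq -c) and show the most frequent one first (sort -rn).
printf '%s\n' "$words" | sort | uniq -c | sort -rn
```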
The diff tool is deeply rooted in the Linux community and is used in many other places (patches, git). Any system changes over time, and diff gives you a quick overview of what changed. You can either get a side-by-side view of the whole document (-y) or show only the differences plus some context (-C NUM). You can also compare all files of two directories (-r), or just get a summary of which files differ (-q -r).
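A minimal sketch of these options, using two made-up versions of a tiny config directory in a temporary location. Keep in mind that diff exits with status 1 whenever it finds differences; that is how it signals “files differ”, not an error.

```shell
#!/bin/sh
# Two made-up versions of a config directory in a throwaway location.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/old" "$dir/new"
printf '%s\n' 'port=22' 'user=alice' 'debug=no' > "$dir/old/app.conf"
printf '%s\n' 'port=2222' 'user=alice' 'debug=no' > "$dir/new/app.conf"
echo 'unchanged' > "$dir/old/notes.txt"
echo 'unchanged' > "$dir/new/notes.txt"

# Only the differences, plus one line of context around each change.
diff -C 1 "$dir/old/app.conf" "$dir/new/app.conf"

# The whole file side by side.
diff -y "$dir/old/app.conf" "$dir/new/app.conf"

# Compare both directories recursively, but only report which files differ.
summary=$(diff -q -r "$dir/old" "$dir/new")
echo "$summary"                                 # mentions app.conf, not notes.txt
```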
What makes grep so powerful? So far you’ve only given a simple keyword to search for. But this section is called Regular expression (or regex for short), so what is this and how is it related to grep?
Consider an example you’ve seen before:
ls /dev/[sh]d[a-z]
The device names follow a certain scheme. The first two letters are either sd or hd. Then comes one additional letter, which identifies the individual device. We don’t know in advance which files will be there, so we want to list every file that follows this scheme. The pattern above formalizes exactly that scheme.
What’s between the brackets ([ and ]) gives you a choice of characters. You can either list single characters ([ahxY4]) or ranges ([0-9], [a-k]). But you can go further. Let’s talk about quantities:
ls /dev/[sh]d[a-z][0-9]*
Now the last part demands a digit (0-9). You already know the selection with brackets, but a star (*) was added. Does this look familiar? You’ve seen it a couple of times and know it as the wildcard. In ls, the star stands for any characters, repeated an arbitrary number of times. So this pattern matches a device name followed by at least one digit and then anything else: /dev/sda1, /dev/sda12 and /dev/sdb3 are all listed, but /dev/sda is not. In a regular expression, the star means something different. There, it is a quantity, and it applies to the character or character choice right before it, like the range [0-9] here. Read as a regular expression, [0-9]* would mean ‘a digit, repeated an arbitrary number of times’, and since that includes zero occurrences, /dev/sda would match as well.
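Since the contents of /dev vary from machine to machine, this sketch recreates the situation with empty files in a temporary directory. The file names are made up; the patterns are the ones from the text, without the /dev/ prefix.

```shell
#!/bin/sh
# Throwaway directory with made-up, device-like file names.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch sda sdb hda xda sda1 sda12 sdb3

# s or h, then d, then exactly one letter: lists hda, sda and sdb (not xda).
ls [sh]d[a-z]

# The glob star stands for "any characters", so at least one digit must follow:
# lists sda1, sda12 and sdb3, but not sda.
ls [sh]d[a-z][0-9]*
```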
But in fact, the patterns you give to ls are not regular expressions; they only borrow the bracket ranges and the wildcard. Regular expressions go much further. They form a language for specifying a search term when you know some of its structure, but not everything. It basically works like this: you put a character (or a choice of characters) first, then its quantity. The character is either a single letter or a bracket choice, as seen above. The quantity can be the star (*, zero or more occurrences), a plus (+, at least one occurrence), a question mark (?, zero or one occurrence) or a specific number of occurrences in curly braces ({4}). If the quantity is one, you don’t put anything.
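To try these quantities with grep, here is a sketch on a few made-up device names. It uses grep -E, which selects extended regular expressions, the syntax in which *, +, ? and {n} work exactly as written above; -x makes a pattern match only whole lines. Notice that, unlike the ls wildcard, the regex star also accepts plain sda.

```shell
#!/bin/sh
# Made-up device names, one per line.
names='sda
sda1
sda12
sda123'

# Try each quantity on the digit range [0-9]; -x requires a whole-line match.
for pattern in 'sd[a-z][0-9]*' 'sd[a-z][0-9]+' 'sd[a-z][0-9]?' 'sd[a-z][0-9]{2}'; do
    printf '%-18s' "$pattern"
    printf '%s\n' "$names" | grep -E -x "$pattern" | tr '\n' ' '
    echo
done
```

The star matches all four names, the plus drops sda, the question mark keeps only sda and sda1, and {2} leaves just sda12.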
Note
The wildcard in ls means ‘any characters, an arbitrary number of times’. In regular expressions, this is a bit different. There, the same character * means ‘an arbitrary number of times’, but says nothing about which character. You have to put the character in front, like so: a* (‘a’ repeated an arbitrary number of times). So you can write ls *.jpg, but not ls | grep '*.jpg'. The grep version could, for example, be ls | grep 'jpg'.
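A quick way to see the difference for yourself, sketched with a few made-up file names in a temporary directory. The comment on the literal star refers to grep’s default (basic) syntax, where a * at the very start of a pattern has nothing to repeat and is treated as an ordinary character.

```shell
#!/bin/sh
# Throwaway directory with made-up file names.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch cat.jpg dog.jpg jpg_notes.txt readme.txt

# Glob: * means any characters, so this lists cat.jpg and dog.jpg.
ls *.jpg

# A leading * is taken literally here, so grep looks for a real "*"
# in the names and matches nothing.
ls | grep '*.jpg'

# Regex: 'jpg' matches those three letters anywhere in a name,
# so jpg_notes.txt is listed along with the two pictures.
ls | grep 'jpg'
```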
hello
h[a-z]llo
hel{2}o
hel?lo
hel+o
help*lo
h[el]*lo
All of the examples above match “hello”, but some of them also match other words. In the last example, a character choice and a quantity are combined, which leads to much more interesting words.
Regex | Words matched |
---|---|
hello | hello |
h[a-z]llo | hallo, hbllo, hcllo, ... |
hel{2}o | hello |
hel?lo | helo, hello |
hel+o | helo, hello, helllo, hellllo, ... |
help*lo | hello, helplo, helpplo, helppplo, ... |
h[el]*lo | hlo, helo, hllo, heelo, hello, hllllllo, hleellellellelelo, ... |
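You can check the table with grep yourself. This sketch runs each regex from the table against a handful of sample words (the word list is just an illustration). It uses grep -E, so that +, ? and {2} work as written, and -x, so a pattern has to match a whole line, just as the table lists whole words.

```shell
#!/bin/sh
# Sample words to test the patterns against, one per line.
words='hello
helo
helllo
hallo
helplo
hlo
hllllllo'

# -E: extended regular expressions; -x: the whole line must match.
for pattern in 'hello' 'h[a-z]llo' 'hel{2}o' 'hel?lo' 'hel+o' 'help*lo' 'h[el]*lo'; do
    printf '%-12s' "$pattern"
    printf '%s\n' "$words" | grep -E -x "$pattern" | tr '\n' ' '
    echo
done
```

For example, hel+o picks up helo, hello and helllo, while h[el]*lo accepts every sample word except hallo and helplo.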
Note
In grep, you have to escape some characters. Meaning you have to put a backslash