4. How computers do stuff

Run, Forrest, run!

4.1. Review

You now know how to run commands in the terminal and how a command is put together. You can distinguish parameters from commands. You know how to use the built-in help system. You can easily navigate through the file tree and look at files.

4.1.1. Test questions

  1. Identify the command and all arguments:

    • cat /proc/meminfo
    • ls -l -h ~
    • less /proc/uptime
    • pwd
    • man man
  2. Name the four most essential components of your computer.

  3. Explain what files are for. Give a reason why we need directories.

  4. Name 3 responsibilities of the kernel.

4.2. I charge you a bit

You’ve already learnt about the file structure. You know that files are used to store data and are organized in directories. You also know that files are kept on the hard disc, so when you turn off the machine the files still persist (i.e. they are still there when you turn the machine back on). Files are one of the most important concepts in computers, as we store anything (and I mean everything) in a file. But files don’t do anything by themselves. They are a mere container, a box that stores some content.

You’ve surely heard about memory and the [CPU]. Let’s explain these two parts quickly. The CPU is responsible for all the calculations. This means it can add, subtract, multiply and divide like crazy. It’s insanely fast, but at the same time it can hardly remember anything, because it has almost no memory of its own [1]. Instead, the memory is a separate component, often called [RAM] or main memory. The CPU and memory work closely together: the CPU loads data from memory, does the computation and stores the result in memory again.

In principle, main memory and the hard disc are not that different. Both are storage systems, and you could actually use a hard disc instead of main memory. We usually don’t, because the hard disc is extremely slow compared to main memory. But if main memory is so fast, why don’t we use it to store files? Well, RAM only remembers its contents as long as there’s power. When you turn off your computer, all data in main memory is lost, while the hard disc keeps its data.

So how does this information help us? Well, you’ve just discovered how computers (or any electronic device) work. Let’s now put the facts together. The disc keeps all persistent data in files. This means that programs are also stored in files: surely a program is persistent, since it’s still there after a reboot. So yes, a program is nothing more than a file on your disc. However, it does not contain text but instructions the computer can understand and execute (the code). But as stated, a file is only a container; it does not do anything on its own. When a program is to be run, it is copied from the disc into main memory. From there, the CPU can read it directly and execute the instructions it contains, one by one.

_images/program_lifecycle.jpg
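You can check for yourself that a program really is just a file. The short sketch below assumes a Linux system and is meant to be saved to a file and run with sh (if you type it into an interactive bash instead, command -v may report an alias rather than a file). It finds the file behind the ls command and prints its first four bytes. A text file would start with readable text; Linux program files start with a marker byte (shown by od as 177, which is octal for 127) followed by the letters ELF, the name of the file format Linux uses for programs.

```shell
#!/bin/sh
# Find the file that is started when you type "ls".
ls_file=$(command -v ls)
echo "The ls program is stored in: $ls_file"

# A program is just a file, so the usual file tools work on it.
ls -l "$ls_file"

# Show its first 4 bytes as characters. Linux program files
# usually begin with byte 127 (printed as 177, in octal) and "ELF".
head -c 4 "$ls_file" | od -c
```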

Any questions? Let’s go over some exercises to let the new knowledge sink in.

4.2.1. Exercises

  1. Do a little online research:

    • What’s the price per megabyte for a hard disk? What for main memory?
    • What’s the access speed (MB/s reading) of a hard disk? And of main memory?
  2. When you enter text in leafpad but don’t save it - where is it stored (memory or hard disk)?

  3. When you save the text in leafpad where is it stored (memory or hard disk)?

  4. How is a program file different from a text file you write with leafpad?

4.3. Program

We’ve just defined what a program is: a file (possibly several) containing instructions. But a program is a static thing: as a file, it doesn’t do anything (yet). If you want to use a program, you have to start it (seems trivial, right?). Let’s check out which programs are currently running on your machine. We introduce a new command to do so:

top

By pressing M or P you can sort the list by memory or CPU consumption, respectively. You can return to the terminal with q. Note that case matters in top: M and P must be typed in upper case (hold Shift), exactly as shown here.

_images/top_mem.png

On the right, you see the program name. For example, SciTE is the text editor I’m using to write this tutorial right now. The TIME+ column to the left of the name tells you how much CPU time the program has used so far. This is not how long it has been running: an editor spends most of its time waiting for keystrokes, so its value stays small even after hours of writing. Also interesting are the %CPU and %MEM columns, which show you which programs consume the most system resources. For example, if the %MEM value is very high (let’s say 80%), you will soon run out of memory and into trouble. top also presents the overall memory statistics (the KiB Mem row). If little memory is left, your system will soon become very slow [2]. (Newer versions of top also show an avail Mem value, which is a better indicator than free, since Linux uses otherwise idle memory as a disk cache.) We will not discuss the meaning of the other columns now; they will be explained later on.
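top reads its overall memory figures from the file /proc/meminfo, which the kernel fills in on the fly (you already met it in the test questions). As a rough sketch, assuming a Linux system with awk installed, you can pick out a few of its lines yourself. Note that MemAvailable only exists on kernels from version 3.14 on, and that the kernel writes kB where it actually means KiB.

```shell
#!/bin/sh
# Print total memory, available memory and free swap space, in MiB.
# /proc/meminfo has one value per line, in KiB, e.g.
#   MemTotal:        8046280 kB
awk '/^(MemTotal|MemAvailable|SwapFree):/ {
    printf "%-13s %8d MiB\n", $1, $2 / 1024
}' /proc/meminfo

# Keep the total (in KiB) in a variable for later use.
mem_total_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
```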

4.3.1. Exercises

  1. Open top
  2. Which program consumes the most memory?
  3. Which program consumes the most CPU?
  4. How much memory is used?
  5. How much swap space is free?

4.4. Command

Have a look at top again.

_images/top_mem.png

Earlier, you were told that the program name is shown in the rightmost column. Well, the column header says COMMAND. You’ve used the term command before, and now we say it’s the program name. Confused? Actually, these terms are closely related. You might have guessed: the command you type is exactly the program name. All you do on the command line is start programs, and you do so by giving the program name. Consider the already well-known command

ls /

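When you execute the command, the shell (the program that reads your commands) checks whether a program called ls is installed, by looking for a file of that name in a list of directories such as /bin and /usr/bin (this list is stored in the PATH variable). If it finds one, the program is started. A program can produce text output; in the case of ls, that’s the list of files and directories. While in principle you could do anything with the output, the terminal simply displays it.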
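The lookup described above can be imitated in a few lines of shell script. The function find_program below is a hypothetical helper made up for this illustration: it splits PATH at the colons and returns the first executable file with the requested name. Real shells do more (they first check for their own built-in commands and aliases, and remember where they found a program), so treat this as a simplified sketch. Save it to a file and run it with sh.

```shell
#!/bin/sh
# find_program NAME: hypothetical, simplified imitation of how the
# shell finds a command. It walks through the directories in $PATH,
# in order, and prints the first executable file called NAME.
find_program() {
    old_ifs=$IFS
    IFS=:                      # split $PATH at the colons
    for dir in $PATH; do
        # An empty entry in PATH means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            IFS=$old_ifs
            echo "$dir/$1"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1                   # not found anywhere in PATH
}

echo "PATH is: $PATH"
find_program ls
```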

So that explains all you see on the terminal - the command line (starting programs) and text (program output). But a program is not restricted to text only. You want to start a graphical program from the terminal? No problem, just do it:

xterm

Ok, this opens another terminal, how impressive. Really, you can do whatever you want and you open a new terminal? You must really love the terminal by now. You can’t even use both terminals: you’ll notice that the original command line is ‘blocked’. Only once you exit the new terminal window can you continue in the first one.

_images/xterm.png

Note

The command you type in the terminal is not always what you’d expect. For example, if you want to start LibreOffice, you’d have to type soffice.

4.5. Process

Only one step is missing to complete the picture. Have you ever started a program twice? Of course, this is possible. Let’s have a look at it. You’ll need 3 terminals to run the next example (a terminal in which you start leafpad can’t be used until leafpad is closed, just like with xterm before). Run each of the following two commands in its own terminal; each uses leafpad to show you a file. The files are probably empty; if you like, you can type something.

leafpad /tmp/foo.t
leafpad /tmp/bar.t

Next, let’s look at the program list, similar to what you did with top before. But this time, we use a different command.

ps u

Usually, top is used for system monitoring (memory and CPU consumption). It automatically updates the process list and you can sort easily. However, it also truncates the list so that it fits your screen. If you want the complete list of all processes and don’t need the regular updates, then ps is the better alternative.

Your screen could look somewhat like in the image below:

_images/ps_leafpad.png

In the topmost terminal, you now see leafpad twice in the output of ps. That’s because you ran the program leafpad two times. Not really unexpected, is it?
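You can repeat this experiment without opening any windows. The sketch below, assuming a Linux system, uses sleep (a tiny program that does nothing but wait for the given number of seconds) as a stand-in for leafpad. The & at the end of a line starts a program without waiting for it to finish (more on that later in this chapter), and the shell variable $! holds the process ID of the program just started.

```shell
#!/bin/sh
# Start the same program twice; each start creates a new process.
sleep 30 &
first_pid=$!
sleep 30 &
second_pid=$!
sleep 1   # give both processes a moment to start

# Same program name, but two different process IDs.
echo "$first_pid: $(cat /proc/$first_pid/comm)"
echo "$second_pid: $(cat /proc/$second_pid/comm)"

# Clean up: terminate both processes.
kill "$first_pid" "$second_pid"
```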

There’s a slight difference between a program and a process. The program is a static thing, a collection of files lying somewhere on the hard disc. It is typically installed once. When you run a program, a process (or program instance) is created. So the process is the running program, the ‘living’ thing. It can run, do stuff, receive input, produce output and do more stuff. You can have several processes of the same program running at the same time. When you close the program (e.g. via the x button), you actually terminate its process. The process is then really finished and all traces of it are removed. You’d be surprised, though, if terminating a process uninstalled the program.

Why this distinction? In the example, leafpad was opened twice, each time reading a different file. So even though it’s the same program, each process works with its own file. Another example: imagine you start ls in two different terminals, at two different paths, at the same time. Of course you’d expect each ls to report the contents of the directory it was started in, even though a second ls runs simultaneously. So each process has to run independently of other instances of the same program.

Generally speaking, each process gets some resources. These include files, CPU time and memory, but also access to the network or other hardware [3]. It also gets some information about its environment, like the working directory it was started from and the arguments you used to start it. You can also call this the context a program is started in. None of this matters (or is even known) while the program is not running, and it may be different each time the program starts.
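On Linux, the kernel shows part of each process’s context as files in the directory /proc/<PID>. The sketch below, assuming a Linux system, starts sleep from /tmp in the background and then reads its working directory (the link cwd) and its arguments (the file cmdline, in which the arguments are separated by zero bytes) from there.

```shell
#!/bin/sh
# Start sleep from /tmp, with the argument 30, in the background.
# The subshell changes directory and then replaces itself (exec) with
# sleep, so the PID in $! really belongs to the sleep process.
(cd /tmp && exec sleep 30) &
pid=$!
sleep 1   # give the new process a moment to start up

# The kernel keeps part of each process's context under /proc/<PID>:
#   cwd     - a link to its working directory
#   cmdline - the command and its arguments, separated by zero bytes
cwd=$(readlink "/proc/$pid/cwd")
args=$(tr '\0' ' ' < "/proc/$pid/cmdline")
echo "process $pid: working directory $cwd, command line: $args"

kill "$pid"
```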

Note

For demonstration, we have to be a bit careful since some programs (like firefox) don’t actually start twice. They realize that they were already started and open a new window in the original process instead of creating a second one. Keep this in mind if you start firefox from the terminal. The same is true for LibreOffice.

4.5.1. Exercises

  1. Explain the difference between a program, command and process in your own words.
  2. Open LibreOffice
  3. Check the processes in top, try to find the just started one.
  4. How much memory does LibreOffice need?
  5. Open LibreOffice again and watch how top reacts.
  6. Close LibreOffice and watch how top reacts.

4.6. Dial M for Murder

You’ve already seen how a process can be started on the command line. How about stopping it? Perhaps you didn’t notice, but in all the examples so far either you terminated the process somehow or it finished on its own. An example of the former is less: you close the process when you press q. For the latter, let’s have a look at ls: after having written something on the terminal, it has fulfilled its purpose, so it terminates itself. When a process started from the terminal finishes or gets closed, you end up at the command line again. In the graphical examples, you didn’t see the command line (and couldn’t type another command) until you closed the window with the x button (which terminates graphical programs).

Sometimes you want to abort a process prematurely, i.e. before it would close itself. Not every program offers a way to do so from within (the way q does in less). For example, you might have typed ls -lR / by mistake (don’t do this!) and don’t want to wait until it has finished (which might take quite some time).

You already know one method of aborting commands: Ctrl-c. If you press this key combination, the process currently running in the terminal is asked to terminate immediately. Sometimes this is not an option (for example, when the program runs in another terminal or ignores the request), so let’s talk about alternatives.

Like in the example before, start leafpad from the terminal.

leafpad

Save and close all other running leafpads before continuing. If you fail to do so, they will be closed for you without warning. Open a second terminal and then run the next commands.

ps u

You should indeed see the leafpad process, and only a single one (if you heeded the warning above). Now, let’s terminate it.

killall leafpad

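This command closes all running leafpads. Like Ctrl-c, it asks the processes to terminate (strictly speaking, killall sends the TERM signal, while Ctrl-c sends INT), but it works from any terminal and affects every process named leafpad, not just one.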

_images/killall_leafpad.png
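Under the hood, killall has to find every process with the given name and send each one the TERM signal. The function killall_sketch below is a hypothetical, much simplified imitation of that, assuming a Linux system: it reads each process’s name from /proc/<PID>/comm (which the kernel cuts to 15 characters) and calls kill for every match. Be careful when trying it: just like the real killall, it affects all of your processes with that name, here every running sleep.

```shell
#!/bin/sh
# killall_sketch NAME: hypothetical, much simplified stand-in for killall.
# It sends the TERM signal (plain kill) to every process called NAME.
killall_sketch() {
    for dir in /proc/[0-9]*; do
        # The kernel stores each process's name in /proc/<PID>/comm.
        # Processes can exit while we loop, so skip unreadable entries.
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        if [ "$name" = "$1" ]; then
            pid=${dir#/proc/}
            # kill fails for other users' processes; just skip those.
            if kill "$pid" 2>/dev/null; then
                echo "asked PID $pid to terminate"
            fi
        fi
    done
    return 0
}

# Demo: start two sleep processes, then terminate both by name.
sleep 300 &
sleep 300 &
sleep 1
killall_sketch sleep
```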

4.6.1. Exercises

  1. Find out what ls -lR / does.
  2. Start leafpad from the terminal. Terminate it using the same terminal.
  3. Start leafpad from the terminal. Terminate it using a different terminal.

4.7. PID

Let’s assume you have more than one leafpad running. Wait, let’s actually do this. In two terminals, do

leafpad /tmp/foo.t
leafpad /tmp/bar.t

Now you’re set to go with two running leafpads. Say you want to close the first but not the second one. You cannot use killall, as this would close both processes. Somehow we have to select only one of them. Let’s look at our process list again.

ps u

On the left side, there’s a column PID (Process ID). A PID is a number which is automatically assigned to every new process and is unique among all running processes. With the PID, we can identify any process. Now, let’s terminate the first leafpad. In the command below, replace <PID> with the actual PID of the first leafpad. Run it in a new terminal and watch your first leafpad die.

kill <PID>

In the example, the process of the first leafpad was killed, so its window disappears and we’re informed that the process was Terminated. Just as if the process had been closed normally, the prompt returns and we can use the command line again. Unlike killall, which closed all leafpads, we have now terminated only one of them.

_images/kill_leafpad_A.png
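The same can be done in a script. In the sketch below, sleep stands in for leafpad, and instead of reading the PID from ps, the script takes it from the shell variable $!, which holds the PID of the last program started in the background with &. The shell’s wait command then reports how the process ended: a process ended by a signal reports 128 plus the signal number, and plain kill sends the TERM signal, number 15.

```shell
#!/bin/sh
# Start a long-running process in the background and remember its PID.
sleep 300 &
pid=$!
echo "started sleep with PID $pid"

# Ask it to terminate. Plain kill sends the TERM signal (15).
kill "$pid"

# wait collects the exit status of the process.
status=0
wait "$pid" || status=$?
echo "exit status: $status"   # 128 + 15 = 143
```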

Sometimes a program ignores our kill attempt. So we have to make a second, more serious one. Let’s try this on the second leafpad, now the only running one. Again, replace the <PID> in the command below with the PID of the second leafpad process. Then run the command and again watch a leafpad being terminated.

kill -9 <PID>

If you add the -9 to the kill command, the program cannot refuse to shut down. However, you should only use this variant if it is absolutely necessary. A forced program shutdown may lead to file corruption or inconsistencies: Imagine you terminate a process while it writes a file - the first half is already written but not the second one.

_images/kill_leafpad_B.png

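Again the leafpad window gets closed, but in contrast to the first kill, the terminal now states Killed. This reflects the difference between the two kill versions: plain kill asks the process to terminate (by sending the TERM signal), which a program may handle or even ignore, whereas kill -9 sends the KILL signal, and the kernel ends the process immediately without asking it.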
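To see both variants in action, you need a program that ignores the polite request. The sketch below builds one: a subshell that tells the shell to ignore the TERM signal (trap '' TERM) and then loops forever. A plain kill has no effect on it, while kill -9 ends it. Since a process ended by a signal reports 128 plus the signal number, wait returns 137 here (KILL is signal 9). kill -0 sends no signal at all; it only checks whether the process still exists.

```shell
#!/bin/sh
# A stubborn process: this subshell ignores the TERM signal
# (trap '' TERM) and then loops forever.
( trap '' TERM; while :; do sleep 1; done ) &
pid=$!
sleep 1                       # give it time to set up the trap

kill "$pid"                   # polite request (TERM): ignored
sleep 1
# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$pid" 2>/dev/null; then
    echo "PID $pid survived a plain kill"
fi

kill -9 "$pid"                # KILL cannot be caught or ignored
status=0
wait "$pid" || status=$?
echo "exit status: $status"   # 128 + 9 = 137
```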

4.7.1. Exercises

  1. Start leafpad from the terminal. Terminate it using kill.
  2. Explain, in your own words, the difference between kill and kill -9
  3. Using ps, find the highest PID of all running processes.
  4. Using ps, find the highest PID of all running processes (yes, do it again!). What do you notice?
  5. Explain, in your own words, the difference between kill and killall
  6. Run kill 1. What happens?

4.8. Daemons

You’ve only seen your own processes until now. Let’s change that. Do the following:

ps -ef

Wow, that’s a lot more processes than before. In fact, that’s all processes currently running. Most of them are unrelated to you and you certainly didn’t start them. That’s because most of these are daemons. A daemon is nothing extraordinary: it’s just a system process, started and maintained by the operating system. Daemons offer important services, like sound, networking or the graphical interface (see below).
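One thing most of these processes have in common: they are not attached to any terminal. On Linux, you can check this in /proc/<PID>/stat, whose seventh field (tty_nr) is 0 for processes without a terminal; ps -ef shows a ? in the TTY column for them. The sketch below counts such processes, assuming a Linux system; the numbers you get depend entirely on your machine.

```shell
#!/bin/sh
# Count all processes, and those without a controlling terminal.
total=0
no_tty=0
for dir in /proc/[0-9]*; do
    # Skip processes that exit while we loop.
    stat=$(cat "$dir/stat" 2>/dev/null) || continue
    # Field 2 is the name in parentheses and may contain spaces,
    # so drop everything up to the last ") " before splitting.
    rest=${stat##*") "}
    set -- $rest
    # Now $1 = state, $2 = parent PID, $3 = group, $4 = session,
    # and $5 = tty_nr, which is 0 when there is no terminal.
    total=$((total + 1))
    if [ "$5" = 0 ]; then
        no_tty=$((no_tty + 1))
    fi
done
echo "$total processes, $no_tty of them without a terminal"
```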

But when you execute a command in the terminal, you cannot use the terminal as long as the process is alive. You’ve noticed this already when starting leafpad. If you have to run a couple of windowed programs, you probably don’t want to open a separate terminal for each one. Also, you didn’t start all those daemons in a terminal, so where’s their terminal? How can they live without one? You’ll find out in a second!

You can start a leafpad again. But this time we want the terminal to remain usable. Try the following:

leafpad &

Bam! Immediately after leafpad has started, the terminal shows you the command line again. What happened? The & at the end of a command makes the process run in the background. It is started normally, but the shell doesn’t wait for it, so you return to the command line right away. You can confirm with ps that the process is actually running:

ps u
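You can measure the difference between starting a program normally and with &. The sketch below times sleep 2 both ways using date +%s (the current time in whole seconds), so the measured values are only accurate to about a second. In the foreground the shell waits; in the background it continues immediately. The final wait makes the script wait for the background process after all.

```shell
#!/bin/sh
# Foreground: the shell waits until sleep has finished.
start=$(date +%s)
sleep 2
fg_seconds=$(( $(date +%s) - start ))

# Background: the shell starts sleep and carries on immediately.
start=$(date +%s)
sleep 2 &
bg_pid=$!
bg_seconds=$(( $(date +%s) - start ))

echo "foreground start took about $fg_seconds s"
echo "background start took about $bg_seconds s (PID $bg_pid still running)"

# If you do need the result, you can still wait for it explicitly.
wait "$bg_pid"
```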

Congratulations, you’ve just created a background process, which comes close to a daemon. (A real daemon is also completely detached from any terminal, whereas this leafpad still belongs to yours and may be closed together with it.) This also gives kill a new purpose. Imagine the program you start has no graphical output, i.e. no window. How would you close it then? Using its PID (easily obtained through ps), you can kill it, and a plain kill without -9 is usually enough. You know how this works, so let’s close the leafpad as an exercise:

kill <PID>
_images/kill_daemon.png

4.9. Some system processes

Before we finish this chapter, let’s briefly discuss some important system processes. You can get the full process list with ps:

ps -ef

There are lots of processes enclosed in brackets ([ and ]). These are kernel processes, which run inside the kernel itself, so you cannot start or manipulate them yourself. All others are processes started by you or by the operating system. We cannot go through the whole list, so here’s a description of some commonly available services.

_images/ps_ef.png
Process   Description
init      The process which starts up your system. You’ll learn more details about it very soon.
syslog    Manages log files, used for general system monitoring. Whenever a program wants to store a possibly important notification, it uses this daemon.
cron      Periodically executes commands. If, for example, you want to run a daily backup, this daemon takes care of running it at an appropriate time.
cupsd     The printing daemon. It is very often installed but not mandatory.
exim      A mail transfer agent; on a desktop system it is often configured to deliver mail only locally.
sshd      The remote login service. You’ll discover the benefits of this tool later in the tutorial.
dhclient  The DHCP client, which sets up your network connection.
getty     The most basic terminal login. It shows the login prompt on the text consoles.
udev      The service which manages devices (i.e. hardware). When you plug in your USB stick, that’s when udev becomes active.
X         The X Window System deals with all graphical programs (windows and stuff). The X process is the underlying program handling these things for you, but there are many others involved.
bash      The shell: the program running inside your terminal that reads your commands and starts programs. Typically, we use bash, but others exist (sh, ksh).
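As for the processes in brackets: ps -ef shows a process that way when it has no command line, i.e. when /proc/<PID>/cmdline is empty. That is the case for kernel processes (and for processes that have already finished but not yet been cleaned up). The sketch below, assuming a Linux system, counts them. Inside a container you may well see none, since containers typically cannot see the kernel’s processes.

```shell
#!/bin/sh
# Count processes without a command line (shown in brackets by ps -ef).
kernel=0
for dir in /proc/[0-9]*; do
    # Read the command line, dropping the zero bytes between arguments.
    # Skip processes that exit while we loop.
    args=$( { tr -d '\0' < "$dir/cmdline"; } 2>/dev/null ) || continue
    if [ -z "$args" ]; then
        kernel=$((kernel + 1))
        printf '[%s] ' "$(cat "$dir/comm" 2>/dev/null)"
    fi
done
echo
echo "$kernel processes without a command line"
```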

4.9.1. Exercises

  1. In the terminal, start a new bash. Then use exit. What happens? Can you explain it?

4.10. Summary

You’ve seen the terms program, command and process. You know how they are related and how these concepts are different. You understand their importance for computers. You’ve seen how to keep track of processes in the terminal and how to start and stop them. You know about the PID, how you find it and how to use it. You’ve been exposed to some daemons and know how to master them.

4.10.1. Exercises

  1. What is a PID? What is it used for?
  2. Why shouldn’t you use kill -9 if you can close the program by other means?
  3. Run ls -l / in the background. What do you observe?
  4. What is a daemon?
  5. How many kernel processes are running on your system?
  6. Name 5 ways to terminate a program
  7. Why can’t you use the terminal when you start a graphical program?
  8. Read the man page of kill
  9. What’s a resource?
  10. Name two ways to find out how much memory you have.
  11. Imagine your system becomes suddenly very slow. You suspect a program running wild. What can you do about it?
  12. Where is a process stored?
  13. What is the lowest PID on your system?

4.11. Terminal Cheat Sheet

Command  Example             Description
ls       ls /                List directory contents.
pwd      pwd                 Show the working directory.
cd       cd /                Change directory.
exit     exit                Close the terminal.
cat      cat /proc/version   Display file contents on the terminal.
less     less /proc/cpuinfo  A simple and small text viewer.
top      top                 Live process and resource viewer.
ps       ps -ef              Show a long list of process information.
killall  killall firefox     Terminate a process by name.
kill     kill <PID>          Terminate a process by PID.

4.12. Cheatsheet

  • top
  • xterm
  • leafpad [FILE]
  • killall PROGRAM_NAME
  • ps [-ef]
[CPU] Central Processing Unit
[1] This is almost true. The CPU has some small space for storing information, the registers. However, there are only a few dozen registers, not nearly enough to store anything useful. Think of them more as a piece of paper on which you write intermediate results of a complex calculation.
[RAM] Random Access Memory. This means that you can access any memory location directly, in any order.
[2] Remember the part about the hard disc being much slower than main memory? If you run out of main memory, part of the hard disc is actually used as a memory extension (this is called swap space), at the cost of making the system slower.
[3] The resources are managed by the kernel. A resource is more or less anything your system has only once (or in limited amounts) and that needs to be distributed among the running processes. Take writing to memory or the hard disc: two processes shouldn’t write to the same memory location or file at the same time. In the case of memory, the kernel decides which memory locations (and how much memory) each program gets.