5. Files

aaand it’s gone

5.1. Review

You have already seen files in many places and can read paths. You know how to navigate with the terminal and two ways to look at files. You know that programs are stored in files, like everything else. You know how to write a command and where to find information about its arguments.

5.1.1. Test questions

  1. Name two ways to reach your home directory. What is special about this directory?
  2. Explain how tab completion helps to avoid errors.
  3. What kind of data is stored in the memory? What data is on the hard disk?
  4. What’s the difference between an absolute and a relative path? How can you distinguish them?
  5. How can you check whether a path leads to a file or to a directory?

5.2. The tree of life

You’ve already seen the file tree and how to address a certain file or directory. You know that we use directories to organize the files, but don’t know how exactly. Probably, you’ve organized your own data according to some scheme. For example, I have a Downloads folder where I put random stuff I copied from the internet and a data folder with some documents I wrote myself. Similarly, the operating system organizes its parts according to some scheme (though it’s more chaotic than mine).

Let’s have a closer look at the unix file tree. Again, start with the well-known ls command.

ls /
_images/ls_root.png

Then, let’s go into greater detail about these folders (and their subfolders).

5.2.1. Program folders

/bin
/sbin
/lib
/lib64
/usr
/etc
/opt
/tmp

It was explained that a program is a couple of files on the hard disk. Not surprisingly, there are many directories associated with installed programs. The directories listed above are all directly related to programs. The folders /bin (binaries) and /sbin (system binaries) hold the most essential program executables. Actually, it’s only the code required to start the program - a program can split its full code over several files. These additional files are stored in the directories /lib and /lib64 (library). So when you type a command on the command line, bash looks if there is a file with the name of your command in one of the bin directories. If so, it executes this program. The program then loads all the files (if any) from the lib directories itself.
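To watch this lookup happen, you can ask bash which file it would start for a given command. This is only a minimal sketch, assuming bash on a typical Linux system; the built-in command -v is not covered in this chapter and the variable name cat_path is made up for the example.

```shell
# Ask the shell which file it would execute for the command "cat".
cat_path=$(command -v cat)
echo "cat is started from: $cat_path"

# The result is an ordinary file in one of the bin directories.
ls -l "$cat_path"
```

The exact directory differs between distributions, so don’t be surprised if you see /usr/bin instead of /bin.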

The /usr directory (commonly expanded as Unix system resources, although the name originally referred to user files) contains more programs. You see the /usr/bin, /usr/sbin and /usr/lib directories again, which have the same purpose as the ones on the root level. Additionally, the /usr directory contains files that are related to programs but do not contain executable code, such as icons, graphics in general, text files, man pages and so on. These files are usually placed in a subdirectory of /usr/share. To be more precise, this directory holds program-related data which is normally not changed after installation.

_images/ls_usr.png

This last note mainly excludes configuration files. Programs can be configured on two levels: system-wide and per-user. The system-wide configuration is stored in /etc (et cetera). It is commonly maintained by the system administrator and sets the default configuration. For user programs, it can usually be overridden by configuration files in the user’s home (see below). /etc also holds the configuration for system programs, like daemons, which are not executed by a user and therefore don’t have a user configuration.

_images/ls_etc.png

Finally, the folders /usr/local and /opt contain more programs. These directories are usually not used by the operating system itself but are reserved for manually installed programs (not installed via system tools). No structure is specified for these folders. Often a program installed there creates its own subdirectory, named after the program or vendor.

Perhaps a program needs to store some data temporarily in a file. That’s what the /tmp folder is there for. It can be read and written by anyone and no specific structure is imposed. You can also place files there if you like. But keep in mind that the directory is emptied at every system start.

So you know where programs are stored in general, but how to find out where a specific program is? You certainly don’t want to browse through all of these directories. Of course, there’s a command to find out.

whereis bash

This lists the files related to the command you specify. From your knowledge about the file tree, you can figure out yourself what to expect in each file. In the example, you see the executable in /bin, a configuration file in /etc and the man page in /usr/share, which nicely corresponds to the explanations given above.

_images/whereis_bash.png

5.2.2. System information

/proc
/sys
/var
/dev

Some directories contain data generated and maintained by the operating system. The folders /proc (process) and /sys allow getting runtime information from the operating system via files. You’ve already seen the /proc/meminfo. This file always contains up-to-date information about memory usage and you don’t need a program or anything to read it as it’s just a file. You can get much more than just the overall memory consumption, e.g. details about a single process or enabled network features. In fact, many tools you already know get their information from these files.
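Because these are just files, the tools you already know are enough to read them. The lines below are a small sketch, assuming a Linux system where /proc is available; head is used only to keep the output short and the variable name mem_first_line is made up for the example.

```shell
# /proc/meminfo is read like any other text file; show its first three lines.
head -n 3 /proc/meminfo

# Keep the first line in a variable, e.g. to print it again later.
mem_first_line=$(head -n 1 /proc/meminfo)
echo "$mem_first_line"
```

Run it twice with some pause in between and the numbers will usually differ, since the operating system produces the content at the moment you read it.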

The /dev (devices) directory is also automatically generated, but for handling of devices. You should be careful doing stuff here, as it can have drastic effects. Similar to /proc, the files here are not actually files you created yourself and usually you don’t edit them, though you can read and write them.

You’ll see more of the devices later on, for now let’s just consider a couple of them. Let’s read some very interesting data.

cat /dev/zero

Abort any time you like via Ctrl-c. Didn’t work, you say? Let’s try again. Again abort whenever you’ve seen enough.

cat -A /dev/zero

You already know how to read man pages, so check out what the -A argument changes. Did you also figure out what’s special with /dev/zero? Think about it before reading on...

_images/cat_zero.png

You can let the above command run as long as you want, it will never finish. That’s because /dev/zero is infinitely large. Of course, this is not actually possible, that’s why it’s a special file. Anyhow, whatever you read from this file, it will always be - tataaa - zero. That’s why cat prints the same sequence over and over again.

Let’s try another file. If you forget the -A now, your terminal will probably become unusable, so make sure it’s there.

cat -A /dev/urandom

Again, abort manually after some time. The /dev/urandom file gives you a random sequence [1]. If you run the command a second time, you’ll notice that the output is indeed different. However, you shouldn’t cat this file for too long, as generating random numbers takes a lot of time so you don’t have too many of them.

_images/cat_urandom.png

Last but not least, sometimes you’ll stumble across /dev/null. That file is the ultimate black hole, the nirvana, the vast emptiness, where no man has gone before. Anything you write into this file is discarded. Not saved, not in a file, not temporary, it’s not even written to memory. It just ceases to exist.
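If you’d rather not abort by hand, you can read a fixed number of bytes from these devices instead. The following is a sketch only: head -c, od, wc, tr and the > redirection are not introduced in this chapter, and the variable names are made up for the example.

```shell
# Read exactly 16 bytes from the endless /dev/zero and count them.
zero_bytes=$(head -c 16 /dev/zero | wc -c | tr -d ' ')
echo "bytes read from /dev/zero: $zero_bytes"

# Show those bytes as hexadecimal numbers; every one of them is 00.
head -c 16 /dev/zero | od -An -tx1

# The same for /dev/urandom; the values differ on every run.
head -c 16 /dev/urandom | od -An -tx1

# Write into /dev/null, then read from it: nothing is left.
echo "this text vanishes" > /dev/null
null_bytes=$(wc -c < /dev/null | tr -d ' ')
echo "bytes read from /dev/null: $null_bytes"
```

Unlike cat without -A, the od output is plain text, so your terminal stays usable.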

5.2.3. Data folders

/root
/home

In terms of size, the largest part of the file tree is usually the user files. That’s where your Desktop is located, and it’s the default directory when you store files. It’s also the starting directory when you open a new terminal. This directory is called the home directory. Your home is a subfolder of the /home directory. Every user on the system has their own home directory. That’s where you store your personal files, and the directory is often inaccessible to other users - so your private files stay private.

Since most work is performed in this directory, you can use the abbreviation ~. You can put a ~ in every place you’d write the path to your home directory. Consider the following examples for clarification.

ls ~
ls ~/Desktop
ls ~/..
cat ~/Desktop/ideas.t
cd ~

Note

The commands only work if you’re logged in as user shared. If you’re using a different user, the ~ takes you to that home, not the one in the example. The paths might be different, e.g. Desktop/ideas.t might not exist.

_images/ls-tilde.png

For cd there’s one more shortcut to the home directory:

cd
pwd

But let’s do something a little more interesting. Check out the following command:

cd ~
ls -a

Suddenly, out of the blue, more files appeared. You should at least see the .bashrc. How about that? Any file or folder whose name begins with a dot (.) is hidden from the normal file listing. Only if you add the -a parameter are these files shown. These hidden files are mainly used in your home directory to hide per-user configuration and per-user program data. For example, the .bashrc is a configuration file for the bash shell.

_images/ls-hidden.png

Do not confuse hidden files with the current directory (.). The current directory is only a dot, nothing more. No name after that. If you spot a dot in front of a name, you’re dealing with a hidden file (or directory). If it’s only a dot, it means the current working directory. Sometimes you find a slash after the dot, indicating it’s a directory (but slashes are not part of a name either).

Let’s demonstrate:

ls
ls .

The same output in both cases. As expected.
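To see both meanings of the dot side by side, you can build a tiny playground. This is a sketch under a few assumptions: mktemp -d and touch are not introduced in this chapter (they create a scratch directory and empty files), and all file and variable names are made up.

```shell
# Create a scratch directory with one visible and one hidden file.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch visible.t .hidden.t

# A plain ls skips every name that starts with a dot.
plain_listing=$(ls)
echo "$plain_listing"

# With -a, the hidden file shows up, together with . and ..
full_listing=$(ls -a)
echo "$full_listing"

# A lone dot is the current directory, so this lists the same as plain ls.
dot_listing=$(ls .)
echo "$dot_listing"
```

Here .hidden.t is a hidden file because the dot is part of its name, while the lone . in the last command is the directory itself.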

Since we’re looking at the config files, let’s also do some changes. So far, you’ve never edited a file, only looked at them with less. For editing, you’ll need a new command.

nano .bashrc

You see the control keys on the bottom rows. The character ^ stands for the Ctrl key, so you save your changes by pressing Ctrl-o. Scroll down until you see a line HISTSIZE=.... Now change the number there to something large, e.g. 100000. This number influences how far you can go back in your command history with the Up key. When you’re finished, save the document and exit nano.

_images/nano_bashrc.png

5.2.4. Exercises

  1. State three different ways to get to your home directory

  2. Locate the man page of ls

  3. Locate the man page of ps

  4. Explain why tab completion works for commands

  5. Open the file /tmp/foobar.t in nano. Write some lines, then look at the file using cat.

  6. Which of the following paths point to hidden files?

    • /home/michel/.bashrc
    • ./bash
    • ../doc
    • /etc/bash.bashrc
    • ./michel/.bash_history
  7. Locate all configuration files of bash in /etc and your home

  8. Locate all paths related to nano

  9. Locate all paths related to the program rulor

  10. How can you list hidden files?

  11. For the paths below, write them as short as possible:

    • /home/shared/./movies/../Desktop/
    • /etc/../dev/../usr/lib/./../lib/./../share
    • /etc/../home/shared/Desktop/./../movies/favorites/../././../../
    • /home/shared/./../shared/../
    • /././.././../../././../

5.2.5. Wild things

Let’s talk wildcards. You might have noticed that ls works on both files and directories. If you write a path to a directory as argument to ls, the contents of that directory will be listed. If you pick a path to a file instead, only that file is displayed. Maybe you’ve made use of this fact together with the long list -l to see file permissions.

There’s a middle ground. Let’s say you have a pattern in your filenames. For example, I often use the extension .t for text files. In Linux, file extensions have no special meaning, so you can pick whatever you like. It’s just a habit. But I can make my work a bit easier by doing so. Let’s say I’m interested in all text files (maybe I want to know what text files I have or would like to see their permissions).

cd /home/shared
ls *.t

The asterisk * or star character has a special meaning here. We read the whole argument *.t as “anything which ends with .t”. In a similar fashion I could list all files beginning with “chap”.

cd /home/shared/projects/linux/course/chapters
ls chap*

So the star character’s meaning is similar to “anything”. It can also be used in longer paths, such as

ls /home/shared/projects/linux/course/chapters/chap*
_images/wildcard_end.png

Some more commands support this so-called wildcard, like you’ve seen with ls now. You can judge yourself whether it makes sense to try. It can sometimes even be found in other places, like search engines or databases. If you want to know more about this topic, check out chapter Advanced terminal use.
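If you don’t have matching files at hand, the patterns can be tried in a scratch directory. The sketch below assumes that mktemp -d and touch are available (neither is introduced in this chapter); the file names are invented to mirror the examples above.

```shell
# Create a scratch directory with a few empty files.
wild_dir=$(mktemp -d)
cd "$wild_dir"
touch chap1.rst chap2.rst notes.t ideas.t

# Anything which ends with .t
text_files=$(ls *.t)
echo "$text_files"

# Anything which begins with chap
chapters=$(ls chap*)
echo "$chapters"
```

Each listing contains only the two matching files; the other two are left out.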

5.2.5.1. Exercises

  1. List all files ending with “.t” in /home/shared/data/personal

  2. List all files starting with “per” in /home/shared/documents

  3. While being in /tmp (cd /tmp), list all files ending with “.t” in /home/shared/data/personal. Don’t use cd a second time!

  4. List all files ending with “.jpg” in /home/shared/Desktop/pictures/europe/switzerland

  5. Judge whether the wildcard makes sense for the following commands:

    • ls
    • cd
    • nano
    • cat
    • less
  6. List all files starting with “chap” and ending with “.rst” in /home/shared/projects/linux/course/chapters

  7. List all files starting with “D” in /home/shared. Can you make sense of what happens?

5.3. Faster than the speed of light

Surely you’ve wondered, “How should I ever find all those files again?”. Well, you could run a search, for example. The command you need is find. Note the dot (.) right after the command. That’s the directory where the search starts, and you’ll likely forget it. After it, you can put search criteria.

find . -iname '.BASH*'

The -iname search criterion is the one you’ll use most: It makes find search for names, ignoring the case. As in the example, you can even use the wildcard *. It means “any sequence of characters”. So you’re searching for any file (or directory) whose name must begin with ‘.BASH’ but may continue arbitrarily after that. Many other criteria are available; you can look them up in the man page yourself. find will search from its starting directory through all subdirectories and list all files that match the criteria. It might take some time, though.

_images/find_bash.png
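The search can be tried safely on a small made-up tree. This is a sketch, assuming mktemp -d and touch are available (neither is introduced in this chapter); all file names are invented for the example.

```shell
# Build a small tree: two names start with .bash (in different case), one does not.
search_root=$(mktemp -d)
mkdir "$search_root/sub"
touch "$search_root/.bashrc" "$search_root/sub/.BASH_notes" "$search_root/readme.t"

# Start the search in the current directory (the dot) and ignore the case.
cd "$search_root"
matches=$(find . -iname '.BASH*')
echo "$matches"
```

Both .bashrc and sub/.BASH_notes are reported, although only one of them is written in capitals and one sits in a subdirectory. The order of the two lines may vary.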

Let’s again consider the wildcard. You can put the * in any place, also in front of the word. Then it means everything that ends with ‘.BASH’. Or you put one before and one after, changing the search to any file containing “.BASH” anywhere in the name. But more importantly, you can use the wildcard in other commands as well!

ls .bash
ls .bash*
ls .*sh*

The first fails, because no filename matches “.bash” exactly. But the second one shows you all files starting with “.bash” (only in your current directory, not a search like find). The last one prints all hidden files (or directories) having “sh” somewhere in the name.

_images/ls_wildcard.png

The same works for other commands:

cat .bash*

You can use the wildcard for commands which expect files (and allow you to pass several files at once, like ls or cat). Which is most commands, really. Very often you can also use the wildcard in text or other searches (like find). It will appear quite frequently when working with computers.

5.3.1. Exercises

5.4. A short history of space

When working with a system for some time, you easily run out of space. But how do you notice? Well, at some point your machine will simply refuse to work. For example, you’ll get No space left on device errors when copying files. Or it won’t start anymore, meaning you’ll get to the login screen, but after you have entered your credentials you only return to the login screen instead of your desktop [2]. Other errors are also possible, like generic file write errors or the like. So it’s a good idea to check the disk usage every once in a while. And certainly before you copy large files.

Let’s say for some reason you suspect having too little space remaining. First, check how much space is left on your hard disk.

df -h

You’ll see some paths on the right. You’re primarily interested in the line where the path is the root directory /. Unless you have a special setup (in this case you should know where to look), this shows you how much of your disk space you’ve already used (Use%, Used) and how much free space is left (Avail). Perhaps you now already know that you have plenty of space left, so not to worry and maybe check later.
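If the full table is too noisy, df also accepts a path and then reports only the file system that contains it. A minimal sketch; the variable name root_usage is made up for the example.

```shell
# Limit the report to the file system that holds the root directory.
root_usage=$(df -h /)
echo "$root_usage"
```

The output is the header line plus the single line you were looking for.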

For the sake of learning however, let’s say the used space strikes you as too much. Especially if your use gets close to 100%, you might have to delete, move or at least compress some files. Of course you want to do as little as possible. So you need to know which file removal brings you the most benefit. You can check file sizes with the ls command.

ls -lh

The file size is displayed in the fifth column. The -h parameter makes ls print a G, M or K for Gigabytes, Megabytes or Kilobytes, respectively. So, you search for the G and M and (re)move those files first.

_images/ls-long.png

Unfortunately, you’ll have many directories to go through. Especially if you have used the machine for some time, data gathers quickly and spreads nicely over many directories. To get a quick impression of where the space goes, check out the du (disk usage) tool. Run it with the directory whose overall size you’d like to know, as in the following command.

du -hs /usr/local

As for ls, the -h makes it print sizes nicely. The -s parameter gives you a summary, showing only the size of the path you specified. If you omit this argument, the sizes of all subdirectories (and their subdirectories) are listed separately. With -s, their sizes are still taken into account, but they are not shown separately. The last part of the command (/usr/local) is the directory you are interested in.

_images/du.png

You can of course use an absolute or a relative path, as you desire. If you’re interested in your current directory, you can use the dot (.) instead. For example, the following gets the usage of your Desktop without showing any subdirectories.

cd Desktop
du -hs .

The du command basically goes through all subfolders of the specified target directory and adds up the file sizes it contains. So you get a good impression where in the file tree the big files are located. You can start at the home directory, list all its subdirectories and then decide in which one you want to continue your investigation. This way, you’ll find unused but large files very quickly and get a good feeling for who or what consumes lots of disk space.
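The investigation described above can be rehearsed on a made-up tree. The sketch below assumes that mktemp -d, head -c and cut are available (none of them is introduced in this chapter) and uses the -d DEPTH option listed in the cheatsheet; directory and variable names are invented.

```shell
# Build a small tree with one tiny and one large file (random data).
usage_root=$(mktemp -d)
mkdir "$usage_root/small" "$usage_root/large"
head -c 1024 /dev/urandom > "$usage_root/small/a.bin"
head -c 2097152 /dev/urandom > "$usage_root/large/b.bin"

# One line per direct subdirectory, plus the total for the start directory.
du -h -d 1 "$usage_root"

# The same numbers in kilobytes, which are easier to compare.
small_kb=$(du -sk "$usage_root/small" | cut -f1)
large_kb=$(du -sk "$usage_root/large" | cut -f1)
echo "small: $small_kb KB, large: $large_kb KB"
```

The subdirectory with the biggest number is the one to descend into next, repeating the same command there.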

5.4.1. Exercises

  1. Report the size of your home directory

  2. Report the free disk space.

  3. Report the size of your totally usable disk space.

  4. Do the following:

    1. Open /tmp/foobar.t in nano
    2. Write a couple of lines.
    3. Save the file and close nano.
    4. Get the size of /tmp/foobar.t from ls
    5. Get the size of /tmp/foobar.t from du
    6. Compare the sizes.
    7. Discuss why there’s a difference with a colleague or a teacher.

5.5. I like to move it, move it

It’s nice to know that your disk is too full and you should do something about it - but it was never discussed how to do so. Let’s first be nice and not remove anything. Perhaps you have a second disk, maybe an external one, a USB dongle or some other means of storage attached. Moving files is very easy. We’ll use the mv command for this (maybe check its man page already!?).

First, let’s create a file. Use the nano text editor for that and write a couple of lines.

nano /tmp/foo.t

Now there’s a file you can move around. Let’s move it from the /tmp directory to your home folder. The ls around the mv are just there to demonstrate the effect.

ls /tmp/*.t ~/*.t
mv /tmp/foo.t ~/
ls /tmp/*.t ~/*.t

The first argument to mv is the source file, the second argument is the new destination. In this case the destination is the home directory. So far so good, not really spectacular is it? But mv can do a little more. In fact, renaming a file is essentially a move, you just don’t move it across a directory. Before, the destination was a directory, now let’s extend the destination with a file name.

ls ~/*.t
mv ~/foo.t ~/bar.t
ls ~/*.t

How about moving over a directory and renaming the file in one command? Well, that’s easy and nothing unexpected. Just combine the examples above.

ls /tmp/*.t ~/*.t
mv ~/bar.t /tmp/foo.t
ls /tmp/*.t ~/*.t

And we have the file in its original location at /tmp/foo.t again.

_images/mv_loop.png

So you figured out that your second hard disk is also full... shit. Moving won’t help, you’ll have to remove some files. Since you’ve just created a file, let’s remove this one. Be aware that if you remove a file in the terminal, there’s no way to restore it. So you have only one shot at the command below. If you want to try it a second time, you’ll have to re-create the file with nano.

ls /tmp/*.t
rm /tmp/foo.t
ls /tmp/*.t

Is it gone or what? And it didn’t even take long.

Warning

Removing files in the terminal is final. There’s no Trash or something similar; if a file is gone, it’s gone. It’s passed on. It is no more. It has ceased to be. It’s expired and gone to meet its maker. It’s a stiff. Bereft of life, it rests in peace. It’s pushing up the daisies. Its metabolic processes are now history. It’s off the twig. It has kicked the bucket, shuffled off its mortal coil, run down the curtain and joined the bleeding choir invisible. It’s an ex-file. So always check before you delete. Always. You can fix most things, but you cannot bring back deleted files.

Let’s do a little experiment. Don’t worry, nothing can go wrong (hmm, that would make some excellent last words).

rm /tmp

Didn’t see a warning? Oh my, so it did go wrong. Seriously, you will see a warning about /tmp being a directory. This is because by default rm only removes files, not directories. If you want to remove a directory, there’s rmdir. Let’s first create a directory, so we can remove it later. The command to create a directory is quite obviously

mkdir /tmp/foobar
ls -d /tmp/foobar

Now that there’s an (empty) directory, let’s try rmdir.

rmdir /tmp/foobar
ls -d /tmp/foobar

And the directory is gone again. Now that was a pretty little experiment in futility, we basically dug a hole and closed it right after. But you saw some new commands, so you got that going for you, which is nice.

_images/rm_dir.png

Warning

There’s the even more lethal rm -r. It removes files and directories (i.e. all subdirectories) in one go. This command is usually used if you have to delete a whole directory with many subdirectories. Otherwise you’d have to go through all directories manually and remove their contents with rm and rmdir. As useful as it is, it is also very dangerous, as you easily remove lots of files by mistake. NEVER run the command on the root directory - it deletes your whole system in no time. Always make sure you only remove the files you want before running this command!
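One way to follow the advice above is to list everything below the path first and only then remove it. The sketch below works on a throwaway tree, assuming mktemp -d and touch are available (neither is introduced in this chapter); ls -R lists a directory including all its subdirectories, and all names are made up.

```shell
# Build a throwaway tree: one directory to keep, one to remove.
scratch=$(mktemp -d)
mkdir "$scratch/keep" "$scratch/old" "$scratch/old/logs"
touch "$scratch/keep/thesis.t" "$scratch/old/draft.t" "$scratch/old/logs/run1.log"

# Step 1: look at everything the path contains, including subdirectories.
ls -R "$scratch/old"

# Step 2: only now remove the directory and all of its contents.
rm -r "$scratch/old"

# The sibling directory is untouched.
ls -R "$scratch"
```

If the first listing shows anything you did not expect, stop and fix the path before running rm -r.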

5.5.1. Exercises

  1. Create the file /tmp/parrot with nano. Write some lines.
  2. Rename the file parrot to someBird
  3. Move the file parrot to /tmp/dead_or_alive
  4. Create the directory /tmp/sketches
  5. Create the directory /tmp/sketches/bird-like
  6. Move the file someBird to /tmp/sketches/bird-like
  7. Try to remove the directory /tmp/sketches with rmdir. Why does it fail?
  8. Create a directory coconut-like in /tmp/sketches/bird-like
  9. Move the file someBird from its current location to coconut-like.
  10. Move (and rename) the directory coconut-like to /tmp/sketches/others
  11. Remove the directory /tmp/sketches/bird-like
  12. Try to remove the directory others with rm. Why does it fail?
  13. Remove the file someBird from its current location.
  14. Remove /tmp/sketches and all its subdirectories with rm. Before executing the command, make sure it does what you think it does. Use -v to see what rm does.
  15. Try to remove the file /tmp/parrot. Why does it fail?
  16. Assume you’re in the directory /home/shared. Give a single command to move the file /home/shared/Desktop/ideas.t to the directory /tmp. Use relative paths only.

5.6. Pirates!

You’ve seen how to create a file, view it, move it, rename it and delete it. What’s still missing? Surely, you don’t want to rewrite a file’s contents in case you want to give it to your neighbour. Of course, we can copy a file! If you’ve followed so far, copying isn’t any harder. The command is cp, and you use it much like mv. And here’s the first example.

ls /tmp/cat
cp /bin/cat /tmp
ls /tmp/cat

Similar to mv, the copy target can either be a directory (like above) or a file. This controls what the destination file will be called. If no file is given, the original file name is used. See below an example of the second case.

_images/copy_cat.png
ls /tmp/ki*
cp /bin/cat /tmp/kitty
ls /tmp/ki*

So, how about directories? Well, you can try

cp /bin /tmp/programs

but it won’t work. Similar to rm, cp normally only handles files. If directories should be copied, the -r parameter has to be given. With this option, all subdirectories are copied as well.

cp -r /bin /tmp/progs
ls /tmp/progs
_images/copy_dir.png
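The same can be tried on a small tree of your own, where it’s easy to check that the copy is complete and the original is still there. A sketch, assuming mktemp -d is available (it is not introduced in this chapter); all names are made up.

```shell
# Build a small source tree with one nested file.
copy_root=$(mktemp -d)
mkdir "$copy_root/src" "$copy_root/src/inner"
echo "hello" > "$copy_root/src/inner/note.t"

# Copy the directory including its subdirectories to a new name.
cp -r "$copy_root/src" "$copy_root/dst"

# Both the original and the copy contain the nested file.
ls -R "$copy_root"
copied_text=$(cat "$copy_root/dst/inner/note.t")
echo "$copied_text"
```

Note that the destination did not exist before. If it already exists as a directory, the source is copied into it as a subdirectory, just like a file would be.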

5.6.1. Exercises

  1. Create the file /tmp/swallow with nano. Write some lines.
  2. Copy the file swallow to european-swallow
  3. Copy the file swallow to african-swallow
  4. Copy the file /tmp/asian-swallow to /tmp/american-swallow
  5. Copy the file african-swallow to /tmp/swallows/
  6. Copy from and to the file european-swallow. What message do you get?
  7. Assume you’re in the directory /home/shared. Give a single command to copy the file /home/shared/notes.t to the directory /tmp/foobar. Use relative paths only.

5.7. Summary

You’ve mastered another chapter about files. You know now how your system is organized and where to look for what. You understand how to keep your disk clean and how to react if this is not so. You can edit files and store them. You also know the other file operations, can move, delete and copy them. And the same for directories.

5.7.1. Exercises

  1. How can you distinguish text files from programs?
  2. Can you edit a binary file (e.g. a program) with nano?
  3. What argument do you normally use to include subdirectories?
  4. Create the directory /tmp/and/now/something/completely/different with a single command. Consult the man page of mkdir first.
  5. How does top know the memory statistics?
  6. Why do we use the /tmp directory so often in this tutorial?
  7. Which two entries do you always see in the listing for any directory?
  8. Give a reason for hidden files.
  9. How do you notice that you ran out of space?
  10. Why should you take care using ‘rm -r’? Why do we use it anyways?
  11. For the commands below, write the paths as short as possible. Decide yourself whether to use an absolute or a relative path.
    • user@host:/tmp$ ls /tmp
    • user@host:/home$ mv /home/michel/forall /tmp
    • user@host:/proc$ less /proc/meminfo
    • user@host:/proc$ less /home/michel/forall
    • user@host:/home/michel$ cp /home/michel/forall/notes.t /tmp/foobar.t
    • user@host:/usr/bin$ cp /home/michel/my_proc /usr/bin/my_proc
    • user@host:/var/log$ ls /home/michel
    • user@host:/usr/bin$ cp /home/michel/my_prog /bin/my_prog
    • user@host:/bin$ rm /bin/ls
    • user@host:/var/log$ cat /var/../usr/share/../../boot/vmlinuz
  12. In the following commands, identify all paths
    • ps -ef
    • cp notes.t ../foobar.t
    • rm foobar/foo.bar
    • whoami
    • mv /home/michel/texts.odf examples
    • less /etc/passwd
    • chmod 743 ../../shared/
    • top
    • chown root nobody
    • ps -u
  13. What kind of files are located in /dev?
  14. Where is the main executable of soffice stored?
  15. What do we use the path /etc for? What kind of files are stored there?

5.8. Cheatsheet

  • whereis COMMAND
  • cat [-A] FILE
  • find [PATH] [-iname EXPRESSION] [MORE CRITERIA]
  • df [OPTIONS] [-h]
  • du [OPTIONS] [-h] [-d DEPTH] [PATH]
  • nano [OPTIONS] [FILE]
  • mv [OPTIONS] SOURCE DESTINATION
  • rm [OPTIONS] FILE
  • mkdir [OPTIONS] DIRECTORY
  • rmdir [OPTIONS] DIRECTORY
  • cp [OPTIONS] [-r] SOURCE DESTINATION

[1] Generating random numbers on a computer is a surprisingly tricky thing. Actually, it’s not only hard for a computer but also for humans, but that’s another story. Computers are usually deterministic, meaning that if you run the same program twice with exactly the same input, the output will also be identical. This makes it impossible to create a program (without input) which generates a different sequence every time it runs. It is possible to get true random numbers from the outside (e.g. delays in networks), but this method is very slow. That’s why there’s also the pseudo-random number generator (PRNG). It’s a deterministic program (as described above) which creates a random-looking sequence, based on a single input (the seed). In consequence, if you know the seed, you can generate the same sequence again. A PRNG can create many almost random numbers very quickly. What’s often done is to use a single true random number for the seed and then get the rest from the PRNG. How is this related to the devices? You’ve seen the /dev/urandom device. This is the PRNG. Then, there’s also /dev/random. That would be the true random numbers. Because true random numbers are scarce, you shouldn’t cat from /dev/random.
[2] This error occurs more often than one would like to admit. When you log in, some data is written, which is of course not possible if there’s no more disk space. Instead of telling you so, the system just aborts and brings you back to the login screen. This behaviour is somewhat obscure and the root cause is easily overlooked, although quickly fixed. You should still be able to log in via a TTY (see next chapter). NB: If you cannot resolve the issue like this, check file permissions (also next chapter).