Linux command line
An introduction to the Linux command line for genomics
Susan Fairley
Aims
• Introduce the command line
• Provide an awareness of basic functionality
• Illustrate with some examples
• Provide some information on how to find out more
What we will not achieve
• Immediate proficiency in the command line
– As with learning a language, it takes time and use
• A comprehensive survey of the command line
– There is a vast array of commands; this session can cover only a small fraction
Format
• Series of short talks followed by exercises
• Should not need to listen and type at the same time
• Suggested reading listed at end
Overview
• Introduction to Linux
• Navigating the filesystem
• Basic commands
• Linking commands and directing output
• Additional commands
• Shells and shell scripts
Introduction to Linux
• What is Linux and what is an operating system?
• Unix and Linux – what’s the difference?
• Why consider Unix/Linux?
• The command prompt
Linux is an operating system
• Operating systems enable applications and users to make use of computer hardware
User
Applications
Operating system
Hardware
Operating systems
• Operating systems act as resource managers for the machine on which they are installed
• They wrap, and provide access to, hardware functionality
• The OS kernel controls the hardware
• Access to kernel services is provided to higher level applications and system utilities via system calls
Operating systems and shells
• Applications and system utilities can be started via a shell or GUI
• A shell is a textual command line interface
• A variety of shells, with slightly different features, exist
• Examples of shells include bash, the Bourne shell (sh), csh and tcsh
• Using the shell can provide useful functionality
Unix and Linux
From http://www.doc.ic.ac.uk/~wjk/UnixIntro/Lecture1.html
Unix and Linux
• There are many variations of these systems
• They have some differences but many similarities
• Examples of Unix and Unix-like systems include Sun Solaris, GNU/Linux and Mac OS X
• Popular Linux distributions (packaging a Linux kernel with system utilities, GUI and applications) include Red Hat and Debian, among others
Why Linux?
• Linux systems are commonplace in bioinformatics
• Large variety of software is developed by academic groups for these platforms
• Free and open source software
The command prompt
• The command line (shell) and GUI enable interaction with applications and system utilities
• A command prompt, where commands are entered at the command line, is accessed via software that provides a terminal window
The command prompt
• A terminal window can be opened when logged in to a machine
• Also options to open terminals on remote machines – ssh, telnet and PuTTY
• Today, we will use PuTTY to connect from the classroom Windows machines to a Linux machine
Commands
• The command prompt
• 1) The command
• 2) Options
• 3) What the command is to run on
• Example: prompt$ command -option thing_to_run_on
Commands
• White space
• Quotes
• Special characters
• We’ll return to some of these topics later
• Typically best to avoid white space in file names
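A minimal sketch of why white space in file names is awkward, and how quotes help. It runs in a scratch directory created with mktemp; the file names are made up for illustration.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing important is touched.
sandbox=$(mktemp -d)
cd "$sandbox"

# Without quotes the shell splits on white space,
# so touch receives two arguments and creates two files: "my" and "file.txt".
touch my file.txt
unquoted_count=$(ls | wc -l)

# With quotes the whole string is one argument,
# so a single file called "my file.txt" is created.
touch "my file.txt"
quoted_count=$(ls | wc -l)

ls -1

# Tidy up.
cd /
rm -r "$sandbox"
```

Using underscores instead of spaces (my_file.txt) avoids the need for quotes altogether.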
Important points
• No need to be afraid but…
• Type with care
– You will NOT be asked if you really mean it
• Some commands are powerful and can remove many files at once
• Sometimes a command will run and run and run because something is wrong
• Can use Ctrl-C to kill processes in most cases
• If in doubt, ask!
In Exercise 1
• Open PuTTY
• Connect to a remote machine
• Copy material to the remote machine
Exercise 1
Overview
• Introduction to Linux
• Navigating the filesystem
• Basic commands
• Linking commands and directing output
• Additional commands
• Shells and shell scripts
Navigating the filesystem
• Where am I?
• What is here (and permissions)?
• Moving around
• Searching
Navigating the filesystem
• In Exercise 1 we used some commands
• ls listed the contents of the directory
• tar unpackaged linux_course.tar.gz
• These commands followed the pattern we described earlier
• command -option thing_to_run_on
• The command is given along with any necessary additional information
Navigating the filesystem
• Now we are going to look at commands related to navigating the filesystem
• At any point in time, the command prompt is somewhere within the filesystem
• The filesystem is similar to the directory structure you will be familiar with in graphical interfaces, where you navigate by clicking on folders and documents
Where am I?
• The current directory is also called the working directory
• pwd – print working directory
• Gives the path from the top of the file system (or root) to the current directory
• Root can be written as /
[s08sf2@login1(maxwell) ~]$ pwd
/users/s08sf2
What is here?
• We’ve already used ls
• ls – lists the contents of the current directory
• We used the simple form of ls
• Options can also be specified, including -l or combinations of options such as -lh
• -l gives the long version of output and -h converts file sizes to “human-readable” form
What is here?
[s08sf2@login1(maxwell) spades_Dec14]$ pwd
/users/s08sf2/ken_forbes/spades_Dec14
[s08sf2@login1(maxwell) spades_Dec14]$ ls
mv.sh process_quast.pl quast_summary.txt
notes.txt quast.sh spades_listeria_Dec14.sh
What is here?
[s08sf2@login1(maxwell) spades_Dec14]$ ls -l
total 136
-rw-r--r-- 1 s08sf2 clsm 50028 Dec 10 16:53 mv.sh
-rw-r--r-- 1 s08sf2 clsm 45948 Dec 12 14:55 notes.txt
-rw-r--r-- 1 s08sf2 clsm 1932 Dec 10 16:51 process_quast.pl
-rw-r--r-- 1 s08sf2 clsm 634 Dec 10 16:13 quast.sh
-rw-r--r-- 1 s08sf2 clsm 23025 Dec 10 16:53 quast_summary.txt
-rw-r--r-- 1 s08sf2 clsm 977 Dec 10 11:19 spades_listeria_Dec14.sh
[s08sf2@login1(maxwell) spades_Dec14]$ ls -lh
total 136K
-rw-r--r-- 1 s08sf2 clsm 49K Dec 10 16:53 mv.sh
-rw-r--r-- 1 s08sf2 clsm 45K Dec 12 14:55 notes.txt
-rw-r--r-- 1 s08sf2 clsm 1.9K Dec 10 16:51 process_quast.pl
-rw-r--r-- 1 s08sf2 clsm 634 Dec 10 16:13 quast.sh
-rw-r--r-- 1 s08sf2 clsm 23K Dec 10 16:53 quast_summary.txt
-rw-r--r-- 1 s08sf2 clsm 977 Dec 10 11:19 spades_listeria_Dec14.sh
Permissions
• First character is file type: - for file, d for directory
• Then groups of three characters describing permissions for the user (u), group (g) and others (o)
• Each set of three characters is read, write and execute
• r = read, w = write, x=execute, -=no permission
• Here, we have a file (not a directory) where the owner can read and write, the group and all users can read and nobody can execute the file
-rw-r--r--
File type: -   Owner: rw-   Group: r--   All users: r--
Permissions
• Linux cares about permissions
• Permissions (including who the owner of a file is) can, in some cases, get carried over when moving or copying files
• Permissions can be changed using chmod
chmod
• chmod has various ways in which it can be used
• We’ll look at one where a number is supplied for each of user, group and other
• How do we know what number to supply for each category?
chmod
--- 0
--x 1
-w- 2
-wx 3
r-- 4
r-x 5
rw- 6
rwx 7
chmod
• chmod 600 private_file.txt
• chmod 777 everything_file.txt
• chmod 644 my_rw_otherwise_read.txt
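Each digit in the table is the sum of read (4), write (2) and execute (1), so 6 means rw- and 4 means r--. The sketch below applies two of the examples above to empty files in a scratch directory and reads the result back with ls -l; the scratch directory is an assumption made so the example is safe to run.

```shell
#!/bin/bash
sandbox=$(mktemp -d)
cd "$sandbox"
touch private_file.txt my_rw_otherwise_read.txt

# 6 = 4 (read) + 2 (write) for the owner; 0 = nothing for group and others.
chmod 600 private_file.txt

# 6 for the owner; 4 (read only) for group and for others.
chmod 644 my_rw_otherwise_read.txt

# The first 10 characters of an ls -l line are the file type
# followed by the nine permission characters.
private_mode=$(ls -l private_file.txt | cut -c1-10)
shared_mode=$(ls -l my_rw_otherwise_read.txt | cut -c1-10)

echo "$private_mode private_file.txt"
echo "$shared_mode my_rw_otherwise_read.txt"

cd /
rm -r "$sandbox"
```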
Moving around
• We’ve said the command prompt is in a directory
• How do we change that directory?
• cd – change directory
• cd directions/to/where/we/want/to/go
Moving around
• Start in our home directory
• Can return there using ~
• cd ~
• There are some other directories we can refer to easily
• . is the directory we are in
• .. is the parent directory of our current location (the level above where we are)
Moving around
• We can move to a directory by specifying it and its location relative to root or our current location
• cd ../linux_course/text_files
• cd /users/s08sf2/linux_course/text_files
• cd linux_course/text_files
• NB: once you have started typing the path, you can press tab to autocomplete
• NB: you can use the up arrow to get the previous command and then edit it
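A short sketch of absolute and relative paths with cd. The directory layout is recreated under a scratch directory (an assumption, so that the example runs anywhere), and pwd is used to record where each cd lands.

```shell
#!/bin/bash
# Recreate a small directory tree in a scratch location.
start=$(mktemp -d)
mkdir -p "$start/linux_course/text_files"

# Absolute path: starts from root (/), works from anywhere.
cd "$start/linux_course/text_files"
deep=$(pwd)

# .. is the parent directory, so this moves up one level.
cd ..
parent=$(pwd)

# Relative path: interpreted from the current directory.
cd text_files

# Paths can chain .. to climb more than one level.
cd ../..
top=$(pwd)

echo "$deep"
echo "$parent"
echo "$top"

# ~ is shorthand for the home directory.
cd ~
rm -r "$start"
```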
Exercise 2
• We’ve looked at how you establish what is in a directory and how to move about
• In Exercise 2, we’ll use the contents of linux_course to try out some of this material using pwd, ls and cd
• We’ll also try out chmod
Exercise 2
Overview
• Introduction to Linux
• Navigating the filesystem
• Basic commands
• Linking commands and directing output
• Additional commands
• Shells and shell scripts
Basic commands
• We’ve now used a few commands, including some with options and have the basic skills to navigate through the file system
• Now, we’ll look at some additional commands, enabling you to make directories, move files, copy files, remove files, view files and find them
man
• man – manual
• You can use the man command by supplying the name of the command whose manual entry you want to see, for example: man ls
• The manual entry provides information on the command and its usage
• Information can also be found online
mkdir
• mkdir – make directory
• This creates the specified directory as a sub-directory of the current directory
• Multiple levels can be created at once using the -p option
• mkdir my_dir
• mkdir -p my_dir/new_dir/another_new_dir
cp
• cp – copy file
• cp existing.txt copy.txt
• cp existing.txt ../different_location/copy.txt
• We can also use the -r option to recursively copy a directory and all of its contents
• cp -r dir copy_of_dir
mv
• mv – move
• Moves instead of copying
• Can be used to rename something in the same location
• mv old_name.txt new_name.txt
• mv old.txt ../new_location/new.txt
• Can be applied to files and directories
rm
• Type with care
• rm – remove (this means delete, and it will NOT move it to trash)
• rm file_to_remove.txt
• Need -r option to remove a directory because you must also remove any contents
• rm -r directory_to_remove
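The file-handling commands above (mkdir, cp, mv, rm) can be tried together without risk in a scratch directory. This is only a sketch: the file and directory names are illustrative and the scratch directory comes from mktemp.

```shell
#!/bin/bash
sandbox=$(mktemp -d)
cd "$sandbox"

# -p creates both levels in one go.
mkdir -p my_dir/new_dir

# Make a small file to work with.
echo "some text" > existing.txt

# Copy a file, then copy a directory and everything in it.
cp existing.txt copy.txt
cp -r my_dir copy_of_dir

# Move and rename in a single step.
mv copy.txt my_dir/renamed.txt
after_mv=$(ls my_dir)

# Remove a file, then a directory and its contents. There is no undo.
rm existing.txt
rm -r copy_of_dir
remaining=$(ls)

echo "$after_mv"
echo "$remaining"

cd /
rm -r "$sandbox"
```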
less
• less can be used to view files
• When viewing file press q to quit
• With less, can search the file using / and then typing pattern to search for
• less shakespeare/romeo_and_juliet.txt
• Also head, tail and more
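less and more are interactive, but head and tail simply print the start or end of a file. A small sketch, assuming a made-up file of 100 numbered lines generated with seq:

```shell
#!/bin/bash
sandbox=$(mktemp -d)

# Build a 100-line file containing the numbers 1 to 100.
seq 1 100 > "$sandbox/numbers.txt"

# -n sets how many lines to show (both commands default to 10).
first_three=$(head -n 3 "$sandbox/numbers.txt")
last_two=$(tail -n 2 "$sandbox/numbers.txt")

echo "$first_three"
echo "$last_two"

rm -r "$sandbox"
```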
wc
• wc – word count
• By default, reports the number of lines, words and bytes in a file
• Has options to report a single count, for example the number of lines in a file
• wc -l file.txt
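A brief sketch of wc on a two-line file whose contents are invented for the example. Reading the file through < keeps the file name out of the output, which is convenient when the count is stored in a variable.

```shell
#!/bin/bash
sandbox=$(mktemp -d)
printf 'two words\nthree more words\n' > "$sandbox/file.txt"

# With no options wc prints lines, words and bytes, then the file name.
wc "$sandbox/file.txt"

# -l counts lines only, -w counts words only.
line_count=$(wc -l < "$sandbox/file.txt")
word_count=$(wc -w < "$sandbox/file.txt")

echo "lines: $line_count, words: $word_count"

rm -r "$sandbox"
```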
sort
• sort – does what it says
• By default, sorts lexicographically
• Can sort numerically (-n) and can output only unique lines (-u)
• sort file.txt
• sort -u file.txt
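The difference between lexicographic and numeric sorting is easiest to see with numbers of different lengths. A sketch using a small made-up file:

```shell
#!/bin/bash
sandbox=$(mktemp -d)
printf '10\n9\n2\n10\n' > "$sandbox/numbers.txt"

# Default: compared character by character, so "10" sorts before "2".
lexical=$(sort "$sandbox/numbers.txt")

# -n compares the values as numbers.
numeric=$(sort -n "$sandbox/numbers.txt")

# -u drops repeated lines; options can be combined.
unique=$(sort -n -u "$sandbox/numbers.txt")

echo "lexicographic:" $lexical
echo "numeric:" $numeric
echo "numeric, unique:" $unique

rm -r "$sandbox"
```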
grep
• grep – global regular expression print
• grep options pattern files
• grep Juliet romeo_and_juliet.txt
• grep -r tide shakespeare
• Can supply patterns or regular expressions, which describe what to look for
• NB: we don’t have time to discuss regular expressions today
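A sketch of a few common grep options on plain-text patterns. The directory and file contents are invented stand-ins for the course material.

```shell
#!/bin/bash
sandbox=$(mktemp -d)
mkdir "$sandbox/plays"
printf 'Juliet speaks\nRomeo answers\njuliet again\n' > "$sandbox/plays/scene.txt"

# Matching is case-sensitive by default: only the first line matches.
matches=$(grep Juliet "$sandbox/plays/scene.txt")

# -i ignores case; -c prints the number of matching lines instead of the lines.
count_any_case=$(grep -i -c juliet "$sandbox/plays/scene.txt")

# -r searches every file under a directory; -l prints only the file names.
files_with_romeo=$(grep -r -l Romeo "$sandbox/plays")

echo "$matches"
echo "$count_any_case"
echo "$files_with_romeo"

rm -r "$sandbox"
```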
find
• find – search for things
• This command has many options
• find options path expression
• Path says where to look and expression what to look for
• find . -name 'going*'
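A sketch of find searching a small directory tree built for the purpose (all names are illustrative). The pattern is quoted so that the shell passes going* to find unchanged rather than expanding it against files in the current directory first.

```shell
#!/bin/bash
sandbox=$(mktemp -d)
mkdir -p "$sandbox/a/b"
touch "$sandbox/a/going_up.txt" "$sandbox/a/b/going_down.txt" "$sandbox/a/staying.txt"
cd "$sandbox"

# Search from the current directory (.) downwards for names starting with "going".
found=$(find . -name 'going*')
found_count=$(echo "$found" | wc -l)

echo "$found"

cd /
rm -r "$sandbox"
```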
Exercise 3
• In Exercise 3 we’ll try out some of the commands we’ve just looked at
Exercise 3
Overview
• Introduction to Linux
• Navigating the filesystem
• Basic commands
• Linking commands and directing output
• Additional commands for genomics
• Shells and shell scripts
Linking commands and directing output
• So far, any output from our commands has been printed in our terminal
• However, we can redirect output to files or pipe it to another command
• | pipes output from one command to another
• > writes output to a file
• >> appends output to a file
stdout and stderr
• In most cases, output is written to standard output (stdout)
• Some errors are written to standard error (stderr)
• Both are, by default, written to the terminal
• We’ll look at redirecting stdout but stderr can also be redirected
Examples
• ls > dir_contents.txt
• ls | sort > sorted_dir_contents.txt
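The sketch below tries each redirection operator in a scratch directory, including 2> for stderr, which is mentioned above but not shown. File names are illustrative.

```shell
#!/bin/bash
sandbox=$(mktemp -d)
cd "$sandbox"
touch b.txt a.txt

# > sends stdout to a file. The shell creates the file before ls runs,
# so dir_contents.txt appears in its own listing.
ls > dir_contents.txt
listed=$(cat dir_contents.txt)

# | passes the output of ls to sort, whose output is then redirected.
ls | sort -r > reverse_sorted_dir_contents.txt

# >> appends to a file.
echo "first run" > log.txt
echo "second run" >> log.txt
log_lines=$(wc -l < log.txt)

# > on an existing file replaces its contents.
echo "fresh start" > log.txt
log_after=$(cat log.txt)

# 2> redirects stderr: the error message goes to the file, not the terminal.
ls no_such_file 2> errors.txt
error_lines=$(wc -l < errors.txt)

cd /
rm -r "$sandbox"
```

Because > overwrites without asking, it deserves the same care as rm.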
Exercise 4
• In Exercise 4 we’ll use the output of one command as input for another by piping
• We’ll also try redirecting output from standard out (stdout) to a file
Exercise 4
Overview
• Introduction to Linux
• Navigating the filesystem
• Basic commands
• Linking commands and directing output
• Additional commands
• Shells and shell scripts
Additional commands
• Many and varied commands can be used (including when handling genomic data)
• These are a few arbitrary examples
Grep for FASTA headers
• FASTA files have headers for each sequence
• Headers start with the character >
• grep '>' proteins.fa
• Note the use of '' around >, enabling the command line to differentiate from redirection
Identify unique FASTA headers
• grep '>' proteins.fa
• grep '>' proteins.fa | wc -l
• grep '>' proteins.fa | sort -u
• grep '>' proteins.fa | sort -u | wc -l
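A sketch of the same pipeline on a tiny FASTA file. The file contents are invented for illustration (three records, one header repeated) and are not the course data.

```shell
#!/bin/bash
sandbox=$(mktemp -d)
cd "$sandbox"

# A made-up FASTA file in which the seq1 header appears twice.
cat > proteins.fa <<'EOF'
>seq1 hypothetical protein
MKTAYIAK
>seq2 hypothetical protein
MVLSPADK
>seq1 hypothetical protein
MKTAYIAK
EOF

# Count all header lines.
total_headers=$(grep '>' proteins.fa | wc -l)

# Sort the headers, keep one copy of each, then count.
unique_headers=$(grep '>' proteins.fa | sort -u | wc -l)

echo "headers: $total_headers, unique: $unique_headers"

cd /
rm -r "$sandbox"
```

If the two counts differ, at least one header occurs more than once.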
Retain part of FASTA header
• grep '>' protein_2.fa
• >sp|P09922|MX1_MOUSE Interferon-induced GTP-binding protein Mx1 OS=Mus musculus GN=Mx1 PE=1 SV=1
• grep '>' protein_2.fa | sed -e 's/>\(\S*\).*/\1/'
• -e introduces the expression (script) to execute
• s/substitute this/for this/
• \(\) capture the contents of the brackets (using \ to escape) and reuse the contents using \1
• \S non-whitespace (a GNU sed extension), * match zero or more times
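A sketch applying the substitution to the example header above. It assumes GNU sed, as found on most Linux systems, because \S is a GNU extension; the second command uses the POSIX character class [^[:space:]] as a more portable equivalent.

```shell
#!/bin/bash
header='>sp|P09922|MX1_MOUSE Interferon-induced GTP-binding protein Mx1 OS=Mus musculus GN=Mx1 PE=1 SV=1'

# Keep the run of non-whitespace characters after ">" and discard the rest of the line.
id_gnu=$(printf '%s\n' "$header" | sed -e 's/>\(\S*\).*/\1/')

# The same idea without the GNU-only \S.
id_posix=$(printf '%s\n' "$header" | sed -e 's/>\([^[:space:]]*\).*/\1/')

echo "$id_gnu"
echo "$id_posix"
```

Both commands print sp|P09922|MX1_MOUSE: the first space ends the captured group, and .* consumes the description.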
Compare sorted lists
• comm – compares sorted lists
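• Output has three columns: lines unique to file1, lines unique to file2 and lines that appear in both files
• Options -1, -2 and -3 suppress the corresponding column, so comm -12 file1 file2 shows only the lines that appear in both files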
• Also diff
• diff -y --suppress-common-lines file1 file2
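A sketch of comm on two small, already sorted lists (contents invented for the example). Each numeric option hides a column, so the columns that remain are the ones not named.

```shell
#!/bin/bash
sandbox=$(mktemp -d)
cd "$sandbox"
printf 'apple\nbanana\ncherry\n' > file1
printf 'banana\ncherry\ndate\n' > file2

# Hide columns 2 and 3: lines found only in file1.
only_in_file1=$(comm -23 file1 file2)

# Hide columns 1 and 3: lines found only in file2.
only_in_file2=$(comm -13 file1 file2)

# Hide columns 1 and 2: lines found in both files.
in_both=$(comm -12 file1 file2)

echo "only in file1: $only_in_file1"
echo "only in file2: $only_in_file2"
echo "in both:" $in_both

cd /
rm -r "$sandbox"
```

comm expects sorted input; unsorted files should be passed through sort first.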
Exercise 5
• In Exercise 5 we’ll work with some FASTA files
• We’ll try some of the examples that we’ve discussed
Exercise 5
Overview
• Introduction to Linux
• Navigating the filesystem
• Basic commands
• Linking commands and directing output
• Additional commands
• Shells and shell scripts
Shells and shell scripts
• We briefly discussed shells earlier in the session
• There are different shells that differ slightly in how they operate
• Often, you can identify the shell you are using by typing: echo $SHELL
• $SHELL is a variable
Scripts
• Scripts let us put together a sequence of commands that can then be run
• We can run a script by typing: source script.sh
• source runs the commands in your current shell environment
• Alternatively, you can make the script file (.sh) executable with chmod and run it directly: ./script.sh
Scripts
#!/bin/bash
echo Hello World
Scripts
#!/bin/bash
echo Hello World
echo Goodbye World
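A sketch of both ways of running the script above. The file name hello.sh and the scratch directory are assumptions made for the example; chmod 755 corresponds to rwxr-xr-x in the permissions table.

```shell
#!/bin/bash
sandbox=$(mktemp -d)
cd "$sandbox"

# Write the two-line script to a file.
cat > hello.sh <<'EOF'
#!/bin/bash
echo Hello World
echo Goodbye World
EOF

# Option 1: source runs the commands in the current shell;
# the file does not need to be executable.
via_source=$(source hello.sh)

# Option 2: make the file executable, then run it by its path.
chmod 755 hello.sh
mode=$(ls -l hello.sh | cut -c1-10)
./hello.sh

echo "$via_source"
echo "$mode"

cd /
rm -r "$sandbox"
```

The ./ prefix tells the shell to look for hello.sh in the current directory.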
Exercise 6
• In this exercise, we’ll look at scripts that run commands we’ve already discussed
• We’ll also review checking permissions to see if files are executable
Exercise 6
More information
• http://www.doc.ic.ac.uk/~wjk/UnixIntro/
• http://www.ee.surrey.ac.uk/Teaching/Unix/
• Also many books and online resources
Feedback
• Please complete and return the feedback form before you leave
Acknowledgements
• Naveed Khan
• Tony Travis
• Eduardo Alves
• Mel McCann