The command line lets you type commands to create folders, delete and copy files, and extract information from files.
By the end of this notebook, you will use sed, awk, and grep to find and replace text, fetch columns, and find words in files.
You start any terminal session in your "home area". View your "present working directory"
pwd
Your default home folder (also called $HOME) is represented by the character alias ~ (tilde)
echo ~
Change directory
cd ~/Desktop
List all the files in the present working directory using
ls
ls .
Look up the arguments for Unix commands in their manual (man) pages
man ls
Creating a folder
mkdir data
mkdir software
Change directory into data or software (tab complete or use Up and Down). [TAB] means to press the tab key on your keyboard, not to write out the characters.
cd da[TAB]
Change back to the parent directory from a subdirectory:
cd ..
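The folder steps above can be tried safely in a scratch location. The following is a minimal sketch, assuming `mktemp -d` is available on your system; the variable names (`workdir`, `inside`, `back`) are made up for the example.

```shell
# Work in a throwaway directory so nothing in your real folders changes.
workdir=$(mktemp -d)
cd "$workdir"

mkdir data software      # create two folders at once
cd data                  # step into data
inside=$(pwd)
cd ..                    # step back up one level, to the parent folder
back=$(pwd)

echo "inside: $inside"
echo "back:   $back"
ls                       # shows the two folders: data and software
```

Note that `cd ..` only moves up one level; repeating it moves further up the tree.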
Create an empty file
touch emptyfile.txt
Write some text in it
echo "hello world" > emptyfile.txt
Look at the contents of the file with cat
cat emptyfile.txt
Append to your file with >>
echo "I love bioinformatics" >> emptyfile.txt
Count the number of lines with wc -l
wc -l emptyfile.txt
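The difference between > and >> is easiest to see side by side. This is a small sketch in a scratch directory; the file name `notes.txt` is hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "first line" > notes.txt     # > creates the file (or overwrites it)
echo "second line" >> notes.txt   # >> appends to the end
echo "replaced" > notes.txt       # > again: the two earlier lines are gone
echo "added" >> notes.txt

cat notes.txt
# Reading from stdin keeps the file name out of the output;
# tr removes the padding some versions of wc add.
line_count=$(wc -l < notes.txt | tr -d ' ')
echo "lines: $line_count"
```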
Move or rename a file
mv emptyfile.txt notempty.txt
Copy a file
cp notempty.txt deleteme.txt
Delete a file
rm deleteme.txt
Create a pointer (symlink) to a file
ln -s notempty.txt pointer
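The move, copy, delete, and symlink commands above can be chained into one short session. The sketch below runs in a scratch directory with made-up file names, so it does not touch the files from the earlier steps.

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "hello world" > original.txt
mv original.txt renamed.txt        # rename: original.txt no longer exists
cp renamed.txt copy.txt            # copy: both files now exist
ln -s renamed.txt pointer          # symlink: pointer refers to renamed.txt
rm copy.txt                        # delete the copy

via_link=$(cat pointer)            # reading the link reads renamed.txt
echo "$via_link"
ls -l                              # the link is listed as pointer -> renamed.txt
```

Deleting the symlink with `rm pointer` removes only the link; the file it points to stays in place.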
Go to the UCSC Table Browser and choose "position" to pick a single chromosome (chr10+), then save the knownGene table with "all fields from selected table" (should be the default) as knownGene.txt.
Move knownGene.txt to Desktop. What is the command?
(optional) Secure copy knownGene.txt to TSCC.
scp Desktop/knownGene.txt ewyeo@tscc-login2.sdsc.edu:.
less and more are other commands (besides cat) you can use to look at the contents of files. How are they different?
See what's in the first n lines (in this case 10)
head -n 10 knownGene.txt
How many lines are in the file?
wc -l knownGene.txt
Check that the file indeed has that many lines by piping (|) its contents into wc -l
less knownGene.txt | wc -l
wc -l knownGene.txt
What's in the last n lines?
tail -n 10 knownGene.txt
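To see how head, tail, and the pipe fit together without needing knownGene.txt, here is a sketch on a made-up 25-line file generated with `seq`.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up stand-in for a real data file: the numbers 1 to 25, one per line.
seq 1 25 > numbers.txt

# The pipe (|) sends the output of one command into the next.
first_ten=$(head -n 10 numbers.txt | wc -l | tr -d ' ')   # count what head printed
total=$(cat numbers.txt | wc -l | tr -d ' ')              # count every line
last_line=$(tail -n 1 numbers.txt)                        # the final line only

echo "head printed $first_ten lines out of $total; last line is $last_line"
```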
Extract specific columns
cut -f
paste column1.txt column2.txt > 2columns.txt
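As an illustration of cut and paste working together, the sketch below builds a tiny tab-separated table and pulls two columns out of it. The table contents and the file name `toy.txt` are invented for the example; they are not taken from knownGene.txt.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up tab-separated table: name, chromosome, exon count.
printf 'geneA\tchr10\t3\n' >  toy.txt
printf 'geneB\tchr10\t1\n' >> toy.txt
printf 'geneC\tchr10\t3\n' >> toy.txt

cut -f 1 toy.txt > column1.txt                 # keep only the first column
cut -f 3 toy.txt > column2.txt                 # keep only the third column
paste column1.txt column2.txt > 2columns.txt   # join them side by side with a tab

cat 2columns.txt
first_row=$(head -n 1 2columns.txt)
```

cut splits on tabs by default, which is why no delimiter option is needed here.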
How many genes have 3 exons?
grep -c 'REGEXSEARCHTERM' target.txt
How many genes have 1...max # exons?
sort | uniq -c
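One possible way to answer both exon questions is sketched below on an invented table where the exon count sits in column 3. In a real file the column number will differ, so treat the `-f 3` as an assumption of this example.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up tab-separated table: name, chromosome, exon count.
printf 'geneA\tchr10\t3\n'  >  toy.txt
printf 'geneB\tchr10\t1\n'  >> toy.txt
printf 'geneC\tchr10\t3\n'  >> toy.txt
printf 'geneD\tchr10\t13\n' >> toy.txt

# How many genes have exactly 3 exons?
# ^ and $ anchor the match to the whole line, so 13 is not counted.
three_exons=$(cut -f 3 toy.txt | grep -c '^3$')
echo "genes with 3 exons: $three_exons"

# How many genes have each exon count?
# uniq -c only merges adjacent duplicates, so the input must be sorted first.
cut -f 3 toy.txt | sort -n | uniq -c
distinct_counts=$(cut -f 3 toy.txt | sort -n | uniq | wc -l | tr -d ' ')
```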
Which user are you logged in as?
whoami
What groups is that user associated with?
groups
What is the ownership status of all files in my current directory?
ls -lrt
Changing permissions
chmod 775
The three digits indicate the affected user subset, in order: the owner, the group, and everyone else ("all" or "world").
The value of each digit encodes the permissions as a sum of octal numbers: read = 4, write = 2, execute = 1. For example, read + execute = 4 + 1 = 5. 775 and 755 are the most common permission setups: you, the owner, can do everything to your files, the rest of the group may be able to as well (775), but "all" or "world" can only read and execute your programs, not overwrite them.
# | Permission | rwx |
---|---|---|
7 | read, write and execute | rwx |
6 | read and write | rw- |
5 | read and execute | r-x |
4 | read only | r-- |
3 | write and execute | -wx |
2 | write only | -w- |
1 | execute only | --x |
0 | none | --- |
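The arithmetic behind the table can be checked directly. The sketch below builds a mode digit by digit and applies it to a hypothetical empty file, `script.sh`, in a scratch directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch script.sh

# read = 4, write = 2, execute = 1; add them up for each digit.
owner=$((4 + 2 + 1))   # rwx -> 7
group=$((4 + 1))       # r-x -> 5
other=$((4))           # r-- -> 4
mode="$owner$group$other"

chmod "$mode" script.sh
# The first 10 characters of ls -l hold the file type and the nine rwx bits.
perms=$(ls -l script.sh | cut -c1-10)
echo "$mode -> $perms"
```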
Changing Files Recursively
chmod -R 777 Directory/
chmod -R o-rwx ~/
Changing executable nature of files
chmod +x
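As a small illustration of the executable bit, the sketch below writes a hypothetical two-line script, `hello.sh`, and adds execute permission for the owner only (`u+x`, a narrower variant of `+x`).

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A tiny made-up script, written out with printf.
printf '#!/bin/sh\necho "it runs"\n' > hello.sh

before=$(ls -l hello.sh | cut -c1-10)   # permissions as created
chmod u+x hello.sh                      # add execute for the owner only
after=$(ls -l hello.sh | cut -c1-10)

# Character 4 of the permission string is the owner's execute bit.
owner_exec=$(echo "$after" | cut -c4)
echo "before: $before"
echo "after:  $after"
```

Once the bit is set, the script can normally be started as `./hello.sh` instead of `sh hello.sh`.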
Scratch maintenance occurs every 90 days:
cd important_scratch_dir
find . -exec touch {} +
awk
awk is a command-line tool to process text files line by line, splitting each line into fields.
Another way to extract all lines
awk -F "\t" '{print;}' knownGene.txt
What if we only wanted one column?
awk -F "\t" '{print $8;}' knownGene.txt | head
What if we wanted the length of genes?
awk -F "\t" '{ len = $5-$4;} {print len;}' knownGene.txt | head
Length of all genes summed?
awk -F "\t" '{ len = $5-$4;} {tot = tot + len;} END {print tot;}' knownGene.txt | head
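The same pattern can be tried on a table small enough to check by hand. The rows below are invented and only mimic the column order used above (start in column 4, end in column 5).

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up rows (no header): name, chromosome, strand, start, end.
printf 'g1\tchr1\t+\t100\t250\n'  >  toy.txt
printf 'g2\tchr10\t-\t500\t900\n' >> toy.txt

# Print each name next to its computed length.
awk -F "\t" '{ len = $5 - $4; print $1, len }' toy.txt

# Sum the lengths; the END block runs once, after the last row.
total=$(awk -F "\t" '{ tot += $5 - $4 } END { print tot }' toy.txt)
echo "total length: $total"
```

Because END prints a single line, piping the summed result through head is not needed.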
Don't process the header line (introduction to conditionals)
awk -F "\t" '{
    if (FNR == 1) {
        next;
    }
    tot = tot + $5 - $4;
}
END { print tot; }' knownGene.txt
What if you only want the total length of genes in chromosome 1?
awk -F "\t" '{
    if (FNR == 1) {
        next;
    }
    chr = $2;
    if (chr == "chr1") {
        tot = tot + $5 - $4;
    }
}
END { print tot; }' knownGene.txt
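The if statements above can also be written as awk patterns placed in front of the action block. The following sketch shows that alternative on an invented table with a header row; the header names and values are made up, and the per-chromosome array is an extension of the example rather than something from the original notebook.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up table: column 2 = chromosome, column 4 = start, column 5 = end.
printf 'name\tchrom\tstrand\tstart\tend\n' >  toy.txt
printf 'g1\tchr1\t+\t100\t250\n'           >> toy.txt
printf 'g2\tchr10\t-\t500\t900\n'          >> toy.txt
printf 'g3\tchr1\t+\t1000\t1300\n'         >> toy.txt

# FNR > 1 skips the header; the second condition keeps only chr1 rows.
chr1_total=$(awk -F "\t" 'FNR > 1 && $2 == "chr1" { tot += $5 - $4 } END { print tot }' toy.txt)
echo "chr1 total: $chr1_total"

# Totals for every chromosome at once, using an array keyed by column 2.
awk -F "\t" 'FNR > 1 { tot[$2] += $5 - $4 } END { for (c in tot) print c, tot[c] }' toy.txt | sort
```

The array version avoids running awk once per chromosome; the final sort is there because awk does not guarantee the order in which array keys are visited.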