Assembling whole phage genomes
Using sequencing data to assemble complete or near-complete phage genomes.
Slides: Assembling genomes
Slide material from the original manual is not embedded on this public page.
Activity 1: Command line basics
Rationale
We will learn the basics of how to use the terminal to give direct instructions to the computer.
Activity 1
- Follow along and take notes in your printed cheat sheet:
- Open the Terminal Preview app
  - (Make the window half the size of the screen so that you can see the Desktop)
- Write: `bash`
- Write: `ls`
  - What shows up in the Terminal?
  - Take notes on your cheat sheet.
- Write: `mkdir myfolder`
  - What showed up on the Desktop?
- Open `myfolder/` by clicking on it on the Desktop.
- Take notes on your cheat sheet.
- Write: `cd myfolder`
  - What changed in the terminal?
- Write: `ls`
  - What happened? Why?
- Write: `mkdir myfolder2`
- Write: `ls`
- Write: `touch hello.txt`
- Open `hello.txt` in the Sublime Text app.
- Inside Sublime Text, write: “Hello world!”
- Save the file by going to File > Save. You can close Sublime Text now.
- Go back to the Terminal Preview
- Write: `less hello.txt`
- Exit the `less` view by typing the letter: `q`
- Write: `rm hello.txt`
- Write: `ls`
  - What happened to `hello.txt`?
- Write: `cd ..`
  - Where did we go?
- Write: `ls`
- You can close the Terminal Preview app now.
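For reference, the commands from this activity can be replayed as one short script. This is only a sketch: it works inside a temporary folder created with `mktemp -d` (an assumption, so that nothing on your Desktop is touched), and it writes the text with `echo` instead of opening Sublime Text.

```shell
#!/usr/bin/env bash
# Recap of Activity 1, run inside a throwaway folder (assumption: not your Desktop).
workdir=$(mktemp -d)              # make a temporary folder to practise in
cd "$workdir"

mkdir myfolder                    # create a new folder
cd myfolder                       # move into it
mkdir myfolder2                   # create another folder inside it
touch hello.txt                   # create an empty file
echo "Hello world!" > hello.txt   # stands in for typing the text in Sublime Text
cat hello.txt                     # print the file (less opens a scrollable view instead)

before=$(ls)                      # listing while hello.txt still exists
rm hello.txt                      # delete the file
after=$(ls)                       # listing after the file is gone

echo "before rm: $before"
echo "after rm: $after"
cd ..                             # go back up one level
```

Comparing the two listings shows what `rm` did: the file disappears from the folder and does not go to a recycle bin.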
Activity 2: Exploring your read files from the command line
Rationale
We will use sequencing read files and learn how to inspect them in the terminal.
Activity 2
- Put the read files for the exercise in a folder on your desktop.
- The folder should contain the paired read files for your sample.
- If the folder is zipped, open it first.
- Keep the folder name short and easy to type.
- Open the Terminal Preview app.
- Write: `ls`
- Write: `cd <name of your folder>`
  - Here, replace `<name of your folder>` with the name of the folder you downloaded!!
  - Instructions between angle brackets `< like this >` mean that you have to write the name of the files YOU have.
- Write: `ls`
  - Do you see your read files?
  - There should be two, one that ends with `_R1.sub.fastq.gz` and another that ends with `_R2.sub.fastq.gz`
- To make it human readable, write: `gunzip -c <name_of_your_reads>_R1.sub.fastq.gz > reads_R1.fastq`
  - (You don’t need to remember this one.)
- To see it, write: `less reads_R1.fastq`
  - What do you see?
  - How long is the first read?
  - You can exit this view by writing `q`
- To know how many reads are in the file, write: `echo $(cat reads_R1.fastq|wc -l)/4|bc`
  - This is another “special command” to count the number of reads in our file. (You don’t need to remember this one.)
  - How many reads are in one of your read files?
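The division by 4 in the counting command works because each read in a FASTQ file takes four lines: a header starting with `@`, the sequence, a `+` separator, and the quality string. The sketch below builds a tiny made-up file to show the same count and the length of the first read. The file name `toy.fastq` and its two invented reads are assumptions for illustration only.

```shell
# Build a toy FASTQ file with two invented reads (4 lines per read).
printf '%s\n' \
  '@read1' 'ACGTACGTAC' '+' 'IIIIIIIIII' \
  '@read2' 'TTGCA' '+' 'IIIII' > toy.fastq

# Count reads: total number of lines divided by 4.
n_lines=$(wc -l < toy.fastq)
n_reads=$((n_lines / 4))
echo "reads: $n_reads"

# Length of the first read: line 2 of the file holds its sequence.
first_len=$(awk 'NR == 2 { print length($0); exit }' toy.fastq)
echo "first read length: $first_len"
```

On your own data, replace `toy.fastq` with `reads_R1.fastq` and skip the `printf` step.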
Activity 3: Assembling your genome with Unicycler
Rationale
We will use a program called Unicycler to assemble our reads into whole genomes.
Activity 3
- Go to the class computer
- Write: `ls`
- Locate your folder, and enter the folder that contains your reads: `cd <name of your folder>`
- Then write: `unicycler -h`
  - This is the software’s “help”. It tells us how to use it.
  - Scroll up and down to see the options.
- Now we will run the assembly. Write in a single line: `unicycler -1 <name_of_your_reads>_R1.sub.fastq.gz -2 <name_of_your_reads>_R2.sub.fastq.gz -o assembly`
  - What are the options `-1` and `-2`?
  - What is the `-o` option?
- Let your assembly run.
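Before launching a long assembly, it can help to confirm that both read files are really in the folder. The sketch below is one possible way to do that. Here `sample` is a hypothetical stand-in for `<name_of_your_reads>`, the empty placeholder files exist only so the example runs on its own, and the Unicycler command is printed rather than executed.

```shell
# Hypothetical sample name; replace it with the start of your own file names.
sample="sample"
r1="${sample}_R1.sub.fastq.gz"
r2="${sample}_R2.sub.fastq.gz"

# For this demo only: create empty placeholder files in a scratch folder.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch "$r1" "$r2"

# Check that both files of the pair exist before building the command.
if [ -f "$r1" ] && [ -f "$r2" ]; then
  cmd="unicycler -1 $r1 -2 $r2 -o assembly"
  echo "Ready to run: $cmd"
else
  cmd=""
  echo "Missing read file: check the names with ls" >&2
fi
```

With real data you would skip the `mktemp` and `touch` lines, run the check inside your own folder, and then type the printed command.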
Activity 4: Looking at your assembled genome
Rationale
Once the assembler has run, you should have larger pieces of DNA sequence, which hopefully correspond to a whole genome sequence.
Activity 4
- Use the assembly file produced for this exercise and keep it with your read files.
- Open the Terminal Preview app
- Write: `bash`
- Go to the folder on the Desktop that contains your reads: `cd <name of your folder>`
- Go to the folder that contains the results from the assembly: `cd assembly`
- Write: `ls`
  - There, you will find many files, but the one we care about is `assembly.fasta`
- The assembly might contain more than one large piece of DNA. Let’s check how many it has: `grep -c "^>" assembly.fasta`
  - How many pieces of DNA does your assembly contain?
  - (This is another special command you don’t need to remember.)
- To extract only the first piece, write: `cat assembly.fasta | awk "/^>/ {n++} n>1{exit} 1" > contig1.fasta`
  - (This is another special command you don’t need to remember.)
- Write: `less contig1.fasta`
  - How long is the assembly you got?
  - Exit this by typing `q`
- Open the `contig1.fasta` file in the Sublime Text app.
- Open the BLAST website
- Copy the sequence inside `contig1.fasta` and paste it into the Query box of the BLAST website.
- Search by clicking “BLAST” and wait for the results.
- Look at the 5 best hits and make a note of:
  - Description
  - Query Cover
  - Percentage ID
- In your opinion, how similar is your phage to the previously known phages?
- Select one of the hits, and go to its genome record. Just like yesterday, try to find:
  - How big is the genome?
  - Is the genome DNA or RNA, linear or circular?
  - What type of phage is it? (Siphoviridae, Podoviridae, Myoviridae, other?)
  - What is the bacterial host?
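The contig counting and extraction steps from earlier in this activity can be tried on a toy file first. The sketch below uses a made-up `toy.fasta` with two invented contigs (the file names and sequences are assumptions for illustration) and also adds up the length of the first contig, which is easier than counting letters in `less`.

```shell
# Build a toy FASTA file with two invented contigs.
printf '%s\n' '>1' 'ACGTACGT' 'ACGT' '>2' 'TTTT' > toy.fasta

# Count contigs: every contig starts with a header line beginning with ">".
n_contigs=$(grep -c '^>' toy.fasta)
echo "contigs: $n_contigs"

# Keep only the first contig: stop as soon as a second header appears.
awk '/^>/ { n++ } n > 1 { exit } 1' toy.fasta > toy_contig1.fasta

# Add up the length of every sequence line (lines that are not headers).
contig1_len=$(awk '!/^>/ { total += length($0) } END { print total }' toy_contig1.fasta)
echo "first contig length: $contig1_len"
```

To use it on your own assembly, replace `toy.fasta` with `assembly.fasta` and skip the `printf` step.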