Bioinformatics Course Work

Bioinformatics 1



Course Accomplishments

Learn fundamental concepts and methods in bioinformatics.
Survey a wide range of topics including how to learn programming for bioinformatics, setting up your computer laboratory, fundamental skills like using the Linux command line for computing, online databases, sequence format conversion and sequence manipulation tools, sequence homology searching and alignment.
Work with more advanced topics of next generation DNA sequencing data analysis and machine learning for bioinformatics.


The purpose of this unit was to:
Learn how to install and use Bioconductor packages.
Get an introduction to S4 objects and functions, since most Bioconductor packages are built on S4 classes.
Use a real genomic dataset of a fungus to explore the BSgenome package.

Partitioning the Yeast Genome

Yeast Genome-Seq Subsetting

The purpose of this unit was to learn about sequence alphabets and sequence manipulation using the genome of a virus.

Manipulating Biostrings

Exploring the Zika Virus Sequence


The purpose of this unit was to learn how to use the IRanges and GenomicRanges packages, which store and manipulate genomic intervals and variables defined along a genome.
The dataset used was a gene in the human genome.

Constructing IRanges

Constructing IRanges 2

From tabular data to Genomic Ranges

GenomicRanges accessors

Human genome chromosome X

ABCD1 Gene

Transcript Count


The purpose of this unit was to learn how to manipulate and assess FASTA and FASTQ files. This included subsetting, trimming and filtering sequences of interest using plant genome sequences.

Exploring a fastq file

Exploring sequence quality

Nucleotide frequency plot

Plotting cycle average quality
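
As a rough illustration of what "cycle average quality" means, here is a minimal Python sketch. It is not the course's method (the unit used the ShortRead and Rqc packages in R). It assumes a plain four-line-per-record FASTQ file with Phred+33 quality encoding, and the names read_fastq and mean_quality_per_cycle are hypothetical.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a FASTQ stream.

    Assumes exactly four lines per record and stops at the first empty line.
    """
    while True:
        header = handle.readline().strip()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().strip()
        yield header[1:], seq, qual


def mean_quality_per_cycle(records, offset=33):
    """Average Phred score at each read position (cycle).

    offset=33 assumes Phred+33 encoding; reads may differ in length.
    """
    totals, counts = [], []
    for _, _, qual in records:
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two made-up reads, used only to exercise the functions.
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nACG\n+\n##I\n")
print(mean_quality_per_cycle(read_fastq(example)))  # [21.0, 21.0, 40.0, 40.0]
```

Plotting the returned list against cycle number gives the kind of curve the unit title refers to.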


Cancer Genomics | Neural Networks vs k-NN Classifiers



Course Accomplishments

Topics/Tools Covered:
Anaconda and Jupyter IDE
Machine Learning
Cancer Genomics
k-NN Classifier
Neural Networks
Deep Learning
mglearn


Introduction to Bioconductor



Course Accomplishments

Install packages from Bioconductor by using the BiocInstaller package.
Practice techniques for reading, manipulating and filtering raw genomic data using Biostrings, GenomicRanges and ShortRead.
Work with built-in BSgenome and TxDb datasets, then use matching functions to identify patterns in them.
Check the quality of sequence files using ShortRead and Rqc.

Explored: fungi, viruses, humans and plants.

Used BSgenome, Biostrings, IRanges, GenomicRanges, TxDb, ShortRead and Rqc with datasets from different species.


The purpose of this unit was to:
Learn how to install and use Bioconductor packages.
Get an introduction to S4 objects and functions, since most Bioconductor packages are built on S4 classes.
Use a real genomic dataset of a fungus to explore the BSgenome package.

Partitioning the Yeast Genome

Yeast Genome-Seq Subsetting


Manipulating Biostrings

Exploring the Zika Virus Sequence




Introduction to Genomic Technologies



Course Accomplishments

An introduction to the basic biology of modern genomics and the experimental tools used to measure it.
An introduction to the Central Dogma of Molecular Biology and how next generation sequencing can be used to measure DNA, RNA and epigenetic patterns.
An introduction to the key concepts in computing and data science needed to understand data from next generation sequencing experiments.


Topics:
Just enough molecular biology
The genome
Writing a DNA sequence
Central Dogma
Transcription
Translation
DNA structure and modifications


Computing Technology

Topics covered include the foundations of computer science, algorithms, memory and data structures, efficiency, software engineering, and computational biology software.


This unit covered how to handle data produced during the sequencing process. This includes reproducibility, analysis, statistics, question types, the central dogma of inference, analysis code, testing, prediction, variation, experimental design, confounding, power, sample size, correlation, causation, and degrees of freedom.


Python Programs (From "Finding Hidden Messages in DNA" course)


Pattern Matching

This program finds all occurrences of a pattern in a string.
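
The linked code is not reproduced here, so the following is only a minimal sketch of one way to do this, not necessarily the original program. The function name pattern_matching is hypothetical. It reports 0-based start positions and includes overlapping matches.

```python
def pattern_matching(pattern: str, text: str) -> list[int]:
    """Return every 0-based index at which pattern starts in text.

    Overlapping occurrences are included.
    """
    k = len(pattern)
    # Slide a window of length k across text and compare each slice.
    return [i for i in range(len(text) - k + 1) if text[i:i + k] == pattern]


print(pattern_matching("ATAT", "GATATATGCATATACTT"))  # [1, 3, 9]
```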



Reverse Complement

This program finds the reverse complement of a string.
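
A minimal sketch of the idea, assuming the input is a DNA string over A, C, G and T. The name reverse_complement and the translation table are illustrative, not the original program: each base is swapped for its complement and the result is reversed.

```python
# Translation table mapping each base to its complement (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
```

Characters outside the table, such as N, pass through unchanged in this sketch.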



Pattern Count

This program prints how many times a pattern occurs in a string.
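
One possible version is sketched below, assuming overlapping occurrences should be counted. The name pattern_count is hypothetical. Python's built-in str.count is not used because it skips overlapping matches.

```python
def pattern_count(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones."""
    k = len(pattern)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


print(pattern_count("GCGCG", "GCG"))  # 2
```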



Frequent Words

Frequent Words finds the most frequent k-mers in a string.
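
A minimal sketch using a counting approach, which may differ from the original program. The name frequent_words is hypothetical. It tallies every k-mer with collections.Counter and returns all k-mers that tie for the highest count, sorted alphabetically.

```python
from collections import Counter


def frequent_words(text: str, k: int) -> list[str]:
    """Return the most frequent k-mers in text, sorted alphabetically."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    if not counts:
        return []  # text is shorter than k
    best = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == best)


print(frequent_words("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))  # ['CATG', 'GCAT']
```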



Hamming Distance

This program finds the Hamming distance between two strings.
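
The Hamming distance is the number of positions at which two equal-length strings differ. A minimal sketch follows. The name hamming_distance is hypothetical, and raising an error on unequal lengths is a design choice of this sketch.

```python
def hamming_distance(p: str, q: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(p) != len(q):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(p, q))


print(hamming_distance("GGGCCGTTGGT", "GGACCGTTGAC"))  # 3
```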



Tools (Used in various courses)

  • BLAST
  • Clustal
  • FastQC
  • Trimmomatic
  • STAR
  • SAMtools
  • SARTools