## Training Courses

As part of its work with the Babraham Institute, the Bioinformatics group runs a regular series of training courses on many aspects of bioinformatics.

These courses are run regularly on the Babraham site but we are also able to come out and present them on other sites. You can see the list of current Babraham dates which are available, and you can contact us to discuss options for running courses on your site.

Where possible we also aim to make the material from our courses publicly available so that anyone who wants to can download it for their own use.

Below is a list of the courses we currently run. Where they are available there is a link to the training manual and course exercises.

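#### Babraham Software

- Analysing Mapped Sequence Data with SeqMonk

#### Core Bioinformatics Skills

- Learning to Program with Perl
- Introduction to R
- Advanced R
- Plotting complex figures with R
- Tidyverse packages in R, including ggplot
- An Introduction to Unix
- Scientific Figure Design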

#### Statistics

- Statistical Analysis using GraphPad Prism
- Statistical Analysis using SPSS
- Statistical Analysis using R
- Sample size estimation and experimental design
- Statistics bootcamp using R
- Statistics bootcamp using GraphPad Prism

#### Application focussed courses

- RNA-Seq Analysis
- ChIP-Seq Analysis
- Analysing bisulfite methylation sequence data
- Extracting biological information from gene lists
- Quality control in Sequencing Experiments
- An Introduction to Mathematical Modelling

#### Comprehensive longer Bootcamp courses

- Introduction to R for Biologists bootcamp
- Introduction to NGS Analysis for Biologists bootcamp
- Introduction to Linux bootcamp

### Analysing Mapped Sequence Data with SeqMonk (One day)

SeqMonk is a program which can analyse large data sets of mapped genomic positions. It is most commonly used to work with data coming from high-throughput sequencing pipelines.

The program allows you to view your reads against an annotated genome and to quantitate and filter your data to let you identify regions of interest. It is a friendly way to explore and analyse very large datasets.

This course provides an introduction to the main features of SeqMonk and will run through the analysis of a couple of different datasets to show what sort of analysis options it provides.

#### Course Content:

- What is SeqMonk
- Starting and configuring the program
- Creating a project and importing data
- Using the chromosome viewer
- Quantitating and Filtering Data
- Creating Reports
- Using Quantitation Pipelines
- Correcting and Normalising Quantitations
- Scaling analysis to larger studies
- Running statistical tests
- Interacting with external programs
- Exporting text and graphics

#### Course Material:

- Course Manual (pdf)
- Course Manual (docx)
- Course Exercises (pdf)
- Course Exercises (doc)
- Course Data (zip) [2.6GB]

### Statistical Analysis using R (One day)

Statistics are an important part of most modern studies and being able to effectively use a statistics package can help you to understand your results. This course provides an introduction to statistics illustrated through the use of the R language.

#### Course Content:

- Introduction to Power Analysis
- Qualitative and Quantitative Data Exploration
- Graphical representations
- Chi-square, Fisher's exact test, T-Test, ANOVA and correlation
- Choosing an appropriate analysis
- Interpreting analysis output

#### Course Material:

- Course Manual (pdf)
- Course Manual (docx)
- Course Slides (pdf)
- Course Slides (pptx)
- Course Data (zip 13kb)

### Statistical Analysis using SPSS (One day)

Statistics are an important part of most modern studies and being able to effectively use a statistics package can help you to understand your results. This course provides an introduction to statistics illustrated through the use of the friendly SPSS package.

#### Course Content:

- Introduction to SPSS
- Importing data from other software packages
- Preparing your data for analysis
- Getting to know your data
- Graphical representations
- Choosing an appropriate analysis
- Interpreting analysis output

#### Course Material:

### Statistical Analysis using GraphPad Prism (One day)

GraphPad Prism is a powerful and friendly package which allows you to plot and analyse your data. This course acts not only as an introduction to Prism, but also goes through the basic statistical knowledge which should allow you to make the most of your data.

#### Course Content:

- Introduction to GraphPad Prism
- Getting to know your data
- Graphical representations
- Choosing an appropriate analysis
- Interpreting analysis output

#### Course Material:

- Course Manual (pdf)
- Course Manual (docx)
- Course Slides (pdf)
- Course Slides (pptx)
- Exercises (pptx)
- Exercises (pdf)
- Course Data Files (zip)

### Sample Size Estimation and Experimental Design (Short course)

Power analysis is used to estimate the appropriate sample size needed to detect biologically meaningful differences. It should be central to any experimental approach as samples provide the evidence on which scientists build the confidence they have in the results of their research. This course covers the basic principles of power analysis and experimental design. Basic statistical knowledge is useful but not compulsory.
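As a rough illustration of what a power analysis calculates, the sketch below estimates the number of samples needed per group when comparing two means. It is not taken from the course material: it assumes the standard normal approximation, and the function name `samples_per_group` and its parameters are hypothetical names chosen for this example.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate samples per group for a two-sided comparison of two means.

    effect_size is Cohen's d: the difference between the two group means
    divided by their common standard deviation (an assumption of this sketch).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)          # quantile matching the target power
    # Normal-approximation formula: n = 2 * ((z_alpha + z_power) / d)^2
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller effects need many more samples to reach the same power.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {samples_per_group(d)} samples per group")
```

Under this approximation a medium effect (d = 0.5) at 5% significance and 80% power needs about 63 samples per group. Dedicated power analysis tools base the calculation on the t distribution and usually report slightly larger numbers.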

#### Course Content:

- Experimental Design
- Choice of statistical approach
- Type of design
- Technical versus biological replicates
- Definition of Power
- Variables in Power Analysis
- Power analysis for
  - Comparing 2 proportions
  - Comparing 2 means
  - Comparing more than 2 means
  - Correlation

#### Course Material (to be updated):

- Course Manual (pdf)
- Course Manual (docx)
- Course Slides (pdf)
- Course Slides (pptx)
- Course Exercises (docx)

### Statistics bootcamp using R (3.5 days)

A more in-depth look at statistical analyses using R.

Prerequisite: Introduction to R.

#### Course Content:

- Experimental Design
- Sample size estimation: power analysis
- Descriptive statistics and data exploration
- Analysis of quantitative data
- Linear modelling
- Analysis of qualitative data

#### Course Material (coming soon):

### Statistics bootcamp using GraphPad Prism (2.5 days)

A more in-depth look at statistical analyses using GraphPad Prism.

#### Course Content:

- Experimental Design
- Sample size estimation: power analysis
- Descriptive statistics and data exploration
- Analysis of quantitative data
- Linear modelling
- Analysis of qualitative data

#### Course Material (coming soon):

### Learning to Program with Perl (6 x 1.5 hour sessions)

For a long time, Perl has been a popular language among those starting out with programming. Although it is a powerful language, many of its features make it especially suited to first time programmers as it reduces the complexity found in many other languages. Perl is also one of the world's most popular languages which means there are a huge number of resources available to anyone setting out to learn it.

This course aims to introduce the basic features of the Perl language. At the end you should have everything you need to write moderately complicated programs, and enough pointers to other resources to get you started on bigger projects. The course tries to provide a grounding in the basic theory you'll need to write programs in any language, as well as an appreciation for the right way to do things in Perl.

#### Course Content:

- Getting Started with Perl
- Conditions, Arrays, Hashes and Loops
- File Handling
- Regular Expressions
- Subroutines, References and Complex Data Structures
- Perl Modules
- Interacting with External Programs
- Cross Platform Issues and Compiling

#### Course Material:

- Course Manual (pdf)
- Course Manual (doc)
- Course Exercises (pdf)
- Course Exercises (doc)
- Code used in the course (zip)

### Introduction to R (Half a day)

R is a popular language and environment that allows powerful and fast manipulation of data, offering many statistical and graphical options. This course aims to introduce R as a tool for statistics and graphics, with the main aim being to become comfortable with the R environment. It will focus on entering and manipulating data in R and producing simple graphs. A few functions for basic statistics will be briefly introduced, but statistical functions will not be covered in detail.

#### Course Content:

- What is R
- Getting familiar with the R console
- Entering Data
- Manipulating data
- Importing data files
- Creating Graphs (boxplots, barplots, scatterplots, line graphs)

#### Course Material:

- Course Manual (pdf)
- Course Manual (docx)
- Course Exercises (pdf)
- Course Exercises (doc)
- Intro to R Slides (pdf)
- Intro to R Slides (pptx)
- R command cheatsheet (pdf)
- Answers to exercise questions (html)
- Course data (zip)

### Advanced R (Half a day)

This course follows on from the introductory course. It goes into more detail on practical guides to filtering and combining complex data sets. It also looks at other core R concepts such as looping with apply statements and using packages. Finally, it looks at how to document your R analyses and generate complete analysis reports.

#### Course Content:

- Filtering and selection review
- Text manipulation
- Merging large datasets
- Looping
- Using and writing functions
- R packages
- Documenting your analysis

#### Course Material:

- Course Manual (pdf)
- Course Manual (docx)
- Course Exercises (pdf)
- Course Exercises (docx)
- Course data (zip)

### Plotting complex figures with R (Half a day)

This course is a comprehensive guide to the use of the built-in R plotting functionality to construct everything from customised simple plots to complex multi-layered figures. It follows on from the material in our introductory R course and participants are expected to have a basic understanding of R - enough to load and do basic manipulation of datasets.

#### Course Content:

- The R painters model
- Core graph types and options
- Plot area customisation
- Using colour in plots
- Adding plot overlays
- Useful extension packages
- Writing plots to files

#### Course Material:

- Course Manual (pdf)
- Course Manual (docx)
- Course Exercises (pdf)
- Course Exercises (docx)
- Presentation Slides (pdf)
- Presentation Slides (pptx)
- Course data (zip)
- Exercise Answers (html)

### Tidyverse packages in R, including ggplot (Two days)

The 'Tidyverse' is a set of add-in R packages for data loading, modelling, manipulation and plotting. It is an attempt to make data analysis and plotting cleaner, simpler and more consistent by addressing some poor design decisions in the original language. The Tidyverse is perhaps best known because of the ggplot graphing library, but this is part of a wider environment which can be used to generate code which is more consistent, efficient and readable. It also provides a set of standards for how data should be represented in a flexible and consistent manner. Anyone who is comfortable with the core concepts in R (having attended the Introduction to R course for example) will find it useful to have an understanding of the 'tidy' approach to data manipulation in R. This course covers the core Tidyverse packages and concepts, looking at how to load, restructure, filter and summarise datasets as well as the use of the ggplot plotting library for data visualisation.

#### Course Content:

- How Tidyverse works
  - Collection of R packages
  - Aims to fix many of core R's structural problems
  - Common design and data philosophy
  - Designed to work together, but integrate seamlessly with other parts of R
- Tibble
- ReadR
- TidyR
- DplyR
- Ggplot2

#### Course Material:

- Course Slides (pdf)
- Course Slides (pptx)
- Course Exercises (pdf)
- Course Exercises (doc)
- Final Exercise (pdf)
- Final Exercise (doc)
- ggplot Slides (pptx)
- ggplot Slides (pdf)
- ggplot2 Exercise (docx)
- ggplot2 Exercises (pdf)
- Course data (zip)
- Tidyverse course final exercise (html)
- Tidyverse exercise answers (html)

### An Introduction to Unix (Half a day)

An increasing amount of bioinformatics work is done in a command-line Unix environment. Most large-scale processing applications are written for Unix and most large-scale compute environments are also based on it.

This course provides an introduction to the concepts of Unix and provides a practical introduction to working in this environment. Internally we link this course to a more specific course illustrating the use of our internal cluster environment and this part of the course could be adapted for other sites with different compute infrastructure.

#### Course Content:

- Unix commands
- Files and Directories
- Viewing, Creating, Copying, Moving and Deleting
- Permissions
- Pipes

#### Course Material:

- Course Manual (pdf)
- Course Manual (doc)
- Unix cheat sheet (pdf) (External content from Tufts University)

### Analysing bisulfite methylation sequencing data (One day)

This course builds on the core skills introduced in the Introduction to R, Introduction to Unix and Introduction to SeqMonk courses to provide a more in-depth look at the analysis of bisulfite sequencing data. The course is a mix of theoretical lectures and hands-on practicals which go through the whole analysis pipeline, starting from raw sequence data and covering QC, visualisation, quantitation and differential methylation analysis.

#### Course Content:

- The theoretical basis for BS-Seq
- Processing raw sequencing data with Bismark
- Visualisation and exploration of methylation calls with SeqMonk
- The theory of differential methylation calling
- Differential methylation analysis practical

#### Course Material:

- BS-Seq data processing lecture (pptx)
- BS-Seq data processing lecture (pdf)
- BS-Seq data processing exercises (docx)
- BS-Seq data processing exercises (pdf)
- Visualisation and exploration lecture (pptx)
- Visualisation and exploration lecture (pdf)
- SeqMonk tools for methylation analysis (pptx)
- SeqMonk tools for methylation analysis (pdf)
- Visualisation and exploration practical (docx)
- Visualisation and exploration practical (pdf)
- Differential methylation lecture (pptx)
- Differential methylation lecture (pdf)
- Differential methylation practical (docx)
- Differential methylation practical (pdf)
- Data for all practicals (tar) [WARNING 9GB]
- Course VirtualBox Machine Image (ova) [WARNING 9GB]

### Extracting biological information from gene lists (One day)

Many experimental designs end up producing lists of hits, usually based around genes or transcripts. Sometimes these lists are small enough that they can be examined individually, but often it is useful to do a more structured functional analysis to try to automatically determine any interesting biological themes which turn up in the lists.

This course looks at the various software packages, databases and statistical methods which may be of use in performing such an analysis. As well as being a practical guide to performing these types of analysis the course will also look at the types of artefacts and bias which can lead to false conclusions about functionality and will look at the appropriate ways to both run the analysis and present the results for publication.

#### Course Content:

- Functional databases
- Statistical test for testing functional enrichment
- Common artefacts in functional analysis
- Presenting functional analysis in publications
- Motif detection tools

#### Course Material:

- Introduction to Gene Set analysis lecture (pptx)
- Introduction to Gene Set analysis lecture (pdf)
- Gene Set analysis practical (docx)
- Gene Set analysis practical (pdf)
- Artefacts and Biases Lecture (pptx)
- Artefacts and Biases lecture (pdf)
- Exploring and Presenting Results Lecture (pptx)
- Exploring and Presenting Results Lecture (pdf)
- Quantitative Gene List practical (docx)
- Quantitative Gene List practical (pdf)
- Motif Searching lecture (pptx)
- Motif Searching lecture (pdf)
- Motif Searching practical (docx)
- Motif Searching practical (pdf)
- Data for all practicals (zip) [350 MB]

### ChIP-Seq Analysis (One day)

This course provides a complete introduction to the theory and practice of the analysis of ChIP-Seq data. It is designed for biologists who may have limited practical bioinformatics skills, but who would like to use ChIP-Seq as part of their work. By the end of the course students should be able to process and analyse their own data.

Students on this course would benefit from having attended the SeqMonk or Unix introduction courses, but these are not required in order to attend.

#### Course Content:

- The theory of ChIP-Seq analysis
- Processing ChIP-Seq data
- Exploring and Visualising ChIP-Seq data
- Analysing for peak calling and differential enrichment

#### Course Material:

- ChIP-Seq Introduction and Theory (pptx)
- ChIP-Seq Introduction and Theory (pdf)
- ChIP-Seq Data Processing (pptx)
- ChIP-Seq Data Processing (pdf)
- Processing ChIP data exercise (docx)
- Processing ChIP data exercise (pdf)
- ChIP-Seq Data Exploration (pptx)
- ChIP-Seq Data Exploration (pdf)
- Exploring ChIP data exercise (docx)
- Exploring ChIP data exercise (pdf)
- ChIP-Seq Data Analysis (pptx)
- ChIP-Seq Data Analysis (pdf)
- Analysing ChIP data exercise (docx)
- Analysing ChIP data exercise (pdf)
- Virtual box image (ova - password='training') (8.2GB)

### RNA-Seq Analysis (One day)

This course provides an introduction to the QC, processing and analysis of RNA-Seq data. It focuses on a workflow where RNA-Seq is performed on a large eukaryotic genome for which there is a reference genome available. The course starts with a comprehensive lecture covering the theory of RNA-Seq data generation and analysis and is then followed by hands-on practical sessions which run through the entire RNA-Seq analysis pipeline from raw fastq files to a list of differentially expressed candidate genes.

#### Course Content:

- The theory of RNA-Seq analysis
- Raw data QC
- Mapping RNA-Seq data with hisat2
- Viewing RNA-Seq data with SeqMonk
- Differential expression analysis with DESeq
- Reviewing and visualising differential expression hits
- Analysing more complex multi-condition studies

#### Course Material:

- Course Presentation (pptx)
- Course Presentation (pdf)
- Practical instructions (docx)
- Practical instructions (pdf)
- Multi-Condition Exercise (docx)
- Multi-Condition Exercise (pdf)
- Yeast data for mapping (tar.gz) (470MB)
- Mapped mouse data for seqmonk (zip) (2.4GB)
- Virtual box image (ova password='training') (5.3GB)

### Quality Control in Sequencing Experiments (Half a day)

This course looks at the different ways in which sequencing based studies can fail and the options for visualisation and QC which allow you to identify and diagnose these failures at an early stage. It is designed to be of use to anyone who is using sequencing as part of their research, not just those who are running sequencing facilities.

#### Course Content:

- Why QC is important
- How sequencing experiments fail
- Implementing sequencing QC
- Existing QC software

#### Course Material:

- Course Introduction (pptx)
- Course Introduction (pdf)
- How sequencing experiments fail (pptx)
- How sequencing experiments fail (pdf)
- Failures in biological interpretation (pptx)
- Failures in biological interpretation (pdf)
- Developing and Implementing QC (pptx)
- Developing and Implementing QC (pdf)
- Course Data (zip) [9.3MB]

### An Introduction to Mathematical Modelling (Half a day)

This course was developed in collaboration with the Le Novère lab at The Babraham Institute. The course is not currently running and is not supported, but we are leaving course materials here for reference.

It provides an introduction to the concepts of modelling biological systems. It is intended for biologists who have no experience in modelling but would like to know how it might apply to their area of research. The course provides a complete background to the history of modelling and the different approaches through which a biological system can be approximated by mathematical methods. The course also provides a practical introduction to the COPASI modelling environment.

#### Course Content:

- An introduction to modelling
- An overview of chemical kinetics
- Mathematical modelling with COPASI

#### Course Material:

- Introduction to Modelling (pdf)
- Introduction to Chemical Kinetics (pdf)
- COPASI Modelling Tutorial (pdf)
- Course Data (zip) [9.3MB]

### Scientific Figure Design (Whole day)

This course provides a practical guide to producing figures for use in reports and publications. It is a wide ranging course which looks at how to design figures to clearly and fairly represent your data, the practical aspects of graph creation, the allowable manipulation of bitmap images and compositing and editing of final figures.

The course will use a number of different open source software packages and is illustrated with a number of example figures adapted from common analysis tools.

#### Course Content:

- Data Visualisation Theory Lecture
- Data Representation Practical
- Ethics of Data Representation Lecture
- Design Theory Lecture
- GIMP Tutorial
- GIMP Practical
- Inkscape Tutorial
- Inkscape Practical
- Final Practical

#### Course Material:

- Figure Design Slides (pptx)
- Figure Design Slides (pdf)
- Data Representation Practical (docx)
- Data Representation Practical (pdf)
- GIMP Tutorial (pptx)
- GIMP Tutorial (pdf)
- GIMP Practical (docx)
- GIMP Practical (pdf)
- Inkscape Tutorial (pptx)
- Inkscape Tutorial (pdf)
- Inkscape Practical (docx)
- Inkscape Practical (pdf)
- Exporting Files (docx)
- Exporting Files (pdf)
- Submitting to Journals (pdf)
- Submitting to Journals (pptx)
- Final Practical (docx)
- Final Practical (pdf)
- Figure Design Course Data (zip 15MB)

### Introduction to R for Biologists Bootcamp (3.5 days)

This Bootcamp for Biologists requires no previous experience. Over 3 1/2 days you will gain the practical experience to do your own analysis in R.

#### Course Content:

- Introduction to the core language
- Introduction to plotting and drawing graphs
- Introduction to basic statistical concepts and how to execute them in R
- Final Practical

#### Course Material:

- Introduction to R
- Advanced R
- Plotting complex figures with R
- An introduction to ggplot
- Statistical Analysis using R

### Introduction to NGS Analysis for Biologists Bootcamp (3.5 days)

This Bootcamp for Biologists requires no previous experience. Over 3 1/2 days you will gain an introduction to sequencing analysis from the ground up. Understand, explore and analyse your data and interpret the results.

#### Course Content:

- Basic Sequencing QC
- RNA Seq Analysis
- ChIP Seq Analysis
- Extracting Biological Information from Gene Lists

#### Course Material:

- Quality control in Sequencing Experiments
- RNA-Seq Analysis
- ChIP-Seq Analysis
- Extracting biological information from gene lists

### Introduction to Linux Bootcamp (2.5 days)

This Bootcamp for Biologists requires no previous experience and will provide an understanding of the Linux environment. This 2 1/2 day course shows how to set up a working Linux environment; how you can install, configure and manage software and packages within it; how to run software and create basic, simple automation to enable execution in a more structured and scalable way.

#### Course Content:

- Install a Linux operating system on your machine, either directly or through a virtual machine
- Run and customise installed applications using the BASH shell
- Perform simple automation, linking programs together and iterating the processing of large numbers of files
- Install and configure new software packages
- Understand how to use Linux in a variety of environments from personal computers to cloud infrastructure