Assignment overview

In this assignment you will use commonly used tools and approaches for [transcript|prote]omics data analysis.

Part I: Proteomics

This study used shotgun proteomics to study the proteomes of mouse embryonic stem cell sub-cellular compartments.

Whole embryonic stem cell lysates were fractionated into nuclear and cytoplasmic components using low-speed centrifugation. We then used one-dimensional gel-based fractionation to separate the nuclear and cytoplasmic samples into 5 fractions each (Song et al., 2012). The data provided here correspond to one fraction each from the nuclear (Nuc) and cytoplasmic (Cyto) samples.

Assignment Tasks

  1. Data files and formats
  2. MS search engines
  3. Biological Interpretation

Assignment Assessment

This assignment focuses on using tools for proteomic and transcriptomic analyses. The assignment will be assessed according to:


Data Files

Search engine settings

Pre-computed search results

Database: SwissProt

Taxonomy: Mus musculus

ESC_Nuc_fraction.mgf searched on X! Tandem

ESC_Cyto_fraction.mgf searched on X! Tandem
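
The .mgf files listed above are in Mascot Generic Format, a plain-text format in which each spectrum sits between BEGIN IONS and END IONS lines. As a way of getting familiar with the format, the following is a minimal Python sketch of a reader. The function names (read_mgf, precursor_mz) and the example spectra are made up for illustration, and the sketch assumes simple files with KEY=value parameter lines followed by "m/z intensity" peak lines.

```python
from typing import Dict, Iterable, Iterator


def read_mgf(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per BEGIN IONS ... END IONS block."""
    spectrum = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                spectrum["params"][key.upper()] = value
            else:
                # Peak line: m/z, intensity (any further columns are ignored)
                parts = line.split()
                if len(parts) >= 2:
                    spectrum["peaks"].append((float(parts[0]), float(parts[1])))


def precursor_mz(spectrum: Dict) -> float:
    """PEPMASS holds the precursor m/z, optionally followed by an intensity."""
    return float(spectrum["params"]["PEPMASS"].split()[0])


# Hypothetical two-spectrum example (not taken from the assignment data).
EXAMPLE = """BEGIN IONS
TITLE=hypothetical scan 1
PEPMASS=445.12 1200.0
CHARGE=2+
100.1 50.0
200.2 75.5
END IONS
BEGIN IONS
TITLE=hypothetical scan 2
PEPMASS=512.30
CHARGE=3+
150.0 10.0
END IONS
"""

spectra = list(read_mgf(EXAMPLE.splitlines()))
for s in spectra:
    print(s["params"]["TITLE"], precursor_mz(s), len(s["peaks"]))
```

To try this on a real file, pass an open file handle instead of the example text, for instance read_mgf(open("ESC_Nuc_fraction.mgf")).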


Song J, Saha S, Gokulrangan G, Tesar PJ, Ewing RM. DNA and chromatin modification networks distinguish stem cell pluripotent ground states. Molecular & Cellular Proteomics. 2012; 11(10):1036-47.


Part II: Transcriptomics

B. The Gene Expression Omnibus (GEO) is a large repository of gene-expression studies. For this task you will perform statistical analysis on a given study in GEO and identify differentially expressed genes.

  * Download some data from GEO
  * Analyze using GEOquery
  * Identify differentially expressed genes
  * Take those genes and run them through DAVID or similar
  * Analyze using Gene Ontology
  * Make conclusions
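
To make the Gene Ontology step less of a black box, the sketch below shows the basic idea behind term enrichment: given a list of differentially expressed genes, how surprising is it that so many of them carry a particular annotation? It uses a plain hypergeometric test in Python, which is an assumption made for illustration and is not necessarily the exact score that DAVID or any other tool reports. The function name and the gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int, n_term_background: int,
                           n_selected: int, n_term_selected: int) -> float:
    """Probability of seeing at least n_term_selected annotated genes when
    n_selected genes are drawn without replacement from a background of
    n_background genes, of which n_term_background carry the annotation."""
    total = comb(n_background, n_selected)
    upper = min(n_term_background, n_selected)
    tail = sum(
        comb(n_term_background, k)
        * comb(n_background - n_term_background, n_selected - k)
        for k in range(n_term_selected, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 genes measured, 5 annotated with the term,
# 4 genes called differentially expressed, 3 of which carry the term.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
print(f"enrichment p-value: {p_value:.4f}")
```

In a real analysis one such test is run per GO term, so the resulting p-values need a multiple-testing correction before any conclusions are drawn.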

Part III: Databases and Resources

A. The BioGRID database is a repository of protein interaction information. In this task you will query the BioGRID database, extract interacting partners for your query protein and construct a network.

  * Starting with one of the query proteins, search BioGRID and extract interactors. For selected interactors, search again and add these interactors to your list.
  * Filter the protein interactions according to the technique that was used to analyze them. Are there technique-specific interactors?
  * Filter the PPIs according to confidence.
  * Take your list of interacting proteins and visualize the network using Cytoscape (use scripting or R to construct the SIF file).
  * Compare this network to the information in STRING. Are there more or fewer interactors...
  * Compare this network to the TMM network of co-expression. Are any of the protein interacting partners also co-expressed?

Use Cytoscape

  * Load the yeast galactose utilization network
  * Derive network properties (nodes, hubs, clusters)
  * Select 2 hubs and find out whether the two proteins are functionally related
  * Explore different modes of visualization
  * Use MCODE to cluster. How many clusters are there?
  * Find a hub node; identify it and write about why it is important
  * Clustering coefficients

EXTRA NOTES:

PART 1 PROCESSING AND ANALYSING OMICS DATA

A. Mass spectrometry proteomics data

  * A large survey of Affinity-Purification Mass-Spectrometry (AP-MS) experiments was performed in human cells (Ewing et al, 2007).
  * Interacting proteins were identified for ~400 different bait proteins.
  * In this task, you will take some of the raw data (mass spectra) from this study and identify the corresponding proteins.
  * Selected mass-spectrometry files are provided. Each of these files corresponds to a single affinity-purification mass-spectrometry experiment, and the bait protein used is indicated by the first part of the file name, which is the Gene Symbol.
  * In addition, control experiments were performed so that non-specific interacting proteins could be identified.
    These are the files with Gene Symbol "na".
  * Select one of the baits (EIF4A2, TCF1, CTNNBIP1, PSMD13) and search the mass-spectrometry files for that bait using the X! Tandem search engine. In addition, search the control files.
  * Appropriate initial settings for the search engine are ...

Take some proteomics data from a spreadsheet (epi / esc?), compute statistics, identify differential expression, and do some analysis.
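
As a starting point for the spreadsheet task, the sketch below computes a log2 fold change between two conditions and picks out proteins that change by at least two-fold. It is only an illustration in Python: the column names (protein, epi, esc), the values, the pseudocount and the threshold are all assumptions, and a fold change on its own is not a statistical test. With replicate measurements you would add a proper test and a multiple-testing correction.

```python
import csv
import io
import math


def log2_fold_changes(rows, col_a, col_b, pseudocount=1.0):
    """Return {protein: log2((a + pseudocount) / (b + pseudocount))}.

    The pseudocount avoids division by zero when a protein is absent
    from one condition.
    """
    result = {}
    for row in rows:
        a = float(row[col_a]) + pseudocount
        b = float(row[col_b]) + pseudocount
        result[row["protein"]] = math.log2(a / b)
    return result


def select_differential(fold_changes, threshold=1.0):
    """Proteins whose absolute log2 fold change is at least the threshold."""
    return sorted(name for name, lfc in fold_changes.items()
                  if abs(lfc) >= threshold)


# Hypothetical spreadsheet exported as CSV (values are invented).
TABLE = """protein,epi,esc
P1,7,1
P2,3,3
P3,0,15
"""

rows = list(csv.DictReader(io.StringIO(TABLE)))
lfc = log2_fold_changes(rows, "epi", "esc")
hits = select_differential(lfc)
print(lfc)
print(hits)
```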
Use Scaffold with an AP-MS file or protein expression file:

  * Using spectral counting, identify those proteins that are differential and those that are contaminants
  * Look at a spectrum and find something out
  * Take the proteins and do some analysis
  * Look at peptides: what is the % coverage? Count the number of peptides for a given protein. Which peptide was identified the most, and is it tryptic?
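
The peptide-level questions above can also be answered by hand with a few lines of code, which is a useful check on what the software reports. The following Python sketch computes percent sequence coverage, finds the most frequently identified peptide, and tests whether a peptide is fully tryptic. The protein sequence and peptide list are invented for illustration, and the tryptic check is a simplification: it assumes cleavage after K or R and ignores the usual exception for a following proline.

```python
from collections import Counter


def sequence_coverage(protein: str, peptides) -> float:
    """Percentage of protein residues covered by at least one peptide."""
    covered = [False] * len(protein)
    for pep in set(peptides):
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


def is_fully_tryptic(protein: str, peptide: str) -> bool:
    """True if both ends of the peptide (first occurrence) are consistent
    with cleavage after K or R, or coincide with a protein terminus."""
    start = protein.find(peptide)
    if start == -1 or not peptide:
        return False
    end = start + len(peptide)
    n_term_ok = start == 0 or protein[start - 1] in "KR"
    c_term_ok = end == len(protein) or peptide[-1] in "KR"
    return n_term_ok and c_term_ok


# Hypothetical protein and peptide identifications (one per spectrum).
protein = "MKTAYIAKQRQISFVK"
identified = ["TAYIAK", "QISFVK", "TAYIAK"]

coverage = sequence_coverage(protein, identified)
top_peptide, top_count = Counter(identified).most_common(1)[0]
print(f"coverage: {coverage:.1f}%")
print(f"most identified: {top_peptide} ({top_count} spectra), "
      f"tryptic: {is_fully_tryptic(protein, top_peptide)}")
```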