Mass-spectrometry peak list files

Data in this tutorial are mainly provided as peak list files. Two peak list formats are commonly used: DTA and MGF. A comprehensive description of these file formats is available here. Both DTA and MGF are simple text-based formats containing the mass spectra that the search engine uses to identify peptides and proteins.

Click here for an example DTA file. Each spectrum begins with the parent mass and charge, followed by a series of number pairs (fragment ion masses and their intensities).
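The layout described above can be read with a few lines of Python. This is only a minimal sketch: the function name parse_dta and the dictionary keys are made up for illustration, and the code assumes that fields are whitespace-separated and that, when several spectra share one file, they are separated by blank lines.

```python
def parse_dta(text):
    """Parse DTA-style text into a list of spectra (illustrative sketch).

    Assumption: a blank line separates consecutive spectra.
    """
    spectra = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            current = None  # a blank line ends the current spectrum
            continue
        if current is None:
            # First line of a spectrum: parent mass and charge
            current = {
                "parent_mass": float(fields[0]),
                "charge": int(fields[1]),
                "peaks": [],
            }
            spectra.append(current)
        else:
            # Remaining lines: fragment ion mass and intensity
            current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra
```

Calling parse_dta on the contents of a DTA file returns one dictionary per spectrum, each with its list of (mass, intensity) pairs.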

Click here for an example MGF file. Each spectrum is recorded between a BEGIN IONS line and an END IONS line.
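As a rough illustration of the BEGIN IONS / END IONS structure, the sketch below collects each block into a dictionary. The function name parse_mgf is hypothetical, and the code assumes that header lines inside a block take the form KEY=VALUE (for example PEPMASS=...) while all other non-empty lines are peak lines starting with a mass and an intensity.

```python
def parse_mgf(text):
    """Parse MGF-style text into a list of spectra (illustrative sketch)."""
    spectra = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None and line:
            if "=" in line:
                # Header line such as TITLE=... or PEPMASS=...
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: mass and intensity (extra columns are ignored)
                fields = line.split()
                current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra
```

Lines outside any BEGIN IONS / END IONS block are skipped, so file-level headers do not disturb the spectrum count.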

Mass-spectrometry Raw Data and File Formats

There are many different file formats for raw MS data, depending upon the manufacturer of the instrument. Several open, universal data formats for MS data have been developed, such as mzXML and the newer mzML. Typically, mass spectrometers do not generate data in these formats directly, but raw data can be converted to open formats using vendor-specific software.

About mzXML Files

An mzXML file is an XML-based (eXtensible Markup Language) file that uses a Base64 representation of the m/z-intensity pairs to incorporate the large volumes of generated data into the XML format.
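To make the Base64 encoding concrete, the sketch below decodes one encoded peak string back into m/z-intensity pairs. It assumes that the values are stored as interleaved m/z and intensity floats in network (big-endian) byte order, at 32-bit or 64-bit precision, and that the data are not compressed; check the schema of your file before relying on this. The function name decode_peaks is hypothetical.

```python
import base64
import struct


def decode_peaks(encoded, precision=32):
    """Decode a Base64 peak string into (m/z, intensity) pairs (sketch).

    Assumptions: big-endian floats, interleaved m/z and intensity,
    no compression.
    """
    raw = base64.b64decode(encoded)
    type_code = "f" if precision == 32 else "d"
    count = len(raw) // (precision // 8)
    values = struct.unpack(">" + str(count) + type_code, raw)
    # Even positions hold m/z values, odd positions hold intensities
    return list(zip(values[0::2], values[1::2]))
```

The encoded string would typically be the text content of a peaks element, with the precision taken from that element's attributes.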

To visualize an mzXML File

    1. For an example mzXML file, right-click here and click Save Link As.
    2. Optional: review the mzXML file and its associated schema.
    3. Open InsilicosViewer. If you have not installed InsilicosViewer, see Software Requirements in the Getting Started section.
    4. Click File->Open, navigate to the saved mzXML file, and click Open.

    The two panes in the viewer show the chromatogram (upper) and scan (lower) profiles, respectively.
    5. Place the cursor in the chromatogram pane. The corresponding scan appears in the lower pane.
    6. Toggle between the base peak and total ion chromatogram views (Tools->drawing style).
    7. Find a parent ion scan and review its associated MS/MS scans.

Preparing MS data for Analysis

To simplify the identification process, mzXML files are often converted to simpler ASCII representations of the data, known as peak lists. There are several different peak list file formats (explained here). We will use the Mascot Generic Format (MGF) to identify peptides and proteins. The corresponding MGF file for the mzXML data file visualized above is here.
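The sketch below shows roughly what writing one MS/MS spectrum as an MGF entry involves. It is not the conversion tool used to produce the file above. The function name format_mgf_spectrum and its parameters are hypothetical; the TITLE, PEPMASS and CHARGE header lines are the commonly used MGF keys, and the choice to write exactly these three is an assumption.

```python
def format_mgf_spectrum(title, precursor_mz, charge, peaks):
    """Format one spectrum as an MGF text block (illustrative sketch)."""
    lines = [
        "BEGIN IONS",
        f"TITLE={title}",
        f"PEPMASS={precursor_mz}",
        f"CHARGE={charge}+",
    ]
    # One peak per line: mass and intensity separated by a space
    lines += [f"{mz} {intensity}" for mz, intensity in peaks]
    lines.append("END IONS")
    return "\n".join(lines) + "\n"
```

Concatenating the blocks for all MS/MS scans of a run gives a peak list file with one BEGIN IONS / END IONS block per spectrum.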

In the next section, we take peak list files and search sequence databases for matching peptides.

Further Analyses

How many MS/MS spectra are represented in the MGF file? Compare this number to the number of scans represented in the mzXML file.
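One possible way to approach this comparison is sketched below. It counts BEGIN IONS lines in the MGF file and tallies scan elements in the mzXML file by their msLevel attribute. The function names are hypothetical, and the code assumes that each scan is stored as a scan element carrying an msLevel attribute, possibly nested inside its parent scan.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_mgf_spectra(lines):
    """Count spectra in an iterable of MGF lines (for example an open file)."""
    return sum(1 for line in lines if line.strip() == "BEGIN IONS")


def count_mzxml_scans(source):
    """Tally scan elements by msLevel; source is a path or a binary file object."""
    counts = Counter()
    for _, element in ET.iterparse(source):
        # Drop any XML namespace prefix of the form {uri} before comparing
        if element.tag.rsplit("}", 1)[-1] == "scan":
            counts[element.get("msLevel")] += 1
    return counts


# Tiny in-memory demo; real use would pass the saved MGF and mzXML files.
demo_mgf = io.StringIO("BEGIN IONS\nPEPMASS=500.25\n100.0 10.0\nEND IONS\n")
demo_mzxml = io.BytesIO(
    b'<mzXML><msRun><scan num="1" msLevel="1">'
    b'<scan num="2" msLevel="2"/></scan></msRun></mzXML>'
)
print(count_mgf_spectra(demo_mgf))
print(dict(count_mzxml_scans(demo_mzxml)))
```

The tally is keyed by the msLevel string, so the entry for "2" is the one to compare against the MGF count.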