Mass-spectrometry Raw Data and File Formats

Raw MS data comes in many different file formats, depending on the manufacturer of the instrument. Several open, universal data formats for MS data have been developed, such as mzXML and the newer mzML. Mass spectrometers typically do not write these formats directly, but raw data can be converted to them using vendor-specific software.

About mzXML Files

An mzXML file is an XML-based (eXtensible Markup Language) file that stores the m/z-intensity pairs of each scan as Base64-encoded text, which allows the large volumes of binary data generated by the instrument to be embedded in the XML format.
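To make the Base64 representation concrete, the following is a minimal sketch of how a peak string can be encoded and decoded. It assumes the common mzXML layout in which m/z and intensity values are interleaved as big-endian (network byte order) floats of 32 or 64 bits, and it ignores the optional compression that some files declare. The function names encode_peaks and decode_peaks are illustrative only and are not part of any mzXML library.

```python
import base64
import struct


def encode_peaks(pairs, precision=32):
    """Pack (m/z, intensity) pairs as big-endian floats and Base64-encode them."""
    code = "f" if precision == 32 else "d"
    flat = [value for pair in pairs for value in pair]
    packed = struct.pack(f">{len(flat)}{code}", *flat)  # ">" means big-endian
    return base64.b64encode(packed).decode("ascii")


def decode_peaks(encoded, precision=32):
    """Decode a Base64 peak string back into a list of (m/z, intensity) pairs."""
    code = "f" if precision == 32 else "d"
    raw = base64.b64decode(encoded)
    count = len(raw) // struct.calcsize(">" + code)
    values = struct.unpack(f">{count}{code}", raw)
    # Even positions hold m/z values, odd positions hold intensities.
    return list(zip(values[0::2], values[1::2]))


# Hypothetical peaks, chosen so that they survive 32-bit rounding exactly.
example_pairs = [(100.5, 2000.0), (250.25, 512.0)]
encoded_text = encode_peaks(example_pairs)
print(encoded_text)
print(decode_peaks(encoded_text))
```

In a real file, the precision is given by an attribute on the peaks element of each scan, so a reader would take it from there rather than assume 32 bits.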

To visualize an mzXML file

    1. To download an example mzXML file, right-click here and click Save Link As.
    2. Optional: review the mzXML file and its associated schema.
    3. Open InsilicosViewer. If you have not installed InsilicosViewer, see Software Requirements in the Getting Started section.
    4. Click File->Open, navigate to the saved mzXML file, and click Open.

    The two panes in the viewer show the chromatogram (upper) and scan (lower) profiles, respectively.
    1. Place the cursor in the chromatogram pane. The corresponding scan appears in the lower pane.
    2. Toggle between the base peak and total ion chromatogram views (Tools->drawing style).
    3. Find a parent ion scan and review its associated MS/MS scans.
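The two chromatogram views differ only in how each scan is reduced to a single value: the total ion chromatogram plots the sum of all intensities in a scan, while the base peak chromatogram plots the intensity of the most intense peak. A minimal sketch, using made-up intensities and illustrative function names:

```python
def total_ion_current(intensities):
    """Sum of all peak intensities in one scan (one point of the total ion chromatogram)."""
    return sum(intensities)


def base_peak_intensity(intensities):
    """Intensity of the most intense peak in one scan (one point of the base peak chromatogram)."""
    return max(intensities, default=0.0)


# Hypothetical scans: scan number -> peak intensities.
example_scans = {
    1: [10.0, 250.0, 40.0],
    2: [5.0, 5.0, 5.0],
}

for scan_number, intensities in example_scans.items():
    print(scan_number, total_ion_current(intensities), base_peak_intensity(intensities))
```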

Preparing MS Data for Analysis

To simplify the identification process, mzXML files are often converted to simpler ASCII representations of the data, known as peak lists. There are several different peak list file formats (explained here). We will use the Mascot Generic Format (MGF) to identify peptides and proteins. The corresponding MGF file for the mzXML data file visualized above is here.
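As an illustration of what a peak list looks like, the sketch below formats one MS/MS spectrum as an MGF entry. It uses the commonly seen BEGIN IONS / END IONS block with TITLE, PEPMASS, and CHARGE lines followed by one m/z-intensity pair per line. The function name and all values are hypothetical, and real converters may write additional fields.

```python
def format_mgf_entry(title, precursor_mz, charge, peaks):
    """Return one spectrum as MGF text: header lines, then one 'm/z intensity' pair per line."""
    lines = [
        "BEGIN IONS",
        f"TITLE={title}",
        f"PEPMASS={precursor_mz}",
        f"CHARGE={charge}+",
    ]
    lines.extend(f"{mz} {intensity}" for mz, intensity in peaks)
    lines.append("END IONS")
    return "\n".join(lines)


# Hypothetical MS/MS spectrum with three fragment peaks.
example_entry = format_mgf_entry(
    title="example scan 2",
    precursor_mz=445.12,
    charge=2,
    peaks=[(175.119, 1200.0), (262.151, 830.5), (375.235, 410.0)],
)
print(example_entry)
```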

In the next section, we take peak list files and search sequence databases for matching peptides.

Further Analyses

To get familiar with the file formats, compare the number of MS/MS spectra in an MGF file with the number in its corresponding mzXML file.
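One possible way to make this comparison is sketched below. It assumes that each MGF spectrum starts with a BEGIN IONS line and that MS/MS scans in mzXML are scan elements whose msLevel attribute is "2". The function names, the namespace string, and the tiny in-memory samples are placeholders for illustration; for real files, pass an open file or a file path instead.

```python
import io
import xml.etree.ElementTree as ET


def count_mgf_spectra(lines):
    """Count spectra in MGF text by counting 'BEGIN IONS' lines."""
    return sum(1 for line in lines if line.strip() == "BEGIN IONS")


def count_mzxml_msms_scans(source):
    """Count <scan> elements with msLevel="2" in an mzXML file path or binary file object."""
    count = 0
    for _event, element in ET.iterparse(source, events=("end",)):
        # Strip any XML namespace prefix of the form "{namespace}tag".
        tag = element.tag.rsplit("}", 1)[-1]
        if tag == "scan" and element.get("msLevel") == "2":
            count += 1
    return count


# Placeholder data: one MS scan containing two nested MS/MS scans.
sample_mzxml = b"""<mzXML xmlns="urn:example:placeholder-namespace">
  <msRun>
    <scan num="1" msLevel="1">
      <scan num="2" msLevel="2"/>
      <scan num="3" msLevel="2"/>
    </scan>
  </msRun>
</mzXML>"""

sample_mgf = """BEGIN IONS
TITLE=scan 2
PEPMASS=445.12
175.119 1200.0
END IONS
BEGIN IONS
TITLE=scan 3
PEPMASS=512.30
262.151 830.5
END IONS
"""

print("mzXML MS/MS scans:", count_mzxml_msms_scans(io.BytesIO(sample_mzxml)))
print("MGF spectra:", count_mgf_spectra(sample_mgf.splitlines()))
```

If the two counts differ for a real pair of files, one possible explanation is that the conversion to a peak list filtered out some spectra.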