TCB No. 2012-005
March 2012
Technical Bulletin
GS FLX and GS FLX+ Systems
Configuration of Remote Data Processing Using postAnalysisScript.sh
Summary
This Technical Customer Bulletin describes the intended use of the automation script postAnalysisScript.sh, available for use with the GS FLX and GS FLX+ Systems, and provides configuration examples. This includes how to extend usage of the script, in conjunction with a custom pipeline XML file, to enable multiple different types of automatic remote processing. Remote execution of the Amplicons pipeline and of the GS GType Assay Software Add-on is described; however, the model is applicable to any type of remote processing desired.
In general, this process will allow a new processing type to be selected when setting up a sequencing Run (e.g. “Image Processing Only Amplicons”). The new processing type will trigger the automated data transfer to a GS FLX+ Computing Station (or any other DataRig or cluster), followed by remote execution of the configured pipeline.
Table of Contents
Summary
Workflow Overview
Pre-configuration Activities
Software versions
Password-less SSH communication
Login information for the remote computer
/data directory
The postAnalysisScript.sh file
A Text Editor
Example 1: Automating Shotgun Processing
Example 2: Automating Non-Shotgun Processing
Create a Custom Image Processing Only Pipeline
Create a Custom postAnalysisScript.sh File
Example 3: Automating GS GType Assay Software Add-on Processing
Create a Custom postAnalysisScript.sh File
Create a Custom Image Processing Only Pipeline
Workflow Overview
[Figure: data processing workflow across the GS FLX or GS FLX+ Instrument and the GS FLX+ Computing Station]
The data processing workflow for the GS FLX and GS FLX+ Systems can be divided into three main steps:
1) The Data Acquisition step captures images for each nucleotide flow on the GS FLX or GS FLX+ Instrument. After all the images have been captured and the fluidics have completed, the script named backupScript.sh is executed. For usage guidance on backupScript.sh, refer to TCB 2012-006, “Configuration of Data Backup Using backupScript.sh”.
2) The Image Processing step takes the raw images as input and generates an intermediate set of data files that will be the input for the Signal Processing step. The Image Processing step can technically be computed on the GS FLX or GS FLX+ Instrument, the GS FLX+ Computing Station, or any other system running the 454 Sequencing System Software. The automation script postAnalysisScript.sh is only executed once the Image Processing step has completed. Since postAnalysisScript.sh is the trigger for remote processing, the automation procedures documented in this TCB require that Image Processing be run on the GS FLX or GS FLX+ Instrument. If No Processing is selected on the instrument, postAnalysisScript.sh will not be called.
3) The Signal Processing step is computationally intensive, and it is therefore not recommended to perform this step on the GS FLX or GS FLX+ Instrument. The postAnalysisScript.sh mechanism was developed to provide a convenient way to work around this limitation.
As indicated, Image Processing is necessary in order to use the automation mechanism described in this document. There is no automation mechanism readily available for customers who choose not to do on-instrument Image Processing. Contact your Roche Customer Support representative for more information on how to automate both Image Processing and Signal Processing on a remote computer.
[Figure labels: Data Acquisition (Instrument Camera); Data Processing: Image Processing; Data Processing: Signal Processing]
Pre-configuration Activities
The following activities or items are all required in order for the subsequent automation instructions to work properly.
Software versions
The methods described here are applicable to all currently existing versions of GS FLX and GS FLX+ System Software. However, the version of software on the remote computer must be the same as that on the Instrument. Furthermore, if the GS GType Assay Software Add-on is also to be executed remotely, then the version of the Add-on must be compatible with the version of the GS FLX or GS FLX+ Software in use. Depending on which Add-on you need to use, it may be necessary to upgrade your system. Refer to TCB 2012-003, “Installation of GS GType Assay Software Add-on v2.0”, for more details.
Password-less SSH communication
Unattended file transfer from the GS FLX or GS FLX+ Instrument is a requirement for automating remote data processing. Refer to TCB 2012-004, “Configuration of Password-free SSH Access between a GS FLX or GS FLX+ Instrument and a Remote Datarig or Cluster”, for instructions on how to set this up.
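Before relying on unattended transfers, it can help to confirm that the Instrument reaches the remote computer without a password prompt. The following is a minimal sketch and is not part of the procedure in TCB 2012-004: the function name check_passwordless_ssh is hypothetical, and the sketch assumes a standard OpenSSH client, whose BatchMode option makes ssh fail instead of asking for a password.

```shell
#!/bin/sh
# Hypothetical helper (not part of the 454 Sequencing System Software).
# Returns 0 only if ssh can log in without prompting for a password.
check_passwordless_ssh() {
    user="$1"   # remote username, e.g. adminrig
    host="$2"   # IP address of the remote computer
    # BatchMode=yes disables password prompts, so a missing key produces a
    # failure rather than a hang. ConnectTimeout limits the wait (seconds).
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$user@$host" true
}
```

Run from the Instrument as, for example, check_passwordless_ssh adminrig 11.22.33.44, a zero exit status suggests that unattended transfers should work; a non-zero status means the key setup needs to be revisited.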
Login information for the remote computer
You will need the username, password, and IP address of the remote computer. You should already have these items, since they are also required for configuring password-less SSH communication.
/data directory
This document assumes that the remote computer has a /data directory within which the data will reside. This is the target directory for the data transfer from the Instrument. If no such directory exists on the remote computer, create one by executing these two commands:
mkdir /data
chown adminrig /data
Depending on how your remote computer is set up, you may need root access to execute the above commands. If this is the case, you may need to contact whoever set up your computer to obtain such access.
The postAnalysisScript.sh file
postAnalysisScript.sh is available for download from http://www.my454.com and is provided as an empty container to be customized as described later in this document.
1) Log in to the GS FLX or GS FLX+ Instrument as user adminrig.
2) Download the file (postAnalysisScript.sh.tar.gz) into the HOME directory (/home/adminrig). Any username will work as long as that user has permissions to run the pipeline on the remote computer. However, for simplicity the remainder of this document assumes the username for the remote computer is adminrig.
3) Start a terminal session by double-clicking on the “Terminal” icon.
4) Extract the script from the archive by typing:
tar -zxvf postAnalysisScript.sh.tar.gz <Enter>
5) Back up any previously existing postAnalysisScript.sh by executing the following commands:
cd /usr/local/rig/bin <Enter>
mv postAnalysisScript.sh postAnalysisScript.sh.old <Enter>
6) Place the newly downloaded script into the /usr/local/rig/bin directory (which is your current directory by virtue of the previous step) by executing this command:
cp /home/adminrig/postAnalysisScript.sh postAnalysisScript.sh <Enter>
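Steps 4 through 6 can be sketched as a single shell function. This is an illustration only: the function name and its two arguments are hypothetical, and it assumes the archive unpacks a file named postAnalysisScript.sh directly into the download directory, as the steps above imply. Unlike the manual steps, it backs up the old script only if one exists.

```shell
#!/bin/sh
# Sketch of steps 4-6. install_post_analysis_script is a hypothetical name,
# not something shipped with the 454 Sequencing System Software.
install_post_analysis_script() {
    download_dir="$1"   # where postAnalysisScript.sh.tar.gz was saved
    bin_dir="$2"        # /usr/local/rig/bin on the Instrument

    # Step 4: extract the archive inside the download directory.
    (cd "$download_dir" && tar -zxf postAnalysisScript.sh.tar.gz) || return 1

    # Step 5: back up a previously existing script, if there is one.
    if [ -f "$bin_dir/postAnalysisScript.sh" ]; then
        mv "$bin_dir/postAnalysisScript.sh" \
           "$bin_dir/postAnalysisScript.sh.old" || return 1
    fi

    # Step 6: put the newly downloaded script in place.
    cp "$download_dir/postAnalysisScript.sh" "$bin_dir/postAnalysisScript.sh"
}
```

On the Instrument this would be called as install_post_analysis_script /home/adminrig /usr/local/rig/bin.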
A Text Editor
The procedure for configuring automation requires manual editing of text files, so you will need to be comfortable using a text editor. Although any text editor will suffice, a convenient one that is included on the GS FLX and GS FLX+ Instruments is called nedit. To open a file for editing in nedit, type the following command:
nedit <path to file> <Enter>
After editing a file, be sure to save it before closing. With nedit, this is done with a typical File->Save menu option, followed by File->Exit.
This document uses the angle-brackets convention for specifying placeholders in files and commands. For a command to execute properly, you need to replace the placeholder with the information indicated by the placeholder text, which in this case is the path to the file you wish to edit. Throughout this document, file names, commands and parts of commands will be represented by placeholders that will need to be replaced.
Example 1: Automating Shotgun Processing
This is the simplest case and requires the least amount of configuration. If all you need to do is automate remote execution of the Shotgun Signal Processing pipeline, then follow these instructions.
1) Open /usr/local/rig/bin/postAnalysisScript.sh in a text editor. For example:
nedit /usr/local/rig/bin/postAnalysisScript.sh <Enter>
2) Line 26 of /usr/local/rig/bin/postAnalysisScript.sh looks like this:
DESTINATION_IP="w.x.y.z"
Replace the “w.x.y.z” with the IP address of the remote computer, making sure to retain the double-quotes. It should end up looking similar to this, with whatever numbers correspond to your machine:
DESTINATION_IP="11.22.33.44"
3) Line 57 of /usr/local/rig/bin/postAnalysisScript.sh looks like this:
#ssh $DESTINATION_USER@$DESTINATION_IP runAnalysisPipe $DESTINATION_PATH/$D_DIR
Delete the “#” symbol so that the line looks like this:
ssh $DESTINATION_USER@$DESTINATION_IP runAnalysisPipe $DESTINATION_PATH/$D_DIR
4) Save and close the file.
The runAnalysisPipe part of the above command corresponds to the normal command for running the Shotgun Signal Processing pipeline on a dataset that has already been image-processed. The commands that actually copy the data from the Instrument to the remote computer do not need to be customized, and so they were not included in these instructions. (If you are curious, they are found on lines 50 and 53 of /usr/local/rig/bin/postAnalysisScript.sh).
This concludes the procedure for configuring automatic remote execution of the Shotgun Signal Processing pipeline.
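The two edits in Steps 2 and 3 can also be expressed as sed substitutions that match the line contents rather than the line numbers. The sketch below is an assumption-laden illustration, not the documented procedure: configure_script is a hypothetical helper, and the scratch file contains only the two lines quoted in this section, not the real script.

```shell
#!/bin/sh
# Hypothetical helper: print an edited copy of the script to stdout,
# leaving the input file untouched.
configure_script() {
    script="$1"   # path to a copy of postAnalysisScript.sh
    ip="$2"       # IP address of the remote computer
    # First expression: fill in the IP address, keeping the double quotes.
    # Second expression: remove the leading "#" from the remote command.
    sed -e "s/^DESTINATION_IP=\"w\.x\.y\.z\"/DESTINATION_IP=\"$ip\"/" \
        -e 's/^#\(ssh \$DESTINATION_USER@\$DESTINATION_IP runAnalysisPipe \)/\1/' \
        "$script"
}

# Demonstration on a scratch file holding only the two quoted lines.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
DESTINATION_IP="w.x.y.z"
#ssh $DESTINATION_USER@$DESTINATION_IP runAnalysisPipe $DESTINATION_PATH/$D_DIR
EOF
result=$(configure_script "$scratch" "11.22.33.44")
printf '%s\n' "$result"
rm -f "$scratch"
```

Because the helper writes to stdout, the output can be redirected to a temporary file and inspected before it replaces the script in /usr/local/rig/bin.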
Example 2: Automating Non-Shotgun Processing
By default, there are two main pipelines included in the 454 Sequencing System Software: the Shotgun pipeline and the Amplicon pipeline. However, it is also possible to create custom pipelines and to execute GS GType Assay Software Add-on modules after a pipeline has been run. Therefore, this section describes in general terms how to set up automatic processing for a pipeline of your choice. We will use Amplicon processing as an example, thereby superseding TCB 2010-003, “Setting up automatic Signal Processing for Amplicon experiment”. Incorporating Add-on processing will be discussed in the next section, “Example 3: Automating GS GType Assay Software Add-on Processing”.
In “Example 1: Automating Shotgun Processing”, the only modifications that needed to be made were to edit postAnalysisScript.sh to include the correct IP address of the remote computer and to remove a comment character from the remote command (Steps 2 and 3 of that section). One limitation of postAnalysisScript.sh is that it can only execute one specific remote command (runAnalysisPipe, above). That is, there is no parameter that can be passed to it in order to specify a different command. Therefore, in order to configure a different kind of remote processing (e.g. Amplicon processing) we need to do the following:
Create a copy of postAnalysisScript.sh with a unique name, i.e. postAnalysisScript<custom>.sh
Edit the remote command to reflect the pipeline of choice, i.e. runAnalysisPipe<custom>
Furthermore, since the Image Processing pipeline is hard-coded to execute only “postAnalysisScript.sh”, and NOT “postAnalysisScript<custom>.sh”, we also need to create a copy of the Image Processing pipeline that executes postAnalysisScript<custom>.sh. As we will see in the next section, we will call it imageProcessingOnly<custom>.xml.
Create a Custom Image Processing Only Pipeline
1) The default Image Processing pipeline configuration file is located in the directory /usr/local/rig/apps/gsRunProcessor/etc/gsRunProcessor/. Create a new copy of it using the following commands:
cd /usr/local/rig/apps/gsRunProcessor/etc/gsRunProcessor <Enter>
cp imageProcessingOnly.xml imageProcessingOnly<custom>.xml <Enter>
In the Amplicons processing example, the second command would equate to:
cp imageProcessingOnly.xml imageProcessingOnlyAmplicons.xml <Enter>
2) Open the imageProcessingOnly<custom>.xml file in a text editor, for example:
nedit imageProcessingOnlyAmplicons.xml <Enter>
3) Near the top of the file is a block of text that should look similar to this, which is taken from the v2.6 Image Processing pipeline:
<info>
<id>ImageProcessingOnly</id>
<gsRunProcessorVersion>2.6</gsRunProcessorVersion>
<displayText>Image processing only</displayText>
<description lang="en">
This is the default pipeline for taking the raw images and making intermediate files that can be analyzed by other pipelines.
</description>
<displaySortOrder>10</displaySortOrder>
</info>
a. Replace the contents of the <id> block with ImageProcessingOnly<custom>.
b. Replace the contents of the <displayText> block with “Image processing only Amplicons”.
c. Replace the contents of the <displaySortOrder> block with a unique number not present in any of the other pipeline files located in the /usr/local/rig/apps/gsRunProcessor/etc/gsRunProcessor directory.
The modified file should now have an <info> block that looks like this:
<info>
<id>ImageProcessingOnlyAmplicons</id>
<gsRunProcessorVersion>2.6</gsRunProcessorVersion>
<displayText>Image processing only Amplicons</displayText>
<description lang="en">
This is the default pipeline for taking the raw images and making intermediate files that can be analyzed by other pipelines.
</description>
<displaySortOrder>11</displaySortOrder>
</info>
These changes result in the custom pipeline being listed as a selectable pipeline, available for you to choose in the “Choose Run Processing Type” window of the Instrument Procedure Wizard, during Run set-up.
4) One additional edit to the imageProcessingOnly<custom>.xml file is required. The last block in the file, near the very bottom, specifies which script to execute:
<postAnalysisScript>postAnalysisScript.sh</postAnalysisScript>
Replace the contents of this block with postAnalysisScript<custom>.sh, as shown here for our Amplicons example:
<postAnalysisScript>postAnalysisScriptAmplicons.sh</postAnalysisScript>
5) Save and close the imageProcessingOnly<custom>.xml file.
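The XML edits in Steps 3 and 4 can be sketched with sed as well. This is an illustration under stated assumptions: make_custom_pipeline is a hypothetical helper, each tag is assumed to sit on a single line of the file, and the scratch file below holds only the fragments quoted in this section rather than a complete pipeline file.

```shell
#!/bin/sh
# Hypothetical helper: print a customized copy of a pipeline XML file.
# Assumes each of the four edited tags occupies a single line.
make_custom_pipeline() {
    xml="$1"      # path to a copy of imageProcessingOnly.xml
    custom="$2"   # custom identifier, e.g. Amplicons
    order="$3"    # displaySortOrder value not used by any other pipeline
    sed -e "s|<id>ImageProcessingOnly</id>|<id>ImageProcessingOnly${custom}</id>|" \
        -e "s|<displayText>[^<]*</displayText>|<displayText>Image processing only ${custom}</displayText>|" \
        -e "s|<displaySortOrder>[^<]*</displaySortOrder>|<displaySortOrder>${order}</displaySortOrder>|" \
        -e "s|<postAnalysisScript>postAnalysisScript\.sh</postAnalysisScript>|<postAnalysisScript>postAnalysisScript${custom}.sh</postAnalysisScript>|" \
        "$xml"
}

# Demonstration on a scratch file built from the quoted fragments only.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
<info>
<id>ImageProcessingOnly</id>
<gsRunProcessorVersion>2.6</gsRunProcessorVersion>
<displayText>Image processing only</displayText>
<displaySortOrder>10</displaySortOrder>
</info>
<postAnalysisScript>postAnalysisScript.sh</postAnalysisScript>
EOF
result=$(make_custom_pipeline "$scratch" Amplicons 11)
printf '%s\n' "$result"
rm -f "$scratch"
```

To see which sort-order numbers are already taken, a command such as grep -h '<displaySortOrder>' *.xml, run in the pipeline directory, lists the values in the existing files.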
Create a Custom postAnalysisScript.sh File
Steps 1 through 4 of this section are very similar to the instructions provided in “Example 1: Automating Shotgun Processing”, but adapted for a custom postAnalysisScript<custom>.sh file instead of the default. The custom pipeline is specified in Step 5.
1) Assuming you have downloaded postAnalysisScript.sh into the /home/adminrig directory as indicated previously, execute the following commands to create a custom file that corresponds to the one you specified in Step 4 of the previous section:
cd /home/adminrig <Enter>
cp postAnalysisScript.sh /usr/local/rig/bin/postAnalysisScript<custom>.sh <Enter>
2) Open /usr/local/rig/bin/postAnalysisScript<custom>.sh in a text editor. For example:
nedit /usr/local/rig/bin/postAnalysisScriptAmplicons.sh <Enter>
3) Line 26 of /usr/local/rig/bin/postAnalysisScript<custom>.sh looks like this:
DESTINATION_IP="w.x.y.z"
Replace the “w.x.y.z” with the IP address of the remote computer, making sure to retain the double-quotes. It should end up looking similar to this, with whatever numbers correspond to your machine:
DESTINATION_IP="11.22.33.44"
4) Line 57 of /usr/local/rig/bin/postAnalysisScript<custom>.sh looks like this:
#ssh $DESTINATION_USER@$DESTINATION_IP runAnalysisPipe $DESTINATION_PATH/$D_DIR
Delete the “#” symbol so that the line looks like this:
ssh $DESTINATION_USER@$DESTINATION_IP runAnalysisPipe $DESTINATION_PATH/$D_DIR
5) Replace “runAnalysisPipe” with runAnalysisPipe<custom>, which is the name of the pipeline you would like to execute, for example:
ssh $DESTINATION_USER@$DESTINATION_IP runAnalysisPipeAmplicons $DESTINATION_PATH/$D_DIR
In the script itself, all of this text must be on a single line. The remote pipeline command itself, runAnalysisPipeAmplicons, is part of the 454 Sequencing System Software, so it will already be present on the remote computer. We shall see in the next section that when automating Add-on processing this is not the case, which is why it will be our third and most complex example.
6) This concludes the procedure for configuring automatic remote execution of a non-Shotgun Signal Processing pipeline, using the Amplicons pipeline as our example.
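Steps 3 through 5 of this subsection can be sketched as one sed invocation that sets the IP address, uncomments the remote command, and renames the pipeline in a single pass. The helper name make_custom_script is hypothetical, and the template below contains only the two lines quoted in this document, not the full script.

```shell
#!/bin/sh
# Hypothetical helper: print a customized copy of the template script.
make_custom_script() {
    template="$1"   # path to the downloaded postAnalysisScript.sh
    ip="$2"         # IP address of the remote computer
    custom="$3"     # custom identifier, e.g. Amplicons
    # First expression: fill in the IP address (Step 3).
    # Second expression: drop the leading "#" and append the identifier
    # to runAnalysisPipe (Steps 4 and 5).
    sed -e "s/^DESTINATION_IP=\"w\.x\.y\.z\"/DESTINATION_IP=\"$ip\"/" \
        -e "s/^#\(ssh .* \)runAnalysisPipe /\1runAnalysisPipe$custom /" \
        "$template"
}

# Demonstration on a scratch template holding only the two quoted lines.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
DESTINATION_IP="w.x.y.z"
#ssh $DESTINATION_USER@$DESTINATION_IP runAnalysisPipe $DESTINATION_PATH/$D_DIR
EOF
result=$(make_custom_script "$scratch" "11.22.33.44" "Amplicons")
printf '%s\n' "$result"
rm -f "$scratch"
```

Writing to stdout keeps the downloaded template intact, so the same template can be reused to generate a script for each additional pipeline.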
Example 3: Automating GS GType Assay Software Add-on Processing
In our final example, we describe how to configure automatic remote processing of the GS GType Assay Software Add-on. This is the most complex example because the processing requires an additional step to be run after Signal Processing. The process entails everything described in the previous section, “Example 2: Automating Non-Shotgun Processing”, with one major addition. In the previous section, it was assumed that the runAnalysisPipe<custom> pipeline-launching script was already installed on the remote computer. Here, it needs to be created using components provided with the GS GType Assay Software Add-on package.
Create a Custom postAnalysisScript.sh File
Create a custom postAnalysisScript<custom>.sh file by following the instructions in the previous section, “Example 2: Automating Non-Shotgun Processing.” Adapt it in the following ways:
In Step 1, replace the “<custom>” placeholder with “GType<assay>”, where <assay> is another placeholder representing the name of the Assay for which you are automating processing. For the GS GType Leukemia Assay, for example, this would equate to postAnalysisScriptGTypeLeukemia.sh. For the GS GType HLA Assay, it would be postAnalysisScriptGTypeHLA.sh.
In Step 5, again replace the “<custom>” placeholder with “GType<assay>”. This would equate to runAnalysisPipeGTypeLeukemia for GS GType Leukemia.
Create a Custom Image Processing Only Pipeline
Create a custom Image Processing pipeline by following the instructions in the previous section, “Example 2: Automating Non-Shotgun Processing.” Adapt it in the following ways:
In Steps 1 and 3, again replace the “<custom>” placeholder with “GType<assay>”, where <assay> is another placeholder representing the name of the Assay for which you are automating processing. For the GS GType Leukemia Assay, for example, this would equate to imageProcessingOnlyGTypeLeukemia.xml. For the GS GType HLA Assay, it would be imageProcessingOnlyGTypeHLA.xml.
In Step 4, remember to correctly specify the name of your custom postAnalysisScript<custom>.sh file (e.g. postAnalysisScriptGTypeLeukemia.sh).
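Since the same “GType<assay>” identifier has to appear consistently in three names, a small sketch can help to check them before editing any files. The function gtype_names is a hypothetical illustration; it only prints names and changes nothing on disk.

```shell
#!/bin/sh
# Hypothetical helper: print the three Example 3 names for a given assay.
gtype_names() {
    assay="$1"                # e.g. Leukemia or HLA
    custom="GType${assay}"    # value that replaces the <custom> placeholder
    printf '%s\n' \
        "postAnalysisScript${custom}.sh" \
        "imageProcessingOnly${custom}.xml" \
        "runAnalysisPipe${custom}"
}

gtype_names Leukemia
# Prints:
#   postAnalysisScriptGTypeLeukemia.sh
#   imageProcessingOnlyGTypeLeukemia.xml
#   runAnalysisPipeGTypeLeukemia
```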
Create a Custom Pipeline-launching Script on the Remote Computer
The GS GType Assay Software Add-on comes with one Add-on script for each of the assays. Refer to TCB 2012-003, “Installation of GS GType Assay Software Add-on v2.0” (or newer, if one exists at the time you are reading this), for more information on the Add-on scripts and how to manually execute them. The GS GType Assay Software Add-on also comes with custom pipeline configuration files (XML) for each Assay. These pipeline configuration files will tell the pipeline to automatically run the appropriate Add-on script once the respective Signal Processing is complete.
However, one thing that the Add-on package does not provide is a set of scripts to launch those Assay pipelines analogous to runAnalysisPipe and runAnalysisPipeAmplicons for the default pipelines. In order to avoid having to execute the default Amplicons Signal Processing pipeline followed separately by the Add-on script, and to therefore keep the remote automation procedure for Add-ons consistent with the procedures detailed in the previous Examples, it is necessary to create a wrapper script for the selected Assay that refers to the respective bundled Add-on XML files.
1) Log into the remote computer as root, or as a user that has sudo privileges.
2) Execute the following commands:
cd /opt/454/bin <Enter>
sed 's/_PIPE=/_PIPE=GType<assay>-/' runAnalysisPipeAmplicons > runAnalysisPipeGType<assay> <Enter>
chmod a+rx runAnalysisPipeGType<assay> <Enter>
Each command must be typed on a single line. Note the positions of the placeholders that you need to replace with your custom identifier. Note also that the standalone “>” is not a placeholder and must be typed, as must all the other non-placeholder symbols, quotes and spaces. Keeping with Leukemia as an example, these commands would equate to:
cd /opt/454/bin <Enter>
sed 's/_PIPE=/_PIPE=GTypeLeukemia-/' runAnalysisPipeAmplicons > runAnalysisPipeGTypeLeukemia <Enter>
chmod a+rx runAnalysisPipeGTypeLeukemia <Enter>
3) This concludes the procedure for configuring automatic remote execution of GS GType Assay Software Add-on processing.
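To see what the sed substitution in Step 2 does, it can be tried on a single line of text. The assignment used below is a hypothetical stand-in chosen only to contain “_PIPE=”; the actual contents of runAnalysisPipeAmplicons are not reproduced in this document. The substitution inserts the “GType<assay>-” prefix immediately after the first “_PIPE=” on each line and leaves everything else unchanged.

```shell
#!/bin/sh
# Hypothetical sample line; the variable name and value are assumptions
# made for illustration, not taken from runAnalysisPipeAmplicons.
sample='EXAMPLE_PIPE=pipelineName'

# Same substitution as in Step 2, using Leukemia as the assay.
result=$(printf '%s\n' "$sample" | sed 's/_PIPE=/_PIPE=GTypeLeukemia-/')

printf '%s\n' "$result"   # prints EXAMPLE_PIPE=GTypeLeukemia-pipelineName
```

After creating the real wrapper, comparing it with the original (for example with diff runAnalysisPipeAmplicons runAnalysisPipeGTypeLeukemia) shows exactly which lines were changed.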
For life science research only. Not for use in diagnostic procedures.
License disclaimer information is subject to change or amendment. For current license information on license disclaimers for a particular product, please refer to https://www.roche-applied-science.com/new/legal/index.jsp?id=legal_000000.
454, 454 LIFE SCIENCES, 454 SEQUENCING, GS FLX, GS FLX TITANIUM, GS JUNIOR, EMPCR, PICOTITERPLATE, PTP, NEWBLER, REM, GS GTYPE, GTYPE, AMPLITAQ, AMPLITAQ GOLD, FASTSTART, NIMBLEGEN, SEQCAP, MAGNA PURE, and CASY are trademarks of Roche.