9.10 Shell scripts
Finally, one can parallelize tasks at the level of the shell, even if the programs you write/run aren’t parallelized, using a tool like GNU Parallel (see below). Briefly, with GNU Parallel you can split a list of (ambarassingly parallel) tasks across multiple cores even if the program itself is serial in nature. See the GNU Parallel page below and the tutorial page for some examples. In our lab we use GNU Parallel to distribute subject-level brain imaging processing across multi- ple cores.
• GNU Parallel121
Exercises
E 9.1 Here is a serial version of a simple MATLAB program that iterates through a for loop, printing to the screen each time. Your task is to parallelize that for loop so that iterations are distributed among available compute cores. Note that this is an “embarrasingly parallell” problem, i.e. each iteration of the for loop is completely independent of the others.
Also note that you will likely not see the loop iterations appear in sequen- tial order, once the loop is parallelized. When operations are parallelized the order is essentially arbitrary from one run to another. This is why this method of parallelization is only appropriate for computations where each loop iteration is truly independent of the others.
for i=1:10
disp(sprintf('hello from iteration # %d', i)) end
E 9.2 Parallel bootstrap
As we will find out in my statistics course next semester, we can useran- dom resampling with replacementto simulate drawing samples from a pop- ulation, and we can use this method to quantitatively assess the probabil- ity of the null hypothesis in the context of an experiment, and/or some data.
Assume we have the following sample of data x:
x = [3 5 4 3 6 7 -1 2 -3 -4 2 -5 3 2 -1]
and a second sample, y:
153 9.10. SHELL SCRIPTS
The mean of the sample x is 1.533 and the mean of sample y is 2.933. Let’s compute a statistic we’ll calldsampwhich is the difference between means,
¯
y−¯xwhich in this case is equal todsamp=1.400.
The null hypothesis is that the samples were taken from a single popula- tion with the same mean, and sodpop =0.00, and any departure from this
in our sample statistic dsamp is due to random sampling error.
Question 1
Use bootstrapping to compute the probability of the null hypothesis, with one million bootstrap (re)samples. Write your code in the usual serial fash- ion, using for-loop(s).
Question 2
Rewrite your solution to question 1, and parallelize the bootstrap. In other words execute the one million bootstrap iterations in parallel, with the work spread over your available compute cores.
Links
97http://en.wikipedia.org/wiki/Parallel_computing 98http://en.wikipedia.org/wiki/Multithreading_(computer_architecture) 99http://en.wikipedia.org/wiki/Symmetric_multiprocessing 100http://en.wikipedia.org/wiki/Hyper-threading 101http://en.wikipedia.org/wiki/Supercomputer 102http://en.wikipedia.org/wiki/Tianhe-2 103https://www.sharcnet.ca/my/front/ 104http://aws.amazon.com/ec2/ 105http://en.wikipedia.org/wiki/Oracle_Grid_Engine 106http://en.wikipedia.org/wiki/Cluster_(computing) 107http://en.wikipedia.org/wiki/SETI@Home 108http://en.wikipedia.org/wiki/Folding@home 109http://en.wikipedia.org/wiki/Botnet 110http://en.wikipedia.org/wiki/Distributed_denial-of-service_attack#Distributed_ attack 111http://en.wikipedia.org/wiki/Grid_computing 112http://www.nvidia.com/object/tesla-workstations.html 113http://www.nvidia.com/object/cuda_home_new.html 114http://www.khronos.org/opencl/ 115http://www.mathworks.com/discovery/matlab-gpu.html 116http://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_ units 117http://www.mathworks.com/products/parallel-computing/ 118http://www.mathworks.com/help/distcomp/parfor.html 119http://www.mathworks.com/help/distcomp/getting-started-with-parfor.html 120http://www.mathworks.com/help/distcomp/examples/155 LINKS
121http://www.gnu.org/software/parallel/
10
Graphical displays of data
MATLAB comes with lots of built-in functionality for generating both 2-D and 3-D graphics. You will learn a collection of commands (mainlyplot()) that will enable you do generate plots and configure their appearance with just about any degree of precision you want. The great advantage of generating graph- ics purely based on scripted commands (as opposed for example to graphical, pointy-clicky approaches based on graphical user interfaces (GUIs)) is that you can easily and instantly re-generate graphics simply by re-running the script(s). In addition making changes to your graphics becomes a simple matter of tweak- ing commands and scripts.
Rather than writing notes that essentially duplicate the extensive online docu- mentation included with MATLAB, I will instead point you to some intial start- ing locations where you can learn about generating graphics in MATLAB.
Graphics in MATLAB123
Common Graphics Functions in MATLAB124(a graphical menu showing differ-
ent kinds of plots and how to generate them)
2-D and 3-D Plots125
Theplot()command126
Creating 2-D Graphs and Customizing Lines127
Add Titles, Axis Labels and Legends to Graphs128
Figures with multiple axes using subplots129
3-D Surface and Mesh Plots130
Shaded Polygons131
159
Exercises
E 10.1 Line PlotLet x be a vector:
[0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9]
and set y to be equal to sin(2*pi*x).
Generate a line plot that looks like this:
Smoother Line
Change Line Colour
Plot it again but make the line blue, and thick:
E 10.2 Bar Plot
Let y be a vector with the following values:
161
Generate the following bar plot:
Again with x labels
Plot it again but this time include labels on the x axis:
E 10.3 Multiline plot
x = [1,2,3,4,5] y1 = [1,2,3,4,4] y2 = [1,5,6,8,10] y3 = [5,4,2,2,1]
Generate a multi-line plot like this:
Add a Legend
163
E 10.4 Scatterplot
Let x be a vector starting at 1 and ending at 100. Let y be a vector equal to (x * 0.15) + N where N is a vector of random values chosen from a gaussian distribution with mean 0.0 and standard deviation 1.0.
Generate a scatterplot of x vs y:
Let z be equal to ((x * 0.05) + 2) + N where N is a vector of random values chosen from a gaussian distribution with mean 0.0 and standard deviation 1.5.
Generate a scatterplot of y and z vs x, including a legend:
E 10.5 Subplots
Let x be a vector starting at 1 and ending at 100. Let y be a vector equal to (x * 0.15) + N where N is a vector of random values chosen from a gaussian distribution with mean 0.0 and standard deviation 1.0.
Let z be equal to ((x * 0.05) + 2) + N where N is a vector of random values chosen from a gaussian distribution with mean 0.0 and standard deviation 1.5.
Generate a figure with scatterplots of (x vs y) and (x vs z) in separate sub- plots, stacked vertically:
165
horizontally
Generate the plot again but stack the subplots horizontally:
more complex arrangements
Generate the plot again with 3 subplots: one on the top, occupying the full width of the figure, in which you plot (x vs y) and (x vs z) overlayed, and
two on the bottom, in which you plot (x vs y) in one and (x vs z) in the other:
E 10.6 Histograms
Define a variable y that contains 1000 values drawn from a gaussian dis- tribution with mean 0.0 and standard deviation 1.0. Plot a histogram of y. For now just use the default settings for whatever function you find that produces a histogram.
167
Number of bins
Replot the histogram using 25 bins:
Bin width
E 10.7 Line of best fit
Let x be a vector starting at 1 and ending at 100. Let y be a vector equal to (x * 0.15) + N where N is a vector of random values chosen from a gaussian distribution with mean 0.0 and standard deviation 1.0.
Generate a scatterplot of (x vs y) and overlay on the plot, the line of best fit after fitting a linear model to the data:ˆy=β0+β1x.
169
Add a legend
Add the equation of the line
E 10.8 Surface plot
Define x and y each over a grid [-8,8] in steps of 0.5.
Definex=sin(r)/r
171 LINKS
Links
123http://www.mathworks.com/help/matlab/graphics.html
124http://www.mathworks.com/help/matlab/creating_plots/types-of-matlab-plots.html#
btjs9s4-1
125http://www.mathworks.com/help/matlab/2-and-3d-plots.html 126http://www.mathworks.com/help/matlab/ref/plot.html 127http://www.mathworks.com/help/matlab/creating_plots/using-high-level-plotting-functions. html 128http://www.mathworks.com/help/matlab/creating_plots/add-title-axis-labels-and-legend-to-graph. html 129http://www.mathworks.com/help/matlab/creating_plots/create-graph-with-subplots. html 130http://www.mathworks.com/help/matlab/surface-and-mesh-plots-1.html 131http://www.mathworks.com/help/matlab/geometry.html 132http://www.mathworks.com/help/matlab/graphics-objects.html
11
Signals, sampling & filtering
Whereas signals in nature (such as sound waves, magnetic fields, hand posi- tion, electromyograms (EMG), electroencephalograms (EEG), extra-cellular po- tentials, etc) vary continuously, often in science we measure these signals by
samplingthem repeatedly over time, at somesampling frequency. The resulting collection of measurements is adiscretizedrepresentation of the original contin- uous signal.
Before we get intosampling theoryhowever we should first talk about how sig- nals can be represented both in thetime domainand in thefrequency domain.
Jack Schaedler has a nice page explaining and visualizing many concepts dis- cussed in this chapter:
Seeing circles, sines and signals133
11.1 Time domain representation of signals
This is how you are probably used to thinking about signals, namely how the magnitude of a signal varies over time. So for example a signalscontaining a sinusoid with a periodTof 0.5 seconds (a frequency of 2 Hz) and a peak-to-peak magnitudebof 2 volts is represented in the time domaintas:
s(t) = ( b 2 ) sin(wt) (11.1) where 173
w= 2π
T (11.2)
We can visualize the signal by plotting its magnitude as a function of time, as shown in Figure 11.1.
Figure 11.1:Time domain representation of a signal.