Saturday, February 16, 2013

Simple Monte Carlo simulation: spmd approach

In a previous post I solved a Monte Carlo simulation that estimates the area of a quarter circle (see the figure below) using both serial code and parallel code, taking advantage of the parfor loop (here). Today I will rewrite the same simulation once again, but this time using the spmd command.
Once again I won't spend much time explaining the code. It's simple enough that I'm sure no one will have trouble understanding it. Here's the original code:
N = 4000000;
hit = 0;
tic
for n = 1:N
    point = rand(1,2);       % random point in the unit square
    if norm(point) <= 1      % the point falls inside the quarter circle
        hit = hit + 1;
    end
end
toc
If I run this script it takes around 15 to 16 seconds. Let's try to beat that.
Rewriting the code requires more work than with parfor. With parfor we simply had to replace a word and the code ran smoothly. With spmd we have to explicitly (and manually) state which iterations are assigned to each lab. Well, let's get started, shall we?
To start let's open our Matlab pool, in my case with 4 workers, otherwise our work will be fruitless.
matlabpool open 4
Now we need to tell each available lab its task, which in this case means the iterations it will be responsible for. To do that we can use conditional statements such as switch/case or if/elseif to tell each worker to take care of one quarter of the total iterations, N. That is the easiest way to go and the resulting code is easy to interpret. Since it is so simple I'll leave it to you to write. I, on the other hand, will write the same code using only one for loop inside the spmd block, as follows:
N = 4000000;
tic
spmd
    hit = 0;                          % each lab keeps its own count
    for n = labindex:numlabs:N
        point = rand(1,2);
        if norm(point) <= 1
            hit = hit + 1;
        end
    end
end
toc
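One detail worth noting: after the spmd block, hit is a Composite holding one partial count per lab, so the partial counts still have to be combined on the client to get the final estimate. A minimal sketch:

```matlab
totalHits = 0;
for k = 1:length(hit)
    totalHits = totalHits + hit{k};   % add each lab's partial count
end
area = totalHits/N    % estimate of the quarter circle's area (pi/4)
```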
Since each lab has a different labindex and the step is constant, the labs take care of different quarters of the iterations. If this confuses you I suggest you read this post.
When we run the script it takes around 7 seconds, while the serial version takes around 15 to 16 seconds. This is a major improvement: about 2.2 times faster, and around the same time we got with parfor. As in our previous examples this ratio stays roughly constant, meaning that as we increase the size of the simulation the difference between the execution times will also increase.

I won't spend much more time on this, as this code is not very challenging. In this problem there's really no need to use spmd: a parfor loop achieves identical results and is much simpler to implement.
As a general rule, whenever you can use parfor you can also use spmd; it just isn't worth the effort most of the time.

Cheers!

Friday, February 15, 2013

Retrieving data from composite objects

After you execute some code inside an spmd block you're left with a composite object. Example:
spmd
    y = zeros(1,100);
    for n = labindex:numlabs:100
        y(n) = n;
    end
end
If you run this code you'll be left with two variables in the workspace, y and n. Let's focus on the variable y:
>> whos y
  Name      Size            Bytes  Class      
  y         1x4              1145  Composite
 
As you can see we're dealing with a composite object. And what is that? A composite object is pretty similar to a cell array. It has as many elements as the number of labs, and each element corresponds to the part of the variable y stored on the respective lab.
>> y
y =
   Lab 1: class = double, size = [1  100]
   Lab 2: class = double, size = [1  100]
   Lab 3: class = double, size = [1  100]
   Lab 4: class = double, size = [1  100]
You can manipulate a composite object in the same way you deal with a cell array. Running the command y{1} will return the vector y as stored in the first lab.
y{1}
ans =
  Columns 1 through 21
     1     0     0     0     5     0     0     0     9 ...
In our case each lab stores a different portion of the variable, but this need not be the case: several labs can store the same portion of the variable, or no part of it at all.
If you want to know on which workers the variable exists, just run the function:
>> exist(y)
ans =
     1     1     1     1
The function exist returns 1 when the variable exists in the respective lab, and 0 otherwise.
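To assemble the full vector on the client we can add up the per-lab portions, since each portion is zero outside the indices its lab filled in. A sketch (written so it also works if the portions have different lengths):

```matlab
yFull = zeros(1,100);
for k = 1:length(y)
    part = y{k};                      % this lab's portion of the vector
    yFull(1:length(part)) = yFull(1:length(part)) + part;
end
```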

There's not much left to say. If you are familiar with cell arrays then I'm sure you haven't learned anything new; if you weren't, I hope this was enough to get you started.

Cheers. 

Wednesday, February 13, 2013

Command spmd: How and When

Today I'll talk about the spmd command.
The spmd command is not as easy to implement as parfor but it offers the programmer more control over the code. In comparison to the parfor loop we can highlight that spmd lets us:
  • Define which tasks are assigned to each individual worker. 
  • Control how data is shared between the workers. 
  • See the work of each individual worker. 
  • Determine which loop iterations are assigned to each individual worker. 
  • Assign not only different data but also different tasks to different workers. 
  • Work with distributed arrays.
To put it simply, the spmd command allows us to run different programs on different workers, or the same program with different inputs. Inside an spmd block each worker has a different ID, which you can check by running the command labindex.
NOTE: You can also test this function by opening the parallel command window. In this interface you can see how the commands run in each worker. You can access it by typing pmode start (you can't have any Matlab pool open).
Another useful function is numlabs, which returns the number of workers available.

Let's try exploring the spmd command then. Any spmd block has the following syntax:
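In its simplest form it looks like this (the body is just a placeholder):

```matlab
spmd
    % every worker runs the statements in this block simultaneously
    fprintf('Hello from lab %d of %d\n', labindex, numlabs);
end
```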
The spmd block only tells Matlab that each lab will run the code inside it concurrently and independently, without communicating or sharing results with the other workers. There are ways for the labs to communicate with each other, but the programmer must state that explicitly in the code. I'll cover that later.
So how do we run a loop? If we simply write a for loop inside the block, each worker will run the whole loop identically.
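For example, with the loop below every lab computes exactly the same sum:

```matlab
spmd
    total = 0;
    for n = 1:10
        total = total + n;   % every lab executes all ten iterations
    end
    % total ends up as 55 on every single lab
end
```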

NOTE: If you took the time to run this code you probably noticed that a composite object appeared in your workspace. Don't worry about it right now. I'll talk about them later.
This is not what we want. We want each lab working on different iterations of the loop to cut the execution time to a minimum. The easiest way to do this is with conditional statements, taking advantage of either switch/case or if/elseif. Below I wrote some example code using switch. The choice of switch over if is a matter of preference; the performance is identical.
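A sketch of what such code might look like, assuming a pool of 4 workers and an N divisible by 4:

```matlab
N = 100;
spmd
    switch labindex                 % pick this lab's quarter of the work
        case 1
            range = 1:N/4;
        case 2
            range = N/4+1:N/2;
        case 3
            range = N/2+1:3*N/4;
        case 4
            range = 3*N/4+1:N;
    end
    for n = range
        % do the work for iteration n here
    end
end
```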

As you can see we have total control over which iterations are assigned to each lab.
We can also write the loop in a more compact way:
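Something along these lines:

```matlab
spmd
    for n = labindex:numlabs:N      % each lab starts at its own index
        % do the work for iteration n here
    end
end
```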

This way each lab is assigned the iterations labindex, labindex + numlabs, labindex + 2*numlabs, ... Since labindex is different for each worker and the step is the same, every lab takes care of a different set of iterations.

I think the basics are covered. In my next post I'll write about the objects returned after an spmd execution (composite) and how to return data from them. I'll also write some examples on how to implement spmd.

Cheers.

Sunday, February 10, 2013

Example 2: parfor in a simple script

I've talked before about the parfor loop here and gave an example of its implementation in a simple Monte Carlo simulation here. In this second example we will see the effect of a parfor loop on a simple function that gives back the number of prime numbers between 1 and N. As last time, we start by running the function serially:
When I run the command compute(1,1100000) I get back the message:
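A sketch of what such a function might look like (the name compute and its two inputs match the call below; everything else, including the use of isprime for the primality test, is an assumption):

```matlab
function count = compute(in1, in2)
% COMPUTE  Count the prime numbers between in1 and in2.
count = 0;
tic
for n = in1:in2              % this is the loop we will parallelize later
    if isprime(n)
        count = count + 1;
    end
end
toc
```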
>> Elapsed time is 12.940753 seconds.

Now let's try running the function in parallel. First we must open a Matlab pool with the workers we want. To do that simply write:
matlabpool open i  % Opens a matlab pool with i workers.
After that we just need to substitute the for loop in the function with a parfor. Running the command compute(1,1100000) again, I get the message:
>> Elapsed time is 3.713284 seconds.

That's a major improvement! Our parallel code is around 3.5 times faster than its serial equivalent! Let's plot the time difference and ratio for different values of in2, with in1 fixed (in1 = 1).


As we can see the ratio converges to a value around 3.5 to 3.6, which is pretty good. As we increase the number of simulations the difference between their durations also increases, which makes this function a good candidate for parallelization.

Obviously the time it takes for the function to run depends not only on our code but also on the processor we're using. Unless you have an identical processor you will see different time differences. However, the ratio between the serial and parallel executions should be roughly the same!

I computed the ratio as the time it takes for the parallel code to run divided by the time it takes for the serial code to run. However, it is more common to use the inverse: the execution time of the serial code (Ts) divided by the execution time of the parallel code with n workers (Tn) is called the speedup of the parallel algorithm.
speedup = Ts / Tn
This is an important property of the code, as it doesn't depend much on the processor, as long as the number of workers is the same. Try running the code above with 4 workers and I'm sure you will see a similar speedup.

If you feel like trying to implement the parfor loop yourself, I suggest you write code, both in series and in parallel, that tests Girko's circular law. Girko's circular law states, roughly, that the eigenvalues of an N-by-N matrix with independent standard normal entries, divided by sqrt(N), become uniformly distributed over the unit disk of the complex plane as N grows.

You don't need to understand the math behind it. All you need to test is the value of
y = max(svd(randn(N))); 
for a big set of integers N. Try N between 1 and 800. The code is pretty simple: all you need is a loop and a vector y to store the computed values. Write it both in series and in parallel, and find the difference in execution time between the two and, more importantly, the speedup.
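For reference, a serial sketch of the exercise might look like this (swap the for for a parfor, with a pool open, to get the parallel version):

```matlab
Nmax = 800;
y = zeros(1, Nmax);
tic
for N = 1:Nmax                    % replace for with parfor for the parallel run
    y(N) = max(svd(randn(N)));    % largest singular value of an NxN Gaussian matrix
end
toc
```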

Example 1: parfor in a simple Monte Carlo simulation

In this post I will use a simple Monte Carlo simulation that calculates the area of a quarter circle (see the figure below), using both a for and a parfor loop. The script is pretty straightforward and I'm sure anyone interested in parallel programming will have no problem understanding it.


As you will see, the code is identical except for the type of loop, but there is a clear difference in how long it takes to run. Let's start with the script using the for loop:
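A sketch of the serial script (the number of points N is my choice here, not necessarily the one used for the timing below):

```matlab
N = 4000000;    % number of random points (an assumption)
hit = 0;
tic
for n = 1:N
    point = rand(1,2);       % random point in the unit square
    if norm(point) <= 1      % the point falls inside the quarter circle
        hit = hit + 1;
    end
end
toc
area = hit/N    % estimate of the quarter circle's area (pi/4)
```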

When I execute the code I get:
>> Elapsed time is 15.142193 seconds.

Let's try again, substituting for with parfor. The code is pretty much the same:
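The parallel version only swaps the loop keyword; Matlab treats hit as a reduction variable, so the increment is safe inside parfor:

```matlab
N = 4000000;    % number of random points (an assumption)
hit = 0;
tic
parfor n = 1:N
    point = rand(1,2);
    if norm(point) <= 1
        hit = hit + 1;       % reduction variable, handled safely by parfor
    end
end
toc
```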

Now, when executed, we get:
>> Elapsed time is 6.140027 seconds.

As you can see the code is now about 2.5 times faster! This is a huge improvement. As we increase the number of points (N) the gap between the two execution times also increases, a sign that the problem scales well with its size. Below is a plot of the time it takes to run the simulation with for and with parfor for different numbers of points N.

As you can see this script gets a great improvement from a simple use of parallel functions. In fact, most Monte Carlo simulations can be improved with parallel code! Note, however, that parallel computing only starts to pay off once there is a large number of simulations to run. It also stands to reason that the more labs we use, the faster the code will run. Finally, here is the plot of the ratio between the serial and parallel execution times. This is probably the most relevant measure of how successful your parallelization is.

As we can see the ratio converges to a value between 2.3 and 2.4. The convergence of the ratio is not a surprise. In general, improving this ratio requires changes to the code; in our case there isn't much we can do because of the simplicity of the script!

In my next post I will post another good use of parfor in an equally simple script.

Function parfor: How and When

The easiest (but not necessarily the best) way to make use of the Parallel Computing Toolbox is, without doubt, the parfor loop. This is due to its implementation simplicity!
A parfor loop has the same syntax as a for loop. An example follows:
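Something like this:

```matlab
y = zeros(1,10);
parfor n = 1:10
    y(n) = n^2;    % the iterations are divided among the available labs
end
```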

As you can see this loop is similar to the familiar for loop. The difference is that the parfor loop doesn't run in serial order: each lab takes care of a unique subset of the iterations, indexed by the variable n. Imagine we have two labs and the loop above. Matlab might order lab 1 to take care of the iterations where n equals 1, 3, 6, 7, 10, while lab 2 runs the remaining ones, in no particular order.

A few problems arise from the way parfor works. Whenever there is a dependency between iterations we can't use it. Example:
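For instance, a loop like the following, where each element is built from the two before it:

```matlab
S = zeros(1,10);
S(1) = 1;
S(2) = 1;
for n = 3:10
    S(n) = S(n-1) + S(n-2);   % depends on the results of earlier iterations
end
```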

If you tried to use parfor instead of for in the example above you would receive an error. Why? Because each iteration depends on the iterations before it. To calculate S(6) you need to know the values of S(5) and S(4), but because of the way parfor loops work there is no guarantee that the statements for n=6 won't run before the statements for n=4. And even if the iterations ran in increasing order we would still have a problem: each lab has its own memory in which it stores the variables it comes across. This means that during a parfor loop each lab stores the values for the statements it runs and doesn't share them with the remaining labs until the loop is complete. Therefore, even if the loop ran in increasing order, the lab assigned n=6 might not have the values of S(5) or S(4) stored.

When trying to improve our code with parallel commands, in this case parfor, we always have to make sure there isn't any kind of dependence between the iterations. If there is, we should leave that section of the code alone and move on to another one that might be easier to improve.

Other things you must be cautious about when dealing with parfor:
- You can't have nested parfor loops.
- You can't have the command break or return inside a parfor loop.
- You can't use the commands evalc, eval, evalin and assignin.

There are more errors that can arise from the use of parfor; I've simply listed some of the most common.

In my next post I will show a simple program and the respective run time with for and parfor to give you an idea of the advantages of parallelizing your code!

Saturday, February 9, 2013

Introduction to Parallel Computing

As you probably know, Matlab has a Parallel Computing Toolbox that allows you to use multicore processors, GPUs and computer clusters to speed up your scripts. In my next posts I will talk about the main parallel constructs that come with this toolbox (parfor, spmd, batch) and how/when to use them properly.

For now I'll leave you with the description of the Parallel Computing Toolbox taken from mathworks.com
  
MathWorks 
Parallel Computing Toolbox™ lets you solve computationally and data-intensive problems using multicore processors, GPUs, and computer clusters. High-level constructs—parallel for-loops, special array types, and parallelized numerical algorithms—let you parallelize MATLAB® applications without CUDA or MPI programming. You can use the toolbox with Simulink® to run multiple simulations of a model in parallel.
The toolbox provides twelve workers (MATLAB computational engines) to execute applications locally on a multicore desktop. Without changing the code, you can run the same application on a computer cluster or a grid computing service (using MATLAB Distributed Computing Server). You can run parallel applications interactively or in batch.

Introduction

Hello fellow programmer.
This is my first post ever on this blog, so I'll talk about what I intend to achieve with it.
What are my goals? Not many, really... I only want to kill some time, improve my English (feel free to correct me) and, hopefully, provide some helpful hints to fellow Matlab users along the way.
If you do not know what Matlab is, I suggest you visit www.mathworks.com immediately. I promise you won't be wasting your time.

To quote them:  (taken from here)

MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications. The language, tools, and built-in math functions enable you to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java.
You can use MATLAB for a range of applications, including signal processing and communications, image and video processing, control systems, test and measurement, computational finance, and computational biology. More than a million engineers and scientists in industry and academia use MATLAB, the language of technical computing.