PRAM: Explanation and Example in Parallel Computing


When someone says "RAM" in the context of computers, most people immediately think of DRAM (Dynamic Random Access Memory), the primary memory used by systems to store data temporarily. However, there's another term that often gets mixed up: PRAM. But don't get confused! PRAM is not a hardware memory like RAM. Instead, PRAM (Parallel Random Access Machine) refers to a theoretical model used in parallel computing, particularly in the analysis of parallel algorithms.

In this blog post, we'll explore PRAM, its different memory models, and how it helps in the design and analysis of parallel algorithms. Let's start by understanding the basics.

What is PRAM?

PRAM is a theoretical model used in parallel computing to help understand how algorithms behave when executed by multiple processors simultaneously. It abstracts away the complexities of real-world hardware and instead provides a simplified view of parallel computation. The goal of PRAM is to allow computer scientists and researchers to analyze the performance of algorithms without worrying about low-level issues like hardware limitations, memory contention, or synchronization.

Think of PRAM as a tool for conceptualizing how parallel computations might work in an idealized environment where many processors are working in sync to solve a problem.

Key Features of the PRAM Model

The PRAM model is designed to be a simple yet powerful representation of parallel computation. It has the following key features:

1. Processors with Local Memory

PRAM assumes there are p processors, each with its own local memory. These processors operate independently but can communicate via the shared global memory.

2. Shared Global Memory

All processors have access to a single shared global memory. This memory is used to store data that can be accessed by all processors, but how processors interact with it needs to be carefully managed to avoid conflicts.

3. Unique Processor Identification (PID)

Each processor in the system has a unique identifier (PID) from 1 to p. This allows the system to track and manage the actions of each processor during computation.

4. Synchronized Execution

All processors in a PRAM system operate synchronously, meaning they follow a global clock. In each cycle, all processors execute their instructions simultaneously. This synchronization is a core aspect of parallel computation in PRAM.

5. Constant-Time Memory Access

Memory access in PRAM is considered to take constant time, meaning that any processor can access any memory location in the same amount of time, regardless of the location's position in the memory.

Memory Contention in PRAM

A critical challenge in parallel computing is memory contention — the situation where multiple processors want to access the same memory location at the same time. To handle this, PRAM defines several models for managing memory contention. These models help ensure that memory access remains efficient and conflict-free.

Contention Resolution Models in PRAM

PRAM defines various memory models based on how processors can interact with memory. The following are the primary models:

EREW (Exclusive Read, Exclusive Write)

In the EREW model, only one processor can read or write to a specific memory location at any given time. This model eliminates memory contention entirely but is the most restrictive in terms of concurrency.

CREW (Concurrent Read, Exclusive Write)

The CREW model allows multiple processors to read from the same memory location simultaneously, but only one processor can write to the memory location at a time. This model strikes a balance between concurrency and memory safety.

CRCW (Concurrent Read, Concurrent Write)

In the CRCW model, multiple processors can both read and write to the same memory location concurrently. However, conflicts can arise when multiple processors try to write to the same memory location simultaneously. Various policies are used to resolve these conflicts:

  • Arbitrary CRCW: Any processor's write is accepted.
  • Priority CRCW: The processor with the highest priority writes its value.
  • Common CRCW: All processors must write the same value.
  • Sum CRCW: The values are summed together.
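To make these write policies concrete, here is a minimal Python sketch of how simultaneous writes to a single memory cell could be resolved under each one. The function name and the "lowest PID wins" priority convention are our own illustrative choices, not part of any real PRAM library:

```python
def resolve_crcw(writes, policy):
    """Resolve simultaneous writes to one memory cell.

    writes: list of (pid, value) pairs, one per writing processor.
    policy: one of "arbitrary", "priority", "common", "sum".
    """
    if policy == "arbitrary":
        # Any single processor's write may win; here we simply take the first.
        return writes[0][1]
    if policy == "priority":
        # Convention assumed here: lowest PID = highest priority.
        return min(writes, key=lambda w: w[0])[1]
    if policy == "common":
        # All processors must attempt to write the same value.
        values = {v for _, v in writes}
        if len(values) != 1:
            raise ValueError("Common CRCW requires identical values")
        return values.pop()
    if policy == "sum":
        # The cell receives the sum of all written values.
        return sum(v for _, v in writes)
    raise ValueError(f"unknown policy: {policy}")

writes = [(1, 5), (2, 7), (3, 5)]
print(resolve_crcw(writes, "arbitrary"))  # 5 (first write wins in this sketch)
print(resolve_crcw(writes, "priority"))   # 5 (PID 1 has highest priority)
print(resolve_crcw(writes, "sum"))        # 17
```

Note that with these particular writes, the Common policy would raise an error, since processors 1 and 2 disagree on the value.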

ERCW (Exclusive Read, Concurrent Write)

In the ERCW model, only one processor can read a memory location, but multiple processors can write to the same location simultaneously. This model is less commonly studied but is still part of the PRAM family.

Why PRAM Matters for Parallel Algorithms

The PRAM model simplifies the analysis of parallel algorithms by abstracting away hardware-specific concerns. Let's look at an example to see how PRAM can be used to analyze a common algorithm.

Example: Parallel Binary Search in CREW PRAM

Consider a sorted array of size n and the task of searching for a key K. In a sequential binary search, we would compare K with the middle element and recursively search either the left or right half of the array. The time complexity of this operation is O(log n) in sequential execution.

But in a parallel environment, we can use multiple processors to speed up the search. Let's see how we might implement a parallel binary search using the CREW PRAM model.

Problem Overview

The goal is to search for a key K in a sorted array of size n using binary search.

Sequential Complexity: O(log n) comparisons

CREW PRAM Implementation

Algorithm Idea

The key idea is to utilize multiple processors to check multiple midpoints simultaneously, exploiting concurrent reads to quickly narrow down the search space.

Parallel Binary Search (CREW PRAM)

Initialize:

  • Set low = 0, high = n - 1
  • P = number of processors

While (high - low + 1) > P:

  1. All processors read the current low and high bounds simultaneously.

  2. For each processor i in parallel:

    • Compute:

      probe_i = low + (i / (P - 1)) * (high - low)

    • Read A[probe_i] concurrently.

    • Compare A[probe_i] with K.

  3. Concurrent Comparisons (all processors write results to shared memory):

    • If any processor finds K, return its index.
    • Otherwise, determine the new range [low, high] based on the comparisons.

End While Loop:

When the range is small enough (less than or equal to the number of processors), perform a sequential binary search.
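The loop above can be sketched in plain Python. A real CREW PRAM would evaluate all P probes in one synchronous step; here an ordinary loop merely simulates the processors, and the `bisect`-based sequential finish plus every function name are our own illustration (assuming P >= 2), not code from a PRAM library:

```python
import bisect

def parallel_binary_search(A, K, P=4):
    """CREW-style multi-probe search; the P processors are simulated by loops.

    Returns the index of K in sorted array A, or -1 if K is absent.
    """
    low, high = 0, len(A) - 1
    while high - low + 1 > P:
        # Steps 1-2: each simulated processor computes and reads its probe.
        probes = [low + (i * (high - low)) // (P - 1) for i in range(P)]
        # Step 3: if any processor's probe hit K, return its index.
        for p in probes:
            if A[p] == K:
                return p
        # Otherwise narrow the range: K lies strictly between the last probe
        # whose value is below K and the first probe whose value is above it.
        for p in probes:
            if A[p] < K:
                low = p + 1
            elif A[p] > K:
                high = p - 1
                break
        if low > high:
            return -1  # K falls outside the remaining range
    # Finish sequentially once the range has at most P elements.
    j = bisect.bisect_left(A, K, low, high + 1)
    return j if j <= high and A[j] == K else -1
```

On the worked example later in this post (n = 16, P = 4, K = 26), this performs exactly one parallel step before handing the range [6, 9] to the sequential finish.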

Understanding the While Loop Condition

The condition (high - low + 1) > P ensures that the number of elements in the search interval is greater than the number of processors, P. This is important because the algorithm relies on having enough processors to check multiple positions (probes) at once. If the number of elements in the current search range is less than or equal to P, the search can be handled with a sequential binary search, as there wouldn't be enough room to distribute the probes.

Parallel Idea

Use multiple processors to check several candidate positions simultaneously.

This allows for faster shrinking of the search interval at each step.

Instead of the usual binary search where:

mid = (low + high) / 2

We choose P probe positions inside the interval [low, high]:

probe_0, probe_1, ..., probe_{P-1}

These probes are:

  • Evenly spaced.
  • Partition the interval into P - 1 segments (one between each pair of adjacent probes).
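As a quick sanity check, the probe formula can be evaluated directly. The helper name below is ours, and integer division is assumed so that the probes stay valid array indices:

```python
def probe_positions(low, high, P):
    # probe_i = low + (i / (P - 1)) * (high - low), rounded down to an index.
    return [low + (i * (high - low)) // (P - 1) for i in range(P)]

print(probe_positions(0, 15, 4))  # [0, 5, 10, 15]
```

The probes land on both endpoints and are evenly spaced across the interval.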

Why CREW PRAM Matters Here

Concurrent Reads

All processors can:

  • Read the low and high values simultaneously.
  • Read their own A[probe_i] values concurrently.

CREW (Concurrent Read, Exclusive Write) allows processors P_0, P_1, ..., P_{P-1} to read the same locations, such as A[x] or A[y], simultaneously without conflict.

This enables fast probing without read conflicts.

Example

low = 0, high = 15, P = 4

Probes: [0, 5, 10, 15]

Each processor:

  • Reads A[probe_i]
  • Compares A[probe_i] with K

All reads happen concurrently, which is compliant with CREW.

Example Walkthrough

Processor | Probe Index | What it Does
----------|-------------|-----------------------------------
P_0       | 0           | Read A[0] = 2, compare with K = 26
P_1       | 5           | Read A[5] = 15, compare with K = 26
P_2       | 10          | Read A[10] = 27, compare with K = 26
P_3       | 15          | Read A[15] = 40, compare with K = 26

What do these probes tell us?

From the comparisons, we know that:

A[5] < K < A[10]

Therefore, K must lie between indices 6 and 9.

New Range:

low = 6, high = 9

Instead of merely halving the range as in normal binary search, the search space has shrunk from 16 elements to 4 in a single step, a reduction by a factor of P = 4.
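One way to derive the new bounds from the P comparison outcomes is a simple scan over the probes. This helper is our own sketch (it assumes K was not found at any probe, so only the < and > cases matter):

```python
def shrink_range(A, K, probes):
    """Given sorted A and probe indices, return the sub-range that must hold K."""
    low = probes[0]
    high = probes[-1]
    for p in probes:
        if A[p] < K:
            low = p + 1   # K lies strictly to the right of this probe
        elif A[p] > K:
            high = p - 1  # K lies strictly to the left; stop scanning
            break
    return low, high

A = [2, 4, 7, 9, 12, 15, 18, 21, 24, 26, 27, 30, 33, 36, 38, 40]
print(shrink_range(A, 26, [0, 5, 10, 15]))  # (6, 9)
```

This reproduces the step above: the probes at 5 and 10 bracket K = 26, so the new range is [6, 9].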

Detailed Numerical Example

Let's walk through a complete example to see how the algorithm works step by step.

Setup:

  • Array size: n = 16
  • low = 0, high = 15 (the full array)
  • Number of processors: P = 4

Iteration 1 (Before entering the loop)

The condition to check is:

(high - low + 1) > P

Substitute the values:

(15 - 0 + 1) > 4

16 > 4 (True)

So, the loop begins. We distribute the probes across the interval [0, 15] and check 4 positions at once. The processors read the values at positions:

  • probe_0 = 0
  • probe_1 = 5
  • probe_2 = 10
  • probe_3 = 15

Each processor compares the corresponding value with K, and based on those comparisons, the range is updated.

Iteration 2 (New range after comparisons)

Suppose the comparisons give us the following result:

A[5] < K < A[10], so the new range is between index 6 and 9.

Now the search interval has been reduced to [6, 9]. Let's check if the loop should continue.

New Range:

low = 6, high = 9

The new condition to check is:

(high - low + 1) > P

Substitute the values:

(9 - 6 + 1) > 4

4 > 4 (False)

Now, since the number of elements in the range is less than or equal to the number of processors (P = 4), the while loop stops.

At this point, we've successfully reduced the search space from the original range [0, 15] (16 elements) down to [6, 9] (4 elements) using the parallel PRAM algorithm. Now, a normal midpoint binary search is applied to this final small range [6, 9] to find the exact location of K.

Key Insight: The PRAM algorithm's purpose is to quickly reduce the search space using parallel processors. Once the range becomes small enough (≤ P elements), we switch to sequential binary search because:

  • The remaining range is small enough to search efficiently with a single processor
  • There's no benefit to parallel processing on such a small range
  • Sequential binary search on 4 elements is very fast

This two-phase approach (parallel reduction + sequential finish) combines the speed of parallel processing for large ranges with the simplicity of sequential search for small ranges.

Why Is This Useful?

In sequential binary search, there is only one probe per step, and the range is halved.

In parallel binary search, there are P probes per step, so the search range shrinks by a factor of approximately P.

This parallel approach leads to faster reductions in the search space, making the algorithm more efficient with a larger number of processors.
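To put rough numbers on that claim: each parallel step cuts the interval into P - 1 segments, so the step count falls from about log base 2 of n to about log base (P - 1) of n. A back-of-the-envelope check in Python (formulas only, not measured runtimes; the helper name is ours):

```python
import math

def steps_needed(n, shrink_factor):
    # How many times the range must shrink by `shrink_factor`
    # to get from n elements down to roughly one.
    return math.ceil(math.log(n, shrink_factor))

n = 1_000_000
print(steps_needed(n, 2))  # sequential binary search: 20 steps
print(steps_needed(n, 3))  # parallel with P = 4 (shrink factor P - 1 = 3): 13 steps
```

The gap widens as P grows, which is why the approach pays off mainly for large ranges and many processors.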

Summary

  • ✅ PRAM is a theoretical model for analyzing parallel algorithms, not a hardware memory type.
  • ✅ PRAM abstracts away hardware complexities to focus on algorithmic analysis.
  • ✅ Different PRAM models (EREW, CREW, CRCW, ERCW) handle memory contention differently.
  • ✅ CREW PRAM allows concurrent reads while maintaining exclusive writes.
  • ✅ Parallel binary search demonstrates how multiple processors can speed up algorithms.
  • ✅ The PRAM model helps computer scientists design and analyze efficient parallel algorithms.

PRAM is a powerful theoretical tool that enables researchers and developers to understand and optimize parallel algorithms, making it essential for modern computing systems that rely on parallel processing.