Prefix Sums
Computing range sum queries on a fixed array
To better understand why Prefix Sums is such a useful technique, one must understand what a ‘query’ is. In simple terms, a query in computer science is the act of requesting information from a database. For example, asking a search engine like Google or Yahoo! would be considered a query. Computers may have to process thousands of queries at a time, so despite how quickly modern computers can run commands, there is always room for us humans to improve asymptotic runtime for queries. Prefix Sums are an incredibly efficient way to answer a very simple kind of query: Range Sum Query (RSQ). In the case of an RSQ, the database you are requesting information from is an array of integers, size N, and the information requested is the sum of all array values of a certain subarray (a contiguous portion of the original array). To avoid confusion, let’s define each query as two numbers, i and j, where 0 ≤ i ≤ j<N. We wish to sum all the elements from i to j inclusive. More formally, if our array is called arr, we want RSQ(i,j) to return Σarr[k].
If we are only asked one query, this problem can be solved by brute force. One can just run a simple linear scan through all elements from i to j and keep track of the sum on that range.
Code:
arr = [1,3,10,2,15,4,9,12]
def rsq(i,j): 
  ans = 0
  for k in range(i,j+1):
    ans += arr[k]
  return ans

print(rsq(3,6))
Output:
> 30
Code:
public class Main {  static int[] arr = {1,3,10,2,15,4,9,12};

  public static int rsq(int i, int j) {
    int ans = 0;
    for(int k=i;k<=j;k++) {
      ans += arr[k];
    }
    return ans;
  }

  public static void main(String[] args) {
     System.out.println(rsq(3,6));
  }
}
Output:
> 30
Code:
#include <bits/stdc++.h>
using namespace std;

int arr [8] = {1,3,10,2,15,4,9,12};

int rsq(int i, int j) { 
  int ans = 0;
  for(int k=i;k<=j;k++) {
    ans += arr[k];
  }
  return ans;
}

int main() {
  cout << rsq(3,6) << "\n"; 
}
Output:
> 30
However, sometimes we have to process multiple queries efficiently. Unfortunately, the approach used above is quite slow with many queries. It would have a O(NQ) runtime, where N is the size of the array and Q is the number of queries. If Q was large, our current algorithm would be far too slow.
Let's revisit the idea of prefix sums. What is a prefix? Well, it’s a subarray of an array that begins at index 0. A prefix sum is just the sum of a given prefix of an array. This begs the question: how do prefix sums help in calculating RSQs quickly? The answer is clear after one observation: We can represent RSQ(i,j) as the difference of two prefix sums. Below is a visualizer to help you understand this more easily.
pre-img-1.jpg
As you can see, by taking the prefix sum of the first 7 elements and subtracting the prefix sum of the first 3 elements, we get the desired value for RSQ(3,6), which is 30. This leads us to a simple equation for calculating RSQ(i,j).
RSQ(i,j) = prefix[j+1]-prefix[i]
One important thing to note is that we have indexed our prefixes by 1, instead of 0. Think about it. If we had indexed our prefixes at 0, how would we calculate RSQ(0,2)? The prefix value at 0 would be 1, and the prefix value at 2 would be 14. Taking the difference we’d get 13, which is incorrect! This is because we’re accidentally subtracting the 1 from our summation; indexing the prefix sums by 1 allows us to successfully compute RSQ(0,2). As seen in the image, we would have
RSQ(0,2) = prefix[2+1] -prefix[0] = 14-0=14
Now we know that prefix sums can correctly compute RSQ(i,j), but are they actually faster than the naive approach? Technically, no, they aren’t. It still takes O(N) time to calculate these prefixes, so the runtime would still be O(NQ). To speed up this process, we must use a technique known as precomputation. The key observation is to note that there are only N different possible prefix sums, whereas there are (N*(N+1))/2 subarrays. This means that before we process any queries, we can calculate the prefix sums in O(N) time and store these values in an array. These prefix sums can be calculated in linear time by using the fact that prefix[i] = prefix[i-1]+arr[i-1] (Take a look at the Dynamic Programming Article for more information on such a technique). Then, for each query, we can compute RSQs in constant time, since we already know all the values of each prefix sum! This leaves us with a runtime of O(N+Q)! Feel free to play around with the code below:
You can play around with all the code we've used in this article on Replit:
Code:
n = 8
arr = [1,3,10,2,15,4,9,12]
prefix = [0]*(n+1)

def precompute():
    for i in range(1,n+1):
    prefix[i] = prefix[i-1]+arr[i-1]

def rsq(i,j): 
    return prefix[j+1]-prefix[i]

precompute()
print(rsq(3,6))
print(rsq(0,2))
print(rsq(4,4))
print(rsq(2,4))
print(rsq(0,7))             
Output:
> 30
> 14
> 15
> 27
> 56
Code:
public class Main {
    static int[] arr = {1,3,10,2,15,4,9,12};
    static int n = 8;
    static int[] prefix = new int[n+1];
    
    public static void precompute() {
        for(int i=1;i<=n;i++) {
        prefix[i] = prefix[i-1]+arr[i-1];
        }
    }
    
    public static int rsq(int i, int j) {
        return prefix[j+1]-prefix[i];
    }
    
    public static void main(String[] args) {
        precompute();
        System.out.println(rsq(3,6));
        System.out.println(rsq(0,2));
        System.out.println(rsq(4,4));
        System.out.println(rsq(2,4));
        System.out.println(rsq(0,7));
    }
    }
    
Output:
> 30
> 14
> 15
> 27
> 56
Code:
#include <bits/stdc++.h>

using namespace std;

const int n = 8;
int arr [8] = {1,3,10,2,15,4,9,12};
int prefix [n+1] = {};

void precompute() {
    for(int i=1;i<=n;i++) {
    prefix[i] = prefix[i-1]+arr[i-1];
    }
}

int rsq(int i, int j) { 
    return prefix[j+1]-prefix[i];
}

int main() {
    precompute();
    cout << rsq(3,6) << "\n";
    cout << rsq(0,2) << "\n";
    cout << rsq(4,4) << "\n";
    cout << rsq(2,4) << "\n";
    cout << rsq(0,7) << "\n";
}
                            
Output:
> 30
> 14
> 15
> 27
> 56
Join our discord server for updates, announcements, future event planning, to ask questions, or just for fun!
Copyright ©2021 Howard County Hour of Code