# Histogram based water volume calculation Algorithm

Here is the problem I was given to solve in an interview long time ago. During the interview – I had taken a hasty approach to solving the problem and used brute force.

However, later on I gave another good thought to the problem and found an elegant the solution using Recursive Divide and Conquer approach.

Assume you have a 3 dimensional histogram where the width of each bar is 1 unit and the depth is 1 unit too. However what varies is the height of each successive bar – depending on the data content.

If we pour some liquid into the structure what will be the volume of that liquid that can accumulate in this structure.

See the attached picture for more clarity on what is needed. ( Calculate the Area and hence Volume of the Red Area )

Here is my approach to solving this

So the input data set is a series of +ve numbers which represent the height of each bar. This is any random finite stream of +ve numbers – each of which represents the height of the bar.

Take this set of numbers and find the first 2 largest numbers.

These 2 numbers and the indexes at which they are located in the dataset represent the boundaries of 1 segment.

These 2 ends have now segmented the whole data set into 3 regions –

Example – if the index of these 2 largest elements in the whole entire data set is 12 and 20 then we divide the whole data set into 3 regions now (assuming the data set has 25 numbers)

Region 1 – (0 to 12) – inclusive

Region 2 – (12 to 20) – inclusive

Region 3 – (20 to 25) – inclusive

* Interesting thing to observe is that Region1 and Region3 include the 2 endpoints of the middle region.

Now in Region 2 – which we found just now, we apply the following steps

Here in this Region we have the 2 end points being the 1st and 2nd largest numbers in the data set.

Find the minimum of the 2 end points and the volume of this region is

Volume of Region 2 = min (numbers at the 2 endpoints)*(abs(index of endpoint 2 – index of endpoint 1)+1) –

sum( all of the bars between the 2 endpoints )

Now apply the same steps to Region1 and Region 3: So we have divided the original data set into basically 3 regions and apply the algorithm to each region recursively

Region1 now becomes the data set, similarly Region 3 now becomes the data set.

The runtime of the algorithm is log(n) base 3

= ln(n)/ln(3) ( base e )

* Attached is the python based recursive program – which computes the volume of liquid in this structure.

Here is the Python code for the above recursive algorithm

( this is not exhaustively tested program – however, I wanted to make sure my algorithm is correct on most use cases ).

This is an interesting problem where the program can be tested with a lot of very unique edge cases

**Command Line**

python HistogramWaterContent.py 5 1 2 1 8

python HistogramWaterContent.py 6 1 8 1 2 1 5

and so on – where the numbers represent the heights of the bars

**Code**

https://github.com/palsumitpal/sumitpython/blob/master/HistogramWaterContent.py

Isnt the complexity of program given by:

T (n) = 2T(n/3) + O(n)

where O(n) is for finding the max1 and max2 in a region input.

So, do you want to say T(n) is O(n) (using master theorem)

Yes you are right about the complexity

You can’t say that because the distance between the largest number and the second largest may not necessarily be n/3.

Dude, your algorithm is O(n^2) in the worse case, the reason is the following:

Imagine the input consist of a sorted array: i.e. [1,2,3,4,5,6…]. Then your algorithm will make a linear search for finding the 2 largest elements (which are at the end), then it will discard them and do a recursive search on the n-2 remaining elements and so on, finally the amount of checks on the array will be:

n + (n-2) + (n-4) + … = (n^2)/2 – 2 – 4 – 6… = …(some arithmetic)… = (n^2)/4 – n/2 = O(n^2)

This problem can easily be solved in two linear scans of the data, I don’t feel like explaining the details (I’ll leave them to you) but I’ll tell you the key observations:

Suppose you are at the i-th bar, then assume that the j-th bar is the first bar at your right (i<j) which height is strictly greater or equal to yours (i.e. there is not a k such that i<k<j and height[i]<=height[k], and height[i]<=height[j]). There will be water trapped between both bars so then you can use the formula you pointed out for calculating the volume of water to be held between the i-th and j-th bars. Then you jump to the j-th bar and continue the procedure (yes, jump to the j-th bar, not to the i+1 bar).

This approach works well until you reach the tallest bar on the histogram (so, there will be no taller bar at your right). Once that happens, then pick the j-th bar as the tallest bar on the histogram to the left of the i-th bar (i.e. i<j and the j-th bar is the tallest from i+1 to n). Then use the formula again to calculate the amount of water to be held between both bars, and jump to the j-th bar.

For assuring the O(n) time you will have to compute for each bar the index of the first taller or equal bar to its right, as well as the tallest bar from it until the end of the histogram. Also you will need the cumulative sums of the heights of the bars for being able to calculate the volume of the water that gets trapped between two bars in O(1) time. All of this can be calculated with two linear scans on the data, which can be mixed with the final scan (to make it two passes and not three) and yield finally the O(n) time.

Hope it was useful!

Cheers!

PS: it's not my intention to make a broader explanation of the solution, I think that with these key insights is more than enough to solve it; I believe that just popping out the exact answer kinds of kills the fun.

Hi

Thanks for your thoughtful comments and observations.

Here are some of my thoughts on this

For the section below what you have written

”

Once that happens, then pick the j-th bar as the tallest bar on the histogram to the left of the i-th bar (i.e. i<j and the j-th bar is the tallest from i+1 to n). Then use the formula again to calculate the amount of water to be held between both bars, and jump to the j-th bar.

"

When we are going left of the i(th) bar – you mean j

ith bar sizeWhich means the water that can be accumulated between xth and yth bar is NOT EQ to sum { ( water between ith and xth ), ( water between ith and yth )}

The actual water is min(height(xth), height(yth)) * { Distance between (xth,yth)}