# Calculating moving median in Postgres

## 0. The problem

Some time ago I’ve stumbled on this interesting question on stackoverflow (as stated in the title). I could not find any satisfactory answer at that time, so I came up with this solution. I think the problem is interesting enough to make it into a blog post.

The following table shows the expected results. The x column contains the original data whereas mdn_x contains the median computed from current up to 3 preceding rows.

## 1. Quick solution

Unfortunately ordered-set aggregate functions do not support windows, which would be the most intuitive approach. However, the window size in this example is fixed as 4 – it can be easily calculated using functions like lead and lag. A possible solution would look like this:

select x,
(lag(x, 2) over w + lag(x) over w) / 2. as mdn_x
from tmp t
window w as (rows between 3 preceding and current row)
order by 1;


It reads pretty easily – calculate the average of two previous values. It works for all rows except for the first three, because of the lag returning null. We can easily fix this by introducing a null-check:

select x,
case
when lag(x) over w is null then x
when lag(x, 2) over w is null then (x + lag(x) over w) / 2.
when lag(x, 3) over w is null then lag(x) over w
else (lag(x, 2) over w + lag(x) over w) / 2.
end
from tmp t
window w as (rows between 3 preceding and current row)
order by 1;


This is a bit more verbose. Suppose $x_i \leq x_j$ for $i < j$. The query can be read as:

1. If this is the first value – it’s also a median ( $median(x_1) = x_1$).
2. If this is the second value – the median is the average of both this and previous values ( $median(x_1, x_2) = \frac{x_1 + x_2}{2}$).
3. If it’s the third value – take the previous one ( $median(x_1, x_2, x_3) = x_2$).
4. Otherwise the moving median is the average of two previous values ( $median(x_{i - 3}, x_{i - 2}, x_{i - 1}, x_i) = \frac{x_{i - 2} + x_{i - 1}}{2}$).

We can easily see the pattern and writing moving median for any window size should be easy. For N it would be:

select x,
(lag(x, (N + 1) / 2) over w + lag(x, N / 2) over w) / 2. as mdn_x
from tmp t
window w as (rows between N preceding and current row)
order by 1;


Or, if we were to consider the corner case:

select x,
case
when lag(x) over w is null then x
when lag(x, 2) over w is null then (x + lag(x) over w) / 2.
-- other terms
when lag(x, N) over w is null then (lag(x, (N - 1) / 2) over w + lag(x, N / 2) over w) / 2.
else (lag(x, 2) over w + lag(x) over w) / 2.
end
from tmp t
window w as (rows between N preceding and current row)
order by 1;


## 2. General solution

The previous solution is flawed, because it needs to be rewritten for every query. Instead of this quick and dirty solution, we need a general one. We can achieve this by using a custom aggregator.

When moving along the window, we have to put each value in an ordered array. That is exactly what the following method does.

create function median_sfunc (
state integer[], data integer
) returns integer[] as
$$begin if state is null then return array[data]; -- if the array is null, return a singleton else return state || data; -- otherwise append to the existing array end if; end;$$ language plpgsql;


Once the array is populated, we need to take the middle value if the array’s length is odd and the average of two middle values otherwise. The implementation is quite straightforward.

create function median_ffunc (
state integer[]
) returns double precision as
$$begin return (state[(array_length(state, 1) + 1)/ 2] + state[(array_length(state, 1) + 2) / 2]) / 2.; end;$$ language plpgsql;


Now we can compose the aggregate using an empty array as the initial value:

create aggregate median (integer) (
sfunc     = median_sfunc,
stype     = integer[],
finalfunc = median_ffunc
);


Finally, we can use the median function in a beautiful, compact yet expressive way:

select x,
median(x) over w as mdn_x
from tmp t
window w as (order by x rows between 3 preceding and current row)


## 3. Conclusions

Two things can be noticed about the solution:

1. The median function can be parameterized in any way using the window clause.
2. The entire implementation can be further generalized for any input type by replacing integer[] with anyarray and integer with anyelement.

Other constraints and fine-tuning such as making the functions immutable was omitted for brevity.

The presented solution can be further improved by using moving-aggregate mode, but I’ll leave that for another post.

There is nothing scary in implementing own aggregations in Postgres. This way we can make our code more readable and faster.

## 3 thoughts on “Calculating moving median in Postgres”

1. Hairstyles says:

This is very fascinating, You are an overly skilled blogger. I have joined your rss feed and sit up for seeking more of your excellent post. Also, I’ve shared your website in my social networks!

Like

2. Anonymous says:

Nice solution. But are you not missing the sorting of the array in median_ffunc(). Without sorting, the median will not be returned.

Like

1. andronicust says:

Thank you, I hope it helped. You’re right that the function only picks the middle element, so you need to specify the ordering when you’re using the window function as I did with window w as (order by x rows between 3 preceding and current row)

Like