In fact this is a well-studied statistical problem, known since WW2 as the German Tank Problem. For these purposes you can get away with the simple unbiased estimator for the maximum value in an unknown sequence, N̂ ≈ m + m/k - 1, where m is the highest number seen in the sequence, and k is the number of samples. So e.g. if a given boxel has 5 systems in the database, and the highest number is 20, then you can estimate that there are about (20 + 20/5 - 1) = 23 systems in that boxel.
It's not perfect of course, since there's no way to estimate how many more stars may have numbering higher than those that are known. But I figure that will mostly average out, and I just need a ballpark estimate of how thoroughly explored each area is.
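For what it's worth, here is a minimal Python sketch of the estimator described above. The function name `estimate_total` and its arguments are made up for illustration, and it assumes system numbers within a boxel start at 1 and are all distinct. If the numbering actually starts at 0, the inputs would need shifting by one first.

```python
def estimate_total(max_seen: int, sample_count: int) -> float:
    """German Tank Problem estimate of the sequence size: m + m/k - 1.

    max_seen     -- m, the highest system number seen in the boxel
    sample_count -- k, how many distinct systems are known in the boxel

    Assumes numbering starts at 1 (an assumption, not confirmed by the data).
    """
    if sample_count < 1:
        raise ValueError("need at least one known system")
    if max_seen < sample_count:
        # With distinct 1-based numbers, the maximum can never be below k.
        raise ValueError("max_seen cannot be smaller than sample_count")
    return max_seen + max_seen / sample_count - 1


# The example from the post: 5 known systems, highest number 20.
print(estimate_total(20, 5))
```

Dividing the number of known systems by this estimate would then give a rough saturation fraction for the boxel (5/23 in the example).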
Of course the estimate is fairly meaningless if you have only one or two systems seen (the linked article talks about confidence intervals, which get quite large for very small k), but the nice thing about it being an unbiased estimate is that the errors really will wash out when averaged over a bunch of sparsely sampled boxels.
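To illustrate the "errors wash out" point, here is a small simulation sketch. It is a toy model rather than anything drawn from the real database: it assumes the known systems in each boxel are a uniform random sample, without replacement, of numbers 1..N, which real exploration data may not satisfy. The names `mean_estimate`, `true_total` and `trials` are hypothetical.

```python
import random


def estimate_total(max_seen: int, sample_count: int) -> float:
    """German Tank Problem estimate: m + m/k - 1 (1-based numbering assumed)."""
    return max_seen + max_seen / sample_count - 1


def mean_estimate(true_total: int, sample_count: int, trials: int, seed: int = 0) -> float:
    """Average the estimate over many simulated boxels of the same true size.

    Each trial draws `sample_count` distinct system numbers uniformly from
    1..true_total, which is the idealised sampling model behind the estimator.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        seen = rng.sample(range(1, true_total + 1), sample_count)
        total += estimate_total(max(seen), sample_count)
    return total / trials


# Individual estimates with k = 2 scatter widely, but their average over
# many boxels should land close to the true size of 50.
print(mean_estimate(true_total=50, sample_count=2, trials=20000))
```

Under that sampling model the single-boxel estimates are very noisy for k = 1 or 2, yet the average settles near the true value, which is what being unbiased buys you.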
Even so, I can already say that the saturation map you're producing feels generally correct. At least the places I've been that are red were places where it felt like there were very few untagged systems - e.g. when I was there after DW2 I was surprised at how thoroughly the far western Abyss was already explored.
Agreed, it's extra headache for no gain to even try the calculation in the Bubble. It's a royal jumble of hand-placed systems and sectors, and it's all fully tagged anyway. I'd probably not bother doing the computation within 500 ly of Sol, and then kick in a gradient that falls to zero at about 1000 ly. That would keep the vast majority of the weird boundary issues and non-procgen systems safely out of consideration.
Instead maybe I should assume exploration is 100% at Sol, and average in a gradient from there, fading out to zero effect at around 600 ly or so. Everything within 500 ly seems pretty reliably tagged these days.
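One way the gradient idea could look in code, as a rough sketch only: treat saturation as 100% near Sol and blend linearly toward the computed estimate with distance. The linear ramp, the function names and the default radii are all assumptions for illustration. The defaults follow the 500 ly / 1000 ly suggestion, while `inner_ly=0, outer_ly=600` would correspond to the other variant.

```python
def assumed_full_weight(distance_ly: float,
                        inner_ly: float = 500.0,
                        outer_ly: float = 1000.0) -> float:
    """Weight given to the 'fully explored' assumption at this distance from Sol.

    1.0 inside inner_ly, 0.0 beyond outer_ly, linear ramp in between.
    The linear shape is an assumption; any smooth falloff would do.
    """
    if outer_ly <= inner_ly:
        raise ValueError("outer_ly must be greater than inner_ly")
    t = (distance_ly - inner_ly) / (outer_ly - inner_ly)
    return 1.0 - min(1.0, max(0.0, t))


def blended_saturation(distance_ly: float,
                       estimated_saturation: float,
                       inner_ly: float = 500.0,
                       outer_ly: float = 1000.0) -> float:
    """Blend the assumed 100% saturation near Sol with the computed estimate."""
    w = assumed_full_weight(distance_ly, inner_ly, outer_ly)
    return w * 1.0 + (1.0 - w) * estimated_saturation


# Halfway through the ramp, the result sits midway between 100% and the estimate.
print(blended_saturation(750.0, 0.2))
# The 'fade out by about 600 ly from Sol' variant:
print(blended_saturation(300.0, 0.2, inner_ly=0.0, outer_ly=600.0))
```

Inside the inner radius the boxel estimate is ignored entirely, so the hand-placed systems and odd sector boundaries in the Bubble never feed into the map.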