Hi Leif,
1) Sorry I spelled your name incorrectly - we don't have a lot of Norwegians around here.
2) Your example about the average height of a population is OK, but even then you need to keep in mind that it's only an approximation of a normal distribution. If it were a 100% true normal distribution, there would be some infinitesimal probability of those negative heights (which is, of course, impossible). The thing is, you can undoubtedly go many standard deviations away from the mean and not come close to zero, so you can EASILY get away with that normal distribution approximation, and be very accurate with it. Your "distance" distribution does NOT have that going for it, so it's on shaky footing to start with.
We had a similar discussion on here some years ago about the probability distribution of cM values for a given relationship level. I tried to point out that, while the author of a blog some people were familiar with had assumed the distribution would be a normal distribution, I had seen empirically what the distribution was in my own data set, and it was not even close to gaussian. (They wouldn't listen.)
I'm not saying that your distribution is "not even close", but I've also run into an entirely different technical setting where people insisted on using a gaussian even though the relevant properties of the distribution were simply nothing like those of a gaussian, aside from being big in the middle and tapering off at the ends.
It's just unwise to tether yourself to that description, if that's what you're doing.
3) If you add even just three random variables together, each independent, with identical uniform distributions, it starts to look somewhat gaussian. Adding 6 dice is adding together 6 such random variables, so yeah, it might do pretty well. I would not call it "perfect", especially since it is a discrete distribution, but it might match up pretty well. It also probably has the problem of not going out very many standard deviations from the mean - so the "tails" of the distribution would have to be inaccurate. The normal distribution would probably tell you you have a one in a million chance (or some such low number) of rolling a negative number. So the approximation is OK, as long as the application you're looking at doesn't care about what it says about that one in a million (or whatever) case.
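To put rough numbers on the dice example, here is a minimal sketch (not a definitive analysis) that builds the exact distribution of the sum of 6 fair dice by repeated convolution and compares it with a normal curve of the same mean and variance. The function names `dice_sum_pmf` and `normal_cdf` are made up for this illustration.

```python
import math

def dice_sum_pmf(n_dice, sides=6):
    """Exact pmf of the sum of n_dice fair dice, as {total: probability}."""
    pmf = {0: 1.0}
    for _ in range(n_dice):
        nxt = {}
        for total, p in pmf.items():
            for face in range(1, sides + 1):
                nxt[total + face] = nxt.get(total + face, 0.0) + p / sides
        pmf = nxt
    return pmf

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution, via the complementary error function."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2)))

pmf = dice_sum_pmf(6)
mean = sum(t * p for t, p in pmf.items())
var = sum((t - mean) ** 2 * p for t, p in pmf.items())
sd = math.sqrt(var)

# How far out (in standard deviations) the smallest possible roll sits.
min_roll_in_sd = (min(pmf) - mean) / sd

# What the normal approximation claims about an impossible outcome.
p_negative_normal = normal_cdf(0, mean, sd)

# Tail comparison: exact chance of rolling all ones vs. the normal estimate
# (using a continuity correction of 0.5).
p_min_exact = pmf[6]
p_min_normal = normal_cdf(6.5, mean, sd)

print(f"mean={mean:.2f} sd={sd:.3f} min roll at {min_roll_in_sd:.2f} sd")
print(f"normal P(sum < 0) = {p_negative_normal:.2e}")
print(f"P(sum = 6): exact {p_min_exact:.2e}, normal {p_min_normal:.2e}")
```

The mean comes out to 21 and the standard deviation to about 4.18, so the lowest possible roll (all ones) is only about 3.6 standard deviations below the mean, and the normal curve still assigns a small nonzero probability to a negative total.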
4) It sounds like you're trying hard to disagree with me, with your talk of a "small town area", but you're not saying anything that's any different. The "population" I'm talking about refers to one of several things, depending on the context. First is the 38,000 in your distribution. I also spoke of it in the context of other isolated populations that other people might be related to, resulting in an additional "hill" in their "100 circles" distribution. Finally, I'm referring to the 30M+ population of WikiTree, which results in the main "hill" on anybody's "100 circles" distribution.
You described your research as encompassing virtually all of the population of the area within a given timeframe. You say it's not about your own family, but you say you're in the database, and the people in it are heavily interrelated - and I didn't say anything about it being only about your family anyway. You say that there are 120,000 living there now - I would assume that most of them are NOT in your database, but that most of their ancestors ARE, so actually your database probably has about 1/4 of ALL the population. So your study DOES "cover a small subset of the entire population", but that's completely irrelevant to what I was saying anyway. I made no such assumption, nor was any such assumption implied.
The only relevance of your own family to the discussion is to explain the first few numbers in your distribution, which I describe as simply making your way to the main part of your population. Your "dist=1" count is 2, which apparently means it's just your parents. If you have a spouse, or siblings, or children, they are (hopefully) alive, but do not appear in the count because you haven't put many living people in your database. The "dist=2" count being 5 is likely your grandparents, plus some other relation. These first few numbers are clearly what they are because of who you have included in the database from your immediate family. That's not a criticism - that's just explaining why the first few numbers (which are all about your own immediate family) are kind of artificial, and just about getting us from you to the main part of the population. It's the numbers that come after that which are what the real discussion is about.
My own "degree 1" number is 3, but if all people who ever lived had a WikiTree profile, it would be 8. My "degree 2" number, which is 13, would be doubled. I just don't see much value to adding profiles that only I can see (normally), and which can cause problems too.
5) You say 'The connection count will taper off ... because there will be ever less connections to make as the coverage within the area is being saturated. There is no "remaining" population here.' Actually, the "remaining" population I'm referring to is what's left of the 38,000 after the rest of them have already been assigned to the various previous circles (or "distances"). "The coverage ... being saturated" that you referred to literally occurs when the "remaining" population I referred to is down to less than about half of the whole population (the "whole" being the 38,000).
It's all about how, when an individual within the database is assigned to a "distance", they are removed from consideration as far as being assigned to subsequent "distances". The people that have not been "removed" in that way are the ones that "remain" to still be considered for being assigned to the higher numbered "distances". I can't imagine what else you might have thought "remaining population" might mean in this context, but apparently you took it the wrong way.
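The bookkeeping described above can be sketched as a breadth-first walk over the relationship graph. This is only an illustration: the `graph` below is a tiny made-up family (not your data), and `distance_counts` is a hypothetical helper name.

```python
def distance_counts(graph, start):
    """Assign each person the first "distance" at which they are reached.

    Returns (counts, remaining_after), where counts[d] is the number of
    people assigned to distance d and remaining_after[d] is how many
    people are still unassigned once distance d has been filled.
    """
    remaining = set(graph) - {start}   # people not yet assigned a distance
    counts = {}
    remaining_after = {}
    frontier = [start]
    dist = 0
    while frontier and remaining:
        dist += 1
        newly_assigned = []
        for person in frontier:
            for neighbor in graph[person]:
                if neighbor in remaining:
                    # Once assigned, a person is removed from consideration.
                    remaining.discard(neighbor)
                    newly_assigned.append(neighbor)
        if not newly_assigned:
            break  # nobody left who is connected to the start person
        counts[dist] = len(newly_assigned)
        remaining_after[dist] = len(remaining)
        frontier = newly_assigned
    return counts, remaining_after

# Made-up example: two parents, four grandparents, and one uncle.
graph = {
    "me": ["mom", "dad"],
    "mom": ["me", "dad", "gm1", "gf1"],
    "dad": ["me", "mom", "gm2", "gf2", "uncle"],
    "gm1": ["mom", "gf1"],
    "gf1": ["mom", "gm1"],
    "gm2": ["dad", "gf2", "uncle"],
    "gf2": ["dad", "gm2", "uncle"],
    "uncle": ["dad", "gm2", "gf2"],
}

counts, remaining_after = distance_counts(graph, "me")
print(counts)           # {1: 2, 2: 5}
print(remaining_after)  # {1: 5, 2: 0}
```

Each pass can only draw from the people still in `remaining`, which is why the counts have to fall off once most of the population has already been assigned.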
6) As you can see, this distribution has exponential growth leading up to the top of the "hill", and exponential decay once you're past it. A normal distribution may look somewhat exponential in the tails, but is fairly linear leading up to (and after) the "hill". It doesn't really fit what we're seeing here, aside from the very crude "being big in the middle and dropping toward zero at the tails".
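One simple way to test an "exponential" description against any list of counts is to look at the ratio of each count to the one before it: a constant ratio means exponential growth or decay. The sketch below uses purely synthetic sequences (not the counts from your database) to show what the check looks like for an exponential run-up versus a normal-shaped run-up; `successive_ratios` is a made-up helper name, and the peak position and standard deviation are arbitrary choices.

```python
import math

def successive_ratios(counts):
    """Ratio of each count to the one before it."""
    return [b / a for a, b in zip(counts, counts[1:])]

# Synthetic exponential growth: each step is 3x the previous one.
exponential = [2 * 3 ** k for k in range(8)]

# Synthetic normal-shaped curve approaching a peak at step 8
# (standard deviation of 2 is an arbitrary choice for illustration).
normal_shaped = [math.exp(-((k - 8) ** 2) / (2 * 2 ** 2)) for k in range(8)]

exp_ratios = successive_ratios(exponential)
norm_ratios = successive_ratios(normal_shaped)

print("exponential:", [round(r, 2) for r in exp_ratios])
print("normal-shaped:", [round(r, 2) for r in norm_ratios])
```

The exponential sequence gives the same ratio at every step, while the normal-shaped one gives a ratio that shrinks steadily on the way up to the peak. Running the same check on real "distance" counts would show which pattern they are closer to.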