# The Saga of Overbooked Flights

by Kripa Jalan

Have you ever arrived at the check-in counter, excited to go somewhere, only to find that the flight is overbooked? Have you wondered why airlines overbook, whether they sometimes lose money by doing so, and how they still come out ahead? Which models or mechanisms do they use to estimate how much to overbook? Let me explain why and how airlines overbook and how they benefit from it.

The sole reason airlines overbook is profit maximization. From past data on flyers and on the particular flight, the airline estimates the probability that a ticket holder actually catches that flight, and therefore roughly how many will not. With a rough estimate of the number of no-shows, it can sell more tickets and generate more revenue instead of losing the opportunity. Sometimes passengers benefit from overbooked flights too: tickets may be available at lower rates, and when a flight is full, passengers are sometimes offered generous compensation to take another flight, which some willingly accept.

Airlines calculate the probability of flyers showing up for a particular flight from data they have already collected. They look at each passenger's likelihood of taking the flight based on their previous flight record, at the number of flyers who actually showed up for the same flight in the past, and at the overall share of passengers who make it to their flights. Using these, they estimate the show-up probability for a particular flight and sell tickets accordingly. How accurate the estimate turns out to be depends on the number of seats and on the probability itself (and of course on the quality of the data). They use various statistical tools and models, such as the central limit theorem, the Empirical Rule (standard deviation) and regression to the mean. Let me explain the main ones below.

The central limit theorem states that when independent random variables are added, their sum tends towards a Normal distribution, which is basically a bell curve, even if the variables themselves are not normally distributed. As the number of variables being added becomes larger, the distribution of the sum gets closer to the normal distribution (bell curve), with most observations falling close to the mean. The variables are independent of each other, meaning that the outcome of one does not depend on another. The mean is at the centre (highest point) of the bell curve, and 50% of the data lies on either side of it. The area under the curve is always equal to 1, and the sample size (N) should be large, otherwise the distribution will not be close to normal.
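To see the theorem at work, here is a minimal simulation sketch. It assumes 189 ticket holders who each show up independently with probability 0.85 (the figures used in the example that follows); the function name `simulate_show_ups`, the number of simulated flights and the random seed are illustrative choices, not anything an airline is known to use.

```python
import random
import statistics

def simulate_show_ups(tickets, p_show, flights, seed=0):
    """Simulate the number of passengers who show up on each of `flights` flights."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = []
    for _ in range(flights):
        # Each ticket holder independently shows up with probability p_show.
        shows = sum(1 for _ in range(tickets) if rng.random() < p_show)
        counts.append(shows)
    return counts

counts = simulate_show_ups(tickets=189, p_show=0.85, flights=5000)
mean = statistics.mean(counts)
sd = statistics.pstdev(counts)
# Share of simulated flights whose count lies within one standard deviation of the mean.
within_one_sd = sum(1 for c in counts if abs(c - mean) <= sd) / len(counts)
print(f"mean={mean:.2f}, sd={sd:.2f}, within 1 sd={within_one_sd:.1%}")
```

Each individual passenger is a simple yes/no variable, nothing like a bell curve, yet a histogram of `counts` should come out roughly bell-shaped around the mean.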

Now let's come back to flights and overbooking with an example.

Let us take the case of the Boeing 737-800, since that was the most widely used aircraft in the US domestic market (in 2018). Suppose that, after estimation, the probability of a passenger showing up for the flight is 0.85, or 85% (in the US domestic market). A Boeing 737-800 seats a maximum of 189 passengers, so suppose the airline sells exactly 189 tickets. The mean number of passengers who show up is given by p × N, where p is the show-up probability and N is the number of tickets sold. In our example p = 0.85 and N = 189. Hence the mean (represented by m) is 0.85 × 189 = 160.65 (≈ 161). Since 160.65 is the mean, the highest point of the bell curve sits at 160.65 (≈ 161). The central limit theorem is what lets the airline use the normal distribution curve here.

Now let's bring in the concept of standard deviation. What exactly is standard deviation? It is a measure of the amount of variation, or dispersion, of a set of values. “Dispersion” tells you how much your data is spread out; specifically, the standard deviation shows how much your data is spread out around the mean or average. Variance is the square of the standard deviation.

For a count like ours, where each of N ticket holders shows up with probability p, the standard deviation is $\sqrt{Npq}$, where $q = 1 - p$. This is the square root of the variance $Npq$, that is, $\sigma = \sqrt{p \times (1 - p) \times N}$. The symbol for standard deviation is σ.

So in our example the standard deviation is $\sqrt{0.85 \times 0.15 \times 189} = \sqrt{24.0975} \approx 4.909$ (≈ 4.9). Taking the mean as ≈ 161, if we deviate from it by ±1σ (on either side of the mean), that is ±4.9, we get 156.1 and 165.9. If we deviate by ±2σ we get 151.2 and 170.8, and by ±3σ, 146.3 and 175.7. Statistically, there is a 68.3% chance of getting a value between 156.1 and 165.9, which means there is a 68.3% chance that between 156 and 166 passengers will show up for the flight. When we deviate from the mean by ±2σ, there is a 95.4% chance that the value lies between 151.2 and 170.8, and hence a 95.4% chance that between 151 and 171 passengers show up. And if we deviate by ±3σ, there is a 99.7% chance that the value lies between 146.3 and 175.7, that is, that between 146 and 176 passengers show up. Hence it is almost certain that this flight will not be full: even at the upper end of that range, 176 passengers, there would still be 13 of the 189 seats left.
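The arithmetic above can be reproduced with a short sketch. The helper name `show_up_stats` is an assumption made for illustration. The code works from the unrounded mean of 160.65, so its bounds differ by a few tenths from the figures in the text, which start from the rounded mean of 161.

```python
import math

def show_up_stats(tickets, p_show):
    """Return the mean and standard deviation of the number of passengers who show up."""
    mean = tickets * p_show                           # N * p
    sd = math.sqrt(tickets * p_show * (1 - p_show))   # sqrt(N * p * q)
    return mean, sd

mean, sd = show_up_stats(189, 0.85)
print(f"mean = {mean:.2f}, sd = {sd:.3f}")
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"±{k}σ: {low:.1f} to {high:.1f} passengers")
```

Changing the two arguments shows how the range shifts for a different aircraft or a different show-up probability.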

The distribution in the above example is a binomial distribution, with the probability of success (a passenger showing up) denoted by $p$ and the probability of failure denoted by $q$, or simply $(1 - p)$, because each passenger either shows up or does not, so the two probabilities sum to 1. $N$ is the number of tickets sold, the mean is given by $Np$, the variance by $Npq$ and the standard deviation by $\sqrt{Npq}$.

Since N is large and we assume that passengers are independent (one person's decision does not affect another's), we can treat each passenger as an individual random variable and apply the central limit theorem, which tells us that the distribution of the total tends to normal.

The percentages have been calculated using the Empirical Rule, also known as the 68-95-99.7 rule. Each percentage is the area under the curve covered when we deviate from the mean by the respective number of standard deviations. For example, when we deviate by ±1σ, the area under the curve between -1σ and +1σ is about 68.3% of the total, and so on.
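These areas can be checked directly. For a normal distribution, the probability of landing within k standard deviations of the mean equals erf(k / √2), where erf is the error function available in Python's `math` module. The helper name `prob_within` is just an illustrative choice.

```python
import math

def prob_within(k):
    """Probability that a normally distributed value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k}σ: {prob_within(k):.2%}")
```

This gives roughly 68.27%, 95.45% and 99.73%, which is where the name 68-95-99.7 comes from.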

Now, in the example above, wouldn’t it make sense for the airline to sell more tickets than the actual number of seats?

In our example there is about a 99.7% chance that between 146 and 176 passengers show up, so nobody would need to be bumped to another flight and seats would fly empty; by not overbooking, the airline would lose an opportunity to make more revenue. If its estimate of the probability is correct, the airline can make a lot more revenue; it just needs accurate data. How many extra tickets it can sell depends on the probability it has estimated and of course on the aircraft model. The next time you come across an overbooked flight, you will know why it happens and the model and mechanism behind it.
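To make that leeway concrete, here is a rough sketch of how one might search for the largest number of tickets to sell. It uses the exact binomial probabilities rather than the bell-curve approximation, and it assumes the airline is willing to accept a 5% chance that more passengers show up than there are seats. That 5% threshold and the function names `prob_too_many` and `max_tickets` are assumptions made purely for illustration, not figures or methods taken from any airline.

```python
from math import comb

def prob_too_many(tickets, seats, p_show):
    """Exact binomial probability that more than `seats` ticket holders show up."""
    return sum(
        comb(tickets, k) * p_show**k * (1 - p_show) ** (tickets - k)
        for k in range(seats + 1, tickets + 1)
    )

def max_tickets(seats, p_show, risk):
    """Largest ticket count whose chance of too many show-ups stays at or below `risk`."""
    tickets = seats
    # Keep adding tickets while one more would still keep the risk acceptable.
    while prob_too_many(tickets + 1, seats, p_show) <= risk:
        tickets += 1
    return tickets

# Assumed inputs: 189 seats, 85% show-up probability, 5% accepted risk (hypothetical).
limit = max_tickets(seats=189, p_show=0.85, risk=0.05)
print(f"tickets to sell: {limit} ({limit - 189} more than seats)")
```

A lower accepted risk or a higher show-up probability shrinks the number of extra tickets, which is why the quality of the airline's estimate matters so much.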
