Category Archives: Operations Management

Forecasting Attendance at Baseball Games

Here is a pdf of my most recent school project: Predicting Day-to-Day Variability in Baseball Attendance to Support Staffing

It details how I used 30 years of attendance data at MLB games to determine what is important in predicting attendance. Many businesses surrounding stadiums rely on accurate attendance forecasts in order to staff appropriately. If the effects of short-term factors (weather, recent performance) dominated, schedulers would do well to wait until the last minute to put out a staff schedule. However, it seems that long-term factors (date/time of game, opponent, performance in past seasons) dominate the attendance regression, giving schedulers the ability to put out schedules well in advance of gameday. A missing short-term regressor in my paper is the pitching matchup. If star starting pitchers really do bring more fans to the game, that might increase the importance of the short-term factors.

Theory Thursday- Conditional Probability

Now that we know the Three Axioms of Probability, we can understand conditional probability.

First, let’s think about a normal (unconditioned) event. What is the probability of rolling an 8 with 2 normal dice (equally likely outcomes from 1 to 6)? Well, we can roll a 2-6, 3-5, 4-4, 5-3, and 6-2 with the two dice to get a sum of 8. That’s 5 possible outcomes that sum to 8. There are 36 possible outcomes, so the probability is 5/36.

The conditional probability of an event is the probability that the event happens, given that another event has happened or will happen. So, for example, what is the probability that I roll an 8 with 2 dice, given that the first die is a 2? Well, I would need the second die to be a 6 for them to add to 8, and there are 6 options for the second die. So I have a conditional probability of 1/6 of rolling an 8, given that the first die was a 2. The conditional probability of rolling an 8 given the first die is a 2 (1/6) is higher than the unconditioned probability of rolling an 8 with 2 dice (5/36).

What is the conditional probability of rolling an 8 with 2 dice, given that the first die is a 1? Well, we would need the second die to be 7. But the die only has options from 1 to 6. So we cannot roll an 8 with 2 dice if one of the dice is a 1. The conditional probability is 0.

There is an easy way to calculate conditional probabilities. Let E be the event we want to happen, conditional on the event F happening. Let P(E|F) be the conditional probability of E given F. Then P(E|F)=\frac{P(EF)}{P(F)}, where P(EF) is the probability of both E and F happening.

In our example with the dice, E=roll an 8 with 2 dice and F=roll a 2 with the first die. There is one way to roll an 8 with two dice and roll a 2 with the first die: 2-6. There are 36 possible outcomes, so P(EF)=1/36. P(F)=1/6 because there is a 1 in 6 chance of rolling a 2 with the first die. P(E|F)=\frac{1/36}{1/6}=1/6, as we found above.
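Since two dice have only 36 outcomes, we can brute-force both probabilities in a few lines of Python and confirm the numbers above:

```python
from itertools import product

# All 36 equally likely outcomes for two dice.
outcomes = list(product(range(1, 7), repeat=2))

# Unconditioned: P(E) = P(sum is 8).
p_e = sum(1 for a, b in outcomes if a + b == 8) / len(outcomes)

# Conditional: P(E|F) = P(EF) / P(F), with F = "first die is a 2".
p_ef = sum(1 for a, b in outcomes if a + b == 8 and a == 2) / len(outcomes)
p_f = sum(1 for a, b in outcomes if a == 2) / len(outcomes)
p_e_given_f = p_ef / p_f

print(p_e)          # 5/36 ≈ 0.139
print(p_e_given_f)  # 1/6 ≈ 0.167
```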

Theory Tuesday- 3 Axioms of Probability

Everyone has an intuitive concept of chance/probability. When we say that there is a certain probability of an event happening, what do we mean?

First, you need to understand the concept of an “event”. An event is one possible outcome of some probabilistic scenario. Say you are flipping a coin. The two possible events are “heads” and “tails” (though I guess you could argue that there is a third event: “coin lands on its edge”). As you will see in Axiom #3, since a coin cannot land on both “heads” and “tails”, the two events are mutually exclusive.


There are three axioms upon which probability theory is built:
1. Let P(E) be the probability of an event. Then 0 \leq P(E) \leq 1. The probability of any event is between 0 and 1.
2. Let S be the sample space, the set of all possible outcomes. Anything that could possibly happen is contained in S. P(S) = 1. The probability that some outcome in S happens is 1.
3. Let E_1, E_2, …, E_n be a sequence of mutually exclusive events. Events are mutually exclusive if no two of them can happen at the same time. P(\cup_{i=1}^n E_i)=\sum_{i=1}^n P(E_i). This says that you can add the probabilities of two or more mutually exclusive events together to get the probability of any one of them happening.

With these three axiomatic building blocks, all of probability theory can be built.
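To make the axioms concrete, here is a small Python check using the sums of two fair dice (exact fractions, so there is no floating-point fuzz):

```python
from fractions import Fraction
from itertools import product

# Probability of each sum of two fair dice; the events "sum = s" are
# mutually exclusive, since a single roll has exactly one sum.
outcomes = list(product(range(1, 7), repeat=2))
p = {s: Fraction(sum(1 for a, b in outcomes if a + b == s), 36)
     for s in range(2, 13)}

# Axiom 1: every probability is between 0 and 1.
assert all(0 <= pr <= 1 for pr in p.values())

# Axiom 2: the probabilities over the whole sample space sum to 1.
assert sum(p.values()) == 1

# Axiom 3: P(sum is 7 or 11) = P(7) + P(11) for mutually exclusive events.
assert p[7] + p[11] == Fraction(8, 36)
```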

Book Review- Mathletics

Mathletics: How Gamblers, Managers, and Sports Enthusiasts Use Mathematics in Baseball, Basketball, and Football
By Wayne Winston, 2012 edition


I met Wayne in October 2013. I knew that I had one of his books about operations research, but didn’t know much about him. After meeting with him and doing a little research on my own, I’ve learned that he has written an infinite number of books and that he is very skilled at Excel and at sports analytics (among many other things). This book gives intuitive ways to model sports questions and perform analysis with Excel. Example questions include:
-Should we go for the one-point or two-point conversion?
-Should we foul at the end of a basketball game in which we are up 3 points?
-How do different baseball ballparks affect game outcomes?

I’m not a huge fan of analysis in Excel, but Wayne is a wizard at it. I prefer to do a lot of customized number crunching via a programming language (Python/R/Matlab), as opposed to massaging Excel to get what I want. However, this book gave me a lot of ideas for future analysis. It’s kind of a tough read, in that it has 51 chapters over 340 pages and starts/stops thoughts pretty quickly. There are also a lot of editing mistakes in my version, which can sometimes be confusing if you’re not paying attention. I read Mathletics over the span of about 8 months, starting and stopping frequently. I’m glad I finished it, though, because some of the best chapters are in the last part of the book.

Theory Tuesday- Newsvendor Setup

Let’s say you’re a newsvendor. You sell news(papers) or some other perishable good. You used to be a “newsboy”, but now we’re gender-sensitive, so you’re a “newsvendor”.


In the morning, you buy/build/accumulate an amount of product. You want to sell all of your product (newspapers/pastries/flowers/perishable widgets) by the end of the day. All the left-over product goes to waste. But you don’t want to run out of product too early, or you’ll miss out on potential customers who want to buy your product later in the day.

The balancing act between having too much product and having too little product makes this setup ideal for an optimization. Operations research can tell you how much of your product to have in inventory at the beginning of the day to maximize your profit.

Let’s say your product costs $c to buy/build/accumulate. And it sells for price $p to customers. Then your profit, if you stock q quantity and see demand D is: p*min(q,D)-c*q. You only sell the minimum of q, the inventory you have initially, and D, the number of customers that arrive throughout the day to buy your product.

Every lost sale (because you were out of inventory) costs you p-c dollars, because that would have been your additional profit if you had stocked one more unit of inventory. Every unsold unit of inventory costs you c dollars, the stocking cost.

The optimal order quantity, q, is F^{-1}((p-c)/p), where F() is the cumulative distribution function for your demand for the day. The easiest way to think about this is that you want to have excess inventory (p-c)/p proportion of the time and “stock out” (sell all of your stock) c/p proportion of the time. So if your product costs you $5 to stock and sells for $8, you want to stock out 5/8 of the time. If you consistently order that amount, you will maximize your profit over time. Sometimes you will stock out and sometimes you will have excess inventory, but you will be maximizing profit. (p-c)/p is called the “critical ratio”.
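Here’s a quick sketch of that calculation in Python using only the standard library; the demand distribution (normal with mean 100 and standard deviation 20) is just a made-up example:

```python
from statistics import NormalDist

# Hypothetical daily demand: normal with mean 100, standard deviation 20.
demand = NormalDist(mu=100, sigma=20)

c, p = 5.0, 8.0               # unit cost and selling price from the example
critical_ratio = (p - c) / p  # 3/8 = 0.375

# Optimal order quantity: q* = F^{-1}((p - c) / p).
q_star = demand.inv_cdf(critical_ratio)
print(round(q_star, 1))  # about 93.6
```

Because the critical ratio is below 1/2 (the margin is thin), the optimal order quantity lands below mean demand: you’d rather stock out often than eat the $5 cost of unsold papers.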

Think about this the next time you see the vast stock of magazines at a bookstore. Since they never seem to stock out, c/p must be pretty low. That is, the magazines are nearly pure profit for them.

Book Review- Queueing Methods for Services and Manufacturing

Queueing Methods for Services and Manufacturing
Randolph W. Hall, 1991


I am reading 4-6 books at any given moment. They range from easy fiction reads to dense technical books, and I read the book that matches my mood and attention span at the time. This summer, I added textbooks to my reading rotation. This is the first textbook I’ve read outside the bounds of a structured class.

This book provides a good overview of options available to diagnose and fix queueing issues. Whether queues are perpetual, predictable, or stochastic, the book discusses them at length. It’s written more for practitioners than academics, in that there are practical diagnostic tools and suggestions for improvement instead of theorems and proofs. There is a good discussion of Little’s Law and steady-state analysis, for those interested in introductory stochastic analysis of queueing systems. I think this is a good book for practitioners and perhaps analytical MBAs, but I don’t know if it dives deep enough into the theoretical backing to be extremely useful for academics. I am glad I read it to get a stronger queueing background, though, and I will reference some of its simpler results and its results on steady-state approximations.

Theory Tuesday- Statistics’ Place in Big Data

An interesting, but long, talk about statistics’ place in the Big Data world:

I’d suggest watching from about 10 minutes in to about 40 minutes.

“Statistics”, “data mining”, and “bioinformatics” are all on the decline according to Google Trends, while “Big Data” is booming. Many big data people don’t see the need for statisticians because of their seemingly antiquated/belligerent/unhelpful opinions on model validity, result confidence, and experiment design. However, people who ignore statistics are condemned to re-create statistics.

In my experience, the people who don’t see value in statistics are action-oriented and typically mathematically ignorant. These people want to do something, and they are not especially interested in how accurate their actions are. More responsible big data teams will be built from people with three skill sets: programming, math/statistics, and domain knowledge.

Theory Thursday- Simulation of a Poisson Process

You are using discrete-event simulation to analyze a process or system. Imagine that your arrivals occur according to a Poisson process. This post shows how to code up a Poisson arrival process without specialized simulation software. Y_n will represent the nth arrival time.

Stationary arrivals:
If your arrival rate does not vary over time, then your task is easy. You have two main options:
1. Generate exponential random variables X_1, X_2, …, with rate \lambda, representing interarrival times. Set Y_1 = X_1, Y_2 = Y_1+X_2, Y_3=Y_2+X_3, … Stop generating random variables when Y_m>T for some m, where T is the length of the time interval you want to simulate.
2. Generate a Poisson random variable N(T), representing the number of arrivals over a time interval of length T. Then generate N(T) uniform [0,T] random variables, representing arrival times. Sort the arrival times in ascending order to obtain Y_1, Y_2, …
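Both options can be sketched in Python with only the standard library. For option 2, the Poisson draw below uses Knuth’s multiplication method, which is fine for moderate values of rate*T:

```python
import math
import random

def arrivals_interarrival(rate, T, rng=random):
    """Option 1: accumulate exponential interarrival times until time T."""
    arrivals, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > T:
            return arrivals
        arrivals.append(t)

def poisson_rv(lam, rng=random):
    """Knuth's method for a Poisson(lam) draw; fine for moderate lam."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def arrivals_uniform(rate, T, rng=random):
    """Option 2: draw N(T) ~ Poisson(rate*T), then sort N(T) uniform times."""
    n = poisson_rv(rate * T, rng)
    return sorted(rng.uniform(0, T) for _ in range(n))
```

Both functions return the sorted arrival times Y_1, Y_2, … on [0, T], and the two constructions produce the same process in distribution.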

Non-stationary arrivals:
If your arrival rate varies over time, you’ll need an extra step or two of logic to generate your arrival times. Here are two options for simulation:
1. Find the highest arrival rate in your time interval, \lambda_{max}. Generate a stationary Poisson process with rate \lambda_{max} as in option 1 in the stationary arrival section above. For each arrival simulated, generate a uniform[0,1] random variable. If this uniform random variable is less than \lambda(t)/\lambda_{max}, where \lambda(t) is the arrival rate at the generated arrival time, accept the arrival as “real” and keep it. If not, discard the arrival. The “real” arrivals make up an accurate non-stationary Poisson arrival process.
2. Divide the time period [0,T] into small time increments. For each time increment, generate a uniform[0,1] random variable. For each time increment, if the uniform variable is less than or equal to \lambda(t) dt, where dt is the time increment size, then an arrival occurs during that increment. Assign the arrival time to be a random time in the increment. This method is only approximately correct, but it is good enough in most cases and may be faster to simulate in certain cases.
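The thinning approach (option 1 above) is short enough to sketch directly; the sinusoidal rate function here is just an illustrative assumption:

```python
import math
import random

def thinned_arrivals(rate_fn, rate_max, T, rng=random):
    """Simulate a non-stationary Poisson process by thinning.

    Generate candidate arrivals at the constant rate rate_max, then keep
    each candidate at time t with probability rate_fn(t) / rate_max.
    """
    arrivals, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)
        if t > T:
            return arrivals
        if rng.random() < rate_fn(t) / rate_max:
            arrivals.append(t)

# Example: an arrival rate oscillating between 2 and 8 over the period.
rate = lambda t: 5 + 3 * math.sin(t)
arrivals = thinned_arrivals(rate, rate_max=8.0, T=2 * math.pi)
```

Note that rate_max must be at least the maximum of rate_fn on [0, T], or the acceptance probability can exceed 1 and the simulated process will be wrong.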

There are other options, but these will get you started in simulating your Poisson processes.

Sacramento Kings Crowdsource Draft Analytics

Painfully interesting mini-movie from Grantland about the Kings’ GM crowdsourcing some of their draft analytics to get fresh ideas.

https://www.youtube.com/watch?v=OuwvsZOvcms

It’s a deep draft. But with my minimal analysis, I don’t see a huge difference in player potential in the ~4 to ~20 range. The Kings have the 8th pick, right in the middle of the “really good, but not necessarily great” range. If they can identify some value picks toward the end of the lottery or in the late first round, I think it makes sense to move down from the 8th spot if they can get 2 first-round picks in return. But I doubt anyone will offer that. Looking forward to seeing what happens in the draft.