Take the sports knowledge test given to wannabe ESPN employees. Given that you can look up answers on the internet, the box score question seemed like the only hard one. I spent about 2 minutes on the NBA box score and found 2 errors.
Monthly Archives: July 2014
Theory Tuesday- Newsvendor Setup
Let’s say you’re a newsvendor. You sell news(papers) or some other perishable good. You used to be a “newsboy”, but now we’re gender-sensitive, so you’re a “newsvendor”.
In the morning, you buy/build/accumulate an amount of product. You want to sell all of your product (newspapers/pastries/flowers/perishable widgets) by the end of the day. All the left-over product goes to waste. But you don’t want to run out of product too early, or you’ll miss out on potential customers who want to buy your product later in the day.
The balancing act between having too much product and having too little product makes this setup ideal for an optimization. Operations research can tell you how much of your product to have in inventory at the beginning of the day to maximize your profit.
Let’s say your product costs $c to buy/build/accumulate. And it sells for price $p to customers. Then your profit, if you stock q quantity and see demand D is: p*min(q,D)-c*q. You only sell the minimum of q, the inventory you have initially, and D, the number of customers that arrive throughout the day to buy your product.
Every lost sale (because you were out of inventory) costs you p-c dollars, because that would have been your additional profit if you had stocked another unit of inventory. Every unsold unit of inventory costs you c dollars, the stocking cost.
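Here is a minimal sketch of that profit formula in Python. The function name and the example quantities are my own illustration; the $5 cost and $8 price are just sample numbers.

```python
def newsvendor_profit(q, demand, price, cost):
    """Profit for stocking q units when `demand` customers show up."""
    units_sold = min(q, demand)          # you can't sell more than you stocked
    return price * units_sold - cost * q

# Sample numbers: cost $5, price $8, stock 10 units.
print(newsvendor_profit(q=10, demand=12, price=8, cost=5))  # 30 (2 lost sales)
print(newsvendor_profit(q=10, demand=7, price=8, cost=5))   # 6 (3 unsold units)
```

With demand of 12, stocking 12 instead of 10 would have earned 36, so the 2 lost sales cost 2*(p-c) = 6. With demand of 7, stocking 7 would have earned 21, so the 3 unsold units cost 3*c = 15.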
The optimal order quantity is q = F^{-1}((p-c)/p), where F is the cumulative distribution function of your demand for the day. That is, you pick the q for which the probability that demand is at most q equals (p-c)/p. The easiest way to think about this is that you want to have excess inventory (p-c)/p proportion of the time and “stock out” (sell all of your stock) c/p proportion of the time. So if your product costs you $5 to stock and sells for $8, you want to stock out 5/8 of the time. If you consistently order that amount, you will maximize your profit over time. Sometimes you will stock out and sometimes you will have excess inventory, but you will be maximizing profit. (p-c)/p is called the “critical ratio”.
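If you don't have a formula for your demand distribution, one rough way to apply the critical ratio is to use past demand as an empirical distribution and stock the smallest quantity that covers at least (p-c)/p of the observed days. The sketch below assumes exactly that; the demand history is made up for illustration.

```python
import math

def critical_ratio(price, cost):
    return (price - cost) / price

def optimal_order_quantity(demand_history, price, cost):
    """Smallest observed demand q whose empirical CDF reaches the critical ratio."""
    ratio = critical_ratio(price, cost)
    ordered = sorted(demand_history)
    # Position of the first observation with at least `ratio` of the sample at or below it.
    index = max(math.ceil(ratio * len(ordered)) - 1, 0)
    return ordered[index]

history = [40, 45, 50, 55, 60, 65, 70, 75]    # hypothetical daily demand
print(critical_ratio(8, 5))                    # 0.375
print(optimal_order_quantity(history, 8, 5))   # 50
```

In this made-up history, demand exceeds 50 on 5 of the 8 days, which matches the "stock out 5/8 of the time" rule for a $5 cost and $8 price.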
Think about this the next time you see the vast stock of magazines at a bookstore. Since they never seem to stock out, c/p must be pretty low. i.e. the magazines are pure profit for them.
Code Monkey Monday- Keeping Excel from Auto-updating
Quick tip: When you use the RAND function in Excel to generate a random number, the number recalculates every time you change something in the workbook. To stop the auto-updating, copy the random number(s), and then re-paste them into the same cell(s) using Paste Special -> Values. This will save the random value and get rid of the auto-updating random function.
Book Review- Queueing Methods for Services and Manufacturing
Queueing Methods for Services and Manufacturing
Randolph W. Hall, 1991
I am reading 4-6 books at any given moment. They range from easy fiction reads to dense technical books, and I read the book that matches my mood and attention span at the time. This summer, I added textbooks to my reading rotation. This is the first textbook I’ve read outside the bounds of a structured class.
This book provides a good overview of options available to diagnose and fix queueing issues. Whether queues are perpetual, predictable, or stochastic, the book discusses them at length. It’s written more for practitioners than academics, in that there are practical diagnostic tools and suggestions for improvement instead of theorems and proofs. There is a good discussion of Little’s Law and steady-state analysis, for those interested in introductory stochastic analysis of queueing systems. I think this is a good book for practitioners and perhaps analytical MBAs, but I don’t know if it dives deep enough into the theoretical backing to be extremely useful for academics. I am glad I read it to get a stronger queueing background, though, and I will reference some of its simpler results and its results on steady-state approximations.
Sports Betting Basics
Betting the spread: If the Bengals are favored to win by 7, they are listed as -7 on the betting sheet. That means, if you bet on the Bengals to cover the spread, they have to win by more than 7 for you to win the bet. If they win by exactly 7, the bet pushes and you get your money back. If they lose or win by less than 7, you lose the bet. Similarly, if you bet on a team that is +3.5, the team you bet on either needs to win or lose by 3 or less for you to win the bet.
Most bookmakers give 11-10 odds. So you bet $11. If you win, you get $21 back (your $11 plus $10). If you push, you get your $11 back. If you lose, you get nothing.
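To make the win/push/lose rules concrete, here is a small Python sketch that settles a spread bet at 11-10 odds. The function and argument names are my own, not anything a sportsbook publishes.

```python
def settle_spread_bet(team_margin, spread, stake=11.0):
    """Amount handed back to you on a spread bet at 11-10 odds.

    team_margin: your team's points minus the opponent's points.
    spread: the listed line for your team, e.g. -7 for a 7-point favorite.
    """
    adjusted = team_margin + spread
    if adjusted > 0:
        return stake + stake * 10 / 11   # win: stake back plus winnings
    if adjusted == 0:
        return stake                     # push: stake returned
    return 0.0                           # loss

print(settle_spread_bet(10, -7))    # 21.0 (Bengals -7 win by 10)
print(settle_spread_bet(7, -7))     # 11.0 (push)
print(settle_spread_bet(-3, 3.5))   # 21.0 (a +3.5 team loses by 3)
```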
Over/Under Line: You can also bet on the total number of points that will be scored in a game. If the Over/Under line is 51.5, you can bet the “over” and win if 52 or more points are scored in the game. Odds given are still 11-10.
Money Line: Suppose you just want to pick the winner of a game, and you don’t want to worry about the spread. You can bet on the money line. The favorite in the money line is listed with a negative value, for example, -130. In this case, you bet $130 to win $100. If the favorite wins, you get $230 back (your $130 plus $100). If the favorite loses, you are out $130. The underdog is listed with a positive number, e.g. +200. In this case, if you bet on the underdog, you are betting $100 to win $200 if the underdog wins.
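The money line arithmetic is easy to get backwards, so here is a short Python sketch of it. The break-even probability is my own addition: it is just the win rate at which the bet neither makes nor loses money, computed from the same numbers.

```python
def moneyline_profit(line, stake):
    """Profit on a winning money line bet."""
    if line < 0:
        return stake * 100 / abs(line)   # favorite: risk |line| to win 100
    return stake * line / 100            # underdog: risk 100 to win line

def break_even_probability(line):
    """Win probability at which the bet has zero expected profit."""
    if line < 0:
        return abs(line) / (abs(line) + 100)
    return 100 / (line + 100)

print(moneyline_profit(-130, 130))   # 100.0
print(moneyline_profit(200, 100))    # 200.0
print(break_even_probability(-130))  # about 0.565
print(break_even_probability(200))   # about 0.333
```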
Baseball: Because the outcome of a baseball game depends heavily on the starting pitchers, the starting pitchers are listed with the bet. If either of those pitchers does not end up starting the game, your bet is a push and your money is returned.
Parlay: A parlay is a selection of two or more bets, all of which must win for the parlay to pay off. In horse racing, you will often see this as the “Daily Double” or something similar. If your parlayed bets are separate in time (one after the other), you typically get better odds by making the first bet and then re-betting all of your winnings on the second bet, as opposed to wrapping them in a parlay.
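To compare a quoted parlay payout against re-betting, it helps to know what the re-betting route returns. This sketch computes that for bets at 11-10 odds; the function name is hypothetical, and you would compare the result to whatever parlay odds your book actually offers.

```python
def rollover_return(stake, legs):
    """Total returned if every leg wins and everything is re-bet on the next leg.

    legs: list of (win_amount, risk_amount) pairs, e.g. (10, 11) for 11-10 odds.
    """
    total = stake
    for win_amount, risk_amount in legs:
        total += total * win_amount / risk_amount
    return total

total = rollover_return(11, [(10, 11), (10, 11)])
print(round(total, 2))               # 40.09
print(round((total - 11) / 11, 3))   # 2.645
```

So two straight 11-10 bets rolled over pay about 2.645 to 1. A two-game parlay quoted below that is the worse deal, provided the games are far enough apart in time for you to re-bet.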
Teasers: Teasers involve you betting against a modified spread on two or more games. For example, suppose the Colts are -8 favorites and the Bengals are -3 favorites. A 7 point teaser involving the Colts and Bengals would have you bet on the Colts at -1 (-8+7) and the Bengals at +4 (-3+7). So you are given 7 points on top of the spread for each game, but you have to win against BOTH modified spreads to win the bet. If either game pushes on the modified spread, the teaser bet pushes.
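Here is the teaser logic as a rough Python sketch. One assumption of mine that the rules above don't spell out: if one game loses against the modified spread, I treat the whole teaser as a loss even if another game pushes.

```python
def teaser_result(picks, teaser_points):
    """picks: list of (team_margin, spread) pairs. Returns 'win', 'push', or 'lose'."""
    adjusted = [margin + spread + teaser_points for margin, spread in picks]
    if any(a < 0 for a in adjusted):
        return "lose"   # assumption: any loss sinks the teaser
    if any(a == 0 for a in adjusted):
        return "push"
    return "win"

# Colts -8 and Bengals -3 with a 7 point teaser become Colts -1 and Bengals +4.
print(teaser_result([(3, -8), (-2, -3)], 7))   # win: Colts win by 3, Bengals lose by 2
print(teaser_result([(1, -8), (-2, -3)], 7))   # push: Colts win by exactly 1
print(teaser_result([(3, -8), (-5, -3)], 7))   # lose: Bengals lose by 5
```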
How to bulk download Batter vs. Pitcher Data
I worked on a project in a Complex Systems class where I wanted to know if there was any value in looking at the network of at-bats in baseball. To create this network, I assumed that if a batter got on base via hit, HBP, or walk, the batter won the at-bat and I drew a link from the pitcher to the batter. If the pitcher got the batter out, I said the pitcher won the at-bat and drew a link from the batter to the pitcher in the network. This created a directed graph that I could run networked statistics on, such as PageRank. I wanted to know how well the rank of a player in summary statistics (ERA, AVG, OBP, WAR, etc) matched up with the rank of the player in PageRank. PageRank, in this context, puts value upon beating other players with high PageRank. So, when playing the Dodgers, getting a hit off of Clayton Kershaw last year was worth more than getting a hit off of Chris Capuano. Did good players overperform or underperform against other good players? Does this have value for predicting playoff success? These were some of my questions as I started my study.
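To give a feel for the network, here is a minimal sketch of building the loser-to-winner graph and running a basic PageRank on it. This is not my project code: it is a plain power-iteration version with a conventional 0.85 damping factor, repeated matchups counted as edge weights, and placeholder player names.

```python
from collections import defaultdict

def build_graph(at_bats):
    """at_bats: iterable of (batter, pitcher, batter_reached_base).
    Returns {loser: {winner: number_of_at_bats}}."""
    graph = defaultdict(lambda: defaultdict(int))
    for batter, pitcher, reached in at_bats:
        loser, winner = (pitcher, batter) if reached else (batter, pitcher)
        graph[loser][winner] += 1
    return graph

def pagerank(graph, damping=0.85, iterations=100):
    nodes = set(graph)
    for winners in graph.values():
        nodes.update(winners)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by players who never lost (no outgoing links) is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for loser, winners in graph.items():
            total = sum(winners.values())
            for winner, count in winners.items():
                new_rank[winner] += damping * rank[loser] * count / total
        rank = new_rank
    return rank

# Placeholder at-bats: batter_a beats ace, ace beats batter_b, batter_b beats rookie.
at_bats = [("batter_a", "ace", True), ("batter_b", "ace", False), ("batter_b", "rookie", True)]
ranks = pagerank(build_graph(at_bats))
print(max(ranks, key=ranks.get))   # batter_a
```

In this toy chain batter_a comes out on top because his one win came against the pitcher who beat the player who beat the rookie, which is exactly the "beating good players counts more" effect described above.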
To get results of batter/pitcher matchups, I crawled Baseball-Reference.com. Their Play Index Tool lets you look up the results of any players’ batting/pitching matchups, possibly filtered by year. I wanted to download every at-bat for the year 2013.
To begin with, I’m not sure Baseball-Reference.com wanted me to crawl their records. They have disclaimers against this sort of bulk downloading, but I was using the data for a personal project and didn’t profit from it, so I went ahead. They didn’t kick off my IP as I went about crawling/downloading these matchups.
My code for this project is in Python. I used the screen-scraping package Beautiful Soup.
I first had to grab the usernames for all players in the majors in 2013. I went to this page to get the batters and this page to get the pitchers. Looking at the page source for the pitching page, you notice that the usernames start around line 1727. Download the page source for those pages and use some logic to grab all the usernames for pitchers and batters. Here is my ugly code to parse the usernames.
Once you have the usernames, you’ll want to crawl Baseball-Reference.com to get matchup data from every batter. Unfortunately, a batter’s matchup data (like this for Barry Larkin) creates the same page source whether you filter by year or not. Filtering by year only dynamically changes what is shown on the screen; it doesn’t change the page source, which is what we are going to crawl. So we have to use three steps to get only 2013 data:
-Parse the page source for a batter’s all-time matchups to see which pitchers he ever faced
-For each pitcher, see if that pitcher is in the list of 2013 pitchers
-If he is, crawl 'http://www.baseball-reference.com/play-index/batter_vs_pitcher.cgi?batter=' + batter + '&pitcher=' + pitcher to get the line related to 2013. Add this line to the statistics that you are keeping.
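The three steps can be sketched like this. It is only an outline, not my actual crawler: pitchers_faced is a hypothetical function you would write to parse a batter's all-time matchup page, and the usernames in the example are placeholders.

```python
BASE_URL = "http://www.baseball-reference.com/play-index/batter_vs_pitcher.cgi"

def matchup_url(batter, pitcher):
    return BASE_URL + "?batter=" + batter + "&pitcher=" + pitcher

def matchup_urls_for_year(batters, year_pitchers, pitchers_faced):
    """Yield (batter, pitcher, url) for each matchup worth crawling.

    pitchers_faced: hypothetical callable returning the usernames of every
    pitcher a batter has ever faced (step 1, parsed from the all-time page).
    """
    year_pitchers = set(year_pitchers)
    for batter in batters:
        for pitcher in pitchers_faced(batter):                        # step 1
            if pitcher in year_pitchers:                              # step 2
                yield batter, pitcher, matchup_url(batter, pitcher)   # step 3

# Placeholder data standing in for the parsed pages.
faced = {"batter01": ["pitcher01", "pitcher02"], "batter02": ["pitcher02"]}
for row in matchup_urls_for_year(["batter01", "batter02"], ["pitcher02"], faced.get):
    print(row)
```

Each URL it yields still has to be downloaded and parsed to pull out the 2013 line.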
Here is my Python code to download all at-bats from 2013. You’ll notice that I import urlopen from urllib2 to tell Python to open the webpages of interest. Then I use Beautiful Soup to parse the page source. Throughout the code, I added lines like “time.sleep(random.random()*10)”, which uses the time and random modules to make the code pause for a random amount of time (up to 10 seconds). This kept me from overloading Baseball-Reference.com with requests and hopefully kept me from pissing them off. If you’re interested in using the code, note that you’ll obviously need to change your input/output folders to match your computer.
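The random delay is the part most worth copying. Here is a small sketch of it, written for Python 3, where urlopen lives in urllib.request instead of urllib2. The function name and its optional arguments are my own invention so the delay and downloader can be swapped out when testing.

```python
import random
import time
from urllib.request import urlopen   # Python 3 home of urllib2's urlopen

def polite_fetch(url, max_delay=10.0, opener=urlopen, sleep=time.sleep):
    """Wait a random 0 to max_delay seconds, then return the page source as bytes."""
    sleep(random.random() * max_delay)
    with opener(url) as response:
        return response.read()
```

Calling polite_fetch(url) for each matchup URL spaces the requests out by about 5 seconds on average. The bytes it returns are what you would hand to Beautiful Soup for parsing.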
Hope this helps. I know it’s not 100% complete in its description, but post in the comments if you’re confused in some way.
Two Links Tuesday- July 22, 2014- Free eBooks edition
Books from The Great Books of the Western World that are available for free to download: If you’re not reading something technical for your job, read these classic books and make yourself a better person.
If you do want to read something technical for your job, let me suggest Think Complexity. It discusses graph algorithms, scale-free networks, fractals, and the game of life. It contains Python code, so it will also help you be a better programmer. I read most of it while awaiting assignment at Booz Allen.
Life Tips- How to Win at Monopoly
I’m a little hesitant to post this, because Monopoly is my favorite board game and I like to win. But here are two great webpages with way too much quantitative information about probabilities of hitting spaces on the board and rolls to recoup an investment:
Probabilities in Monopoly by Truman Collins
How to Win at Monopoly by Tim Darling
Here are a few of my strategies:
-Ideally, be the first to get a monopoly. Any monopoly, though the ones on the second side are preferable. Build 3 houses ASAP on each property.
-Trade. Always trade. (I also employ lots of trading in fantasy football). Be willing to give up bit pieces to get monopolies. Or even to get pieces that give you the CHANCE of picking up a monopoly. Be willing to give expensive monopolies to people that are cash-poor if it gives you a monopoly you can build on.
-Railroads are only good early in the game, in my opinion. Trade them later on for anything that increases your chance of a monopoly.
-Until fewer than 5 unowned properties remain, get out of jail ASAP. After that, never pay to get out of jail early.
-I mortgage more bit pieces than the average player to get cash to build houses.
-Try to be the person who knocks others out of the game, because then you get their property, which is almost always valuable, even if it's mortgaged. Position yourself in the end game to knock people out.
-Sometimes it's wise to trade a bit piece for cash if it both gives you needed cash and starves someone who has an expensive monopoly of cash to build with.
-Know which properties others are likely to hit with their next roll. Build there if possible.
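For that last tip, a rough first cut is just the two-dice distribution. The sketch below ignores doubles, cards, and Go To Jail (the pages linked above handle those properly), and it numbers the board 0 to 39 with Go as 0, which is my own convention.

```python
from collections import Counter
from fractions import Fraction

def two_dice_distribution():
    """Probability of each total when rolling two fair six-sided dice."""
    counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}

def landing_chances(position, board_size=40):
    """Map board index -> chance an opponent at `position` lands there next roll."""
    return {(position + total) % board_size: p
            for total, p in two_dice_distribution().items()}

dist = two_dice_distribution()
print(dist[7])                   # 1/6
print(dist[6] + dist[7] + dist[8])   # 4/9
```

So the spaces 6, 7, and 8 ahead of an opponent catch 4/9 of their rolls, which is where I'd want my houses.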
Book Review- Earth Afire
Earth Afire: The First Formic War
Orson Scott Card and Aaron Johnston, 2013
Ender’s Game, written by Orson Scott Card in 1985, is probably still my favorite book. Highly recommended. I read it for freshman English class at St. Xavier High School. Ender’s Game tells the story of the Second Formic War—a war waged by children against an alien species threatening Earth. I don’t want to give spoilers for Ender’s Game —you should read it!— so I’ll stop there.
Earth Afire is the second book in what is most likely a trilogy of books describing the First Formic War. Earth Unaware was the first book, which I read a couple years ago. Earth Afire follows Card’s tried and true method of having multiple competing storylines involving personal struggle and determination that eventually coalesce. Like many books in the Ender’s Game universe, Earth Afire uses at least one storyline that involves an intelligent, precocious child. We also see the initial efforts of Mazer Rackham, the hero of The First Formic War in Ender’s Game.
Overall, a pretty good book. Not the best in the series by a long shot, but a good read nonetheless. I would have had certain storylines be more realistic to the probable actions of the non-major characters, but that’s okay. A very interesting book series keeps on keeping on.
Two Links Tuesday- July 15, 2014- Grad Student Resources Edition
First Time on the Market? This page is a collection of essays written for The Chronicle of Higher Education. I jumped around and read a few. There are many good ideas about how to conduct a job search, interview, and deal with offers or rejection. Just note that each article is a sample size of one and your situation might not fit their suggestions perfectly.
Craig Holden’s Career Resources Page: Professor Holden teaches finance courses at Indiana University, and I took his “Asset Pricing Theory” course in Spring 2014. Some of the links on this page are specific to finance grad students, but many of the links to suggestions about how to write papers and find relevant papers are applicable for many students.