FB-Reda/HomeFieldAdvantage

An Analysis of Home Field Advantage

Alex Kosla, Bria Powell, Nina Lasswell, Franco Reda

Abstract

Many speculate that Home Team Advantage exists in all sports, with fan attendance being one of the main contributing factors. Because sports play an important role in leisure worldwide, this project aims to establish a basis for assessing the advantage conferred by game location. It analyzes the impact of Home Attendance, measured through Ticket Sales, on Home Team Advantage in Worldwide Soccer Leagues and American Basketball. The plan is to use Linear Regression to examine the relationship between Home and Away game performance. Principal Component Analysis will be applied to the Country Facts data set to reduce the dimensionality of its 23 variables. For prediction purposes, a decision tree will be built to predict wins in future games. Association rules will be examined to discover relationships between Wins/Losses and the country facts demographics, including median incomes.

Keywords: Linear Regression, Principal Component Analysis (PCA), Association Rules.

Introduction

Home Team Advantage (HTA) is of interest to us because it calls into question the fairness of various sports matches. At times, away fans claim that officials are biased toward the home team, arguing that referees are swayed by the noise of the crowd. Others claim that their players are not used to the difference in climate, or simply that they lack familiarity with the arena or field. Whatever the case, people want to know whether home team advantage exists and whether it poses a serious threat to the fairness of matches. Answering that is the purpose of our project. The topic raises the following questions: what exactly causes home team advantage, how do we measure these influential factors, and are they preventable?
Many factors contribute to which team, home or away, is favored to win a National Basketball Association (NBA) game. As many speculate, these factors often go beyond the raw talent of the athletes and include the location of the game, the fans in attendance, referee bias, and other related factors. Wins in professional sports lead to better endorsement opportunities, higher ticket sales, player transfers to better teams, and better owners and sponsors. All of this increases player salaries and the revenue generated by the team. So, if a team has a HTA, it may have a higher chance of attaining these things. This would be an unfair advantage that the away team does not experience.
The objective of our project is to study home advantage in both the NBA and Soccer leagues. For the NBA, we examine how home advantage is influenced by fan attendance, measured through ticket sales. We speculate that ticket sales are related to the country’s economic state and to the demographics that explain its standard of living. Using the NBA data, ticket prices will be compared with the median income of each U.S. metropolitan area in which a team is located. Toronto will also be evaluated as a metropolitan area. Since the Soccer leagues span different countries, general demographic factors such as GDP, literacy rates, and mortality rates will be evaluated. Because the number of variables is large, Principal Component Analysis will be used to reduce the dimensionality of the country demographics in the country facts data set.
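
To make the dimensionality-reduction step concrete, here is a minimal sketch of Principal Component Analysis in plain Python. It is not the project's implementation. It assumes the demographic columns are numeric with non-zero variance, standardizes them, and finds only the first principal component by power iteration. The three columns and all of their values are made up for illustration and do not come from “Country_facts.csv”.

```python
import math

def standardize(rows):
    """Scale each column to zero mean and unit (population) variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n)
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]

def covariance_matrix(rows):
    """Covariance matrix of rows whose columns are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Power iteration: (largest eigenvalue, unit eigenvector) of cov."""
    d = len(cov)
    # Non-uniform start vector, to lower the chance of starting
    # orthogonal to the leading eigenvector.
    v = [1.0 + i for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

# Made-up rows: (GDP per capita, literacy %, infant mortality per 1000).
toy_rows = [
    [1000.0, 50.0, 80.0],
    [5000.0, 70.0, 50.0],
    [20000.0, 90.0, 20.0],
    [40000.0, 99.0, 5.0],
]

scaled = standardize(toy_rows)
eigenvalue, component = first_component(covariance_matrix(scaled))
# After standardization the total variance equals the number of columns.
share = eigenvalue / len(toy_rows[0])
print(f"share of variance on the first component: {share:.3f}")
```

In practice a library routine that returns all components would replace the power iteration. The sketch only shows why strongly correlated demographics can be summarized by fewer variables.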

Dataset and Features

The data collection used for analysis includes the “Country_facts.csv”, “SoccerLeagues.csv” and “NBA.csv” data sets found on Kaggle, published by Omri Goldstein, an Israeli Data Scientist. The Country facts data set contains demographics that measure Population Densities, Economic behavior, Climate, Migration, Literacy, Mortality, and Birth Rates, amongst other measurements. It is one part of a collection that also includes the two sport-specific data sets for Soccer and American Basketball.
The Soccer data set covers 89 countries, 91 leagues and 1975 soccer teams, with seasons ranging from 2009 to 2016. Team performance is measured by the Home/Away columns for Wins, Draws and Losses. The NBA data set is similar, except that it does not include draws. Its time range is much longer, from 1968 to 2010, and it covers 56 Teams, both past and present. This set also includes columns for Home and Away percentages based on Wins.
A Median Income data set from the U.S. Census Bureau, “ACS_16_5YR_S1903.csv”, expressed in 2016 Inflation-Adjusted Dollars, is also used. It provides median incomes for various American households and is used to examine the impact of ticket prices and attendance on Home Team Advantage. All column attributes that did not pertain to Median Income were removed, reducing the original 62 columns of data to only 3. The data covered all metropolitan areas in the USA, so only the metros with NBA teams were kept and the rest were removed.
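
As a rough illustration of how the Home/Away columns can be turned into a single measure, the sketch below computes a win percentage and the gap between home and away results. The definition used here (wins divided by all games played, draws included) is an assumption and may differ from how the percentage columns in the data sets were derived. The function names and the example record are hypothetical.

```python
def win_percentage(wins, losses, draws=0):
    """Fraction of games won; draws count as games played (assumption)."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")
    return wins / games

def home_advantage(home, away):
    """Home win percentage minus away win percentage.

    `home` and `away` are (wins, losses) or (wins, losses, draws) tuples.
    """
    return win_percentage(*home) - win_percentage(*away)

# Hypothetical season: 30-11 at home, 20-21 away (no draws, as in the NBA).
gap = home_advantage((30, 11), (20, 21))
print(f"home minus away win percentage: {gap:.4f}")
```

A positive gap means the team won a larger share of its home games than of its away games.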
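
The column and row reduction described above could look roughly like the following sketch, which uses only the Python standard library. The column names, metro names and values are placeholders invented for the example. They are not the actual headers or figures in “ACS_16_5YR_S1903.csv”.

```python
import csv
import io

# Placeholder data standing in for the Census file (all values made up).
SAMPLE = """metro_id,metro_name,median_income,households,other_estimate
1,Metro A,50000,1000,1.5
2,Metro B,60000,2000,2.5
3,Metro C,70000,3000,3.5
"""

KEEP_COLUMNS = ["metro_id", "metro_name", "median_income"]  # hypothetical names
NBA_METROS = {"Metro A", "Metro C"}                         # hypothetical metros

def filter_income_rows(csv_file, keep_columns, metros, name_column="metro_name"):
    """Keep only the listed columns, and only rows whose metro is listed."""
    reader = csv.DictReader(csv_file)
    return [
        {col: row[col] for col in keep_columns}
        for row in reader
        if row[name_column] in metros
    ]

rows = filter_income_rows(io.StringIO(SAMPLE), KEEP_COLUMNS, NBA_METROS)
for row in rows:
    print(row)
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle, and the column list by the three median-income columns that were retained.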

References

J. Kotecki, “Estimating the Effect of Home Court Advantage on Wins in the NBA,” The Park Place Economist, vol. 22, no. 1, 2014.

“Home Advantage in Soccer and Basketball,” Kaggle.

F. Carmichael and D. Thomas, “Home-Field Effect and Team Performance,” Journal of Sports Economics, vol. 6, no. 3, pp. 264–281, Aug. 2005.

M. Ponzo and V. Scoppa, “Does the Home Advantage Depend on Crowd Support? Evidence From Same-Stadium Derbies,” Journal of Sports Economics, 2016.

A. Seçkin, “Home Advantage in Association Football: Evidence from Turkish Super League,” in ECOMOD Conference, 28-Jun-2006.

T. B. Swartz and A. Arce, “New Insights Involving the Home Team Advantage,” International Journal of Sports Science & Coaching, vol. 9, no. 4, pp. 681–692, Sep. 2014.

About

Data Mining Project
