NBA analysis using R and JavaScript (Shiny) Part 1

Over the last ten years, the programming language R has grown to become one of the most widely used “analytical” languages. According to this year’s Stack Overflow survey, R ranks ahead of comparable languages such as MATLAB and Octave. Thanks to its functional approach and its large and diverse user base, R has been particularly successful in academia, where it is used intensively across data science, from the social and natural sciences to many applied disciplines; a look at the list of CRAN (Comprehensive R Archive Network) packages is enough to illustrate this. Beyond classical scientific applications, R can also be used for educational purposes to analyze collected data, especially in an era dominated by web technologies, which can serve as R’s “window into the world”. The dynamic nature and flexibility of modern JavaScript technologies make it possible to build effective user applications and applets that are linked to the analytical foundation provided by R. This connection, together with an active approach to processing and analyzing data, is the topic of this text.

By design, R is not suited to building user applications, and especially not to modern reactive programming, in which variable values are expected to update dynamically in response to user actions. This is why the Shiny package was developed: it lets you create interactive web applications directly from any R environment, such as RStudio (the most widely used R IDE) or ESS (Emacs Speaks Statistics). Shiny combines the user interface with the analytical logic through intuitive conventions, avoiding the complicated configuration that many web technologies require. In addition, Shiny can communicate directly with accompanying JavaScript code through the global Shiny object (namespace) in the browser, which exposes several methods for exchanging information and data.
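As a minimal sketch of this two-way channel (not taken from the nbastream code), the app below sends a message from R to the browser with session$sendCustomMessage(), handles it in JavaScript with Shiny.addCustomMessageHandler(), and sends a value back to R with Shiny.setInputValue() (available in Shiny 1.1.0 and later), where it arrives as an ordinary input. The IDs ping, pong, js_reply and reply are hypothetical names chosen for illustration.

```r
library(shiny)

# Hypothetical IDs: "ping", "pong", "js_reply" and "reply" are illustrative only.
ui <- fluidPage(
  actionButton("ping", "Send a message to JavaScript"),
  textOutput("reply"),
  tags$script(HTML("
    // Browser side: handle the custom message 'pong' sent from R ...
    Shiny.addCustomMessageHandler('pong', function(msg) {
      // ... and send a value back to R, where it becomes input$js_reply
      Shiny.setInputValue('js_reply', 'JS received: ' + msg.text);
    });
  "))
)

server <- function(input, output, session) {
  # R -> JS: send a custom message each time the button is clicked
  observeEvent(input$ping, {
    session$sendCustomMessage("pong", list(text = paste("click", input$ping)))
  })
  # JS -> R: input$js_reply is set in the browser by Shiny.setInputValue()
  output$reply <- renderText(input$js_reply)
}

app <- shinyApp(ui, server)
# runApp(app)  # launch in an interactive session
```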

Sports statistics as an example of available and quality data

Sports statistics have a very long recording tradition, and their analysis, especially predicting future results and evaluating the performance of teams and athletes, is one of the fastest-growing disciplines of data science today. Sports databases are well organized and easily accessible on the web, and the development of sports metrics contributes to the development of data science as a whole. The best-known example of an exceptionally good SQL database is the Lahman Baseball Database, covering 1871 to 2006, which is used both for professional analytics and for educational purposes. Other well-known sites are FanGraphs and Basketball Reference. The educational potential of such data lies in providing an interesting and well-structured basis for many software solutions and analytical strategies. For the analysis in this text, two tables of standard NBA team performance measures for the 2016/2017 season were selected from Basketball Reference: Team Per Game Stats and Opponent Per Game Stats. Both tables were downloaded in .csv format and included in the source code of the Shiny application.
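Because the two tables share the team name, they can be joined into one data frame with one row per team. The sketch below assumes both CSV files have a Team column; the file names in the comments, the opp_ prefix and the helper combine_team_opponent() are hypothetical, and the two-row data frames hold made-up numbers used only to show the mechanics.

```r
# Hypothetical helper: join the team table and the opponent table by team name.
# Assumes both data frames have a "Team" column; opponent columns get an
# "opp_" prefix so that, e.g., PTS (points scored) and opp_PTS (points allowed)
# can coexist after the join.
combine_team_opponent <- function(team, opp, key = "Team") {
  stat_cols <- setdiff(names(opp), key)
  names(opp)[match(stat_cols, names(opp))] <- paste0("opp_", stat_cols)
  merge(team, opp, by = key)  # rows are sorted by the key column
}

# Usage with the downloaded files (placeholder file names):
# team <- read.csv("team_per_game.csv", stringsAsFactors = FALSE)
# opp  <- read.csv("opponent_per_game.csv", stringsAsFactors = FALSE)
# nba  <- combine_team_opponent(team, opp)

# Toy example with made-up numbers, only to show the shape of the result
team <- data.frame(Team = c("A", "B"), PTS = c(110, 100))
opp  <- data.frame(Team = c("B", "A"), PTS = c(105, 98))
combined <- combine_team_opponent(team, opp)
print(combined)
```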

Performance of NBA teams through principal component analysis

The functionality and design of this Shiny application serve the goals of the analysis, but also aim at a dynamic and effective user experience. To describe the pattern of variation in NBA team performance, and possibly reveal regularities or grouping patterns, we will use Principal Component Analysis (PCA). PCA is one of the basic methods of multivariate statistical analysis; its primary purpose is to reduce the number of dimensions (variables) of a data set so that its overall variability can be viewed through a smaller number of new variables. In the selected tables, each NBA team is described by 17 variables, and our goal is to answer the following questions:

Is there a grouping of teams by performance?

Can the teams’ performance be described through a smaller number of variables, namely linear combinations of the original 17 (the principal components), and if so, how?

What is the impact of each measured variable on overall performance and, ultimately, on the final standings?

To answer these questions, both team statistics and opponent statistics will be used, so that team performance is described in terms of both offense and defense.
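In R, PCA is available in base R as prcomp(). The sketch below uses a random 30 x 17 matrix as a stand-in for the real team statistics (the NBA has 30 teams; the values are placeholders, not actual data), standardizes the variables, and extracts the three quantities that map onto the questions above: explained variance, loadings and team scores.

```r
set.seed(2017)
# Placeholder data: 30 "teams" x 17 random variables standing in for the real
# per-game statistics, which are not reproduced here
stats <- matrix(rnorm(30 * 17), nrow = 30,
                dimnames = list(paste0("team_", 1:30), paste0("var_", 1:17)))

# Center and standardize each variable before the decomposition
pca <- prcomp(stats, center = TRUE, scale. = TRUE)

# Share of total variance explained by each component, and cumulatively
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)

# Loadings: weight of each original variable in PC1 and PC2
print(head(pca$rotation[, 1:2]))

# Scores: coordinates of each team on the first two components;
# plotting them is one way to look for groups of teams
scores <- pca$x[, 1:2]
plot(scores, type = "n")
text(scores, labels = rownames(scores), cex = 0.7)
```

Standardizing matters here because per-game measures such as points and blocks differ by an order of magnitude; without scale. = TRUE, the leading components would be dominated by the variables with the largest variance.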

Shiny application structure

How to create Shiny applications, along with tutorials for all levels of experience with the package, is covered in the official documentation, especially in its articles and lessons. Given the high quality of the official documentation, only the basic ideas of the package and the specifics of the principal component analysis application will be discussed here.

To run the application from source, you need to install R and, within it, the shiny and jsonlite packages. The code itself is available in a GitHub repository, and one very useful Shiny function is runGitHub(), which runs an application directly without cloning the repository, making it easy to present previously prepared content, e.g., during lectures or presentations.
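A small sketch for installing the two required packages only when they are missing, assuming installation from CRAN with the default repository settings:

```r
# Packages the application needs, as listed above
required <- c("shiny", "jsonlite")

# requireNamespace() returns FALSE (quietly) for packages that are not installed
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Install only the missing ones
if (any(!installed)) {
  install.packages(required[!installed])
}
```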

By running

library(shiny)
runGitHub("nbastream", "paulidealiste")

the application will run in any R environment (the first string is the name of the repository and the second is the GitHub username). In RStudio, it is enough to open one of the two main files (server.R or ui.R) and click the Run App button, which appears automatically. Only the basic principles are described below; readers are encouraged to inspect the code directly, as it is richly commented.

A Shiny application consists of two functional units, both ordinary R functions: the server, containing the main analytical logic, and the UI, defining the application’s design. These functions can be defined in a single .R file or in two separate files, ui.R and server.R (the older approach), provided the files are in the same directory, which is the case here.
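For comparison, a single-file version looks roughly like the sketch below: both functions sit in one file (conventionally named app.R) and are combined with shinyApp(). The slider and plot are hypothetical placeholders, not part of the NBA application, which uses the two-file layout.

```r
library(shiny)

# UI part: what would otherwise live in ui.R
ui <- fluidPage(
  titlePanel("Hello Shiny"),
  sliderInput("n", "Number of points", min = 10, max = 100, value = 50),
  plotOutput("scatter")
)

# Server part: what would otherwise live in server.R
server <- function(input, output, session) {
  output$scatter <- renderPlot({
    # Re-runs automatically whenever input$n changes
    plot(rnorm(input$n), rnorm(input$n))
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # launch in an interactive session
```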

The ui.R file contains the basic front-end design function:

shinyUI(fluidPage(
  # user interface design
))

fluidPage() is one of Shiny’s structural (layout) functions; the full list can be found in the UI Layout category of the official documentation. Shiny offers a number of functions for creating and arranging various elements, most of which correspond closely to components of the Bootstrap 3 framework that underlies the design of Shiny applications. Thus, Bootstrap-defined CSS classes can be used to style Shiny elements, and the Shiny tags system makes it possible to add ordinary HTML elements. In addition to structural and user (input) elements, Shiny also has specific output elements that directly display R plots and formatted tables based on the current results. The entire application layout is built from nested calls to these structural functions, according to the following principle:

fluidPage(
  fluidRow(
    column(11,
      div(
        actionButton("button1", "Button 1")  # …
      )
    )
  )
)

The above code creates a page containing a single row (a div with class row) that holds a single column (a div with class col-sm-11, spanning 11 of the 12 Bootstrap grid columns), which in turn contains a div with a single button.
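To illustrate the points above (Bootstrap classes, the tags object and output elements), the sketch below places a plot next to a table inside a Bootstrap 3 well (wellPanel() would produce the same class); the output IDs pca_plot and loadings are hypothetical. The last lines render the row from the previous example to HTML, which is a quick way to check which classes Shiny actually generates.

```r
library(shiny)

# Hypothetical layout: the output IDs are illustrative only
ui <- fluidPage(
  fluidRow(
    column(8,
      # Output element: a placeholder filled by renderPlot() on the server
      plotOutput("pca_plot")
    ),
    column(4,
      # Plain HTML via the tags object, styled with a Bootstrap 3 class
      tags$div(class = "well",
        tags$h4("Loadings"),
        tableOutput("loadings")
      )
    )
  )
)

# Render one row to HTML to inspect the generated Bootstrap classes
html <- as.character(fluidRow(column(11, actionButton("button1", "Button 1"))))
cat(html, "\n")
```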

The server.R file contains the main server-logic function. Shiny calls it each time a user connects to the application (once per session), and it has the following signature:

shinyServer(function(input, output, session) {
  # analysis and reactive (data-binding) expressions
})
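A sketch of how the PCA could be wired into the server function with a reactive expression, assuming the user picks the variables to include. The built-in mtcars data stands in for the NBA tables, and the input and output IDs and the helper run_pca() are hypothetical, not taken from the nbastream code.

```r
library(shiny)

# Placeholder data standing in for the NBA tables
nba <- mtcars

# Hypothetical helper: standardized PCA on the selected columns
run_pca <- function(data, vars) {
  prcomp(data[, vars, drop = FALSE], scale. = TRUE)
}

ui <- fluidPage(
  checkboxGroupInput("vars", "Variables", choices = names(nba),
                     selected = names(nba)),
  plotOutput("pca_plot"),
  tableOutput("loadings")
)

server <- function(input, output, session) {
  # Reactive expression: re-evaluated only when input$vars changes,
  # and its cached result is shared by both outputs below
  pca <- reactive({
    req(length(input$vars) >= 2)  # need at least two variables
    run_pca(nba, input$vars)
  })

  output$pca_plot <- renderPlot({
    scores <- pca()$x[, 1:2]
    plot(scores, type = "n")
    text(scores, labels = rownames(scores), cex = 0.7)
  })

  output$loadings <- renderTable(pca()$rotation[, 1:2], rownames = TRUE)
}

app <- shinyApp(ui, server)
```

Putting prcomp() inside reactive() means the decomposition is computed once per change of input$vars and reused by every output that depends on it, rather than recomputed in each render function.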