Chapter 2 Proposal

2.1 Research topic

This project explores the 2018/2019 academic achievement results for school districts of New York State. The main focus of this project is to visualize data, highlight strong correlations, and provide rankings based on different measures. We do not go beyond the boundaries of exploratory analysis.

The academic assessment results are divided into two categories: results for State Assessment in Mathematics and results for State Assessment in Reading/Language Art. We explore the distribution of the assessment results across these measures:

  • School district
  • Racial and Ethnic Groups
  • Sex
  • Disability Status
  • Economically Disadvantaged Status
  • Migrant Status
  • Homeless Enrolled Status
  • Foster Care Status
  • Military Connected Student Status

We also provide a consolidated scoring to combine the Mathematics and Reading/Language Art. This is mainly a means to compare district performance holistically. The ultimate goal of the project is to provide a set of visualizations to provide the reader a view on how well different school districts of New York are performing across different subgroups of students.

2.2 Data availability

Data Source: As part of the EDFact initiative of U.S. Department of Education data on the academic assessment is being collected and centralized across states and schools. This data is meant to be a high quality and vetted data source that is being used for policymaking and education planning.

Link to Source: https://www2.ed.gov/about/inits/ed/edfacts/data-files/index.html

Datasets:

  1. SY 2018-19 Achievement Results for State Assessments in Mathematics (LEA - Wide file)
  2. SY 2018-19 Achievement Results for State Assessments in Reading/Language Arts (LEA - Wide file)
  3. SY 2017-18 Achievement Results for State Assessments in Mathematics (LEA)
  4. SY 2017-18 Achievement Results for State Assessments in Reading/Language Arts (LEA)
  5. SY 2016-17 Achievement Results for State Assessments in Mathematics (LEA)
  6. SY 2016-17 Achievement Results for State Assessments in Reading/Language Arts (LEA)

Data Collection / Range: The data is collected by the Department of Education through annual district and state reports. It is being renewed/updated annually in each school year; the dataset includes academic performance data ranging from school year 2009-2010 to school year 2018-2019. For this project we will be focusing on The state of New York for the past 3 years.

Data Transformation / Ingestion: The data is reported on two different levels of aggregations: LEA level (local education authority) and School level. We are mostly interested in LEA level data for this analysis. There are also two different formats for the latest data: Wide (each value of a subgroup is a field), Long (each value of a subgroup has its own record). The wide format of the data is more suitable for our work but requires transformation. There are 262 variables on the wide version. We need to create new variables such as Race, Sex, Socio-ecomomic status, etc. to summairze the data properly. The data is needed to be consolidated across different years and different assessment types (Mathematics, Reading/Language Arts). All files are provided in csv format. We download the data and export it to the RStudio locally.