Commit ce52741e authored by stuncay2

Project Proposal

parent 142ba38a
---
title: 'STAT 420: Data Analysis Project Proposal'
author: "Kieran Daly, Jack Meyers, Serhat Tuncay"
date: "July 21, 2023"
output:
  html_document:
    theme: readable
    toc: yes
  pdf_document: default
---
```{r setup, echo = FALSE, message = FALSE, warning = FALSE}
options(scipen = 1, digits = 4, width = 80)
library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=60),tidy=TRUE)
```
## Team
- Size : 3
- Name : StatJKS

| Name          |
|---------------|
| Kieran Daly   |
| Jack Meyers   |
| Serhat Tuncay |

## Project Title
**Analyzing CT State Employee Wage Data**
## Introduction
We propose to study the Connecticut state employee payroll data published by the Office of the State Comptroller in order to identify trends in state employees' pay. The dataset has 38 columns, and each row describes an individual payroll check issued to a Connecticut state employee from 2015 onward. Several of its features, such as ethnicity, sex, age, government agency, and location, can help us identify wage trends. The full dataset contains over 14 million rows, which gives us plenty of data to work with, but we plan to isolate a single year (2022) to tighten the scope of our study. The topic is of personal interest to us, as Kieran is from Connecticut. From simply sorting the data, one state employee appears to make over $11,000,000 a year! That is a surprisingly large salary for a government employee, so we look forward to investigating it further.
## Dataset
* **Source:** This dataset is taken from [Connecticut Open Data](https://data.ct.gov/Government/State-Employee-Payroll-Data-Calendar-Year-2015-thr/virr-yb6n).
The full dataset contains employee payroll data going back to 2015. However, as mentioned earlier, we will analyze only the 2022 payroll data because the full dataset is very large.
```{r kable,message=FALSE,echo=FALSE,warning=FALSE}
# Load the libraries and helpers needed to read the data.
library(readr)
```
```{r}
state_employee_payroll_2022_data = read_csv('./dataset/State_Employee_Payroll_Data_Calendar_Year_2022.zip')
```
* As shown above, the data can be read, but `read_csv()` emits a warning. In addition, some variables that we expect to be factors, such as `Sex`, are read as character fields. We will address these issues first during the actual development.
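
The factor conversion mentioned above could look roughly like the following. This is only a minimal sketch on a small toy data frame: `payroll_toy` and all of its values are hypothetical stand-ins, and only the column names `Sex` and `Tot Gross` come from the real data. For the parsing warning, readr's `problems()` function can list the rows that triggered it; that call is shown only as a comment because it needs the real data object.

```r
# Toy stand-in for the payroll data. The values are made up and only mimic
# the shape of the real file (hypothetical data, for illustration only).
payroll_toy = data.frame(
  Sex = c("F", "M", "F", "M"),
  `Tot Gross` = c(2100.50, 1980.00, 2500.25, 1750.75),
  check.names = FALSE,       # keep the space in the column name "Tot Gross"
  stringsAsFactors = FALSE   # mimic read_csv(), which reads text as character
)

# Convert the character column to a factor so that modeling functions such
# as lm() treat it as a categorical predictor.
payroll_toy$Sex = as.factor(payroll_toy$Sex)

is.factor(payroll_toy$Sex)
levels(payroll_toy$Sex)

# On the real data, the rows behind the read_csv() warning could be inspected with:
# problems(state_employee_payroll_2022_data)
```

The same `as.factor()` call would presumably be applied to each categorical column of the real dataset once we have reviewed which columns those are.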
```{r}
str(state_employee_payroll_2022_data)
```
* **Observation:** The data contain 38 columns. The variable `Tot Gross` will be the dependent (response) variable.
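
Because each row is a single payroll check, yearly pay per employee has to be obtained by summing `Tot Gross` over that employee's checks. The sketch below illustrates the idea on toy data. It assumes an employee identifier column, here called `EmplId`, which is a hypothetical name; the real dataset's identifier column may be named differently, and all values are made up.

```r
# Toy per-check data: two hypothetical employees with several checks each.
checks_toy = data.frame(
  EmplId = c("A1", "A1", "B2", "B2", "B2"),   # hypothetical identifier column
  `Tot Gross` = c(2000, 2100, 1500, 1500, 1600),
  check.names = FALSE                         # keep the space in "Tot Gross"
)

# Sum the per-check gross pay within each employee to get an annual total.
annual_gross = tapply(checks_toy[["Tot Gross"]], checks_toy$EmplId, sum)
annual_gross

# Quick look at the distribution of the per-check response variable.
summary(checks_toy[["Tot Gross"]])
```

Whether the final model uses per-check or annual gross pay as the response is a design choice we have not settled yet; this sketch only shows how the two views relate.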