Commit 1dade4b6 authored by Jack Meyers

two models

parent afab1261
@@ -138,17 +138,19 @@ Once the data was finally organized we were able to test some models. Below we c
 ```{r}
 payroll_numeric = payroll %>% select_if(~class(.) != 'factor')
-numeric_model = lm(AnnualRate ~ ., payroll)
+numeric_model = lm(AnnualRate ~ ., payroll_numeric)
+summary(numeric_model)
 ```
-While that model performed alright, the goal was to use to all of the categorical model to also inform the regression. We ran a backward search using AIC with all of the predictors in order to find a small model that would have more predictive power.
+While that model performed alright, the goal was to use some of the categorical predictors to also inform the regression. We ran a backward search using AIC with all of the factor predictors in order to find the factor variables which influence the regression.
 ```{r}
-reduced_numeric_model = step(numeric_model, direction = "backward", trace = 0)
-summary(reduced_numeric_model)
+payroll_factor = payroll %>% select_if(~class(.) == 'factor')
+payroll_factor$AnnualRate = payroll$AnnualRate
+factor_model = lm(AnnualRate ~ ., payroll_factor)
+summary(reduced_factor_model)
 ```
 **Exploring Collinearity and Correlation of Predictors**
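Note on the hunk above: the new chunk calls `summary(reduced_factor_model)`, but only `factor_model` is assigned, and the `step()` call from the old version is gone even though the prose still describes a backward AIC search. The following is a minimal sketch of what the chunk might look like if the intent was to run that search on the factor-only model. The data frame here is a simulated stand-in (the columns `Agency`, `Union`, and `Age` and all values are hypothetical, since the real `payroll` data is not shown in this diff), and base R is used in place of `select_if` so the sketch has no package dependencies.

```r
# Simulated stand-in for `payroll`; column names and values are hypothetical.
set.seed(1)
n = 200
payroll = data.frame(
  Agency = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  Union  = factor(sample(c("U1", "U2"), n, replace = TRUE)),
  Age    = round(runif(n, 20, 65))
)
payroll$AnnualRate = 40000 + 5000 * (payroll$Agency == "B") +
  600 * payroll$Age + rnorm(n, sd = 5000)

# Keep only the factor columns, then add the response back in.
payroll_factor = payroll[, sapply(payroll, is.factor), drop = FALSE]
payroll_factor$AnnualRate = payroll$AnnualRate

# Full model on all factor predictors.
factor_model = lm(AnnualRate ~ ., data = payroll_factor)

# Backward elimination by AIC; this defines the object that summary() needs.
reduced_factor_model = step(factor_model, direction = "backward", trace = 0)
summary(reduced_factor_model)
```

With `trace = 0` the search runs silently, so the only output is the summary of the model that the search settles on.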
@@ -170,7 +172,7 @@ This base model will use several features that we think could be the most influe
 The main factor that determines the wage of a CT state employee should be agency. The agency variable is a factor variable that holds dozens of different agencies. Before potentially any further data manipulation it might be helpful to view the largest and smallest coefficients.
 ```{r}
-mod_start = lm(`Annual Rate`~ Agency, data = payroll_data)
+mod_start = lm(`AnnualRate`~ Agency, data = payroll_data)
 ```
 ```{r}
@@ -186,7 +188,7 @@ It seems like the agencies with the lowest average annual income are CCC in Thre
 2. Besides from agency, age (as a proxy for seniority) might be a statistically significant factor.
 ```{r}
-mod_2 = lm(`Annual Rate`~ Agency+Age, data = payroll_data)
+mod_2 = lm(`AnnualRate`~ Agency+Age, data = payroll_data)
 ```
 From the summary we can see that age is definitely a statistically significant variable. We will need to be wary of the range of values age was trained on. If we interpret the regression for someone who is 100 years old and works in the Judicial Branch then they would make $354906 every year.
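Note on the last context line: the warning about the range of ages the model was trained on can be made concrete by comparing a new observation against the observed range before predicting. The sketch below uses simulated data (the agency names other than "Judicial Branch", the age range, and all salary values are hypothetical), so its prediction will not match the $354906 figure quoted above. It only illustrates the extrapolation check.

```r
# Simulated stand-in for `payroll_data`; values are hypothetical.
set.seed(2)
n = 300
payroll_data = data.frame(
  Agency = factor(sample(c("Judicial Branch", "Agency X", "Agency Y"),
                         n, replace = TRUE)),
  Age    = round(runif(n, 20, 70))
)
payroll_data$AnnualRate = 30000 + 900 * payroll_data$Age + rnorm(n, sd = 8000)

mod_2 = lm(AnnualRate ~ Agency + Age, data = payroll_data)

# A 100-year-old employee lies outside the ages the model was fitted on.
new_employee = data.frame(
  Agency = factor("Judicial Branch", levels = levels(payroll_data$Agency)),
  Age    = 100
)
age_range = range(payroll_data$Age)
outside = new_employee$Age < age_range[1] | new_employee$Age > age_range[2]
if (outside) {
  message("Age ", new_employee$Age, " is outside the fitted range [",
          age_range[1], ", ", age_range[2], "]; prediction is an extrapolation.")
}

# The linear model still returns a number, but it should not be trusted here.
pred = predict(mod_2, newdata = new_employee)
pred
```

The model extends the age slope linearly no matter how far the input is from the training data, so a check like this is one simple way to flag predictions that should not be interpreted.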