Assignment file and Week 4 lectures attached. Please ask any questions and provide original work.

Week 4 Problem Set
Let’s look at some other factors that might influence pay. In the Student Assignment File Week 4 tab, complete the problems included, and submit your work in an Excel document. See “Where Is the Help Button in Microsoft Excel 2007, 2010, 2013 and 2016,” “Load the Analysis ToolPak,” and “Use the Analysis ToolPak to Perform Complex Data Analysis” for more information on how to use the required technologies for the course. Be sure to show all of your work and clearly label all calculations. All statistical calculations will use the data found in the Data tab in the Student Assignment File.

week_4_assignment.xlsx

2018c_canvas_lecture_week_4___1a__3_.pdf

2018c_canvas_lecture_week_4___2a.pdf

2018c_canvas_lecture_week_4___3a.pdf

Unformatted Attachment Preview

ID  Salary  Compa-ratio  Midpoint  Age  Perf. Rating  Service  Gender  Raise  Degree  Gender1  Grade
1   62.3   1.093   57   34   85    8    0   5.7   0   M   E
2   27.8   0.896   31   52   80    7    0   3.9   0   M   B
3   35.6   1.149   31   30   75    5    1   3.6   1   F   B
4   59.2   1.038   57   42   100   16   0   5.5   1   M   E
5   47.3   0.986   48   36   90    16   0   5.7   1   M   D
6   73.7   1.100   67   36   70    12   0   4.5   1   M   F
7   40.9   1.023   40   32   100   8    1   5.7   1   F   C
8   23.7   1.030   23   32   90    9    1   5.8   1   F   A
9   76     1.134   67   49   100   10   0   4     1   M   F
10  23.9   1.038   23   30   80    7    1   4.7   1   F   A
11  22.7   0.988   23   41   100   19   1   4.8   1   F   A
12  60.1   1.055   57   52   95    22   0   4.5   0   M   E
13  42.2   1.055   40   30   100   2    1   4.7   0   F   C
14  23.3   1.012   23   32   90    12   1   6     1   F   A
15  21.5   0.935   23   32   80    8    1   4.9   1   F   A
16  45.2   1.129   40   44   90    4    0   5.7   0   M   C
17  70.7   1.241   57   27   55    3    1   3     1   F   E
18  33.8   1.089   31   31   80    11   1   5.6   0   F   B
19  23.8   1.036   23   32   85    1    0   4.6   1   M   A
20  32.9   1.061   31   44   70    16   1   4.8   0   F   B
21  78.2   1.167   67   43   95    13   0   6.3   1   M   F
22  53.4   1.113   48   48   65    6    1   3.8   1   F   D
23  23.2   1.009   23   36   65    6    1   3.3   0   F   A
24  60     1.249   48   30   75    9    1   3.8   0   F   D
25  24.3   1.058   23   41   70    4    0   4     0   M   A
26  22.7   0.989   23   22   95    2    1   6.2   0   F   A
27  41.1   1.027   40   35   80    7    0   3.9   1   M   C
28  76.6   1.143   67   44   95    9    1   4.4   0   F   F
29  75.4   1.125   67   52   95    5    0   5.4   0   M   F
30  47.2   0.984   48   45   90    18   0   4.3   0   M   D
31  23.5   1.020   23   29   60    4    1   3.9   1   F   A
32  26.8   0.866   31   25   95    4    0   5.6   0   M   B
33  59.7   1.047   57   35   90    9    0   5.5   1   M   E
34  27.9   0.900   31   26   80    2    0   4.9   1   M   B
35  23     1.000   23   23   90    4    1   5.3   0   F   A
36  23.2   1.007   23   27   75    3    1   4.3   0   F   A
37  24.6   1.069   23   22   95    2    1   6.2   0   F   A
38  60.2   1.057   57   45   95    11   0   4.5   0   M   E
39  35.1   1.132   31   27   90    6    1   5.5   0   F   B
40  25.6   1.112   23   24   90    2    0   6.3   0   M   A
41  44.6   1.115   40   25   80    5    0   4.3   0   M   C
42  24.3   1.055   23   32   100   8    1   5.7   1   F   A
43  74.6   1.114   67   42   95    20   1   5.5   0   F   F
44  65.3   1.145   57   45   90    16   0   5.2   1   M   E
45  51.8   1.080   48   36   95    8    1   5.2   1   F   D
46  60.2   1.057   57   39   75    20   0   3.9   1   M   E
47  62.1   1.090   57   37   95    5    0   5.5   1   M   E
48  62.6   1.099   57   34   90    11   1   5.3   1   F   E
49  63.7   1.117   57   41   95    21   0   6.6   0   M   E
50  56.8   0.997   57   38   80    12   0   4.6   0   M   E

Do not manipulate the data set on this page; copy it to another page to make changes.

The ongoing question that the weekly assignments will focus on is: Are males and females paid the same for equal work?

Note: to simplify the analysis, we will assume that jobs within each grade comprise equal work.

The column labels in the table mean:
ID – Employee sample number
Salary – Salary in thousands
Age – Age in years
Performance Rating – Appraisal rating (employee evaluation score)
Service – Years of service (rounded)
Gender – 0 = male, 1 = female
Midpoint – salary grade midpoint
Raise – percent of last raise
Grade – job/pay grade
Degree – 0 = BSBA, 1 = MS
Gender1 – Male or Female
Compa-ratio – salary divided by midpoint

Week 4: Identifying relationships – correlations and regression

To ensure full credit for each question, you need to show how you got your results. This involves either showing where the data you used is located or showing the Excel formula in each cell. Be sure to copy the appropriate data columns from the data tab to the right for your use this week.

1   What is the correlation between and among the interval/ratio level variables with salary? (Do not include compa-ratio in this question.)
    a. Create the correlation table.
       i. What is the data input range used for this question:
       ii. Create a correlation table in cell K08. (Use cell K08 for the Excel test outcome location.)
    b. Technically, we should perform a hypothesis test on each correlation to determine whether it is significant or not. However, we can be faithful to the process and save some time by finding the minimum correlation that would result in a two-tail rejection of the null. We can then compare each correlation to this value, and those exceeding it (in either a positive or negative direction) can be considered statistically significant.
       i. What is the t-value we would use to cut off the two tails?  T =
       ii. What is the associated correlation value related to this t-value?  r =
    c. What variable(s) is (are) significantly correlated to salary?
    d. Are there any surprises – correlations you thought would be significant and are not, or non-significant correlations you thought would be?
    e. Why does or does not this information help answer our equal pay question?

2   Perform a regression analysis using salary as the dependent variable and the variables used in Q1 along with our two dummy variables – gender and education. Show the result, and interpret your findings by answering the following questions.
    Suggestion: Add the dummy variable values to the right of the last data columns used for Q1.
    What is the multiple regression equation predicting/explaining salary using all of our possible variables except compa-ratio?
    a. What is the data input range used for this question:
    b. Step 1: State the appropriate hypothesis statements:
       Ho:
       Ha:
       Step 2: Significance (Alpha):
       Step 3: Test Statistic and test:  Why this test?
       Step 4: Decision rule:
       Step 5: Conduct the test – place the test function in cell M34. (Use cell M34 for the Excel test outcome location.)
       Step 6: Conclusion and Interpretation
       What is the p-value:
       What is your decision: REJECT or NOT reject the null?
       Why?
       What is your conclusion about the factors influencing the population salary values?
    c. If we rejected the null hypothesis, we need to test the significance of each of the variable coefficients.
       Step 1: State the appropriate coefficient hypothesis statements: (Write a single pair; we will use it for each variable separately.)
       Ho:
       Ha:
       Step 2: Significance (Alpha):
       Step 3: Test Statistic and test:  Why this test?
       Step 4: Decision rule:
       Step 5: Conduct the test. Note: in this case the test has been performed and is part of the Regression output above.
       Step 6: Conclusion and Interpretation
       Place the t and p-values in the following table. Identify your decision on rejecting the null for each variable. If you reject the null, place the coefficient in the table.

                              Midpoint  Age  Perf. Rat.  Seniority  Raise  Gender  Degree
       t-value:
       P-value:
       Rejection Decision:
       If Null is rejected, what is the
       variable’s coefficient value?

       Using the intercept coefficient and only the significant variables, what is the equation?
       Salary =
    d. Is gender a significant factor in salary?
    e. Regardless of statistical significance, who gets paid more with all other things being equal?
    f. How do we know?

3   After considering the compa-ratio based results in the lectures and your salary based results, what else would you like to know before answering our question on equal pay? Why?

4   Between the lecture results and your results, what is your answer to the question of equal pay for equal work for males and females? Why?

5   What does regression analysis show us about analyzing complex measures?
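The minimum significant correlation asked for in Q1b follows from the critical t-value via r = t / sqrt(t² + df), with df = n − 2. A minimal Python sketch of that arithmetic, assuming alpha = .05 two-tailed and the 50-employee sample (df = 48); the t-value 2.011 is a t-table lookup, and the assignment's required alpha may differ:

```python
import math

# Minimum correlation for a two-tail rejection of the null:
#   r = t / sqrt(t^2 + df),  df = n - 2
# Assumptions (not stated in the assignment file): alpha = .05, two-tailed.
n = 50
df = n - 2
t_critical = 2.011  # t(.025, df = 48), looked up from a t-table

r_critical = t_critical / math.sqrt(t_critical**2 + df)
print(round(r_critical, 3))  # prints 0.279
```

Any correlation whose absolute value exceeds this cutoff can then be flagged as statistically significant, which is exactly the shortcut Q1b describes.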

BUS308 Week 4 Lecture 1

Examining Relationships

Expected Outcomes

After reading this lecture, the student should be familiar with:

1. Issues around correlation
2. The basics of correlation analysis
3. The basics of linear regression
4. The basics of multiple regression

Overview

Often in our detective shows when the clues are not providing a clear answer – such as

we are seeing with the apparent continuing contradiction between the compa-ratio and salary

related results – we hear the line “maybe we need to look at this from a different viewpoint.”

That is what we will be doing this week.

Our investigation changes focus a bit this week. We started the class by finding ways to

describe and summarize data sets – finding measures of the center and dispersion of the data with

means, medians, standard deviations, ranges, etc. As interesting as these clues were, they did not

tell us all we needed to know to solve our question about equal work for equal pay. In fact, the

evidence was somewhat contradictory depending upon what measure we focused on. In Weeks 2

and 3, we changed our focus to asking questions about differences and how important different

sample outcomes were. We found that all differences were not important, and that for many

relatively small result differences we could safely ignore them for decision making purposes –

they were due to simple sampling (or chance) errors. We found that this idea of sampling error

could extend into work and individual performance outcomes observed over time; and that overreacting to such differences did not make much sense.

Now, in our continuing efforts to detect and uncover what the data is hiding from us, we

change focus again as we start to find out why something happened, what caused the data to act

as it did; rather than merely what happened (describing the data as we have been doing). This

week we move from examining differences to looking at relationships; that is, if some measure

changes does another measure change as well? And, if so, can we use this information to make

predictions and/or understand what underlies this common movement?

Our tools in doing this involve correlation, the measurement of how closely two

variables move together; and regression, an equation showing the impact of inputs on a final

output. A regression is similar to a recipe for a cake or other food dish; take a bit of this and

some of that, put them together, and we get our result.

Correlation

We have seen correlations a lot, and probably have even used them (formally or

informally). We know, for example, that, all other things being equal, the more we eat, the more we weigh. Kids, up to the early teens, grow taller the older they get. If we consistently speed,

we will get more speeding tickets than those who obey the speed limit. The more efforts we put

into studying, the better grades we get. All of these are examples of correlations.

Correlations exist in many forms. A somewhat specialized correlation was the Chi-Square contingency test (for multi-row, multi-column tables) we looked at last week: if we find the distributions differ, then we say that the variables are related/correlated. This correlation runs from 0 (no correlation) through positive values (the larger the value, the stronger the relationship).

Probably the most commonly used correlation is the Pearson Correlation Coefficient,

symbolized by r. It measures the strength of the association – the extent to which measures change

together – between interval or ratio level measures. Excel’s Fx Correl, and the Data Analysis

Correlation both produce Pearson Correlations.
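Excel's CORREL (and the Analysis ToolPak's Correlation tool) computes the Pearson coefficient described here. As a cross-check, the same formula can be sketched in plain Python; the five midpoint/salary pairs below are simply the first five rows of the data tab, used for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient -- the same quantity Excel's CORREL returns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# First five rows of the data tab (midpoint vs. salary):
midpoint = [57, 31, 31, 57, 48]
salary = [62.3, 27.8, 35.6, 59.2, 47.3]
r = pearson_r(midpoint, salary)
print(round(r, 3))   # a strong direct correlation for these five rows
print(round(r**2, 3))  # squared, this is the coefficient of determination
```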

Most correlations that we are familiar with show both the direction (direct or inverse) as well as the strength of the relationship, and run from -1.0 (a perfect inverse correlation) through 0 (no correlation) to +1.0 (a perfect direct correlation). A direct correlation is positive; that is, both variables move in the same direction,

such as weight and height for kids. An inverse, or negative, correlation has variables moving in

different directions. For example, the number of hours you sleep and how tired you feel; the

more hours, the less tired while the fewer hours, the more tired.

The strength of a correlation is shown by the value (regardless of the sign). For example,

a correlation of +.78 is just as strong as a correlation of -.78; the only difference is the direction

of the change. If we graphed a +.78 correlation the data points would run from the lower left to

the upper right and somewhat cluster around a line we could draw thru the middle of the data

points. A graph of a -.78 correlation would have the data points starting in the upper left and run

down to the lower right. They would also cluster around a line.

Correlations below an absolute value (when we ignore the plus or minus sign) of around

.70 are generally not considered to be very strong. The reason for this is the coefficient of determination (CD). This equals the square of the correlation and shows the amount of shared

variation between the two variables. Shared variation can be roughly considered the reason that

both variables move as they do when one changes. The more the shared variation, the more one

variable can be used to predict the other. If we square .70 we get .49, or about 50% of the

variation being shared. Anything less is too weak of a relationship to be of much help.

Students often feel that a correlation shows a “cause-and-effect” relationship; that is,

changes in one thing “cause” changes in the other variable. In some cases, this is true – height

and weight for pre-teens, weight and food consumption, etc. are all examples of possible cause-and-effect relationships; but we can argue that even with these there are other variables that

might interfere with the outcomes. And, in research, we cannot say that one thing causes or

explains another without having a strong correlation present.

However, just as our favorite detectives find what they think is a cause for someone to

have committed the crime, only to find that the motive did not actually cause that person to

commit the crime; a correlation does not prove cause-and-effect. An example of this is the

example the author heard in a statistics class of a perfect +1.00 correlation found between the

barrels of rum imported into the New England region of the United States between the years of

1790 and 1820 and the number of churches built each year. If this correlation showed a cause-and-effect relationship, what does it mean? Does rum drinking (the assumed result of importing

rum) cause churches to be built? Does the building of churches cause the population to drink

more rum?

As tempting as each of these explanations is, neither is reasonable – there is no theory or

justification to assume either is true. This is a spurious correlation – one caused by some other,

often unknown, factor. In this case, the culprit is population growth. During these years – many

years before Carrie Nation’s crusade against Demon Rum – rum was the common drink for

everyone. It was even served on the naval ships of most nations. And, as the population grew,

so did the need for more rum. At the same time, churches in the region could only hold so many

bodies (this was before mega-churches that held multiple services each Sunday); so, as the

population got too large to fit into the existing churches, new ones were needed.

At times, when a correlation makes no sense we can find an underlying variable fairly

easily with some thought. At other times, it is harder to figure out, and some experimentation is

needed. The site http://www.tylervigen.com/spurious-correlations is an interesting website devoted to spurious correlations; take a look and see if you can explain them. 😊

Regression

Linear. Even if the correlation is spurious, we can often use the data in making

predictions until we understand what the correlation is really showing us. This is what

regression is all about. Earlier correlations between age, height, and even weight were

mentioned. In pediatrician offices, doctors will often have charts showing typical weights and

heights for children of different ages. These are the results of regressions, equations showing

relationships. For example (and these values are made up for this example), a child’s height

might be his/her initial height at birth plus an average growth of 3.5 inches per year. If the

average height of a newborn child is about 19 inches, then the linear regression would be:

Height = 19 inches plus 3.5 inches * age in years, or in math symbols:

Y = a + b*x, where y stands for height, a is the intercept or initial value at age 0

(immediate birth), b is the rate of growth per year, and x is the age in years.

In both cases, we would read and interpret it the same way: the expected height of a child is 19

inches plus 3.5 inches times its age. For a 12-year old, this would be 19 + 3.5*12 = 19 + 42 = 61

inches or 5 feet 1 inch (assuming the made-up numbers are accurate).
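The height example above can be turned into a short least-squares sketch. The data points are generated from the lecture's made-up rule (19 inches at birth, 3.5 inches per year), so fitting the line simply recovers those two numbers and reproduces the age-12 prediction:

```python
# Least-squares fit of Height = a + b * Age, using points generated from the
# lecture's made-up rule (intercept 19 inches, slope 3.5 inches/year).
ages = [0, 2, 4, 6, 8, 10, 12]
heights = [19 + 3.5 * age for age in ages]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(heights) / n
# slope = covariance(x, y) / variance(x); intercept from the means
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, heights)) / \
    sum((x - mean_x) ** 2 for x in ages)
a = mean_y - b * mean_x

print(a, b)        # recovers intercept 19.0 and slope 3.5
print(a + b * 12)  # predicted height at age 12: 61.0 inches
```

This is the same calculation Excel's SLOPE and INTERCEPT functions (or the Analysis ToolPak's Regression tool with one input variable) perform.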

Multiple. That was an example of a linear regression having one output and a single,

independent variable as an input. A multiple regression equation is quite similar but has several

independent input variables. It could be considered to be similar to a recipe for a cake:

Cake = cake mix + 2* eggs + 1½ * cup milk + ½ * teaspoon vanilla + 2 tablespoons* butter.

A regression equation, either linear or multiple, shows us how “much” each factor is used in or

influences the outcome. The math format of the multiple regression equation is quite similar to

that of the linear regression, it just includes more variables:

Y = a + b1*X1 + b2*X2 + b3*X3 + …; where a is the intercept value when all the inputs

are 0, the various b’s are the coefficients that are multiplied by each variable value, and

the x’s are the values of each input.

A note on how to read the math symbols in the equations. The Y is considered the output or

result, and is often called the dependent variable as its value depends on the other factors. The

different b’s (b1, b2, etc.) are coefficients and read b-sub-1, b-sub-2, etc. The subscripts 1, 2, etc.

are used to indicate the different coefficient values that are related to each of the input variables.

The X-sub-1, X-sub-2, etc., are the different variables used to influence the output, and are called

independent variables. In the recipe example, Y would be the quality of the cake, a would be the

cake mix (a constant as we use all of what is in the box), the other ingredients would relate to the

b*X terms. The 2*eggs would relate to b1*X1, where b1 would equal 2 and X1 stands for eggs,

the second input relates to the milk, etc.
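Reading a multiple regression equation amounts to multiplying each input by its coefficient and adding the intercept. A minimal sketch of that evaluation, using hypothetical coefficient values chosen only for illustration (they are not results from any actual regression output):

```python
def predict(intercept, coefficients, inputs):
    """Evaluate Y = a + b1*X1 + b2*X2 + ... for one observation."""
    return intercept + sum(b * x for b, x in zip(coefficients, inputs))

# Hypothetical coefficients, for illustration only:
intercept = 2.0
coefs = [1.1, 0.1]   # b1 (per midpoint unit), b2 (per year of age) -- made up
inputs = [57, 34]    # X1 = midpoint, X2 = age for one employee
print(predict(intercept, coefs, inputs))
```

Once the real coefficients come out of the Regression tool, the same evaluation gives the predicted salary for any employee's set of inputs.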

Summary

This week we changed our focus from examining differences to looking for relationships

– do variables change in predictable ways. Correlation lets us see both the strength and the

direction of change for two variables. Regression allows us to see how some variables “drive” or

explain the change in another.

Pearson’s (for interval and ratio data variables) and Spearman’s (for rank ordered or

ordinal data variables) are the two most commonly used correlation coefficients. Each looks at

how a pair of variables moves in predictable patterns – either both increasing together or one

increasing as the other decreases. The correlation ranges from -1.00 (moving in opposite

directions) to +1.00 (moving in the same direction). These are both examples of linear

correlation – how closely the variables move in a straight line (if graphed). Curvilinear

correlations exist but are not covered in this class.

Regression equations show the relationship between independent (input) variables and a

dependent (output) variable. Linear regression involves a pair of variables as seen in the linear

correlations. Multiple regression uses several input (independent) variables for a single output

(dependent) variable.

The basic form of the regression equation is the same for both linear and multiple

regression equations. The only difference is in the number of inputs used. The multiple

regression equation general form is:

Y = Intercept + coefficient1 * variable1 + coefficient2 * variable2 + etc. or

Y = A + b1*X1 + b2*X2 + …; where A is the intercept value, b is a coefficient value, and

X is the name of a variable, and the subscripts identify different variables.

