
# ⚗️ 4. Analysis Methodology

## Tests Used

We used several statistical tests in this analysis. Explanations and caveats for each are in the sections that follow:

  • "Which questions generally had higher/lower scores than the others?"

* **Mean / Median** value for each question (for everyone together, no segmentation)

  • "Do answers differ by segment?"

* "Do answers differ between the two groups in this segment?" (like "One/Many Roles") -> **Spearman's Rho**

* "Do answers differ along the spectrum of values in this segment?" (like "Jarvus Tenure") -> **Spearman's Rho**

* "Do answers differ between the three+ groups in this segment?" (like "Main Role") -> **Kruskal-Wallace**

### Mean vs Median

There's a frequent data science debate about whether to use mean or median for Likert scale data. Generally, ordinal data like a 1-7 scale should be summarized with the median, but the mean can be "acceptable" when the data is roughly normally distributed.

Our goal in using mean/median was to figure out where to dig in more deeply. While exploring the data, we plotted questions by both mean and median. The mean ended up surfacing more interesting questions to investigate -- questions with more "Strongly Disagree" responses, or a wider spread overall.
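
As a rough sketch of that exploration step (with made-up question names and scores, not the real survey data), a per-question mean/median summary in pandas might look like this:

```python
import pandas as pd

# Hypothetical 1-7 Likert responses; question names and values are invented.
responses = pd.DataFrame({
    "I feel supported by my team.":          [7, 6, 5, 7, 2, 6],
    "I have the tools I need to do my job.": [4, 3, 5, 2, 4, 3],
    "I understand how my work fits in.":     [6, 6, 7, 5, 6, 7],
})

# Mean and median per question, sorted so the lowest-scoring questions
# (the ones worth digging into) surface first.
summary = pd.DataFrame({
    "mean": responses.mean(),
    "median": responses.median(),
}).sort_values("mean")

print(summary)
```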

### Spearman's Rho

[Spearman's Rho](https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient), or Spearman's Rank Correlation Coefficient, tests whether two variables correlate. It's similar to the r^2 regression test you may have done in school, except we don't assume the relationship is linear; instead it works on ranks (like the 1-7 values in our Likert scales). The test gives us a [p-value](https://en.wikipedia.org/wiki/P-value) and a correlation coefficient. For our purposes, we only looked at the p-value to see which question/segment pairs were significantly different.

We used this to correlate the dependent variable (a question's answers) with segments that have only two groups:

  • One or Many Roles (2 bins)
  • One or Many Projects / Teams (2 bins)
  • Under-represented (two bins, as a binary)

We also used this test on segments that have a spectrum:

  • Jarvus Tenure (three bins; since the answers fall along a scale of time, it's a continuum)
  • "I identify as someone under-represented in tech." (a 1-7 spectrum)

### Kruskal-Wallis H Test

[Kruskal-Wallis one-way analysis of variance](https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance) tests whether a dependent variable (like a question's answers) varies across several groups. This is basically a [one-way ANOVA](https://en.wikipedia.org/wiki/One-way_analysis_of_variance) for ranked data, answering the question "are ANY of these groups different from the others?".

We used this to test for differences across segments with multiple groups (see the sketch after this list), including:

  • Main Role (3+ bins)
  • Main Project / Team (3+ bins)
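
Here's a minimal sketch using `scipy.stats.kruskal`, with invented role groups and scores for illustration:

```python
from scipy import stats

# Hypothetical data: one question's 1-7 answers split by Main Role.
# Group names and values are invented for illustration.
design       = [5, 6, 7, 6, 5]
engineering  = [3, 4, 2, 5, 4, 3]
project_mgmt = [6, 5, 7, 6]

h_stat, p_value = stats.kruskal(design, engineering, project_mgmt)

# A small p-value only says that at least one group's answers differ from
# the others; it doesn't say which group, so that part takes further digging.
print(f"H={h_stat:.2f}, p={p_value:.3f}")
```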

## Why aren't all possible graphs shown in the report?

If we shared every segmentation graph for every question we have, it would break anonymity -- someone with a less common response could be identified by comparing graphs across segments.

Imagine a survey question where only one person answered "Strongly Disagree". If we could see that response across many segments (Main Role, Main Project, Jarvus Tenure, and Under-represented), that person could be identified, which would violate anonymity.

It's also a LOT of charts, and most of them don't contain much that's actionable. Instead of sharing them all, we've shared just the most interesting ones.

## Colophon

Data was collected using a Google Form.

This report was generated using JupyterLab (Python, numpy, pandas, matplotlib, and the likert_plot package), and rendered using [quarto](https://quarto.org/) and pandoc.

For colors, this report uses [the viridis color scale (colorblind-friendly)](https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html), and [Casey's rainbow color palette](https://coolors.co/c2c3eb-dfa09f-dbc394-d1da90-94db9d-79c1d2-ae90da-dd98bf-6e96cf), made via [colordot](https://color.hailpixel.com/). This report uses the font Quicksand for headers and the default system font for the body text.