Using crosstabs in survey data analysis

Cross-tabulation is a common method researchers use to quantitatively analyze the relationship between two or more variables. A cross-tabulation, or crosstab (also abbreviated as “x-tab”), is a matrix-style table used in market research and statistical analysis to show how responses to one categorical variable are distributed across the categories of another.

Crosstabs in research analytics

Cross-tabulation is also known simply as a “crosstab,” and the resulting table is also called a “contingency table” or “cross break.” It is a matrix-style table that shows how one variable depends on another, and it is commonly used to analyze relationships between categorical data, such as age group, region, or frequency of an action.

Even if you’re new to market research, you’ve likely encountered a cross-tabulation before. You can create them as pivot tables in Excel, or build crosstab formats in popular analytics programs and most major market research tools. They have become an industry standard for processing many variables at once to determine conditional relationships between them.

The power of crosstabs

What makes this method especially powerful for researchers is that responses are grouped into sub-categories of one variable, which makes it possible to see how the results for another variable change from group to group.

Reduces confusion

Raw data can be tough to interpret for even the most advanced researcher. When data is displayed in a crosstab matrix, researchers can make sense of the relationships more easily. Crosstabs present a clearer interpretation of the data by showing percentages and frequencies that may change when contrasted with variables in other categories.

Offers a deeper look

Because of the depth offered by crosstabs, researchers can uncover relationships between variables or segments that may have been missed if the data were presented in another format. Because variables are grouped, several layers of data can be compared at once, rather than just one or two categories.

Opportunities to involve the whole team

Manual statistical analysis can take a while and increases the possibility of human error. For those who don’t have an advanced statistical analysis background, analyzing results can be intimidating, and the results are easy to misread. Crosstabs structure the data into a more digestible format at the beginning of the analytics process. They provide an efficient layout for research professionals as well as a more actionable view for members of the team who lack advanced data analytics backgrounds.

Crosstab example

For example, in a survey about online habits, a variable might be how often a respondent shares content on social media. Another variable might be whether or not they read content that covers certain topics. 

Crosstabs show correlation between multiple layers of data.

Using a crosstab to relate the answers to those two questions, a researcher can see the number of respondents (count) that fell into each combination of answers. Crosstabs go a level deeper than simply stating that those who read business and finance news are likely to share educational content. They can reveal that the respondents who read business and finance news most often are also the ones most likely to share educational content most often on social media.
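As a rough illustration of the example above, the following minimal Python sketch builds a crosstab of counts using only the standard library. The responses, the answer levels ("Often", "Sometimes", "Never"), and the `crosstab` helper are all hypothetical and invented for this illustration. They are not real survey data or part of any survey platform.

```python
from collections import Counter

# Hypothetical responses, one tuple per respondent:
# (how often they read Business and Finance news,
#  how often they share educational content on social media)
responses = [
    ("Often", "Often"), ("Often", "Often"), ("Often", "Sometimes"),
    ("Sometimes", "Often"), ("Sometimes", "Sometimes"), ("Sometimes", "Never"),
    ("Never", "Sometimes"), ("Never", "Never"), ("Never", "Never"),
]

# Assumed answer categories, used for both the rows and the columns.
levels = ["Often", "Sometimes", "Never"]


def crosstab(pairs, row_levels, col_levels):
    """Count how many responses fall into each (row, column) cell."""
    counts = Counter(pairs)
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


table = crosstab(responses, levels, levels)

# Print the table with a banner (column headers) and stubs (row headers).
corner = "Reads / Shares"
print(f"{corner:<16}" + "".join(f"{c:>11}" for c in levels))
for stub, row in zip(levels, table):
    print(f"{stub:<16}" + "".join(f"{n:>11}" for n in row))
```

Each cell holds the count of respondents who gave that particular pair of answers, which is the basic building block for the percentages and significance tests described in the terms below.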

Terms used in crosstabs

Crosstabs come with their own vocabulary specific to the layout. Understanding the specific language and elements used in crosstabs can be helpful for researchers looking to get more out of their analytics.

  • Banners (or Cuts): The headers that name the categories of the data displayed by a column
  • Categories: The way that the variables are grouped (for example, respondents who are “women” or who “mostly agree with the statement ‘salads are healthy’”)
  • Columns: The cells that display data vertically
  • Column-Percentage: A view of the data that expresses each cell’s count as a percentage of its column total.
  • Count (or Frequency): The total number of responses that fall into a row and column
  • Crosstabs: Otherwise known as “cross-tabulation,” “data tabulation,” “cross break,” or “contingency table,” this is the name of the table that researchers use to analyze categorical data.
  • Fisher’s Exact Test: A test for statistical significance that computes an exact p-value (rather than relying on a large-sample approximation). Because the p-value is exact, this test is recommended for smaller samples (although it can be applied to samples of all sizes)
  • G-test: A likelihood-ratio test for determining whether one variable’s dependence on another is statistically significant, sometimes considered to be more efficient than chi-square testing.
  • Pearson’s chi-square test: A test for determining the statistical significance of a cross-tabulation by checking whether the variables being compared are independent. It measures how far the actual (observed) counts are from the counts that would be expected if the variables were independent.
  • Percentage: The percentage of all responses that fall into a given row and column
  • Rows: The cells that display data horizontally
  • Row-Percentage: A view of the data that expresses each cell’s count as a percentage of its row total.
  • Stubs: The headers that name the categories of the data displayed by a row
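To make several of these terms concrete, here is a minimal sketch in standard-library Python that computes row, column, and overall percentages, along with the Pearson chi-square statistic, the G statistic, and a two-sided Fisher’s exact p-value for a 2x2 table. The counts are hypothetical, the helper names are invented for this illustration, and the chi-square p-value shortcut shown applies only when there is one degree of freedom (a 2x2 table). A statistics package would normally be used in practice.

```python
import math

# Hypothetical 2x2 crosstab of counts.
# Stubs (rows): reads Business and Finance news (yes, no)
# Banner (columns): shares educational content (yes, no)
counts = [[30, 10],
          [20, 40]]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
total = sum(row_totals)

# Row-percentage: each cell as a share of its row total.
row_pct = [[100 * c / rt for c in row] for row, rt in zip(counts, row_totals)]
# Column-percentage: each cell as a share of its column total.
col_pct = [[100 * c / ct for c, ct in zip(row, col_totals)] for row in counts]
# Percentage: each cell as a share of all responses.
total_pct = [[100 * c / total for c in row] for row in counts]

# Counts expected if the two variables were independent.
expected = [[rt * ct / total for ct in col_totals] for rt in row_totals]
cells = [(o, e) for orow, erow in zip(counts, expected)
         for o, e in zip(orow, erow)]

# Pearson's chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e for o, e in cells)
# G statistic (likelihood ratio): 2 * sum of observed * ln(observed / expected).
g_stat = 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)


def p_value_one_dof(stat):
    """Upper-tail chi-square probability; valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(stat / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    possible = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in possible if prob(x) <= p_obs * (1 + 1e-9))


fisher_p = fisher_exact_2x2(*counts[0], *counts[1])
print("row %:", row_pct)
print("column %:", col_pct)
print("chi-square:", chi2, "p:", p_value_one_dof(chi2))
print("G:", g_stat, "p:", p_value_one_dof(g_stat))
print("Fisher exact p:", fisher_p)
```

A small p-value from any of the three tests suggests that the row and column variables are unlikely to be independent. With small counts, the exact test is the safer choice of the three.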

For those looking to analyze their data using a crosstab, we suggest creating a Pollfish survey and exporting your results directly from the platform into this format. You can learn how in our post on using crosstabs in Pollfish.