Professor David Williamson Shaffer – Transforming Big Data into Meaningful Insights: Introducing Quantitative Ethnography

Jan 9, 2018 | Education & Training, Engineering & Computer Science

In the information age, humans produce data at an extraordinary rate, offering social scientists an opportunity to study our behaviour in a manner unprecedented in human history. In his new book Quantitative Ethnography, learning scientist Professor David Williamson Shaffer at the University of Wisconsin-Madison describes a novel theory and set of techniques for merging quantitative and qualitative analyses to discover meaningful patterns in big data.


Data in the Information Age

The internet and mobile computing have led to the largest shift in human record keeping since the advent of printed text. One hundred years ago, the first cameras made for popular use were just beginning to make their way into the average American family’s home. Today, nearly 4.5 billion people carry mobile phones. Nearly a third of the world’s population now carries a camera at all times and takes at least one digital photo every day. The average internet-connected person produces a gigabyte of data daily, close to two thousand books’ worth of information.

Data that would once have taken multiple lifetimes to compile is collected almost without effort as we communicate via email and text, entertain ourselves with online games, consume information on websites and engage with social media platforms. Many of us are not even aware of most of the digital records we create, such as credit card transactions, GPS pings from phones and vehicles as we move around, and security camera footage. Every two days, we collectively generate as much information as was recorded in all of human history prior to the emergence of the internet – and this rate doubles approximately every two years.

Yet despite sensational news stories about the power of big data – the retail giant Target, for example, famously predicted which customers were pregnant by their shopping habits – making sense of the enormous volume of data collected is increasingly difficult. While computers can find patterns in big data, they cannot distinguish meaningful patterns from the random associations common in any sufficiently large dataset. And as the amount of data produced grows ever larger, the volume of meaningless patterns gets larger as well.
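The point about random associations can be illustrated with a short simulation. The sketch below is not taken from Professor Shaffer's book; it simply generates columns of pure noise and counts how many pairs appear correlated by chance. The function names, the sample size of 30 and the 0.4 threshold are all illustrative assumptions.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def count_spurious(n_vars, n_obs=30, threshold=0.4, seed=0):
    """Count pairs of pure-noise variables whose |r| exceeds the threshold."""
    rng = random.Random(seed)
    # Every variable is independent random noise, so any "pattern" is meaningless.
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    hits = 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            if abs(pearson(data[i], data[j])) > threshold:
                hits += 1
    return hits

for n_vars in (10, 30, 100):
    n_pairs = n_vars * (n_vars - 1) // 2
    print(f"{n_vars} variables, {n_pairs} pairs, "
          f"{count_spurious(n_vars)} chance correlations")
```

Because the number of variable pairs grows roughly with the square of the number of variables, the count of chance correlations rises quickly even though none of them means anything.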

Our digital universe is, quite literally, growing at an exponential rate, and scientists are struggling to adapt classic methods of analysis to make sense of this bounty of digital data. Classical statistical techniques were not designed to handle massive quantities of data, and no human could read, let alone analyse qualitatively, even a fraction of most large datasets.

Over the last two decades, data mining techniques have emerged that use computer algorithms to identify patterns in human behavioural data, but they seldom provide meaningful context for the patterns they find. To see the problem, imagine a computer program examining the behaviour of chess players. The program will easily identify that advanced players typically move much more quickly than beginners. However, this pattern tells us nothing useful about how to play chess well. Indeed, the very worst advice you could give to a new chess player is, ‘Just move your pieces faster!’

Professor David Williamson Shaffer at the University of Wisconsin-Madison has built his career studying how we learn and process information. He was disappointed that, even though we have more and more data about what learners do in classes and online, existing big data methods do a poor job of providing meaningful insights that shed light on the processes of learning and improve student outcomes. To address this problem, he developed quantitative ethnography, a novel approach that combines quantitative and qualitative methods to make sense of complex phenomena in big data.





A Marriage of Methodologies

While both quantitative and qualitative methods benefit from collecting large amounts of data, they are difficult to merge because they have fundamentally different goals and strengths. Quantitative analysis is most powerful when it uses data about a large number of individuals to support general claims about a population.

Qualitative analysis is strongest when it uses a large amount of data about a small number of individuals to generate deep and meaningful insights about a small set of cases. Where quantitative findings tend to be shallow but broadly applicable, qualitative findings tend to be detailed but narrowly focused. Both are essential to a complete understanding of human behaviour, but combining them has historically proven difficult.

The rise of big data fundamentally changes this landscape. We can now easily collect a large amount of data about large numbers of individuals. The question is how to use that data to generate meaningful insights.

In recent years, many social scientists have championed mixed methods strategies involving both approaches. However, this typically involves quantitative and qualitative studies run in parallel or in sequence, with the hope that the results from one will inform the other. While the final results include both methods, the methods themselves are often employed separately. With quantitative ethnography, Professor Shaffer instead offers researchers a strategy to harness the power of big data to truly merge quantitative and qualitative approaches.

To appreciate the significance of quantitative ethnography, it is important to first understand the basic principles of quantitative and qualitative methods. Quantitative methods employ statistical analysis so that researchers may acquire data from a subset of the population and use it to describe a general feature of the full population. They rely on scientists obtaining a representative sample with similar characteristics to the larger population they hope to describe.

The advantage of quantitative approaches is that they provide support for claims that something observed in a sample is generalisable to the population from which the sample was taken. Statistical techniques do this by distinguishing between ‘true’ characteristics of the population and the normal random variations among the individuals within it. This often requires relatively large samples, which helps to ensure that the random variations cancel out and the systematic effects – the true characteristics – can be identified.
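The way larger samples let random variation cancel out can be shown numerically. The following sketch is an illustration only: it assumes a hypothetical population with a true mean of 100 and a standard deviation of 15, draws many samples of each size, and measures how much the sample means scatter. The standard result is that this scatter shrinks in proportion to one over the square root of the sample size.

```python
import random
import statistics

def sample_mean_spread(sample_size, n_samples=500,
                       true_mean=100.0, sd=15.0, seed=1):
    """Standard deviation of the sample means across many repeated samples."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(true_mean, sd) for _ in range(sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

for n in (10, 100, 1000):
    # Theory predicts a spread of roughly sd / sqrt(n).
    print(f"sample size {n}: spread of sample means = "
          f"{sample_mean_spread(n):.2f}")
```

As the sample size grows, the sample means cluster ever more tightly around the true value, which is what allows a systematic effect to stand out from individual variation.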

Ethnography, on the other hand, is a qualitative field that aims to describe and understand the structures of societies and cultures. In contrast to the large, impersonal sample sizes of quantitative analysis, ethnographic methods rely on in-depth observations of small groups of people, with the goal of creating ‘thick’ descriptions of why this particular group of people do the things they do.

Ethnographers gather information to create a portrait of people’s lived experiences that puts their activities in context and lets researchers make sense of their behaviour. The object is to identify the connecting causes between behaviours that explain how and why something happens a particular way in a particular context, not to make claims about the broader population. To make sure their analyses are sufficiently robust, or ‘thick’, qualitative researchers continue collecting data until they reach theoretical saturation, a point at which new observations act to confirm existing hypotheses rather than providing additional insights.

Successfully merging these two approaches requires a method for providing ‘thick’ descriptions of a large population, using both the general claims of quantitative methods and the deep insights of qualitative methods.

Finding Meaning in Chaos

Professor Shaffer argues that big data provides a unique opportunity for advancing social sciences that we have only begun to explore. In his own field of education, more data about learning is now available than ever before – data from large-scale online courses, educational games and simulations, computer-based tests, and other digital learning programs and tools. Big data offers a unique combination of the massive numbers of individual participants required for statistical analysis and the incredible depth of information about each one that allows for qualitative analysis.

Professor Shaffer’s new book, Quantitative Ethnography, shows that the incredible amount of data available makes it possible to incorporate qualitative and quantitative analysis within the same conceptual framework. Quantitative Ethnography begins with the idea that learning is a process of becoming part of a culture, and that the manifestation of a culture is a Discourse, the patterns of communication and interactions characteristic of that culture.

However, Discourse (with a big D) is not observable. What can be observed is discourse (with a little d), the actual things that people say and do and the many ways that they act and interact. To understand a culture of learning, researchers need to find a way to go from discourse to Discourse: to take the specific things that students said and did and work out their meanings.

A key part of that process is coding. Codes (with a big C) provide a system for interpreting parts of a Discourse, or for understanding what things mean within a culture. Becoming part of a culture (enculturation) therefore requires learning the right Codes. For example, when archaeologists analyse soil, they describe its characteristics – things like colour and texture – in consistent terms that any other archaeologist will easily understand. To make their descriptions consistent, archaeologists use tools like a Munsell Colour Chart to measure and describe soil.

Things like a Munsell Colour Chart and the texture of soil are Codes in the archaeologists’ culture that are different from the Codes that farmers would use to describe the same patch of ground. For one thing, most farmers do not use a Munsell Colour Chart. Both the culture of archaeologists and the culture of farmers have consistent Codes for describing soil, because soil is important to each culture.

The study of learning means identifying the Codes within patterns of discourse, and placing them into the context of the culture being studied. But as with big D Discourse, big C Codes are not directly observable. Researchers need to identify codes (with a little c): the things people say or do that provide evidence for the Codes they are using.
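One simple way to picture the step from little c codes to big C Codes is a keyword search over what people say. The sketch below is only an illustration built on the archaeology example; the article does not describe how Professor Shaffer's team codes data in practice. The codebook, its Code names and its word patterns are all hypothetical.

```python
import re

# Hypothetical codebook: Code names and patterns are illustrative assumptions.
CODEBOOK = {
    "SOIL_COLOUR": re.compile(r"\b(colou?r|hue|reddish|brown)\b", re.IGNORECASE),
    "SOIL_TEXTURE": re.compile(r"\b(texture|sandy|silty|clay)\b", re.IGNORECASE),
    "MUNSELL_CHART": re.compile(r"\bmunsell\b", re.IGNORECASE),
}

def code_utterance(text):
    """Map each Code to 1 if the utterance contains evidence for it, else 0."""
    return {code: int(bool(pattern.search(text)))
            for code, pattern in CODEBOOK.items()}

utterances = [
    "Check the Munsell chart before you write down the colour.",
    "This layer feels sandy, almost like beach sand.",
    "Let's break for lunch.",
]
for utterance in utterances:
    print(code_utterance(utterance), "<-", utterance)
```

Keyword matching of this kind is crude, and any real codebook would need to be checked against human judgement of what the speakers actually meant.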

Critically, however, simply identifying the Codes in a Discourse does not provide a ‘thick’ description of a culture. To understand why people act the way they do, we also need to understand how the Codes are related to one another.

For example, a Munsell Colour Chart is meaningful to an archaeologist because it is a way to systematically record the colour of soil – as well as a whole set of practices about how to select, grade, and record soil samples. In this sense, soil and Munsell Colour Charts and other Codes from the Discourse of archaeology are related to one another. And learning to become an archaeologist, in turn, means learning these Codes, and learning how they are related to one another.

Professor Shaffer and his team have also developed a set of tools and techniques for quantitative ethnography – a set of methods for merging statistical and ethnographic analyses. This toolkit includes Epistemic Network Analysis, or ENA, a statistical network analysis technique used to model how Codes are related to one another in a set of data. ENA models visualise a system of connected Codes, and let researchers use statistical methods to quantify and test the differences between them. Tools like ENA let researchers test whether a description of a group of people is theoretically saturated.
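The underlying idea of modelling connections between Codes can be sketched with a simple co-occurrence count. This is not the ENA algorithm itself, which the article does not describe in detail; it is a minimal illustration that counts how often two Codes appear in the same utterance. The two groups and their coded data are invented for the example.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(coded_lines):
    """Count how often each pair of Codes appears together in one line."""
    counts = Counter()
    for codes in coded_lines:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(codes)), 2):
            counts[pair] += 1
    return counts

# Hypothetical coded data: each inner list holds the Codes found in one utterance.
novices = [
    ["SOIL_COLOUR"],
    ["SOIL_TEXTURE"],
    ["SOIL_COLOUR", "SOIL_TEXTURE"],
]
experts = [
    ["SOIL_COLOUR", "MUNSELL_CHART"],
    ["SOIL_COLOUR", "MUNSELL_CHART", "SOIL_TEXTURE"],
    ["SOIL_TEXTURE", "SOIL_COLOUR"],
]

novice_net = cooccurrence_counts(novices)
expert_net = cooccurrence_counts(experts)
for pair in sorted(set(novice_net) | set(expert_net)):
    print(pair, "novices:", novice_net[pair], "experts:", expert_net[pair])
```

In this invented data, the experts connect the Munsell chart to soil colour while the novices never do. A difference in connections of this kind is what a network model of Codes is meant to expose and then test statistically.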

In the context of learning research, quantitative ethnography provides an approach that allows researchers to identify meaningful differences in enculturation among different groups of learners. Classical statistical methods allow researchers to determine whether learning tools are effective for students, but quantitative ethnography allows researchers to discover why – and to show teachers and parents and other educators how their own students are thinking about a subject and what connections between ideas the students still need to make.

Professor Shaffer’s methods add a new layer of understanding that takes researchers beyond just describing whether a curriculum or online learning system works, and into a realm where they can identify critical components of learning that shape the success of a program.

The Future of Big Data & Learning

Technological advances over the past two decades have revolutionised how we as human beings consume and produce data. These changes challenge researchers to find new ways to meaningfully analyse the enormous amount of data that is now available.

Professor Shaffer’s quantitative ethnography gives scientists the analytical tools to both describe learning and understand how and why learning occurs. By merging tools for testing statistical significance with techniques to generate deep understanding, the methods described in Quantitative Ethnography allow researchers to illuminate new paths forward in teaching, in learning and in understanding the how and why of human behaviour.

Meet the researcher

Professor David Williamson Shaffer, Ph.D.
Department of Educational Psychology
University of Wisconsin-Madison
Madison, WI


Professor David Williamson Shaffer is an internationally recognised expert on teaching and assessing modern skills through educational games. He is best known for his work using quantitative ethnography to measure complex thinking, and for the development of Virtual Internships that teach and assess students in high school, college and corporate training. Professor Shaffer completed his M.S. and Ph.D. at the Media Laboratory at the Massachusetts Institute of Technology, and has taught in the Technology and Education Program at the Harvard Graduate School of Education. He is currently the Vilas Distinguished Professor of Learning Sciences at the University of Wisconsin-Madison, the Obel Foundation Professor of Learning Analytics at Aalborg University in Copenhagen, and a Data Philosopher at the Wisconsin Center for Education Research. Professor Shaffer served in the United States Peace Corps and was a 2008-2009 European Union Marie Curie Fellow. He is the author of How Computer Games Help Children Learn and his most recent book is Quantitative Ethnography, an introduction to the new science of studying the human side of big data.


T: (+1) 608 265 4602


James Gee, Arizona State University

Arthur Graesser, University of Memphis

Morten Misfeldt, Aalborg University

Dragan Gašević, University of Edinburgh


U.S. National Science Foundation


DW Shaffer, Quantitative Ethnography, Madison, WI: Cathcart Press, 2017. Print and digital.