Who owns big data? This is the important question posed by Evelyn Ruppert in a recent short article which details how big data are:
the product of different actors and technologies involved in its generation (digital platforms, mobile devices, sensors, sequencers), formatting (cleaned, linked, packaged, stored, curated) and analysis (mined, visualised, correlated).
The actors involved in these practices and processes can in many ways be seen to ‘own’ big data. As a consequence, big questions need to be asked about the ownership of the insights that come from big data as they are extracted from the everyday traces people leave as they interact with one another and transact with services digitally.
In the field of education, this makes it essential to consider the more specific question of ‘who owns educational big data?’ I say this in the context of some recent research I’ve been doing that focuses on Pearson plc, the world’s largest commercial educational publisher. Pearson is positioning itself as a major actor in the generation and analysis of educational big data.
I recently published a research article in the European Educational Research Journal on what I described as Pearson’s ‘digital methods.’ The research tried to identify some of the many research methods that Pearson is using to make sense of education, and specifically looked into the data science methods used by Pearson’s Centre for Digital Data, Analytics and Adaptive Learning.
My argument was that Pearson is becoming a methodological gatekeeper with the capacity to carry out new forms of educational research using large-scale datasets, big data and data science methods. I suggested that Pearson’s Centre for Digital Data, Analytics and Adaptive Learning is a seriously resourced commercial laboratory for educational research and knowledge production that challenges the existing methods, knowledge and theories of educational sociology, philosophy and psychology.
Pearson invited me to contribute a short blog post summarizing the key headings from the paper, which I produced under the title Educational data, Pearson and the ‘theory gap’. Pearson then published an accompanying response entitled Why the world’s leading learning company has to love data.
One of the things I noted was that Pearson’s senior data analysts at the Centre for Digital Data, Analytics and Adaptive Learning are talking about an emerging ‘theory gap between the dramatic increase in data-based results and the theory base to integrate them.’ The contention here is that ‘the billions of bits of digital data generated by students’ interactions with online lessons as well as everyday digital activities’ are revealing the need for new theories of learning that have not been conceptualized in previous research.
I also suggested that Pearson clearly sees for itself a key role in closing that theory gap, using its significant commercial resources to conduct big data analyses on the data generated from its e-learning products at vast scale. In short, Pearson is positioning itself to re-theorize learning through the explanatory lens of big data. It’s applying a kind of big data imaginary to the analysis and conceptualization of learning which assumes that massive quantities of data can reveal truthful and meaningful patterns about the reality they’re taken from, though in their response Pearson’s analysts were careful to differentiate their approach from more crude forms of decontextualized data analytics.
For me, this raises an issue that’s not explicitly explored in either my piece or in Pearson’s detailed response to it. It’s an issue that returns me to the question of ‘who owns big data?’ For if we can now understand Pearson to ‘own’ a significant chunk of the big data produced about education and learning, and if Pearson is intending to use big data to both identify and fill a theory gap in existing understandings of learning processes, then might it be seen to ‘own’ educational theory?
Few education departments in universities have the big data infrastructure to conduct the kinds of advanced data scientific studies that Pearson is able to do (Stanford University’s recent dedication to learning analytics is a notable exception here, but Stanford has a long-standing synergy with Silicon Valley, which is where the big data imaginary is socially, culturally, politically and economically located). This means that as big data gains credibility as the source for educational knowledge production and theorizing, it is likely that legitimacy will flow towards those centres able to conduct such analyses.
In other words, there’s a political economy dimension to educational theorizing as it seems to be migrating towards well-resourced commercial research centres like those of Pearson. How ‘learning’ is theorized looks increasingly to be led by for-profit actors with the in-house expertise and technical capacity to generate insights from big data, who might then stand to gain commercially by designing and patenting e-learning software resources on the basis of the theories they’ve generated, essentially a case of locking in a theory to a specific technical innovation. Audrey Watters suggests that the technological future of education is one in which software patents become the educational theory:
This version of the future does not guarantee that these companies have developed technologies that will help students learn. But it might mean that there will be proprietary assets to litigate over, to negotiate with, and to sell.
As the Pearson experts from the Center for Digital Data, Analytics and Adaptive Learning put it in their response to my blog post, they are motivated by a ‘theory of action’:
Better data analysis → better understanding of students’ attributes/curriculum/learning trajectories → better instructional decisions → improved learner outcomes.
By using better data analysis techniques applied to data captured from better designed activities, we hope to build more complete and accurate models of learners’ knowledge, skills, and attributes that will provide better information to teachers and learners and provide systems that are relevant to each student’s individual proficiency levels, interests, and current states.
In education departments we are used to tracing the provenance of educational theories to their original thinkers. As a big data imaginary increasingly infuses educational thinking and research, and educational analyses are increasingly performed by profit-making companies with the relevant big data infrastructure, might we need to address the question of who owns educational theory?