What is datafication? And how does it affect education? These questions were put to me ahead of a conference discussion panel recently. While writing a few notes, it quickly became apparent that I needed some categories to sort out my thinking. In simple terms, datafication refers to ways of seeing, understanding and engaging with the world through digital data. This definition draws attention to how data makes things visible, knowable and explainable, and thus amenable to some form of action or intervention. To be more specific, however, there are at least ten ways of defining datafication.
Datafication as we know it today has a long history, going back at least as far as the industrial revolution, when efforts to capture statistical knowledge of the state, society and its population were used to devise better institutions and practices of management and intervention. David Beer offers a really good historical view of the evolution of ‘metric power’.
In terms of education, Michel Foucault of course articulated how children could be counted in terms of their development, knowledge, behaviour, progress, worth, cleanliness, age, social class and character, so that they could then be ranked, supervised and disciplined more effectively. He called schools and classrooms ‘learning machines’. Within education policy, Martin Lawn and others have charted the historical rise of data in education systems. These authors have shown, for example, how the nineteenth-century Great Expositions became carefully stage-managed presentations of different states’ educational performance rates, allowing different national systems to be compared for their effectiveness in producing the labour required for social and economic progress. These early historical developments in the datafication of education have slowly given rise to the ‘global race’ that we still see in education policy today, driven by comparative analysis of performance in large-scale assessments (LSAs).
Although there are clear continuities from the past to the present, the current version of datafication through ‘big data’ also represents something of a rupture with the past. The assessment data that dominate LSAs are sampled, gathered at long temporal intervals, and slow to process. New digital datafication technologies such as ‘learning analytics’, by contrast, harvest data in real time as students complete tasks, enable high-speed automated analysis, feedback and adaptivity, and can capture data from all participants rather than a sample. They also allow individuals to be compared against each other and against aggregated norms calculated from massive datasets, rather than the broad-brush comparison of national systems enabled by LSAs.
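To make the contrast concrete, the kind of norm-referenced comparison a learning analytics platform performs can be sketched in a few lines of Python. This is a minimal illustration, not any real vendor's implementation; the event fields, student identifiers and the z-score comparison are all assumptions for the example.

```python
from statistics import mean, stdev

# Hypothetical stream of task-completion events captured from ALL students
# in a cohort, not a sample -- each record is one machine-readable data point.
events = [
    {"student": "s001", "task": "fractions-quiz", "score": 72},
    {"student": "s002", "task": "fractions-quiz", "score": 85},
    {"student": "s003", "task": "fractions-quiz", "score": 64},
    {"student": "s004", "task": "fractions-quiz", "score": 91},
]

def compare_to_norm(student_id, events):
    """Compare one student against the aggregated norm of the whole cohort."""
    scores = [e["score"] for e in events]
    norm, spread = mean(scores), stdev(scores)
    own = next(e["score"] for e in events if e["student"] == student_id)
    return (own - norm) / spread  # standard deviations above/below the norm

print(round(compare_to_norm("s003", events), 2))  # -1.14
```

The point of the sketch is the shift in scale and granularity: the norm is computed over every participant's live data, and each individual is positioned against it automatically, rather than a sampled national average being reported years later.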
In technical terms, datafication is a process of transforming diverse processes, qualities, actions and phenomena into forms that are machine-readable by digital technologies. Datafication allows things, relationships, events and processes to be examined for patterns and insights, often today using technical processes such as data analytics and machine learning, which rely on complex algorithms to join up and make sense of thousands or millions of individual data points. The technical language of datafication can get quite bewildering, proliferating into concepts and methods that are even modelled to some degree on human processes: so-called ‘cognitive’ computing, deep ‘learning’ and ‘neural’ networks.
Thinking educationally, it’s intriguing that much of the language associated with digital datafication refers to learning, training and neural processes of cognition. Datafication relies to a significant technical degree on ‘learning machines’. Algorithms have to be ‘taught’, using ‘training sets’ of past data, to determine how to act when put ‘into the wild’ to process live and less structured data. This can be done through ‘supervised learning’, which sounds rather like direct instruction, or through ‘unsupervised learning’, which is more like autodidactic learning through experience. DeepMind’s AlphaGo Zero, a highly advanced AI program, for example, learns purely from its own experience, via a reinforcement learning algorithm that rewards it for every ‘success’ it experiences. BF Skinner’s famous behaviourist ‘teaching machines’ have been encoded in algorithmic form.
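The distinction can be illustrated with a toy sketch: a supervised routine that is ‘taught’ a pass/fail boundary from a labelled training set, alongside an unsupervised routine that finds groupings in unlabelled scores on its own. Both functions and their data are hypothetical and far simpler than real machine learning, but they show where the labels, the ‘teaching’, enter the process.

```python
# Supervised 'learning': the algorithm is 'taught' with a labelled training set.
training_set = [(35, "fail"), (42, "fail"), (61, "pass"), (78, "pass")]

def train_threshold(training_set):
    """Learn a pass/fail cut-off from labelled past data (direct instruction)."""
    fails = [s for s, label in training_set if label == "fail"]
    passes = [s for s, label in training_set if label == "pass"]
    return (max(fails) + min(passes)) / 2  # midpoint between the two classes

# Unsupervised 'learning': no labels -- the algorithm finds structure itself.
def two_means(scores, iterations=10):
    """Split scores into two groups with a crude 1-D two-means clustering."""
    lo, hi = min(scores), max(scores)  # initial guesses for the two centres
    for _ in range(iterations):
        a = [s for s in scores if abs(s - lo) <= abs(s - hi)]
        b = [s for s in scores if abs(s - lo) > abs(s - hi)]
        lo, hi = sum(a) / len(a), sum(b) / len(b)  # update the centres
    return sorted(a), sorted(b)

threshold = train_threshold(training_set)        # 51.5, learned from labels
clusters = two_means([35, 42, 61, 78, 40, 80])   # two groups, found unaided
```

In the supervised case a human has already labelled every example; in the unsupervised case the groups emerge from the data alone, which is roughly the sense in which a system like AlphaGo Zero learns from its own experience rather than from human-labelled examples.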
Also, in the technical sense, datafication relies on the material infrastructure of hardware, software, servers, cables, connectors and microprocessors, all of the ‘stuff of bits’, as Paul Dourish has argued, that has to be assembled in order to generate data. The materialities of datafication significantly shape how data are generated and how they can be put to use.
Thinking epistemologically, datafication raises questions about what we can know from data. For some, datafication rests on the assumption that the patterns and relationships contained within datasets inherently produce meaningful, objective and insightful knowledge about complex phenomena. As Rob Kitchin has shown, this empiricist epistemology assumes that through the ‘application of agnostic data analytics the data can speak for themselves free of human bias or framing.’
For critics, however, this empiricist epistemology is flawed because all data are always framed and sampled; data are not simply natural and essential elements that are abstracted from the world in neutral and objective ways to be accepted at face value. As Nathan Jurgenson has put it, data do not provide a ‘view from nowhere’ because factors such as algorithms, databases, and venture capital pre-format data and so shape what may be seen or known. Data don’t tell the unbiased ‘truth’ because the data points captured and analysed are always affected by the choices of the original designers. Making sense of data is also always framed: data are examined through a particular lens that influences how they are interpreted. Jose van Dijck has described an epistemological ‘data-ist’ trust in the numbers provided through datafication.
Epistemology in this sense extends to include a methodological definition of datafication. Datafication is a process of employing certain data-scientific methods to produce, analyse and circulate data. These methods have their own social origins, or ‘social lives’ as John Law claims, and derive from and reproduce the particular epistemological assumptions of the expert groups that created them. Datafication, in other words, is both epistemological and methodological.
Datafication also raises ontological questions about what data really are. One view is that data are simply ‘out there’ waiting for collection, as suggested by the term ‘raw data’ and by the view that ‘data speak for themselves’. The other, more common in contemporary social science, is that data are inseparable from the software and knowledge employed to produce them. Data do not simply represent the reality of the world independent of human thought but are constructions about the world. These insights into the ontology of data are often associated with sociological theories of science, technology, statistics and economics; Sheila Jasanoff pulls these strands together in a recent article on ‘data assemblages’.
Moreover, data have consequences and shape individual actions, experiences, decisions and choices. In that sense, they shape and change reality; they have ontological consequences and partake in making up reality. So, ontologically, datafication is a product of the social world and of specific practices, but it also acts upon the world and on other practices, changing them in various ways. For example, Marion Fourcade has shown how the statistical practices of economists, of ‘ever-finer precision in measurement and mathematics … have constructed a wholly separate and artificial reality,’ a ‘make believe substitution’ that is entirely made out of historical and disciplinary conventions, ‘nothing more’. Yet, as Fourcade adds, if you change the statistical convention, ‘the picture of economic reality changes too’, sometimes with dramatic real-world results. As such, datafication is ontological because it has the potential to produce or perform different versions of reality, what actor-network theorists call ‘ontological politics’.
Datafication is accomplished by social actors, organizations, institutions and practices. So today we have data scientists, data analysts, algorithm designers, analytics engineers and so on, all bringing their expertise to the examination of data of all kinds. These experts are housed in businesses, governments, philanthropies, social media firms and financial institutions, whose own objectives, business plans and projects frame how and why digital data are captured and processed. In this sense, datafication can be defined socially because it is always socially situated in specific settings and framed by socially-located viewpoints.
In education, we have ‘education data scientists’ and learning analytics practitioners, engineers and vendors of personalized learning platforms, even entrepreneurs of artificial intelligence in education, all now bringing their own particular forms of expertise to the examination and understanding of learning processes, teaching practices, schools, universities and educational systems. They are supported by funding streams from venture capital firms, philanthropic donations from wealthy technology entrepreneurs, and impact investment programs, all of which direct financial resources to the datafication of education. Putting it super-simply, datafication exists because the people and institutions of society make it so.
Moreover, datafication needs to be defined socially because much data is captured from the social world: people, institutions, behaviours and the full range of societal phenomena are the stuff of data. As Geoffrey Bowker has memorably put it, ‘if you are not data, you do not exist’! People are data; societies are data. Even more consequentially, these social data can be used to reshape social behaviours. Bowker adds that as data about people are stored in thousands of virtual locations, reworked and processed by algorithms, their ‘possibilities for action are being shaped’.
The new actors undertaking datafication are invested with a certain form of data power. Expert authority, as William Davies argues, increasingly resides with those who can work with complex data systems to generate analyses, and then narrate the results to the public, the media and policymakers. This is why governments are increasingly interested in capturing the digital traces and datastreams of citizens’ activities. By knowing much more about what people do, how they behave, how they respond to events or to policies, it becomes possible to generate predictions and forecasts about best possible courses of action, and then to intervene to either pre-empt how people behave or prompt them to behave in a certain way. For example, there’s a whole ‘Data for Policy’ movement and new funding streams for ‘GovTech’ applications in the UK to realize the potential of ‘Government by Algorithm’. Evelyn Ruppert and colleagues have termed this ‘data politics’ and note that power over data no longer only belongs to bureaucracies of state, but to a constellation of new actors in different sectoral positions.
Something of an arms race is underway among organizations seeking to attain data power in education. Education businesses like Pearson are putting large financial, material and human resources into technologies of datafication, seeking both to make them commercially profitable and to make them attractive to policymakers as a source of intelligence about learning processes. Dorothea Anagnostopoulos and colleagues have written about the ‘informatic power’ possessed by the organizations and technologies involved in processing test-based data. But some of that power is now being assumed by those actors, organizations and analytics technologies that process digital learning data and turn it into actionable intelligence and adaptive, personalized prescriptions for pedagogic intervention.
Datafication can also be defined culturally, as a concept that has attained a privileged position in the view of the public, businesses, governments and the media. Increasingly, it seems, data and algorithms are invested with promises of objectivity and impartiality, at a time when human experts are not necessarily to be trusted because they’re too clouded by subjective opinion, bias and partiality. An article in the Silicon Valley ed-tech magazine EdSurge illustrated how the objectivity of data has been culturally adopted and accepted in some parts of the education sector. It claimed teachers are unable to recognize how well students are engaging with their own learning because the teachers are too subjectively biased. This speaks to a cultural narrative which frames datafication in terms of mechanical objectivity, certainty and impartiality.
But the cultural acceptance or otherwise of datafication is of course context-specific. In some European countries, such as Germany, the cultural narrative of datafication and algorithms is more contested, and perhaps more legally and politically inflected. It would be interesting to tease out how datafication in general, and the datafication of education in particular, becomes culturally embedded or not in different geographical, political and social locations. Datafication in education may appear, for example, to be a largely Anglophone phenomenon. Recently, however, a new report on ‘Learning Analytics for the Global South’ appeared, which considered ‘how the collection, analysis, and use of data about learners and their contexts have the potential to broaden access to quality education and improve the efficiency of educational processes and systems in developing countries around the world’. The datafication of education, in other words, is becoming embedded in very different cultural contexts.
Datafication is the subject of breathless utopian fantasies of real-time responsive smart cities, global Internet of Things, human-machine symbiosis, algorithmic certainty, hyperpersonalized services, driverless cars and so on—a world plastered with a new shiny surface of machine-readable data, which acts as a fuel for an automated, responsive, personalized environment which constantly moulds itself around us.
Education is affected by the same fantasies and utopian imaginaries. At last year’s British Science Festival, Sir Anthony Seldon, Master of Wellington College and VC of the University of Buckingham, presented a picture of a robotized future of schools, with ‘extraordinarily inspirational’ machines completely personalizing the education journey, ‘adaptive machines that adapt to the individual,’ that ‘listen to the voices of the learners, read their faces and study them in the way gifted teachers study their students,’ know what ‘excites’ learners and can ‘light up the brain’ through ‘intellectual excitement.’
Datafication is in this sense the subject of imagination, but imaginary visions can sometimes catalyse real-world applications, with powerful visionaries gathering coalitions of support to make reality conform to their utopian ideals. Silicon Valley entrepreneurs have animated their visions of data-driven education through the capture or donation of funding and engineering teams, for example.
In contrast to the utopian imagination, datafication is also a great source of anxiety. Concerns circulate about gross privacy invasion, panoptic dataveillance, data bias against ethnic groups, the manipulation of behaviours through persuasive design, the viral spread of computational propaganda powered by data-driven profiling and targeting, information war, data breaches, hacking and cyberterrorism, and about the way datafication reduces people to their data points, as if we are our data, perfectly knowable through our digital traces.
In education, the children’s writer Michael Rosen recently posted a tweet along these lines, writing that: ‘First they said they needed data about the children to find out what they’re learning. Then they said they needed data about the children to make sure they are learning. Then the children only learnt what could be turned into data. Then the children became data.’
Recently a lot of commentary has emerged about the social and emotional anxiety experienced by students in both schools and universities. Many of these psychological frailties are at least partly blamed on social media and other technologies that harvest data from young people and then target and manipulate them for commercial profit. Richard Freed calls it the ‘tech industry’s psychological war on kids’. These stories are now part of a cultural narrative about the dystopian, nightmarish effects of datafication on children, both in their own time and in an increasingly data-driven education.
10 Legally & ethically
Finally, there are legal, ethical and regulatory mechanisms shaping datafication. Europe is much more privacy-focused than the US, for example, as the incoming EU General Data Protection Regulation shows. So how datafication plays out—what datafication is—is itself shaped by law, ethics and politics.
In the US, for example, specific federal acts such as COPPA and FERPA exist to protect children’s privacy, and organizations like the Internet Keep Safe Coalition support compliance with them. Other organizations such as the Future of Privacy Forum exist to produce ‘policy guidance and scholarship about finding the balance between protecting student privacy and allowing for the important use of data and technology in education’. The US also has the 2015 Every Student Succeeds Act (ESSA), which has made it possible for states and schools to apply for additional funding for personalized learning technologies. So there’s a new federal act in place which performs the double task of stimulating market growth in adaptive personalized learning software and incentivizing schools to invest in such technologies in the absence (or at least shortage) of public funding for state schooling.
Of course, the ethical issues of the datafication of education are considerable and fairly well rehearsed. An interesting one is the ethics of data quality, a topic discussed by Neil Selwyn at the recent Learning Analytics and Knowledge (LAK) conference. There are significant potential consequences of poor data in learning analytics platforms. In other domains, such as healthcare and military drone operations, poor data quality can lead to disastrous, even fatal, effects. Poor-quality datafication of education may not be quite so drastic, but it has the potential to significantly disrupt students’ education by mismeasuring their progress, misdiagnosing their problems, or diverting them on to the ‘wrong’ personalized pathways.
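A toy sketch shows how little bad data it takes. Here a hypothetical adaptive platform assigns a ‘personalized pathway’ from a student's average score, and a single mis-keyed score (6 entered instead of 60) diverts the student onto the remedial track; a simple plausibility check would have flagged the bad point. All names and thresholds are invented for illustration, not drawn from any real platform.

```python
def assign_pathway(scores):
    """Hypothetical rule: route a student by their average score."""
    avg = sum(scores) / len(scores)
    return "remedial" if avg < 50 else "standard"

clean = [58, 62, 60]
dirty = [58, 62, 6]   # a single data-entry error: 6 keyed instead of 60

def validate(scores, lo=0, hi=100, max_jump=30):
    """Flag scores outside the valid range or implausibly far from the
    mean of the student's other scores (a crude data-quality check)."""
    flagged = []
    for i, s in enumerate(scores):
        rest = scores[:i] + scores[i + 1:]
        baseline = sum(rest) / len(rest)
        if not lo <= s <= hi or abs(s - baseline) > max_jump:
            flagged.append(s)
    return flagged

print(assign_pathway(clean))  # standard
print(assign_pathway(dirty))  # remedial -- misdiagnosis from one bad point
print(validate(dirty))        # [6] -- the check catches the outlier
```

The sketch is deliberately crude, but the underlying point scales: when pathway decisions are automated, a quality check of this kind is an ethical safeguard, not just an engineering nicety.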
I’m sure datafication could be cut in different ways. But hopefully these categories capture some of its complexity.