Learning lessons from data controversies

Ben Williamson

This is a talk delivered at OEB2018 in Berlin on 7 December 2018, with links to key sources. A video recording is also available (from about the 51-minute mark).

Ten years ago ‘big data’ was going to change everything and solve every problem—in health, business, politics, and of course education. But, a decade later, we’re now learning some hard lessons from the rapid expansion of data analytics, algorithms, and AI across society.

Image: Data controversies became the subject of international government attention in 2018

Data doesn’t seem quite so ‘cool’ now that it’s at the centre of some of society’s most controversial events. By ‘controversy’ here I mean those moments when science and technical innovation come into conflict with public or political concerns.

Internationally, politicians have already begun to ask hard questions, and are looking for answers to recent data controversies. The current level of concern about companies like Facebook, Google, Uber, Huawei, Amazon and so on is now so acute that some commentators say we’re witnessing a ‘tech-lash’: a backlash of public opinion and political sentiment against the technology sector.

Parts of the tech sector are taking this on board: the Center for Humane Technology, for example, is seeking to stop tech from ‘hijacking our minds and society’. Universities that nurture the main tech talent, such as MIT, have begun to recognize their wider social responsibility and are teaching their students about the power of future technologies, and their potentially controversial effects. The AI Now research institute just launched a new report on the risks of algorithms, AI and analytics, calling for tougher regulation.

Image: Print article on AI & robotization in teaching, from the Times Educational Supplement, 26 May 2017

We’re already seeing indications in the education media of a growing concern that AI and algorithms are ‘gonna get you’, as the teachers’ magazine the Times Educational Supplement put it last year.

In the States, the FBI even issued a public service announcement warning that the collection of sensitive data by edtech ‘could result in social engineering, bullying, tracking, identity theft, or other means for targeting children’. An ‘edtech-lash’ has begun.

The UK Children’s Commissioner has also warned of the risks of ‘datafying children’ both at home and at school. ‘We simply do not know what the consequences of all this information about our children will be,’ she argued, ‘so let’s take action now to understand and control who knows what about our children’.

And books like Weapons of Math Destruction and The Tyranny of Metrics have become surprise non-fiction successes, both drawing attention to the damaging effects of data use in schools and universities.

So, I want to share some lessons from data controversies in education in the last couple of years—things we can learn from to avoid damaging effects in the future.

Software can’t ‘solve’ educational ‘problems’ 
One recent moment of data controversy was the protest by US students against the Mark Zuckerberg-supported Summit Public Schools model of ‘personalized learning’. Summit was originally a charter school chain with an adaptive learning platform, partly built by Facebook engineers, that has been scaled up across many high school sites in the US.

But in November, students staged walkouts in protest at the educational limitations and data privacy implications of the personalized learning platform. Student protestors even wrote a letter to Mark Zuckerberg in The Washington Post, claiming assignments on the Summit Learning Platform required hours alone at a computer and didn’t prepare them for exams.

They also raised flags about the huge range of personal information the Summit program collected without their knowledge or consent.

‘Why weren’t we asked about this before you and Summit invaded our privacy in this way?’ they asked Zuckerberg. ‘Most importantly’, they wrote, ‘the entire program eliminates much of the human interaction, teacher support, and discussion and debate with our peers that we need in order to improve our critical thinking…. It’s severely damaged our education.’

So our first lesson is that education is not entirely reducible to a ‘math problem’, nor can it be ‘solved’ with software. It exceeds whatever data can be captured from teaching and learning processes. For many educators and students alike, education is more than the numbers in an adaptive, personalized learning platform, and includes non-quantifiable relationships, interactions, discussion, and thinking.

Global edtech influence raises public concern
Google, too, has become a controversial data company in education. Earlier this year it launched its Be Internet Awesome resources for digital citizenship and online safety. But the New York Times questioned whether the public should accept Google as a ‘role model’ for digital citizenship and good online conduct when it is seriously embattled by major data controversies.

Image: The New York Times questioned Google positioning itself as a trusted authority in schools

Through its education services, it’s also a major tracker of student data and is shaping its users as lifelong Google customers, said the Times. Being ‘Internet Awesome’ is also about buying into Google as a user and consumer.

In fact, Google was a key target of a whole series of Times articles last year revealing Silicon Valley influence in public education. Silicon Valley firms, it appears, have become new kinds of ‘global education ministries’—providing hardware and software infrastructure, online resources and apps, curricular materials and data analytics services to make public education more digital and data-driven.

This is what we might call ‘global policymaking by digital proxy’, as the tech sector influences public education at a speed and international scale that conventional policy approaches cannot achieve.

The lesson here is that students, the media and public may have ideas, perceptions and feelings about technology, and the companies behind it, that are different to companies’ aspirations—claims of social responsibility compete with feelings of ‘creepiness’ about commercial tracking and concern about private sector influence in public education.

Data leaks break public trust
Data security and privacy is perhaps the most obvious topic for a data controversy lesson—but it remains an urgent one as educational institutions and companies are increasingly threatened by cybersecurity attacks, hacks, and data breaches.

Image: The K-12 Cyber Incident Map has catalogued hundreds of school data security incidents

The K-12 Cyber Incident Map is doing great work in the US to catalogue school hacks and attacks, importantly raising awareness in order to prompt better protection. And then there’s the alarming news of really huge data leaks from the likes of Edmodo and Schoolzilla, raising fears that this is only going to get worse as more data is collected and shared about students.

The key lesson here is that data breaches and student privacy leaks also break students’, parents’, and the public’s trust in education companies. This huge increase in data security threats risks exposing the ed-tech industry to media and government attack. We’re supposed to protect children, they might say, but we’re exposing their information to the dark web instead!

Algorithmic mistakes & encoded politics cause social consequences 
Then there’s the problem of educational algorithms being wrong. Earlier this year, the Educational Testing Service (ETS) revealed results from a check of whether international students were cheating an English language proficiency test. To discover how many students had cheated, ETS used voice biometrics to analyze tens of thousands of recorded oral tests, looking for repeated voices.

What did it find? According to reports, 20% of the time the algorithm was getting the voice matching wrong. That’s a huge error rate, with massive consequences.

Around 5000 international students in the UK wrongly had their visas revoked and were threatened with deportation, all related to the UK’s ‘hostile environment’ immigration policy. Many have subsequently launched legal challenges, and many have won.

Data lesson 4, then, is that poor-quality algorithms and data can lead to life-changing outcomes and consequences for students, even raising the possibility of legal challenges to algorithmic decision-making. This example also shows the problem with ascribing too much objectivity and accuracy to data and algorithms. In reality, they’re the products of ‘humans in the room’ whose own assumptions, potential biases and mistakes can be coded into the software that’s used to make life-changing decisions.

Let’s not forget, either, that the test wouldn’t even have existed had the UK government not been seeking to root out and deport unwanted immigrants: the algorithm was programmed with some nasty politics.

Transparency, not algorithmic opacity, is key to building trust with users
The next lesson is about secrecy and transparency. The UK government’s Nudge Unit, for example, revealed this time last year that it had piloted a school-evaluating algorithm for school inspection, which could identify where a school might be failing from its existing data.

Many headteachers and staff are already fearful of the human school inspector. The prospect of an automated school-inspecting algorithm secretly crawling around in their servers and spreadsheets, if not their corridors, offices and classrooms, hasn’t made them any less concerned, especially as it can only rate their performance from the numbers, rather than qualitatively assessing the impact of local context on how they perform.

A spokesperson for the National Association of Head Teachers said to BBC News, ‘We need to move away from a data-led approach to school inspection. It is important that the whole process is transparent and that schools can understand and learn from any assessment. Leaders and teachers need absolute confidence that the inspection system will treat teachers and leaders fairly’.

The lesson to take from the Nudge Unit experiment is that secrecy and lack of transparency in the use of data analytics and algorithms do not win trust in the education sector: teacher unions and the education press are likely to reject AI and algorithmic assistance if it is not believed to be transparent, fair, or context-sensitive.

Psychological surveillance raises fears of emotional manipulation
My last three lessons focus on educational data controversies that are still emerging. These relate to the idea that the ‘Internet of Bodies’ has arrived in the shape of devices for tracking the ‘intimate data’ of your body, emotions and brain.

For example, ‘emotion AI’ is emerging as a potential focus of educational innovation—such as biometric engagement sensors, emotion learning analytics, and facial vision algorithms that can determine students’ emotional response to teaching styles, materials, subjects, and different teachers.

Image: Emotion AI is being developed for use in education, according to EdSurge

Among others, EdSurge and the World Economic Forum have endorsed systems to run facial analytics and wearable biometrics of students’ emotional engagement, legitimizing the idea that invisible signals of learning can be detected through skin.

Emotion AI is likely to be controversial because it prioritizes the idea of constant psychological surveillance: the monitoring of intimate feelings and perhaps intervening to modify those emotions. Remember when Facebook got in trouble for its ‘emotional contagion’ study? Fears of emotional manipulation inevitably follow from emotion AI, and the latest AI Now report highlighted this as a key area of concern.

Facial coding and engagement biometrics with emotion AI could even be seen to treat teaching and learning as ‘infotainment’—pressuring teachers to ‘entertain’ and students to appear ‘engaged’ when the camera is recording or the biometric patch is attached.

‘Reading the brain’ poses risks to human rights 
The penultimate lesson is about brain-scanning with neurotechnology. Educational neurotechnologies are already beginning to appear—for example, the BrainCo Focus One brainwave-sensing neuroheadset and application spun out of Harvard University.

Such educational neurotechnologies are based on the idea that the brain has become ‘readable’ through wearable headsets that can detect neural signals of brain activity, then convert those signals into digital data for storage, comparison, analysis and visualization via the teacher’s brain-data dashboard. It’s a way of seeing through the thick protective barrier of the skull to the most intimate interior of the individual.

Image: The BrainCo Focus One neuroheadset reads EEG signals of learning & presents them on a dashboard

But ‘brain surveillance’ is just the first step as ambitions advance to not only read from the brain but to ‘write back’ into it or ‘stimulate’ its ‘plastic’ neural pathways for more optimal learning capacity.

Neurotechnology is going to be extraordinarily controversial, especially as it is applied to scanning and sculpting the plastic learning brain. ‘Reading’ the brain for signals, or seeking to ‘write back’ into the plastic learning brain, raises huge ethical and human rights challenges—‘brain leaks’, neural security, cognitive freedom, neural modification—with prominent neuroscientists, neurotechnologists and neuroethics councils already calling for new frameworks to protect the readable and writable brain.

Genetic datafication could lead to dangerous ‘Eugenics2.0’
I’ve saved the biggest controversy for last: genetics, and the possibility of predicting a child’s educational achievement, attainment, cognitive ability, and even intelligence from DNA. Researchers of human genomics now have access to massive DNA datasets in the shape of ‘biobanks’ of genetic material and information collected from hundreds of thousands of individuals.

The clearest sign of the growing power of genetics in education was the recent publication of a huge, million-sample study of educational attainment which concluded the number of years you spend in education can be partly predicted genetically.

The study of the ‘new genetics of intelligence’, based on very large sample studies and incredibly advanced biotechnologies, is also already leading to ever-stronger claims of the associations between genes, achievement and intelligence. And these associations are already raising the possibility of new kinds of markets of genetic IQ testing of children’s mental abilities.

Many of you will also have heard the news last week that a scientist claimed to have created the first ever genetically edited babies, raising a massive debate about re-programming human life itself.

Basically, it is becoming more and more possible to study digital biodata related to education, to develop genetic tests to measure students’ ‘mental rating’, and perhaps even to recode, edit or rewrite the instructions for human learning.

It doesn’t get more controversial than genetics in education. So what data lesson can we learn? Genetic biodata risks reproducing dangerous ideas about the biologically determined basis of achievement, while genetic ‘intelligence’ tests are a step towards genetic selection, brain-rating, and gene-editing for ‘smarter kids’—raising risks of genetic discrimination, or ‘Eugenics 2.0’.

Preventing data controversies 
So why are these data lessons important? They’re important because governments are increasingly anxious to sort out the messes that overenthusiastic data use and misuse have got societies into.

In the UK we have a new government centre for data ethics, and a current inquiry and call for evidence on data ethics in education. Politicians are now asking hard questions about algorithmic bias in edtech, accuracy of data models, risk of data breaches in analytics systems, and the ethics of surveillance of students.

Data and its controversies are under the microscope in 2018 for reasons that were unimaginable during the big data hype of 2008. Data in education is already proving controversial too.

In Edinburgh, we are trying to figure out how to build productive collaborations between social science researchers of data, learning scientists, education technology developers, and policymakers—in order to pre-empt the kind of controversies that are now prompting politicians to begin asking those hard questions.

By learning lessons from past controversies with data in education, and anticipating the controversies to come, we can ensure we have good answers to these hard questions. We can also ensure that good, ethical data practices are built in to educational technologies, hopefully preventing problems before they become full-blown public data controversies.
