Genetics, big data science, and postgenomic education research

Ben Williamson

A diagram visualizing the genetic variants associated with educational attainment. Image by Emily Willoughby.

An international consortium of genetics researchers has established a link between genes and educational attainment from a study of over a million people. One of the largest genetics studies ever published in a science journal, it represents a significant step forward for the emerging field of educational genetics. The growth of genetics expertise in education also, however, raises substantial concerns about biological determinism and new forms of eugenics, and reanimates long-standing debates about the genetic inheritance of intelligence and cognitive ability.

In this post I outline some key findings of the study, but primarily focus on the significant implications and issues it raises for education research more widely. The implications of the study are that it: (1) establishes genetics as a powerful new front in educational knowledge production; (2) positions big data science as a methodological apparatus for future educational studies; (3) surfaces extreme political polarization regarding genetic factors in education that will be difficult to reconcile as genetics enters education policy debates; (4) potentially opens up a new market for commercial educational genetics products; and (5) reveals the need for new social scientific forms of engagement with, and critique of, genetics research and postgenomic science in the education field.

Gene discovery
Published in Nature Genetics at the end of July 2018 by the international Social Science Genetic Association Consortium (SSGAC) in collaboration with the consumer genetics company 23andMe, the paper ‘Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals’ reports findings showing that genetic patterns across a large population are associated with years spent in school. According to its 80 authors, ‘educational attainment is moderately heritable and an important correlate of many social, economic and health outcomes,’ and is therefore an important focus in a number of educational genetics studies.

Specifically, the scientists identified over a thousand genetic variants linked with educational attainment, particularly those involved in brain-development processes and the formation of neuronal connections in foetuses and newborns. These biological factors, the scientists claim, influence psychological development, which in turn affects how far and for how long people continue at school.

The SSGAC has been careful in reporting the results. They do not claim to have identified any single genes for education, and the data don’t predict educational attainment for individuals. The research also found that genetic variants have a far weaker effect than environmental influences on educational attainment, and was restricted to analysis of a homogeneous sample of people in their 40s and 50s of white European descent (the study failed with a sample of African-Americans). The authors produced a massive Q&A document, longer than the paper itself, to help explain and clarify the results, methods and conclusions, while downplaying the policy and practical implications of its findings. As such, the paper has been carefully published in acknowledgement of the potential controversy it could cause, and to anticipate misinterpretation and misreporting of its findings.

Nonetheless, the paper has catalysed significant media interest and social media commentary. Three days after publication, the paper had been Tweeted 1000 times, blogged multiple times, and reported in news media around the world—picking up an enormous Altmetric score in the process. There is useful coverage in the New York Times, Atlantic and MIT Technology Review reporting the key findings.

Clearly the paper is a massive advance for genetics science, in education and beyond. For those education researchers and social scientists outside of the genetics field, however, it has major implications in terms of knowledge production, methods, policy influence, and the commercialization of educational genetics.

Powerful genetic knowledge
Along with other recent advances in genetics in education, the SSGAC study instantiates the emergence of a powerful new field of knowledge production. Such research is only possible now owing to the complete sequencing of the human genome (the entire genetic structure of human DNA) over a decade ago, and since then studies in human genomics have expanded rapidly. As a result, science studies researchers claim we are now in a postgenomic age.

As a research field, educational genomics seeks to unpack the genetic factors involved in individual differences in learning ability, behavior, motivation, and achievement. Importantly, researchers of educational genomics do not assume either that there is any single genetic factor that determines learning ability, cognition or intelligence, or that genetic factors entirely explain the complexity of learning. Identifying an individual’s genotype (the full heritable genetic identity of a person) and its relationship to learning, intelligence or educational outcomes remains complex. Practitioners of educational genomics and behavioural genetics look for patterns in huge numbers of genetic factors that might explain behaviours and achievements in individuals, by studying the interaction of genotypes and environmental influences on phenotypical behaviours and traits (such as intelligence).

The SSGAC has positioned itself as a leading consortium for such postgenomic education science with the publication of their paper, but another key figure bringing genomics research into education is the behavioural geneticist Robert Plomin, co-author of the controversial G is for Genes: The Impact of Genetics on Education and Achievement. Plomin has extensively studied the links between genes and attainment using ‘genome-wide polygenic scoring’ (GPS), a method also employed in the SSGAC study. A polygenic score is produced by analysing a huge number of genetic markers, and their interactions with environmental factors, in order to predict a particular behavioural or psychological trait. As computer processing power, data storage capacity, and data analytics technologies have advanced in recent years, it has become possible to correlate huge quantities of genotypical data with a host of phenotypical traits.
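
To make the idea of a polygenic score a little more concrete, it can be thought of in its simplest form as a weighted sum: for each genetic variant, the number of copies of a given allele a person carries (0, 1 or 2) is multiplied by an estimated effect size, and the products are added up. The Python sketch below illustrates only that arithmetic. It is not the SSGAC's or Plomin's actual pipeline; the function name, the effect sizes and the two genotypes are invented for illustration, and it leaves out quality control, correlations between variants and any environmental factors.

```python
# Illustrative sketch only: toy numbers, not values from any real study.

def polygenic_score(allele_counts, effect_sizes):
    """Return the weighted sum of allele counts.

    allele_counts: copies of the scored allele per variant (0, 1 or 2).
    effect_sizes: one estimated per-allele effect per variant.
    """
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("need exactly one effect size per variant")
    return sum(count * beta for count, beta in zip(allele_counts, effect_sizes))


# Hypothetical per-variant effect estimates for three variants.
toy_effect_sizes = [0.02, -0.01, 0.03]

# Hypothetical genotypes for two people, coded as allele counts.
person_a = [2, 0, 1]
person_b = [0, 2, 0]

score_a = polygenic_score(person_a, toy_effect_sizes)
score_b = polygenic_score(person_b, toy_effect_sizes)

print(f"person A: {score_a:.2f}")
print(f"person B: {score_b:.2f}")
```

Real scores sum over hundreds of thousands of variants, each with a tiny weight, which is one reason they describe patterns across a population far better than they describe any one person.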

Under the banner of a ‘new genetics of intelligence’, Plomin and colleagues have used polygenic scores to predict academic achievement in schools. The substantial increase in heritability they found ‘represents a turning point in the social and behavioural sciences because it makes it possible to predict educational achievement for individuals directly from their DNA,’ thereby ‘moving us closer to the possibility of early intervention and personalized learning.’

While the SSGAC avoids calling for interventions based on its data, the results open up possibilities for further studies and analyses. These include studies that control for genetic influences in order to generate credible estimates of how changes in school policy influence health outcomes, studies of why specific genetic variants predict educational attainment, and studies of how the effects of genes on education differ across environmental contexts. As such, the research itself is a catalyst for further educational genomics studies.

Although educational genomics remains in its infancy, it seems likely to advance considerably in coming years, linking genotypes to phenotypical traits, behaviours and other outcomes. It will link more closely with psychology and neuroscience as associations are further established between genes and neurons, personality traits and so on. As more findings emerge, further support will grow for evidence-based scientific perspectives on learning. New forms of genetic and genomic expertise in educational matters are already emerging, and challenging existing forms of social scientific and philosophical educational research which have challenged the biological determinism of genetics for decades.

Big data science
The methodological apparatus of the SSGAC study, and other research in educational genomics and behavioural genetics, is huge—it dwarfs the technical, methodological, financial and expert resources of other forms of educational research. The SSGAC study itself is the accomplishment of a well-funded international team of 80 scientists working in departments of psychology, sociology, behavioural genetics, behavioural science, neurogenomics, economics, biosciences, health sciences, and many others. A core part of the team included more than 20 scientists from the commercial organization 23andMe, the Silicon Valley company backed by Google. The research, then, was distributed across public universities and commercial labs at huge scale and significant cost.

Beyond the size of the team and its funding, the study is also typical of the big data methods of genetic science. The data on its sample of over a million people came from two sources. One was the UK Biobank, a huge open access health resource based on a living population of over 500,000 volunteer participants, which was established by the Medical Research Council and the Wellcome Trust and opened up to scientists in 2012. One of many biobanking projects worldwide, it opens up unprecedented access to large samples of genetic data for analysis. The other data was sourced from 23andMe itself, the consumer genetics company offering health and ancestry services on a profit-making basis.

The methods described in the appendix to the SSGAC study demonstrate the quantitative and computational complexity of such large-scale genetics research. The study depends on a range of statistical methods, tests, mathematical formulae, algorithms, data visualizations, software platforms with names such as METAL and PLINK, and bioinformatics platforms called DEPICT, MTAG, PANTHER and MAGMA.

As such, the paper published in Nature Genetics is the end-result of the activities of a huge interdisciplinary science team, generous financial funding, enormous databanks from both the not-for-profit and private sectors, and highly sophisticated big data analytics methods, all powered by a vast infrastructure of bioinformatics technologies, statistical software analysis packages, data analytics and visualization. The scale of the scientific infrastructure of knowledge production is miles away from the norms of educational research.

Yet we may expect further education research to locate itself within such infrastructures of professional expertise, labs, databanks, analytics methods and software. Already, scientists are beginning to propose new multidisciplinary experimentation and intervention under the heading of ‘precision education’. Genetics and neuroscience are spectacular new fronts of big data-driven scientific research, and related subfields of educational genomics and educational neuroscience are growing fast, with the support of wealthy foundations and commercial partners. As a result, studies such as that by the SSGAC and other educational genetics teams position big data science as a new frontier of innovative and interdisciplinary education research.

Policy sciences
Researchers in the field of Science and Technology Studies (STS) have long maintained that science and politics are inseparable, and often focus their attention on scientific controversies. This is particularly the case when science enters into official policy, and is translated and manipulated to fit political agendas and policymakers’ requirements. The new genetics of education are an ideal illustration of an emerging scientific controversy in education.

The SSGAC research represents the potential for a significant shift in emphasis in education policy to embrace genetics expertise. Though the SSGAC reports no direct policy implications from its study, it is clear that policymakers seeking explanations for educational attainment would be interested in the results. As Kalervo Gulson and P. Taylor Webb have argued, new kinds of ‘bio-edu-policy-science actors’ may be emerging as authorities in educational policy, ‘not only experts on intervening on social bodies such as a school, but also in intervening in human bodies’. And science writer Antonio Regalado pointed out that one of the SSGAC authors had previously stated that once polygenic scores could be used to predict IQ, it would trigger a ‘serious policy debate’ about ‘personal eugenics’.

Commenting on the SSGAC study, John Warner cautions about how conservative economists might seek to translate the results into policy proposals. ‘How long before schools subject to performance funding as determined by graduation metrics begin to discriminate against students with low polygenic educational attainment scores?’ he asks. ‘When will automated human resources algorithms start weighing polygenic educational attainment scores when sorting through job applicants?’ These questions point to the possibility of students being grouped and clustered together by their polygenic scores, and the potential for enforcing new kinds of ‘biosocial collectivity’ within schools.

A significant problem with the potential translation of educational genomics into education policy is that genetics in education is extremely controversial and politicized. The publication in the mid-90s of The Bell Curve rekindled old debates about genetic determinism, eugenics and racialized discrimination in relation to IQ testing and the political uses of intelligence data. Concerns persist about this ‘new geneism’, and help account for the very careful, actively depoliticised packaging of the SSGAC study. A recent article in The New Statesman on the genetics of education identified deep polarization between right-wing advocates of genetics and left-wing critics, with the former preferring explanations based in biology and the latter seeking environmental explanations. A column reporting on the SSGAC study in the New York Times argued ‘progressives should embrace the genetics of education’, suggesting that ‘the power of the genomic revolution [can] be harnessed to create a more equal society’ while berating the ‘long tradition of left-wing thinkers who considered biological research inimical to the goal of social equality’.

Matters aren’t helped by the fact that some of the most outspoken advocates of genetic explanations for attainment, achievement and intelligence are divisive public figures such as Toby Young and Charles Murray (co-author of The Bell Curve). In a recent Spectator article titled ‘The left is heading for a reckoning with the new genetics’, Young attacked what he saw as liberal progressives’ ‘environmental determinism’ as ‘scientifically indefensible’. ‘Like Marx,’ he argued, ‘post-modernists believe that man’s true nature is reducible to the totality of social relations, that individuals are nothing more than the embodiments of particular class-relations and class-interests, and that everything comes down to the struggle for power. I wouldn’t expect an uncritical acceptance of the new genetics from that quarter’.

Drawing on an interview with Charles Murray, Young also speculated that left-wing sociologists in particular would likely become irrelevant unless they embraced the new genetics by the mid-2020s. For Murray, this was even a source of deep concern, since he thought ‘once left-wing intellectuals finally let go of environmental determinism they may veer too far in the opposite direction and embrace gene editing technologies like CRISPR-Cas9 to try to create the perfect socialist citizen’.

Given Young’s proximity to education policymakers and politicians under the current UK Conservative government, his comments on genetics have caused widespread alarm among academics and educators. Generating policy proposals based on educational genomics in this tense environment, then, is likely to be a continuing source of deep controversy and irreconcilable political suspicions. It appears that education policy in coming years will have to engage in significant debate about genetics and even personal eugenics, requiring informed participation by social scientists whose views on the matter are currently subject to attack and ridicule by conservative commentators. Education policy studies of this scientific and political controversy will be essential.

Genetic exploitation
With growing awareness of the increasing power of genetic science in education, it is highly likely that commercial organizations will seek to exploit the opportunity to build an educational genetics market of services and products.

Consumer companies such as Google-backed 23andMe have already exploited the opportunities made available by the sequencing of the human genome to launch genetic testing services as commercial products. As 23andMe makes up part of the team behind the SSGAC study, this commercial outfit has now not only positioned itself as part of the apparatus of education research, but could also stand to gain from extending into the provision of further educational genetics products. In the same week the SSGAC study was released, 23andMe also released details of a deal with big pharmaceutical company GlaxoSmithKline to use data from its 5 million customers of home genetics testing kits to design new drugs. The $300 million deal will see GSK and 23andMe applying artificial intelligence and machine learning to the medical discovery process, analysing genetic data from 23andMe and other sources such as UK Biobank. As a private company with vast genetic databanks, 23andMe is clearly positioning itself as a key part of the infrastructure of genetic science in pharmaceuticals and education.

Other companies are likely to see market potential in educational genetic testing products too. Already, concerns are emerging about startup companies seeking to exploit advances in human genomics research to produce genetic IQ tests. Cheap DNA kits for IQ testing in schools, in the shape of ‘intelligence apps’ or other genetic ed-tech products, may be feasible in the not-too-distant future, though considerable and understandable concern exists about their usefulness and ethics. Robert Plomin has proposed that DNA analysis devices such as ‘learning chips’ could make reliable genetic predictions of heritable differences in academic achievement, and it is easy to speculate how consumer-DNA companies could extend in this direction.

Major risks would emerge from the expansion of an educational genetics market. One is that as genetic predictions become accepted as forecasts of a child’s future ability, new approaches may emerge to ‘artificially select future generations’, a ‘eugenics 2.0’ for selecting ‘smarter kids’. While embryo screening programs probably remain unlikely in the West, large-scale efforts are already underway elsewhere to find the genetic code for high IQ. This raises the possibility of selection for intelligence becoming attractive to wealthy parents seeking genetic advantage for their children.

The merging of genetic science, big data and commercial speculation in education could lead to a new form of ‘platform scientism’, where the logics of capital accumulation and data analytics combine to push genetic testing and other profiling services in schools. The danger of such a scenario, as detailed in The Atlantic, is that obsession with these ‘slippery genetic predictions could turn people’s attention away from other things that influence how children do in school and beyond — things like their family’s wealth, the stress in their neighborhoods, the quality of the schools themselves’.

Critical postgenomic education research
The acceleration and expansion of educational genetics research as a big data science of attainment, achievement and even intelligence raises distinctive challenges for social scientific education research. Straightforward critique and rejection of genetics represents a possible form of resistance. However, within the wider field of sociology and STS research on postgenomics, researchers have begun to propose different forms of analysis and critique, with some educational researchers also working to get beyond simplistic critical reactions to new biological thinking in productive new ways.

Contemporary postgenomic science, with its emphasis on gene-environment interaction, offers an invitation for social scientists to explore how the biological and the social constitute each other. Biosocial studies, for example, acknowledge that the body, biology and brain are shaped by their social circumstances and environmental contexts. Commenting on contemporary postgenomic science, biosocial researchers argue that the social world gets ‘under the skin’ to impress upon the biological. They insist that bodies are influenced by power structures in society, becoming tangled with social, political and cultural structures and environments.

Biosocial work in education is just beginning to emerge. Developing a ‘biosocial education’ agenda, Deborah Youdell argues that learning may be best understood as the result of ‘social and biological entanglements.’ Biosocial education research therefore takes biology seriously, but also digs critically into the ways scientists have conceptualized the body and thereby made it amenable to experimentation and intervention.

A biosocial approach would seek to understand educational genetics in both biological and social scientific terms by appreciating that the social environments in which learning takes place do in fact inscribe themselves on bodies and brains. The genetic and neural data of contemporary postgenomics would have to be understood from a biosocial view as data about social processes, not only biological processes.

Since genetics is a highly data-intensive and software-saturated field of experimentation and knowledge production, a biosocial perspective would also address the implications of data processing of students’ genetic and neural details. Taking further cues from STS, it would acknowledge that data are always a partial selection, that their analysis through vast data infrastructures of methods and software packages matters a great deal to the results produced, and that the results can influence what happens in educational settings. Is the ‘quantified human’ held in a database and represented by a polygenic score really detailed enough to yield insights to intervene upon students? Additionally, biosocial research would be alive to the possible consequences of for-profit commercial companies building software platforms for collecting and analysing students’ genetic and neural information.

The million-sample SSGAC study is clearly a landmark in postgenomic education science. It is a field of experimentation and knowledge production requiring novel forms of social scientific and philosophical analysis. A biosocial approach may be one way forward, but it is clear that educationalists need to develop a range of concepts and methods in order to perform critical postgenomic education research as the genetic science of education expands and accelerates.

1 Response to Genetics, big data science, and postgenomic education research

  1. paulmartin42 says:

    Key findings: (1) via NY Times: moderate genetic influence (2) more research needed – eg bigger sample (3) 23andme PR moves fwd – see yesterday article in The Times wrt to GSK investment & recent Google Talk which is worth persisting with.