‘Stochastic parrots’: Abeba Birhane on how AI compounds harmful biases

Posted on : 2024-06-10 09:48 KST Modified on : 2024-06-10 09:48 KST
Abeba Birhane, an eminent thinker on AI and ethics, is one of the keynote speakers at the Hankyoreh Human & Digital Forum on June 12
Abeba Birhane, a professor at Trinity College Dublin.

Abeba Birhane, a professor at Trinity College Dublin, is a cognitive scientist lauded for opening new possibilities in the field of AI ethics by auditing the data used to develop AI, and a keynote speaker at the 3rd Hankyoreh Human & Digital Forum, which will be held on June 12.

Birhane was listed among Time Magazine’s 100 Most Influential People in AI 2023 for shedding light on how current AI models are more likely to reinforce harmful biases and stereotypes about people of color, women and minorities. 

The Hankyoreh spoke to Birhane, who is also serving on the UN’s AI Advisory Body, via email before her visit to Korea. 

Hankyoreh: As a cognitive scientist, you seem to be more concerned than expected about the current development of AI based on big data. In what ways are artificial intelligence and human intelligence close, and in what ways are they far apart? 

Birhane: Although it is common to hear phrases such as “AI models exceeding human capabilities” or “models performing better than a human,” I think these comparisons between humans and AI are like comparing apples and oranges. They make little sense, if any at all. We have yet to fully grasp what human cognition is (and there is no single representative human or single benchmark here), so we cannot sensibly pit it against machine intelligence.

Hankyoreh: As a cognitive scientist and computer scientist, do you think AI can have values and common sense? How can this be achieved? 

Birhane: The question of common sense in AI is an old one, and scholars such as Melanie Mitchell and Gary Marcus have done extensive work on it. Although there have been some developments that suggest some capability for common sense, I think for the most part these systems remain stochastic parrots that regurgitate their underlying data. As for values, AI is not an entity that exists independent of those that build it, so the values of AI often boil down to the values of those that create it. My colleagues and I actually looked at this very question in a 2021 paper entitled “The Values Encoded in Machine Learning Research” and found that these values include “performance, novelty, generalization, efficiency, scale, and so on.” These are values that concentrate power in the hands of tech corporations and stand in opposition to values such as justice and fairness.

Hankyoreh: Current AI systems have been criticized for being structurally biased against people of color, women and the disadvantaged. You have done some intensive data analysis to reveal this. You’ve also been credited with breaking new ground in auditing AI training datasets. Can you explain how this works? 

Birhane: Over the past several years, a robust body of empirical work has demonstrated that AI systems fail. And when these systems fail, individuals and communities at the margins of society are often disproportionately negatively impacted. With colleagues, I have carried out numerous audits of large-scale training datasets, examining the presence of hateful content as well as racial and gender stereotypes. We have found that, as the scale of these datasets increases, so does the presence of hateful content. Furthermore, we have found that these gigantic datasets encode and exacerbate negative societal and historical stereotypes against people of color, women and otherwise marginalized groups.

Hankyoreh: Recent studies have shown that discrimination and bias do not disappear as the scale of AI models increases, but rather become stronger. Your research has also shown that discrimination, bias and hateful content actually expand as data size increases.

Birhane: Scale, or scaling up, is one of the top values the field of machine learning aspires towards. Scale is, furthermore, perceived as a solution to numerous issues. Canonical and influential papers, including those coming from corporate labs such as Google, have gone so far as to claim that “scale beats noise” — that the good, the bad and the ugly will balance out and end up with something closer to the “ground truth.” What we did to test this hypothesis was evaluate two popular datasets — one containing 400 million image-text pairs and the other 2 billion — and see if the presence of hateful content would balance out, decrease or increase as scale increased. We found that as training data was scaled from 400 million samples to 2 billion samples, the amount of hateful content increased by 12%. The conclusion from this is that, as the scale of a dataset increases, so does hateful content.

Hankyoreh: You have criticized AI for spreading mainstream mindsets: white, male, Western, wealthy. You have also referred to AI algorithms that are imbued with this mainstream worldview as “digital colonialism.” Can you explain what you mean by this? 

Birhane: AI technologies tend to power surveillance; encode and exacerbate inequity and injustice; concentrate power in the hands of a few; and benefit the status quo, which reflects Western, wealthy, white, male values. Such Western-developed AI is then imposed as a solution to numerous (sometimes complex social and historical) problems on the Global South, often without adequate consideration for local needs, values and context. This constitutes digital colonialism: it undermines and suppresses the development of indigenous technologies that are much better suited to indigenous problems; further widens and perpetuates power imbalances and inequity; and is often rooted in a colonial mentality of “white-saviorism” that boils down to “We know what your problems are and we are coming to solve them for you with our technology.” This is often done without consultation or participation of indigenous populations. So, as you can see, not only is such tech unfit to solve local problems, but these populations are also increasingly given very few options to refuse it.

Abeba Birhane presents on the topic of AI and Big Tech accountability at a talk hosted by the Dalai Lama on Oct. 13, 2022, on interdependence, ethics and social networks. (from the Dalai Lama’s website)

Hankyoreh: You spoke with the Dalai Lama about the responsibilities of Big Tech. What do you think is the biggest problem with Big Tech-led AI development, and how do you think Big Tech should be held accountable? 

Birhane: It was a great honor to meet His Holiness the Dalai Lama and an amazing privilege to have a conversation with him about technology, AI, responsibility and accountability. The biggest problem we face is the massive, unchecked and unprecedented power and influence that the AI industry, and more particularly Big Tech, currently holds. This power and influence allow Big Tech to exploit the masses, dictate the direction of AI (research, policy, public perception of AI and AI risks, and everything in between) and the focus of AI governance, all with little accountability. For example, as I write this, Google is introducing a feature called “AI Overview” to Google search. Several people have pointed out that the information this AI Overview provides is flat-out incorrect and sometimes extremely dangerous, recommending that users “eat at least one rock a day,” “add glue to pizza sauce,” or “jump off the Golden Gate Bridge” as a solution to depression. This is not only dangerous and irresponsible, but it also seriously undermines knowledge and the public’s trust in technology. Yet in the face of this serious and dangerous failure, Google has not taken down the feature. There are no regulatory mechanisms to hold the company accountable. The corporation is too big and too powerful to care, so this faulty and dangerous feature continues unabated. 

Hankyoreh: There are several attempts to regulate the development of AI driven by Big Tech, such as the EU’s AI Act and Biden’s action to create AI safeguards. Can you touch on the possibilities and limitations of these attempts? 

Birhane: Yes, we are witnessing several developments in the regulatory space, and these are all encouraging. These developments hold promise for a technological future where the welfare and well-being of humans are at the core of tech development. As tech companies do seem to respond to regulation and public outrage, putting in place enforceable regulation is essential. However, there are some obstacles to this, including Big Tech lobbyists who are undermining and watering down proposed regulations. 

Hankyoreh: You have said that the regulation of AI is also dominated by Big Tech, which can be a conflict of interest. Who should regulate AI? 

Birhane: Let me clarify that I am not advocating for the exclusion of Big Tech from regulatory discussions. It brings an important perspective to the table. However, these corporations rely on the mass adoption of their technology. Their capitalist business model — which tends toward maximizing profit — creates a conflict of interest with the development of just, human-centered AI. These two values often sit in opposition to each other. To be precise, what I am objecting to is Big Tech’s overwhelming dominance over, and capture of, the entire regulatory space. Multiple stakeholders, including academics, civil society, and those who are disproportionately negatively impacted by tech, are the expert groups best suited to recognize the harms arising from tech and therefore best suited to contribute to sensible regulation. 

Hankyoreh: What efforts are needed to ensure that AI is used to help working people and the disadvantaged, rather than for the benefit of Big Tech? 

Birhane: Again, one of the most effective mechanisms is regulation: put in place regulatory guardrails that protect people from practices such as aggressive data harvesting and dark patterns, mechanisms that ensure a given technology is thoroughly tested before it is put on the market, mechanisms that allow [people] to refuse a technology, and substantial penalties when these rules are breached. At the end of the day, regulation that empowers working people is regulation developed with the interests and welfare of working people in mind, not the interests of a handful of tech corporations. 

Hankyoreh: As a UN adviser on AI, what role do you think the UN should play in the context of conflicting stakeholders, and what are some of its biggest challenges? 

Birhane: As far as current governance bodies go, the UN High Level AI Advisory Body is one of the most diverse, with stakeholders ranging from industry and academia to government and civil society. This is encouraging, but it also has its own challenges. The challenges include sensibly representing the views and values of all stakeholders. Oftentimes, even with bodies comprising such diverse stakeholders, the views and values of those at the margins of society tend to be systematically left out or barely incorporated. Whether the UN body avoids this remains to be seen. 

Hankyoreh: You said that developers should be able to go beyond their technical role and ask questions about the political implications of the knowledge contained in the data. Do you think ethics training and critical self-reflection for developers could be an alternative? 

Birhane: Much of the technology that is developed by Big Tech and startups alike does not remain in the lab but rather finds its way into society, sometimes with dire implications for individuals and groups. Consequently, developing technology that serves society requires knowledge and critical awareness outside the technical realm. There are numerous avenues to this, including training and self-reflection, as well as working with domain experts in the social sciences and humanities and with civil society groups. 

Hankyoreh: You emphasize the need to consider people's “lived experiences” in AI development, rather than just machine learning based on large amounts of data. You say that minorities who have been excluded from the development process need to be given a greater voice, such as the Maori community in New Zealand who are trying to develop their own speech technology to reclaim their dying language. Can you share more about this? 

Birhane: The Te Reo Maori language technology projects in New Zealand present one of the clearest examples of technology designed, developed and deployed to solve an actual problem and to benefit small communities (not powerful developers). The main drive to build their language technologies emerged from the necessity to preserve the Maori language and culture. Speakers of Maori were prevented from speaking their language — through shaming and physical punishment — during the British colonial era. In order to resuscitate their dying language, the Te Hiku NLP project collected speech-text data from the elder members of the community. Speech recognition technology was then built to educate and serve the younger generation. Te Hiku built its own digital hosting platform and established a Data Sovereignty Protocol that allows the community to hold full autonomy and control over the data and technology. This is indeed what a technology driven by justice and equity looks like: one that decenters Western hegemony, restores erased histories, and uplifts historical and current indigenous intellectual contributions.

At Wednesday’s forum, Birhane will deliver an address titled “How does AI development led by Big Tech reproduce bias and inequality?” Her address will be followed by a discussion with Cheon Hyun-deuk, a professor of natural sciences at Seoul University.  

By Han Gui-young, research fellow at Human & Digital Research Lab 

Please direct questions or comments to english@hani.co.kr
