Research Methodology for Operationalizing Diversity and Inclusion Requirements for AI Systems

3 Jul 2024


(1) Muneera Bano;

(2) Didar Zowghi;

(3) Vincenzo Gervasi;

(4) Rifat Shams.



As depicted in Figure 1, our research methodology consists of three stages: 1) data collection and analysis from a literature review, 2) tailoring a user story template for eliciting D&I in AI requirements, and 3) a focus group.

A. Stage 1: Data Collection and Analysis

We built on our previous work, in which data were gathered through two distinct approaches: a systematic literature review (SLR) [28] and a document analysis of published ethical AI guidelines related to D&I [16].

Among other objectives, the SLR [28] aimed to identify challenges associated with D&I in AI and their corresponding solutions (guidelines, strategies, approaches, or practices). After a rigorous search and selection process, we retained a sample of 48 academic papers published between 2017 and 2022. Applying open coding to the data extracted from these papers, we identified 45 challenges related to D&I in AI.

In our document analysis of grey literature [16], we applied a systematic approach to extract guidelines related to D&I in AI from widely circulated sources, such as reports and other grey literature. A thematic analysis of the extracted list then identified 46 unique guidelines about D&I in AI systems, structured according to the five pillars of the D&I in AI definition.

Fig 1. Research Methodology

Fig 2. Data Analysis

Figure 2 outlines the rigorous and systematic approach we adopted to extract 23 unique themes related to D&I in AI from our two data sources, the SLR and the guidelines. Our complete dataset is provided as a spreadsheet.¹ The data from the literature review and the guidelines contained many conceptual and semantic repetitions, which required careful refinement to arrive at a comprehensive list of unique themes. For the grey literature, data collection and analysis were mainly undertaken by the second author, with the first author also involved in the analysis. The SLR was a collaborative effort of the first, second, and fourth authors. These two sources individually yielded 45 challenges and 46 guidelines concerning D&I in AI, which further analysis condensed into 21 and 42 distinct themes, respectively.

In the first level of thematic analysis, we streamlined several themes to eliminate semantic redundancies. For example, within the Human pillar of the SLR data, the themes related to representation, namely "imbalance of gender representation", "under-representation of marginalized groups", "lack of AI development team's diversity", and "lack of AI researcher's diversity", were consolidated into a single theme, "Representation, Diversity and Inclusion". We then applied iterative thematic analysis to further refine and merge all the derived themes. This process resulted in 23 unique themes of D&I in AI.
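The thematic analysis described here was carried out manually and iteratively by the authors. As a minimal sketch, assuming the outcome of a single consolidation step is recorded as a mapping from first-level codes to themes, the Python below groups the four representation-related codes from the example under their merged theme. The names `CODE_TO_THEME` and `consolidate` are hypothetical and are not part of the authors' published dataset.

```python
from collections import defaultdict

# Hypothetical record of one consolidation step (Human pillar, SLR data):
# each first-level code points to the theme it was merged into.
CODE_TO_THEME = {
    "imbalance of gender representation": "Representation, Diversity and Inclusion",
    "under-representation of marginalized groups": "Representation, Diversity and Inclusion",
    "lack of AI development team's diversity": "Representation, Diversity and Inclusion",
    "lack of AI researcher's diversity": "Representation, Diversity and Inclusion",
}


def consolidate(codes, code_to_theme):
    """Group codes under their consolidated theme.

    Codes with no mapping are kept as a theme of their own,
    so no code is silently dropped during consolidation.
    """
    themes = defaultdict(list)
    for code in codes:
        theme = code_to_theme.get(code, code)
        themes[theme].append(code)
    return dict(themes)


if __name__ == "__main__":
    result = consolidate(list(CODE_TO_THEME), CODE_TO_THEME)
    for theme, members in result.items():
        print(f"{theme}: {len(members)} codes")
```

Keeping unmapped codes as their own themes mirrors the iterative nature of the process: each pass only merges what has been judged redundant, and the rest carries forward to the next round of analysis.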

B. Stage 2: Tailored User Story Template

In the second stage, we used the themes identified in Stage 1 to design a tailored user story template for eliciting and specifying D&I in AI requirements. The template focuses particularly on roles that are embodied by persons with diverse attributes, or that specify system behaviors involving diverse attributes. Our hypothesis is that a structured, specialized template can help human analysts focus specifically on D&I during requirements elicitation.
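The exact wording of the tailored template is not given in this section. As a minimal sketch, assuming it extends the common "As a <role>, I want <goal>, so that <benefit>" user story format with an explicit slot for the role's diversity attributes, a D&I user story could be represented as a small record that renders itself as text. The class `DIUserStory`, its field names, and the job-applicant example are hypothetical illustrations, not the authors' template.

```python
from dataclasses import dataclass, field


@dataclass
class DIUserStory:
    """Hypothetical D&I user story record; field names are illustrative only."""
    role: str
    goal: str
    benefit: str
    # Diversity attributes embodied by the role (e.g. age, language).
    attributes: list[str] = field(default_factory=list)
    # Stage 1 theme the story addresses (optional).
    theme: str = ""

    def render(self) -> str:
        # Name the role's diversity attributes explicitly, if any are given.
        who = self.role
        if self.attributes:
            who += " (" + ", ".join(self.attributes) + ")"
        return f"As a {who}, I want {self.goal}, so that {self.benefit}."


if __name__ == "__main__":
    story = DIUserStory(
        role="job applicant",
        goal="the screening model to assess my application without penalising non-standard English",
        benefit="I am judged on my qualifications rather than my language background",
        attributes=["non-native English speaker"],
    )
    print(story.render())
```

Making the attributes a separate field, rather than free text inside the role, is one way to force an analyst to state which diversity attributes a requirement concerns, which is the focusing effect the hypothesis above describes.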

C. Stage 3: Focus Group Exercise

Using the themes identified in Stage 1 and the tailored user story template from Stage 2, we presented these artifacts, along with two example case studies, to a focus group of four experts in RE, D&I, and AI. The experts were asked to elicit D&I requirements for the AI systems described in the case studies. The group comprised three female participants and one male participant, with diverse attributes of age, race, faith, culture, language, and nationality.

D. Exploring the Utility of LLMs

Because we wanted to explore the potential for automating this process with an LLM, and to examine how effectively user stories could be written according to the template, we additionally used OpenAI's GPT-4, a popular LLM, as a tool to specify requirements. This test was based on the hypothesis that LLMs can support and complement humans in identifying relevant D&I in AI requirements and in capturing them in the user story template derived from the themes.
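This section does not report the prompts or model settings used with GPT-4. As a minimal sketch, assuming the themes and the template are simply combined with a case study description into one instruction, the Python below assembles such a prompt as a plain string. The function `build_prompt`, the `TEMPLATE` wording, the prompt text, and the recruitment-tool case study are all hypothetical, and the step of sending the prompt to a model is deliberately not shown.

```python
# Hypothetical template wording; the authors' actual template may differ.
TEMPLATE = "As a <role> (<diversity attributes>), I want <goal>, so that <benefit>."


def build_prompt(case_study: str, themes: list[str], template: str = TEMPLATE) -> str:
    """Assemble an elicitation prompt (illustrative wording, not the authors' prompt)."""
    # One bullet per Stage 1 theme the model should consider.
    theme_lines = "\n".join(f"- {t}" for t in themes)
    return (
        "You are assisting a requirements analyst.\n"
        f"Case study:\n{case_study}\n\n"
        f"Diversity and inclusion themes to consider:\n{theme_lines}\n\n"
        "For each relevant theme, write one or more user stories "
        "using exactly this template:\n"
        f"{template}\n"
    )


if __name__ == "__main__":
    prompt = build_prompt(
        case_study="A hypothetical AI-based recruitment tool that ranks job applicants.",
        themes=["Representation, Diversity and Inclusion"],
    )
    print(prompt)
```

Keeping prompt assembly separate from any model call makes it easy to reuse the same themes and template for both the human focus group materials and the LLM input, so that the two sets of elicited user stories remain comparable.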

This paper is available on arxiv under CC 4.0 license.