Text classification of gender-biased language in archival documentation

Bibliographic Details
Main Author:	Havens, Lucy (Author)
Format:	eBook
Language:	English
Published:	London : SAGE Publications Ltd, 2024.
Series:	SAGE Research methods: diversifying and decolonizing research.
Subjects:	Computational linguistics. Data sets.

Description
Summary:	This dataset is designed for teaching a supervised learning approach to creating natural language processing models for classifying gender biases in a text corpus. The dataset contains catalogue metadata descriptions from the University of Edinburgh's Heritage Collections' Archives catalogue, which were manually annotated for gendered and gender-biased language by Lucy Havens, Suzanne Black, Ashlyn Cudney, Anna Kuslits, and Iona Walker. The sample dataset contains examples of text that were manually annotated with all available labels from the Taxonomy of Gendered and Gender Biased Language, providing a dataset that was then used to train several gender-biased text classification models. Here, we use a subset of that dataset to focus on two labels only, "Omission" and "Stereotype," and on one type of classification task, document classification. The dataset file is accompanied by a Teaching Guide and a Student Guide, which explain how to create a text classification model with the data and evaluate the model's performance both quantitatively and qualitatively.
Physical Description:	1 online resource.
ISBN:	9781529692600 1529692601