How to collect a corpus of websites with a web crawler

Bibliographic Details
Main Author: Hodges, James A., active 2022 (Author)
Format: eBook
Language: English
Published: London : SAGE Publications, Ltd., 2022.
Series: SAGE research methods: doing research online
Online Access: Connect to the full text of this electronic book
Description
Summary: Conducting research on digital cultures often requires some form of reference to online sources, but online sources are constantly changing, being updated, or deleted on a minute-by-minute basis. This guide introduces web crawlers as one potential method for gathering a stable, trustworthy collection of online sources. A corpus of sources generated with a web crawler can function as a detailed snapshot of an online resource as it existed at a particular point in time. The guide begins with an introduction to the theory behind web crawling before moving on to ethical concerns and commonly used tools. After addressing these foundational areas, it concludes with a step-by-step demonstration of web crawling with Wget, a popular open-source, command-line tool for downloading web content.
Physical Description: 1 online resource.
ISBN: 9781529609325 (ISBN-13); 1529609321 (ISBN-10)