An Overview of Web Scraping: Technical Aspects and Exercises
Abstract
Researchers and organizations can advance their research goals by studying web scraping and applying it correctly. This study reviews several web scraping techniques and discusses the technical, legal, and ethical aspects of web scraping to clarify the benefits and risks of the process. Three web scraping exercises are presented: the first uses the BeautifulSoup library in Python, the second uses the web scraping tool Octoparse, and the third uses ParseHub. The three experiences are discussed to provide insight into how the different techniques and programs compare.
Key Terms: BeautifulSoup, Octoparse, ParseHub, Web scraping.