Wacky Web Bots

Note: This session is recorded so you can watch it later.

Email: hdeep2@illinois.edu

Outline

  • webbrowser, a built-in library, to navigate to course pages and book chapters

  • selenium to scrape DSC information

Time Suggestions

  • For those following along asynchronously or reusing these slides (which is 100% allowed), I've added timing guides throughout. These are only suggestions, so don't worry if you can't finish the work in time. That goes especially for the installation steps, which can be really tricky.

Intro (5 minutes)

Why Web Bots

  • Almost everything is on the internet now. That is a gold mine of information we can put to use.

  • They're fun and impress people.

  • Automation lets us save time, prevent errors and create business value.

Uses

  • Testing systems automatically

  • Gathering information from the internet. Most of the internet is not available through an API.

  • Making many poorly organized resources more accessible.

Opening Browsers

Setup (10 minutes)

Setup Selenium

If you can't get this to work, don't worry. This will all be recorded and the slides are all available online. I'm also happy to answer questions by email after the session.

  • Open your command line and type the following. Use pip3 instead of pip if you have been using python3 so far.
    pip install selenium
    pip install webdriver_manager
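
One way to confirm the installs worked is a quick check from Python. This is a minimal sketch, not part of the original slides: the helper name missing_packages is made up for illustration, and it only checks that the two packages can be found, not that a browser driver actually starts.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that Python cannot find as modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the two packages installed with pip above.
    missing = missing_packages(["selenium", "webdriver_manager"])
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("selenium and webdriver_manager are both importable.")
```

If something is reported as missing, re-running the pip (or pip3) command for that package is usually the first thing to try.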
    

Couldn't Get It To Work

  • Don't worry.

  • This will all be recorded and the slides are all available online.

  • Ask me questions by email afterwards too.

Code Along Task (10 minutes)

  • Make the University Course Explorer easier to navigate

  • Problem: When you use the Handbook course website, you can't jump directly to the course you want

  • Take command line input, which is common in real-life scenarios.

  • Valid inputs are upper-case course codes like "COMP30027" or "COMP10001".

  • Code

Approach

  • Figure out base_url

  • Construct the new URL: base_url + extension based on the input

  • Use webbrowser.open()
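
The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: BASE_URL is a made-up placeholder (swap in the real prefix you find on the handbook site), the script name course.py is hypothetical, and the code pattern of four capital letters plus five digits is inferred from the two example inputs.

```python
import re
import sys
import webbrowser

# Hypothetical placeholder: replace with the real handbook URL prefix.
BASE_URL = "https://handbook.example.edu/subjects/"

# Assumed format (four capital letters, five digits), inferred from
# examples such as COMP30027. Adjust it if your course codes differ.
COURSE_CODE = re.compile(r"[A-Z]{4}[0-9]{5}")


def course_url(code, base_url=BASE_URL):
    """Build the page URL for one course code, or raise ValueError."""
    if not COURSE_CODE.fullmatch(code):
        raise ValueError(f"not a valid course code: {code!r}")
    return base_url + code


def main(argv):
    """Open the course page named by the single command line argument."""
    if len(argv) != 2:
        print("usage: python course.py COURSE_CODE")
        return
    try:
        url = course_url(argv[1])
    except ValueError as error:
        print(error)
        return
    webbrowser.open(url)  # opens the URL in the default browser


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the URL building in its own function means it can be checked without opening a browser each time.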

Exercise 1 (5 minutes) - Chapter goto

  • We want to jump to chapters of Automate the Boring Stuff with Python

  • The URL for each chapter ends with the chapter number, e.g. "https://automatetheboringstuff.com/2e/chapter12/". Chapters run from 0 to 20, both included.

  • Command line input: automate.py 5
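
If you get stuck, here is one possible shape for automate.py. Treat it as a sketch rather than the reference solution: the function names are made up, and the URL pattern and the 0 to 20 range are taken from the bullets above.

```python
import sys
import webbrowser

# URL prefix taken from the chapter link shown above.
BASE_URL = "https://automatetheboringstuff.com/2e/chapter"


def chapter_url(chapter):
    """Return the URL for a chapter number from 0 to 20 inclusive."""
    number = int(chapter)  # raises ValueError for input like "five"
    if not 0 <= number <= 20:
        raise ValueError(f"chapter must be between 0 and 20, got {number}")
    return f"{BASE_URL}{number}/"


def main(argv):
    """Open the chapter named by the single command line argument."""
    if len(argv) != 2:
        print("usage: python automate.py CHAPTER_NUMBER")
        return
    try:
        url = chapter_url(argv[1])
    except ValueError as error:
        print(error)
        return
    webbrowser.open(url)  # opens the chapter in the default browser


if __name__ == "__main__":
    main(sys.argv)
```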

Selenium

Code Along Task (15 minutes)

Approach

  • Inspect element to find a pattern

  • Figure out how to match that pattern in Selenium

  • Perform an action on the selected element (click, read, hover, fill), then repeat this cycle
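
The cycle above (find a pattern, match it, act on the matches) might look like this in Selenium 4. It is a sketch under assumptions, not the code from the session: the function names are made up, the page URL and CSS selector you pass in are whatever you found with inspect element, and it assumes Chrome is installed.

```python
def element_texts(driver, css_selector):
    """Read the visible text of every element matching a CSS selector."""
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", css_selector)
    texts = (element.text.strip() for element in elements)
    return [text for text in texts if text]


def scrape(url, css_selector):
    """Open Chrome, load `url`, and return the text of matching elements."""
    # Imported here so element_texts can be tried without a browser.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    # webdriver_manager downloads a matching driver and returns its path.
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    try:
        driver.get(url)
        return element_texts(driver, css_selector)
    finally:
        driver.quit()  # always close the browser, even after an error
```

For example, scrape("https://example.com/team", "h3.name") would return the text of each matching heading, where both the URL and the selector are placeholders for the real page and pattern.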

Exercise 2 (5 minutes)

  • Feel free to work with others. If your setup isn't working, just discuss the exercise with them instead.

  • We got everyone's names out of the web page; now let's use the same logic to get everyone's roles as well.

  • Previous code
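
One possible direction for this exercise, sketched under assumptions: it supposes each person sits in their own container element with the name and the role inside it, and the three CSS selectors are placeholders for whatever inspect element shows on the real page. The function name is made up.

```python
def names_with_roles(driver, card_selector, name_selector, role_selector):
    """Return (name, role) pairs, one per person container on the page."""
    people = []
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    for card in driver.find_elements("css selector", card_selector):
        # Searching inside each container keeps a name with its own role.
        name = card.find_element("css selector", name_selector).text.strip()
        role = card.find_element("css selector", role_selector).text.strip()
        people.append((name, role))
    return people
```

Looking up the role inside each person's container, rather than collecting all names and all roles in two separate lists, avoids mismatched pairs if one person has no role listed. In that case find_element raises an error instead of silently shifting the roles.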

Going Further

  • Things can get more complex.

  • Instead of taking command line inputs we can have a GUI or a config file.

  • More: scrolling, forms, tickets
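
As an illustration of the config file idea, a bot could read its targets from a small JSON file instead of the command line. This is a sketch with made-up names: the file layout (a "base_url" string and a "pages" list) and the function load_urls are assumptions, not something from the session code.

```python
import json
from pathlib import Path


def load_urls(config_path):
    """Read a JSON config and return the full URL for each listed page.

    Assumed layout: {"base_url": "https://...", "pages": ["a", "b"]}
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # Make sure exactly one slash joins the base URL and the page name.
    base_url = config["base_url"].rstrip("/") + "/"
    return [base_url + page for page in config["pages"]]
```

Each returned URL could then be passed to webbrowser.open() or to a Selenium driver.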

Going Further

  • Test frameworks

  • You can even incorporate AI into all of this.

Ethical issues

Responsible bot development

Jobs

Legality and Court Battles

Bad Actors

Covid Vaccines

Automate the Boring Stuff

More Docs

Contact

Email: hdeep2@illinois.edu

Code: https://github.com/harsh183/sail21-whacky-web-bots/
