Wacky Web Bots

Note: This session is recorded so you can all watch it later.

Email: hdeep2@illinois.edu

Outline

  • The built-in webbrowser library, to navigate to course pages and book chapters

  • Selenium, to scrape DSC information

Time Suggestions

  • For those following along asynchronously or reusing these slides (which is 100% allowed), I've added timing guides throughout. They are only suggestions, so don't worry if you can't finish the work in time. Installation in particular can be really tricky.

Intro (5 minutes)

About Me (general)

  • I'm Harsh Deep, a junior studying Statistics and Computer Science at UIUC. hdeep2@illinois.edu

  • I like teaching: I've spent 5 semesters working as a CA, helping teach the intro course CS 125.

  • I also like open source, HCI research (human-AI teaming and eye tracking), cats, reading, and watching interesting animation from around the world.

Why Web Bots

  • Almost everything is on the internet now. A gold mine of information exists which can enable us to do so many things.

  • They're fun and impress people.

  • Automation saves time, prevents errors, and lets us focus on more important things.

Not A Gimmick

  • I used to think web bots were a gimmick.

  • Then, in my first tech internship, I built three different bots for extracting event info and automating social media.

  • Not only can they make things easier in your life, they can also create business value.

Uses

  • Testing systems automatically

  • Gathering information from the internet. Most of the internet is not available through an API.

  • Making many poorly organized resources more accessible.

Opening Browsers

Setup (10 minutes)

Setup Selenium

If you can't get this to work, don't worry. This session is recorded and the slides are all available online. I'm also happy to answer questions over email afterward.

  • Open your command line and run the two commands below. Use pip3 instead of pip if you have been using python3 so far.
    pip install selenium
    pip install webdriver_manager
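
If you want to confirm the install from Python, here is a minimal sketch. It only checks that the two modules can be found; it does not check that a browser driver works. The helper name missing_packages is made up for this example.

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the module names that Python cannot find."""
    return [name for name in module_names if find_spec(name) is None]


missing = missing_packages(["selenium", "webdriver_manager"])
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("selenium and webdriver_manager are ready to import.")
```

If something shows up as missing, rerun the pip command with the same Python you use to run your scripts.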
    

Couldn't Get It To Work

  • Don't worry.

  • This will all be recorded and the slides are all available online.

  • Feel free to email me questions afterward too.

Code Along Task (10 minutes)

  • Make the UIUC Course Explorer Easier to Navigate (20 minutes)

  • Problem: When you use the UIUC Course Website, you can't jump directly to the course you want.

  • Take command-line input, which is common in real-life scenarios.

  • Assume valid uppercase inputs like "CS 125" or "CWL 114".

  • Final code
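
For anyone following along without the linked code, here is a minimal sketch of the idea, not necessarily the code from the session. It assumes each course lives at https://courses.illinois.edu/schedule/terms/SUBJECT/NUMBER; that URL pattern is my assumption, so check it in your browser first. The names parse_course and course_url are made up for this example.

```python
import sys
import webbrowser

# Assumed URL pattern (not stated on the slides); verify it in a browser.
BASE_URL = "https://courses.illinois.edu/schedule/terms"


def parse_course(args):
    """Turn ["CS", "125"] or ["CS 125"] into ("CS", "125")."""
    parts = " ".join(args).split()
    if len(parts) != 2:
        raise ValueError('expected input like "CS 125"')
    subject, number = parts
    return subject, number


def course_url(subject, number):
    """Build the page URL for one course under the assumed pattern."""
    return f"{BASE_URL}/{subject}/{number}"


# Only opens a browser when a course was passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    webbrowser.open(course_url(*parse_course(sys.argv[1:])))
```

Saved as, say, course.py, this would be run as: python course.py CS 125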

Exercise 1 (5 minutes) - Chapter goto

  • We want to jump to chapters of Automate The Boring Stuff with Python

  • The URL for each chapter ends with the chapter number, for example "https://automatetheboringstuff.com/2e/chapter12/". Chapters run from 0 to 20, both included.

  • Command-line input: automate.py 5
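
Try it yourself first. If you get stuck, here is one possible sketch of automate.py, using the URL pattern and the 0 to 20 chapter range given above. The function name chapter_url is made up for this example.

```python
import sys
import webbrowser

FIRST_CHAPTER, LAST_CHAPTER = 0, 20  # range given in the exercise


def chapter_url(chapter):
    """Return the URL for a chapter, given its number as an int or a string."""
    number = int(chapter)
    if not FIRST_CHAPTER <= number <= LAST_CHAPTER:
        raise ValueError(f"chapter must be between {FIRST_CHAPTER} and {LAST_CHAPTER}")
    return f"https://automatetheboringstuff.com/2e/chapter{number}/"


# Only opens a browser when exactly one chapter number was passed.
if __name__ == "__main__" and len(sys.argv) == 2:
    webbrowser.open(chapter_url(sys.argv[1]))
```

Running python automate.py 5 should open chapter 5 in your default browser.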

JS Approach (complex)

Selenium

Code Along Task (15 minutes)

  • Get the names of all the wonderful CS 125 Staff

  • They're an amazing set of people so perhaps we can make them all a nice thank you message.

  • Final code
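
Here is a rough sketch of the general shape, not necessarily the code from the session. It assumes Selenium 4 with Chrome installed. The staff page URL and the CSS selector are not given on these slides, so you have to pass in your own; the helper names fetch_texts and thank_you_message are made up for this example.

```python
def fetch_texts(url, css_selector):
    """Open url in Chrome and return the visible text of each matching element."""
    # Imported inside the function so thank_you_message below still works
    # on a machine where Selenium is not installed yet.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from webdriver_manager.chrome import ChromeDriverManager

    # webdriver_manager downloads a matching chromedriver and returns its path.
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    try:
        driver.get(url)
        elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
        texts = [element.text.strip() for element in elements]
        return [text for text in texts if text]  # drop empty matches
    finally:
        driver.quit()  # always close the browser, even after an error


def thank_you_message(names):
    """Build one thank-you line per name."""
    return "\n".join(f"Thank you, {name}!" for name in names)
```

To use it, inspect the staff page in your browser, find a selector that matches the name elements, and call print(thank_you_message(fetch_texts(your_url, your_selector))).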

Exercise 2 (5 minutes)

  • Feel free to work with others too. If your setup isn't working, feel free to just talk through the exercise with others.

  • We got everyone's names out of the web page; now let's use the same logic to get everyone's roles as well.

  • Use the code we've written so far as a starting point.
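
Once you have a list of names and a list of roles, one possible way to line them up is sketched below. It assumes both lists come back in the same page order, which you should check on the real page. The function name and the sample values in the usage note are made up.

```python
def pair_names_with_roles(names, roles):
    """Pair each name with the role at the same position."""
    if len(names) != len(roles):
        # A mismatch usually means one selector matched extra or missing elements.
        raise ValueError("names and roles should have the same length")
    return list(zip(names, roles))
```

For example, pair_names_with_roles(["Ada", "Grace"], ["CA", "TA"]) gives [("Ada", "CA"), ("Grace", "TA")].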

Going Further

  • Things can get more complex.

  • Instead of taking command line inputs we can have a GUI or a config file.

  • More: scrolling, forms, tickets
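
As a small illustration of the config file idea, here is a sketch that reads a list of courses from a JSON file instead of the command line. The file layout (a "courses" list with "subject" and "number" keys) and the name load_courses are made up for this example.

```python
import json
from pathlib import Path


def load_courses(config_path):
    """Read (subject, number) pairs from a JSON config file.

    Assumed layout: {"courses": [{"subject": "CS", "number": 125}, ...]}
    """
    data = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return [
        (course["subject"].upper(), str(course["number"]))
        for course in data["courses"]
    ]
```

Each pair can then be handed to whatever code opens the course page, so changing which pages open only means editing the file.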

Going Further

  • Test frameworks

  • You can even incorporate AI into all of this.

Ethical issues

Responsible bot development

Jobs

Legality and Court Battles

Bad Actors

Covid Vaccines

Automate the Boring Stuff

More Docs

Contact

Email: hdeep2@illinois.edu