Whacky Web Bots

Note: This session is recorded so you can all watch it later.

Email: hdeep2@illinois.edu

Outline

  • 1st hour - Intro and background, then making a simple program to improve navigation on the UIUC Course Website

  • 2nd hour - Selenium: having a program control the web browser to collect data, ending with ethical considerations and questions.

Time Suggestions

  • For those following along async or reusing these slides (which is 100% allowed), I've added timing guides throughout the slides as well. These are just suggestions, so don't worry if you're not able to finish the work in time. That goes especially for installation, which can be really tricky.

Intro (10 minutes)

About Me (general)

  • I'm Harsh Deep, a junior studying Statistics and Computer Science at UIUC. hdeep2@illinois.edu

  • I like teaching: I've spent 5 semesters working as a CA helping teach intro CS 125.

  • I also like open source, HCI research (human-AI teaming and eye tracking), cats, reading and watching interesting animation from around the world.

Why Web Bots

  • Almost everything is on the internet now. A gold mine of information exists which can enable us to do so many things.

  • They're fun and impress people.

  • Automation lets us save time, prevent errors and lets us focus on more important things.

Not A Gimmick

  • I used to think they were a gimmick.

  • Then, in my first tech internship, I created three different bots for extracting event info and automating social media.

  • Not only can they make things easier in your life, they can also create business value.

Uses

  • Testing systems automatically

  • Gathering information from the internet. Most of the internet is not available through an API.

  • Making many poorly organized resources more accessible.

Opening Browsers

Setup Python (15 minutes)

Couldn't Get It To Work

  • Don't worry.

  • This will all be recorded and the slides are all available online.

  • You can also ask me questions afterwards or email me later.

Code Along Task (20 minutes)

  • Make the UIUC Course Explorer Easier to Navigate (20 minutes)

  • Problem: When you use the UIUC Course Website, you can't jump directly to the course you want.
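
One possible shape for this program is sketched below. It assumes that course pages follow the pattern https://courses.illinois.edu/schedule/<year>/<term>/<subject>/<number>, which these slides don't state, so check it against a real course page first. The script name course.py, the function names, and the default year and term are placeholders made up for illustration.

```python
import re
import sys
import webbrowser

# Assumed URL pattern; verify it against a real course page before relying on it.
BASE_URL = "https://courses.illinois.edu/schedule"

# Accepts "CS 125", "cs125", " CS   125 " and similar.
COURSE_PATTERN = re.compile(r"^\s*([A-Za-z]+)\s*(\d+)\s*$")


def parse_course(text):
    """Split text such as 'cs 125' into ('CS', '125')."""
    match = COURSE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"expected something like 'CS 125', got {text!r}")
    subject, number = match.groups()
    return subject.upper(), number


def course_url(subject, number, year="2021", term="spring"):
    """Build the (assumed) URL for one course; year and term are placeholders."""
    return f"{BASE_URL}/{year}/{term}/{subject}/{number}"


def main(argv):
    # Expected call: python course.py CS 125
    text = " ".join(argv[1:])
    if not text:
        print("Usage: python course.py <subject> <number>")
        return
    try:
        subject, number = parse_course(text)
    except ValueError as error:
        print("Could not read course:", error)
        return
    webbrowser.open(course_url(subject, number))


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the URL building in its own function means the browser step stays a single line, and the pattern can be corrected in one place if the site's layout differs.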

Exercise 1 (10 minutes) - Chapter goto

  • Let's say we really like a book named Automate the Boring Stuff with Python, and we want to jump directly to a chapter.

Spec

  • The URL for each chapter ends with the chapter number, e.g. "https://automatetheboringstuff.com/2e/chapter12/". Chapters run from 0 to 20, both included.

  • (Programmers have this weird obsession with starting everything from 0)

  • Make a program that takes a chapter number on the command line, e.g. automate.py 5, and has the browser jump to the right page.
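
Here is a minimal sketch of one way to meet this spec, using Python's built-in webbrowser module. The function names are my own choice, and rejecting chapters outside 0 to 20 is an extra that the spec doesn't strictly ask for.

```python
import sys
import webbrowser

BASE_URL = "https://automatetheboringstuff.com/2e/chapter"
FIRST_CHAPTER, LAST_CHAPTER = 0, 20  # both included, per the spec


def chapter_url(chapter):
    """Return the URL for a chapter, or raise ValueError if it is out of range."""
    if not FIRST_CHAPTER <= chapter <= LAST_CHAPTER:
        raise ValueError(
            f"chapter must be between {FIRST_CHAPTER} and {LAST_CHAPTER}"
        )
    return f"{BASE_URL}{chapter}/"


def main(argv):
    # Expected call: python automate.py 5
    if len(argv) != 2:
        print("Usage: python automate.py <chapter number>")
        return
    try:
        url = chapter_url(int(argv[1]))
    except ValueError as error:
        # Covers both non-numeric input and out-of-range chapters.
        print("Invalid chapter:", error)
        return
    webbrowser.open(url)


if __name__ == "__main__":
    main(sys.argv)
```

Running python automate.py 5 should open chapter 5 in your default browser.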

JS Approach (complex)

Selenium

Setup Selenium (15 minutes)

If you aren't able to get this to work, don't worry. This will all be recorded and the slides are all available online. I'll also be open to questions over email after this session.

  • Open your command line and type the following. Use pip3 instead if you have been using python3 so far.
    pip install selenium
    pip install webdriver_manager
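
To check that both packages installed correctly, a short script along these lines can help. It's a sketch that assumes Selenium 4 and that Google Chrome is installed; these slides don't say which browser to use, so swap in your own. The function names and the driver_factory parameter are made up for illustration.

```python
def make_chrome_driver():
    """Start Chrome, letting webdriver_manager fetch a matching driver."""
    # Imported here so this file still loads if the install step failed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service)


def page_title(url, driver_factory=make_chrome_driver):
    """Open url in a browser and return the page's title."""
    driver = driver_factory()
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always close the browser, even if loading the page failed.
        driver.quit()
```

Calling print(page_title("https://automatetheboringstuff.com/")) should briefly open a browser window and print the page title. If you get an import error instead, the pip step is the first thing to recheck.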
    

Code Along Task (20 minutes)

Exercise 2 (10 minutes)

Feel free to work with others too. If your setup isn't working, feel free to just discuss with others.

  • We got everyone's names out of the web page, now let's use the same logic to try to get everyone's descriptions as well.

  • Use the code we've written so far as a starting point.

  • Make sure the descriptions match up with the right people :)

  • Don't worry if you aren't able to get this in time, the point is to get you thinking in the right direction.

  • If you aren't able to get this into code, having just a conceptual explanation is fine too. Focus on using inspect element to figure out which selectors can give the right data.
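
One way to keep descriptions matched to the right people is to find each person's container element first and then look for the name and description inside it. The sketch below shows that idea with Selenium. The three selectors are hypothetical stand-ins, since the real ones depend on the page and have to come from inspect element.

```python
# Hypothetical selectors; use inspect element to find the real ones.
CARD_SELECTOR = ".person"
NAME_SELECTOR = ".name"
DESCRIPTION_SELECTOR = ".description"


def collect_people(driver):
    """Return a list of (name, description) pairs from the loaded page."""
    # Imported here so this file still loads without Selenium installed.
    from selenium.webdriver.common.by import By

    people = []
    # Search inside each person's container so every description
    # stays with the name from the same container.
    for card in driver.find_elements(By.CSS_SELECTOR, CARD_SELECTOR):
        name = card.find_element(By.CSS_SELECTOR, NAME_SELECTOR).text
        description = card.find_element(By.CSS_SELECTOR, DESCRIPTION_SELECTOR).text
        people.append((name.strip(), description.strip()))
    return people
```

Collecting all names and all descriptions in two separate lists and zipping them would also work on a tidy page, but the pairs drift out of step as soon as one person is missing a description. Searching per container avoids that, although find_element will raise an error for a container that lacks one of the two pieces.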

Going Further

  • Things can get more complex.

  • Instead of taking command line inputs we can have a GUI or a config file.

  • More: scrolling, forms, tickets
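
To picture the config file option, here is a small sketch that reads which chapters to open from a JSON file instead of the command line. The file name bot_config.json, the "chapters" key, and the function names are all invented for this example.

```python
import json
from pathlib import Path

# Used when the config file is missing or leaves a setting out.
DEFAULT_CONFIG = {"chapters": []}


def load_config(path):
    """Read bot settings from a JSON file, falling back to the defaults."""
    config = dict(DEFAULT_CONFIG)
    config_file = Path(path)
    if config_file.exists():
        config.update(json.loads(config_file.read_text(encoding="utf-8")))
    return config


def chapter_urls(config):
    """Turn the configured chapter numbers into chapter URLs."""
    base = "https://automatetheboringstuff.com/2e/chapter"
    return [f"{base}{number}/" for number in config["chapters"]]
```

With a bot_config.json containing {"chapters": [3, 12]}, chapter_urls(load_config("bot_config.json")) gives the two chapter URLs, and changing what the bot opens no longer means touching the code.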

Going Further

  • Test frameworks

  • You can even incorporate AI into all of this.

Ethical issues

Responsible bot development

Jobs

Legality and Court Battles

Bad Actors

Covid Vaccines

Automate the Boring Stuff

More Docs

Contact

Email: hdeep2@illinois.edu


TODO: Insert link to repo with close ish example code