Dusting Off My Python Skills

These days, when I review job descriptions, I see a lot of skills that match mine popping up. Including a lot of demand for Python skills. I’ve been writing Python for years, but I’ve spent the last few years in JavaScript/TypeScript/Node land. So I’m rusty. Being in JS land taught me a lot, but now it’s time I built something to understand how a Python backend stack works in 2025. So I’m building a simple project I’ve wanted to make for a while.

I listen to Maximum Film!, a podcast that discusses and reviews movies. It features my favorite movie reviewers, who dig into specific films and the current state of the movie industry. Love that show. My favorite segment is when they each suggest an additional film to watch, based on either the one they just discussed or the feelings evoked by the conversation. They’re always great picks, and I’m always on the lookout for movie suggestions. However, the podcast has more than 400 episodes, and if I wanted a solid list of the picks, I’d have to go through each episode by hand. So I’m coming up with something better.

Enter Poetry. I’d never used it before, but when I researched how a modern Python stack might be set up, it was the first thing that came up. In Node land, you npm init and you’re on your way: packages, envs, scripts, etc., all handled by that one setup. Not so much in Python. The last Python project I built was containerized and used Robot Framework scripts to accomplish various tasks. Poetry manages all packages and dependencies within a virtual environment, keeping them separate from the main system. Digging into the documentation reveals a range of built-in commands for creating a project and publishing it to PyPI. I’m into it. So off I went.
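For reference, a sketch of the commands that got me going. The project name here is made up, and your entry point will differ; the dependencies are the ones this post ends up using.

```shell
# Scaffold a new project with a pyproject.toml
poetry new staff-picks
cd staff-picks

# Dependencies are installed into an isolated virtual environment
poetry add feedparser beautifulsoup4

# Anything run through `poetry run` uses that environment
poetry run python main.py  # or whatever your entry point is
```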

The first thing I considered was how to grab the movie titles automatically. I was tempted to scrape some web data, like a greedy AI. However, I decided against that, as I didn’t want to have to visit the show’s sites and then review each entry by hand. I instead turned to the RSS feed. A quick search on PyPI revealed a small library called feedparser, which fetches the feed and parses out specific fields. A basic flow is below.


The ingestion code will reach out to the RSS feed, parse out the information, and eventually save the data into a table.
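Eventually that “save the data into a table” step needs a schema. Here’s a minimal sketch using sqlite3; the table and column names are my guesses at this point, and nothing is final.

```python
import sqlite3


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a (hypothetical) staff_picks table -- the real schema may grow."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS staff_picks (
            id INTEGER PRIMARY KEY,
            episode_title TEXT NOT NULL,
            published TEXT,
            picker TEXT,
            movie_title TEXT,
            movie_url TEXT
        )
        """
    )
    return conn
```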

So far I’ve been able to accomplish the first couple of tasks. The basic code can be broken down as follows.

def main():
    # Fetch and parse the feed (FEED_URL is defined elsewhere in the module).
    # feedparser doesn't raise on fetch failures; it sets the `bozo` flag instead.
    feed_res = feedparser.parse(FEED_URL)
    if feed_res.bozo:
        raise RuntimeError("Error fetching feed results") from feed_res.bozo_exception

    # Get episodes from feed
    episodes_from_feed = get_episodes(feed_res)

    staff_picks_episode_block = {}

    for episode in episodes_from_feed:
        staff_picks_episode_block[episode.title] = extract_staff_picks_data(
            episode_item=episode, published=episode.published
        )

The bottom for loop collects the parsed information into a Python dictionary. Coming from a world where every schema is a JSON object, keeping what data I can in a key-value type is less of a culture shock.

{
    "episode_title": {
        "date": "episode_date",
        "staff_picks_html": "html_string"
    }
}

This schema will help me break down each episode of the podcast and create database entries for each title I’d like to save. There is further parsing to be done, though: the RSS feed returns each episode’s show notes as HTML, so once I find the block I want, I extract it.

 <p><strong>Staff Picks</strong><br/>Drea - <a href="https://www.justwatch.com/us/movie/jane-austen-wrecked-my-life"><i>Jane Austen Wrecked My Life</i></a><br/>Alonso - <a href="https://www.justwatch.com/us/movie/feng-liu-yi-dai"><i>Caught by the Tides</i></a><br/>BJ - <a href="https://www.justwatch.com/us/movie/the-ugly-stepsister"><i>The Ugly Stepsister</i></a><br/>Kevin - <a href="https://www.justwatch.com/us/movie/open-water"><i>Open Water</i></a></p>

This works out well, because the picks usually come with links to more information about each movie, which I’d like to include in the database entry for each title.
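Given the “Host - linked title” layout in that block, splitting it into per-host picks is a few lines of Beautiful Soup. This is a sketch under the assumption that every pick follows that layout; episodes that deviate will need their own handling, and `parse_picks` is a name I made up.

```python
from bs4 import BeautifulSoup


def parse_picks(staff_picks_html: str) -> list[dict]:
    """Split a Staff Picks <p> block into one dict per host's pick."""
    soup = BeautifulSoup(staff_picks_html, 'html.parser')
    picks = []
    for link in soup.find_all('a'):
        # The host's name is the text node just before the link, e.g. "Drea - "
        label = link.previous_sibling or ''
        picks.append({
            'picker': str(label).strip(' -\xa0'),
            'title': link.get_text(strip=True),
            'url': link.get('href'),
        })
    return picks
```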

One last thing I’ll note is the extract_staff_picks_data function.

def extract_staff_picks_data(episode_item, published) -> dict | None:
    episode_content = get_episode_content(episode_item)
    soup = BeautifulSoup(episode_content.value, 'html.parser')

    # Text is _usually_ wrapped in a <p> tag and we can find it using the <strong>
    #  tag that contains the text "Staff Picks"
    staff_picks_text = soup.find('strong', string='Staff Picks')

    # Not all episodes have a staff picks section
    if staff_picks_text is None:
        return None

    # Find the <p> block containing the relevant text
    pick_html_block = staff_picks_text.find_parent('p')
    return {
        'published': published,
        'staff_picks_html': pick_html_block,
    }

I’m using the Beautiful Soup library to extract the desired section. Another challenge, as noted in the comments, is that not every episode is configured the same way.

It’s been interesting, albeit not so different from building something in JavaScript. The most significant delta is type safety: not having that peace of mind before building makes me feel a little shaky. Type annotations exist for Python, though, so those will suffice. And where JS uses C-like syntax, such as in for loops, things like range objects still throw me a little. Overall, though, it feels like writing pseudocode for everything, which isn’t particularly difficult and thus isn’t a great challenge. I’m gonna have to find something harder to stretch the differences between TypeScript and Python.
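For example, annotating the pick data buys back some of that peace of mind. The StaffPick shape and function here are hypothetical, but mypy or an editor can check code like this statically, even though nothing is enforced at runtime:

```python
from typing import TypedDict


class StaffPick(TypedDict):
    picker: str
    title: str
    url: str


def pick_titles(picks: list[StaffPick]) -> list[str]:
    # range() stands in for the C-style for loop my JS muscle memory reaches for
    return [picks[i]["title"] for i in range(len(picks))]
```

Running mypy over this catches shape mismatches at edit time, which is most of what I miss from TypeScript.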