How to Turn Hacker News Stories into Podcasts with AI

Introduction

While experimenting with NotebookLM, a Google tool that can, among other things, convert your sources (research papers, YouTube videos, etc.) into conversation-style podcasts, I realized it would be great to automate podcast generation for the content we read every day. I looked for an API to integrate with, but Google hasn’t made this capability available as one. I did, however, find an alternative that offers similar functionality: Podcastfy.

What is Podcastfy?

According to the README on Podcastfy’s GitHub page:

“Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multilingual audio conversations using GenAI. Input content includes websites, PDFs, images, YouTube videos, as well as user-provided topics.”

In short, Podcastfy allows you to feed it content in various forms, such as URLs, YouTube videos, and more, and it generates a conversational-style podcast. This makes it a great tool for learning and exploring content in an audio format.

What are we going to build with it?

We’ll create a simple Python app that fetches the latest post URLs from Hacker News and converts all the fetched links into an engaging podcast. Below is a sample of what we’ll be building as part of this post.

[Audio: sample of the generated podcast]

Implementation

Tools

  1. Python 3.11
  2. Python IDE of your choice
  3. uv package manager: Refer to this article

Implementation Steps

Step 0: Init the project

uv init HackerNewsPodcasts

Step 1: Install dependencies

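Add the dependencies below to pyproject.toml. The file shown uses Poetry’s layout; if you are staying with uv, add the same packages with uv add instead. Note that generate_podcast.py also imports podcastfy, which is not listed here, so install it as well (for example, uv add podcastfy or pip install podcastfy).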

[tool.poetry]
name = "hackernews-podcasts"
version = "0.1.0"
description = "A simple HackerNews article podcastfy implementation"
authors = ["madhavarora1988@gmail.com"]

[tool.poetry.dependencies]
python = "^3.8"
requests = "^2.28.0"
python-dotenv = "^0.20.0"
schedule = "^1.1.0"
pandas = "^2.0.0"
openpyxl = "^3.1.0"

[tool.poetry.dev-dependencies]
pytest = "^7.1.2"
black = "^22.3.0"
flake8 = "^4.0.1"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Step 1 (continued): Podcastfy requires a .env file containing your Gemini (and/or OpenAI/ElevenLabs) API keys for its multimodal and TTS capabilities. Create it in the project root; see the Podcastfy documentation for details.

GEMINI_API_KEY=<Gemini_key_here>
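
Before spending time on a long generation run, it can help to confirm that the keys are actually visible to Python. The following is a minimal sketch, not part of Podcastfy: missing_keys is a hypothetical helper, and the key name checked is the one from the .env example above.

```python
import os


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    # Hypothetical helper: defaults to the process environment.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # In the real project, call load_dotenv() from python-dotenv first so the
    # values in .env are copied into os.environ.
    absent = missing_keys(["GEMINI_API_KEY"])
    if absent:
        print("Missing API keys:", ", ".join(absent))
    else:
        print("All required API keys are set.")
```

Passing a plain dictionary as env makes the check easy to try without touching your real environment.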

Step 2: Create a file named app.py and add the following code:

import os
import schedule
import time
from datetime import datetime
from dotenv import load_dotenv
from hn_fetcher import fetch_top_stories
from database import init_db, save_articles, export_to_excel

load_dotenv()

def main():
    print("HackerNews Fetcher Started!")

    # Delete existing database if it exists
    if os.path.exists('hackernews.db'):
        print("Removing old database...")
        os.remove('hackernews.db')

    # Initialize the database
    init_db()

    def fetch_articles():
        print(f"\nFetching articles at {datetime.now()}")
        articles = fetch_top_stories(limit=10)  # Get top 10 stories
        save_articles(articles)
        print("Articles fetched and saved!")

        # Export to Excel
        excel_file = export_to_excel()
        print(f"Articles exported to Excel: {excel_file}")

        # Print the latest articles
        for article in articles:
            print(f"\nTitle: {article['title']}")
            print(f"URL: {article['url']}")
            print("-" * 50)

    # Schedule the job to run daily at 9 AM
    schedule.every().day.at("09:00").do(fetch_articles)

    # Run once immediately when starting
    fetch_articles()

    # Keep the script running
    while True:
        schedule.run_pending()
        time.sleep(60)

if __name__ == "__main__":
    main()

In the above script, we fetch the top stories from Hacker News, save them to a database, and export them to an Excel file. The Excel export includes each story’s URL, since these URLs are what we later pass to Podcastfy to create a podcast for the feed. The hn_fetcher and database modules imported at the top are among the code details omitted from this post.
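
Since hn_fetcher is not shown here, the following is one possible sketch of fetch_top_stories, not the author’s implementation. It assumes the public Hacker News Firebase API (the topstories.json and item/<id>.json endpoints) and uses only the standard library; the fetch parameter is a hypothetical hook added so the function can be exercised without network access.

```python
import json
from urllib.request import urlopen

# Public Hacker News API base URL (an assumption about what hn_fetcher uses).
HN_API = "https://hacker-news.firebaseio.com/v0"


def get_json(url):
    """Download `url` and decode the JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def fetch_top_stories(limit=10, fetch=get_json):
    """Return up to `limit` top stories as dicts with 'id', 'title' and 'url'."""
    story_ids = fetch(f"{HN_API}/topstories.json") or []
    articles = []
    for story_id in story_ids:
        if len(articles) >= limit:
            break
        item = fetch(f"{HN_API}/item/{story_id}.json")
        # Deleted items come back as null; text posts (e.g. Ask HN) have no 'url'.
        if not item or not item.get("url"):
            continue
        articles.append(
            {"id": item["id"], "title": item.get("title", ""), "url": item["url"]}
        )
    return articles


if __name__ == "__main__":
    for story in fetch_top_stories(limit=3):
        print(story["title"], "->", story["url"])
```

Skipping items without a url matters here because every entry is later handed to Podcastfy as a link to read.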

Step 3: Create a file named generate_podcast.py and add the following content:

import pandas as pd
from podcastfy.client import generate_podcast

def read_urls_from_excel(excel_file_path):
    """Return the first three URLs found in columns whose name contains 'url'."""
    try:
        # Read the Excel file
        df = pd.read_excel(excel_file_path)
    except Exception as e:
        print(f"Error reading Excel file: {e}")
        return []

    # Find columns that might contain URLs
    url_columns = [col for col in df.columns if 'url' in str(col).lower()]

    if not url_columns:
        print("No columns containing 'url' found in the Excel file.")
        print("Available columns:", df.columns.tolist())
        return []

    all_urls = []
    # Collect URLs from each relevant column
    for column in url_columns:
        # Get URLs, skipping empty or NaN values
        urls = df[column].dropna().tolist()
        all_urls.extend(urls)

    # Get first 3 URLs
    first_three_urls = all_urls[:3]

    # Print for verification
    print("\nFirst 3 URLs:")
    print("-" * 50)
    for i, url in enumerate(first_three_urls, 1):
        print(f"{i}. {url}")

    return first_three_urls

if __name__ == "__main__":
    # Replace this with your Excel file path
    excel_file = "exports/hackernews_articles.xlsx"
    urls_list = read_urls_from_excel(excel_file)
    print(f"\nStored URLs list: {urls_list}")

    # Nothing to convert if the export is missing or empty
    if not urls_list:
        raise SystemExit("No URLs found; run app.py first.")

    conversation_config = {
        "podcast_name": "Hacker News Podcast",
    }

    audio_file = generate_podcast(urls=urls_list, tts_model="gemini", conversation_config=conversation_config)
    if audio_file:
        print(f"\nPodcast saved: {audio_file}")
    else:
        print("\nNo audio file was generated")

In the above script, we read the URLs from the Excel file, keep the first three, and pass them to Podcastfy’s generate_podcast function, which converts them into a podcast.
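
Spreadsheet cells can contain blanks, duplicates, or values that are not web links, so it may be worth filtering the list before handing it to Podcastfy. The sketch below is an optional addition under that assumption; clean_urls is a hypothetical helper and is not part of Podcastfy or the original script.

```python
from urllib.parse import urlparse


def clean_urls(candidates, limit=3):
    """Keep unique http(s) URLs in their original order, at most `limit` of them."""
    seen = set()
    cleaned = []
    for value in candidates:
        if len(cleaned) >= limit:
            break
        if not isinstance(value, str):
            continue  # e.g. NaN floats from empty spreadsheet cells
        url = value.strip()
        parsed = urlparse(url)
        # Require an explicit web scheme and a host name.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if url in seen:
            continue
        seen.add(url)
        cleaned.append(url)
    return cleaned


if __name__ == "__main__":
    print(clean_urls(["https://example.com/a", "https://example.com/a", "n/a"]))
```

In generate_podcast.py, a helper like this would be applied to the full list of URLs read from the Excel file, in place of the plain slice that takes the first three.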

Step 4: Run the script below to fetch the content from Hacker News:

python app.py

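The script fetches the stories once immediately and then keeps running to serve the daily 9 AM schedule, so stop it with Ctrl+C once the first fetch has completed. You will then find an Excel file containing the fetched stories at exports/hackernews_articles.xlsx, the path that generate_podcast.py reads.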
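
The database module that produces this file is also not shown in the post. As a rough sketch only, it could look like the following, assuming a SQLite file named hackernews.db (the name app.py deletes on startup) and a single articles table keyed by URL. The table layout and the optional path parameters are assumptions, not the author’s code.

```python
import os
import sqlite3
from contextlib import closing

DB_PATH = "hackernews.db"
EXPORT_PATH = os.path.join("exports", "hackernews_articles.xlsx")


def init_db(db_path=DB_PATH):
    """Create the articles table if it does not exist yet."""
    with closing(sqlite3.connect(db_path)) as conn, conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS articles "
            "(url TEXT PRIMARY KEY, title TEXT NOT NULL)"
        )


def save_articles(articles, db_path=DB_PATH):
    """Insert articles, replacing any earlier row that has the same URL."""
    rows = [(a["url"], a["title"]) for a in articles]
    with closing(sqlite3.connect(db_path)) as conn, conn:
        conn.executemany(
            "INSERT OR REPLACE INTO articles (url, title) VALUES (?, ?)", rows
        )


def export_to_excel(db_path=DB_PATH, export_path=EXPORT_PATH):
    """Write all stored articles to an Excel file and return its path."""
    import pandas as pd  # imported lazily; needs pandas and openpyxl installed

    folder = os.path.dirname(export_path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    with closing(sqlite3.connect(db_path)) as conn:
        df = pd.read_sql_query("SELECT title, url FROM articles", conn)
    df.to_excel(export_path, index=False)
    return export_path
```

Keeping a column named url in the export is what lets read_urls_from_excel find the links by column name.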

Step 5: Run the next script to pass the URLs to the Podcastfy API:

python generate_podcast.py

Final Result

After successful execution, the generated podcast and its associated transcripts will be available in the data folder.
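
To locate the most recent output without browsing the folder by hand, a small helper along these lines may be useful. It is a sketch that assumes only that the audio ends up somewhere under the data folder with an .mp3 or .wav extension; newest_file is a hypothetical name, and the exact subfolder layout depends on your Podcastfy version.

```python
from pathlib import Path


def newest_file(folder, patterns=("*.mp3", "*.wav")):
    """Return the most recently modified matching file under `folder`, or None."""
    root = Path(folder)
    if not root.is_dir():
        return None
    # Search subfolders too, since the layout inside `folder` is not assumed.
    matches = [p for pattern in patterns for p in root.rglob(pattern) if p.is_file()]
    return max(matches, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    latest = newest_file("data")
    print(f"Latest podcast: {latest}" if latest else "No audio files found in data/")
```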

Output

[Audio: generated podcast]

Note: To keep things concise, I’ve omitted some of the code details, but you can check out the complete implementation in the GitHub repository for more information.

Conclusion

This article demonstrated how to use Podcastfy to convert Hacker News content into an engaging podcast with transcripts. By combining automation and AI, we streamlined content consumption and opened up possibilities for personalized and accessible learning.

What ideas do you have for using Podcastfy? Share your thoughts or projects in the comments; we’d love to hear from you!

