Abstract
As the world goes crazy about AI (read: large/medium/small Language Models, please), we decided to run some experiments to see how AI can help document and enrich the archives. What started as a simple transcript summarization experiment has grown into a suite of AI-assisted workflows.
How We Started
Our first experiments used Fabric and Ollama running locally:
- Extracting YouTube transcripts (Fabric’s `yt` script)
- Passing them to Ollama for summarization (Fabric’s `summarize` prompt)
- Ollama ran llama3:70b for the initial summarization pass (sketched after this list)
- Posting the summary at the bottom of each talk page
- Tagging each page with specific security topics, resulting in collections under the Focus Areas section
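The summarization step of that early pipeline can be sketched in a few lines of Python. This is a minimal illustration, assuming Ollama is serving llama3:70b on its default local port; the real workflow went through Fabric’s prompts rather than a hand-rolled one.

```python
import requests

def summarize_with_ollama(transcript: str, model: str = "llama3:70b") -> str:
    """Send a transcript to a locally running Ollama server and return its summary."""
    resp = requests.post(
        "http://localhost:11434/api/generate",  # Ollama's default local endpoint
        json={
            "model": model,
            "prompt": "Summarize the following talk transcript:\n\n" + transcript,
            "stream": False,  # ask for a single JSON response instead of a stream
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```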
This gave us the foundation, but we’ve since built dedicated tooling that goes much further.
What We Do With AI Now
1. Transcript Download & AI Summarization
We download YouTube transcripts for every talk that has a video link, then generate AI summaries that appear on the talk page.
- Transcripts are fetched via multiple backends (YouTube Transcript API, InnerTube, yt-dlp) with automatic fallback; see the sketch after this list
- Summaries are generated using LLMs via OpenRouter (cloud) or Ollama (local)
- Summaries appear at the bottom of each talk page with a disclaimer
- Over 1,100 transcripts and growing
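Here is a stripped-down sketch of that fallback chain, covering only two of the three backends. The function name is ours, and it assumes the classic `get_transcript` interface of youtube-transcript-api (newer releases expose an instance-based API instead).

```python
import subprocess
from youtube_transcript_api import YouTubeTranscriptApi

def fetch_transcript(video_id: str) -> str:
    """Try the YouTube Transcript API first, then fall back to yt-dlp auto-subtitles."""
    try:
        # youtube-transcript-api (pre-1.0 interface): returns a list of text segments
        segments = YouTubeTranscriptApi.get_transcript(video_id)
        return " ".join(seg["text"] for seg in segments)
    except Exception:
        # Fallback: have yt-dlp write the auto-generated English subtitles, no video
        subprocess.run(
            [
                "yt-dlp", "--skip-download", "--write-auto-subs",
                "--sub-langs", "en", "--sub-format", "vtt",
                "-o", f"{video_id}.%(ext)s",
                f"https://www.youtube.com/watch?v={video_id}",
            ],
            check=True,
        )
        with open(f"{video_id}.en.vtt", encoding="utf-8") as fh:
            return fh.read()
```

The real tooling also tries InnerTube and strips the VTT timing markup before any summarization happens.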
2. AI-Assisted Tagging
Talks without tags can be tagged using AI that analyzes the title, abstract, and optionally the full transcript to suggest security-relevant tags from our taxonomy.
- Suggests 3–7 tags per talk from the existing Focus Area taxonomy
- Supports dry-run preview or auto-apply mode
- Falls back to keyword-based matching when AI is unavailable (illustrated below)
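As a rough illustration of that flow (the taxonomy slice, the `suggest_tags` name, and the `ask_llm` callable are placeholders, not the archive’s actual code):

```python
import json

# Placeholder slice of the Focus Area taxonomy, for illustration only
TAXONOMY = {"appsec", "cloud-security", "fuzzing", "supply-chain", "threat-modeling"}

def suggest_tags(title: str, abstract: str, ask_llm=None) -> list[str]:
    """Suggest 3-7 tags: ask an LLM when one is available, otherwise match keywords."""
    if ask_llm is not None:
        prompt = (
            f"Choose 3-7 tags from {sorted(TAXONOMY)} for this talk. "
            f"Reply with a JSON array only.\n\nTitle: {title}\nAbstract: {abstract}"
        )
        try:
            tags = json.loads(ask_llm(prompt))
            return [t for t in tags if t in TAXONOMY][:7]
        except (json.JSONDecodeError, TypeError):
            pass  # fall through to the keyword-based fallback
    text = f"{title} {abstract}".lower()
    return [t for t in TAXONOMY if t.replace("-", " ") in text][:7]
```

A dry-run mode simply prints these suggestions instead of writing them back to the talk page.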
3. Conference Content Extraction
Given a conference schedule URL, AI extracts structured talk entries (title, speaker, abstract, links) and generates ready-to-use markdown files.
- Fetches and converts the webpage to clean text
- LLM extracts structured fields from the page content
- Outputs draft entries for review before adding to the archive (see the sketch below)
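A condensed sketch of that extraction step, assuming an `ask_llm` helper and using requests plus BeautifulSoup for the fetch-and-clean stage; the real tool’s prompts and field handling are more involved:

```python
import json
import requests
from bs4 import BeautifulSoup

def extract_talks(schedule_url: str, ask_llm) -> list[dict]:
    """Fetch a schedule page, reduce it to plain text, and ask an LLM for structured talks."""
    html = requests.get(schedule_url, timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    prompt = (
        "List every talk on this conference schedule as a JSON array of objects "
        'with "title", "speaker", "abstract", and "link" keys:\n\n' + text
    )
    return json.loads(ask_llm(prompt))

def write_draft(talk: dict, out_dir: str = "drafts") -> None:
    """Write one draft markdown entry per talk for human review."""
    slug = "-".join(talk["title"].lower().split())[:60]
    with open(f"{out_dir}/{slug}.md", "w", encoding="utf-8") as fh:
        fh.write(f"# {talk['title']}\n\n**Speaker:** {talk['speaker']}\n\n{talk['abstract']}\n")
```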
4. Missing Abstract Extraction
For talks that have a conference link but no abstract, AI can visit the page, extract the abstract, and update the entry.
- Fetches HTML from the `conf_link` URL
- LLM extracts the relevant talk description
- Updates the markdown file in place (a simplified sketch follows)
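A simplified sketch, again assuming an `ask_llm` helper; for brevity it appends the abstract to the file rather than rewriting the entry in place as the real tool does:

```python
import requests
from bs4 import BeautifulSoup

def fill_missing_abstract(md_path: str, conf_link: str, ask_llm) -> None:
    """Pull the talk description from its conference page and add it to the talk file."""
    html = requests.get(conf_link, timeout=30).text
    page_text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    abstract = ask_llm(
        "Return only the abstract/description of the talk from this page text:\n\n" + page_text
    )
    with open(md_path, "a", encoding="utf-8") as fh:
        fh.write("\n" + abstract.strip() + "\n")
```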
5. Tag & Focus Area Auditing
AI assists in auditing the tag taxonomy: suggesting which focus area orphan tags should belong to, and reviewing whether tags on individual talks are appropriate.
- Identifies orphan tags (not mapped to any focus area); see the example below
- Suggests focus area assignments for unmapped tags
- Reviews per-page tag accuracy
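The orphan-tag check itself is plain set arithmetic; the AI only comes in when deciding where an orphan should live. A toy example (the tag names and focus-area map below are made up):

```python
# Made-up data: tags seen on talk pages vs. the focus-area mapping
talk_tags = {"fuzzing", "ebpf", "sbom", "threat-modeling"}
focus_areas = {
    "appsec": {"fuzzing", "threat-modeling"},
    "supply-chain": {"sbom"},
}

# Orphans are tags used on talks but not mapped to any focus area
mapped = set().union(*focus_areas.values())
orphans = talk_tags - mapped
print(sorted(orphans))  # ['ebpf'] -> candidate for an AI-suggested focus area
```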
Caution
All AI-generated content on this site is clearly marked. These are automated processes with no guaranteed accuracy. AI summaries, tags, and extracted content may contain errors, hallucinations, or misinterpretations. Always refer to the original talk, video, or conference page for authoritative information.
Models Used
We use free-tier models via OpenRouter and local models via Ollama.
OpenRouter (Cloud)
| Purpose | Models |
|---|---|
| Summarization | stepfun/step-3.5-flash, deepseek/deepseek-r1-0528, google/gemma-3-27b-it (selected by context size) |
| Tagging | mistralai/mistral-small-3.2-24b-instruct, cognitivecomputations/dolphin-mistral-24b-venice-edition |
| Content extraction | google/gemini-2.0-flash-exp, meta-llama/llama-3.2-3b-instruct |
| Abstract extraction | mistralai/mistral-small-3.2-24b-instruct |
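The summarization row above notes that the model is selected by context size. Conceptually that selection looks something like the following; the character-based token estimate and the thresholds are placeholders, not our actual configuration:

```python
# Placeholder thresholds; the real cutoffs depend on each model's context window
SUMMARIZATION_MODELS = [
    (30_000, "google/gemma-3-27b-it"),
    (120_000, "deepseek/deepseek-r1-0528"),
    (300_000, "stepfun/step-3.5-flash"),
]

def pick_summarization_model(transcript: str) -> str:
    """Roughly estimate tokens (~4 chars each) and pick the first model that fits."""
    est_tokens = len(transcript) // 4
    for max_tokens, model in SUMMARIZATION_MODELS:
        if est_tokens <= max_tokens:
            return model
    return SUMMARIZATION_MODELS[-1][1]  # largest-context option as a last resort
```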
Ollama (Local)
| Purpose | Models |
|---|---|
| Summarization | llama3.1:70b |
| Tagging | phi4:latest, glm-4.7-flash:latest, qwen3-coder:30b |
| Tag auditing | phi4 |
Historical Models (Early Experiments)
These were used during the initial Fabric + Ollama phase:
- llama3:70b (first summarization model)
- phi3.5:latest
Credit Where Credit Is Due
- Fabric – for doing a lot of legwork in terms of good prompts and the initial `yt` transcript extraction that kicked off this project
- Ollama – lets us run language models on local hardware, keeping costs at zero
- OpenRouter – provides free-tier access to a wide range of models for cloud-based processing
- youtube-transcript-api – Python library for fetching YouTube transcripts
- yt-dlp – fallback transcript extraction from YouTube