AI Experiments

Abstract

As the world goes crazy about AI (read: large/medium/small language models, please), we decided to run some experiments to see how AI can help document and enrich the archives. What started as a simple transcript summarization experiment has grown into a suite of AI-assisted workflows.

How We Started

Our first experiments used Fabric and Ollama running locally (a rough Python sketch of this pipeline follows the list):

  1. Extracting YouTube transcripts (Fabric's yt script)
  2. Passing them to Ollama for summarization (Fabric's summarize prompt)
  3. Running llama3:70b in Ollama for the initial summarization pass
  4. Posting the summary at the bottom of each talk page
  5. Tagging each page with specific security topics, resulting in collections under the Focus Areas section
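
Sketched below is a rough Python equivalent of that early pipeline, assuming a local Ollama server on its default port and using the youtube-transcript-api library in place of Fabric's yt script; the prompt text is a stand-in for Fabric's summarize pattern, not a copy of it.

```python
# Rough Python equivalent of the early pipeline (not the actual Fabric CLI).
import requests
from youtube_transcript_api import YouTubeTranscriptApi

def summarize_talk(video_id: str, model: str = "llama3:70b") -> str:
    # Steps 1-2: fetch the YouTube transcript and flatten it to plain text
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    transcript = " ".join(seg["text"] for seg in segments)

    # Step 3: ask the local Ollama server for a summary
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": "Summarize this conference talk transcript:\n\n" + transcript,
            "stream": False,
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```

Steps 4 and 5 amount to writing the result back into the talk page and its tags, so they are omitted from the sketch.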

This gave us the foundation, but we've since built dedicated tooling that goes much further.

What We Do With AI Now

1. Transcript Download & AI Summarization

We download YouTube transcripts for every talk that has a video link, then generate AI summaries that appear on the talk page.
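
A minimal sketch of the cloud summarization call is below, assuming an OpenRouter API key in an OPENROUTER_API_KEY environment variable (OpenRouter exposes an OpenAI-compatible chat completions endpoint); the system prompt is illustrative.

```python
# Sketch of the cloud summarization step via OpenRouter's OpenAI-compatible API.
import os
import requests

def summarize_transcript(transcript: str, model: str) -> str:
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": model,
            "messages": [
                {"role": "system",
                 "content": "Summarize conference talks for an archive site."},
                {"role": "user", "content": transcript},
            ],
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```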

2. AI-Assisted Tagging

Talks without tags can be tagged using AI that analyzes the title, abstract, and optionally the full transcript to suggest security-relevant tags from our taxonomy.
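
The sketch below shows the general idea: build a prompt constrained to the taxonomy, then keep only suggestions that actually exist in it. The prompt wording and the ask_model callable are illustrative, not the site's actual tooling.

```python
# Illustrative tagging helper: the taxonomy and prompt are examples, and
# ask_model stands in for whichever model backend is in use.
from typing import Callable

def suggest_tags(title: str, abstract: str, taxonomy: list[str],
                 ask_model: Callable[[str], str], transcript: str = "") -> list[str]:
    prompt = (
        "Pick the most relevant tags for this security talk.\n"
        f"Allowed tags: {', '.join(taxonomy)}\n"
        f"Title: {title}\nAbstract: {abstract}\n"
        + (f"Transcript excerpt: {transcript[:4000]}\n" if transcript else "")
        + "Answer with a comma-separated list of tags only."
    )
    raw = ask_model(prompt)
    # Keep only tags that really exist in the taxonomy, since models drift.
    suggested = {t.strip().lower() for t in raw.split(",")}
    return [t for t in taxonomy if t.lower() in suggested]
```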

3. Conference Content Extraction

Given a conference schedule URL, AI extracts structured talk entries (title, speaker, abstract, links) and generates ready-to-use markdown files.
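
Roughly, the flow looks like the sketch below. The JSON fields and the markdown front matter are assumptions about the output format, not the exact schema the tooling uses.

```python
# Illustrative extraction flow: prompt for structured JSON, then write one
# markdown file per talk. Field names and front matter are assumptions.
import json
from pathlib import Path
from typing import Callable

def extract_talks(schedule_html: str, ask_model: Callable[[str], str]) -> list[dict]:
    prompt = (
        "Extract every talk from this conference schedule as a JSON list of "
        'objects with "title", "speaker", "abstract" and "link" fields. '
        "Return only JSON.\n\n" + schedule_html
    )
    return json.loads(ask_model(prompt))

def write_markdown(talks: list[dict], out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    for talk in talks:
        # Hypothetical slug scheme: lowercase title, non-alphanumerics as dashes
        slug = "".join(c if c.isalnum() else "-" for c in talk["title"].lower()).strip("-")
        body = (
            f'---\ntitle: "{talk["title"]}"\nspeaker: "{talk["speaker"]}"\n'
            f'link: {talk["link"]}\n---\n\n{talk["abstract"]}\n'
        )
        (out_dir / f"{slug}.md").write_text(body)
```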

4. Missing Abstract Extraction

For talks that have a conference link but no abstract, AI can visit the page, extract the abstract, and update the entry.
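
A minimal sketch, assuming the conference page can be fetched directly and reusing the illustrative ask_model stand-in from the tagging sketch above:

```python
# Sketch only: the prompt wording is illustrative and ask_model is the same
# stand-in for the model backend used in the tagging sketch.
from typing import Callable
import requests

def fetch_missing_abstract(conference_url: str, title: str,
                           ask_model: Callable[[str], str]) -> str:
    page = requests.get(conference_url, timeout=60).text
    prompt = (
        f'Find the abstract for the talk titled "{title}" in the page below '
        "and return it verbatim, with no extra commentary.\n\n" + page
    )
    return ask_model(prompt).strip()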

5. Tag & Focus Area Auditing

AI assists in auditing the tag taxonomy: suggesting which focus area orphan tags should belong to, and reviewing whether tags on individual talks are appropriate.
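
For the orphan-tag half of the audit, the idea can be sketched as below; the prompt wording and the fallback value for unresolved tags are assumptions.

```python
# Orphan-tag placement sketch; focus area handling and the fallback value
# are assumptions about how unresolved tags might be treated.
from typing import Callable

def place_orphan_tag(tag: str, focus_areas: list[str],
                     ask_model: Callable[[str], str]) -> str:
    prompt = (
        f'Which of these focus areas best fits the security tag "{tag}"? '
        f"Options: {', '.join(focus_areas)}. Answer with one option, exactly as written."
    )
    answer = ask_model(prompt).strip()
    return answer if answer in focus_areas else "unassigned"
```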

Caution

All AI-generated content on this site is clearly marked. These are automated processes with no guaranteed accuracy. AI summaries, tags, and extracted content may contain errors, hallucinations, or misinterpretations. Always refer to the original talk, video, or conference page for authoritative information.

Models Used

We use free-tier models in the cloud via OpenRouter and local models via Ollama.

OpenRouter (Cloud)

Summarization: stepfun/step-3.5-flash, deepseek/deepseek-r1-0528, google/gemma-3-27b-it (selected by context size; sketched below)
Tagging: mistralai/mistral-small-3.2-24b-instruct, cognitivecomputations/dolphin-mistral-24b-venice-edition
Content extraction: google/gemini-2.0-flash-exp, meta-llama/llama-3.2-3b-instruct
Abstract extraction: mistralai/mistral-small-3.2-24b-instruct
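
The "selected by context size" note can be illustrated with a simple fallback rule like the one below; the token estimate and the per-model context windows shown are rough assumptions, not measured values.

```python
# Hypothetical "pick by context size" rule: choose the first model whose
# assumed context window fits the transcript. Window sizes are guesses.
SUMMARY_MODELS = [
    ("google/gemma-3-27b-it", 32_000),
    ("deepseek/deepseek-r1-0528", 64_000),
    ("stepfun/step-3.5-flash", 128_000),
]

def pick_summary_model(transcript: str) -> str:
    est_tokens = len(transcript) // 4  # crude characters-per-token estimate
    for model, window in SUMMARY_MODELS:
        if est_tokens < window - 2_000:  # leave headroom for prompt and output
            return model
    return SUMMARY_MODELS[-1][0]  # largest window as a last resort
```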

Ollama (Local)

Summarization: llama3.1:70b
Tagging: phi4:latest, glm-4.7-flash:latest, qwen3-coder:30b
Tag auditing: phi4

Historical Models (Early Experiments)

These were used during the initial Fabric + Ollama phase:

Summarization: llama3:70b

Credit Where Credit Is Due

  1. Fabric - did a lot of the legwork, providing good prompts and the initial yt transcript extraction that kicked off this project
  2. Ollama - runs language models on local hardware, keeping costs at zero
  3. OpenRouter - provides free-tier access to a wide range of models for cloud-based processing
  4. youtube-transcript-api - Python library for fetching YouTube transcripts
  5. yt-dlp - fallback transcript extraction from YouTube