Slowly but steadily, the podcast has grown into a dominant form of media. It is how millions of people consume news, politics, entertainment and gossip every day. Since 2021 we have therefore been actively preserving these audio stories, which are created both by media professionals and by hobbyists with a microphone on the kitchen table.
In this talk, we share key insights into how we have been archiving and preserving podcasts from the Netherlands for over four years. We explain why, after studying the distribution models of podcasts, we decided to ignore playback platforms like Apple Podcasts or Spotify and to use a podcast RSS aggregator service instead. Using the Listen Notes API, our script automatically gathers podcasts in MP3 format together with any descriptive metadata that the podcast creators include in the RSS feed. Simply adding new shows to a playlist enables us to collect the latest episodes on a weekly basis. As we walk you through our method, we go in depth on how we adopted MP3 as an accepted file format for ingesting podcasts into our infrastructure, how we enrich episodes with additional metadata, and how we make the shows accessible to users on our platforms. We explain our selection process, which relies on license agreements with creators, and how we are trying to capture as broad a slice as possible of the Dutch podcasting landscape. Finally, we address the paywall-related challenges that have become more frequent and that we are still struggling with. This talk provides pointers that will allow anyone to get a grasp on how to preserve podcasts and make sure these stories can be told for generations to come.
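To make the harvesting step more concrete, here is a minimal sketch of how MP3 links and descriptive metadata could be read from a podcast RSS feed. It is not our production script and does not call the Listen Notes API: the sample feed, the function name `parse_feed` and the output field names are assumptions made purely for illustration, and only standard RSS 2.0 elements are used.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, invented for this illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Voorbeeldpodcast</title>
    <item>
      <title>Aflevering 1</title>
      <guid>ep-001</guid>
      <pubDate>Mon, 04 Jan 2021 06:00:00 +0000</pubDate>
      <description>Eerste aflevering.</description>
      <enclosure url="https://example.org/ep1.mp3" length="12345" type="audio/mpeg"/>
    </item>
    <item>
      <title>Trailer zonder audio</title>
      <guid>ep-000</guid>
    </item>
  </channel>
</rss>
"""


def parse_feed(xml_text):
    """Return one dict per episode that has an MP3 enclosure."""
    channel = ET.fromstring(xml_text).find("channel")
    show_title = channel.findtext("title", default="")
    episodes = []
    for item in channel.findall("item"):
        enclosure = item.find("enclosure")
        # Skip items without an audio file, or with a non-MP3 media type.
        if enclosure is None or enclosure.get("type") != "audio/mpeg":
            continue
        episodes.append({
            "show": show_title,
            "title": item.findtext("title", default=""),
            "guid": item.findtext("guid", default=""),
            "published": item.findtext("pubDate", default=""),
            "description": item.findtext("description", default=""),
            "mp3_url": enclosure.get("url"),
        })
    return episodes


if __name__ == "__main__":
    for episode in parse_feed(SAMPLE_FEED):
        print(episode["show"], "|", episode["title"], "|", episode["mp3_url"])
```

In a real workflow the feed text would be fetched from the feed URL of each show on the playlist, and the `guid` field could be used to skip episodes that have already been collected in an earlier weekly run.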