Automation Academy: How I Turn Voice Recordings into Searchable Obsidian Notes with Shortcuts, Hazel, and LLMs

As I mentioned last week in the MacStories Weekly newsletter and have been hinting at recently on both Connected and AppStories, I’m in the process of building a “perfect memory” system in Obsidian that allows me to save, archive, and search anything I write, think about, or come across on the Internet. This project is a work in progress made up of different components that need to fall into place, and for this Automation Academy lesson, I’m going to focus on one piece that I’ve been refining for a while: automating my voice recordings and turning them into searchable notes in Obsidian.
A couple of months ago, I realized that I’ve become the sort of person who likes to brainstorm ideas and tasks by, well, talking out loud and recording myself. I mostly do this when I’m driving alone (technically, with my two dogs in the backseat) or doing chores around the house. I find the process oddly relaxing and better than taking typed notes. Perhaps 12 years of podcasting every week have rewired my brain so that I make better connections between ideas by talking about them, or maybe it’s a byproduct of “shifting modes” and feeling less constrained by the absence of a keyboard and text editor in front of me. Regardless, I’ve been recording myself talking about stuff I have on my mind or need to do for a while now, and I love the process.
After I started recording myself, I quickly realized that I needed those rambling voice recordings to be more than just audio files in a folder. Rather, I wanted to turn them into structured notes in Obsidian containing actionable items extracted from the recording session. I also wanted them to be searchable with Obsidian Copilot, easy to reference, and – ideally – automatically organized with lots of metadata, a summary, and a list of key tasks from the voice recording.
That’s why, after a lot of experiments, I built a hybrid automation to bridge spoken words and Markdown – a system that combines the non-deterministic nature of human language and messy voice recordings with the reliability of Shortcuts, the power of Hazel rules on macOS, and the flexibility of LLMs, which are ideal for processing natural language. The system revolves around a shortcut called Process Transcript that takes the raw transcript of a voice recording and turns it into a structured note in Obsidian, complete with a summary, action items, an embedded audio player, and an internal link to the full transcript.
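To make the target of the automation concrete, here is a minimal Python sketch of what such a structured note could look like once rendered to Markdown. This is not the Process Transcript shortcut itself: the `VoiceNote` type, the `render_note` function, the section headings, and the file names are all hypothetical, chosen for illustration only. The one thing taken from Obsidian is its wikilink syntax, where `![[file]]` embeds a file and `[[note]]` links to another note.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceNote:
    """Hypothetical container for the pieces of a processed recording."""
    title: str
    summary: str
    action_items: list[str] = field(default_factory=list)
    audio_file: str = ""       # assumed file name of the recording in the vault
    transcript_note: str = ""  # assumed name of the note holding the transcript


def render_note(note: VoiceNote) -> str:
    """Render the note as Markdown using Obsidian's wikilink syntax."""
    lines = [
        f"# {note.title}",
        "",
        "## Summary",
        "",
        note.summary,
        "",
        "## Action Items",
        "",
    ]
    # Each action item becomes an unchecked Markdown task.
    lines += [f"- [ ] {item}" for item in note.action_items]
    lines += [
        "",
        "## Recording",
        "",
        f"![[{note.audio_file}]]",    # embedded audio player
        "",
        "## Transcript",
        "",
        f"[[{note.transcript_note}]]",  # internal link to the full transcript
    ]
    return "\n".join(lines) + "\n"


example = VoiceNote(
    title="Drive brainstorm",
    summary="Ideas for the next newsletter issue.",
    action_items=["Draft outline", "Email the team"],
    audio_file="drive-brainstorm.m4a",
    transcript_note="Drive brainstorm (Transcript)",
)
print(render_note(example))
```

In a setup like the one described here, the summary and action items would come from an LLM processing the raw transcript. The sketch only covers the final, deterministic step of assembling the note.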
Building this system has been a fun and informative journey, and today, I want to show you how I did it.