Sesla Transcriber

Matthias Sperber

This speech transcription / labeling tool is targeted at medium to long speeches, such as lectures and presentations, but has also been used for other tasks such as meeting transcription. It features an intuitive user interface that is easy to learn for first-time users, while allowing advanced users to transcribe rapidly via various strategies. The tool is described in our SLT 2014 paper and our SLT 2014 demo abstract. Preliminary versions of this tool were also used for the user studies in our Interspeech 2013 and TACL 2014 papers. The latest version (0.10.8) can be accessed here, and a user manual here.

An overview of the features:

  • Free for academic purposes.
  • Can be used with .wav and .ogg audio.
  • Transcription or labeling is performed segment by segment.
  • Transcription can be from scratch based on a 'blind' segmentation, or by editing an initial transcript.
  • Initial transcripts can be imported from word-aligned speech recognizer output in the standard .CTM (NIST) and .MLF (HTK) file formats, and segment-aligned subtitles in the standard .SRT file format.
  • Labeling can be done via input of unconstrained text or numbers, or by specifying a limited number of categories, which can be either exclusive or non-exclusive.
  • Word confidences of the initial transcript, if provided, are visualized, thereby helping the transcriber to more easily identify errors.
  • If word-aligned initial transcripts are used, segments of a convenient size for transcribing can be created automatically via heuristic rules.
  • Time-budgeted transcription: Allows specifying a time budget, based on which a selection of segments is chosen (and regularly updated) that is predicted to yield the maximum error reduction within the budget. Uses user-specific predictive cost models that are learned automatically while transcribing. Requires a confidence-annotated ASR transcript. For details, see our SLT 2014 paper.
  • Most functions accessible via keyboard
  • Easy navigation with vertical waveform
  • Integrated English spell checker
  • Visualization of segment confidences, and possibility to tag segments as correct, incorrect, unsure, etc.
  • Runs under Windows and Mac, requires Java 7
  • Details on the data formats can be found here
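
To make the import, segmentation, and time-budget features above more concrete, here is a minimal Python sketch. It is not the tool's implementation. It assumes the common NIST CTM line layout (file, channel, start, duration, word, optional confidence). The pause-based splitting rule, the fixed cost factor `cost_per_audio_second`, and the greedy selection are hypothetical stand-ins for the tool's heuristic rules and learned user-specific cost models, which are described in the SLT 2014 paper.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Word:
    start: float                 # start time in seconds
    duration: float              # duration in seconds
    text: str
    confidence: Optional[float]  # None when the CTM line has no confidence field

    @property
    def end(self) -> float:
        return self.start + self.duration


def parse_ctm(lines: Iterable[str]) -> List[Word]:
    """Parse CTM lines: <file> <channel> <start> <duration> <word> [<confidence>]."""
    words = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):  # skip blank and comment lines
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"malformed CTM line: {line!r}")
        conf = float(fields[5]) if len(fields) > 5 else None
        words.append(Word(float(fields[2]), float(fields[3]), fields[4], conf))
    return words


def split_at_pauses(words: List[Word], min_pause: float = 0.5,
                    max_len: float = 10.0) -> List[List[Word]]:
    """Hypothetical heuristic: start a new segment at a long pause,
    or when adding the next word would exceed max_len seconds."""
    segments, current = [], []
    for w in words:
        if current and (w.start - current[-1].end >= min_pause
                        or w.end - current[0].start > max_len):
            segments.append(current)
            current = []
        current.append(w)
    if current:
        segments.append(current)
    return segments


def select_within_budget(segments: List[List[Word]], budget_seconds: float,
                         cost_per_audio_second: float = 4.0) -> List[int]:
    """Greedy sketch: prefer segments with the most expected errors per
    second of predicted transcription effort, until the budget is used up."""
    scored = []
    for index, seg in enumerate(segments):
        # Assumption: 1 - confidence approximates the chance a word is wrong;
        # words without a confidence are treated as a coin flip (0.5).
        expected_errors = sum(
            1.0 - (w.confidence if w.confidence is not None else 0.5) for w in seg
        )
        # Assumption: effort grows linearly with audio length (fixed factor).
        cost = max(seg[-1].end - seg[0].start, 1e-6) * cost_per_audio_second
        scored.append((expected_errors / cost, cost, index))
    chosen, remaining = [], budget_seconds
    for _ratio, cost, index in sorted(scored, reverse=True):
        if cost <= remaining:
            chosen.append(index)
            remaining -= cost
    return sorted(chosen)
```

A typical use would be `select_within_budget(split_at_pauses(parse_ctm(open(path))), budget_seconds=600)`, where `path` points to a confidence-annotated CTM file. The result is a list of indices of the segments to present to the transcriber. A greedy ratio rule is only an approximation to the budget-constrained selection problem, and a real cost model would be fitted to each user's measured transcription speed.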
This is work in progress. If you have questions, comments, feature requests, or bug reports, please send an email to .