VoiceScript Technologies

Automated Multi-person, Multi-language Transcription Service

Frequently asked questions

Things you might want to know

How does the system work?
The VST system is a patented solution that employs an algorithmic ‘sound’ engine capable of determining which speaker in a room is speaking and what each one says, separately from the other speakers.

Even when two or more people speak at the same time, the system accurately captures and transcribes what each speaker said. The transcription document typically underlines such passages to indicate that the speakers were speaking simultaneously.
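To illustrate what marking simultaneous speech involves, here is a minimal sketch. It assumes the engine has already separated the audio into per-speaker segments with start and end times. The `Segment` type and `find_overlaps` function are hypothetical and are not part of any VST interface; they simply flag segments whose time ranges intersect a segment from a different speaker, which a document generator could then underline.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One speaker's utterance (hypothetical structure; times in seconds)."""
    speaker: str
    start: float
    end: float
    text: str


def find_overlaps(segments):
    """Return sorted indices of segments that overlap in time with another speaker's segment."""
    overlapping = set()
    # Visit segments in order of start time so the inner scan can stop early.
    order = sorted(range(len(segments)), key=lambda i: segments[i].start)
    for pos, i in enumerate(order):
        a = segments[i]
        for j in order[pos + 1:]:
            b = segments[j]
            if b.start >= a.end:
                break  # b, and every later segment, starts after a has ended
            if a.speaker != b.speaker:
                overlapping.update((i, j))  # different speakers talking at once
    return sorted(overlapping)


segs = [
    Segment("A", 0.0, 4.0, "Where were you on Monday?"),
    Segment("B", 3.5, 6.0, "At home."),
    Segment("A", 7.0, 9.0, "All day?"),
]
print(find_overlaps(segs))  # [0, 1]
```

Only overlaps between different speakers are flagged; two segments from the same speaker that touch in time are left alone.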

How secure is it?
The system is completely secure and conforms to existing data and file security protocols.

In fact, the entire system can be hosted and managed within your organisation's secure IT infrastructure, and the ‘recording’ units can be installed by your own IT department in line with your IT policies.

How accurate is it?
Accuracy is built up over time, based on three primary contributors. These are rapidly assembled in a variety of ways, including, but not limited to, scanning previous statements, entering a pre-list of common words and recording mock/training interviews.

Initially, accuracy is above 75%, and it improves with use. In a machine-to-machine system (no manual intervention), accuracy improves as the capabilities of the Automatic Speech Recognition (ASR) engine are extended with scanned pages of previous transcripts and “Correction Feedback Loops”.

Another option is to employ Agents from the outset to monitor all transcriptions and results. The VST system can score and tag all transcription results and direct Agents to specific areas (the parts of an interview that fall below your threshold accuracy requirement), so Agents do not waste time on passages that were already transcribed accurately. For example, if you set an accuracy threshold of 85%, only passages with a confidence score below 85% are highlighted in yellow in the document and sent to an Agent.
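The threshold-based review described above can be sketched as follows. This is a minimal illustration, assuming each transcribed passage carries a confidence score between 0 and 1. The `Passage` type, the `route_for_review` and `render` functions, the `[REVIEW]` marker (standing in for yellow highlighting) and the 0.85 default are all hypothetical and do not describe VST's actual interface.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    """A transcribed passage with its confidence score, 0.0 to 1.0 (hypothetical structure)."""
    speaker: str
    text: str
    confidence: float


def route_for_review(passages, threshold=0.85):
    """Split passages into (accepted, flagged); flagged ones would go to an Agent."""
    accepted, flagged = [], []
    for p in passages:
        if p.confidence < threshold:
            flagged.append(p)  # below threshold: needs human review
        else:
            accepted.append(p)  # at or above threshold: accepted automatically
    return accepted, flagged


def render(passages, threshold=0.85):
    """Build a plain-text transcript, marking low-confidence passages with [REVIEW]."""
    lines = []
    for p in passages:
        marker = "[REVIEW] " if p.confidence < threshold else ""
        lines.append(f"{p.speaker}: {marker}{p.text}")
    return "\n".join(lines)


transcript = [
    Passage("Interviewer", "Please state your name.", 0.97),
    Passage("Witness", "John Smyth, with a Y.", 0.62),
    Passage("Interviewer", "Thank you.", 0.91),
]
accepted, flagged = route_for_review(transcript)
print(len(accepted), len(flagged))  # 2 1
print(render(transcript))
```

Here a passage scoring exactly at the threshold is accepted; whether the real system treats the boundary the same way is not stated.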

The three largest contributors to accuracy are:

  1. Language Model: How are specific words put together?
  2. Lexicon Model: Which words are used most/least frequently?
  3. Acoustic Model: What do the words/word combinations sound like?

Does the system require user profiling?
No. Unlike other voice-to-text systems, which require you to ‘train’ the transcription engine to your voice characteristics, the patented VST system is profile-less.

How can the system be secure if there are ‘Agents’ involved in the process?
An Agent is required only when a word (or phrase) has not been encountered before or is indecipherable to the system. Agents can be existing staff within your organisation or security-cleared individuals at a central site. For non-secure interviews, a ‘pool’ of transcribers is available.

How quickly can the transcription happen?
The VST system uses advanced multi-processing and cascading technology, which can provide a turnaround ratio of 1:1: the transcript of a one-hour (multi-person) interview is ready in about an hour.