It seems a no-brainer that the reason they are having the audio transcribed is so that they can train their own speech recognition model on real data samples from their own software services. The transcription constitutes “tagging” (labeling) of the data, which is a necessary manual process. You then train on roughly 3/4 of the tagged samples, hold out the other 1/4 for “validation”, and refine your model based on how well it recognizes speech in that held-out set.
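To make the 3/4 and 1/4 split concrete, here is a minimal sketch in Python. The file names, transcripts, and the `split_tagged_samples` helper are all made up for illustration; I have no idea what their actual pipeline looks like, and a real one would likely also keep the same speaker out of both sets.

```python
import random


def split_tagged_samples(samples, train_fraction=0.75, seed=0):
    """Shuffle tagged samples and split them into training and validation sets."""
    shuffled = list(samples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical tagged data: (audio file name, human transcription) pairs.
tagged = [(f"clip_{i:03d}.wav", f"transcript {i}") for i in range(100)]

train_set, validation_set = split_tagged_samples(tagged)
print(len(train_set), len(validation_set))  # 75 25
```

The model only ever learns from `train_set`; `validation_set` is kept aside so you can check how well it does on audio it has never seen.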