Learn how to use Microsoft Azure for video captioning.
Accessibility is important. In our continued pursuit to understand Microsoft Azure, its capabilities, and what additional business value it can provide, particularly in non-traditional areas (i.e. beyond infrastructure-as-a-service), we came across Azure Cognitive Services and Media Services.
Cognitive Services piqued our interest because of the range of possibilities for artificial intelligence across speech, vision, and decision making. Early on, we explored QnA Maker to develop a Skype for Business integrated chatbot to simplify information sharing across the Third Octet team – simply ask a question, such as “What are Third Octet’s core values?” and the chatbot would provide the answer. It was a bit tedious at first, with a heavy manual focus; however, we have continued to feed and train the model over time so that it can develop its own answers. It has been a fun experimental project, but it remains largely an internal-only sandbox.
On the other hand, Cognitive Services provided additional value that we immediately put to use. When reviewing the lengthier content that we tend to share on our website, we found that time on page did not align with the suggested reading time we provided. On further review of the metrics, we realized that although the content is valuable (in our humble opinion), visitors either preferred additional consumption formats (e.g. audio) or, unfortunately, may not have been able to consume written content due to accessibility barriers. Our negligence – we can do better.
The Accessibility for Ontarians with Disabilities Act (AODA) also sets accessibility requirements for many forms of content, including content we make readily available on our website. As a small organization, we are not required to comply with the AODA requirements; however, why shouldn’t we, or any business for that matter?
Azure Cognitive Services made producing accessible content extremely straightforward, particularly through text-to-speech, which we tested against a lengthy article about Citrix Summit Series. Once Azure Cognitive Services Speech Services was deployed in our Azure subscription, it was a matter of supplying the text (through CLI, API, or Speech Studio), running conversion jobs to generate audio files, and embedding the audio content on our site.
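When the text is supplied through the API, it is typically wrapped in SSML (Speech Synthesis Markup Language) first. The sketch below shows one way to prepare a lengthy article for that step using only the Python standard library. It does not call Azure; the voice name `en-US-JennyNeural`, the helper names `build_ssml` and `chunk_paragraphs`, and the 3000-character chunk size are illustrative assumptions, not values taken from our setup or limits stated by Microsoft.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document.

    The default voice and language are assumptions for illustration only.
    """
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects characters such as & and < that would break the XML
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


def chunk_paragraphs(article: str, max_chars: int = 3000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars where possible.

    max_chars is an arbitrary placeholder; a single paragraph longer than
    max_chars is kept whole rather than split mid-sentence.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in article.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    article = "First paragraph about Q&A.\n\nSecond paragraph."
    for part in chunk_paragraphs(article):
        print(build_ssml(part))
```

Each resulting SSML string could then be submitted as one conversion job, with the generated audio files joined or embedded separately.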
We will continue to provide audio formats of our text content moving forward.
Video is a growing area of content creation for us, and live captioning is another area that we need to consider for accessibility. Certainly, there are many providers out there that can facilitate live captioning creation, or you can handle transcription yourself. Yet our pursuits in Azure suggested we could continue taking advantage of native services to aid in this process as well.
Enter Azure Media Services.
Azure Media Services allows organizations to “manage, transform and deliver media content with cloud-based workflows” and that sounds like it could certainly help us create live captioning for our video content. Long story short, it did, and today we are going to walk you through exactly how to do this yourself.
One absolute requirement is a Microsoft Azure Subscription – either an existing paid subscription, a trial subscription or, alternatively, a subscription procured through Third Octet (yes, we can help with that). If you have a subscription, great – let us get started.
- Access your Microsoft Azure Portal at https://portal.azure.com.
- Once inside the Azure Portal, search for Media Services in the search box and click the result.
- In the Media Services blade, click Add.
- In the new Create media service account blade, you will need to select, create, or provide some relevant information, including subscription, resource group, media services account name, location, and storage account.
- For Subscription, you will likely have only one subscription in most situations. Choose that subscription, or the most appropriate one for this service account.
- Resource Groups are an important area to consider – not specifically for Media Services, but as a rule of thumb for organizing your resources in Azure. For example, we use Azure for both production and testing purposes – our production resources are allocated across several resource groups depending on function and location whereas our testing resources are located within our own unique and individualized resource group. In the end, how you define your resource group structure is dependent on your objectives and requirements about resource collection.
- Provide a media services account name that reflects your naming conventions and the objective behind this media services account.
- Select the location where the media services account will reside.
- If you have an existing storage account, select it; otherwise, create a new storage account.
- Lastly, you must acknowledge that you have all the rights to use the content/file that you will be submitting to Azure Media Services.
- Click Next: Tags to move on.
- Tags are another important consideration for Azure resources. Beyond the scope of this quick “how-to”, have a read through Microsoft’s recommended naming and tagging conventions. Briefly, tags allow you to include information about resources and assets in Azure that you may not be able to include in the resource name, such as additional context about the resources, the workload, what business unit it impacts, or similar. One critical tag we use at Third Octet is to differentiate between production and test. A few simple tags allow us to easily identify who owns the resource, the importance of the resource, when the resource can safely be removed, and what business unit is impacted.
- Once you are satisfied with the tags, click Next: Review + create. If you are shown a Validation passed message, click Create and allow the job to complete.
- Once the Media Services account has been provisioned, we can start generating live captions from our video content. Browse (by searching or through resources) to the Media Services account you just created and select it.
- Once inside the media services account blade, select Assets.
- The Assets blade is where we will upload our video content. Click Upload. Under the new blade to upload new assets, the storage account created for this media service should be selected. Click the blue folder icon to select a video file from your local computer. In our situation, we are using an MP4 file.
- Do not forget to acknowledge your rights to the file/content and select I agree, upload, and close. The file will upload, and you can track the progress in the notifications blade. Once the upload has completed, we can return to the Assets blade.
- Back in the Assets blade, click the file you just uploaded. In this new blade, we will be provided details around the video asset – the media services account, name, storage container, creation date, and a description (if provided). From here, we want to click Add Job.
- From the create a job blade, we will first want to create a new transform. A transform is a workflow, or set of tasks, to perform on the video – in this instance, we are looking to handle audio transcription, converting speech to text. We can use that as the transform name, which we can then reuse for subsequent videos.
- Under Automatic language detection, feel free to choose yes; however, it is best to select the language you know is present in the video.
- Under Configure output, you can leave the output asset name as is or, alternatively, provide a more descriptive name if your original video name is rather non-descriptive. Also, provide the storage account used for media services and a descriptive job name.
- We can now click Create to proceed with the job. The job can take some time – be patient and track the progress under Notifications.
- Once the job has completed, return to the media services account and you will see an additional file reflecting the name defined under output asset name on the earlier configure output screen.
- Click that file and, in the new blade, click the storage container. A new window will open and provide you with all the files created from our speech-to-text job.
- The transcriptions are provided in two formats – TimedText (TTML) and WebVTT (VTT). Select the file that is required by your video hosting platform. For example, YouTube supports both VTT and TTML. We use a platform that supports only SubRip (SRT), which required us to convert from these formats to SRT (which is also extremely easy).
- You can now upload both the video file and transcription file to your preferred video platform and provide accessible video content complete with transcriptions.