A native Unity plugin to convert speech to text on Android & iOS

yasirkula/UnitySpeechToText

Unity Speech to Text Plugin for Android & iOS

Discord: https://discord.gg/UJJt549AaV

GitHub Sponsors ☕

This plugin helps you convert speech to text on Android (all versions) and iOS 10+. Offline speech recognition is supported on Android 23+ and iOS 13+ if the target language's speech recognition model is present on the device.

Note that continuous speech detection isn't supported, so speech recognition sessions end automatically after a short break in the speech or when OS-determined time limits are reached.

INSTALLATION

There are 4 ways to install this plugin:

  • import SpeechToText.unitypackage via Assets → Import Package
  • clone/download this repository and move the Plugins folder to your Unity project's Assets folder
  • (via Package Manager) add "com.yasirkula.speechtotext": "https://github.com/yasirkula/UnitySpeechToText.git" to Packages/manifest.json
  • (via OpenUPM) run openupm add com.yasirkula.speechtotext

There are two ways to set up the plugin on iOS:

a. Automated Setup for iOS

  • (optional) change the values of Speech Recognition Usage Description and Microphone Usage Description at Project Settings/yasirkula/Speech to Text

b. Manual Setup for iOS

  • see: https://github.com/yasirkula/UnitySpeechToText/wiki/Manual-Setup-for-iOS

KNOWN ISSUES

  • Speech session returned error code 12 on a single Android test device (regardless of target language) and couldn't be started

HOW TO

NOTE: The codebase is documented using XML comments, so this section will only briefly mention the functions.

You should first initialize the plugin via SpeechToText.Initialize( string preferredLanguage = null ). If you don't provide a preferred language (in the format "en-US"), the device's default language is used. You can check whether a language is supported via SpeechToText.IsLanguageSupported( string language ).

After initialization, you can query SpeechToText.IsServiceAvailable( bool preferOfflineRecognition = false ) and SpeechToText.IsBusy() to see if a speech recognition session can be started. Most operations will fail while the service is unavailable or busy.

Before starting a speech recognition session, you must make sure that the necessary permissions are granted via SpeechToText.CheckPermission() and SpeechToText.RequestPermissionAsync( PermissionCallback callback ) functions. If permission is Denied , you can call SpeechToText.OpenSettings() to automatically open the app's Settings from where the user can grant the necessary permissions manually (Android: Microphone, iOS: Microphone and Speech Recognition). On Android, the speech recognition system also requires the Google app to have Microphone permission. If not, its result callback will return error code 9. In that scenario, you can notify the user and call SpeechToText.OpenGoogleAppSettings() to automatically open the Google app's Settings from where the user can grant it the Microphone permission manually.

To start a speech recognition session, you can call SpeechToText.Start( ISpeechToTextListener listener, bool useFreeFormLanguageModel = true, bool preferOfflineRecognition = false ) . Normally, sessions end automatically after a short break in the speech but you can also stop the session manually via SpeechToText.ForceStop() (processes the speech input so far) or SpeechToText.Cancel() (doesn't process any speech input and immediately invokes the result callback with error code 0). The ISpeechToTextListener interface has the following functions:

  • OnReadyForSpeech()
  • OnBeginningOfSpeech()
  • OnVoiceLevelChanged( float normalizedVoiceLevel )
  • OnPartialResultReceived( string spokenText )
  • OnResultReceived( string spokenText, int? errorCode )

EXAMPLE CODE
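Below is a minimal sketch of how the functions above fit together in a listener MonoBehaviour. The SpeechToText.Permission enum value used in the permission callback is an assumption based on the descriptions above; consult the XML comments for the exact API.

```csharp
using UnityEngine;

public class SpeechToTextDemo : MonoBehaviour, ISpeechToTextListener
{
    void Start()
    {
        // Initialize with an explicit language; pass null to use the device's default
        SpeechToText.Initialize( "en-US" );
    }

    public void StartSpeechToText()
    {
        SpeechToText.RequestPermissionAsync( ( permission ) =>
        {
            // Assumption: the callback reports a Granted/Denied-style enum (see CheckPermission above)
            if( permission == SpeechToText.Permission.Granted && SpeechToText.IsServiceAvailable() && !SpeechToText.IsBusy() )
                SpeechToText.Start( this );
        } );
    }

    // ISpeechToTextListener callbacks
    public void OnReadyForSpeech() { Debug.Log( "Ready for speech" ); }
    public void OnBeginningOfSpeech() { Debug.Log( "Speech began" ); }
    public void OnVoiceLevelChanged( float normalizedVoiceLevel ) { }
    public void OnPartialResultReceived( string spokenText ) { Debug.Log( "Partial result: " + spokenText ); }

    public void OnResultReceived( string spokenText, int? errorCode )
    {
        if( errorCode == null )
            Debug.Log( "Result: " + spokenText );
        else
            Debug.LogWarning( "Session ended with error code: " + errorCode );
    }
}
```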

Sponsor this project.

  • https://yasirkula.itch.io/unity3d/donate

Speech Recognition in Unity3D – The Ultimate Guide

There are three main strategies for converting user speech input to text:

  • Voice Commands
  • Free Dictation
  • Grammar Mode

These strategies exist in every voice detection engine (Google, Microsoft, Amazon, Apple, Nuance, Intel, and others), so the concepts described here give you a good reference point for working with any of them. In today's article, we'll explore the differences between these methods, understand their use cases, and see a quick implementation of the main ones.

Prerequisites

To write and execute code, you need to install the following software:

  • Visual Studio 2019 Community

Unity3D uses a Microsoft API that works on any Windows 10 device (Desktop, UWP, HoloLens, Xbox). Similar APIs also exist for Android and iOS.

Did you know?…

LightBuzz has been helping Fortune-500 companies and innovative startups create amazing Unity3D applications and games. If you are looking to hire developers for your project, get in touch with us.

Source code

The source code of the project is available in our LightBuzz GitHub account. Feel free to download, fork, and even extend it!

1) Voice commands

We are first going to examine the simplest form of speech recognition: plain voice commands.

Description

Voice commands are predictable single words or expressions, such as:

  • “Forward”
  • “Left”
  • “Fire”
  • “Answer call”

The detection engine listens to the user and compares the result with various possible interpretations. If one of them matches the spoken phrase within a certain confidence threshold, it's marked as a proposed answer.

Since this is an "all or nothing" approach, the engine will either recognize the phrase or nothing at all.

This method fails when you have several ways to say one thing. For example, the words “hello”, “hi”, “hey there” are all forms of greeting. Using this approach, you have to define all of them explicitly.

This method is useful for short, expected phrases, such as in-game controls.

Our original article includes detailed examples of using simple voice commands. You may also check out the Voice Commands Scene in the sample project.

Below, you can see the simplest C# code example for recognizing a few words:
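A minimal sketch using the KeywordRecognizer from Unity's built-in UnityEngine.Windows.Speech namespace (the keyword list is illustrative):

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

public class VoiceCommandsExample : MonoBehaviour
{
    // Predefined commands the engine will listen for
    private readonly string[] keywords = { "forward", "left", "fire", "answer call" };
    private KeywordRecognizer recognizer;

    void Start()
    {
        recognizer = new KeywordRecognizer(keywords);
        recognizer.OnPhraseRecognized += OnPhraseRecognized;
        recognizer.Start();
    }

    private void OnPhraseRecognized(PhraseRecognizedEventArgs args)
    {
        // args.confidence reports how certain the engine is about the match
        Debug.Log($"Recognized '{args.text}' with confidence {args.confidence}");
    }

    void OnDestroy()
    {
        if (recognizer != null)
        {
            if (recognizer.IsRunning)
                recognizer.Stop();
            recognizer.Dispose();
        }
    }
}
```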

2) Free Dictation

To solve the challenges of simple voice commands, we shall use the dictation mode.

While the user speaks in this mode, the engine listens for every possible word and tries to find the best possible match for what the user meant to say.

This is the mode your mobile device activates when you dictate a new email with your voice. The engine manages to write the text less than a second after you finish saying a word.

Technically, this is really impressive, especially considering that it compares your voice across multi-lingual dictionaries, while also checking grammar rules.

Use this mode for free-form text. If your application has no idea what to expect, the Dictation mode is your best bet.

You can see an example of the Dictation mode in the sample project's Dictation Mode Scene. Here is the simplest way to use the Dictation mode:
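A minimal sketch using DictationRecognizer from the same UnityEngine.Windows.Speech namespace:

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

public class DictationExample : MonoBehaviour
{
    private DictationRecognizer dictation;

    void Start()
    {
        dictation = new DictationRecognizer();

        // Fast, possibly inaccurate guesses while the user is still speaking
        dictation.DictationHypothesis += (text) =>
            Debug.Log($"Hypothesis: {text}");

        // Final, highest-probability phrase after the user pauses
        dictation.DictationResult += (text, confidence) =>
            Debug.Log($"Result: {text} (confidence: {confidence})");

        // Fired when the engine shuts down; some causes just need a restart
        dictation.DictationComplete += (cause) =>
        {
            if (cause != DictationCompletionCause.Complete)
                Debug.LogWarning($"Dictation completed abnormally: {cause}");
        };

        dictation.DictationError += (error, hresult) =>
            Debug.LogError($"Dictation error: {error} (hresult: {hresult})");

        dictation.Start();
    }

    void OnDestroy()
    {
        dictation.Dispose();
    }
}
```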

As you can see, we first create a new dictation engine and register for the possible events.

  • It starts with DictationHypothesis events, which are thrown really fast as the user speaks. However, hypothesized phrases may contain lots of errors.
  • DictationResult is an event thrown after the user stops speaking for 1–2 seconds. It’s only then that the engine provides a single sentence with the highest probability.
  • DictationComplete is thrown on several occasions when the engine shuts down. Some occasions are irreversible technical issues, while others just require a restart of the engine to get back to work.
  • DictationError is thrown for other unpredictable errors.

Here are two general rules-of-thumb:

  • For the highest quality, use DictationResult .
  • For the fastest response, use DictationHypothesis .

Having both quality and speed is impossible with this technique.

3) Grammar Mode

Is it even possible to combine high-quality recognition with high speed?

Well, there is a reason we are not yet using voice commands the way Iron Man does: in real-world applications, users frequently complain about typing errors, which probably occur in less than 10% of cases… Dictation makes many more mistakes than that.

To increase accuracy while keeping the speed fast, we need the best of both worlds: the freedom of Dictation and the response time of Voice Commands.

The solution is Grammar Mode. This mode requires us to write a dictionary: an XML file that defines various rules for the things the user will potentially say. This way, we can ignore languages we don't need and phrases the user will probably not use.

The grammar file also tells the engine which words it can expect to hear next, shrinking the candidate set from anything to a handful. This significantly increases both performance and quality.

For example, using a Grammar, we could greet with either of these phrases:

  • “Hello, how are you?”
  • “Hi there”
  • “Hey, what’s up?”
  • “How’s it going?”

All of those phrases could be listed under a single greeting rule. If the user starts saying something that sounds like "Hello", the engine then only has to differentiate it from, e.g., "Ciao", instead of from every similar-sounding word such as "Yellow" or "Halo".

We are going to see how to create our own Grammar file in a future article.

For your reference, this is the official specification for structuring a Grammar file.
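In the meantime, here is a hedged sketch of how a grammar file can be loaded in Unity via the GrammarRecognizer class from the same UnityEngine.Windows.Speech namespace (the greetings.grxml file name is hypothetical):

```csharp
using System.IO;
using UnityEngine;
using UnityEngine.Windows.Speech;

public class GrammarExample : MonoBehaviour
{
    private GrammarRecognizer recognizer;

    void Start()
    {
        // Hypothetical SRGS grammar file shipped in StreamingAssets
        string grammarPath = Path.Combine(Application.streamingAssetsPath, "greetings.grxml");

        recognizer = new GrammarRecognizer(grammarPath);
        recognizer.OnPhraseRecognized += (args) =>
            Debug.Log($"Recognized '{args.text}' with confidence {args.confidence}");
        recognizer.Start();
    }

    void OnDestroy()
    {
        recognizer?.Dispose();
    }
}
```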

In this tutorial, we described two methods of recognizing voice in Unity3D: Voice Commands and Dictation. Voice Commands are the easiest way to recognize pre-defined words. Dictation is a way to recognize free-form phrases. In a future article, we are going to see how to develop our own Grammar and feed it to Unity3D.

Until then, why don’t you start writing your code by speaking to your PC?

You made it to this point? Awesome! Here is the source code for your convenience.

Before you go…

Sharing is caring.

If you liked this article, remember to share it on social media, so you can help other developers, too! Also, let me know your thoughts in the comments below. ‘Til the next time… keep coding!

Shachar Oz

Shachar Oz is a product manager and UX specialist with extensive experience in emergent technologies like AR, VR, and computer vision. He has designed Human-Machine Interfaces for video games, apps, robots, and cars over the last 10 years, using modalities like face tracking, hand gestures, and voice recognition. Website


11 Comments


Hello, I have a question: in Unity everything works perfectly, but when I build the project for PC and open the application, it doesn't work. Please help.


Hi Omar, well, I have built it with Unity 2019.1 as well as with 2019.3 and it works perfectly.

I apologize if it doesn't. Please try to make a build from the GitHub source code, and feel free to send us any error messages that occur.


Hello, I’m trying Dictation Recognizer and I want to change the language to Spanish but I still don’t quite get it. Can you help me with this?


Hi Alexis, perhaps check whether the code here could help you: https://docs.microsoft.com/en-us/windows/apps/design/input/specify-the-speech-recognizer-language


You need an object (protected PhraseRecognizer recognizer;) in example no. 1. Take care and thanks for this article!

Thank you Carl. Happy you liked it.


Does this support Android builds?

Hi there. Sadly not. Android and iOS have different speech APIs; this API supports Microsoft devices.


Any working example for the grammar case?

Well, you can find this example from Microsoft. It should work on PC anyway. A combination of Grammar and machine learning is how most of these mechanisms work today.

https://learn.microsoft.com/en-us/dotnet/api/system.speech.recognition.grammar?view=netframework-4.8.1#examples


Manual Setup for iOS - yasirkula/UnitySpeechToText GitHub Wiki

  • set the value of Automated Setup to false at Project Settings/yasirkula/Speech to Text
  • build your project
  • enter Microphone Usage Description and Speech Recognition Usage Description in Xcode
  • insert -weak_framework Speech -weak_framework Accelerate into the Other Linker Flags of the Unity-iPhone target (and the UnityFramework target on Unity 2019.3 or newer)
  • lastly, remove Speech.framework and Accelerate.framework from Link Binary With Libraries of the Unity-iPhone target (and the UnityFramework target on Unity 2019.3 or newer) in Build Phases, if they exist

Unity Speech Recognition

This article serves as a comprehensive guide for adding on-device Speech Recognition to a Unity project.

When used casually, Speech Recognition usually refers solely to Speech-to-Text . However, Speech-to-Text represents only a single facet of Speech Recognition technologies. It also refers to features such as Wake Word Detection , Voice Command Recognition , and Voice Activity Detection ( VAD ). In the context of Unity projects, Speech Recognition can be used to implement a Voice Interface .

Fortunately, Picovoice offers a few tools to help implement Voice Interfaces. If all that is needed is to recognize when specific phrases or words are said, use Porcupine Wake Word. If Voice Commands need to be understood and intents extracted with details (i.e., slot values), Rhino Speech-to-Intent is more suitable. Keep reading to see how to quickly start with both of them.

Picovoice Unity SDKs have cross-platform support for Linux , macOS , Windows , Android and iOS !

Porcupine Wake Word

To integrate the Porcupine Wake Word SDK into your Unity project, download and import the latest Porcupine Unity package .

Sign up for a free Picovoice Console account and obtain your AccessKey . The AccessKey is only required for authentication and authorization.

Create a custom wake word model using Picovoice Console.

Download the .ppn model file and copy it into your project's StreamingAssets folder.

Write a callback that takes action when a keyword is detected, then:

  • initialize the Porcupine Wake Word engine with the callback and the .ppn file name (or path relative to the StreamingAssets folder), and
  • start detecting (both steps are combined in the sketch below).
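A sketch of these steps, assuming the SDK's Pv.Unity namespace and a hypothetical my-wake-word.ppn model file (see the quick start guide below for the exact signatures):

```csharp
using System.Collections.Generic;
using Pv.Unity;
using UnityEngine;

public class WakeWordExample : MonoBehaviour
{
    private const string AccessKey = "..."; // your AccessKey from Picovoice Console
    private PorcupineManager porcupineManager;

    void Start()
    {
        // The .ppn path is relative to the StreamingAssets folder
        porcupineManager = PorcupineManager.FromKeywordPaths(
            AccessKey,
            new List<string> { "my-wake-word.ppn" },
            OnWakeWordDetected);

        porcupineManager.Start(); // begin listening to the microphone
    }

    private void OnWakeWordDetected(int keywordIndex)
    {
        Debug.Log($"Wake word #{keywordIndex} detected");
    }

    void OnDestroy()
    {
        porcupineManager.Delete(); // release native resources
    }
}
```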

For further details, visit the Porcupine Wake Word product page or refer to Porcupine's Unity SDK quick start guide .

Rhino Speech-to-Intent

To integrate the Rhino Speech-to-Intent SDK into your Unity project, download and import the latest Rhino Unity package .

Create a custom context model using Picovoice Console.

Download the .rhn model file and copy it into your project's StreamingAssets folder.

Write a callback that takes action when a user's intent is inferred, then:

  • initialize the Rhino Speech-to-Intent engine with the callback and the .rhn file name (or path relative to the StreamingAssets folder), and
  • start inferring (both steps are combined in the sketch below).
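A sketch of these steps, again assuming the Pv.Unity namespace and a hypothetical my-context.rhn model file:

```csharp
using Pv.Unity;
using UnityEngine;

public class IntentExample : MonoBehaviour
{
    private const string AccessKey = "..."; // your AccessKey from Picovoice Console
    private RhinoManager rhinoManager;

    void Start()
    {
        // The .rhn path is relative to the StreamingAssets folder
        rhinoManager = RhinoManager.Create(AccessKey, "my-context.rhn", OnInferenceResult);
        rhinoManager.Process(); // listen until a single intent is inferred
    }

    private void OnInferenceResult(Inference inference)
    {
        if (!inference.IsUnderstood)
            return;

        Debug.Log($"Intent: {inference.Intent}");
        foreach (var slot in inference.Slots)
            Debug.Log($"  {slot.Key} = {slot.Value}");
    }

    void OnDestroy()
    {
        rhinoManager.Delete(); // release native resources
    }
}
```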

For further details, visit the Rhino Speech-to-Intent product page or refer to Rhino's Unity SDK quick start guide.


[Open Source] whisper.unity - free speech to text running on your machine

Discussion in 'Assets and Asset Store' started by Macoron, Apr 12, 2023.

Macoron

whisper.unity

Several months ago OpenAI released a powerful automatic speech recognition (ASR) model called Whisper. The code and weights are under the MIT license. I used another open source implementation called whisper.cpp and ported it to Unity.

Main features:

  • Multilingual: supports around 60 languages
  • Can transcribe from one language to another, e.g. transcribe German audio to English text
  • Works faster than realtime: on my Mac it transcribes 11 seconds of audio in 220 ms
  • Runs on the local user machine without an Internet connection
  • Free and open source; can be used in commercial projects

Feel free to use it in your projects: https://github.com/Macoron/whisper.unity
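Basic usage looks roughly like this (a sketch assuming the package's WhisperManager component and its GetTextAsync method as described in the repository's README; treat the exact names as assumptions):

```csharp
using UnityEngine;
using Whisper;

public class WhisperExample : MonoBehaviour
{
    public WhisperManager whisper; // assign in the Inspector
    public AudioClip clip;         // audio to transcribe

    private async void Start()
    {
        // Transcribes the clip asynchronously (assumed API)
        var result = await whisper.GetTextAsync(clip);
        Debug.Log(result.Result);
    }
}
```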

Gord10

I implemented this into my new project. It works great, thanks for this! I couldn't get this to work with IL2CPP, though; I had to use Mono. (Unity 2022.3.0)

Macoron said: Great, nice to hear that you used it for your project. For what platform did you have problems with IL2CPP? It should be supported.

Gord10 said: I get the following errors in the player (64-bit Windows build).

Great! Windows, Mac (tested only on Apple Silicon) and Linux IL2CPP builds work perfectly now, thanks for the fix.

Warfighter789

Hi there, is this asset compatible with VR devices such as the Meta Quest 2? I attempted to integrate it into my project but encountered an error. Thanks in advance.

Error: Unity NotSupportedException: IL2CPP doesn't allow marshaling delegates that reference instance methods to native code. The method we're trying to marshal is: Whisper.Native.whisper_progress_callback::Invoke.

jlmarc33

Hi, unfortunately I have an initialization error with Unity 2022.3 LTS concerning libwhisper.dll (DllNotFoundException: libwhisper assembly). Any advice that would allow compiling with this version is welcome.
Macoron said: Check the messages above; this error should be fixed by the recent update. By the way, I didn't test on the Oculus Quest 2, but I'm really interested to see how fast it works. Please write back. Edit: make sure you use the very latest version with this update: https://github.com/Macoron/whisper.unity/pull/41

Warfighter789 said: The latest update fixed the issue, thank you! The speed is quite fast. I noticed that the speech-to-text isn't as accurate anymore, is that normal?

Macoron said: What do you mean, it isn't accurate anymore? Do you have bad transcription results? If that's the case, what language do you use?

Warfighter789 said: Yeah, my transcription results are having problems. I've noticed that it's not picking up my voice as accurately as before. Sometimes when I say something, it comes out differently in the transcript. I'm using English.

Macoron said: Well, you can try an older release. The latest master uses whisper.cpp 1.4.2, which may work differently from 1.2.2. I also noticed some changes, but I'm not sure if it's better or worse: https://github.com/Macoron/whisper.unity/releases/tag/1.1.1 If you are using English, I highly recommend switching to the `whisper.tiny.en` or `whisper.base.en` models. They are much better at English transcription. I personally use `whisper.small.en`, but that might be too heavy for the Quest.
I tested whisper.unity successfully on my Windows 11 laptop PC without any issues, using Unity 2021.3.9 and the latest 2022.3.4 LTS. So my initialization problem with Unity 2022.3.0 seems to be related only to my specific desktop PC configuration (Windows 10 with security restrictions).

Bullybolton

@Warfighter789 what did you do to test on the Quest 2? I've just built the microphone sample scene onto the Quest and it was very slow.

Spellbook

This is something I've worked towards for years and it has effectively been impossible unless you're Google, Apple or Amazon... I don't think people quite realize how revolutionary this stuff is yet.

One issue I've run into is that sampling a short audio clip returns 0 segments. Using push-to-talk, someone might quickly say "Yes" and the clip is 1 or 2 seconds long. The WhisperWrapper line "var n = WhisperNative.whisper_full_n_segments(_whisperCtx);" returns 0, finding no segments. I assume this is probably a limitation of the Whisper internals? I wanted to ask before I artificially append a few seconds to the end of audio clips as a hack solution.

Utopien

Yeah, excuse my English, I'm French. Sorry if this is a noob question: I can't find how to translate text from one language to another, except of course via the bool translateToEnglish. I want to translate all speech, whatever the language, into French. Any help would be highly appreciated. Thanks for this great package!


Macoron said: Find the Whisper Manager in your scene and find the "Language" field there. Write the "fr" language code and make sure "Translate To English" is disabled. Now any speech in any language will be translated to French text. Keep in mind that it doesn't work as well as translation to English, and you will probably need a bigger model than "tiny"; with smaller models it will probably be just gibberish.
Quick update: whisper.unity has been updated to version 1.2.0! The biggest changes are prompting and streaming support. For more information, check the release notes in the GitHub repository.

Sammyueru1

This is amazing, I've always wanted to see something like this.

Strategos

Hey, this is working great for me in the editor, but when I do an Android build it dies thusly:

09-11 23:43:30.505 1781 2132 I Unity : Trying to load Whisper model from buffer...
09-11 23:43:30.553 1781 1817 E Unity : DllNotFoundException: __Internal assembly:<unknown assembly> type:<unknown type> member null)
09-11 23:43:30.553 1781 1817 E Unity : at (wrapper managed-to-native) Whisper.Native.WhisperNative.whisper_init_from_buffer(intptr,uintptr)
09-11 23:43:30.553 1781 1817 E Unity : at Whisper.WhisperWrapper.InitFromBuffer (System.Byte[] buffer) [0x00054] in <82e321693d1448d4ae1fba9fa7e11c76>:0
09-11 23:43:30.553 1781 1817 E Unity : at Whisper.WhisperWrapper+<>c__DisplayClass27_0.<InitFromBufferAsync>b__0 () [0x00000] in <82e321693d1448d4ae1fba9fa7e11c76>:0
09-11 23:43:30.553 1781 1817 E Unity : at System.Threading.Tasks.Task`1[TResult].InnerInvoke () [0x0000f] in <0bfb382d99114c52bcae2561abca6423>:0
09-11 23:43:30.553 1781 1817 E Unity : at System.Threading.Tasks.Task.Execute () [0x00000] in <0bfb382d99114c52bcae2561abca6423>:0
09-11 23:43:30.553 1781 1817 E Unity : --- End of stack trace from previous location where exception was thrown ---
09-11 23:43:30.553 1781 1817 E Unity : at Whisper.WhisperWrapper.InitFromBufferAsync (System.Byte[] buffer) [0x0007d] in <82e321693d1448d4ae1fba9fa7e11c76>:0
09-11 23:43:30.553 1781 1817 E Unity : at Whisper.WhisperWrapper.InitFromFileAsync (System.String modelPath) [0x000c1] in <82e321693d1448d4ae1fba9fa7e11c76>:0
09-11 23:43:30.553 1781 1817 E Unity : at Whisper.WhisperManager.InitModel () [0x000bd]
09-11 23:43:35.290 1781 1817 E Unity : Whisper model isn't loaded! Init Whisper model first!

Macoron said: For what device are you building? Which version of Unity? Please also check that your Player Settings use the IL2CPP scripting backend and that you are building for the ARM64 architecture.

epl-matt

Is there a way to add custom words? There are some words that I need to use, but it never picks them up as that word.

Also, @Macoron, I tried installing the package from the package manager and kept getting an error about the OnRecordStop delegate not being found. I removed it and copied the com.whisper.unity package folder from the downloaded git zip instead, and the error went away.

Strategos said: Thanks, I will check these things and report back.

New major update: whisper now supports GPU inference on CUDA and Metal. It should also improve quality and fix some minor bugs. Check more details here.

Tyke18

Tyke18 said: Hi, does this have a voice activity detection feature? I want to be able to capture mic input on a Meta Quest 3 without the user having to press anything to indicate they are speaking (audio permissions must be granted first, of course).


Posted on Feb 16, 2023 • Updated on Feb 22, 2023

Speech Recognition in Unity: Adding Voice Input

Voice input in Unity elevates the user experience significantly. However, adding speech recognition to Unity is not easy. In the Unity Asset Store there are a few alternatives that run on one or two platforms; for cross-platform applications there are even fewer. Those that do support cross-platform applications rely on third-party cloud providers. Cloud computing comes at a cost and has inherent limitations, such as unpredictable latency and the need for a constant network connection, which hinder the user experience. Picovoice overcomes these challenges by processing voice data on the device. That's why, on day 33, we'll cover how to add speech recognition to Unity without compromising the user experience.

By the end of the tutorial, we'll be able to control a video player with voice, using commands like "Porcupine, skip ahead 1 minute."

This tutorial builds a hands-free video player for virtual reality applications, where using physical controllers is not convenient.

Virtual Video Screen: Unity's Render Texture receives the frames of a video and renders them as a texture. This way, any surface that can accept a texture can be turned into a screen.

  • Import a video into your Unity project.
  • Drag it into your scene
  • Click on the Video Player object and change the Render Mode property in the Inspector to Render Texture.
  • Right-click in your Project panel and select Create > Render Texture. Give this new object a name
  • Drag it into the “Target Texture” property of the video player.

Congratulations, you have created a video player that will generate frames of your video and render them to a texture. Now let's make the screen as well.

  • Create a new material with the shader type “Unlit/Texture”
  • Drag the render texture to the empty texture box.
  • Create a new piece of 3D geometry in the scene to apply the material to.
  • Drag the material onto this new object and hit the play button

Right now you should be able to see your video playing on the surface object!

Getting your app to understand voice inputs

We'll use the Picovoice Platform Unity SDK, which combines Porcupine Wake Word and Rhino Speech-to-Intent.

  • Download the Picovoice Unity package and import it into your Unity project.
  • Download the pre-trained models: "Porcupine" from the Porcupine Wake Word repository and the Video Player context from the Rhino Speech-to-Intent repository. (You can also train custom models on Picovoice Console.)
  • Sign up for the Picovoice Console for free and grab your AccessKey.
  • Drop the Porcupine model (.ppn file) and the Rhino model (.rhn file) into your project under the StreamingAssets folder.
  • Create a script called VideoController.cs and attach it to the video screen. In this script, we'll initialize a PicovoiceManager with the keyword and context files, as well as a callback for when Porcupine detects the wake word (OnWakeWordDetected) and a callback for when Rhino has finished an inference (OnInferenceResult).

(Do not forget to add your AccessKey and the paths to your models.)

You might previously have had challenges recording audio from Unity, but PicovoiceManager handles it automatically: simply call .Start() to begin audio capture and .Stop() to cease it. If you have a pre-existing audio pipeline, you can use the Picovoice class instead and pass audio frames to the speech recognition engine yourself.
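Here is a sketch of VideoController.cs along these lines (the exact PicovoiceManager.Create parameter order and the model file names are assumptions):

```csharp
using Pv.Unity;
using UnityEngine;
using UnityEngine.Video;

public class VideoController : MonoBehaviour
{
    private const string AccessKey = "..."; // your AccessKey from Picovoice Console

    public VideoPlayer videoPlayer;
    private PicovoiceManager picovoiceManager;

    void Start()
    {
        // Keyword (.ppn) and context (.rhn) paths are relative to StreamingAssets
        picovoiceManager = PicovoiceManager.Create(
            AccessKey,
            "porcupine.ppn",
            OnWakeWordDetected,
            "video-player.rhn",
            OnInferenceResult);

        picovoiceManager.Start(); // begin audio capture
    }

    private void OnWakeWordDetected()
    {
        Debug.Log("Wake word detected; listening for a command...");
    }

    private void OnInferenceResult(Inference inference)
    {
        // Intent handling is shown in the next section
    }

    void OnDestroy()
    {
        picovoiceManager.Stop(); // cease audio capture
    }
}
```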

Integrating Voice Command Interface

  • The wake word callback (OnWakeWordDetected) runs when Porcupine detects the wake word.
  • The inference callback, private void OnInferenceResult(Inference inference), runs when Rhino finishes an inference.

The slots are like arguments associated with the intent. When you receive the seek intent, you'll probably get minutes and/or seconds slots telling us what time to set the video to. Using the slots, our function for seeking through the video will look something like this:
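A sketch of that handler, continuing the VideoController above (the seek intent and minutes/seconds slot names follow the description here; Inference is assumed to expose IsUnderstood, Intent, and a Slots dictionary):

```csharp
private void OnInferenceResult(Inference inference)
{
    if (!inference.IsUnderstood)
        return;

    if (inference.Intent == "seek")
    {
        int minutes = 0, seconds = 0;
        if (inference.Slots.TryGetValue("minutes", out string m))
            minutes = int.Parse(m);
        if (inference.Slots.TryGetValue("seconds", out string s))
            seconds = int.Parse(s);

        // VideoPlayer.time is measured in seconds
        videoPlayer.time = minutes * 60 + seconds;
    }
}
```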

Lastly, connect each intent to a change in the UI and you're done!

Resources: the open-source code for the tutorial, Picovoice Console, and the Picovoice Platform SDK.



Introducing the Unity Text-to-Speech Plugin from ReadSpeaker

As a game developer, how will you use text to speech (TTS)?

We’ve only begun to discover what this tool can do in the hands of creators. What we do know is that TTS can solve tough development problems, that it’s a cornerstone of accessibility, and that it’s a key component of dynamic AI-enhanced characters: NPCs that carry on original conversations with players.

There have traditionally been a few technical roadblocks between TTS and the game studio: Devs find it cumbersome to create and import TTS sound files through an external TTS engine. Some TTS speech labors under perceptible latency, making it unsuitable for in-game audio. And an unintegrated TTS engine creates a whole new layer of project management, threatening already drum-tight production schedules.

What devs need is a latency-free TTS tool they can use independently, without leaving the game engine—and that’s exactly what you get with ReadSpeaker AI’s Unity text-to-speech plugin.

ReadSpeaker AI’s Unity Text-to-Speech Plugin

ReadSpeaker AI offers a market-ready TTS plugin for Unity and Unreal Engine, and will work with studios to provide APIs for other game engines. For now, though, we’ll confine our discussion to Unity, which claims nearly 65% of the game development engine market. ReadSpeaker AI’s TTS plugin is an easy-to-install tool that allows devs to create and manipulate synthetic speech directly in Unity: no file management, no swapping between interfaces, and a deep library of rich, lifelike TTS voices. ReadSpeaker AI uses deep neural networks (DNN) to create AI-powered TTS voices of the highest quality, complete with industry-leading pronunciation thanks to custom pronunciation dictionaries and linguist support.

With this neural TTS at their fingertips, developers can improve the game development process—and the player’s experience—limited only by their creativity. So far, we’ve identified four powerful uses for a TTS game engine plugin. These include:

  • User interface (UI) narration for accessibility. User interface narration is an accessibility feature that remediates barriers for players with vision impairments and other disabilities; TTS makes it easy to implement. Even before ReadSpeaker AI released the Unity plugin, The Last of Us Part 2 (released in 2020) used ReadSpeaker TTS for its UI narration feature. A triple-A studio like Naughty Dog can take the time to generate TTS files outside the game engine; those files were ultimately shipped on the game disc. That solution might not work ideally for digital games or independent studios, but a TTS game engine plugin will.
  • Prototyping dialogue at early stages of development. Don’t wait until you’ve got a voice actor in the studio to find out your script doesn’t flow perfectly. The Unity TTS plugin allows developers to draft scenes within the engine, tweaking lines and pacing to get the plan perfect before the recording studio’s clock starts running.
  • Instant audio narration for in-game text chat. Unity speech synthesis from ReadSpeaker AI renders audio instantly at runtime, through a speech engine embedded in the game files, so it’s ideal for narrating chat messages instantly. This is another powerful accessibility tool—one that’s now required for online multiplayer games in the U.S., according to the 21st Century Communications and Video Accessibility Act (CVAA). But it’s also great for players who simply prefer to listen rather than read in the heat of action.
  • Lifelike speech for AI NPCs and procedurally generated text. Natural language processing allows software to understand human speech and create original, relevant responses. Only TTS can make these conversational voicebots—which is essentially what AI NPCs are—speak out loud. Besides, AI NPCs are just one use of procedurally generated speech in video games. What are the others? You decide. Game designers are artists, and dynamic, runtime TTS from ReadSpeaker AI is a whole new palette.

Text to Speech vs. Human Voice Actors for Video Game Characters

Note that our list of use cases for TTS in game development doesn’t include replacing voice talent for in-game character voices, other than AI NPCs that generate dialogue in real time. Voice actors remain the gold standard for character speech, and that’s not likely to change any time soon. In fact, every great neural TTS voice starts with a great voice actor; they provide the training data that allows the DNN technology to produce lifelike speech, with contracts that ensure fair, ethical treatment for all parties. So while there’s certainly a place for TTS in character voices, they are not a replacement for human talent. Instead, think of TTS as a tool for development, accessibility, and the growing role of AI in gaming.

ReadSpeaker AI brings more than 20 years of experience in TTS, with a focus on performance. That expertise helped us develop an embedded TTS engine that renders audio on the player’s machine, eliminating latency. We also offer more than 90 top-quality voices in over 30 languages, plus SSML support so you can control expression precisely. These capabilities set ReadSpeaker AI apart from the crowd. Curious? Keep reading for a real-world example.

ReadSpeaker AI Speech Synthesis in Action

Soft Leaf Studios used ReadSpeaker AI’s Unity text-to-speech plugin for scene prototyping and for UI and story narration in its highly accessible game Stories of Blossom, in development at publication time.

“Without a TTS plugin like this, we would be left guessing what audio samples we would need to generate, and how they would play back,” Conor Bradley, Stories of Blossom lead developer, told ReadSpeaker AI. “The plugin allows us to experiment without the need to lock our decisions, which is a very powerful tool to have the privilege to use.”

This example raises the question every game developer will soon be asking themselves, a variation on the question we started with: what could a Unity text-to-speech plugin do for your next release? Reach out to start the conversation.

