In this project, I integrated a conversational AI avatar, named Joanna, into an interactive environment. The work combines conversational AI, speech synthesis, and real-time interaction to create a responsive and engaging user experience. The aim was to connect a virtual avatar to a conversational backend within a digital space and keep the exchange fluid.
Design Challenge: How can we create an interactive avatar that is responsive and engaging for users in real-time?
Problem Statement: Identify the tools and platforms needed to:
Select the most suitable avatar and the platform to host it.
Connect the avatar to a conversational AI so that interaction stays fluid.
Produce functional, accurate lip sync.
Before finalizing the architecture, I researched several candidate technologies in parallel. This evaluation determined which tools and platforms were best suited to the project goals.
MetaHumans in Unreal Engine:
Objective: Explore high-fidelity virtual characters for potential use in future projects.
Process: Integrated MetaHumans with Unreal Engine, testing real-time rendering capabilities and performance optimizations.
Outcome: Gained insights into complex character facial animations and the suitability of MetaHumans for different interactive scenarios.
Audio2Face (A2F) App in NVIDIA Omniverse and Riva Chatbot:
Objective: Test interoperability and real-time AI interactions within a photorealistic simulation.
Process: Used NVIDIA Omniverse for rendering and the Riva Chatbot for AI interactions, focusing on seamless integration and latency reduction.
Outcome: Established effective strategies for real-time AI communication in visually intensive applications.
During this research phase, I collaborated closely with tech support communities to resolve challenges and refine the project's architecture. This experience highlighted my ability to quickly adapt to new technologies, engage in detailed R&D, and effectively communicate with diverse technical teams.
Solution Architecture
After extensive research and development with various tools and platforms, I determined that utilizing a Ready Player Me avatar within Unity, combined with Oculus Lipsync for realistic speech animation, provided the best user experience. This decision was based on the flexibility and robust support for real-time applications that Unity offers, coupled with the wide adoption and ease of integration of Ready Player Me avatars.
Technical Process
Integration in Unity:
Unity Setup: Configured a scene with the Ready Player Me avatar and integrated Oculus Lipsync to animate the avatar based on speech.
Conversational AI Connection: Utilized a Rasa Chatbot, designed by an AI engineer, to handle the dialogue management. This chatbot runs on a local server and communicates with the Unity application via HTTP requests.
C# Scripting:
Developed C# scripts in Unity to handle user inputs and send them to the Rasa Chatbot.
Processed responses from Rasa and passed them to a text-to-speech service to generate audio outputs.
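The Unity scripts themselves are written in C#. To illustrate the HTTP exchange they perform, the request and response shapes can be sketched in Python. This is a minimal sketch, assuming Rasa's REST channel is enabled at its default local address (`http://localhost:5005/webhooks/rest/webhook`). The sender ID and the helper names `build_payload`, `extract_reply_text`, and `ask_rasa` are hypothetical and not taken from the project.

```python
import json
import urllib.request

# Assumption: Rasa's REST channel is enabled and served locally on the default port.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"


def build_payload(sender_id: str, message: str) -> bytes:
    """Encode one user utterance in the JSON shape the REST channel accepts."""
    return json.dumps({"sender": sender_id, "message": message}).encode("utf-8")


def extract_reply_text(bot_messages: list) -> str:
    """Join the text parts of the reply; Rasa may return several messages,
    and some of them (images, buttons) carry no text."""
    return " ".join(m["text"] for m in bot_messages if m.get("text"))


def ask_rasa(sender_id: str, message: str, url: str = RASA_URL) -> str:
    """POST the utterance to Rasa and return the reply text to be spoken."""
    request = urllib.request.Request(
        url,
        data=build_payload(sender_id, message),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return extract_reply_text(json.loads(response.read().decode("utf-8")))


if __name__ == "__main__":
    # Requires a running Rasa server; "unity-user" is an arbitrary session ID.
    print(ask_rasa("unity-user", "Hello"))
```

The joined reply text is what would then be handed to the text-to-speech step.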
Text-to-Speech Handling:
Implemented a local Flask server to interface with Amazon Polly for speech synthesis, overcoming limitations related to WebGL deployment from Unity.
The Flask app handles POST requests from Unity, synthesizes speech using Amazon Polly with the voice 'Joanna', and serves the resulting audio back to Unity for playback.
Code Example
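What follows is a minimal sketch of a Flask relay of the kind described above, not the project's actual code. The `/synthesize` route, the JSON field `text`, the port, and the function names are assumptions made for illustration. It relies on `flask` and `boto3` being installed and on AWS credentials and a region being configured. The Polly client is passed in as a parameter so the synthesis step can be exercised without calling AWS.

```python
VOICE_ID = "Joanna"  # the Polly voice named in this write-up


def synthesize(polly_client, text: str, voice_id: str = VOICE_ID) -> bytes:
    """Ask Polly for MP3 audio of `text` and return the raw bytes."""
    result = polly_client.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id
    )
    return result["AudioStream"].read()


def create_app(polly_client):
    """Build the Flask app around an injected Polly client."""
    from flask import Flask, Response, jsonify, request  # pip install flask

    app = Flask(__name__)

    @app.route("/synthesize", methods=["POST"])  # hypothetical route name
    def synthesize_route():
        body = request.get_json(silent=True)
        # Accept only a JSON object with a non-empty "text" field.
        text = str(body.get("text") or "").strip() if isinstance(body, dict) else ""
        if not text:
            return jsonify(error="missing 'text'"), 400
        # Return the audio directly so the client can play it back.
        return Response(synthesize(polly_client, text), mimetype="audio/mpeg")

    return app


if __name__ == "__main__":
    import boto3  # pip install boto3; needs AWS credentials and a region

    create_app(boto3.client("polly")).run(host="127.0.0.1", port=5000)
```

A Unity WebGL build runs inside the browser, so a relay like this would most likely also need to send CORS headers. That part is omitted here.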
Challenges and Resolutions
WebGL Deployment: Faced challenges with directly using Amazon Polly in Unity's WebGL build. Solved this by routing Polly requests through a Flask server, which then serves the synthesized speech back to Unity.
Real-Time Interaction: Ensured real-time performance by optimizing network calls and streamlining the data handling between Unity, Flask, and Polly.
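One common way to cut network calls in a pipeline like this is to cache synthesized audio for lines the avatar repeats, such as greetings and fallback replies. The sketch below illustrates that idea only. It is an assumption, not a description of the optimizations actually applied in this project, and `cached_synthesizer` is a hypothetical helper name.

```python
from functools import lru_cache
from typing import Callable


def cached_synthesizer(synthesize: Callable[[str], bytes], max_entries: int = 256):
    """Wrap a text -> audio function so repeated lines skip the network call."""

    @lru_cache(maxsize=max_entries)
    def cached(text: str) -> bytes:
        return synthesize(text)

    return cached


if __name__ == "__main__":
    calls = []

    def slow_synthesize(text: str) -> bytes:
        calls.append(text)  # stands in for a round trip to the speech service
        return text.encode("utf-8")

    speak = cached_synthesizer(slow_synthesize)
    speak("Hello, I'm Joanna.")
    speak("Hello, I'm Joanna.")  # second call is served from the cache
    print(len(calls))  # prints 1
```

The trade-off is memory: each cached entry holds a full audio clip, so `max_entries` should stay modest.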
Conclusion
This project strengthened my skills in integrating diverse technologies and in solving problems within the constraints of current software and platform capabilities. It traces the path from concept to a functional prototype, covering both the technical challenges encountered and the solutions applied.