Evan has over a decade of experience in the telecommunications industry, including both executive business and technical expertise with traditional wireless MNOs/MVNOs and WebRTC/VoIP OSS. He holds patents covering both hardware and software design and is a frequent presenter at realtime/IoT conferences, speaking on topics such as scalable SaaS deployments, business best practices, and the use of machine learning models in realtime settings.
As CTO at RingPlus, Evan helped create and grow the MVNO from a blue-sky project to more than 120,000 mobile users throughout the United States. This included the in-house creation of all platforms and systems, from carrier integration through deployment, billing, and customer service.
As the lead on Product Research & Development, he was instrumental in a number of high-profile product offerings, including Telecom-as-a-Service (TaaS) MVNE enablement, in-call voice translation (in 2013), FluidCall (seamless handoff from legacy wireless to WiFi), and innovations in telecom billing models.
Machine learning itself is decidedly not real time. Why, it can take days for multi-GPU machines to train a working model! The results for RTC, however, could be revolutionary: video/audio optimization, advanced NLU, speaker synthesis, realtime session transmission tuning… and crime!
Machine learning is an exploding field: with the arrival of cloud GPU instances and libraries like TensorFlow and PyTorch, models can be trained in ways that were never practical before. Those of us working in RTC can now leverage these tools both to solve issues with realtime communications and to create new ones. Who’s interested in crime?
In this talk we’ll discuss the future of ML with RTC. We’ve recently seen rapid advances in ASR, TTS, and machine translation, and incorporating these directly into our platforms is now a simple API call away. Next up is optimization of audio and video using models like Google’s RAISR, which lets us upscale the video stream in real time on the client side with minimal artifacting. Similarly, we can apply advanced noise suppression using tools like Mozilla’s RNNoise or, even more exciting, send only a subset of the audio and recreate speech on the far side using a model of the speaker’s voice.
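To make the “simple API call away” point concrete, here is a minimal Python sketch of packaging call audio for a cloud ASR service. The endpoint URL, payload shape, and `transcribe` helper are all hypothetical assumptions for illustration; real providers (Google, AWS, Azure, etc.) differ in the details, but the pattern is the same.

```python
import base64

# Hypothetical ASR endpoint -- real providers use their own URLs and schemas.
ASR_ENDPOINT = "https://asr.example.com/v1/recognize"

def build_asr_request(audio_bytes: bytes, language: str = "en-US") -> dict:
    """Package raw call audio into a JSON-serializable ASR request."""
    return {
        "config": {"language": language, "sample_rate_hz": 16000},
        # JSON can't carry raw bytes, so audio is base64-encoded.
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
    }

def transcribe(audio_bytes: bytes) -> str:
    """Send audio to the ASR service and return the transcript.

    Sketch only: needs the `requests` package and a live endpoint.
    """
    import requests
    resp = requests.post(ASR_ENDPOINT, json=build_asr_request(audio_bytes))
    resp.raise_for_status()
    return resp.json()["transcript"]
```

Swapping providers then means changing only the request builder and response parsing, not the media pipeline feeding it.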
Machine learning works best when there are voluminous amounts of well-structured data – something every call generates in spades! We can also use machine learning to analyze metrics during a session and re-route or optimize bitstreams accordingly. After the call, we can examine the statistics to improve future performance.
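The in-session metric-watching idea can be sketched with a simple statistical stand-in for a trained model: flag a session for rerouting when the latest jitter sample is an outlier against the session’s own history. The function name and the z-score rule here are illustrative assumptions, not a production heuristic.

```python
from statistics import mean, stdev

def should_reroute(jitter_ms: list, threshold: float = 2.0) -> bool:
    """Flag a session for rerouting when the newest jitter sample's
    z-score against the session history exceeds the threshold."""
    if len(jitter_ms) < 5:
        return False  # not enough history to judge
    history, latest = jitter_ms[:-1], jitter_ms[-1]
    sigma = stdev(history)
    if sigma == 0:
        # Flat history: any change at all is anomalous.
        return latest != history[0]
    return (latest - mean(history)) / sigma > threshold
```

In practice the same per-call statistics would be logged and fed back into training, so the model that makes in-session decisions improves with every call completed.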
But what if you weren’t the ethical, upstanding citizen you are today? These tools can also be used for nefarious purposes. Imagine receiving a call in the voice of someone you love saying they were in the hospital, so you rush out and leave your house unlocked. And are we OK with putting voice-over actors out of work? We’ll talk through some of the societal repercussions of these advances.