Your car is opening the garage door by calling up your voice assistant. An audiobook is using Amazon’s Whispersync or Google tech to pick up reading your New York Times bestseller in your car. It’s picked up the narrative at the same place you stopped at in bed the night before. The technology behind your virtual assistant really is amazing. It extends beyond the bounds of the smart home, to your car, maybe as far as your commute route and your office. It’s tempting to just keep following the building blocks as they snap together and create an ever-expanding digital framework, all ruled by that centralized AI intellect and your voice. Apply the brakes for a few minutes, however, for there are smaller building blocks that glue the bigger segments together. Your smart speaker, way back at the head of the virtual machine, is still at rest on your table. It looks innocent enough, yet it’s packed with innovative tech, some of which lives in the cloud. Let’s tear down that hardware to see where the software hangs its hypothetical hat. In other words, what and where is the hardware?
There are several ways to describe the tiny roads and interconnections that shape a smart speaker. You can start at the processor and work outwards to the speaker and microphone, at which point a hop and a figurative skip takes your journey all the way back to your own voice and cognitive muscle: your brain, in other words. In defiance of that outward-facing parts expedition, how’s about starting on the outside? Out here, it’s your voice that dictates every action and counteraction. And it all begins with an active microphone array.
Starting at the outside of a smart speaker
A far-field microphone is a listening device that converts sound waves into electrical signals, which are then digitized. Frankly, that description holds true for any digital microphone. The far-field variety, though, has some extra talents inside its electronic module. First of all, far-field voice input processing (FFVIP) uses special software algorithms to make sense of distant voices. That means you can be on the opposite side of the room when you make your request. There’s no need to crouch down and shout into the mic. This far-range feature mimics the human ear, reinforcing the feeling that you’re talking to an actual person, not a machine. Echoes and weird acoustic attenuation effects are next on the far-field agenda. The same algorithms jump to attention when your voice is dulled by an echo or masked by ambient room noise. Don’t worry, your device isn’t working alone when your spoken request comes under scrutiny. A cloud-based server also joins in to crunch the waveforms generated by your voice and your environment. In the case of an Alexa-equipped device, Amazon’s Alexa Voice Service (AVS) is the translation and noise separation engine of choice. As for Google Home’s far-field microphones, it’s hard to imagine a corporation that has access to more text and voice-based data, so Google’s speech-to-text credentials are well established.
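The article doesn’t name the algorithms at work here, but one widely used technique for microphone arrays is delay-and-sum beamforming: each microphone hears your voice at a slightly different moment, so the software lines the channels up before averaging them. The snippet below is only a toy sketch of that idea. The function name, the sample values, and the delays are all invented for illustration, and none of it is Amazon’s or Google’s actual code.

```python
# Toy delay-and-sum beamformer. An assumed illustration of one common
# microphone-array technique, not any vendor's real algorithm.

def delay_and_sum(channels, delays):
    """Align each microphone channel by its delay (in samples), then average.

    channels: list of equal-length sample lists, one per microphone
    delays:   how many samples later the voice reaches each microphone
    """
    length = len(channels[0]) - max(delays)
    output = []
    for n in range(length):
        # Read every channel at the moment the voice reached that microphone.
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        output.append(total / len(channels))
    return output


# The same "voice" reaches mic 0 first, mic 1 one sample later,
# and mic 2 two samples later (made-up numbers).
voice = [0.0, 1.0, 0.0, -1.0, 0.0]
mic0 = voice + [0.0, 0.0]
mic1 = [0.0] + voice + [0.0]
mic2 = [0.0, 0.0] + voice

aligned = delay_and_sum([mic0, mic1, mic2], delays=[0, 1, 2])
unaligned = delay_and_sum([mic0, mic1, mic2], delays=[0, 0, 0])

print("aligned peak:  ", max(aligned))
print("unaligned peak:", max(unaligned))
```

With the correct delays the voice comes through at full strength, while the misaligned version is smeared and quieter. Sounds arriving from other directions get the smeared treatment, which is how an array can favour your voice over the room.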
Next up is smart speaker AI in the Cloud
The hardware host interacts with the software AI. That’s how the smart features have developed at an accelerated pace, and that’s also how the natural language understanding segment of the processing chain operates. The active microphone arrays are an important gateway, one that converts sound, separates input audio sources, and preprocesses your voice. Then the raw data flies out elsewhere. A processing brain exists remotely, far away from the gadget, in the cloud. It’s hard to reduce what happens in this distributed network into a few easy-to-read sentences. After all, an entire lump of brain tissue, your brain tissue, dedicates itself to the processing of language. Known as “Broca’s area,” this part of the frontal lobe is associated with speech production and grammar. A machine intelligence doesn’t have a Broca zone, but it does know how to outsource its language decoding duties. Context, grammar, intent, all of that cool stuff is sent to the cloud and analysed by deep learning technology. The Amazon servers employ an NLU (Natural Language Understanding) platform, an incredibly intricate TTS (Text-to-Speech) program for the spoken reply, and many other data-heavy resources to process the raw data produced by your device’s listening microphones. And, yes, the microphones, cloud services, and actual AI are all part of this voice deconstruction routine. In the end, your device is able to respond almost instantly to your every request, no matter where you are in the room.
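To make that hand-off a little more concrete, here’s a rough sketch of the chain described above: the device tidies up the input, a cloud step works out the intent, and a reply is prepared for the text-to-speech stage. Everything in it is a stand-in. Plain text takes the place of audio, and the function names, intent names, and reply wording are invented for illustration rather than taken from any real Alexa or Google service.

```python
# Hypothetical device-to-cloud pipeline. Text stands in for audio, and
# every name below is an illustrative assumption.

def device_preprocess(raw_text):
    """Stand-in for on-device cleanup: normalise case and spacing."""
    return " ".join(raw_text.lower().split())


def cloud_nlu(utterance):
    """Stand-in for cloud NLU: map an utterance to an intent and a slot."""
    if utterance.startswith("play "):
        return {"intent": "PlayMusic", "slot": utterance[len("play "):]}
    if utterance.startswith("turn on "):
        return {"intent": "TurnOn", "slot": utterance[len("turn on "):]}
    return {"intent": "Unknown", "slot": utterance}


def cloud_response(result):
    """Build the reply text that a TTS engine would turn into speech."""
    templates = {
        "PlayMusic": "Playing {slot}.",
        "TurnOn": "Turning on {slot}.",
    }
    template = templates.get(result["intent"], "Sorry, I did not understand that.")
    return template.format(slot=result["slot"])


def handle(raw_text):
    """Run one request through the whole chain."""
    return cloud_response(cloud_nlu(device_preprocess(raw_text)))


print(handle("  Play   some jazz "))
print(handle("Turn on the kitchen light"))
```

A real service replaces each of these steps with trained models, but the division of labour is the same: light cleanup on the gadget, heavy language work in the cloud.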
And more on the internals of a smart speaker
The block diagram of your imagined AI container is spread out on your table. Pull back that one corner before it folds back. There, that’s better. The microphone module is there, and there’s a second layer of language decoding circuitry heading off the edge of the blueprint. It’s taking your words to the cloud for further data crunching. Down below the microphone array, an inbuilt speaker takes pride of place within the system architecture. It’s here that the sound performance characteristics of your device are decided. Typically, this confined internal area doesn’t exactly encourage the generation of a wide soundstage. The speaker cabinet is small, so it seems unlikely that a superior sound field will radiate from your virtual assistant’s housing. Surprisingly, the circular shape of the smart speaker does simplify the production of an immersive audio field, for that speaker girdles all 360° of the little body. Additionally, these compact products tend to break the speaker’s architecture into three distinct sections. A woofer and tweeter occupy the speaker shell, then a reflex port reinforces the bass by shaping the output sound. Powerful enough to set your feet tapping while you finish your day’s chores, the integrated speaker pulls its music, podcasts, and your AI’s voice from numerous branching circuits. The processor creates many of those streams of ones and zeros, as does the wireless module.
What about the microphone and speaker inside this smart device?
The device specifications of a capable virtual assistant absolutely require a microphone and speaker. These dual sections equip an AI with the ears and vocal cords it needs to carry out your requests and issue some kind of feedback. If you ask for your music to be transferred to your Android TV or your Fire Stick, for example, it’s natural to want a response, a near-natural voice that tells you the music is now pouring into your living room. That’s just how people are, right? You effect an action, so you expect a reaction.
Smart speaker processor, Wi-Fi and more
Next, the processor, the thinking engine inside the assistant, needs to regulate all of these actions. The processor doesn’t deal with the mountains of data produced by your voice, but it does regulate every other duty as the signals zip through the electronic conduits. To accomplish this task, there’s onboard memory installed on the main circuit board, plus a few proprietary chips. These integrated digital packages handle special media processing duties. Then, located somewhere between these proprietary computer parts, a Bluetooth and Wi-Fi circuit package sits among the signal processing parts as the connectivity bridge. It’s here that streaming music channels and your latest podcasts are received and processor-directed, pushed onto the computing highways that descend out of your AI’s simulated consciousness.
And more smart speaker circuit tracks
It’s around this point that most consumers begin to see the embedded connections that tie these learning intellects to the real world. Certainly, there are all the nuts and bolts that are linked by finely rendered circuit tracks on some beautifully designed board. Small wire filaments further bind these linkages by connecting the Wi-Fi antenna to the comms package, the media processor to the loudspeaker and microphone array, and so forth. But what about the invisible links in your smart speaker’s processing chain? There’s the primary link, which is your voice and its unique sound waves. The cloud is another link, the place where those waves are broken down and analysed for context. Then there’s an app, a freely downloaded application that extends your platform connection. It works as a remote control, as a second microphone, and as an always-ready link to your smart assistant, wherever it happens to be. Used as a device companion piece or as a separate way of accessing your AI, an app exhibits the same level of interactivity as a mobile intellect, such as Siri or Cortana. Of course, your physical connection is still out there, somewhere in your home, controlling your appliances.
And even more smart speaker software options
If those linkages are to go any further, however, there are several problems to overcome. Primarily there are a bunch of competitive product manufacturers knocking on your door. They want in on the action. Determined to grab a precious piece of real estate in your living room or kitchen, they’re selling their own smart appliances and gadgets. Your AI talks all kinds of languages. They’re the Skills and Actions that turn light bulbs on, stream music, and carry out a thousand other handy services. They’re little computer scripts or applets, these “Skills,” and they even address the problems that arise when those competing manufacturers use different smart home platforms. Just keep in mind the fact that an Alexa (Amazon) service is called a Skill, then remember to refer to a Google service as an Action. Keep those two separate, and you’ll do fine.
Third-party developers love Skills and Actions. This is because the tiny applets enable different brands to easily access the power of Alexa or Google Assistant. An API (Application Programming Interface) speeds up development by giving developers ready-made building blocks, so an application comes together in a fraction of the time it would take to write everything from scratch. You can look up and add well over a thousand Skills to your AI platform. Many of these services provide handy shortcuts, others are just plain fun, and some are productivity-based. Order a pizza from your favourite shop with a Google Action or an Alexa Skill, trigger an IFTTT conditional statement, or just find a movie recommendation to watch on your Fire TV stick tonight. As for the smart home manufacturer issue, there’s also an entire Skills category dedicated to the smart home.
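If you’re wondering what one of these applets boils down to, think of a table that maps a recognised request to a small piece of code. The sketch below shows that dispatch pattern in miniature. It is not the Alexa Skills Kit or the Google Actions API: the `SKILLS` registry, the `skill` decorator, the intent names, and the replies are all hypothetical.

```python
# Hypothetical skill registry: a toy version of "request in, handler out".
# None of these names come from a real Alexa or Google SDK.

SKILLS = {}  # intent name -> handler function


def skill(intent_name):
    """Decorator that registers a handler under an intent name."""
    def register(handler):
        SKILLS[intent_name] = handler
        return handler
    return register


@skill("OrderPizza")
def order_pizza(slots):
    return f"Ordering a {slots.get('size', 'medium')} pizza."


@skill("RecommendMovie")
def recommend_movie(slots):
    return f"Looking for a {slots.get('genre', 'popular')} movie."


def dispatch(intent_name, slots=None):
    """Route a recognised intent to its skill, if one is installed."""
    handler = SKILLS.get(intent_name)
    if handler is None:
        return "No skill is installed for that request."
    return handler(slots or {})


print(dispatch("OrderPizza", {"size": "large"}))
print(dispatch("RecommendMovie"))
print(dispatch("WaterPlants"))
```

Adding a new capability means registering one more handler, which is roughly why a third-party brand can plug into an assistant without rebuilding the assistant itself.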
Smart speaker communication protocol options
The IoT (Internet of Things) market is populated by many software designers and hardware specialists. Frankly, there’s a lot of potential for messy connectivity in your home, but there are ways around the mess. If you want to understand this messy domain, consider the Zigbee communications protocol or its competitor, the Z-Wave mesh networking standard. These are two common forms of low-power wireless technology, connection methods that are mostly reserved for the Smart Home. Unfortunately, most smart speakers stick to the Wi-Fi and Bluetooth dual platform setup, probably because those standards are so widely supported. So how does a Wi-Fi powered AI talk to a Zigbee device? Simply put, it doesn’t. Instead, you need a Hub. Samsung’s SmartThings gear employs Z-Wave and Zigbee as its mesh networking mechanism, so a Wi-Fi or Bluetooth connected AI can’t talk to these devices unless a hub is added to the system. A hub, quite simply, is a piece of hardware, a centralised brain that translates between different wireless protocols. It then distributes the translated instructions, via Zigbee and Z-Wave, to the smart gear. In this example, your Samsung Smart Home is being controlled by your AI, but only when the request is passed through the SmartThings Hub. This scenario isn’t limited to a SmartThings home. Indeed, many of the Smart Home devices you purchase will require a hub, a little coordinating brain that should interface seamlessly with your virtual assistant. This hub comes as part of a kit or as an optional extra, a unit that connects compatible third-party products to your home network so that they can all talk to each other.
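The hub’s translating job can be pictured with a few lines of code. This is only a sketch under loud assumptions: the `Hub` class, the device names, the addresses, and the `protocol:address:command` frame format are made up for illustration, and real Zigbee or Z-Wave frames look nothing like these strings.

```python
# Hypothetical hub: accepts a request from the assistant side and re-issues
# it on the protocol each device speaks. The frame format is invented.

class Hub:
    SUPPORTED = ("zigbee", "zwave")

    def __init__(self):
        self.devices = {}  # device name -> (protocol, address)
        self.sent = []     # frames "transmitted" so far, kept for inspection

    def pair(self, name, protocol, address):
        """Remember which protocol and address a device uses."""
        if protocol not in self.SUPPORTED:
            raise ValueError(f"unsupported protocol: {protocol}")
        self.devices[name] = (protocol, address)

    def handle_request(self, name, command):
        """Translate an assistant request into a protocol-specific frame."""
        if name not in self.devices:
            return False  # the hub has never been paired with this device
        protocol, address = self.devices[name]
        self.sent.append(f"{protocol}:{address}:{command}")
        return True


hub = Hub()
hub.pair("porch light", "zigbee", "0x1A")
hub.pair("door lock", "zwave", "node7")

hub.handle_request("porch light", "on")
hub.handle_request("door lock", "lock")
print(hub.sent)
```

The assistant only ever says "porch light, on". The hub is the part that knows the porch light speaks Zigbee and the door lock speaks Z-Wave.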
All of these external parts challenge the harmonious operations of a properly meshed Smart Home network. There are simply too many gadget vendors and big name electronics manufacturers vying for your home’s limited space. That’s why you have special services, built from speedy API packages, to take on the bulk of the work. Third-party Skills and Actions accommodate this move. As for the same issues on the hardware front, well, there are more wireless protocols and hardware platforms to manage than seems possible. But hubs are always an option, for this important part of the branded kit’s system architecture allows your virtual assistant to talk to the network branch so that there’s no conflict between the foreign wireless protocols.
Meanwhile, the innards of your smart speaker are deceptively intricate. They look a little like the stages of a rocket, the parts that would be jettisoned as the chemically propelled craft climbed into orbit around the Earth. Up top, the microphone array sits where a passenger capsule would be if this metaphor were followed to its logical conclusion. Below that, stage by stage, there are volume controls, wires, circuit boards, speaker parts, and a product base. Those exterior controls follow a relatively basic layout. They’re touch-enabled or concealed below thin plastic membranes. One of the more popular candidates in our study adopts a pure cylindrical form, which makes its uppermost section an ideal place to incorporate an unobtrusive volume ring. Otherwise, a Mute Button and an Action Button dominate the control area. After all, these devices are designed for hands-free use, but you can trigger that mute mode if you want to work by touch alone.
Alternatively, consider purchasing your AI ecosystem with a remote control. The Amazon Echo uses a remote control, one that’s offered as an optional accessory. A minimalist button layout covers the upper surface of the remote control unit. Furthermore, there’s a microphone built into the remote, so you can take charge of your AI from another room. Interestingly, the Fire TV remote is able to pull double duty on this occasion. Yes, you can use the Fire remote control to talk to Alexa. That’s the kind of system synergy that you’ll get a kick out of, for who doesn’t enjoy a little product crossover?