Major tech firms have recently been keen to sell speakers equipped with voice-based artificial intelligence agents.
The debuts of smart speakers are seen as the prelude to an AI era, ushering in a new technological age in which virtual assistants are expected to become as ubiquitous as smartphones, allowing people to connect to the internet by voice with greater ease.
Whether these speakers will really take off and whether the technology will be popular in Japan remain to be seen. The following questions and answers explore these issues, as well as why AI speakers are creating a buzz and what role Japanese firms will play in this field.
What makes AI speakers special?
They look like normal portable home speakers, but one big difference is that they communicate with users verbally.
Users can tell the speakers to play music, search the internet, pull up weather forecasts, send text messages, make phone calls and perform other daily tasks.
Amazon currently leads the field with its Echo speaker series, which debuted in the U.S. in 2014 and comes with a voice agent called Alexa. Google Inc. and Apple Inc. are following Amazon’s lead with their own speakers, Google Home and Apple HomePod.
However, these AI speakers are not available in Japan yet.
Virtual assistants, such as Google Assistant and Apple’s Siri, are already available as features in many popular smartphones.
Other internet giants, including Microsoft and Alibaba, have also announced they will launch smart speakers, while Line Corp., Japan’s messaging app giant, will be promoting its AI engine, Clova.
If voice assistants already exist in smartphones, why do we need AI speakers?
Smart speakers are easier for users to speak to than smartphones, experts say.
With smartphones, people usually need to hold their handsets and speak to them.
AI speakers are equipped with high-quality microphones, so they can recognize users’ voice commands from relatively far away, enabling a “hands-free, location-free and focus-free” environment, said Kanae Maita, principal analyst at Gartner Japan, a technology research and consulting firm.
People don’t even have to look at the devices to use them as long as they are within a reasonable distance, which is a new user experience, she said.
Asked if other electronic devices, such as TVs, could be AI-powered, Maita said tech firms probably see speakers as a good starting point for introducing their AI voice agents.
“TVs have speakers, too, but you can’t really carry them around. Moreover, you need to buy TVs that come with AI voice agents, but TVs are not something that people purchase very often,” said Maita.
Why are tech giants now releasing these smart speakers?
They are aiming to create new markets for voice-based AI systems, just as Google and Apple crafted huge app markets for smartphones.
To create such markets, companies that already run content platforms have a competitive edge, Maita said.
Amazon, for instance, has a gigantic e-commerce service with music and audiobook content that it can provide to users through the speakers.
“Those that have content platforms are looking into providing them in different ways with a new user interface,” which is the voice-based interface, said Maita.
Are speakers really the next big thing?
Experts say that while some media describe AI speakers as a new tech trend following the smartphone revolution, the speakers themselves are not the core value.
“The essence is not the speakers but those voice-based virtual assistants behind them. It’s Alexa for Amazon. It’s Clova for Line,” Kazuo Hiyane, general manager of advanced technology at Mitsubishi Research Institute, said.
Those voice agents seem to be a package deal with the speakers, but the agents are actually on the internet, which means people will be able to access them anywhere an internet connection is available.
Hiyane added that the speakers are the first step toward the tech firms’ goal of integrating voice agents seamlessly by making things around people AI-powered.
Gartner’s Maita agrees. With the emergence of voice-based AI assistants, people are likely to control computers more with their voice, “which is a paradigm shift for the computing platform,” she said.
About 3.3 percent of households worldwide will be using AI speakers by 2020, according to Gartner estimates. The firm also predicts that more than 50 percent of the internet experience for those homes will be voice-based by 2020.
Where do Japanese players fit in?
Japanese firms don’t have a visible presence, with the exception of Line Corp.
Line said it will launch its Clova-powered speaker this summer in Japan.
Overseas rivals have yet to start selling their speakers in the country, positioning Line ahead of competitors in the domestic market.
“When it comes to a service platform Japanese companies can create for voice-based applications, I think Line has good potential,” said Hiyane.
Line already has tens of millions of users and provides many services, so it won’t be hard to craft another platform for the new voice market by using existing content, according to Hiyane.
“Other than Line, it seems pretty tough. I think other Japanese makers can make speakers, but I don’t think they can create content ecosystems,” he said.
Still, electronics giants such as Sony Corp., Panasonic Corp. and Sharp Corp. must be thinking about using virtual assistants, Hiyane said.
Since Sony has music content, it may have a chance to compete with rivals, but other electronics firms, such as Panasonic and Sharp, don’t really have soft content, so they will have a hard time providing attractive content platforms, he said.
Will AI speakers be popular in Japan?
Experts said they will eventually spread in Japan, but the initial stage may be challenging.
Hiyane said this is partly because many Japanese probably find it awkward to speak to machines and order them to do things.
Thus, it might be key for makers to come up with settings that let users feel more emotional attachment to the AI agents and speak to them as they do with friends, he said.