• Sponsored Content


At the Japan Times Advanced Technology Forum held in Tokyo on May 31, Akira Sakakibara, chief technology officer of Microsoft Japan Co. and president of Microsoft Development Co., spoke about what artificial intelligence can do, about machine learning processes, and about future possibilities and challenges. The forum was supported by the Ogasawara Foundation for the Promotion of Science and Engineering as one of its subsidized projects.

“The democratization of artificial intelligence is a key concept in the provision of our artificial intelligence technologies. By this we mean that any industry, organization, company or individual should be able to benefit from the power of AI,” Sakakibara said.

Over the last several years, Microsoft has achieved major accomplishments at the annual ImageNet Large Scale Visual Recognition Challenge, which is organized by ImageNet, an online project that supports visual object recognition software research around the world.

In 2016, Microsoft participated in the five categories of object localization, object detection, object detection from video, scene classification, and scene parsing. “We topped other participants in all five categories. In addition, we were the first participant to exceed the benchmark of human performance in scene classification,” Sakakibara said.

“We succeeded in surpassing the human benchmark in speech recognition in 2017, and machine reading comprehension in 2018. Our AI attained about the same level as humans in real-time translation, too,” he said.

He went on to explain what becomes possible when technologies are combined. Combining object recognition with semantic segmentation, which classifies every pixel in an image, allows both small, well-defined objects such as people and goods and large, loosely bounded regions such as the sky or sea to be recognized within one image. Visual recognition technology used in autonomous driving systems works on these principles.

Additionally, combining visual recognition with image-captioning technology, which analyzes scenes in images and describes them in words, led to the creation of Seeing AI, an application designed to aid visually impaired people.

As an example of speech recognition and sentence comprehension, Sakakibara introduced an application developed and used by McDonald’s to minimize the number of errors in taking orders and increase efficiency in their drive-through service. “It consists of speech to text, text analytics and language understanding, making it possible to comprehend complex orders,” he said.

“Then there is also machine translation,” Sakakibara said. “The conventional statistical machine translation method has been replaced by one that utilizes a neural network and deep learning technology.” Statistical machine translation relies on syntactic and morphological analysis, he explained, which makes it difficult to handle language pairs such as English and Japanese that differ substantially in grammar and word order.

Using deep learning to understand contexts and whole sentences instead of focusing only on sentence fragments and structure has allowed the translation system to continue improving itself, according to Sakakibara. “The more data fed into the system, the cleverer it becomes,” he said.

These technologies (visual recognition, sentence comprehension, translation and speech recognition) are used in various combinations to achieve more complex tasks. “For example, AttnGAN is a combination of a generative adversarial network (GAN) and another network that interprets sentences,” said Sakakibara. He explained that a GAN consists of two networks: one that draws a picture resembling the original, and another that tries to distinguish that picture from the original. The two networks compete with each other, and the quality of the output improves as a result. “This technology, together with sentence interpretation ability, makes it possible to understand and process a request such as ‘draw a little bird with a yellow body and black wings,’ resulting in a realistic illustration that is faithful to the request,” he said.

In the field of language, Microsoft’s AI technology has been used in various chatbot experiments such as Xiaoice in China, Rinna in Japan and Indonesia, Zo in the U.S. and formerly, Ruuh in India. “What we are trying to achieve through these chatbots is to see how AI can respond to the emotions of the users, as well as create long-lasting conversation,” Sakakibara said.

AI agents are designed differently based on their purposes. While Rinna’s main focus is empathy, Cortana, a virtual assistant in Windows, is aimed at increasing work efficiency. “Ask each bot whether it is going to be sunny tomorrow, and Cortana will provide you with facts. Rinna will start a conversation: ‘Are you going out tomorrow?’ They use two totally different approaches,” said Sakakibara. Some retailers, including a major convenience store chain in Japan, already use chatbots to effectively promote their products to targeted age groups.

“These AI technologies will become more commonplace in society as they are being applied to various industries,” he said. “An effective real-world application of such technologies is ‘simulation,’ when the AI engages in experience-based learning inside a simulator.” One example is Project Malmo, developed by Microsoft and Minecraft. This research project looks at how AI agents learn through experiences and interactions with other agents in the virtual world of Minecraft.

Minecraft is an online game, but similar simulation methods can be applied to develop AI systems for industrial use. “While it is too costly, dangerous and time-consuming to conduct test runs or flights using real autonomous cars or drones, you can do the testing in a virtual town built in a simulator that can operate nonstop, 24 hours a day, 365 days a year,” Sakakibara said.

He explained that there is no need to put real machines into operation until inference models that have learned enough in the simulator are deployed on those machines.

“Industrial AI matures over time,” said Sakakibara. “At first, it is only monitoring what is happening at a production site, then it starts to make predictions and improve productivity. It will soon optimize wider areas of production as its usage expands. Ultimately, it will work autonomously.”

According to Sakakibara, reinforcement learning is an important method for creating a solid autonomous system. “A typical deep learning method keeps feeding AI sets of input and output data to teach how this input data should be handled and output,” he said. “On the other hand, reinforcement learning only requires a small portion of such data to start with. Once the system completes the cycle of interpretation and output, it uses the output data to repeat the learning process by itself.” He predicts that the field of machine teaching will advance further to make this reinforcement learning process more efficient.
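
The learning cycle described here, in which a system acts, observes the outcome and updates itself without a labeled answer for each input, can be illustrated with a toy example. The sketch below uses tabular Q-learning, one common reinforcement learning algorithm chosen purely for illustration; it is not presented as the method Microsoft uses. The five-cell corridor, the reward of 1 at the last cell, and every name and parameter value are assumptions made for this example.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right


def step(state, action):
    """Move within the corridor; reward 1.0 only when the goal is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values purely from the agent's own trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore at random sometimes (and on ties); otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # The outcome of the agent's own action becomes the training signal.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


def greedy_policy(q):
    """Best learned move for each non-goal cell."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

No table of correct moves is ever supplied: the only feedback is the reward at the goal, and the agent's own experience is reused to refine its estimates on each pass.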

“That is why last year we acquired Bonsai, a startup established in the U.S. in 2014 that is dedicated to deep reinforcement learning models for AI. Its main feature is a concept network that accelerates learning. It segments concepts that the AI intends to learn into smaller sub-concepts,” he said. “It is easier and faster for AI to achieve smaller goals one by one. Everything can be put together at the end.”

The next challenge is to visualize what AI is learning, which is necessary to increase data coverage and adequately handle possible biases included in the data.

Data of many kinds can contain gender, racial or religious biases. “Vectorizing or digitizing words and neutralizing them is called ‘debiasing.’ But whether particular information should be debiased or not depends on the purpose and use of the AI,” said Sakakibara.

“Let’s say the AI will be incorporated in the lending system of a financial institution. A decision about a loan amount should not be affected by the borrower’s gender, religion, sexual orientation or area of residence, so the debiasing of such data is a must,” he said. “But in the case of medical AI, it is often better to leave the physical or symptomatic differences between men and women as they are,” he added, stressing the importance of creating a data strategy that covers what kind of data should be collected and how the learning should be designed, depending on the purpose of each AI system.
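
One widely used way to “neutralize” a vectorized word is to remove its component along a direction that represents the bias. The sketch below is a minimal illustration of that idea, not a description of Microsoft's method; the three-dimensional vectors and the words chosen are made-up toy values, and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def neutralize(vec, bias_dir):
    """Remove the component of `vec` that lies along `bias_dir`."""
    scale = dot(vec, bias_dir) / dot(bias_dir, bias_dir)
    return [x - scale * b for x, b in zip(vec, bias_dir)]


# Toy word vectors (assumed values, for illustration only).
he = [1.0, 0.2, 0.1]
she = [-1.0, 0.2, 0.1]
engineer = [0.4, 0.9, 0.3]

# Estimate a gender direction from a definitional word pair.
gender_dir = [a - b for a, b in zip(he, she)]

engineer_debiased = neutralize(engineer, gender_dir)

if __name__ == "__main__":
    print(dot(engineer, gender_dir), dot(engineer_debiased, gender_dir))
```

In a lending system the neutralized vector would be the one to use, while a medical system might deliberately keep the original, which reflects the point that the choice depends on the purpose of the AI.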

“We want to create AI not to replace humans, but instead to be used to enhance human abilities and to support human activities,” said Sakakibara. He explained that transparency and accountability are key to achieving this aim. “AI and Ethics for Engineering and Research is our in-house advisory committee that screens, evaluates and gives advice from an ethical point of view on projects that involve sensitive issues such as bullying and suicides,” he said.

Sakakibara stated that various academic institutions, research bodies and industry organizations around the world have also created committees or guidelines to encourage and facilitate discussions on the ethical implications of AI. “There is no notion of competition when we talk about these issues. Microsoft has collaborated with players that are usually considered our competitors such as IBM, Facebook, DeepMind, Google and Amazon Web Services to form Partnership on AI, a non-profit that promotes discussions on ethical topics and the social implementations of AI technologies,” Sakakibara said.

“While many private enterprises and governmental bodies from around the world have also joined the community, there is only one participating company from Japan: Sony CSL,” he said. Sakakibara noted that ethics or ideas unique to Japan may not be reflected in the partnership's discussions of AI-related topics, which include safety-critical AI, fairness, employment and economic issues, the coexistence of AI and humans, and social impacts, and that it could be disadvantageous if Japanese enterprises and organizations were not sufficiently represented within the partnership. “I hope that more Japanese companies will join Partnership on AI and deliver their messages to the world,” he said.

Microsoft has also launched an investment program called AI for Good. There are three categories so far: AI for Earth, AI for Accessibility and AI for Humanitarian Action. AI for Earth was launched in 2017. It is a five-year, $50 million initiative that awards grants to projects using AI to address global issues such as climate change, natural disaster prevention and conservation of biodiversity. So far, 236 grants have been awarded to projects with impact in 63 countries.

AI for Accessibility is committing $25 million over five years to projects developed by or with people with disabilities to tackle issues concerning employment, daily life, and communication and connection.

AI for Humanitarian Action is committing $40 million over five years to support nonprofit and humanitarian organizations working across areas such as disaster response, refugees and displaced people, human rights and the needs of children. Microsoft works together with organizations to create solutions and innovations through the use of AI technology in these fields.

“Several applications have been received from Japan in the categories of AI for Earth and AI for Accessibility, two of which were successful. We are looking forward to working with more people and organizations with new ideas,” said Sakakibara.

This page has been produced with the support of the Ogasawara Foundation for the Promotion of Science and Engineering, which was founded by the late Toshiaki Ogasawara, the former chairman and publisher of The Japan Times. This page is also part of a series that highlights the ESG (environmental, social and governance) activities of companies and other organizations.