NTT's new generative AI platform able to parse diagrams, charts

Japanese telecom giant Nippon Telegraph and Telephone Corp. said Thursday that its newly launched generative artificial intelligence platform can also parse documents containing charts and diagrams.

Tsuzumi, named after a Japanese hand drum used in traditional events, was launched last month for business use as the major telecom company seeks to catch up with foreign rivals in the fast-moving market.

In addition to being a multimodal AI model, tsuzumi has stronger Japanese-language processing capabilities than ChatGPT, a widely used AI chatbot developed by U.S.-based OpenAI, according to NTT.

With its visual comprehension capabilities, NTT's large language model can summarize an illustration or chart and extract the necessary information from it.

This means it can also convert a document with many diagrams into text or calculate expenses from taxi fare or meal receipts.

While AI platforms developed by overseas competitors do well at generating images or videos from text prompts and vice versa, parsing documents that contain diagrams and other media has been considered a challenge because of variations in file formats.

"If this technology becomes widely adopted by firms, productivity will improve by leaps and bounds," a developer at NTT said.

© Kyodo News