Article - Journal
Vol. 17, No. Special issue: ISDS (2025), Pages: 97-105

People with visual impairments often struggle to identify products and access product information in daily life, particularly when visual cues such as packaging details, labels, or expiration dates are inaccessible. In this paper, we present NaviBlind, a multimodal AI-powered assistive system that helps visually impaired individuals understand key product details through natural interaction. The system combines image understanding based on Gemini Flash vision models with Vietnamese speech recognition powered by PhoWhisper, which extracts the user's information needs directly from voice commands. The user uploads an image of the product and says which information is needed, such as name, color, type, or expiry date; the system then analyzes the image and returns a concise, structured textual description, which is converted into Vietnamese speech. To ensure reliability, we incorporate mechanisms that detect uncertain or hallucinated outputs from the vision model, especially for low-quality images. The system is deployed as a user-friendly web application that gives users with limited vision real-time access. Experimental evaluation demonstrates the potential of NaviBlind to promote autonomy and independence for visually impaired people in everyday shopping and product recognition tasks.
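The flow described in the abstract (voice command, requested fields, structured description, uncertainty check) can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword table, the `UNKNOWN` marker, the prompt wording, and all function names are assumptions made for illustration, and the actual calls to PhoWhisper, the Gemini Flash vision model, and the text-to-speech step are deliberately left out.

```python
import unicodedata

# Hypothetical keyword table: Vietnamese phrases mapped to the fields the
# abstract mentions (name, color, type, expiry date). Not from the paper.
FIELD_KEYWORDS = {
    "name": ("tên",),
    "color": ("màu",),
    "type": ("loại",),
    "expiry_date": ("hạn sử dụng", "hết hạn"),
}

# Hypothetical marker the vision model is asked to emit when it cannot
# read a field reliably (for example on a blurry image).
UNCERTAIN = "UNKNOWN"


def _norm(text: str) -> str:
    """Lowercase and NFC-normalize so accented characters compare equal."""
    return unicodedata.normalize("NFC", text).lower()


def extract_fields(transcript: str) -> list[str]:
    """Return the fields requested in a transcribed voice command.

    Falls back to all fields when no keyword is recognized.
    """
    text = _norm(transcript)
    found = [
        field
        for field, keywords in FIELD_KEYWORDS.items()
        if any(_norm(k) in text for k in keywords)
    ]
    return found or list(FIELD_KEYWORDS)


def build_prompt(fields: list[str]) -> str:
    """Build an illustrative instruction for a vision model (assumed wording)."""
    return (
        "Describe the product in the image. Answer with one 'field: value' "
        f"line for each of: {', '.join(fields)}. "
        f"Write {UNCERTAIN} for any field you cannot read with confidence."
    )


def parse_answer(raw: str) -> dict[str, str]:
    """Parse 'field: value' lines from the model's text reply."""
    answer = {}
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            answer[key.strip()] = value.strip()
    return answer


def split_answer(answer: dict[str, str], fields: list[str]):
    """Separate confirmed values from fields that are missing or uncertain."""
    confirmed, uncertain = {}, []
    for field in fields:
        value = answer.get(field, "").strip()
        if not value or value.upper() == UNCERTAIN:
            uncertain.append(field)
        else:
            confirmed[field] = value
    return confirmed, uncertain
```

In this sketch, only the confirmed fields would be passed on to speech synthesis, while the uncertain ones could trigger a spoken request to retake the photo. Keyword matching is a deliberately simple stand-in for whatever intent extraction the system actually uses.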

 

