
“Siri, do you understand Hebrew?”

  •   Teaching computers to speak Hebrew will improve scanning, machine learning, and more

    Communicated by the National Digital Agency

    The Hebrew language corpus project aims to create an infrastructure that allows computers to understand the Hebrew language. A computer's ability to “understand” Hebrew will enable a wide variety of uses through natural language processing, including speaking in Hebrew to computers, electrical and electronic devices, and smart bots; extracting better insights from free-form texts and scanned documents; machine learning from Hebrew text; and automatic translation.

    The current project includes a high-level morphological tagging (“gold tagging”) of about 200,000 words, carried out by the Hebrew Language Academy.

    In contrast to automatic tagging, this tagging is done manually by linguists, based on automatic suggestions from the Academy’s systems. Each word is tagged by two different taggers and then checked by a professional tagger acting on behalf of the Academy, which is what makes the corpus valuable for training natural language processing applications.
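
    For readers who want to picture the double-tagging workflow described above, here is a minimal sketch in Python. It is not the Academy's actual tooling: the names `TokenAnnotation`, `gold_tag`, and `agreement_rate`, and the tag labels, are hypothetical, and the sketch assumes the professional tagger is consulted only when the two taggers disagree, which the announcement does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class TokenAnnotation:
    """One word with the tags chosen independently by two linguists (hypothetical structure)."""
    token: str
    tagger_a: str  # tag chosen by the first tagger
    tagger_b: str  # tag chosen by the second tagger


def gold_tag(ann: TokenAnnotation,
             adjudicate: Callable[[TokenAnnotation], str]) -> str:
    """Return the final tag for a word.

    Assumption: when both taggers agree, their tag is accepted;
    otherwise the professional tagger (the `adjudicate` callback) decides.
    """
    if ann.tagger_a == ann.tagger_b:
        return ann.tagger_a
    return adjudicate(ann)


def agreement_rate(annotations: Sequence[TokenAnnotation]) -> float:
    """Fraction of words on which the two taggers chose the same tag."""
    if not annotations:
        return 0.0
    agreed = sum(1 for a in annotations if a.tagger_a == a.tagger_b)
    return agreed / len(annotations)
```

    The callback keeps the human decision outside the code: in a real workflow it would stand in for the professional tagger's review, and the agreement rate gives a rough sense of how often that review is needed.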

    The corpus is being released as open source and makes it possible to use advanced natural-language technologies in Hebrew as well.

    Today, the first batch of about 20,000 words was released for public use, covering the content domains of the Ministry of Justice and the Bank of Israel.

    The rest of the tagged corpus will be published during the coming year.

    To the Corpus


    Shira Lev Ami, the Director General of the National Digital Agency, stated: The National Digital Agency has taken up the gauntlet to establish a national infrastructure of tagged Hebrew for the digital world, which will enable businesses and organizations to produce solutions that improve the quality of life in the country.

    This infrastructure will further enable the development of NLP solutions in Hebrew, allowing any computer or digital device to understand a conversation in Hebrew. This opens up a wide variety of applications and will also help citizens with disabilities.

    We are investing resources, and will continue to invest them, in advancing Hebrew language understanding as part of our investment in artificial intelligence-based solutions.

    Tali Ben Yehuda, the Director General of the Hebrew Language Academy, stated: The Academy considers it of major importance to promote natural language processing in Hebrew, in order to ensure the continued use of Hebrew in all areas of life, including in an era of increasing use of automated tools.

    The cooperation with the National Digital Agency is intended to provide a high-quality, open infrastructure for natural language processing in Hebrew, so that it will be possible to develop tools in Hebrew that are not inferior in quality to those in English.

    The Innovation Unit at the National Digital Agency and the Hebrew Language Academy previously partnered on a pioneering project to tag Hebrew texts for machine learning, and this project is its direct continuation.

    For additional details: keren@digital.gov.il