Since mathematical expressions play fundamental roles in Science, Technology, Engineering and Mathematics (STEM) documents, it is beneficial to extract meanings from formulae. Such extraction enables us to construct databases of mathematical knowledge, search for formulae, and develop a system that generates executable codes automatically.

TeX is widely used to write STEM documents and provides us with a way to
represent meanings of elements in formulae in TeX by macros. As a simple
example, we can define a macro `\def\inverse#1{#1^{-1}}`

, and use it as
`$\inverse{A}$`

in documents to make it clear that the expression means “the
inverse of matrix $A$” rather than “value $A$ to the power of $-1$”. Using
such meaningful representations is useful in practice for maintaining document
sources, as well as converting TeX sources to other formal formats such as
first-order logic and content markup in MathML. However, this manner is
optional and not forced by TeX. As a result, many authors neglect it and write
messy formulae in TeX documents (even with wrong markup).

To make it possible to associate elements in formulae and their meanings instead of authors, recently I began to research on detecting or disambiguating the meaning for each element in formulae by conducting synthetic analyses on mathematical expressions and natural language text. In this presentation, I will show the goal of my research, the approach I’m taking, and the current status of the work.

Date

2019-08-10

16:45
— 17:15

Event

Location

Palo Alto, USA