Bootstrapping Computerised Language Understanding With Semantic Primes

I believe that a child’s need to acquire language comes from the intent, or need, to communicate ideas. The ideas therefore come before the language. By first developing an AI’s self-awareness of concepts, you get a mechanism for bootstrapping language acquisition and can eventually start to simulate an artificial consciousness.

To behave like a human, I believe an AI will need to learn like a child. If so, it would first need to learn about the world and then learn language by speaking and listening. Learning to read is then a secondary activity.

There is a set of universal, indivisible language concepts (known as semantic primes – see later) that can be thought of as an understanding of the world that cannot be defined with language. These pre-verbal or non-verbal concepts need to be learnt from the environment, from the experience of having a consciousness and from social interaction. If you could build a model that defines each semantic prime using physical observation or some mechanistic knowledge of the world, it should be possible to define these concepts for yourself and for an AI before any language has been added to describe them.

Semantic primes are grouped by category. The first category is the “Substantives” (I, YOU, SOMEONE/PERSON, PEOPLE). To give a robot an understanding of these particular words you would need to identify the underlying process and data that give rise to their definitions. In this case the process is “Recognition”: the “substantive” words are all the outcome of a recognition process. So to understand the substantives conceptually, a robot will need to implement a recognition process that correctly identifies a substantive. Another group of semantic primes is the relational substantives: SOMETHING/THING, BODY, KIND, PART. These are less specialised versions of the first “Substantives” group and relate to objects rather than people. I believe all the semantic primes could be understood by identifying their underlying processing and data storage, and by building them into a learning hierarchy. In such a hierarchy the more general “relational substantive primes” would be learnt before the more specialised “substantive primes”.
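As a rough illustration of the recognition process and the learning hierarchy, here is a minimal Python sketch. Everything in it is an assumption made for the example: the Concept class, the feature names such as "bounded_object" and "human_shaped", and the rule that recognition simply means "all required features were observed". A real system would need a far richer recognition process.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    """A pre-verbal concept backed by a recognition process (hypothetical model)."""
    name: str
    required_features: frozenset
    prerequisites: tuple = ()  # concepts that must be learnt first

    def recognises(self, observation: frozenset) -> bool:
        # Assumed rule: recognition succeeds when every required feature is observed.
        return self.required_features <= observation


# Relational substantive: any bounded object in the environment.
THING = Concept("SOMETHING/THING", frozenset({"bounded_object"}))

# Substantive: a more specialised THING that also looks animate and human.
PERSON = Concept(
    "SOMEONE/PERSON",
    frozenset({"bounded_object", "animate", "human_shaped"}),
    prerequisites=(THING,),
)


def learning_order(concepts):
    """Return the concepts so that every prerequisite precedes its dependants."""
    ordered, seen = [], set()

    def visit(concept):
        if concept.name in seen:
            return
        seen.add(concept.name)
        for prerequisite in concept.prerequisites:
            visit(prerequisite)
        ordered.append(concept)

    for concept in concepts:
        visit(concept)
    return ordered


def most_specific_match(concepts, observation):
    """Return the name of the most specialised concept that recognises the observation."""
    matches = [c for c in concepts if c.recognises(observation)]
    if not matches:
        return None
    return max(matches, key=lambda c: len(c.required_features)).name
```

Calling learning_order([PERSON, THING]) puts SOMETHING/THING ahead of SOMEONE/PERSON, which mirrors the idea that the less specialised concept is learnt first. most_specific_match then picks the most specialised concept whose features are all present in an observation.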

(The following is from Wikipedia)

Semantic primes or semantic primitives are semantic concepts that are innately understood, but cannot be expressed in simpler terms. They represent words or phrases that are learned through practice, but cannot be defined concretely. For example, although the meaning of “touching” is readily understood, a dictionary might define “touch” as “to make contact” and “contact” as “touching”, providing no information if neither of these words are understood.

The concept of innate semantic primes was largely introduced by Anna Wierzbicka’s book, Semantics: Primes and Universals.

Semantic primes represent universally meaningful concepts, but to have meaningful messages, or statements, such concepts must combine in a way that they themselves convey meaning. Such meaningful combinations, in their simplest form as sentences, constitute the syntax of the language.

Wierzbicka provides evidence that just as all languages use the same set of semantic primes, they also use the same, or very similar, syntax. She states: “I am also positing certain innate and universal rules of syntax – not in the sense of some intuitively unverifiable formal syntax a la Chomsky, but in the sense of intuitively verifiable patterns determining possible combinations of primitive concepts” (Wierzbicka, 1996). She gives one example comparing the English sentence, “I want to do this”, with its equivalent in Russian. Although she notes certain formal differences between the two sentence structures, their semantic equivalence emerges from the “…equivalence of the primitives themselves and of the rules for their combination.”

“This work [of Wierzbicka and colleagues] has led to a set of highly concrete proposals about a hypothesised irreducible core of all human languages. This universal core is believed to have a fully ‘language-like’ character in the sense that it consists of a lexicon of semantic primitives together with a syntax governing how the primitives can be combined” (Goddard, 1998).

The semantic primes by category are:

Substantives

I, YOU, SOMEONE/PERSON, PEOPLE

Relational Substantives

SOMETHING/THING, BODY, KIND, PART

Determiners

THIS, THE SAME, OTHER

Quantifiers

ONE, TWO, SOME, ALL, MANY/MUCH

Evaluators

GOOD, BAD

Descriptors

BIG, SMALL

Mental predicates

THINK, KNOW, WANT, FEEL, SEE, HEAR

Speech

SAY, WORDS, TRUE

Actions, Events, Movement, Contact

DO, HAPPEN, MOVE

Existence, Possession

THERE IS/EXIST, HAVE

Life and Death

LIVE, DIE

Time

WHEN/TIME, NOW, BEFORE, AFTER, A LONG TIME, A SHORT TIME, FOR SOME TIME, MOMENT

Space

WHERE/PLACE, HERE, ABOVE, BELOW, FAR, NEAR, SIDE, INSIDE, TOUCH (CONTACT)

Logical Concepts

NOT, MAYBE, CAN, BECAUSE, IF

Intensifier, Augmenter

VERY, MORE

Similarity

LIKE/WAY
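For use in software, the list above can be held as a simple lookup table. The following is a minimal Python sketch. The names PRIMES_BY_CATEGORY, build_index and category_of are illustrative assumptions, not part of any existing library. Treating the two sides of a slash (for example SOMEONE/PERSON) as alternative words for the same prime is also an assumption.

```python
# The semantic primes, keyed by category, copied from the list above.
PRIMES_BY_CATEGORY = {
    "Substantives": ["I", "YOU", "SOMEONE/PERSON", "PEOPLE"],
    "Relational Substantives": ["SOMETHING/THING", "BODY", "KIND", "PART"],
    "Determiners": ["THIS", "THE SAME", "OTHER"],
    "Quantifiers": ["ONE", "TWO", "SOME", "ALL", "MANY/MUCH"],
    "Evaluators": ["GOOD", "BAD"],
    "Descriptors": ["BIG", "SMALL"],
    "Mental predicates": ["THINK", "KNOW", "WANT", "FEEL", "SEE", "HEAR"],
    "Speech": ["SAY", "WORDS", "TRUE"],
    "Actions, Events, Movement, Contact": ["DO", "HAPPEN", "MOVE"],
    "Existence, Possession": ["THERE IS/EXIST", "HAVE"],
    "Life and Death": ["LIVE", "DIE"],
    "Time": ["WHEN/TIME", "NOW", "BEFORE", "AFTER", "A LONG TIME",
             "A SHORT TIME", "FOR SOME TIME", "MOMENT"],
    "Space": ["WHERE/PLACE", "HERE", "ABOVE", "BELOW", "FAR", "NEAR",
              "SIDE", "INSIDE", "TOUCH (CONTACT)"],
    "Logical Concepts": ["NOT", "MAYBE", "CAN", "BECAUSE", "IF"],
    "Intensifier, Augmenter": ["VERY", "MORE"],
    "Similarity": ["LIKE/WAY"],
}


def build_index(table):
    """Map every prime, and each slash-separated variant of it, to its category."""
    index = {}
    for category, primes in table.items():
        for prime in primes:
            index[prime] = category
            for variant in prime.split("/"):
                index[variant.strip()] = category
    return index


PRIME_INDEX = build_index(PRIMES_BY_CATEGORY)


def category_of(word):
    """Return the category of a prime, or None if the word is not a prime."""
    return PRIME_INDEX.get(word.strip().upper())
```

With this table, category_of("person") returns "Substantives" and category_of("thing") returns "Relational Substantives", while a word that is not a prime returns None.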

We have been looking at process modelling and at producing a world view on which to hang an understanding of our semantic prime definitions. We will be taking this work further and looking at how we can build a software object model / AI that describes the world using conceptual awareness.

Could you help with co-operation, input or funding? See elsewhere on this website for further details.