In the beginning was the Word, and the Word was with Chatbot, and the Word was Chatbot

Published on 29 June 2024 | Updated on 19 August 2024

Given the profound importance of language and its various disciplines in technological development, it is worth considering how chatbots, as products of advanced technology, actually work. Doing so helps us recognise how chatbots learn through algorithmic cognition and how they respond effectively and accurately to diverse user queries, in ways that linguistic studies can help explain.

With that premise in place, there is little need to stress the importance of ‘the Word’ and, by extension, of language and its specific disciplines, or what we humans have achieved over time through our enriched communication systems, especially in technological and diplomatic contexts, where words are an essential and powerful instrument.

Since linguistics, especially nowadays, is inseparable from the realm of technology, it is entirely legitimate to ask how chatbots, the offshoots of the latest technology, work. In other words, it is legitimate to ask how chatbots learn through digital (algorithmic) cognition and how they express themselves accurately and articulately in response to diverse queries and inputs.

What gives deep learning LLMs their human-like cognitive power?

To understand AI and chatbots, the epicentre of its evolution, which interact with people by responding to their prompts, we should delve into two branches of linguistics, semantics and syntax, and into the way chatbots learn and process highly diverse and articulated information.

Many of the concepts needed to understand how language is structured and assimilated, by humans and, in this case, by deep learning machines, go back to the work of Ferdinand de Saussure.

For that reason, we will explore the cognitive mechanisms underlying semantics and syntax in large language models (LLMs) such as ChatGPT, integrating the theoretical perspectives of one of the most renowned linguistic philosophers, Saussure. By synthesising linguistic theories with contemporary AI methodologies, the aim is to provide a comprehensive understanding of how LLMs process, understand, and generate natural language. What follows is a modest examination of the models’ training processes, data integration, and real-time interaction with users, highlighting the interplay between linguistic theories and AI language assimilation systems.

An overview of Saussure’s studies related to synta(x)gmatic relations and semantics

Ferdinand de Saussure, one of the founding figures of 20th-century linguistics and semiotics (along with Charles Sanders Peirce and Leonard Bloomfield), set out his view of syntax and semantics in the Course in General Linguistics, published posthumously in 1916 and compiled from his lectures. There he presents language as an object of scientific study and emphasises its synchronic study, which focuses on the state of a language at a given moment rather than on its historical evolution. In this structuralist view, syntax and semantics are fundamental components of the structure of language.

Syntax

Syntax, within this framework, is the branch of grammar that describes and explains the systematic, linear arrangement of words and phrases into meaningful sentences within a given language; Saussure calls such linear combinations syntagmatic relations. He treats language (langue) as an abstract system that encompasses grammar, vocabulary, and rules, with syntax as an essential aspect of it. He argues that this system operates according to principles and conventions established within a linguistic community, rather than being governed by individual speakers. His structuralist approach highlights the interdependence between syntax and other linguistic elements, such as semantics, phonology, and morphology, within the overall structure of language.
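
A minimal Python sketch can make the idea of linear, syntagmatic order concrete. It assumes a naive whitespace tokeniser and treats adjacent word pairs as a crude stand-in for syntagmatic relations, a simplification for illustration rather than Saussure’s own formalism. Two sentences built from exactly the same words look identical as an unordered ‘bag of words’, yet their syntagms, and therefore their meanings, differ.

```python
from collections import Counter


def tokenise(sentence: str) -> list[str]:
    """Very naive tokeniser: lowercase and split on whitespace."""
    return sentence.lower().split()


def bag_of_words(sentence: str) -> Counter:
    """Unordered word counts: the linear (syntagmatic) order is discarded."""
    return Counter(tokenise(sentence))


def syntagms(sentence: str) -> list[tuple[str, str]]:
    """Adjacent word pairs: a crude, illustrative stand-in for syntagmatic relations."""
    words = tokenise(sentence)
    return list(zip(words, words[1:]))


a = "The dog bites the man"
b = "The man bites the dog"

print(bag_of_words(a) == bag_of_words(b))  # True: same words, same counts
print(syntagms(a))                         # [('the', 'dog'), ('dog', 'bites'), ('bites', 'the'), ('the', 'man')]
print(syntagms(a) == syntagms(b))          # False: the linear order differs
```

This is why a language model needs some explicit signal of word order instead of treating a sentence as an unordered set of tokens.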

Semantics

Semantics is the branch of linguistics and philosophy concerned with the study of meaning in language. It explores how words, phrases, sentences, and texts convey meaning, and how interpretation is influenced by context, culture, and usage. It covers issues such as the meaning of words (lexical semantics) and the meaning of sentences (compositional semantics), and it borders on pragmatics, which studies the role of context in understanding language.

One of Saussure’s central precepts in semantics is that language is a system of signs, each composed of a signifier (the sound-image) and a signified (the concept). This dyadic structure is crucial for understanding how LLMs process the meaning of words and their possible ambiguity, since a single signifier, such as ‘bank’, can point to more than one signified.
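
The following sketch models a sign as a signifier/signified pair and shows how context can select between two signifieds of the same signifier. It uses a simplified version of the Lesk dictionary-overlap method, a classic word-sense disambiguation technique, not the mechanism LLMs use (they rely on learned contextual embeddings). The mini-lexicon and its cue words are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sign:
    signifier: str    # the form: here, a written word
    signified: str    # the concept the form stands for
    cues: frozenset   # hypothetical context words associated with that concept


# Hypothetical mini-lexicon: one signifier ("bank") paired with two signifieds.
LEXICON = [
    Sign("bank", "financial institution", frozenset({"money", "loan", "account", "deposit"})),
    Sign("bank", "side of a river", frozenset({"river", "water", "fishing", "shore"})),
]


def disambiguate(word: str, sentence: str) -> str:
    """Pick the signified whose cue words overlap most with the sentence (simplified Lesk)."""
    context = set(sentence.lower().replace(".", "").split())
    candidates = [sign for sign in LEXICON if sign.signifier == word]
    best = max(candidates, key=lambda sign: len(sign.cues & context))
    return best.signified


print(disambiguate("bank", "She opened an account at the bank."))  # financial institution
print(disambiguate("bank", "We went fishing on the river bank."))  # side of a river
```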

How do chatbots cognise semantics and syntax in linguistic processes?

Chatbots’ processing and understanding of language usage involves several key steps:

Training on vast amounts of textual data from the internet to predict the next word in a sequence
Tokenisation to divide the text into smaller units
Learning relationships between words and phrases for semantic understanding
Using vector representations to recognise similarities and generate contextually relevant responses
Leveraging transformer architecture to efficiently process long contexts and complex linguistic structures
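
To make the first two steps concrete, here is a deliberately tiny Python sketch of next-word prediction. It assumes a naive word-level tokeniser and a bigram count model that simply remembers which word most often follows another; real LLMs use learned subword tokenisers and neural networks, so this illustrates the training objective rather than how ChatGPT is built. The corpus is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


def tokenise(text: str) -> list[str]:
    """Naive word-level tokeniser (real LLMs use learned subword tokenisers)."""
    return text.lower().replace(".", " .").split()


def train_bigram_model(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every token, which tokens follow it in the training text."""
    followers: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenise(sentence)
        for current, nxt in zip(tokens, tokens[1:]):
            followers[current][nxt] += 1
    return followers


def predict_next(model: dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical training corpus.
corpus = [
    "The word was with the chatbot.",
    "The word was spoken.",
    "The chatbot was trained on text.",
]
model = train_bigram_model(corpus)
print(predict_next(model, "word"))     # was
print(predict_next(model, "unicorn"))  # None
```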

Although the model does not learn in real time, it is periodically updated with new data to improve its performance, enabling it to generate coherent and useful responses to user queries.

As explained earlier, in LLMs, words and phrases are tokenised and transformed into vectors within a high-dimensional space. These vectors function similarly to Saussure’s signifiers, with their positions and relationships encoding meaning (the signified). Thus, within the process of ‘Tokenisation and Embedding’, LLMs tokenise text into discrete units (signifiers) and map them to embeddings that capture their meanings (signified). The model learns these embeddings by processing vast amounts of text, identifying patterns and relationships analogous to Saussure’s linguistic structures.
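
The tokenise-then-embed step can be sketched as follows. The vocabulary and the three-dimensional vectors are hypothetical and hand-picked; in a real model the vectors are learned during training and have hundreds or thousands of dimensions. Cosine similarity is one common way to measure how close two embeddings are, so related signifiers should end up with more similar vectors.

```python
import math

# Hypothetical toy vocabulary: each token (signifier) gets an integer ID.
VOCAB = {"cat": 0, "dog": 1, "car": 2}

# Hypothetical hand-picked 3-dimensional embeddings, indexed by token ID.
# In a real LLM these vectors are learned and far higher-dimensional.
EMBEDDINGS = [
    [0.90, 0.80, 0.10],  # cat
    [0.85, 0.75, 0.20],  # dog
    [0.10, 0.20, 0.95],  # car
]


def embed(token: str) -> list[float]:
    """Token -> ID -> vector lookup."""
    return EMBEDDINGS[VOCAB[token]]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(round(cosine_similarity(embed("cat"), embed("dog")), 3))
print(round(cosine_similarity(embed("cat"), embed("car")), 3))
```

With these toy vectors, ‘cat’ comes out much closer to ‘dog’ than to ‘car’, mirroring the idea that positions and relationships in the vector space encode meaning.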

Chatbots’ ability to understand and generate text relies on their grasp of semantics (meaning) and syntax (structure). For semantics, they rely on contextual word embeddings that capture meaning from usage, an attention mechanism that weighs the importance of each word in context, and layered contextual understanding that handles polysemy (one word with several meanings) and synonymy (different words with the same meaning). The model is pre-trained on general language patterns and then fine-tuned on specific datasets to sharpen its semantic comprehension.

For syntax, the model uses positional encoding to capture word order, attention mechanisms to maintain syntactic coherence, layered processing to build complex structures, and probabilistic learning of grammatical patterns from exposure to vast amounts of text. Tokenisation and sequence modelling help it track dependencies and coherence, while the transformer architecture integrates syntax and semantics at each layer, helping to make responses both meaningful and grammatically correct. Training on diverse datasets further enhances its ability to generalise across various uses of language, making the chatbot a powerful natural language processing tool.
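
Positional encoding and attention can be sketched in plain Python. This minimal version assumes the sinusoidal positional encoding described in the original transformer paper (Vaswani et al., 2017) and scaled dot-product self-attention in which queries, keys, and values are simply the input vectors themselves; real transformers add learned projection matrices, multiple attention heads, and many layers. The four-dimensional token embeddings are hypothetical.

```python
import math


def positional_encoding(position: int, d_model: int) -> list[float]:
    """Sinusoidal positional encoding (Vaswani et al., 2017); assumes d_model is even."""
    pe = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        pe.extend([math.sin(angle), math.cos(angle)])
    return pe


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: list[list[float]]) -> tuple[list[list[float]], list[list[float]]]:
    """Scaled dot-product self-attention with Q = K = V = x (no learned weights)."""
    d = len(x[0])
    weights, outputs = [], []
    for q in x:
        # Score this token's query against every token's key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all value vectors.
        outputs.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return outputs, weights


tokens = ["dog", "bites", "man"]
# Hypothetical 4-dimensional token embeddings.
emb = {"dog": [1.0, 0.0, 1.0, 0.0], "bites": [0.0, 1.0, 0.0, 1.0], "man": [1.0, 0.0, 0.0, 1.0]}
# Add each token's positional encoding to its embedding, so word order is represented.
x = [[e + p for e, p in zip(emb[t], positional_encoding(pos, 4))] for pos, t in enumerate(tokens)]
out, attn = self_attention(x)
for t, row in zip(tokens, attn):
    print(t, [round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the sentence, including itself, and each row sums to 1. Because position is added to the embedding, ‘dog’ at the start of a sentence is represented differently from ‘dog’ at the end.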

An interesting invention

Recently, researchers in the Netherlands presented an AI system capable of recognising sarcasm at a meeting of the Acoustical Society of America and the Canadian Acoustical Association. After being trained on the Multimodal Sarcasm Detection Dataset (MUStARD), which combines video clips and text from sitcoms such as Friends and The Big Bang Theory, the neural network correctly detected sarcasm in about 75% of unlabelled exchanges.

Linguistically speaking, sarcasm generally takes the form of a layered, ironic remark, often rooted in humour, that is intended to mock or satirise something. A sarcastic speaker says something different from what they actually mean, which is why such nuances are hard for a language model to detect.

Detecting sarcasm therefore calls on deep learning techniques that analyse both syntax and semantics, including syntagmatic combinations and idioms, to capture the layered structure and meaning of an utterance. It is also a telling test of how comprehensively an LLM has acquired human speech.

By integrating Saussure’s linguistic theories with the cognitive mechanisms of large language models, we gain a deeper understanding of how these models process and generate language. The interplay between structural rules, contextual usage, and the fluidity of meaning partly explains the sophisticated performance of LLMs in language generation. This synthesis not only illuminates the inner workings of contemporary AI systems but also reinforces the enduring relevance of classical linguistic theories in the age of AI.

The blog post was first published on DigWatch.
