Semantic Alchemy: Cracking Word2Vec with CBOW and Skip-Gram
Before we had Large Language Models writing poetry, we had to teach computers that “king” and “queen” are related not just as strings of letters, but by meaning. This is the story of that breakthrough: the moment we stopped counting words and started mapping their souls, turning raw text into a mathematical landscape where simple arithmetic can solve analogies. Welcome to the world of Word2Vec. 🔮 Language models need vector representations of words that capture semantic relationships. Before the 2010s, models used count-based vector representations (e.g., One-Hot Encoding, raw word counts) that captured only which words appear and how often, not what they mean. The Problems: 🚧 ...
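To make the “arithmetic on meaning” idea concrete, here is a minimal sketch using the gensim library (assuming gensim 4.x and a made-up toy corpus); the `sg` flag switches between CBOW and Skip-Gram, and on a corpus this small the analogy result is not meaningful, but the mechanics are identical at scale:

```python
from gensim.models import Word2Vec

# Toy corpus: real Word2Vec training needs millions of sentences to learn
# meaningful geometry; this only illustrates the API and the two architectures.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "man", "walks", "in", "the", "city"],
    ["the", "woman", "walks", "in", "the", "city"],
]

# sg=0 -> CBOW: predict a word from its surrounding context
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# sg=1 -> Skip-Gram: predict the surrounding context from a word
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# The famous analogy: vector("king") - vector("man") + vector("woman") ≈ vector("queen")
print(skipgram.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```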
The DNA of Language: A Deep Dive into LLM Tokenization Concepts
Imagine you have to build a house. You cannot build a stable house using only massive boulders as walls (too big), nor can you build one using only tiny pebbles (too small). You need bricks of exactly the right size. The same logic applies to language: we need strategies to break petabytes of text into usable, atomic chunks. In the context of Large Language Models (LLMs), these bricks are called tokens. Tokens let us transform vast amounts of fluid language data into a discrete mathematical language that machines can process. Tokenization is the invisible filter at the heart of LLMs, through which every prompt is passed and every response is born. ...
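As a rough illustration of the brick-size trade-off, here is a small Python sketch (the subword step assumes the `tiktoken` library is installed; the exact token boundaries depend on which tokenizer you use):

```python
import tiktoken  # BPE tokenizer library (assumed available)

text = "Tokenization turns fluid language into discrete units."

# Word-level: boulders -- huge vocabulary, no way to handle unseen words.
print(text.split())

# Character-level: pebbles -- tiny vocabulary, but very long sequences.
print(list(text))

# Subword (BPE): right-sized bricks -- common words stay whole,
# rare words are split into reusable pieces.
enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode(text)
print([enc.decode([i]) for i in ids])
```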
My First Post
Welcome to Vectors & Verbs

This is a demo post to verify the PaperMod theme setup.

Features of this theme:

- Clean and minimal design
- Dark mode support
- Fast loading speed

```python
def hello_world():
    print("Hello, Hugo!")
```

Stay tuned for more updates!