This section describes the network architecture of Grok-1, a large language model. Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.
Grok-1 is currently designed with the following specifications:
- Parameters: 314B
- Architecture: Mixture of 8 Experts (MoE)
- Experts Utilization: 2 experts used per token (see the routing sketch after this list)
- Layers: 64
- Attention Heads: 48 for queries, 8 for keys/values (see the attention sketch below)
- Embedding Size: 6,144
- Tokenization: SentencePiece tokenizer with 131,072 tokens (see the tokenizer example below)
- Additional Features:
  - Rotary embeddings (RoPE), applied in the attention sketch below
  - Supports activation sharding and 8-bit quantization (see the quantization sketch below)
- Maximum Sequence Length (context): 8,192 tokens
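
The two-of-eight expert routing can be illustrated with a small dense sketch in JAX. Everything below (the toy dimensions, parameter names, GELU experts, and softmax over the top-2 router scores) is an illustrative assumption, not Grok-1's implementation; production MoE layers dispatch tokens to experts rather than evaluating all eight.

```python
import jax
import jax.numpy as jnp

NUM_EXPERTS = 8   # Grok-1 mixes 8 experts
TOP_K = 2         # 2 experts are active per token
D_MODEL = 64      # toy width for the sketch; Grok-1's embedding size is 6,144
D_FF = 256        # toy expert hidden width (hypothetical)

def init_params(key):
    k1, k2, k3 = jax.random.split(key, 3)
    return {
        "router": 0.02 * jax.random.normal(k1, (D_MODEL, NUM_EXPERTS)),
        "w_in":   0.02 * jax.random.normal(k2, (NUM_EXPERTS, D_MODEL, D_FF)),
        "w_out":  0.02 * jax.random.normal(k3, (NUM_EXPERTS, D_FF, D_MODEL)),
    }

def moe_layer(params, x):
    """x: [tokens, D_MODEL] -> [tokens, D_MODEL], mixing the top-2 of 8 experts."""
    # The router scores every expert for every token; only the 2 best are kept.
    logits = x @ params["router"]                     # [tokens, NUM_EXPERTS]
    top_w, top_e = jax.lax.top_k(logits, TOP_K)       # both [tokens, TOP_K]
    top_w = jax.nn.softmax(top_w, axis=-1)            # normalise the 2 active experts

    # Dense reference: evaluate every expert, then combine only the selected ones.
    def expert_ffn(w_in, w_out):
        return jax.nn.gelu(x @ w_in) @ w_out          # [tokens, D_MODEL]
    all_out = jax.vmap(expert_ffn)(params["w_in"], params["w_out"])  # [E, tokens, D]

    # combine[t, e] is expert e's routing weight for token t (zero unless selected).
    combine = jnp.sum(jax.nn.one_hot(top_e, NUM_EXPERTS) * top_w[..., None], axis=1)
    return jnp.einsum("te,etd->td", combine, all_out)

x = jax.random.normal(jax.random.PRNGKey(0), (16, D_MODEL))
y = moe_layer(init_params(jax.random.PRNGKey(1)), x)  # [16, D_MODEL]
```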
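The 48 query heads share 8 key/value heads, so each key/value head serves 6 query heads (grouped-query attention), and positions are encoded with rotary embeddings. The sketch below shows both ideas together; the head dimension, causal mask, and all names are assumptions for illustration, not the repository's code.

```python
import jax
import jax.numpy as jnp

N_Q_HEADS = 48                    # query heads, per the spec above
N_KV_HEADS = 8                    # key/value heads, per the spec above
GROUP = N_Q_HEADS // N_KV_HEADS   # 6 query heads share each key/value head
HEAD_DIM = 128                    # assumed: 6,144 embedding size / 48 query heads

def rope(x, base=10000.0):
    """Rotary position embeddings for x: [seq, heads, HEAD_DIM]."""
    seq, _, dim = x.shape
    half = dim // 2
    freqs = 1.0 / (base ** (jnp.arange(half) / half))    # one frequency per channel pair
    angles = jnp.arange(seq)[:, None] * freqs[None, :]   # [seq, half]
    cos, sin = jnp.cos(angles)[:, None, :], jnp.sin(angles)[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) channel pair by a position-dependent angle.
    return jnp.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def grouped_query_attention(q, k, v):
    """q: [seq, 48, 128]; k, v: [seq, 8, 128] -> [seq, 48, 128] with a causal mask."""
    seq = q.shape[0]
    q, k = rope(q), rope(k)                    # positions enter only through q and k
    k = jnp.repeat(k, GROUP, axis=1)           # broadcast each kv head to its 6 q heads
    v = jnp.repeat(v, GROUP, axis=1)
    scores = jnp.einsum("qhd,khd->hqk", q, k) / jnp.sqrt(HEAD_DIM)
    causal = jnp.tril(jnp.ones((seq, seq), dtype=bool))
    scores = jnp.where(causal[None, :, :], scores, -1e30)
    return jnp.einsum("hqk,khd->qhd", jax.nn.softmax(scores, axis=-1), v)

key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (16, N_Q_HEADS, HEAD_DIM))
k = jax.random.normal(key, (16, N_KV_HEADS, HEAD_DIM))
v = jax.random.normal(key, (16, N_KV_HEADS, HEAD_DIM))
out = grouped_query_attention(q, k, v)         # [16, 48, 128]
```

Sharing key/value heads shrinks the key/value cache to 8/48 of the full multi-head size, which matters when serving the 8,192-token context.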
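The weights support 8-bit quantization. A generic symmetric int8 weight quantization sketch is shown below; it is one common scheme, not necessarily the exact one used for Grok-1.

```python
import jax.numpy as jnp

def quantize_int8(w):
    """Symmetric per-output-channel int8 quantisation of a weight matrix [in, out]."""
    scale = jnp.max(jnp.abs(w), axis=0, keepdims=True) / 127.0    # [1, out]
    q = jnp.clip(jnp.round(w / scale), -127, 127).astype(jnp.int8)
    return q, scale                       # store int8 weights plus float scales

def dequantize_int8(q, scale):
    """Recover an approximate float32 weight for use in a matmul."""
    return q.astype(jnp.float32) * scale

# Roughly 4x smaller storage than float32, at a small accuracy cost.
w = jnp.linspace(-1.0, 1.0, 12).reshape(4, 3)
q, s = quantize_int8(w)
w_approx = dequantize_int8(q, s)
```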
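The vocabulary holds 131,072 tokens (2^17) under a SentencePiece model. Assuming a local copy of the released SentencePiece model file (the `tokenizer.model` path below is an assumption for illustration), it can be inspected with the standard `sentencepiece` Python bindings:

```python
import sentencepiece as spm

# Path is illustrative; point it at the released SentencePiece model file.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

print(sp.vocab_size())                   # expected: 131072 for Grok-1's tokenizer
ids = sp.encode("Grok-1 is a Mixture-of-Experts model.", out_type=int)
print(ids)                               # token ids for the prompt
print(sp.decode(ids))                    # round-trips back to the original text
```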