Grok-1 IT-Times Yizuo Media

漢德百科全書 | 汉德百科全书 (Chinese-German Encyclopedia)



Grok-1

7 months ago

First author

Grok-1

Network architecture of Grok-1, a large language model. Grok-1 is a 314-billion-parameter Mixture-of-Experts (MoE) model trained from scratch by xAI.
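In a Mixture-of-Experts layer, a router scores every expert for each token and only the highest-scoring few are evaluated; Grok-1 uses 8 experts with 2 active per token. The following is a minimal, generic sketch of top-2 routing in plain Python, not xAI's implementation. The toy experts and router logits are hypothetical, and renormalizing the gate weights with a softmax over only the selected experts is an assumption, since the exact gating used by Grok-1 is not described here.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[Sequence[float]], List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(token: Sequence[float], router_logits: Sequence[float],
              experts: Sequence[Expert], k: int = 2) -> List[float]:
    """Route one token to its k highest-scoring experts and mix their outputs."""
    # Indices of the k largest router scores (ties keep the lower index first).
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    # Assumption: gate weights are a softmax over the selected experts only.
    gates = softmax([router_logits[i] for i in chosen])
    out = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](token)  # only the k chosen experts are evaluated
        out = [o + gate * v for o, v in zip(out, y)]
    return out


# Toy setup: 8 hypothetical experts; expert i multiplies its input by i + 1.
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(8)]
logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.5, 0.3]  # hypothetical router output
out = top_k_moe([1.0, 2.0], logits, experts, k=2)
print(out)  # mixes the outputs of experts 1 and 4 only
```

Because only 2 of the 8 experts run for each token, the per-token compute of the expert layers is a fraction of what the total parameter count suggests, while shared components such as attention still run for every token.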

Grok-1 has the following specifications:

Parameters: 314B
Architecture: Mixture of 8 Experts (MoE)
Expert Utilization: 2 experts active per token
Layers: 64
Attention Heads: 48 for queries, 8 for keys/values (grouped-query attention)
Embedding Size: 6,144
Tokenization: SentencePiece tokenizer with a vocabulary of 131,072 tokens
Maximum Sequence Length (context): 8,192 tokens
Additional Features:
- Rotary position embeddings (RoPE)
- Support for activation sharding and 8-bit quantization
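The listed hyperparameters can be collected into a small config object from which a few quantities follow by arithmetic: the per-head dimension (6,144 / 48 = 128), the grouped-query ratio (48 / 8 = 6 query heads sharing each key/value head), and the size of the key/value cache for one full-length sequence. This is a sketch under stated assumptions: the class and field names are hypothetical and not taken from xAI's code, the head dimension is assumed to equal the embedding size divided by the number of query heads, and the cache estimate assumes 2 bytes per value (bf16) with no quantization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grok1Spec:
    """Hypothetical container for the published Grok-1 specifications."""
    n_params: int = 314_000_000_000
    n_experts: int = 8
    experts_per_token: int = 2
    n_layers: int = 64
    n_q_heads: int = 48
    n_kv_heads: int = 8
    d_model: int = 6_144
    vocab_size: int = 131_072
    max_seq_len: int = 8_192

    @property
    def head_dim(self) -> int:
        # Assumption: head dimension = embedding size / number of query heads.
        return self.d_model // self.n_q_heads

    @property
    def q_heads_per_kv_head(self) -> int:
        # Grouped-query attention: several query heads share one key/value head.
        return self.n_q_heads // self.n_kv_heads

    def kv_cache_bytes(self, seq_len: int, bytes_per_value: int = 2) -> int:
        # Keys and values (factor 2) for every KV head, layer and cached token.
        per_token = 2 * self.n_kv_heads * self.head_dim * self.n_layers
        return per_token * seq_len * bytes_per_value


spec = Grok1Spec()
print(spec.head_dim)             # 128
print(spec.q_heads_per_kv_head)  # 6
print(spec.kv_cache_bytes(spec.max_seq_len) / 2**30)  # 2.0 (GiB, assuming bf16)
```

Under these assumptions one 8,192-token sequence needs about 2 GiB of key/value cache; with 48 key/value heads (one per query head) it would need six times as much, which is the usual motivation for sharing key/value heads.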

