Grok-1
Network architecture of Grok-1, a large language model. Grok-1 is a 314-billion-parameter Mixture-of-Experts model trained from scratch by xAI.

Grok-1's current design has the following specifications:

  • Parameters: 314B
  • Architecture: Mixture of 8 Experts (MoE)
  • Expert Utilization: 2 experts active per token (see the routing sketch after this list)
  • Layers: 64
  • Attention Heads: 48 for queries, 8 for keys/values
  • Embedding Size: 6,144
  • Tokenization: SentencePiece tokenizer with 131,072 tokens
  • Additional Features:
    • Rotary embeddings (RoPE; see the sketch below)
    • Supports activation sharding and 8-bit quantization
  • Maximum Sequence Length (context): 8,192 tokens
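
Because only 2 of the 8 experts run for any given token, a forward pass touches only a fraction of the 314B parameters. The NumPy sketch below illustrates one top-2 routing step under common assumptions (a linear router, softmax gating over the two selected logits, and single-matrix stand-ins for the expert feed-forward blocks); it is not xAI's released implementation, and the model dimension is shrunk so the demo runs instantly.

```python
import numpy as np

N_EXPERTS = 8   # Grok-1 uses a mixture of 8 experts
TOP_K = 2       # 2 experts are active per token
D_MODEL = 64    # 6,144 in Grok-1; kept small here for a quick demo

rng = np.random.default_rng(0)

# A learned router projects each token's hidden state to one logit per
# expert. Each "expert" below is a single matrix standing in for a full
# feed-forward block (an illustrative simplification).
router_w = rng.standard_normal((D_MODEL, N_EXPERTS)) / np.sqrt(D_MODEL)
experts = [rng.standard_normal((D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
           for _ in range(N_EXPERTS)]

def moe_layer(hidden):
    """Top-2 MoE layer; hidden has shape (n_tokens, D_MODEL)."""
    logits = hidden @ router_w                      # (n_tokens, N_EXPERTS)
    top2 = np.argsort(logits, axis=-1)[:, -TOP_K:]  # 2 highest-scoring experts
    sel = np.take_along_axis(logits, top2, axis=-1)
    gates = np.exp(sel - sel.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)      # softmax over the 2 picks

    out = np.zeros_like(hidden)
    for t in range(hidden.shape[0]):
        for k in range(TOP_K):
            e = top2[t, k]
            # Only the selected experts run; the rest are skipped entirely.
            out[t] += gates[t, k] * (hidden[t] @ experts[e])
    return out

tokens = rng.standard_normal((4, D_MODEL))  # 4 example token activations
print(moe_layer(tokens).shape)              # (4, 64)
```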
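
Rotary embeddings (RoPE) encode position by rotating pairs of query/key channels through an angle proportional to the token index, so attention scores depend on relative rather than absolute position. The sketch below is a generic RoPE implementation using the split-half pairing convention, not Grok-1's released code; the head dimension of 128 is an inference from the table above (6,144 embedding size / 48 query heads), not a figure stated in it.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, head_dim).

    Each channel pair is rotated by a position-dependent angle, so the
    dot product of two rotated vectors depends only on their relative
    positions.
    """
    seq_len, head_dim = x.shape
    half = head_dim // 2
    # One rotation frequency per channel pair (the standard RoPE schedule).
    freqs = base ** (-np.arange(half) / half)
    angles = np.outer(np.arange(seq_len), freqs)  # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

# head_dim 128 is inferred: 6,144 embedding / 48 query heads.
q = np.random.default_rng(1).standard_normal((8192, 128))  # full 8,192-token context
print(rope(q).shape)  # (8192, 128)
```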