Deutsch-Chinesische Enzyklopädie, 德汉百科
       
Grok-1
Network architecture of Grok-1, a large language model. Grok-1 is a 314-billion-parameter Mixture-of-Experts model trained from scratch by xAI.

Grok-1's current design has the following specifications (a code sketch of these values follows the list):

  • Parameters: 314B
  • Architecture: Mixture of 8 Experts (MoE)
  • Expert Utilization: 2 experts activated per token
  • Layers: 64
  • Attention Heads: 48 for queries, 8 for keys/values
  • Embedding Size: 6,144
  • Tokenization: SentencePiece tokenizer with a vocabulary of 131,072 tokens
  • Additional Features:
    • Rotary embeddings (RoPE)
    • Supports activation sharding and 8-bit quantization
  • Maximum Sequence Length (context): 8,192 tokens
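
For quick reference, the published values above can be gathered into a single configuration object. The following is a minimal sketch; the class and field names are illustrative, not taken from the xAI release:

```python
from dataclasses import dataclass

@dataclass
class Grok1Config:
    """Published Grok-1 hyperparameters; field names are illustrative."""
    n_params: int = 314_000_000_000   # 314B total parameters
    n_experts: int = 8                # Mixture-of-Experts width
    experts_per_token: int = 2        # top-2 expert routing
    n_layers: int = 64
    n_query_heads: int = 48           # attention heads for queries
    n_kv_heads: int = 8               # shared heads for keys/values
    d_model: int = 6_144              # embedding size
    vocab_size: int = 131_072         # SentencePiece vocabulary size
    max_seq_len: int = 8_192          # context window
```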
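
"Mixture of 8 Experts" with 2 experts per token means the router scores all 8 expert networks for each token and dispatches the token only to the 2 highest-scoring ones, so only a fraction of the 314B parameters is active for any given token. Below is a minimal top-2 routing sketch in JAX, assuming a [num_tokens, num_experts] matrix of router logits (the function is hypothetical, not the actual Grok-1 code):

```python
import jax

def top2_route(gate_logits):
    # gate_logits: [num_tokens, num_experts] router scores.
    # Pick the 2 highest-scoring experts per token...
    weights, expert_ids = jax.lax.top_k(gate_logits, k=2)
    # ...and renormalize the gate weights over just those 2 experts.
    weights = jax.nn.softmax(weights, axis=-1)
    return weights, expert_ids

# Toy usage: 4 tokens routed over Grok-1's 8 experts.
logits = jax.random.normal(jax.random.PRNGKey(0), (4, 8))
w, e = top2_route(logits)
print(e)  # indices of the 2 experts each token is sent to
print(w)  # per-token mixing weights, summing to 1
```

Activating 2 of 8 experts per token is what lets a model of this total size run at roughly the per-token compute cost of a much smaller dense model.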