Popular articles
Characterizing the Accuracy-Efficiency Trade-off of Low-rank Decomposition in Language Models
Compute Better Spent: Replacing Dense Layers with Structured Matrices
MABViT -- Modified Attention Block Enhances Vision Transformers
Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity