Popular Articles
Toward a Theory of Tokenization in LLMs
#422 Tech Topics: Command R+'s Tokenizer Was Impressive Too
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
How do different tokenizers perform on downstream tasks in scriptio continua languages?: A case study in Japanese
Biomedical Language Models are Robust to Sub-optimal Tokenization