
Trying Out Zephyr-7B-β

I compared Zephyr-7B-β with Zephyr-7B-α.


0. Environment

OS: Windows
CPU: Intel(R) Core i9-13900KF
RAM: 128GB
GPU: RTX 4090
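
Before loading the models, a quick check that PyTorch can actually see the GPU may save some head-scratching (a minimal sketch; device index 0 assumes a single-GPU machine):

import torch

print(torch.cuda.is_available())      # expect True
print(torch.cuda.get_device_name(0))  # expect something like "NVIDIA GeForce RTX 4090"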

1. Summarizing the abstracts of "NLP" papers from arXiv

Collect information on the latest "NLP" papers from arXiv:

import arxiv

# Search arXiv for the five most recent papers matching the query "NLP"
arxiv_query = "NLP"
search = arxiv.Search(
    query=arxiv_query,
    max_results=5,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)

# Collect each paper's title, abstract, and PDF URL
titles, abstracts, urls = [], [], []
for result in search.results():
    titles.append(result.title)
    abstracts.append(result.summary)
    urls.append(result.pdf_url)
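
Note that newer releases of the arxiv package (2.x) deprecate Search.results() in favor of an explicit client; if you see a DeprecationWarning, the equivalent call looks like this (a sketch assuming arxiv 2.x):

# arxiv 2.x style: fetch results through a Client instance
client = arxiv.Client()
for result in client.results(search):
    print(result.title)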

The five papers used this time are the following:
"torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP"
"1D-Touch: NLP-Assisted Coarse Text Selection via a Semi-Direct Gesture"
"The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks"
"LightLM: A Lightweight Deep and Narrow Language Model for Generative Recommendation"
"De-novo Chemical Reaction Generation by Means of Temporarily Convolutional Neural Networks"


Prepare the Zephyr-7B models

import torch
from transformers import pipeline

# Load text-generation pipelines for both models in bfloat16
pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto")
pipe_a = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-alpha", torch_dtype=torch.bfloat16, device_map="auto")
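
A short deterministic generation is a cheap way to confirm both pipelines loaded correctly before the long loops below (a hedged sketch; the prompt is arbitrary):

# Smoke test: one short greedy generation on the β pipeline
test_messages = [{"role": "user", "content": "Reply with one short sentence."}]
test_prompt = pipe.tokenizer.apply_chat_template(test_messages, tokenize=False, add_generation_prompt=True)
print(pipe(test_prompt, max_new_tokens=32, do_sample=False)[0]["generated_text"])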

Summarize the abstracts

from tqdm.notebook import tqdm

# List of summaries by Zephyr-7B-β
summary_beta = []
for text in tqdm(abstracts):
    messages = [
        {
            "role": "system",
            "content": "You are an excellent technician, have a deep understanding of patents in particular, and always give careful, accurate and clear replies.",
        },
        {"role": "user", "content": f"Please summarize the patent specification section below, using bullet points to identify the main purpose and means used.\n-----\n{text}"},
    ]
    prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    outputs = pipe(prompt, max_new_tokens=1024, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
    # Keep only the text after the "<|assistant|>" tag (13 characters plus the trailing newline = 14)
    summary = outputs[0]["generated_text"][outputs[0]["generated_text"].find('<|assistant|>')+14:]
    summary_beta.append(summary)

# List of summaries by Zephyr-7B-α
summary_alpha = []
for text in tqdm(abstracts):
    messages = [
        {
            "role": "system",
            "content": "You are an excellent technician, have a deep understanding of patents in particular, and always give careful, accurate and clear replies.",
        },
        {"role": "user", "content": f"Please summarize the patent specification section below, using bullet points to identify the main purpose and means used.\n-----\n{text}"},
    ]
    prompt = pipe_a.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    outputs = pipe_a(prompt, max_new_tokens=1024, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
    summary = outputs[0]["generated_text"][outputs[0]["generated_text"].find('<|assistant|>')+14:]
    summary_alpha.append(summary)
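
The hard-coded +14 offset works because "<|assistant|>" is 13 characters and the chat template appends a newline after it. A split-based helper, such as this hypothetical extract_reply, is less brittle if the template ever changes:

def extract_reply(generated_text):
    # Everything after the last "<|assistant|>" tag, with surrounding whitespace stripped
    return generated_text.split("<|assistant|>")[-1].strip()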

Summaries by Zephyr-7B-β

for i in range(len(titles)):
    print(f"Title : {titles[i]}")
    print(summary_beta[i])
    print("\n")

Title : torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP
- Purpose: To provide a significantly upgraded version of torchdistill, a deep learning framework for reproducible knowledge distillation experiments, which supports more tasks using third-party libraries.
- Means: Modular-driven coding-free framework, initial release supported only image classification and object detection tasks, upgraded to support more tasks, reproduce GLUE benchmark results of BERT models using a script based on the upgraded torchdistill, harmonizing with various Hugging Face libraries, publish all 27 fine-tuned BERT models and configurations on Hugging Face, widely used model weights in research communities, reimplement popular small-sized models and new knowledge distillation methods, perform additional experiments for computer vision tasks.


Title : 1D-Touch: NLP-Assisted Coarse Text Selection via a Semi-Direct Gesture
1. Purpose: 1D-Touch is a new text selection method that complements traditional carets-based sub-word selection by facilitating the selection of semantic units of words and above.
2. Means: A simple vertical slide gesture is used to expand and contract a selection area from a word. Expansion can be by words or by semantic chunks ranging from sub-phrases to sentences.
3. Concept: Shifts the concept of text selection from defining a range by locating the first and last words to a dynamic process of expanding and contracting a textual semantic entity.
4. Variants: Two variants are prototyped and tested - WordTouch, which offers a straightforward word-by-word expansion, and ChunkTouch, which leverages NLP to chunk text into syntactic units, allowing the selection to grow by semantically meaningful units in response to the sliding gesture.
5. Evaluation: Focused on coarse-grained selection tasks, 1D-Touch shows a 20% improvement over the default word-snapping selection method on Android.


Title : The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks
- Purpose: Investigating the impact of dataset design choices on conclusions drawn about compositional generalization abilities of NLP models
- Means:
1. Six modeling approaches are evaluated across four datasets
2. Datasets are split according to eight compositional splitting strategies
3. Model rankings are determined based on 18 compositional generalization splits in total
4. Results show that:
a. Datasets designed to evaluate compositional generalization rank modeling approaches differently
b. Human-generated datasets align better with each other than with synthetic datasets, or than synthetic datasets among themselves
c. Whether datasets are sampled from the same source is more predictive of the resulting model ranking than whether they maintain the same interpretation of compositionality
d. Which lexical items are used in the data can strongly impact conclusions
5. Overall, the study demonstrates the need for more rigorous standards for establishing the validity of evaluation sets in NLP research.


Title : LightLM: A Lightweight Deep and Narrow Language Model for Generative Recommendation
- Purpose: Introduce LightLM, a lightweight Transformer-based language model for generative recommendation that addresses the unique demand for personalized generative modeling in recommendation tasks.
- Means:
- Light-weight deep and narrow Transformer architecture specifically tailored for direct generation of recommendation items, as the input predominantly consists of short tokens that are well-suited for the model's capacity.
- User and item ID indexing methods, Spectral Collaborative Indexing (SCI) and Graph Collaborative Indexing (GCI), that enable the deep and narrow Transformer architecture to outperform large-scale language models for recommendation.
- Constrained generation process for generative recommenders to address the hallucination problem of generating items as output.
- Results: LightLM outperforms various competitive baselines in terms of both recommendation accuracy and efficiency on real-world datasets. Code available at https://github.com/dongyuanjushi/LightLM.


Title : De-novo Chemical Reaction Generation by Means of Temporarily Convolutional Neural Networks
- Purpose: Combining Recurrent Neural Networks (RNN) and Temporarily Convolutional Neural Networks (TCN) for de novo reaction generation using a new representation of reactions called CGRSmiles with atom mapping directly incorporated.
- Means:
- RNN: Autoregressive properties commonly used in language modeling and SMILES generation due to their autoregressive nature.
- TCN: Similar properties as RNN with a wide receptive field while adhering to the causality required for natural language processing (NLP).
- Combination: Both latent representations expressed through TCN and RNN lead to better performance compared to RNN alone.
- Fine-tuning: Different protocols significantly impact the generative scope of the model during transfer learning on a dataset of interest.

Summaries by Zephyr-7B-α

for i in range(len(titles)):
    print(f"Title : {titles[i]}")
    print(summary_alpha[i])
    print("\n")

Title : torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP
- Purpose: To present a significantly upgraded version of torchdistill, a deep learning framework for reproducible knowledge distillation experiments
- Main means used:
- Modular-driven coding-free framework
- Supports image classification and object detection tasks in initial release
- Upgraded to support more tasks with third-party libraries
- Reproduced GLUE benchmark results of BERT models using a script based on the upgraded torchdistill
- Published all 27 fine-tuned BERT models and configurations on Hugging Face
- Model weights already widely used in research communities
- Reimplement popular small-sized models and new knowledge distillation methods for computer vision tasks
- Additional experiments performed for computer vision tasks.


Title : 1D-Touch: NLP-Assisted Coarse Text Selection via a Semi-Direct Gesture
- Purpose: Introducing 1D-Touch, a novel text selection method that facilitates the selection of semantic units of words and above.
- Means: A simple vertical slide gesture to expand and contract a selection area from a word.
- Purpose: Shifting the concept of text selection from defining a range by locating the first and last words towards a dynamic process of expanding and contracting a textual semantic entity.
- Variant 1: WordTouch, which offers a straightforward word-by-word expansion.
- Variant 2: ChunkTouch, which leverages NLP to chunk text into syntactic units, allowing the selection to grow by semantically meaningful units in response to the sliding gesture.
- Purpose: Improving coarse-grained text selection tasks, particularly on word and phrase levels.
- Means: Prototyping and testing two variants, WordTouch and ChunkTouch, for comparison with the default word-snapping selection method on Android.
- Results: A 20% improvement over the default word-snapping selection method on Android.


Title : The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks
- Purpose: To investigate the impact of dataset design choices on the conclusions drawn about the capabilities of NLP models in the domain of compositional generalization.
- Means: Six modeling approaches are examined across four datasets, split according to eight compositional splitting strategies. The models are ranked by 18 compositional generalization splits in total.
- Results:
- The datasets, designed to evaluate compositional generalization, rank modeling approaches differently.
- Human-generated datasets align better with each other than with synthetic datasets or with other synthetic datasets.
- Whether datasets are sampled from the same source is more predictive of the resulting model ranking than whether they maintain the same interpretation of compositionality.
- The use of specific lexical items can strongly impact conclusions.
- Overall, the results demonstrate that much work remains to be done in assessing whether popular evaluation datasets measure what they intend to measure, and suggest that elucidating more rigorous standards for establishing the validity of evaluation sets could benefit the field.

Note: The original text may contain additional information or nuances not captured in this summary.


Title : LightLM: A Lightweight Deep and Narrow Language Model for Generative Recommendation
- Purpose: Present LightLM, a lightweight Transformer-based language model for generative recommendation.
- Means:
- Introducing a light-weight deep and narrow Transformer architecture for direct generation of recommendation items.
- Observing that language models do not have to be too wide for recommendation tasks, as input consists of short tokens that are well-suited for the model's capacity.
- Devising user and item ID indexing methods: Spectral Collaborative Indexing (SCI) and Graph Collaborative Indexing (GCI).
- Proposing a constrained generation process for generative recommenders to address the hallucination problem.
- Results: LightLM outperforms various competitive baselines in terms of both recommendation accuracy and efficiency.
- Link to code: https://github.com/dongyuanjushi/LightLM.


Title : De-novo Chemical Reaction Generation by Means of Temporarily Convolutional Neural Networks
1. Purpose: Presenting a combination of two networks, Recurrent Neural Networks (RNN) and Temporarily Convolutional Neural Networks (TCN), for de novo reaction generation using CGRSmiles with atom mapping directly incorporated.
2. Main means:
a. RNN: Autoregressive properties, frequently used in language modeling with direct application to SMILES generation.
b. TCN: Similar properties to RNN with wide receptive field, obeying causality required for natural language processing (NLP).
c. Combination of both RNN and TCN results in an overall better performance compared to RNN alone.
d. Fine-tuning protocols have a profound impact on the generative scope of the model when applied on a dataset of interest via transfer learning.

2. Summarizing the full text of an arXiv "NLP" paper

Retrieve the full text of one of the five papers above:

from langchain.document_loaders import OnlinePDFLoader

# Download and parse the PDF of the first paper (torchdistill)
data = OnlinePDFLoader(urls[0]).load()
pdf_text = data[0].page_content
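
A quick look at what the loader extracted helps sanity-check the chunking step that follows (OnlinePDFLoader parses the PDF via the unstructured package, so pdf_text is plain text):

# Inspect the extracted text before chunking
print(len(pdf_text), "characters")
print(pdf_text[:300])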

Split the body text into rough chunks

def split_text(pdf_text, length):
    # Split the text into chunks of at most `length` characters,
    # cutting each chunk back to its last period so sentences stay intact.
    full_text = pdf_text.strip()
    chunks = []
    while full_text:
        part = full_text[:length]
        last_period_index = part.rfind(".")
        if last_period_index != -1:
            part = part[:last_period_index + 1]
        chunks.append(part)
        full_text = full_text[len(part):]
    return chunks
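
A quick check on a toy string shows the behavior — each chunk is cut back to the last period within the length budget:

print(split_text("Aaaa. Bbbb. Cccc.", length=8))
# -> ['Aaaa.', ' Bbbb.', ' Cccc.']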

Insert the chunks into prompts

chunks = split_text(pdf_text, length=15000)

messages_list = []
for i in range(len(chunks)):
    messages = [
        {
            "role": "system",
            "content": "You are an excellent technician, have a deep understanding of patents in particular, and always give careful, accurate and clear replies.",
        },
        {"role": "user", "content": f"The following text is part of a technical paper. You are a good technician and will summarize it. In particular, please clearly state the purpose of the paper, approach used to achieve the purpose, and what you infer to be the conclusion of the paper. If references are listed, such as author names or journal names, please delete them without summarizing. \n-----\n{chunks[i]}"},
    ]
    messages_list.append(messages)
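
With a 15,000-character budget the paper body splits into only a few chunks; a quick count confirms one prompt per chunk:

print(len(chunks), len(messages_list))  # the two counts should match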

Summarize

# Summaries by Zephyr-7B-β
main_summary_beta = []
for messages in tqdm(messages_list):
    try:
        prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        outputs = pipe(prompt, max_new_tokens=1024, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
        main_summary = outputs[0]["generated_text"][outputs[0]["generated_text"].find('<|assistant|>')+14:]
        main_summary_beta.append(main_summary)
    except Exception:
        main_summary_beta.append("")

# Second pass: condense the chunk summaries into a single final summary
text = "\n".join(main_summary_beta)
messages = [
        {
            "role": "system",
            "content": "You are an excellent technician, have a deep understanding of patents in particular, and always give careful, accurate and clear replies.",
        },
        {"role": "user", "content": f"Below is the summary text of the patent. Please itemize the main purpose and means used specifically. \n-----\n{text}"},
    ]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=1024, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
main_summary_beta_l = outputs[0]["generated_text"][outputs[0]["generated_text"].find('<|assistant|>')+14:]
# Summaries by Zephyr-7B-α
main_summary_alpha = []
for messages in tqdm(messages_list):
    try:
        prompt = pipe_a.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        outputs = pipe_a(prompt, max_new_tokens=1024, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
        main_summary = outputs[0]["generated_text"][outputs[0]["generated_text"].find('<|assistant|>')+14:]
        main_summary_alpha.append(main_summary)
    except Exception:
        main_summary_alpha.append("")

# Second pass for α as well
text = "\n".join(main_summary_alpha)
messages = [
        {
            "role": "system",
            "content": "You are an excellent technician, have a deep understanding of patents in particular, and always give careful, accurate and clear replies.",
        },
        {"role": "user", "content": f"Below is the summary text of the patent. Please itemize the main purpose and means used specifically. \n-----\n{text}"},
    ]
prompt = pipe_a.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe_a(prompt, max_new_tokens=1024, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
main_summary_alpha_l = outputs[0]["generated_text"][outputs[0]["generated_text"].find('<|assistant|>')+14:]

Output (Zephyr-7B-β)

print(main_summary_beta_l)

Main Purpose:
The main purpose of this paper is to provide an overview of recent advancements in knowledge distillation, a technique used to improve the performance and efficiency of machine learning models by transferring knowledge from a pre-trained teacher model to a smaller student model. The paper discusses various approaches to knowledge distillation, its applications in different domains, and the results of experiments.

Means Used:
1. Review of existing approaches: The paper discusses model compression, curriculum learning, and self-supervision as different approaches to knowledge distillation.
2. Analysis of results: The authors analyze the results of experiments on different tasks and datasets, highlighting the strengths and weaknesses of each approach.
3. Discussion of applications: The paper explores the use of knowledge distillation in natural language processing, image recognition, and other domains.
4. Overview of future directions: The authors provide insights into the future of knowledge distillation and its potential impact on the field of machine learning.
5. Introducing torchdistill: The paper presents torchdistill, a modular-driven coding-free deep learning framework for reproducible knowledge distillation studies. The framework is upgraded from the initial release, supporting more tasks with third-party packages of user's choice.
6. Reproducing BERT benchmark results: The paper demonstrates the upgraded framework by reproducing the GLUE benchmark results of BERT models using a script based on the upgraded torchdistill and harmonizing with various Hugging Face libraries. All the 27 fine-tuned BERT models and configurations to reproduce the results are published at Hugging Face, and the model weights have already been widely used in research communities.
7. Comparison of methods: The authors compare the standard knowledge distillation method with many state-of-the-art knowledge distillation methods for a popular teacher-student pair and highlight that the standard method outperforms many of these methods.

Overall, the paper provides a comprehensive overview of knowledge distillation, its approaches, techniques, and applications, and demonstrates the usefulness of torchdistill for reproducible knowledge distillation studies.


Output (Zephyr-7B-α)

print(main_summary_alpha_l)

Purpose:
The main purpose of this paper is to present an updated version of torchdistill, a modular-driven coding-free deep learning framework for reproducible knowledge distillation studies. The authors aim to make torchdistill less dependent on torchvision and support more tasks with third-party packages of user's choice.

Approach:
The authors generalize some of the key components in the framework and export task-specific implementations to executable scripts and local packages. This allows for more flexibility in PyYAML configurations, such as importing arbitrary local packages and instantiating an arbitrary class with a log message.

Conclusion:
The upgraded framework is demonstrated by reproducing the GLUE benchmark results of BERT models using a script based on the upgraded torchdistill and harmonizing with various Hugging Face libraries. The standard knowledge distillation method can outperform many state-of-the-art knowledge distillation methods for a popular teacher-student pair.

Main Means:
The main means used by the authors to achieve their purpose is to generalize key components in the framework and export task-specific implementations to executable scripts and local packages. This allows for more flexibility in PyYAML configurations, which provides more options for the user to customize the framework to their specific needs. Additionally, the authors reimplements popular small-sized models and popular knowledge distillation methods for computer vision tasks, which allows for more versatility in the framework's application.


Honestly, α's quality was already more than sufficient, so at this level of light testing I couldn't tell the two apart.

Even so, Zephyr-7B really does deliver impressive performance.
