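These notes summarize what the author has learned over many years of study and, after being organized, are released as open source. Formal publication takes a long time, and the cost of buying a book hinders the wide spread of technical knowledge, so the author chose an open-source format. The notes are for personal study only and may not be used commercially without the author's consent.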

The notes cover a lot of material, and some of the summaries may fall short; discussion is welcome. Contact: huaxz1986@163.com, QQ: 525875545

The author also has some other content on GitHub:

20241212 revision:

Added 5 papers on CTR Prediction: EDCN, GDCN, DCN V3, FINAL, FinalMLP

20230920 revision:

Added a chapter on LLM quantization, with 9 related papers:

  • 《Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference》
  • 《Mixed Precision Training》
  • 《The case for 4-bit precision: k-bit Inference Scaling Laws》
  • 《SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models》
  • 《LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale》
  • 《ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers》
  • 《SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot》
  • 《GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers》
  • 《LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models》

20230828 revision:

Added two papers to the 7.Transformer(9) chapter: LIMA and LLAMA2

Added a PEFT chapter, with 10 popular papers on LoRA and adapters:

  • 《Parameter-Efficient Transfer Learning for NLP》
  • 《BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models》
  • 《LoRA: Low-Rank Adaptation of Large Language Models》
  • 《Towards a Unified View of Parameter-Efficient Transfer Learning》
  • 《AdapterDrop: On the Efficiency of Adapters in Transformers》
  • 《AdapterFusion: Non-Destructive Task Composition for Transfer Learning》
  • 《QLoRA: Efficient Finetuning of Quantized LLMs》
  • 《AdapterHub: A Framework for Adapting Transformers》
  • 《Compacter: Efficient Low-Rank Hypercomplex Adapter Layers》
  • 《MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer》

20230801 revision:

Added 36 popular papers on Prompt Engineering:

  • 《Chain of Thought Prompting Elicits Reasoning in Large Language Models》
  • 《Least-to-Most Prompting Enables Complex Reasoning in Large Language Models》
  • 《Automatic Chain of Thought Prompting in Large Language Models》
  • 《Self-Consistency Improves Chain of Thought Reasoning in Language Models》
  • 《Large Language Models are Zero-Shot Reasoners》
  • 《Calibrate Before Use: Improving Few-Shot Performance of Language Models》
  • 《What Makes Good In-Context Examples for GPT-3?》
  • 《Making Pre-trained Language Models Better Few-shot Learners》
  • 《It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners》
  • 《Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference》
  • 《GPT Understands, Too》
  • 《P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks》
  • 《Prefix-Tuning: Optimizing Continuous Prompts for Generation》
  • 《The Power of Scale for Parameter-Efficient Prompt Tuning》
  • 《How Can We Know What Language Models Know?》
  • 《Eliciting Knowledge from Language Models Using Automatically Generated Prompts》
  • 《Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity》
  • 《Can language models learn from explanations in context?》
  • 《Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?》
  • 《Multitask Prompted Training Enables Zero-Shot Task Generalization》
  • 《Language Models as Knowledge Bases?》
  • 《Do Prompt-Based Models Really Understand the Meaning of Their Prompts?》
  • 《Finetuned Language Models Are Zero-Shot Learners》
  • 《Factual Probing Is [MASK]: Learning vs. Learning to Recall》
  • 《How many data points is a prompt worth?》
  • 《Learning How to Ask: Querying LMs with Mixtures of Soft Prompts》
  • 《Learning To Retrieve Prompts for In-Context Learning》
  • 《PPT: Pre-trained Prompt Tuning for Few-shot Learning》
  • 《Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm》
  • 《Show Your Work: Scratchpads for Intermediate Computation with Language Models》
  • 《True Few-Shot Learning with Language Models》
  • 《Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning》
  • 《Improving and Simplifying Pattern Exploiting Training》
  • 《MetaICL: Learning to Learn In Context》
  • 《SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer》
  • 《Noisy Channel Language Model Prompting for Few-Shot Text Classification》

20230524 revision:

Added three chapters, Transformer 7, 8, and 9, covering ten papers: 《Scaling Laws for Neural Language Models》, 《Training Compute-Optimal Large Language Models》, LLaMA, GLM, GLM-130B, GPT-NeoX-20B, Bloom, PaLM, PaLM2, and Self-Instruct.

20230516 revision:

Added HuggingFace Transformer applications and Gradio. This covers all of the official HuggingFace Transformer tutorials and APIs in nine chapters: Tokenizer, Dataset, Trainer, Evaluator, Pipeline, Model, Accelerate, AutoClass, and applications.

For the full update history, see here.

    Mathematical Foundations

    Statistical Learning

    Deep Learning

    Tools

    CRF

    lightgbm

    xgboost

    scikit-learn

    spark

    numpy

    scipy

    matplotlib

    pandas

    huggingface_transformer

    Scala