A General Theoretical Paradigm to Understand Learning from Human Preferences • Paper • arXiv:2310.12036 • Published Oct 18, 2023
sentence-transformers-from-synthetic-data • Collection • Example of using distilabel to generate synthetic triplet data for fine-tuning a Sentence Transformer model • 4 items • Updated Jun 21
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models • Paper • arXiv:2405.01535 • Published May 2, 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection • Paper • arXiv:2403.03507 • Published Mar 6, 2024
ORPO: Monolithic Preference Optimization without Reference Model • Paper • arXiv:2403.07691 • Published Mar 12, 2024
Meta Llama 3 • Collection • This collection hosts the Transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Aug 2
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders • Paper • arXiv:2404.05961 • Published Apr 9, 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models • Paper • arXiv:2402.19427 • Published Feb 29, 2024
StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows • Paper • arXiv:2403.11322 • Published Mar 17, 2024
Improving Text Embeddings with Large Language Models • Paper • arXiv:2401.00368 • Published Dec 31, 2023
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting • Paper • arXiv:2309.04269 • Published Sep 8, 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning • Paper • arXiv:2307.08691 • Published Jul 17, 2023
Meta-Transformer: A Unified Framework for Multimodal Learning • Paper • arXiv:2307.10802 • Published Jul 20, 2023
Challenges and Applications of Large Language Models • Paper • arXiv:2307.10169 • Published Jul 19, 2023
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness • Paper • arXiv:2205.14135 • Published May 27, 2022
System 2 Attention (is something you might need too) • Paper • arXiv:2311.11829 • Published Nov 20, 2023
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models • Paper • arXiv:2201.11903 • Published Jan 28, 2022
Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks • Paper • arXiv:1901.00032 • Published Dec 31, 2018
SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking • Paper • arXiv:2107.05720 • Published Jul 12, 2021
Direct Preference Optimization: Your Language Model is Secretly a Reward Model • Paper • arXiv:2305.18290 • Published May 29, 2023
Prefix-Tuning: Optimizing Continuous Prompts for Generation • Paper • arXiv:2101.00190 • Published Jan 1, 2021
Memory-assisted prompt editing to improve GPT-3 after deployment • Paper • arXiv:2201.06009 • Published Jan 16, 2022
RAGAS: Automated Evaluation of Retrieval Augmented Generation • Paper • arXiv:2309.15217 • Published Sep 26, 2023
Teaching Large Language Models to Reason with Reinforcement Learning • Paper • arXiv:2403.04642 • Published Mar 7, 2024
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning • Paper • arXiv:2205.00445 • Published May 1, 2022
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? • Paper • arXiv:2202.12837 • Published Feb 25, 2022
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • Paper • arXiv:2402.17764 • Published Feb 27, 2024