
Commits · SqueezeAILab/TinyAgent · GitHub
Commit history: Sep 4, 2024 · Add pointer to ArXiv (#3), authored by sidjha1 (commit cc45c0e)
GitHub - SqueezeAILab/SqueezedAttention: [ACL 2025] Squeezed …
Squeezed Attention is a method to accelerate attention for long input prompts where a large portion of the input prompt is fixed across successive user queries. Many LLM applications require processing …
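The snippet describes the idea only at a high level. Below is a minimal sketch of that idea, assuming an offline K-means pass over the fixed prompt's key vectors and a query-time comparison against the resulting centroids to prune which keys are attended to; the function names and the top-cluster heuristic are illustrative, not the repository's API.

```python
# Sketch only: cluster a fixed prompt's keys offline, then at query time score the
# cluster centroids and attend over keys from the highest-scoring clusters.
import numpy as np
from sklearn.cluster import KMeans

def cluster_fixed_keys(fixed_keys: np.ndarray, n_clusters: int = 16):
    """Offline step: group the fixed-context key vectors into clusters."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(fixed_keys)
    return km.cluster_centers_, km.labels_

def squeezed_attention(query, fixed_keys, fixed_values, centroids, labels, top_c=4):
    """Online step: keep only keys whose cluster centroid scores highly for this query."""
    centroid_scores = centroids @ query                # one score per cluster
    keep = np.argsort(centroid_scores)[-top_c:]        # clusters most likely to matter
    mask = np.isin(labels, keep)
    k, v = fixed_keys[mask], fixed_values[mask]
    logits = (k @ query) / np.sqrt(query.shape[-1])    # scaled dot-product scores
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ v

# Toy usage: d is the head dimension, 1024 fixed-context tokens.
d = 64
fixed_keys = np.random.randn(1024, d).astype(np.float32)
fixed_values = np.random.randn(1024, d).astype(np.float32)
centroids, labels = cluster_fixed_keys(fixed_keys)
query = np.random.randn(d).astype(np.float32)
out = squeezed_attention(query, fixed_keys, fixed_values, centroids, labels)
```

The offline step here presumably corresponds to the offline_clustering.py file listed further below; the online step touches only the keys belonging to the clusters whose centroids score highest against the current query.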
GitHub - SqueezeAILab/MultipoleAttention: [NeurIPS 2025] Multipole ...
[NeurIPS 2025] Multipole Attention for Efficient Long Context Reasoning
SqueezeLLM: Dense-and-Sparse Quantization - GitHub
SqueezeLLM is a post-training quantization framework that incorporates a new method called Dense-and-Sparse Quantization to enable efficient LLM serving. TLDR: Deploying LLMs is difficult due to …
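As a rough illustration of the dense-and-sparse split named above: a small fraction of large-magnitude outlier weights is pulled into a sparse full-precision matrix, and the remaining dense part is quantized to a few bits. SqueezeLLM itself uses non-uniform, sensitivity-aware quantization; the uniform quantizer and the 0.5% outlier threshold below are simplifications for illustration only.

```python
# Sketch only: split a weight matrix into sparse full-precision outliers plus a
# low-bit dense remainder, then reconstruct an approximation of the original.
import numpy as np
from scipy import sparse

def dense_and_sparse_decompose(w: np.ndarray, outlier_pct: float = 0.5, bits: int = 4):
    """Keep the largest-magnitude outlier_pct% of weights sparse in FP32 and
    quantize the rest uniformly to `bits` bits (SqueezeLLM uses non-uniform codebooks)."""
    cutoff = np.percentile(np.abs(w), 100.0 - outlier_pct)
    outlier_mask = np.abs(w) >= cutoff
    sparse_part = sparse.csr_matrix(np.where(outlier_mask, w, 0.0))

    dense = np.where(outlier_mask, 0.0, w)              # remaining dense weights
    scale = np.abs(dense).max() / (2 ** (bits - 1) - 1)
    q = np.clip(np.round(dense / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return sparse_part, q.astype(np.int8), scale

def reconstruct(sparse_part, q, scale):
    """Dequantize the dense part and add back the sparse outliers."""
    return q.astype(np.float32) * scale + sparse_part.toarray()

w = np.random.randn(256, 256).astype(np.float32)
s, q, scale = dense_and_sparse_decompose(w)
mean_err = np.abs(reconstruct(s, q, scale) - w).mean()
```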
SqueezeAILab/plan-and-act - GitHub
We introduce Plan-and-Act, a framework that enables accurate and reliable long-horizon task solving with explicit planning. We additionally introduce a synthetic data generation method for training the …
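A minimal sketch of an explicit plan-then-act loop in the spirit of that description; `call_planner` and `call_executor` are hypothetical stand-ins for LLM-backed planner and executor components, not the repository's interfaces.

```python
# Sketch only: decompose a long-horizon task into a plan up front, then act step by step.
from typing import Callable, List

def plan_and_act(task: str,
                 call_planner: Callable[[str], List[str]],
                 call_executor: Callable[[str, str], str]) -> List[str]:
    """Generate an explicit plan for the task, then execute each step in order."""
    plan = call_planner(task)                        # e.g. an LLM prompted for numbered steps
    observations: List[str] = []
    for step in plan:
        # The executor sees the current step plus what has happened so far.
        observations.append(call_executor(step, "\n".join(observations)))
    return observations
```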
ETS/rebase.py at main · SqueezeAILab/ETS · GitHub
ETS: Efficient Tree Search for Inference-Time Scaling - ETS/rebase.py at main · SqueezeAILab/ETS
GitHub - SqueezeAILab/QuantSpec
Feb 22, 2025 · Contribute to SqueezeAILab/QuantSpec development by creating an account on GitHub.
LLM2LLM/GSM8K/config.yaml at main - GitHub
[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement - SqueezeAILab/LLM2LLM
SqueezedAttention/offline_clustering.py at main - GitHub
SqueezedAttention / offline_clustering.py