StreamMultiDiffusion: an interactive, real-time AI painting system with multi-prompt support and regional inpainting
“StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control”
Given user-specified regions and multiple text prompts, StreamMultiDiffusion can generate...
PoseAnimate: the first high-quality zero-shot character animation method
“PoseAnimate: Zero-shot high fidelity pose controllable character animation”
Paper: https://arxiv.org/pdf/2404.13680.pdf
Abstract
PoseAnimate is a...
Alibaba publishes DivAvatar: diverse 3D avatars from a simple prompt, runnable on a single V100
“DivAvatar: Diverse 3D Avatar Generation with a Single Prompt”
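Alibaba recently published DivAvatar, which addresses the lack of diversity common to current avatar generation methods. From a single text..., DivAvatar can...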
Meituan and Zhejiang University publish MobileVLM V2: a faster, stronger lightweight VLM that runs in real time on-device
“MobileVLM V2: Faster and Stronger Baseline for Vision Language Model”
Making large models lightweight has become a focus of industry competition. Recently, Meituan and Zhejiang University published MobileVLM V2. MobileVLM ...
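Huazhong University of Science and Technology (HUST) and Kingsoft publish TextMonkey: a general-purpose document understanding large model that sets multiple new SOTA results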
“TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document”
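Earlier, at CVPR 2024, a top international conference in artificial intelligence, the Monkey multimodal large model jointly developed by HUST and Kingsoft was...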
Stability AI releases Stable Video 3D: high-quality 3D video from a single image, open-sourced and runnable on a single 4090
“SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion”
Stability AI recently released its 3D video model, Stable Video 3D...
RiskLabs: an LLM-based financial risk prediction method using multi-source data
“RiskLabs: Predicting Financial Risk Using Large Language Model Based on Multi-Sources Data”
Paper: https://arxiv.org/pdf/2404.07452.pdf
Abstract
...
Google publishes Infini-Transformer, opening a new era of infinite-context Transformers
“Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention”
When processing long sequences, traditional Transformer models are often limited by memory and compute resources...
Microsoft publishes Pix2Gif, the best meme generator: realistic GIFs from a single image
Project page: https://hiteshk03.github.io/Pix2Gif/
Paper: https://arxiv.org/pdf/2403.04634
GitHub:
Abstract
For image-to-GIF generation, Pix2Gif is a motion-guided diffusion...
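The Chinese University of Hong Kong (CUHK) publishes Mini-Gemini, supporting low-resource academic research on multimodal large models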
“Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models”
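Recently, the team led by Jiaya Jia, a tenured professor at The Chinese University of Hong Kong, published the Mini-Gemini multimodal model, which on multimodal...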