“StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control” StreamMultiDiffusion can combine multiple text prompts with user-specified regions to generate...
2024-04-28 388

“PoseAnimate: Zero-shot high fidelity pose controllable character animation” Paper: https://arxiv.org/pdf/2404.13680.pdf Abstract: PoseAnimate is a...
2024-04-28 458

“DivAvatar: Diverse 3D Avatar Generation with a Single Prompt” Alibaba recently published DivAvatar, which addresses the lack of diversity common to current avatar generation methods. From a single text...
2024-04-19 570

“MobileVLM V2: Faster and Stronger Baseline for Vision Language Model” Making large models lightweight has become a hot topic in industry. Meituan and Zhejiang University recently released MobileVLM V2. MobileVLM ...
2024-04-19 715

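“TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document” The Monkey large multimodal model, jointly developed by Huazhong University of Science and Technology and Kingsoft, had earlier already been ... by CVPR2024, a top international conference in artificial intelligence...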
2024-04-19 599

“SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion” Stability AI recently released Stable Video 3D, a 3D video model...
2024-04-19 845

“RiskLabs: Predicting Financial Risk Using Large Language Model Based on Multi-Sources Data” Paper: https://arxiv.org/pdf/2404.07452.pdf Abstract: ...
2024-04-19 458

“Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention” When processing long sequences, traditional Transformer models are often limited by memory and computational resources...
2024-04-19 399

Project page: https://hiteshk03.github.io/Pix2Gif/ Paper: https://arxiv.org/pdf/2403.04634 GitHub: Abstract: Pix2Gif is a motion-guided diffusion ... for image-to-GIF generation...
2024-04-19 534

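“Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models” The team led by Jiaya Jia, a tenured professor at The Chinese University of Hong Kong, recently released the Mini-Gemini multimodal model, which on multimodal...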
2024-04-18 351