Enhancing Video Summarization via Vision-Language Embedding

   Abstract