
The 1st International Workshop on Video Event Categorization, Tagging and Retrieval (VECTaR2009)

In Conjunction with ACCV 2009

Xi'an, China, Sep. 24, 2009







Technical Program now available (also see: ACCV 2009 website)

Notification to authors sent out and review comments now available online.

Online Camera-ready submission website:
Camera-ready paper submission deadline: August 15, 2009.

Registration: Please follow the ACCV 2009 registration website for conference registration.

Keynote Speakers

Title: Concept based video retrieval

Speaker: Prof. Jianmin Li, Tsinghua University, Beijing

Text based video retrieval, which is widely used by commercial video search engines, meets the need to find complete videos to some extent, but it is still difficult to search for a specific video segment, e.g. to find shots of one or more people with one or more horses. Because of the gap between the low level features extracted by computers and the semantic interpretation of humans, content based methods can only be used in some special cases. To overcome this semantic gap, video segments are labeled with concepts from a predefined lexicon for later retrieval.
This talk aims to provide an overview of the research on concept based video retrieval at the Intelligent Multimedia Group, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University. Two key problems will be presented. The first is how to detect semantic concepts in video segments. We have proposed a framework that exploits the diversity of features and classifiers, together with some specific methods integrated into the framework, including a kernel function with implicit spatial constraints for histogram based features. Our systems achieved the best performance in the High Level Feature Extraction (HFE) task in TRECVID 2006 and TRECVID 2007. The second is how to use the results of concept detection in automatic and interactive retrieval. We have proposed some strategies for constructing appropriate concept expressions from the user's query and feedback. Experiments show that retrieval performance is improved significantly.

Jianmin Li received his B.E. degree in computer science and technology and his Ph.D. degree in computer application from the Department of Computer Science and Technology, Tsinghua University, in 1995 and 2003 respectively. He works in the Intelligent Multimedia Group (IMG), State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University. His research lies in multimedia information retrieval, including structural and semantic analysis of images and video, content based and concept based image and video retrieval, and machine learning algorithms for these applications. He is the principal investigator of several projects related to video analysis and retrieval sponsored by NSFC, Intel China Research Center, etc. He was also in charge of designing and implementing the video subsystem in several vertical search engines for Internet video. In addition, he and other members of IMG took part in TRECVID and achieved the best performance in some tasks over the past several years. He has published more than 40 papers in this field.

Call for Papers

One of the remarkable capabilities of the human visual perception system is to interpret and recognize thousands of events in videos, despite high levels of video object clutter, different types of scene context, variability of motion scales, appearance changes, occlusions and object interactions. As an ultimate goal of computer vision systems, the interpretation and recognition of visual events is one of the most challenging problems and has attracted increasing attention for decades. This task remains exceedingly difficult for several reasons: 1) There still remain large ambiguities in the definition of different levels of events. 2) A computer model should be capable of capturing the meaningful structure of a specific event. At the same time, the representation (or recognition process) must be robust under challenging video conditions. 3) A computer model should be able to understand the context of video scenes in order to arrive at a meaningful interpretation of a video event. Despite these difficulties, steady progress has been made in recent years towards better models for video event categorisation and recognition, e.g., from modelling events with bags of spatial temporal features to discovering event context, from detecting events using a single camera to inferring events through a distributed camera network, and from low-level event feature extraction and description to high-level semantic event classification and recognition.

The goal of this workshop is to provide a forum for recent research advances in the area of video event categorisation, tagging and retrieval. The workshop seeks original high-quality submissions from leading researchers and practitioners in academia as well as industry, dealing with theories, applications and databases of visual event recognition. Topics of interest include, but are not limited to:

  • Motion interpretation and grouping
  • Human Action representation and recognition
  • Abnormal event detection
  • Contextual event inference
  • Event recognition among a distributed camera network
  • Multimodal event recognition
  • Spatial temporal features for event categorisation
  • Hierarchical event recognition
  • Probabilistic graph models for event reasoning
  • Machine learning for event recognition
  • Global/local event descriptors
  • Metadata construction for event recognition
  • Bottom up and top down approaches for event recognition
  • Event-based video segmentation and summarization
  • Video event database gathering and annotation
  • Efficient indexing and concepts modelling for video event retrieval
  • Semantic-based video event retrieval
  • Online video event tagging
  • Evaluation methodologies for event-based systems
  • Event-based applications (security, sports, news, etc.)

Important Dates

  • Submission deadline: July 19, 2009
  • Notification of acceptance: August 10, 2009
  • Camera-ready papers: August 15, 2009
  • Workshop: September 24, 2009

Workshop Co-Chairs

  • Dr. Jianguo Zhang, Queen's University Belfast, UK
  • Dr. Ling Shao, Philips Research Laboratories, The Netherlands
  • Dr. Lei Zhang, Microsoft Research Asia, China
  • Prof. Graeme A. Jones, Kingston University, UK

Paper Submission

  • When submitting manuscripts to this workshop, the authors acknowledge that the manuscripts or papers substantially similar in content have NOT been submitted to another conference, workshop, or journal.

  • The format of the paper is the same as the ACCV main conference paper. Please follow the instructions on the website

  • For paper submission, please use the Submission Website.


Each submission will be reviewed by at least three reviewers, drawn from the program committee and external reviewers, for originality, significance, clarity, soundness, relevance and technical content. Accepted papers will be published together with the proceedings of ACCV 2009 in electronic format by Springer. Authors of high-quality papers will be invited to submit extended versions to an edited book or a special issue of a top computer vision journal (e.g. CVIU) after the conference.

Program Committee (alphabetical order)

    • Rama Chellappa, University of Maryland, USA
    • Roy Davies, Royal Holloway, University of London, UK
    • James W. Davis, Ohio State University, USA
    • Ling-Yu Duan, Peking University, China
    • Tim Ellis, Kingston University, UK
    • James Ferryman, University of Reading, UK
    • GianLuca Foresti, University of Udine, Italy
    • Shaogang Gong, Queen Mary University of London, UK
    • Kaiqi Huang, Chinese Academy of Sciences, China
    • Winston Hsu, National Taiwan University
    • Yu-Gang Jiang, City University of Hong Kong, China
    • Graeme A. Jones, Kingston University, UK
    • Ivan Laptev, INRIA, France
    • Jianmin Li, Tsinghua University, China
    • Xuelong Li, Birkbeck College, University of London, UK
    • Zhu Li, Hong Kong Polytechnic University, China
    • Marcin Marszalek, University of Oxford, UK
    • Tao Mei, Microsoft Research Asia
    • Paul Miller, Queen's University Belfast, UK
    • Ram Nevatia, University of Southern California, USA
    • Yanwei Pang, Tianjin University, China
    • Federico Pernici, University of Florence, Italy
    • Carlo Regazzoni, University of Genoa, Italy
    • Shin'ichi Satoh, National Institute of Informatics, Japan
    • Dan Schonfeld, University of Illinois at Chicago, USA
    • Ling Shao, Philips Research Laboratories, The Netherlands
    • Yan Song, University of Science and Technology of China
    • Peter Sturm, INRIA, France
    • Dacheng Tao, Nanyang Technological University, Singapore
    • Xin-Jing Wang, Microsoft Research Asia
    • Tao Xiang, Queen Mary University of London, UK
    • Dong Xu, Nanyang Technological University, Singapore
    • Li-Qun Xu, BT Exact, UK
    • Hongbin Zha, Peking University, China
    • Jianguo Zhang, Queen's University Belfast, UK
    • Lei Zhang, Microsoft Research Asia

Technical Program (Date: 24 Sept. 2009, Location: Board Room)

    13:30 - 13:35

    Opening Remarks: Ling Shao, Lei Zhang


    Keynote Speech: Concept based video retrieval
    Speaker: Prof. Jianmin Li (Tsinghua University)


    Session A Chair: Ling Shao

    Interactive inquiry of indoor scene transition with awareness and automatic correction of mis-understanding
    Kazuhiro Maki, Nobutaka Shimada, Yoshiaki Shirai (Ritsumeikan University)

    Object movement event detection for household environments via layered-background model and keypoint-based tracking
    Shigeyuki Odashima, Taketoshi Mori, Masamichi Shimosaka, Hiroshi Noguchi, Tomomasa Sato (University of Tokyo)


    Session B Chair: Lei Zhang

    A dynamic texture model for fire recognition
    Zhangxian Wu, Guotian Yang, Xiangjie Liu, Pengyuan Yang, Sifei Liu (North China Electric Power University)

    Vision-based group-behavior evaluation in delivery simulation training: a view-independent approach (invited paper)
    Jungong Han, Minwei Feng, Peter De With (Eindhoven University of Technology)

    A face clustering algorithm using SIFT for video surveillance
    Gaopeng Gou, Yunhong Wang, Jiangwei Li (Beihang University)

    Projected orthogonal shape contexts for human action description and categorization
    Ruoyun Gao (Leiden University), Ling Shao (The University of Sheffield)


    Closing: Ling Shao, Lei Zhang