China and Australia both host important academic communities in computer vision. In this joint session, we aim to boost communication and connectivity between the Chinese and Australian computer vision research communities. The 5th Chinese Conference on Pattern Recognition and Computer Vision (PRCV 2022) will invite presentations of excellent papers from the International Conference on Digital Image Computing: Techniques and Applications (DICTA 2022). Meanwhile, excellent PRCV 2022 papers will be recommended for presentation at DICTA 2022. The recommended papers will be decided by the PRCV & DICTA 2022 Joint Session Committee.
DICTA 2022 will be held in Sydney, New South Wales, Australia, from 30 November to 2 December. The conference venue is Rydges World Square, 389 Pitt Street, Sydney, NSW 2000. The International Conference on Digital Image Computing: Techniques and Applications (DICTA) is the main Australian conference on computer vision, image processing, pattern recognition, and related areas. DICTA was established in 1991 as the premier conference of the Australian Pattern Recognition Society (APRS). More details on DICTA 2022 can be found at http://dicta2022.dictaconference.org/ .
PRCV & DICTA 2022 Joint Session Committee
Shiqi Yu, Zhaoxiang Zhang, Pong-Chi Yuen, Junwei Han, Min Xu, Du Huynh, Wei Xiang
Invited Speaker: Stephen Gould
Professor, Australian National University
Speaker Bio: Stephen Gould is a Professor of Computer Science at the Australian National University (ANU). He is also an Australian Research Council (ARC) Future Fellow and Amazon Scholar. He is a former ARC Postdoctoral Fellow, Microsoft Faculty Fellow, Contributed Researcher at Data61, Principal Research Scientist at Amazon Inc, and Director of the ARC Centre of Excellence in Robotic Vision. Stephen received his BSc degree in mathematics and computer science and BE degree in electrical engineering from the University of Sydney in 1994 and 1996, respectively. He received his MS degree in electrical engineering from Stanford University in 1998. He then worked in industry for several years, during which he co-founded Sensory Networks, which was later sold to Intel in 2013. In 2005 he returned to Stanford University and was awarded his PhD degree in 2010. In November 2010, he moved back to Australia to take up a faculty position at the ANU. Stephen has broad interests in computer and robotic vision, machine learning, deep learning, structured prediction, and optimization. He teaches courses on advanced machine learning, research methods in computer science, and the craft of computing. His main research focus is on automatic semantic, dynamic, and geometric understanding of images and videos.
Talk Title: Deep Declarative Networks with Application to Optimal Transport
Abstract: Deep declarative networks (DDNs) are a new class of deep learning model that allows optimization problems to be embedded within end-to-end learnable pipelines. In this talk I will introduce DDNs and related concepts (implicit layers and differentiable optimization) and give some formal results for second-order differentiable problems. I will then present a concrete example of a DDN layer in the case of optimal transport, and show that by applying the DDN results we can obtain significant memory and speed improvements over unrolling Sinkhorn iterates, as would be required in traditional deep learning models. Limitations of DDNs and open questions will also be discussed.
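As a rough illustration of the Sinkhorn iterates mentioned in the abstract, the sketch below computes an entropic-regularized transport plan in NumPy. It is not from the talk; the cost matrix, marginals, and hyperparameters are invented for demonstration.

```python
import numpy as np

def sinkhorn(C, r, c, eps=0.1, iters=500):
    """Entropic-regularized optimal transport via Sinkhorn iterations.

    C: (n, m) cost matrix; r: (n,) source marginal; c: (m,) target marginal.
    Returns a transport plan P whose row/column sums match r and c.
    """
    K = np.exp(-C / eps)       # Gibbs kernel
    u = np.ones_like(r)
    for _ in range(iters):
        v = c / (K.T @ u)      # rescale to match column marginals
        u = r / (K @ v)        # rescale to match row marginals
    return u[:, None] * K * v[None, :]

# Tiny example: transport between two uniform 3-point distributions.
# A larger eps converges faster but yields a blurrier plan.
rng = np.random.default_rng(0)
C = rng.random((3, 3))
r = np.full(3, 1 / 3)
c = np.full(3, 1 / 3)
P = sinkhorn(C, r, c, eps=0.5)
print(P.sum(axis=0), P.sum(axis=1))  # both close to the marginals
```

Unrolling this loop for backpropagation stores every iterate; a DDN layer instead differentiates the converged plan implicitly, which is the source of the memory and speed improvements the abstract refers to.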
Invited Speaker: Jianfeng Weng
PhD student, University of Sydney
Talk Title: Robust Knowledge Adaptation for Federated Unsupervised Person ReID
Abstract: Person re-identification (ReID) has been extensively studied in recent years due to increasing demand in public security. However, collecting and handling sensitive personal data raises privacy concerns, so federated learning has been explored for person ReID, aiming to share minimal sensitive data between different parties (clients). Existing federated-learning-based person ReID methods, however, generally rely on laborious and time-consuming data annotation, and it is difficult to guarantee cross-domain consistency. In this work, a federated unsupervised cluster-contrastive (FedUCC) learning method is therefore proposed for person ReID. FedUCC introduces a three-stage modelling strategy that follows a coarse-to-fine manner: generic knowledge, specialized knowledge, and patch knowledge are discovered using a deep neural network. This enables the sharing of mutual knowledge among clients while retaining local domain-specific knowledge, based on the kinds of network layers and their parameters. Comprehensive experiments on eight public benchmark datasets demonstrate the state-of-the-art performance of the proposed method.
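The idea of sharing mutual knowledge while keeping some parameters local can be sketched as partial federated averaging. The layer names, split, and toy values below are hypothetical illustrations, not FedUCC's actual implementation.

```python
import numpy as np

def federated_round(clients, shared_keys):
    """Average the shared parameters across clients; leave the rest local."""
    for key in shared_keys:
        avg = np.mean([cl[key] for cl in clients], axis=0)
        for cl in clients:
            cl[key] = avg.copy()
    return clients

# Two toy clients, each with a shared "backbone" weight and a local "head".
clients = [
    {"backbone.w": np.ones(4) * 1.0, "head.w": np.ones(2) * 10.0},
    {"backbone.w": np.ones(4) * 3.0, "head.w": np.ones(2) * 20.0},
]
federated_round(clients, shared_keys=["backbone.w"])
print(clients[0]["backbone.w"])  # averaged across clients: all 2.0
print(clients[0]["head.w"])      # untouched, stays domain-specific: all 10.0
```

Only the keys listed in `shared_keys` ever leave a client, which is how this style of scheme limits the data and knowledge shared between parties.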
Invited Speaker: Jonathon M Holder
PhD student, Griffith University
Talk Title: Machine Vision Approach for Slipper Lobster Weight Estimation
Abstract: Computer vision techniques have been successfully applied across a large number of industries for a variety of purposes. In this work we extend these capabilities to slipper lobster weight estimation. Our proposed method combines machine learning and traditional computer vision techniques to first detect slipper lobsters and their eyes. An algorithm is then developed to determine which eyes belong to which lobster and to estimate weight from the distance between the eyes. The proposed method correctly identifies 86% of lobster eye pairs and estimates weight with a mean error of 4.78 g. Our weight estimation method achieves high accuracy and has the potential to be deployed in aquaculture operations in the future.
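The final step above, mapping inter-eye distance to weight, could for instance be a simple calibrated regression. The sketch below fits a power-law model on synthetic calibration pairs; the functional form, units, and data are invented, and the paper's actual estimator may differ.

```python
import numpy as np

def fit_power_law(d, w):
    """Fit w = a * d**b by least squares in log space; returns (a, b)."""
    b, log_a = np.polyfit(np.log(d), np.log(w), 1)
    return np.exp(log_a), b

def predict_weight(d, a, b):
    return a * d ** b

# Synthetic calibration data following w = 0.5 * d**3 (grams vs mm).
d = np.array([8.0, 10.0, 12.0, 14.0, 16.0])
w = 0.5 * d ** 3
a, b = fit_power_law(d, w)
print(round(a, 3), round(b, 3))              # recovers ~0.5 and ~3.0
print(round(predict_weight(11.0, a, b), 1))  # ~665.5 g for an 11 mm gap
```

A cubic relationship between a linear body measurement and weight is a common biological assumption, which is why a power law is a natural illustrative choice here.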
Invited Speaker: Peipei Song
PhD student, Australian National University
Talk Title: Stereo Saliency Detection by Modeling Concatenation Cost Volume Feature
Abstract: RGB-D image-pair-based salient object detection models aim to localize the salient objects in an RGB image, with extra depth information about the scene provided to guide the detection process. The conventional practice is to use depth explicitly as input to achieve multi-modal learning. In this paper, we observe two main issues with existing RGB-D saliency detection frameworks. First, we argue that depth is better treated as extra prior information than as part of the input: saliency detection can be performed directly from the appearance information in the RGB image alone, but not from the depth data alone. Second, there is a large domain gap in the source of depth across benchmark testing datasets, e.g., depth from Kinect versus stereo cameras. We therefore focus on the stereo-image-pair variant of the task, where depth is implicitly encoded in the stereo pair, for effective RGB-D saliency detection. Experimental results illustrate the effectiveness of our solution.
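The concatenation cost volume named in the title can be sketched in the standard stereo-matching form: for each candidate disparity, concatenate the left feature map with the right feature map shifted by that disparity. This follows common practice in the stereo literature and is not necessarily the paper's exact construction.

```python
import numpy as np

def concat_cost_volume(feat_l, feat_r, max_disp):
    """Concatenation-style cost volume for a stereo feature pair.

    feat_l, feat_r: (C, H, W) feature maps.
    Returns a (max_disp, 2*C, H, W) volume: at disparity d, the left
    features are paired with the right features shifted right by d
    (out-of-frame columns are zero-padded).
    """
    C, H, W = feat_l.shape
    vol = np.zeros((max_disp, 2 * C, H, W), dtype=feat_l.dtype)
    for d in range(max_disp):
        vol[d, :C] = feat_l
        vol[d, C:, :, d:] = feat_r[:, :, : W - d]
    return vol

# Toy features: 2 channels, 4x5 spatial grid, 3 candidate disparities.
feat_l = np.random.default_rng(1).random((2, 4, 5))
feat_r = np.random.default_rng(2).random((2, 4, 5))
vol = concat_cost_volume(feat_l, feat_r, max_disp=3)
print(vol.shape)  # (3, 4, 4, 5)
```

A network can then process this 4D volume to reason jointly about appearance and disparity, which is how the stereo pair encodes depth implicitly without a separate depth input.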
Invited Speaker: Akib Mashrur
PhD student, Deakin University
Talk Title: Semantic Multi-Modal Reprojection for Robust Visual Question Answering
Abstract: Despite recent progress in developing vision-language models for accurate visual question answering (VQA), the robustness of these models remains limited on out-of-distribution datasets that include unanswerable questions. In our work, we first construct a randomised VQA dataset with unanswerable questions to test the robustness of a state-of-the-art VQA model. The dataset pairs visual inputs with randomised questions from the VQA v2 dataset to probe the sensitivity of the model's predictions. We establish that even on unanswerable questions that are not relevant to the visual clues, a state-of-the-art VQA model either fails to predict the "unknown" answer or gives an inaccurate answer with a high softmax score. To alleviate this issue without retraining the large backbone models, we propose Cross Modal Augmentation (CMA), a multi-modal semantic augmentation applied at test time only. CMA reprojects the visual and textual inputs into multiple copies while maintaining semantic information. These multiple instances, with similar semantics, are then fed to the same model and the predictions are combined to produce a more robust output. We demonstrate that this model-agnostic technique enables the VQA model to provide more robust answers in scenarios that may include unanswerable questions.
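The test-time ensembling idea behind CMA can be illustrated generically: make several semantics-preserving copies of the input, run the same frozen model on each, and average the softmax outputs. The linear "model" and Gaussian perturbation below are stand-ins; the paper's reprojection operates on visual and textual embeddings.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(model, x, n_copies=8, noise=0.01, seed=0):
    """Average softmax predictions over lightly perturbed input copies."""
    rng = np.random.default_rng(seed)
    probs = [softmax(model(x + noise * rng.standard_normal(x.shape)))
             for _ in range(n_copies)]
    return np.mean(probs, axis=0)

# Toy linear "model" over 3 answer classes (imagine the last = "unknown").
W = np.array([[2.0, -1.0, 0.5],
              [0.0, 1.5, -0.5]])
model = lambda x: x @ W
p = ensemble_predict(model, np.array([1.0, 0.2]))
print(p)  # a valid probability distribution over the 3 answers
```

Averaging over semantically equivalent copies tends to flatten spuriously confident predictions, which is the mechanism by which this family of techniques reduces overconfident answers on unanswerable questions.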
The Program Committee of each conference will recommend 10 excellent papers as the candidates. The Joint Session Committee will then select 10 papers (5 from each conference) from these 20 candidate papers and send out invitations to their authors for presentations at the joint session.
Q: I am an author of a PRCV 2022 paper. How can I join the joint session?
A: We do not accept self-nomination. You will receive an invitation if your paper is selected by the PRCV & DICTA 2022 Joint Session Committee.
Q: If my PRCV 2022 paper is selected for the joint session, do I need to present at DICTA 2022?
A: Yes. You need to prepare two presentations. One at PRCV 2022, and the other at DICTA 2022.
Q: Is the joint session online or in-person?
A: The joint session will run in hybrid mode, with both in-person and virtual attendance options. Given travel disruptions caused by COVID-19, speakers can choose to present their papers online or in person.
Q: Will the selected papers be presented in English or Chinese?
A: The presentation and relevant materials should be prepared in English.
Q: If my PRCV 2022 paper is selected, do I need a registration at DICTA 2022?
A: Yes, but your DICTA 2022 registration fee will be waived.
Q: If my PRCV 2022 paper is selected, will it be included in the DICTA 2022 proceedings?
A: No. Your PRCV 2022 paper will not appear in the DICTA 2022 proceedings.