Video generation consists of generating a video sequence in which an object in a source image is animated according to some external information (a conditioning label or the motion of a driving video). In this talk I will present some of our recent achievements addressing these specific aspects: 1) generating facial expressions, e.g., smiles that differ from each other (spontaneous, tense, etc.), using diversity as the driving force; 2) generating videos without using any annotation or prior information about the specific object to animate. Once trained on a set of videos depicting objects of the same category (e.g., faces, human bodies), our method can be applied to any object of this class. To achieve this, we decouple appearance and motion information using a self-supervised formulation. To support complex motions, we use a representation consisting of a set of learned keypoints along with their local affine transformations. A generator network models the occlusions that arise during target motions and combines the appearance extracted from the source image with the motion derived from the driving video. Our solutions achieve the best scores on diverse benchmarks and across a variety of object categories.
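To make the keypoints-plus-local-affine-transformations idea more concrete, here is a minimal NumPy sketch. It assumes a first-order (affine) approximation of the motion around each keypoint and a backward mapping from driving-frame to source-frame coordinates. The names `local_affine_flows` and `blend_flows`, the softmax-over-distance weighting and the `sigma` parameter are hypothetical illustrations; in the method described above, the keypoints, affine parameters and blending (including occlusion handling) are learned by networks, which this sketch does not model.

```python
import numpy as np


def local_affine_flows(grid, kp_src, kp_drv, jac_src, jac_drv):
    """First-order (affine) motion around each keypoint (illustrative sketch).

    For keypoint k, a driving-frame point z is mapped to the source frame by
        T_k(z) = p_src[k] + J_src[k] @ inv(J_drv[k]) @ (z - p_drv[k])

    grid:    (N, 2) coordinates in the driving frame
    kp_*:    (K, 2) keypoint locations in the source / driving frames
    jac_*:   (K, 2, 2) local affine matrices at each keypoint
    returns: (K, N, 2) candidate source coordinates, one set per keypoint
    """
    J = jac_src @ np.linalg.inv(jac_drv)                 # (K, 2, 2), batched
    offsets = grid[None, :, :] - kp_drv[:, None, :]      # (K, N, 2)
    return kp_src[:, None, :] + np.einsum("kij,knj->kni", J, offsets)


def blend_flows(grid, kp_drv, flows, sigma=0.1):
    """Blend per-keypoint flows into one dense backward flow.

    Hypothetical weighting: softmax over keypoints of the negative squared
    distance to each driving keypoint (a learned network would normally
    predict these weights and also account for occlusions).
    """
    d2 = ((grid[None, :, :] - kp_drv[:, None, :]) ** 2).sum(-1)  # (K, N)
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=0, keepdims=True)          # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=0, keepdims=True)                    # weights sum to 1 per point
    return (w[..., None] * flows).sum(axis=0)            # (N, 2)


# Usage: two keypoints undergoing the same pure translation.
ys, xs = np.meshgrid(np.linspace(-1, 1, 4), np.linspace(-1, 1, 4), indexing="ij")
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)        # (16, 2)
kp_drv = np.array([[-0.5, 0.0], [0.5, 0.0]])
kp_src = kp_drv + np.array([0.1, -0.2])                  # translate every keypoint
eye = np.tile(np.eye(2), (2, 1, 1))                      # identity local affines
flows = local_affine_flows(grid, kp_src, kp_drv, eye, eye)
flow = blend_flows(grid, kp_drv, flows)                  # every point shifted by (0.1, -0.2)
```

With identity affine matrices each keypoint contributes a pure translation, so the blended flow is that translation everywhere; non-identity matrices let each keypoint also express local rotation, scaling or shear, which is what makes this representation richer than keypoint displacements alone.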
Nicu Sebe is a professor at the University of Trento, Italy, where he leads research in the areas of multimedia information retrieval and human-computer interaction in computer vision applications. He received his PhD from the University of Leiden, The Netherlands, and was previously with the University of Amsterdam, The Netherlands, and the University of Illinois at Urbana-Champaign, USA. He has been involved in organizing major conferences and workshops addressing the computer vision and human-centered aspects of multimedia information retrieval, including serving as General Co-Chair of the IEEE Automatic Face and Gesture Recognition Conference (FG) 2008, the ACM International Conference on Multimedia Retrieval (ICMR) 2017 and ACM Multimedia 2013. He was a program chair of ACM Multimedia 2007 and 2011, ECCV 2016, ICCV 2017 and ICPR 2020. He is a general chair of ACM Multimedia 2022 and a program chair of ECCV 2024. He is a Fellow of ELLIS and IAPR and a Senior Member of ACM and IEEE.
AI approaches have the potential to enhance humanity, solve business challenges and create unprecedented opportunities. In this talk, I would like to share my perspective on large-scale AI, drawn from both academic and industry experience, with examples that demonstrate its transformative power in the real world. I will talk about Data, Tools and Applications, starting with how data and tools help enable applications in research and real-world scenarios. AI has made promising progress in diverse industries, and I will then show interesting industrial applications, with a particular focus on innovation in digital health.
Jia Li is the Co-founder and President of HealthUnity, a non-profit organization that aims to build an open consortium enabling insights, knowledge and applications from behavioral data to make life healthy and enjoyable. She has served in several roles, including Chief AI Fellow, RWE for Sleep Health, and Adjunct Professor in the School of Medicine at Stanford University. In healthcare, she is interested in how AI could improve health and wellbeing outcomes. She was the Founding Head of R&D at Google Cloud AI, where she oversaw the development of the full stack of AI products on Google Cloud powering solutions for diverse industries, with healthcare as one of the top verticals. Driven by a passion to make more impact in healthcare, she later became an entrepreneur in the field, building and advising companies with award-winning platforms that tackle today's greatest healthcare challenges. She serves as Professor-in-Residence and Mentor at StartX, advising diverse and healthcare-focused founders and companies from Stanford and its alumni community. She also serves as an advisor to the United Nations Children's Fund (UNICEF) and is a board member of the Children's Discovery Museum of San Jose. Before joining Google, she was the Head of Research at Snap, leading the AI/AR innovation effort; before that, she was the Group Lead of the Visual Computing and Learning group at Yahoo! Labs. She received her Ph.D. degree from the Computer Science Department at Stanford University.
She was the leader of the OPTIMOL team, which won first prize in the Semantic Robot Vision Challenge sponsored by NSF and AAAI in 2007. She is part of the ImageNet team, which received the 2019 Longuet-Higgins Prize. She and her collaborators from MIT jointly won first place in both the image classification and object detection tracks of the 2019 IEEE Low Power Image Recognition Challenge. She was a Distinguished Speaker at the IEEE AI Symposium in 2017 and was selected as a Young Global Leader by the World Economic Forum in 2018. She has served as Program Chair and Area Chair of several major AI conferences, and served on the Computer Vision Foundation Industrial Advisory Board from 2016 to 2017. She serves as Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and IEEE Transactions on Circuits and Systems for Video Technology. Her work has been cited over 50,000 times and has been covered in recent years by media outlets including Forbes, TechCrunch, CNBC, New Scientist and MIT Technology Review, among others.
Vision Transformer (ViT) and its variants have shown great promise on computer vision tasks. The ability of self-attention to capture both local and global visual dependencies is key to their success. However, self-attention also poses practical challenges: its computational overhead grows quadratically with the number of tokens, which is especially problematic for high-resolution vision tasks (e.g., object detection). Recent works have attempted to reduce the cost and improve model performance by applying either coarse-grained global attention or fine-grained local attention. Yet these approaches cripple the modeling power of self-attention in multi-layer Transformers, leading to sub-optimal solutions.
In this talk, we introduce focal attention, a new mechanism in which each visual token attends to its closest surrounding tokens at fine granularity and to tokens far away at coarse granularity, and can thus capture both short- and long-range visual dependencies efficiently and effectively. With focal attention, we have developed a set of neural image encoders that outperform state-of-the-art ViTs (e.g., Swin Transformers) at similar time and memory cost on image classification, object detection and semantic segmentation. These results make focal attention a favorable alternative to self-attention for effective and efficient visual modeling in real-world applications.
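The fine/coarse idea can be illustrated with a simplified NumPy sketch. It assumes a single attention head, no learned query/key/value projections, one fine level (the raw tokens in a small neighborhood of each query) and one coarse level (average-pooled tokens summarizing the whole map). The names `focal_attention`, `keys_per_query`, `window` and `pool` are hypothetical; the encoders described in the talk use their own, more elaborate window-based design.

```python
import numpy as np


def avg_pool(x, s):
    """Average-pool an (H, W, C) token map by factor s (H, W divisible by s)."""
    H, W, C = x.shape
    return x.reshape(H // s, s, W // s, s, C).mean(axis=(1, 3))


def focal_attention(x, window=1, pool=4):
    """Simplified single-head focal attention over an (H, W, C) token map.

    Each query token attends to
      * fine-grained keys: raw tokens in its (2*window+1)^2 neighborhood
      * coarse-grained keys: pooled summaries covering the whole map
    Queries, keys and values are the tokens themselves (no learned
    projections), which keeps the sketch short.
    """
    H, W, C = x.shape
    coarse = avg_pool(x, pool).reshape(-1, C)             # (H*W/pool^2, C)
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            q = x[i, j]                                    # (C,)
            fine = x[max(i - window, 0):i + window + 1,
                     max(j - window, 0):j + window + 1].reshape(-1, C)
            keys = np.concatenate([fine, coarse], axis=0)  # (M, C)
            scores = keys @ q / np.sqrt(C)                 # scaled dot products
            scores -= scores.max()                         # numerical stability
            attn = np.exp(scores)
            attn /= attn.sum()                             # softmax over M keys
            out[i, j] = attn @ keys                        # values = keys here
    return out


def keys_per_query(H, W, window, pool):
    """Keys seen by an interior query (full self-attention would use H * W)."""
    return (2 * window + 1) ** 2 + (H // pool) * (W // pool)


# Usage on a random 16x16 map with 8 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 8))
y = focal_attention(x, window=1, pool=4)
```

With a 16×16 map, `window=1` and `pool=4`, each interior query attends to 9 fine keys plus 16 coarse keys (25 in total) instead of all 256 tokens, which is where the savings over full self-attention come from while distant regions still contribute through their pooled summaries.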
Jianfeng Gao is a Distinguished Scientist and Vice President at Microsoft. He is the head of the Deep Learning group at Microsoft Research, leading the development of AI systems for natural language processing, Web search, vision-language understanding, dialogue, and business applications. He is an affiliate professor of Computer Science & Engineering at the University of Washington, an IEEE Fellow, and a Distinguished Member of ACM.
Dr. Ted Chang is Chief Technology Officer (CTO), Vice President and General Manager of Quanta Computer, known as the world’s largest computer ODM (original design manufacturer) and laptop maker. In addition to his role as CTO, he oversees corporate technology strategy and global research partnerships. Dr. Chang leads the Quanta Research Institute (QRI) for advanced technology research and BU12, a business unit dedicated to AIoT solutions for e-Health, Smart Medicine and Smart Agriculture.
Appointed by the President of Taiwan (ROC), Dr. Chang has served as a representative to the APEC Business Advisory Council (ABAC) since 2019. He serves as Co-Chair of the Digital Working Group in ABAC 2022, and served as Co-Convenor of the Supporting Emergent Technology Taskforce, focusing on AI and digital health, for ABAC 2021.
Academically, Dr. Chang holds guest professorships in the EECS colleges of National Taiwan University (NTU), National Cheng-Kung University (NCKU) and Asia University (AU), and in the AI college of National Yang Ming Chiao Tung University (NYCU).
Dr. Chang is a board director of the Epoch Foundation, Spring Foundation, Quanta Culture and Education Foundation (QCEF), Chinese Medical Advancement Foundation and Ming Dao Culture and Education Foundation. He is a member of many advisory and project committees on innovation, entrepreneurship and advanced technology for the Ministry of Economic Affairs (MoEA), the Ministry of Science and Technology (MoST) and the Ministry of Health and Welfare (MoHW), as well as for major universities in Taiwan.
Among many awards and honors, Dr. Chang recently received an iF Design Award for Smart Agriculture and two Red Dot Design Awards for Smart Medicine in 2021. He received distinguished alumni awards from both National Cheng-Kung University and National Chia-Yi High School in 2021. In 2020, he led the Quanta-NCTU Joint AI Center to win a CES Innovation Award and a WITSA (World Information Technology and Services Alliance) PPP Silver Award. He was invited to serve as chief advisor to the Taiwan Pavilion "Swingphony" at the London Design Biennale 2020.
As of January 2022, Dr. Chang had been granted over 200 patents worldwide. One of his most important inventions is "A Network Object Delivery System for Personal Computing Device", which introduced and clearly defined the "Application Module Store" concept. The application was filed in 2001 and the US patent was granted in 2010; the filing predated by several years the market introduction of the modern smartphone, the Apple App Store and Google Play.
Dr. Chang joined Quanta in 2000, was promoted to VP in 2009, and in 2010 became the first CTO in Quanta's history. In 2004, Dr. Chang initiated the T-Party Project and served as its program director; it was a 10-year, 45-million strategic research collaboration with the Computer Science and Artificial Intelligence Laboratory (CSAIL) of the Massachusetts Institute of Technology (MIT) on future computing and communication (https://www.csail.mit.edu/Quanta.html). In 2019, Quanta kicked off another five-year collaboration with MIT CSAIL, focusing on computational health and AI medicine. Dr. Chang has been a visiting scientist at MIT CSAIL for over ten years. His past projects One Laptop Per Child (OLPC) and the QRI research model were published as business cases by Harvard Business School.
Dr. Chang holds B.S. (88), M.S. (92) and Ph.D. (96) degrees, all from the Institute of Aeronautics and Astronautics, National Cheng-Kung University. His research interests now focus on human-centric innovation for smarter lifestyles through AIoT (Artificial Intelligence of Things), which integrates IoT, cloud computing, big data analytics and machine learning. He is dedicated to creating a sustainable model through which great ideas from research labs can be turned into real social impact via product and business innovation.