Document Type
Conference Proceeding
Publication Date
4-30-2023
Journal / Book Title
ACM Web Conference 2023: Proceedings of the World Wide Web Conference, WWW 2023
Abstract
Structured knowledge is important for many AI applications. Commonsense knowledge, which is crucial for robust human-centric AI, is covered by a small number of structured knowledge projects. However, they lack knowledge about human traits and behaviors conditioned on socio-cultural contexts, which is crucial for situative AI. This paper presents Candle, an end-to-end methodology for extracting high-quality cultural commonsense knowledge (CCSK) at scale. Candle extracts CCSK assertions from a huge web corpus and organizes them into coherent clusters, for 3 domains of subjects (geography, religion, occupation) and several cultural facets (food, drinks, clothing, traditions, rituals, behaviors). Candle includes judicious techniques for classification-based filtering and scoring of interestingness. Experimental evaluations show the superiority of the Candle CCSK collection over prior works, and an extrinsic use case demonstrates the benefits of CCSK for the GPT-3 language model. Code and data can be accessed at https://candle.mpi-inf.mpg.de/.
DOI
10.1145/3543507.3583535
Journal ISSN / Book ISBN
85159356003 (Scopus)
Montclair State University Digital Commons Citation
Nguyen, Tuan Phong; Razniewski, Simon; Varde, Aparna S.; and Weikum, Gerhard, "Extracting Cultural Commonsense Knowledge at Scale" (2023). School of Computing Faculty Scholarship and Creative Works. 49.
https://digitalcommons.montclair.edu/computing-facpubs/49
Published Citation
Tuan-Phong Nguyen, Simon Razniewski, Aparna Varde, and Gerhard Weikum. 2023. Extracting Cultural Commonsense Knowledge at Scale. In Proceedings of the ACM Web Conference 2023 (WWW '23). Association for Computing Machinery, New York, NY, USA, 1907–1917. https://doi.org/10.1145/3543507.3583535
Comments
This work is licensed under a Creative Commons Attribution 4.0 International License.