
WELCOME TO KGC 2021!

We are only a month away from the Knowledge Graph Conference '21! As we prepare for the conference, one thing has become clear: this is our biggest year yet! With speakers from a variety of industries and top-tier companies gathering to share their wisdom and experience with knowledge graphs, you're bound to find something that will spark your interest!

Join us from May 3 to May 6 at KGC 2021 to learn more about knowledge graphs, graph databases, graph AI, and semantic tech!

Early Bird Tickets On Sale Now!

BUY YOUR TICKET

Introducing: Speakers of the Week!

We have an amazing lineup prepared for KGC 2021 and will be introducing new speakers every week!

Mike Welch from Verizon
Freddy Lecue from CortAIx
Bernhard Krabina from KDZ
Antonin Delpeuch from University of Oxford
Mark Grover from Stemma.Ai, Amundsen
Julian Grümmer from Friedrich-Alexander-Universität Erlangen-Nürnberg
Chen Zhang & Dmytro Dolgopolov from Finra
Alena Vasilevich from Coreon GmbH
Johannes Keizer from Lore Star SRL
Ben De Meester from IDLab - imec
Duane Forrester from Yext

Talk Previews

Mike Welch from Verizon

Serving a web scale Knowledge Graph

The Yahoo Knowledge Graph powers entity data for user experiences across multiple products at Verizon Media, from search to media to ads. A "knowledge panel" for a public company on a web search results page might include basic data like the company's founders and number of employees, combined with a realtime stock quote and relevant images. Legacy serving systems typically pre-collected and exposed a static view of an entity, which limited the ability of clients to traverse the graph or adapt the user experience for different contexts without offline preprocessing. They also required clients to implement a more complex series of requests to fetch partial data, extract dependencies, and query additional services to stitch together their full experience. In this talk we will take you through our experience moving away from serving these static subgraph views with independent services and client-side processing. We will describe how we addressed the shortcomings of such a system and built a federated, realtime, web-scale graph querying framework with GraphQL on top of Amazon Neptune, Vespa, and third-party APIs. Our unified graph approach has enabled customer teams to experiment, rapidly iterate over their designs, and easily power rich experiences by bringing together all the relevant data under a single request.

Freddy Lecue from CortAIx

On the role of knowledge graphs in explainable machine learning

Machine Learning (ML), as one of the key drivers of Artificial Intelligence, has demonstrated disruptive results in numerous industries. However, one of the most fundamental problems of applying ML models, and particularly Artificial Neural Network models, in critical systems is their inability to provide a rationale for their decisions. For instance, an ML system recognizes an object to be a warfare mine through comparison with similar observations. No human-transposable rationale is given, mainly because common sense knowledge or reasoning is out of scope for ML systems. We present how knowledge graphs could be applied to expose more human-understandable machine learning decisions, and present an asset that combines ML and knowledge graphs to expose a human-like explanation when recognizing an object of any class in a knowledge graph of 4,233,000 resources.

Bernhard Krabina from KDZ

Semantic MediaWiki as Knowledge Graph Interface

Semantic MediaWiki (SMW), which was introduced as early as 2006, has since gone on to establish a vital community and is currently one of the few semantic wiki solutions still in existence. There are many reasons why SMW should not be overlooked by the knowledge graph community:
(list shortened for email)

SMW is capable of directly connecting to several triple stores (Blazegraph, Virtuoso, Jena), which is why it can be considered an interface for entering data into knowledge graphs.
SMW can use its internal relational database (or ElasticSearch), enabling users to build simple knowledge graphs without in-depth knowledge about triple stores.
SMW has the capability to reuse existing ontologies by importing vocabularies and providing unique identifiers.
SMW has low barriers to implementation as it is a clean extension to MediaWiki, which is PHP software running on regular web hosts.

In the talk, I will give an overview of the mentioned aspects and highlight some main differences to Wikibase – which is an alternative approach for managing structured data in MediaWiki – as well as the current limitations of SMW.

Antonin Delpeuch from University of Oxford

Scaling and maintaining OpenRefine

OpenRefine is a data wrangling tool that celebrated its 10th birthday this year. Cleaning and importing data into knowledge graphs has been its core use case since it was originally designed to help populate Freebase. In this talk, I want to give a broad overview of the latest developments in the tool and our efforts to consolidate it as a mature open source project. Please come along and tell us where you would like to see the tool in a few years!

Mark Grover from Stemma.Ai, Amundsen

From discovering to trusting data

Over a third of analyst time is spent understanding what data exists, whether it can be trusted, and how to use it. Countless hours of data engineering time are spent answering the same questions about data: what does that column mean, how does it get populated, how often does it update, and is there an incident going on? The answer thus far to such questions has been curation.

At Lyft, we have made our analysts and data scientists over 20% more productive by making it easier to discover data. Recently, we open sourced Amundsen and it’s now being used by ING, Square, Workday and many more. This talk gives a quick overview of Amundsen and then goes into detail on how we have tried both automated and curated metadata to showcase what’s trusted and what’s not in Amundsen. It will dive deep into linking the Airflow DAG which produced the data (task level lineage), linking what and how many dashboards are built from a given data set (table level lineage), as well as SLAs and historical landing times to give users a signal of what’s trusted. The talk will end with an insight into current challenges and how we may solve them in the future.

Julian Grümmer from Friedrich-Alexander-Universität Erlangen-Nürnberg

What can we learn from knowledge graphs? A Wirecard perspective

The Wirecard scandal was one of the most shocking economic events in Germany in 2020. The former DAX30 company collapsed, owing creditors more than €3.5 billion (almost $4 billion) after disclosing a gaping hole in its books that its auditor EY said was the result of a sophisticated global fraud. But were there any signs that something was wrong? In our study, we gathered all relevant information about management and supervisory board members from the 30 largest companies (DAX30) in Germany for the fiscal year 2019. Relevant information includes place of birth, date of birth, education and work experience. Our knowledge graph contains 745 people, 1,203 companies and organizations, 5,116 roles and 1,128 degrees or educational programs. All this information helps us to understand what was different at Wirecard and what may be the reasons why Wirecard failed. Using a knowledge graph enables us to automatically detect and visualize all kinds of ties between supervisory and management board members, whereas detecting them manually requires more effort and is more error-prone. The collection of the data reveals that there was little to no information available for most of the supervisory and management board members of Wirecard, which is uncommon for a DAX30 firm. In addition, the management board clearly had far less work experience than managers from comparable companies. When looking at the experience of Wirecard’s supervisory board members, it becomes clear that they were little or not at all familiar with board activities. Furthermore, our graph shows that supervisory board members from Wirecard are less connected to other members of supervisory boards compared to other DAX30 companies.

Chen Zhang & Dmytro Dolgopolov from Finra

Entity Disambiguation with Knowledge Graph

During the presentation, we will share our experience in building a knowledge graph leveraging Spark, NLP, and Machine Learning. We will start by explaining the business problems and challenges, then walk through our data pipeline, including text analytics processes, name similarity solutions, street address normalization, clustering algorithms, confidence level building, etc. To finish, we will discuss the business impact and the takeaways.

Alena Vasilevich from Coreon GmbH

Benefits of Collaboration AI vs Manual Creation of the Graph: Taxonomization of IATE, the EU Terminology

In the realm of data-driven businesses, structured data, being highly organized and easily understood by machines, is a valuable resource. The Coreon team elevated a sub-domain of IATE terminology into a multilingual knowledge graph. We taxonomized a flat list of 425 concepts within the COVID sub-domain, benchmarking two approaches to this task: automatic creation through a custom-enhanced off-the-shelf language model, and manual creation of the knowledge graph by an expert linguist. The automatically created knowledge graph was later revised by a human, with the corrections and time effort measured and compared with the performance metrics of the manual approach.

In this talk, we will dwell on the performance and resource-saving advantages of our custom method and show how the achieved productivity rate can make the taxonomization of even large terminology databases economically viable. We demonstrate empirically the effectiveness of our collaborative-robot approach in a typical industry use case scenario: using the resulting IATE/Covid graph for initialization of a Convolutional Neural Network (CNN) in a multilingual document classification task, we get a classification granularity that is not reachable by state-of-the-art models, such as non-initialized CNNs and zero-shot classifiers.

Johannes Keizer from Lore Star SRL

"VocBench", a semantic web collaborative development platform for ontologies, thesauri, and lexicons

This presentation will feature and demonstrate "VocBench", a semantic web collaborative development platform for ontologies, thesauri and lexicons. VocBench was initially developed for maintenance of the thesaurus "Agrovoc". It has since become a generic tool for thesaurus management and now features dedicated support for editing OWL ontologies, SKOS(/XL) thesauri, Ontolex-lemon lexicons, EDOAL linksets and generic RDF datasets. Advanced editing capabilities, high scalability (the platform integrates with high-performance triple stores such as RDF4J and GraphDB) and editing and publication workflow management all contribute to a full-fledged collaborative online environment. The tool is open source, but professional resources are available to offer services or develop necessary extensions.

Ben De Meester from IDLab - imec

PROV4ITDaTa: Flexible KG generation within reach (tool presentation)

Personal Knowledge Graph generation is no longer a cumbersome technical endeavor. PROV4ITDaTa is an MIT open-source platform to provide a smooth user experience for generating knowledge graphs from your online web services, such as Google, Flickr, and Imgur, into your personal data space. This brings your personal data back under your control, and as a graph, its true interlinking potential is unleashed.

PROV4ITDaTa lets you configure and set up a web application where users can easily pick one or more web services to extract their data from, transform that data into best-practice knowledge graphs, and push those graphs to a personal data space, such as a Solid pod. All heavy lifting is included in PROV4ITDaTa: management of service authentication (e.g., OAuth 1.0/2.0 sessions), setting up the infrastructure to extract and transform your personal data from popular web services, directly loading those graphs into your personal data space, and generating a simple user interface.

With a click of a button, users can try out knowledge graph applications using their actual data from music streaming systems, fitness apps, address books, social media, etc. Building on this product, we are developing a data processing workbench where these different data processing pipeline configurations can be managed, scheduled, and orchestrated, giving companies more control and making it easier to scale up PROV4ITDaTa.

Duane Forrester from Yext

Your Future in Search is Built on Knowledge Graphs

The future of search is already known, and it's built on Knowledge Graphs. From a business's point of view, we'll explore the focus of search engines today, why KGs are so integral, and examine the benefits of building your own site around a knowledge graph. From increases in customer retention to conversion and satisfaction, we'll see how this approach works. If you want to improve your user experience, there's an impact. If you want to decrease support costs and build consumer confidence - and impact reviews - we'll see how a KG-based approach works. There is a lot of work to do, but in today's world of search, the systems are complex and deep...and simple models of managing websites, content and search from years past simply won't keep up.

Introducing: Sponsors of the Week!

Ontotext

Ontotext is a global leader in enterprise knowledge graph technology and semantic database engines. Ontotext employs big knowledge graphs to enable unified data access and cognitive analytics via text mining and integration of data across multiple sources. Ontotext's main products are the GraphDB™ engine and the Ontotext Platform. They power business-critical systems in the biggest banks, media, market intelligence agencies, car and aerospace manufacturers. Ontotext technology and solutions are spread wide across the value chain of the most knowledge-intensive enterprises in financial services, publishing, healthcare, pharma, manufacturing and public sectors. Leveraging AI and cognitive technologies, Ontotext helps enterprises get a competitive advantage by connecting the dots of their proprietary knowledge and putting it in the context of global intelligence.

Fluree

Fluree is an Open-Source Semantic Graph Database that guarantees data integrity, provides traceability into data provenance, facilitates secure data collaboration, and powers connected data insights. Fluree organizes cryptographically-secured data in a temporal immutable ledger. Fluree exposes the ledger data in industry-standard RDF as a semantic graph database. Fluree can run on any machine, and its “edge” query servers can be packaged and scaled to meet any level of enterprise demand. Fluree can even run as an in-memory database alongside your code, thanks to our JavaScript library and React frontend capabilities.

GET YOUR TICKETS TODAY!

BUY YOUR TICKET

JOIN THE COMMUNITY
Let us know what you're working on in Slack.