July 21, 2020
WordCraft: A Reinforcement Learning environment for enabling common-sense based agents
A sample episode of WordCraft: The agent needs to create the goal entity (cyborg) from a set of starting entities.
The human ability to solve a wide variety of real-world problems usually relies on a common-sense understanding of the world around us. While advanced Reinforcement Learning (RL) algorithms have been developed, it remains an open challenge to extract such common-sense information from natural language corpora and combine it with RL agents. Researchers at University College London and the University of Oxford claim that this is in part due to a lack of lightweight simulation environments that accurately represent real-world semantics and include established sources of information with respect to RL observations.

In their paper, which was accepted to the International Conference on Machine Learning (ICML 2020), they propose a new benchmarking tool to enable research on agents that make use of common-sense knowledge. The framework, which they call “WordCraft”, is an RL environment based on Little Alchemy 2, a game that tasks players with mixing ingredients to create new items. Like its inspiration, WordCraft tests the reasoning capabilities of RL agents by giving them more than 700 different entities (ingredients) and letting them combine previously discovered entities, such as “water” and “earth”, to create new ones, such as “mud”.

The environment starts off with a set of four basic items, and the agent is tasked with generating as many different items as possible. Each non-starter item can be created by combining two other items. For example, combining “moon” and “butterfly” yields “moth”, and combining “human” and “medusa” yields “statue”. There are 3,417 valid item combinations in WordCraft, and an agent must use information about concept relationships to solve the game efficiently rather than trying every possible combination. Every task is generated by randomly sampling a target entity, its true constituent entities, and distracting entities. The complexity of a task can be raised by increasing the number of distracting entities or the number of intermediate entities that must be formed.
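
To make the task-generation idea concrete, here is a minimal Python sketch of how a single one-step task might be sampled. It is an illustration under stated assumptions, not the researchers' implementation: the names `RECIPES`, `sample_task`, and `combine` are hypothetical, and the recipe book contains only the three combinations mentioned in this article rather than the full set of 3,417.

```python
import random

# Toy recipe book (hypothetical): only the three combinations named in the
# article. An unordered pair of ingredients maps to the entity it creates.
RECIPES = {
    frozenset({"water", "earth"}): "mud",
    frozenset({"moon", "butterfly"}): "moth",
    frozenset({"human", "medusa"}): "statue",
}


def sample_task(recipes, num_distractors, rng):
    """Sample a one-step task: a goal, its true ingredients, and distractors."""
    # Sort before choosing so results are reproducible for a seeded rng.
    ingredients, goal = rng.choice(sorted(recipes.items(), key=lambda kv: kv[1]))

    # Every entity that appears anywhere in the recipe book.
    all_entities = set(recipes.values())
    for pair in recipes:
        all_entities |= pair

    # Distractors are entities that are neither the goal nor a true ingredient.
    pool = sorted(all_entities - ingredients - {goal})
    distractors = rng.sample(pool, num_distractors)

    # The agent sees true ingredients and distractors mixed together.
    table = sorted(ingredients) + distractors
    rng.shuffle(table)
    return {"goal": goal, "ingredients": ingredients, "table": table}


def combine(recipes, first, second):
    """Return the entity created by combining two entities, or None if invalid."""
    return recipes.get(frozenset({first, second}))


if __name__ == "__main__":
    task = sample_task(RECIPES, num_distractors=3, rng=random.Random(0))
    print("goal:", task["goal"])
    print("available entities:", task["table"])
```

In this sketch, raising `num_distractors` makes the task harder in the first of the two ways described above. The second way, requiring intermediate entities, would need multi-step recipes and is not modeled here.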

Along with the WordCraft benchmarking environment, the researchers also present an agent architecture that uses information from an external knowledge graph to guide the agent’s policy. The architecture is an actor-critic network built on self-attention and an external knowledge-graph link prediction model. To illustrate how well the tasks reflect real-world relationships between entities, the researchers evaluated this actor-critic agent using GloVe embeddings of the entities, which capture real semantic information about them. They conclude that this agent architecture, with a full knowledge graph and GloVe embeddings, can perform on par with a human at the same task with up to 8 distracting entities.

