The dataset was generated by running a custom agent on about 1,000 different games generated by Microsoft TextWorld. The agent took random valid steps and, every 10 steps, took a policy action followed by a 'look' action, an 'inventory' action, and an invalid action. For some specific relationships in the graph, TextWorld generates edges with more than 2 subjects; our scripts rewrote all of these into the format (subject, relation, object).
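The action pattern described above can be sketched as a simple schedule. This is only one reading of the description, not the logic in generate.py: the function name action_schedule, the period parameter, and the label strings are all hypothetical, and the sketch does not call TextWorld at all.

```python
def action_schedule(num_steps, period=10):
    """Yield a label for each action the agent would take.

    Assumption: the agent takes one random valid action per step and,
    after every `period`-th step, adds a policy action, a 'look', an
    'inventory', and an invalid action. The real generate.py may differ.
    """
    for step in range(1, num_steps + 1):
        yield "random_valid"
        if step % period == 0:
            # Extra actions interleaved after every `period` random steps.
            yield from ("policy", "look", "inventory", "invalid")


if __name__ == "__main__":
    for label in action_schedule(20):
        print(label)
```

Under this reading, every block of 10 random steps is followed by 4 extra actions, so 20 random steps produce 28 actions in total.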
All files were generated using generate.py, so you may want to read the code and change its parameters to adapt it to your use case. Note that generate.py requires TextWorld to be installed, so it may help to consult the TextWorld repository and follow its installation instructions.
Each game_walks_*.json file contains at most 200 game states. Each walk-through is a list of states, and each state has four properties: the description (obs), the previous action (prev_action), the valid actions (valid_actions), and the complete knowledge graph (complete_graph). Below is a fabricated example:
game_walk[0][0] = {
    'obs': 'This is a description',
    'prev_action': 'examine the box',
    'valid_actions': [
        'open box',
        'close door'
    ],
    'complete_graph': [
        ['you', 'at', 'garden'],
        ['box', 'in', 'garden'],
        ['apple', 'in', 'box'],
        ['apple', 'is', 'edible']
    ]
}

Here, game_walk is loaded from a single game_walks_*.json file with any JSON library. game_walk[0] indexes the first game walk-through, and game_walk[0][0] indexes the first game state of that walk-through.
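As a minimal sketch of the loading step described above, the following uses Python's standard json module. The helper names load_game_walks and graph_as_triples and the file name in the usage comment are hypothetical; only the four state keys come from the example above.

```python
import json
from pathlib import Path

# Keys every state is expected to carry, per the example above.
STATE_KEYS = {"obs", "prev_action", "valid_actions", "complete_graph"}


def load_game_walks(path):
    """Load one game_walks_*.json file as a list of walk-throughs."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def graph_as_triples(state):
    """Return the state's knowledge graph as a set of
    (subject, relation, object) tuples for easy membership tests."""
    missing = STATE_KEYS - state.keys()
    if missing:
        raise KeyError(f"state is missing keys: {sorted(missing)}")
    return {tuple(edge) for edge in state["complete_graph"]}


# Usage (the file name is a placeholder for any game_walks_*.json file):
#   game_walk = load_game_walks("game_walks_0.json")
#   first_state = game_walk[0][0]
#   triples = graph_as_triples(first_state)
#   ("apple", "in", "box") in triples
```

Converting the edge lists to tuples makes the graph hashable, which is convenient for comparing the graphs of two consecutive states with set operations.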
Côté, M.-A., Kádár, Á., Yuan, X., Kybartas, B., Barnes, T., Fine, E., Moore, J., Tao, R. Y., Hausknecht, M., Asri, L. E., Adada, M., Tay, W., & Trischler, A. (2018). TextWorld: A Learning Environment for Text-based Games. CoRR, abs/1806.11532.
This dataset was used for this project.