It’s well known that transformers don’t have “long-term” memory: they can only use tokens within a fixed maximum context length when making inferences/predictions. In some situations this is a significant limitation, since long-term memory seems necessary to perform at a high level on certain tasks relevant to humans (e.g. writing a legal brief, a long essay, or a novel). But while the transformer architecture itself doesn’t contain a mechanism for storing and recalling long-term memories, it may be possible to wrap logic around transformer-based large language models (LLMs) to give them an effective form of long-term memory. In particular, some transformer-based LLMs seem to do a good job of summarizing text. If you are trying to use such a model to generate a long sequence of text, you could: (1) use the model to generate an initial segment of text, (2) use the model to generate a short summary of the previously generated text, (3) ask the model to continue writing, using the summary (and possibly a portion of the previously generated text) as context, and (4) iterate this process until you have a complete document. In theory, information from any point in the generated sequence can be passed forward indefinitely through the summaries included in the model’s input at each step.
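The loop described above can be sketched roughly as follows. This is not the repository's actual implementation; the `generate` function here is a hypothetical stand-in for a real LLM call (e.g. a request to the OpenAI API), stubbed out so the control flow is self-contained, and the parameter names are illustrative.

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; returns placeholder text."""
    return f"[text continuing from: {prompt[-40:]!r}]"


def generate_long_text(initial_prompt: str, n_segments: int = 4,
                       context_chars: int = 200) -> str:
    # (1) Generate an initial segment from the user's prompt.
    document = generate(initial_prompt)
    for _ in range(n_segments - 1):
        # (2) Summarize everything generated so far.
        summary = generate(f"Summarize the following text:\n{document}")
        # (3) Continue writing from the summary plus the tail of the
        #     previously generated text.
        continuation_prompt = (
            f"Summary of the story so far:\n{summary}\n\n"
            f"Most recent text:\n{document[-context_chars:]}\n\n"
            "Continue writing:"
        )
        document += "\n" + generate(continuation_prompt)
    # (4) After iterating, information from early segments can persist
    #     in later prompts via the summaries.
    return document


print(generate_long_text("Write a story about a lighthouse keeper."))
```

In a real implementation, `generate` would call the model with appropriate instructions, and the summary length and `context_chars` would be tuned so the combined prompt stays within the model's context window.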
The code in this repository is an initial attempt to implement this idea using the OpenAI API. For a given prompt, the code generates one response using summarization and one without. It also generates a "log" file for each response that shows the sequence of prompts submitted to the OpenAI API and the API's responses (the start of each new prompt or response is marked with text in all caps).
Various parameters (e.g. how long the summaries may be, how much previously generated text is used as context for each new generation, and various parts of the prompts) can be adjusted through command-line flags, so it should hopefully be fairly easy to experiment with the code. Run the program with the -h flag to see a description of all available flags.
In addition to the code, the repository contains an example of text generated with summarization and its associated log file. I haven’t yet done extensive comparisons of the quality of text generated with and without summarization, and I admittedly haven’t yet tried really stretching the capabilities of the OpenAI models (in the included example, I limited the length of the context and output to well below the model’s maximum). Still, the log files from some of my initial tests (including the one in this repo) suggest that key information from earlier generations is in fact passed forward to later generations through the summaries.