Next-Token Prediction

Create a language model based on a body of text and get high-quality predictions (next word, next phrase, next pixel, etc.).

Install

npm i next-token-prediction

Usage

Simple (from a built-in data bootstrap)

Put the /training/ directory from this repository in the root of your project.

Now you just need to create your app's index.js file and run it. Your model will start training on the .txt files located in /training/documents/. After training is complete, it will run these four queries:

const { Language: LM } = require('next-token-prediction');

const MyLanguageModel = async () => {
  const agent = await LM({
    bootstrap: true
  });

  // Predict the next word

  agent.getTokenPrediction('what');

  // Predict the next 5 words

  agent.getTokenSequencePrediction('what is', 5);

  // Complete the phrase

  agent.complete('hopefully');

  // Get a top k sample of completion predictions

  agent.getCompletions('The sun');
};

MyLanguageModel();
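The calls above use the package's own API. As a hypothetical illustration of the general idea behind two of them (predicting several words in a row, and ranking the top k candidates), here is a toy bigram model in plain Node.js. The names `trainBigrams`, `topK`, and `predictSequence` are made up for this sketch, and it is not how `next-token-prediction` works internally; it simply counts which word follows which.

```javascript
// Toy illustration only: hypothetical helpers, not next-token-prediction's internals.

// Count how often each word is followed by each other word.
function trainBigrams(text) {
  const tokens = text.toLowerCase().split(/[^a-z0-9']+/).filter(Boolean);
  const counts = new Map(); // word -> Map(nextWord -> count)
  for (let i = 0; i < tokens.length - 1; i++) {
    const followers = counts.get(tokens[i]) || new Map();
    followers.set(tokens[i + 1], (followers.get(tokens[i + 1]) || 0) + 1);
    counts.set(tokens[i], followers);
  }
  return counts;
}

// Rank the observed followers of `word` by count (highest first) and keep k.
function topK(model, word, k) {
  const followers = model.get(word.toLowerCase()) || new Map();
  return [...followers.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([token]) => token);
}

// Greedily append the single most likely next word, up to n times.
function predictSequence(model, prompt, n) {
  const words = prompt.toLowerCase().split(/\s+/).filter(Boolean);
  let current = words[words.length - 1];
  const output = [];
  for (let i = 0; i < n && current !== undefined; i++) {
    const [next] = topK(model, current, 1);
    if (next === undefined) break; // dead end: no follower was ever observed
    output.push(next);
    current = next;
  }
  return output;
}

const model = trainBigrams('the sun is hot. the sun is bright. the sun is hot.');
console.log(topK(model, 'is', 2)); // [ 'hot', 'bright' ]
console.log(predictSequence(model, 'the sun', 3)); // [ 'is', 'hot', 'the' ]
```

Always taking the first candidate (greedy decoding) tends to loop or repeat itself on small corpora; sampling from the top k instead is one common way to add variety.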

Advanced (provide `trainingData` or create it from .txt files)

Put the /training/ directory from this repository in the root of your project.

Because the training data is committed to this repo, you can optionally skip training and just use the bootstrapped training data, like this:

const { dirname } = require('path');
const __root = dirname(require.main.filename);

const { Language: LM } = require('next-token-prediction');
const OpenSourceBooksDataset = require(`${__root}/training/datasets/OpenSourceBooks`);

const MyLanguageModel = async () => {
  const agent = await LM({
    dataset: OpenSourceBooksDataset
  });

  // Complete the phrase

  agent.complete('hopefully');
};

MyLanguageModel();

Or, train on your own provided text files:

const { dirname } = require('path');
const __root = dirname(require.main.filename);

const { Language: LM } = require('next-token-prediction');

const MyLanguageModel = async () => {
  // The following .txt files should exist in a `/training/documents/`
  // directory in the root of your project

  const agent = await LM({
    files: [
      'marie-antoinette',
      'pride-and-prejudice',
      'to-kill-a-mockingbird',
      'basic-algebra',
      'a-history-of-war',
      'introduction-to-c-programming'
    ]
  });

  // Complete the phrase

  agent.complete('hopefully');
};

MyLanguageModel();
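Rather than hard-coding the `files` list, you could build it from the contents of `/training/documents/`. This sketch assumes, based on the example above, that `files` expects base names without the `.txt` extension; `listTrainingFiles` is a hypothetical helper, not part of the package.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (not part of next-token-prediction): return the base
// names, without the ".txt" extension, of every .txt file in `documentsDir`,
// sorted alphabetically so the list is the same on every run and platform.
function listTrainingFiles(documentsDir) {
  return fs
    .readdirSync(documentsDir)
    .filter((name) => path.extname(name) === '.txt')
    .map((name) => path.basename(name, '.txt'))
    .sort();
}

// Example usage inside an async function, assuming `__root` and `LM` are
// defined as in the example above:
// const files = listTrainingFiles(path.join(__root, 'training', 'documents'));
// const agent = await LM({ files });
```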

Run tests

npm test

Examples

Readline Completion

UI Autocomplete

Videos

readline-completion.mp4

readline-completion-verbose.mp4

Browser example: Fast text autocomplete

With more training data you get more suggestions, and with enough data the model eventually reaches a tipping point where it can complete almost any input.

autocomplete.mp4
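As a rough, generic illustration of why more training data yields more suggestions: a simple prefix-based autocompleter can only suggest words it has already seen, ranked here by how often they occurred. The `buildVocabulary` and `suggest` helpers are hypothetical and are not the code behind the browser example.

```javascript
// Illustrative only: count how often each word appears in the training text.
function buildVocabulary(text) {
  const freq = new Map(); // word -> occurrence count
  for (const word of text.toLowerCase().split(/[^a-z0-9']+/).filter(Boolean)) {
    freq.set(word, (freq.get(word) || 0) + 1);
  }
  return freq;
}

// Suggest up to `limit` known words that extend `prefix`, most frequent first,
// breaking ties alphabetically. Words never seen in training cannot appear.
function suggest(freq, prefix, limit = 3) {
  const p = prefix.toLowerCase();
  return [...freq.entries()]
    .filter(([word]) => word.startsWith(p) && word !== p)
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([word]) => word);
}

const vocab = buildVocabulary('hope hopefully hopeful hopefully hops hop');
console.log(suggest(vocab, 'hop')); // [ 'hopefully', 'hope', 'hopeful' ]
```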

Browser example: Fast generative fill

Coming soon!

Goals

Provide a high-quality text prediction library for:

autocomplete
autocorrect
spell checking
search/lookup
summarizing
paraphrasing

Create image and audio models for other prediction formats
Simplify AI/ML methodologies
