A research project exploring the theoretical mathematics behind Long Short-Term Memory (LSTM) networks, with the goal of understanding their inner workings from first principles and investigating potential improvements to the standard architecture.
Note: A detailed write-up is available in
README.pdfin the root of this repository.
Standard deep learning frameworks abstract away the internals of LSTM cells, making it difficult to study or experiment with the underlying math. This project implements an LSTM from scratch — without relying on high-level libraries like Keras or PyTorch's built-in LSTM layers — in order to:
- Deeply understand the gate equations (input, forget, output, and cell state)
- Verify correctness against established implementations
- Explore modifications to the standard LSTM formulation and measure their impact
Custom-LSTM/
├── Code/ # Jupyter notebooks implementing the custom LSTM
├── Documents/ # Supporting documents and write-ups
├── Images/ # Figures, diagrams, and result visualizations
├── Literature/ # Reference papers and academic sources
├── Proposal/ # Original project proposal
└── README.pdf # Detailed project report
An LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) designed to learn long-term dependencies in sequential data. Standard RNNs struggle with the vanishing gradient problem — gradients shrink to near zero during backpropagation through many time steps, preventing the network from learning long-range patterns.
LSTMs solve this with a cell state — a memory that runs through the sequence — regulated by three learned gates:
| Gate | Purpose |
|---|---|
| Forget gate | Decides what information to discard from the cell state |
| Input gate | Decides what new information to write to the cell state |
| Output gate | Decides what part of the cell state to output as the hidden state |
The core equations at each time step t are:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f) # Forget gate
i_t = σ(W_i · [h_{t-1}, x_t] + b_i) # Input gate
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C) # Candidate cell state
C_t = f_t * C_{t-1} + i_t * C̃_t # Updated cell state
o_t = σ(W_o · [h_{t-1}, x_t] + b_o) # Output gate
h_t = o_t * tanh(C_t) # Hidden state output
- Python 3.9+
- Jupyter Notebook or JupyterLab
1. Clone the repository
git clone https://github.com/Preston-Robertson/Custom-LSTM.git
cd Custom-LSTM2. Install dependencies
pip install -r requirements.txt(If no requirements.txt is present, the core dependencies are numpy, matplotlib, and jupyter.)
3. Launch Jupyter
jupyter notebookThen open the notebooks in the Code/ folder.
Jupyter notebooks containing the custom LSTM implementation, experiments, and comparisons. The notebooks walk through:
- Forward pass implementation
- Backpropagation through time (BPTT)
- Training and evaluation on test sequences
- Comparisons with standard LSTM behavior
Supporting write-ups and analysis documents related to the project.
Visualizations generated during experimentation, including training curves, gate activation plots, and architecture diagrams.
Academic papers and references that informed the project, including foundational LSTM literature.
The original project proposal outlining the research goals and methodology.
- Hochreiter, S. & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation.
- Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to Forget: Continual Prediction with LSTM.
- Olah, C. (2015). Understanding LSTM Networks. colah.github.io