Skip to content

MikeMiaris/NN_WIth_Java

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 

Repository files navigation

Problem Description

Dataset Generation (ΣΔΤ)

The assignment required creating a synthetic dataset of 8000 points inside the square [-1, 1] × [-1, 1], split into:

  • Training set: 4000 points
  • Test set: 4000 points

Each point (x1, x2) is assigned to one of four categories (C1–C4) according to the geometric rules detailed below.

Classification Rules

Condition Label
(x1 – 0.5)² + (x2 – 0.5)² < 0.2 and x2 > 0.5 C1
(x1 – 0.5)² + (x2 – 0.5)² < 0.2 and x2 < 0.5 C2
(x1 + 0.5)² + (x2 + 0.5)² < 0.2 and x2 > -0.5 C1
(x1 + 0.5)² + (x2 + 0.5)² < 0.2 and x2 < -0.5 C2
(x1 – 0.5)² + (x2 + 0.5)² < 0.2 and x2 > -0.5 C1
(x1 – 0.5)² + (x2 + 0.5)² < 0.2 and x2 < -0.5 C2
(x1 + 0.5)² + (x2 – 0.5)² < 0.2 and x2 > 0.5 C1
(x1 + 0.5)² + (x2 – 0.5)² < 0.2 and x2 < 0.5 C2
None of the above and x1 * x2 > 0 C3
None of the above and x1 * x2 < 0 C4

This creates a challenging distribution: C1 and C2 form small circular clusters around (0.5,0.5), (±0.5,±0.5), while C3 and C4 fill regions in the first/third and second/fourth quadrants, respectively.


Architecture & Design

Network Topology (PT2 & PT3)

Two MLP variants were implemented as required:

Network Hidden Layers Description
PT2 2 hidden Two hidden layers + output layer
PT3 3 hidden Three hidden layers + output layer

Layer dimensions (configurable):

  • Input: d = 2 (features x1, x2)
  • Hidden 1: H1 neurons
  • Hidden 2: H2 neurons
  • Hidden 3: H3 neurons (only for PT3)
  • Output: K = 4 neurons (one per category)

Activation Functions

Layer Options Recommended
Hidden layers tanh or ReLU tanh (smoother gradients)
Output layer softmax (implicit, via cross‑entropy loss) Required for multi‑class

Why softmax? The output layer uses softmax to convert raw logits into a probability distribution over the four categories, enabling proper multi‑class classification.

Loss Function & Output Layer (Softmax)

The assignment called for multi‑class classification, so the network uses a softmax activation at the output layer combined with cross‑entropy loss. During backpropagation, the delta for each output neuron is simply:

This elegant formulation arises from the derivative of the cross‑entropy loss with respect to the softmax inputs, which makes the implementation both correct and efficient.

Backpropagation & Gradient Descent

The Network class implements full backpropagation:

  1. Output layer error: For each output neuron, compute delta = output - target and update the bias.

  2. Hidden layer error propagation: Starting from the last hidden layer and moving backward, deltas are computed using the chain rule:

    • The delta of each neuron in layer L is the weighted sum of the deltas of the next layer, multiplied by the derivative of the activation function.
    • This implements the standard backpropagation equations for multi‑layer networks (Rumelhart et al., 1986).
  3. Weight update: After accumulating deltas across a mini‑batch, weights and biases are updated using the learningRate and the average gradient.

Batch Training (Mini‑batch)

The training loop processes N training examples in mini‑batches of size B (where B divides N):

  • When B = 1, gradients are updated after every single example (stochastic gradient descent – often noisier but faster).
  • When B = N, gradients are accumulated over the entire training set (batch gradient descent – stable but computationally heavy).
  • The actual code uses a configurable batch size.

The batch size is defined as N/B where B is set at the beginning of the program. The assignment explicitly requested testing with B = N/20 and B = N/200 to compare convergence speed and generalization.

Weight Initialization

All weights and biases are initialized randomly in the range (-1, 1) at the start of training, as specified by the assignment. This ensures symmetry is broken and the network can learn properly.


Implementation Details

Key Components

Class Responsibility
Neuron.java Stores weights, bias, delta (gradient), and output; applies activation functions.
Layer.java Groups multiple neurons; manages forward/backward operations for one layer.
Network.java Coordinates layers; performs forward pass, softmax, backward pass, and weight updates.
DataGenerator.java Generates synthetic training and test datasets as CSV files.
DataLoader.java Reads CSV files and returns feature arrays and one‑hot encoded label arrays.
NeuralNetworkMain.java Entry point: loads data, initializes the network, runs training, and reports results.

Training & Evaluation Loop

  1. Data Loading: Load classification_train.csv and classification_test.csv.
  2. Network Initialization: Create either a PT2 or PT3 Network with specified hyperparameters.
  3. Training:
    • For a maximum number of epochs (default 1000), process mini‑batches.
    • After each epoch, calculate and print the total training error.
    • Early stopping: Terminate when the absolute difference in training error between two consecutive epochs falls below a threshold (e.g., 0.001), but only after at least 800 epochs have completed (as required by the assignment).
  4. Evaluation:
    • Compute and print the generalization accuracy (% correct predictions) on the test set.
    • Identify the best network configuration based on test accuracy.

Experimental Setup & Hyperparameters

The assignment required systematic experimentation with:

  • Hidden layer sizes (H1, H2, H3): multiple combinations tested.
  • Activation functions: tanh vs ReLU in the deeper hidden layers.
  • Batch sizes: B = N/20 and B = N/200.
  • Learning rate: default 0.001.
  • Threshold: 0.001 (for early stopping criterion).

The results were recorded in a table documented in the assignment report (PDF). The best network was then used to visualize test set predictions (correct vs misclassified) using distinct markers.


How to Run

Prerequisites

  • Java Development Kit (JDK 8 or later)
  • Git

Steps

  1. Clone the repository:

    git clone https://github.com/MikeMiaris/NN_WIth_Java.git
    cd NN_WIth_Java/NN_project
    
  2. Compile the source files: javac DataGenerator.java DataLoader.java Layer.java Network.java NeuralNetworkMain.java

  3. Generate the datasets: java DataGenerator

  4. Run the neural network: java NeuralNetworkMain

Note: Compiled .class files are also present for quick execution.

Note

The current main class loads a PT3 network with:

  • Hidden layers: H1=3, H2=3, H3=3
  • Hidden activation: ReLU
  • Output activation: softmax
  • Learning rate: 0.001
  • Batch size: 20 (i.e., N/20)

You can modify these values directly in NeuralNetworkMain.java.

Results & Observations

The network successfully learns to classify the four categories with high accuracy. The full experimental tables are included in the assignment report (YN_project_2024-25.pdf, in Greek).

Key observations:

  • PT3 (3 hidden layers) generally achieves higher accuracy than PT2 (2 hidden layers) for this dataset, owing to its increased representational capacity.
  • tanh tended to provide smoother convergence than ReLU for deeper networks, though ReLU avoided vanishing gradient issues when networks were deep.
  • Smaller batch sizes (B=N/200) introduced more noise but often led to faster convergence in terms of epoch count.
  • The classification accuracy approaches ~97‑99% on the test set with well‑tuned hyperparameters.

For details on the second assignment (K‑means clustering) and the full report, please refer to the course materials.

Future Work

  • Implement momentum and learning rate decay to improve convergence.
  • Add support for dropout regularization.
  • Extend the network to handle regression problems.
  • Provide a command‑line interface or configuration file to set hyperparameters.

License

This project is for educational/academic purposes. Feel free to use and modify it. Original assignment credit: University of Ioannina, Department of Computer Science & Engineering, 2024‑2025.

About

This is a Neural Network Implementation in Java

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages