GitHub - MikeMiaris/NN_WIth_Java: This is a Neural Network Implementation in Java

Problem Description

Dataset Generation (ΣΔΤ)

The assignment required creating a synthetic dataset of 8000 points inside the square [-1, 1] × [-1, 1], split into:

Training set: 4000 points
Test set: 4000 points

Each point (x1, x2) is assigned to one of four categories (C1–C4) according to the geometric rules detailed below.

Classification Rules

Condition	Label
`(x1 - 0.5)² + (x2 - 0.5)² < 0.2` and `x2 > 0.5`	C1
`(x1 - 0.5)² + (x2 - 0.5)² < 0.2` and `x2 < 0.5`	C2
`(x1 + 0.5)² + (x2 + 0.5)² < 0.2` and `x2 > -0.5`	C1
`(x1 + 0.5)² + (x2 + 0.5)² < 0.2` and `x2 < -0.5`	C2
`(x1 - 0.5)² + (x2 + 0.5)² < 0.2` and `x2 > -0.5`	C1
`(x1 - 0.5)² + (x2 + 0.5)² < 0.2` and `x2 < -0.5`	C2
`(x1 + 0.5)² + (x2 - 0.5)² < 0.2` and `x2 > 0.5`	C1
`(x1 + 0.5)² + (x2 - 0.5)² < 0.2` and `x2 < 0.5`	C2
None of the above and `x1 * x2 > 0`	C3
None of the above and `x1 * x2 < 0`	C4

This creates a challenging distribution: C1 and C2 are the upper and lower halves of four discs of radius √0.2 ≈ 0.447 centered at (±0.5, ±0.5), while C3 and C4 fill the rest of the first/third and second/fourth quadrants, respectively.
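
The rules in the table can be sketched as a single labeling function. This is a minimal illustration, not the code in `DataGenerator.java`: the class name `PointLabeler`, the integer encoding 1..4 for C1..C4, and the return value 0 for points that match no rule (where `x1 * x2` is exactly 0) are all assumptions made for the example.

```java
/** Hypothetical helper: labels a point (x1, x2) following the rule table. */
final class PointLabeler {
    // Threshold on the squared distance, taken from the rules.
    private static final double R2 = 0.2;
    // Disc centers in the same order as the table rows.
    private static final double[][] CENTERS = {
        {0.5, 0.5}, {-0.5, -0.5}, {0.5, -0.5}, {-0.5, 0.5}
    };

    /** Returns 1..4 for C1..C4, or 0 if no rule applies (x1 * x2 == 0). */
    static int label(double x1, double x2) {
        for (double[] c : CENTERS) {
            double dx = x1 - c[0];
            double dy = x2 - c[1];
            if (dx * dx + dy * dy < R2) {
                if (x2 > c[1]) return 1; // upper half of the disc
                if (x2 < c[1]) return 2; // lower half of the disc
            }
        }
        double product = x1 * x2;
        if (product > 0) return 3; // first or third quadrant
        if (product < 0) return 4; // second or fourth quadrant
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(label(0.5, 0.7));    // inside the disc at (0.5, 0.5), upper half
        System.out.println(label(-0.95, 0.95)); // outside every disc, second quadrant
    }
}
```

Because the discs have radius below 0.5 and their centers are 1 apart, they do not overlap, so the order in which the centers are checked does not affect the result.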

Architecture & Design

Network Topology (PT2 & PT3)

Two MLP variants were implemented as required:

Network	Hidden Layers	Description
PT2	2 hidden	Two hidden layers + output layer
PT3	3 hidden	Three hidden layers + output layer

Layer dimensions (configurable):

Input: d = 2 (features x1, x2)
Hidden 1: H1 neurons
Hidden 2: H2 neurons
Hidden 3: H3 neurons (only for PT3)
Output: K = 4 neurons (one per category)
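
To get a feel for how small these networks are, the number of trainable parameters can be computed from the layer sizes. The sketch below assumes fully connected layers with one bias per neuron; the class `ParameterCount` is hypothetical and not part of the repository.

```java
/** Hypothetical helper: counts weights and biases of a fully connected MLP. */
final class ParameterCount {
    /**
     * sizes lists the layer widths from input to output, e.g. {d, H1, H2, K}.
     * Each layer contributes (fanIn + 1) * fanOut parameters (weights plus bias).
     */
    static int count(int... sizes) {
        int total = 0;
        for (int l = 1; l < sizes.length; l++) {
            total += (sizes[l - 1] + 1) * sizes[l];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("PT2 with H1=H2=3:    " + count(2, 3, 3, 4));
        System.out.println("PT3 with H1=H2=H3=3: " + count(2, 3, 3, 3, 4));
    }
}
```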

Activation Functions

Layer	Options	Recommended
Hidden layers	`tanh` or `ReLU`	`tanh` (smoother gradients)
Output layer	`softmax` (implicit, via cross‑entropy loss)	Required for multi‑class
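
As an illustration of the two hidden-layer options, the sketch below shows each activation together with its derivative written in terms of the neuron's output, which is the form backpropagation typically needs. The class and method names are assumptions, and the ReLU derivative at exactly 0 is taken to be 0 by convention.

```java
/** Hypothetical helper: hidden-layer activations and their derivatives. */
final class Activations {
    static double tanh(double z) {
        return Math.tanh(z);
    }

    /** Derivative of tanh expressed through its output a = tanh(z): 1 - a^2. */
    static double tanhPrimeFromOutput(double a) {
        return 1.0 - a * a;
    }

    static double relu(double z) {
        return Math.max(0.0, z);
    }

    /** Derivative of ReLU through its output: 1 where active, 0 otherwise. */
    static double reluPrimeFromOutput(double a) {
        return a > 0.0 ? 1.0 : 0.0;
    }
}
```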

Why softmax? The output layer uses softmax to convert raw logits into a probability distribution over the four categories, enabling proper multi‑class classification.
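
A minimal sketch of the logits-to-probabilities step follows. It subtracts the largest logit before exponentiating, a common trick to avoid overflow; whether `Network.java` does the same is not stated here, so treat the class `Softmax` as an illustrative assumption.

```java
/** Hypothetical helper: numerically stable softmax over raw logits. */
final class Softmax {
    static double[] apply(double[] logits) {
        // Shift by the maximum so the largest exponent is exp(0) = 1.
        double max = Double.NEGATIVE_INFINITY;
        for (double z : logits) {
            max = Math.max(max, z);
        }
        double[] p = new double[logits.length];
        double sum = 0.0;
        for (int k = 0; k < logits.length; k++) {
            p[k] = Math.exp(logits[k] - max);
            sum += p[k];
        }
        // Normalize so the outputs sum to 1.
        for (int k = 0; k < p.length; k++) {
            p[k] /= sum;
        }
        return p;
    }
}
```

The predicted category is then the index of the largest probability.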

Loss Function & Output Layer (Softmax)

The assignment called for multi-class classification, so the network uses a softmax activation at the output layer combined with cross-entropy loss. During backpropagation, the delta for each output neuron k is simply:

$$\delta_k = o_k - t_k$$

where o_k is the softmax output of neuron k and t_k is the corresponding entry of the one-hot target.

This compact form is the derivative of the cross-entropy loss with respect to the softmax inputs (the logits), so the softmax Jacobian never has to be computed separately.
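
For readers who want the intermediate steps, the output delta (output minus target) can be derived as follows. The symbols are introduced here for the derivation: z_k are the logits, o_k the softmax outputs, t_k the one-hot targets, and E the cross-entropy loss for one example.

```latex
\begin{align*}
o_j &= \frac{e^{z_j}}{\sum_i e^{z_i}}, \qquad
E = -\sum_j t_j \ln o_j, \\
\frac{\partial o_j}{\partial z_k} &= o_j \left( \mathbb{1}[j = k] - o_k \right), \\
\frac{\partial E}{\partial z_k}
  &= -\sum_j \frac{t_j}{o_j} \, o_j \left( \mathbb{1}[j = k] - o_k \right)
   = -t_k + o_k \sum_j t_j
   = o_k - t_k .
\end{align*}
```

The last step uses the fact that a one-hot target sums to 1.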

Backpropagation & Gradient Descent

The Network class implements full backpropagation:

Output layer error: For each output neuron, compute delta = output - target and update the bias.
Hidden layer error propagation: Starting from the last hidden layer and moving backward, deltas are computed using the chain rule:
- The delta of each neuron in layer L is the weighted sum of the deltas of the next layer, multiplied by the derivative of the activation function.
- This implements the standard backpropagation equations for multi‑layer networks (Rumelhart et al., 1986).
Weight update: After accumulating deltas across a mini‑batch, weights and biases are updated using the learningRate and the average gradient.
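
The hidden-layer step described above can be sketched for a single layer as follows. This is an illustration under stated assumptions, not the code in `Network.java`: the array layout `weightsNext[j][i]` (weight from neuron i in the current layer to neuron j in the next layer) and the choice of tanh for the derivative are both hypothetical.

```java
/** Hypothetical helper: backpropagates deltas through one hidden layer (tanh assumed). */
final class BackpropSketch {
    /**
     * @param deltaNext   deltas of the next layer, one per neuron j
     * @param weightsNext weightsNext[j][i] connects neuron i (this layer) to neuron j (next layer)
     * @param outputs     activations a_i of this layer from the forward pass
     * @return deltas of this layer, one per neuron i
     */
    static double[] hiddenDeltas(double[] deltaNext, double[][] weightsNext, double[] outputs) {
        double[] delta = new double[outputs.length];
        for (int i = 0; i < outputs.length; i++) {
            // Weighted sum of the next layer's deltas.
            double sum = 0.0;
            for (int j = 0; j < deltaNext.length; j++) {
                sum += weightsNext[j][i] * deltaNext[j];
            }
            // Multiply by the activation derivative: tanh'(z) = 1 - a^2.
            delta[i] = sum * (1.0 - outputs[i] * outputs[i]);
        }
        return delta;
    }
}
```

For ReLU hidden layers the last factor would be 1 for active neurons and 0 otherwise.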

Batch Training (Mini‑batch)

The training loop processes N training examples in mini‑batches of size B (where B divides N):

When B = 1, the weights are updated after every single example (stochastic gradient descent: noisier, but with N updates per epoch).
When B = N, gradients are accumulated over the entire training set and applied once per epoch (batch gradient descent: stable, but each update is computationally heavy).
The actual code uses a configurable batch size.

In the code, the batch size is computed as N divided by a value set at the beginning of the program. The assignment explicitly requested testing with B = N/20 and B = N/200 (200 and 20 examples per mini-batch when N = 4000) to compare convergence speed and generalization.
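
One epoch of mini-batch training can be sketched as below: gradients are summed over each batch, averaged, and applied once per batch. The interface `ExampleGradient` and the class `MiniBatchSketch` are hypothetical stand-ins for the backward pass and training loop, and the flat weight vector is a simplification of the per-neuron storage used in the project.

```java
/** Hypothetical callback: gradient of the loss for one example at the current weights. */
interface ExampleGradient {
    double[] of(int exampleIndex, double[] weights);
}

/** Hypothetical helper: one epoch of averaged mini-batch gradient descent. */
final class MiniBatchSketch {
    /** Updates weights in place and returns the number of updates performed. */
    static int runEpoch(double[] weights, int numExamples, int batchSize,
                        double learningRate, ExampleGradient gradient) {
        int updates = 0;
        for (int start = 0; start < numExamples; start += batchSize) {
            int end = Math.min(start + batchSize, numExamples);
            // Accumulate gradients over the batch at fixed weights.
            double[] sum = new double[weights.length];
            for (int n = start; n < end; n++) {
                double[] g = gradient.of(n, weights);
                for (int i = 0; i < sum.length; i++) {
                    sum[i] += g[i];
                }
            }
            // Apply the averaged gradient once per batch.
            for (int i = 0; i < weights.length; i++) {
                weights[i] -= learningRate * sum[i] / (end - start);
            }
            updates++;
        }
        return updates;
    }
}
```

With N = 4000, a batch size of 200 gives 20 updates per epoch and a batch size of 20 gives 200.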

Weight Initialization

All weights and biases are initialized randomly in the range (-1, 1) at the start of training, as specified by the assignment. This ensures symmetry is broken and the network can learn properly.
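
A minimal sketch of such an initialization with `java.util.Random` is shown below. The class `WeightInit` is hypothetical. Note that `nextDouble()` returns values in [0, 1), so the scaled result lies in [-1, 1), which differs from the open interval only at a single endpoint value.

```java
import java.util.Random;

/** Hypothetical helper: uniform random initialization of a weight matrix. */
final class WeightInit {
    static double[][] uniform(int rows, int cols, Random rng) {
        double[][] w = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                // Map [0, 1) to [-1, 1).
                w[r][c] = 2.0 * rng.nextDouble() - 1.0;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // A fixed seed makes runs reproducible while experimenting.
        double[][] w = uniform(3, 2, new Random(42L));
        System.out.println(w.length + " x " + w[0].length + " weights initialized");
    }
}
```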

Implementation Details

Key Components

Class	Responsibility
`Neuron.java`	Stores weights, bias, delta (gradient), and output; applies activation functions.
`Layer.java`	Groups multiple neurons; manages forward/backward operations for one layer.
`Network.java`	Coordinates layers; performs forward pass, softmax, backward pass, and weight updates.
`DataGenerator.java`	Generates synthetic training and test datasets as CSV files.
`DataLoader.java`	Reads CSV files and returns feature arrays and one‑hot encoded label arrays.
`NeuralNetworkMain.java`	Entry point: loads data, initializes the network, runs training, and reports results.
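
To illustrate what one-hot encoding of labels looks like, here is a sketch that parses a single CSV row. The row format `x1,x2,Ck` (for example `0.25,-0.5,C3`) is an assumption made for the example; the actual layout written by `DataGenerator.java` may differ, and the class `CsvRowSketch` is hypothetical.

```java
/** Hypothetical helper: parses one "x1,x2,Ck" row into features and a one-hot target. */
final class CsvRowSketch {
    final double[] features;
    final double[] target;

    private CsvRowSketch(double[] features, double[] target) {
        this.features = features;
        this.target = target;
    }

    static CsvRowSketch parse(String line, int numClasses) {
        String[] parts = line.trim().split(",");
        double x1 = Double.parseDouble(parts[0].trim());
        double x2 = Double.parseDouble(parts[1].trim());
        // Assumed label format: "C1".."C4"; strip the leading 'C'.
        int category = Integer.parseInt(parts[2].trim().substring(1));
        double[] target = new double[numClasses];
        target[category - 1] = 1.0; // one-hot: a single 1 at the category's index
        return new CsvRowSketch(new double[]{x1, x2}, target);
    }
}
```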

Training & Evaluation Loop

Data Loading: Load classification_train.csv and classification_test.csv.
Network Initialization: Create either a PT2 or PT3 Network with specified hyperparameters.
Training:
- For a maximum number of epochs (default 1000), process mini‑batches.
- After each epoch, calculate and print the total training error.
- Early stopping: Terminate when the absolute difference in training error between two consecutive epochs falls below a threshold (e.g., 0.001), but only after at least 800 epochs have completed (as required by the assignment).
Evaluation:
- Compute and print the generalization accuracy (% correct predictions) on the test set.
- Identify the best network configuration based on test accuracy.
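
The stopping rule and the accuracy metric from the steps above can be sketched as two small functions. The class `TrainingChecks` and its parameter names are hypothetical; the values 800 and 0.001 in the usage comment come from the description above.

```java
/** Hypothetical helper: early-stopping check and test-set accuracy. */
final class TrainingChecks {
    /** True once at least minEpochs have run and the error change is below threshold. */
    static boolean shouldStop(int epoch, double prevError, double currError,
                              int minEpochs, double threshold) {
        return epoch >= minEpochs && Math.abs(prevError - currError) < threshold;
    }

    /** Index of the largest entry (first one wins on ties). */
    static int argmax(double[] v) {
        int best = 0;
        for (int k = 1; k < v.length; k++) {
            if (v[k] > v[best]) best = k;
        }
        return best;
    }

    /** Percentage of examples whose predicted category matches the one-hot target. */
    static double accuracy(double[][] outputs, double[][] targets) {
        int correct = 0;
        for (int n = 0; n < outputs.length; n++) {
            if (argmax(outputs[n]) == argmax(targets[n])) correct++;
        }
        return 100.0 * correct / outputs.length;
    }
}
```

Inside a training loop this would be called once per epoch, for example `if (TrainingChecks.shouldStop(epoch, prevError, currError, 800, 0.001)) break;`.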

Experimental Setup & Hyperparameters

The assignment required systematic experimentation with:

Hidden layer sizes (H1, H2, H3): multiple combinations tested.
Activation functions: tanh vs ReLU in the deeper hidden layers.
Batch sizes: B = N/20 and B = N/200.
Learning rate: default 0.001.
Threshold: 0.001 (for early stopping criterion).

The results were recorded in a table documented in the assignment report (PDF). The best network was then used to visualize test set predictions (correct vs misclassified) using distinct markers.

How to Run

Prerequisites

Java Development Kit (JDK 8 or later)
Git

Steps

Clone the repository:

git clone https://github.com/MikeMiaris/NN_WIth_Java.git
cd NN_WIth_Java/NN_project

Compile the source files: `javac DataGenerator.java DataLoader.java Layer.java Network.java NeuralNetworkMain.java`
Generate the datasets: `java DataGenerator`
Run the neural network: `java NeuralNetworkMain`

Note: Compiled .class files are also present for quick execution.

Note

The current main class loads a PT3 network with:

Hidden layers: H1=3, H2=3, H3=3
Hidden activation: ReLU
Output activation: softmax
Learning rate: 0.001
Batch size: N/20 (divisor set to 20)

You can modify these values directly in NeuralNetworkMain.java.

Results & Observations

The network successfully learns to classify the four categories with high accuracy. The full experimental tables are included in the assignment report (YN_project_2024-25.pdf, in Greek).

Key observations:

PT3 (3 hidden layers) generally achieves higher accuracy than PT2 (2 hidden layers) for this dataset, owing to its increased representational capacity.
tanh tended to provide smoother convergence than ReLU for deeper networks, though ReLU avoided vanishing gradient issues when networks were deep.
Smaller batch sizes (B=N/200) introduced more noise but often led to faster convergence in terms of epoch count.
With well-tuned hyperparameters, classification accuracy on the test set reaches roughly 97-99%.

For details on the second assignment (K‑means clustering) and the full report, please refer to the course materials.

Future Work

Implement momentum and learning rate decay to improve convergence.
Add support for dropout regularization.
Extend the network to handle regression problems.
Provide a command‑line interface or configuration file to set hyperparameters.

License

This project is for educational/academic purposes. Feel free to use and modify it. Original assignment credit: University of Ioannina, Department of Computer Science & Engineering, 2024‑2025.
