Skip to content

Nathan-R-Wheeler/COS226Final

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

COS226Final

Nathan Wheeler
Professor Schotter
COS-226
12/15/25

This is the final project for my Data structures and Algorithms class. It uses indexes in B+ trees and A linked list hash table, as well as mergesort to store bulk items.

How to Run:
-The program must be run from the "dataStorage" file
-When the program first runs, it prompts the user with the main menu for what function they want the program to carry out:

Commands and expected outputs:
  1. Indexing:
    -When indexing, the initial input shall be a 1 to start indexing.
    -Input 0-9 for the property to be indexed.
    -The program should respond with a list of the properties again
    -The indexed ones now having "-Indexed" next to their property.

  2. Exact Value Search:
    -The user should initially input 2.
    -The program will prompt the user for choice of using the title or a quote.
    -The user enters 0 for Title, and 1 for Quote.
    -The program confirms their choice, and asks for input of title/quote.
    -The user inputs "Bee Movie".
    -The program will either deny the item in the database, or return it has found it.
    -The program will prompt the user to export it in a csv with 0, or remove it from the database with 1.
    -If the user choses 0, the program adds the choice to a .csv.
    -If the user choses 1, the program deletes it from the list.
    -The program then returns to the main menu.

  3. Range Query:
    -First, you must Index what you wish to Query.
    -Initial input should be 3.
    -The program will return a list of indexed properties to choose.
    -The user will enter in the corresponding number for their property.
    -The program will ask if the user wants to choose less than (0), greater than (1), or in between two values (2)
    -The user will input 0-2.
    -The program will prompt for the value(s) to search by.
    -The user will input the corresponding value(s).
    -The program will either deny the items in the database for that range, or return the values in the range.
    -The program will prompt the user to export them in a csv with 0, or remove them from the database with 1.
    -If the user choses 0, the program adds the values to a .csv.
    -If the user choses 1, the program deletes them from the list.
    -The program then returns to the main menu.

  4. Bye:
    -The user inputs 4.
    -The program thanks the user and terminates


Initialization of the database:

-In my database initialization, it takes around 2.92 seconds to load the 15000 dataItems, and due to the use of mergesort and bulkAdding, has around O(N log(N)) efficiency.

Creation of indexes:

-In my indexing, it is very fast, as i take already sorted bulk data from the initialization, and place it into a B plus tree. It uses the same Mergesorted data, and so the time it takes is still O(N log(N)). It indexes all data in 0.00002 seconds.

Queries:
-My queries are based off of exact values and indexes. My implementation of exact value uses a hash table with a double hashed linked list. at best it has an efficiency of O(n) and at worst its efficiency is O(1). The B plus tree, which has an efficiency of (O log(N)) is used for indexing.

Deletions:
-My deletion first has to delete from all hash tables, and due to my linked list setup, this takes around O(n) time to complete, next it deletes all the indexes and values from the b+ trees, which due to the nature of the B+ tree, this takes (O log(n)) time.


Hash function design choices:
-I chose to implement the best hash that I hav made within my hashing functions homework. It uses a double hashed linked list. I used this for my database because it had minimal collisions and good time efficiency in my other homework.

BPlus tree implementation approach:
-In my B plus tree, I took the one I made for homework, and added object oriented implementation for all of the properties in the dataItems.
-I refactored the tree to be bottom up when making the bulk insert for efficiency. that way it does not have to rotate, merge or steal. as well as keeping the buckets only 3/4ths fill, lowers the amount of merges and steals that have to happen.

Searching Choices:
-I made only the Title and Quotes searchable, because that is what makes sense to me, as although movies that share the same title are more common than ones that contain exactly the same quotes, most people look up what movie they are searching for by the title or phrases in the movie. I did not make the director searchable, because directors make tend to have multiple movies, and not many people look up movies by director. The fields that I did not make searchable are the numeric ones, such as rating, revanue, duration. But also the fields in which multiple movies share, like genre, director, production company, I chose to have those indexable as well.

Limitations and Issues:
-If indexing a large amount of dataItems, the writing to file feature can handle it no problem, however the deletion can sometimes crash the program. I am not too sure how to fix this, but deleting one or a few at a time works fine.

About

This is the final project for my Data structures and Algorithms class. It uses B+ trees and A hash table to sort bulk items.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages