Graph Algorithms and Visualization

This is a working graph visualization and algorithms program that computes the results of common graph algorithms and renders images of selected graphs. It uses the Graphviz library for C++ to generate the graph images, but I implemented the other algorithms myself. The directed and undirected versions each produce four graph images. Each image describes the operation performed and draws the graph, with any resulting paths highlighted in red. The results of the remaining algorithms are printed to the console, and the program supports most standard graph algorithms.
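The path-highlighting idea can be illustrated with a short sketch. This is not the project's C++ code: it is a minimal Python version, assuming an unweighted graph given as an edge list, that finds a shortest path with breadth-first search and then writes Graphviz DOT text with the path edges colored red. The helper names `build_adj`, `bfs_path`, and `to_dot` are hypothetical.

```python
from collections import deque


def build_adj(edges, directed=True):
    """Build an adjacency list from (u, v) pairs (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        if not directed:
            adj.setdefault(v, []).append(u)
    return adj


def bfs_path(adj, src, dst):
    """Return a path with the fewest edges from src to dst, or None if unreachable."""
    prev = {src: None}  # predecessor of each discovered node
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in adj.get(node, []):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    if dst not in prev:
        return None
    # Walk predecessors back from dst, then reverse to get src -> dst.
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def to_dot(edges, path=None, directed=True):
    """Render an edge list as Graphviz DOT text, coloring path edges red."""
    kind, arrow = ("digraph", "->") if directed else ("graph", "--")
    on_path = set()
    if path:
        for u, v in zip(path, path[1:]):
            on_path.add((u, v))
            if not directed:
                on_path.add((v, u))  # undirected edges may be listed either way round
    lines = [f"{kind} G {{"]
    for u, v in edges:
        attr = " [color=red]" if (u, v) in on_path else ""
        lines.append(f'  "{u}" {arrow} "{v}"{attr};')
    lines.append("}")
    return "\n".join(lines)
```

The DOT text could then be rendered with Graphviz's `dot` command-line tool (for example `dot -Tpng`). Note that BFS yields shortest paths only for unweighted graphs; weighted graphs would need an algorithm such as Dijkstra's instead.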

Distributed Analytics of US Residential Zoning

This project performs distributed analytics on a spatial dataset using a compute cluster. Our goal was to analyze the impact of single-family residential zoning in the US and correlate it with quality-of-life measures, in an effort to discourage segregation by zoning type and promote inclusivity. We had hoped to compare the results against data from other countries with more inclusive zoning laws, but this was not possible due to limited data availability and language barriers. For the distributed component, we used a cluster of 10 machines managed by YARN. To process the data and run the calculations, we used Spark with Java, built with Gradle. The data was stored in HDFS and totaled ~3.2 GB. For more detail on our motivation, procedures, project structure, and results, please refer to the LaTeX file or the presentation in the GitHub repo.
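The LaTeX file and presentation describe the project's actual methodology. As a hedged illustration only, the sketch below assumes Pearson's correlation coefficient and shows why it fits a distributed setting: each partition reduces to six additive sums (n, Σx, Σy, Σx², Σy², Σxy) that can be merged in any order, much as a Spark reduce combines per-partition results. It is plain Python rather than Spark, and the function names are hypothetical.

```python
import math
from functools import reduce


def partial_sums(pairs):
    """Summarize one partition of (x, y) pairs as (n, Σx, Σy, Σx², Σy², Σxy)."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    return (n, sx, sy, sxx, syy, sxy)


def merge(a, b):
    """Combine two partition summaries; the sums are additive, so order doesn't matter."""
    return tuple(p + q for p, q in zip(a, b))


def pearson_from_sums(s):
    """Pearson's r computed from the merged sums."""
    n, sx, sy, sxx, syy, sxy = s
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0 or var_y <= 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)
```

Usage would look like `pearson_from_sums(reduce(merge, map(partial_sums, partitions)))`, where `partitions` is a hypothetical list of per-partition (zoning share, quality-of-life measure) pairs. The single-pass sum formula can lose precision on large values; a production job might instead merge per-partition means and variances.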

Analysis of the MovieLens Dataset using Apache Spark

This project was an introduction to using Apache Spark to analyze a large file (~800 MB), namely the MovieLens dataset, which contains movies, genres, ratings, and related data. The files were stored in HDFS, and the cluster consisted of 10 machines. A single Java file contains 7 Spark jobs that answer the 7 questions listed on GitHub.
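As a hypothetical example of the kind of question such a job can answer, the sketch below computes the average rating per genre. It is plain Python standing in for the Java Spark code, and it assumes the CSV layout used by newer MovieLens releases: `movies.csv` (`movieId,title,genres`, with genres separated by `|`) and `ratings.csv` (`userId,movieId,rating,timestamp`). The function name is illustrative.

```python
import csv
import io
from collections import defaultdict


def avg_rating_by_genre(movies_csv: str, ratings_csv: str) -> dict:
    """Average rating per genre (a hypothetical example question)."""
    # movieId -> list of genres, e.g. "Adventure|Animation" -> ["Adventure", "Animation"]
    genres = {}
    for row in csv.DictReader(io.StringIO(movies_csv)):
        genres[row["movieId"]] = row["genres"].split("|")

    # Join each rating with its movie's genres and accumulate (sum, count) per genre,
    # the same shape of aggregation a Spark join followed by reduceByKey would perform.
    totals = defaultdict(lambda: [0.0, 0])
    for row in csv.DictReader(io.StringIO(ratings_csv)):
        for genre in genres.get(row["movieId"], []):
            totals[genre][0] += float(row["rating"])
            totals[genre][1] += 1

    return {g: s / c for g, (s, c) in totals.items()}
```

Carrying (sum, count) pairs rather than averages is the usual design choice here, because partial averages cannot be combined correctly without their counts.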

Analysis of the Million Song Dataset using Hadoop MapReduce

This project was an introduction to using Hadoop MapReduce to analyze a large file (~1.6 GB), namely the Million Song subset containing 10,000 songs. The files were stored in HDFS, and the cluster consisted of 10 machines. There are 10 Java files, each with its own job that answers one of the project's 10 questions. Please visit the GitHub repo for more details on the questions, answers, and other aspects of the project.
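Each Hadoop job follows the same map, shuffle/sort, and reduce pattern. The following Python sketch simulates that pattern in memory for a hypothetical question (which artists have the most songs), assuming tab-separated input lines whose first column is the artist name. The column layout, function names, and question are illustrative assumptions, not the project's actual Java code or data format.

```python
from itertools import groupby
from operator import itemgetter

ARTIST_COL = 0  # hypothetical: assumed column index of the artist name


def mapper(line):
    """Emit (artist, 1) for each non-empty record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > ARTIST_COL and fields[ARTIST_COL]:
        yield fields[ARTIST_COL], 1


def reducer(key, values):
    """Sum the counts emitted for one key."""
    yield key, sum(values)


def run_job(lines):
    # Map phase: apply the mapper to every input line.
    pairs = [kv for line in lines for kv in mapper(line)]
    # Shuffle/sort phase: Hadoop sorts map output by key so each reducer sees a key's values together.
    pairs.sort(key=itemgetter(0))
    out = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        out.extend(reducer(key, (v for _, v in group)))
    return out


def top_k(counts, k):
    """Highest counts first, ties broken alphabetically."""
    return sorted(counts, key=lambda kv: (-kv[1], kv[0]))[:k]
```

In real Hadoop the framework performs the sort and grouping between the mapper and reducer; the `sort` plus `groupby` here only stands in for that step.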

CS220: Discrete Structures and their Applications

Practiced using and implementing integer representations and properties, propositions, predicates, sets, functions, program proofs, induction, counting, complexity, graphs, trees, and invariants. Lectures were accompanied by Python assignments and labs.

CS165: Data Structures

Studied and used various data structures, algorithms, and interfaces in Java, such as arrays, lists, queues, stacks, BSTs, expression trees, B+ trees, graphs, hash tables, and iterables.