Using Dijkstra to Route Packages in a Network Overlay
The goal of this project was to build an overlay in which messaging nodes (clients) and the Registry (server) exchange messages using threads. It follows the principles of TCP/IP and of the overlays used in P2P systems. The Registry accepts messaging nodes on a server socket and creates receiver and sender threads for each connection. The messaging nodes also have their own sets of threads for the Registry and their peers. The foreground process in the Registry lets the user issue commands that create the undirected graph of all registered nodes, connect peers to form bidirectional connections, and start sending messages. When all nodes finish messaging each other randomly, the statistics are collected and compared to check whether any packets were lost during the run.
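As an illustration of the routing step named in the title, here is a minimal sketch of Dijkstra's algorithm over an undirected, weighted overlay graph. It is not the project's actual code: the class name OverlayRouting, the string node IDs, and the integer link weights are all assumptions made for the example, since the description does not say how nodes are identified or how link weights are assigned.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: names and weight model are assumptions, not the project's API.
public class OverlayRouting {
    // Undirected weighted graph: node ID -> (neighbor ID -> link weight).
    private final Map<String, Map<String, Integer>> links = new HashMap<>();

    // Registers a bidirectional link between two nodes.
    public void addLink(String a, String b, int weight) {
        if (weight < 0) {
            throw new IllegalArgumentException("Dijkstra requires non-negative weights");
        }
        links.computeIfAbsent(a, k -> new HashMap<>()).put(b, weight);
        links.computeIfAbsent(b, k -> new HashMap<>()).put(a, weight);
    }

    // Returns the lowest-cost route from source to sink, or an empty list if none exists.
    public List<String> shortestPath(String source, String sink) {
        Map<String, Integer> dist = new HashMap<>();
        Map<String, String> prev = new HashMap<>();
        PriorityQueue<Map.Entry<String, Integer>> queue =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());

        dist.put(source, 0);
        queue.add(new AbstractMap.SimpleEntry<>(source, 0));

        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> current = queue.poll();
            String node = current.getKey();
            int cost = current.getValue();
            // Skip stale queue entries that were superseded by a cheaper route.
            if (cost > dist.getOrDefault(node, Integer.MAX_VALUE)) {
                continue;
            }
            if (node.equals(sink)) {
                break;
            }
            Map<String, Integer> neighbors = links.getOrDefault(node, Collections.emptyMap());
            for (Map.Entry<String, Integer> edge : neighbors.entrySet()) {
                int candidate = cost + edge.getValue();
                if (candidate < dist.getOrDefault(edge.getKey(), Integer.MAX_VALUE)) {
                    dist.put(edge.getKey(), candidate);
                    prev.put(edge.getKey(), node);
                    queue.add(new AbstractMap.SimpleEntry<>(edge.getKey(), candidate));
                }
            }
        }

        if (!dist.containsKey(sink)) {
            return Collections.emptyList();
        }
        // Walk the predecessor chain back from the sink to rebuild the route.
        LinkedList<String> path = new LinkedList<>();
        for (String at = sink; at != null; at = prev.get(at)) {
            path.addFirst(at);
        }
        return path;
    }
}
```

In a setup like the one described, each messaging node could compute such a route once the overlay is built and forward a message hop by hop along the returned list. Whether the project computes routes per node or centrally in the Registry is not stated here.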
Distributed_Analytics_of_US_Residential_Zoning
This project performs distributed analytics on a spatial dataset using a cluster. Our goal was to analyze the impact of single-family residential zoning in the US and correlate it with quality-of-life measures, in an effort to discourage the segregation of zoning types and promote inclusivity. We hoped to compare the results against data from other countries that have more inclusive zoning laws, but this was not possible due to constraints on data availability and language barriers. For the distributed component, we used a cluster of 10 machines managed by YARN. To process the data and run the calculations, we used Spark with Java and Gradle. The data itself was stored in HDFS and totaled ~3.2 GB. For more detail on our motivation, procedures, project structure, and results, please refer to the LaTeX file or the presentation in the GitHub repo.
Analysis of the MovieLens Dataset using Apache Spark
This project was an introduction to using Apache Spark to analyze a large file (~800 MB), namely the MovieLens dataset, which contains movies, genres, ratings, etc. The files were stored in HDFS and the cluster consisted of 10 machines. A single Java file contains 7 Spark jobs, which answer the 7 questions that can be found on GitHub.
CS455: Introduction to Distributed Systems
Covered fundamental ideas and issues in building distributed systems. Examined issues related to concurrent programming, thread pools and safety, non-blocking I/O, scalable server design, file system design, distributed mutual exclusion and deadlock detection, consensus and consistency, pipelining schemes, distributed graph algorithms, distributed shared memory, distributed objects, and MapReduce.