This project demonstrates the use of Apache Spark for scalable and efficient post-processing of molecular dynamics (MD) simulation data. It focuses on calculating the Mass Accommodation Coefficient (MAC) to analyze phase-change phenomena such as evaporation and boiling.
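The MAC is commonly defined as the fraction of vapor molecules striking the liquid surface that are absorbed rather than reflected. A toy calculation of that ratio (the counts below are illustrative placeholders, not values from this repository's datasets):

```python
# Toy MAC calculation. The counts are illustrative placeholders,
# not results from this repository's simulations.
n_incident = 1000   # molecules that struck the liquid-vapor interface
n_absorbed = 620    # molecules absorbed into the liquid phase

# Mass accommodation coefficient: absorbed / incident
mac = n_absorbed / n_incident
print(mac)  # 0.62
```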
- Parallel processing of large MD trajectory datasets with PySpark (the Python API for Apache Spark).
- Performance benchmarking against a sequential implementation.
- Support for periodic boundary conditions and large-scale molecule trajectories.
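Handling periodic boundary conditions typically means mapping raw coordinate differences back into the simulation box. A minimal sketch of the minimum-image convention, assuming an orthorhombic box (the box length here is illustrative):

```python
# Minimum-image displacement under periodic boundary conditions.
# Assumes an orthorhombic box; the box length below is illustrative.
def minimum_image(dx: float, box_length: float) -> float:
    """Map a raw coordinate difference into [-L/2, L/2)."""
    return dx - box_length * round(dx / box_length)

# A molecule that appears to jump 9.2 units in a 10.0-unit box
# actually moved -0.8 units across the periodic boundary.
disp = minimum_image(9.2, 10.0)
print(disp)  # approximately -0.8
```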
MD simulations generate vast datasets whose post-processing demands significant computational resources. This project leverages Spark's in-memory distributed processing to reduce runtime and improve efficiency.
- Sequential Code: A MATLAB implementation serving as the performance baseline.
- Parallel Code: A PySpark implementation that distributes the analysis across Spark workers.
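For comparison, the sequential baseline (MATLAB in the repository) amounts to a plain loop over the same records; this Python sketch uses the same assumed record layout and interface height as above:

```python
# Hypothetical sequential counterpart to the parallel pipeline:
# a single loop over trajectory records (frame_id, molecule_id, z_position).
records = [(0, 1, 2.5), (0, 2, 7.1), (1, 1, 6.8), (1, 2, 7.5)]
Z_INTERFACE = 5.0  # assumed interface height

counts = {}
for frame_id, _mol_id, z in records:
    if z > Z_INTERFACE:
        counts[frame_id] = counts.get(frame_id, 0) + 1
print(counts)  # {0: 1, 1: 2}
```

The loop body is identical work; the difference is that Spark shards the records across cores or nodes instead of visiting them one at a time.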
- Achieved up to a 4x runtime speedup over the sequential baseline with optimized code.
- Benchmarked performance on datasets of up to 3 GB.
- Performance was limited by the lack of an HDFS cluster and reliance on the single-node Databricks Community Edition.
- Clone the repository:
git clone https://github.com/swargo98/Scalable-Computation-of-Molecular-Trajectory-Analysis.git
- Install dependencies:
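The exact dependency list is not stated here; at minimum the parallel code needs PySpark, which can be installed with pip (check the repository for an authoritative list):

```shell
# Assumed dependencies -- consult the repository for the full list.
pip install pyspark numpy
```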
- Adjust the sbatch file (paths, partition, resources) for your cluster before submitting the job.
- Submit the job with your trajectory data using:
sbatch your_script.sbatch
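A minimal sbatch script might look like the following; the job name, resource sizes, script name, and data path are all placeholders to adapt to your cluster, not values from the repository:

```shell
#!/bin/bash
#SBATCH --job-name=md-postproc
#SBATCH --nodes=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
#SBATCH --time=02:00:00
#SBATCH --output=md_postproc_%j.log

# Placeholder script and data path -- adjust to your setup.
module load python   # if your cluster uses environment modules

# Run Spark locally on the allocated cores of this node.
spark-submit --master "local[${SLURM_CPUS_PER_TASK}]" \
    parallel_code.py /path/to/trajectory_data
```

Running Spark in `local[N]` mode mirrors the single-node setup described above; a multi-node Spark cluster would instead point `--master` at a cluster manager.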