Showing 7 changed files with 122 additions and 63 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,8 +4,6 @@ __pycache__/ | |
*$py.class | ||
.DS_Store | ||
|
||
# module | ||
nodemon.json | ||
|
||
# Cif files | ||
20240531_ternary_binary_combined/ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
{ | ||
"proseWrap": "always", | ||
"printWidth": 80 | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,16 +2,25 @@ | |
|
||
![Header](https://s9.gifyu.com/images/SViLp.png) | ||
|
||
[![Integration tests](https://github.com/bobleesj/cif-bond-analyzer/actions/workflows/python-run-pytest.yml/badge.svg)](https://github.com/bobleesj/cif-bond-analyzer/actions/workflows/python-run-pytest.yml) ![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg) ![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg) ![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg) | ||
[![Integration tests](https://github.com/bobleesj/cif-bond-analyzer/actions/workflows/python-run-pytest.yml/badge.svg)](https://github.com/bobleesj/cif-bond-analyzer/actions/workflows/python-run-pytest.yml) | ||
![Python 3.10](https://img.shields.io/badge/python-3.10-blue.svg) | ||
![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg) | ||
![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg) | ||
|
||
The CIF Bond Analyzer (CBA) is an interactive, command-line-based application designed for high-throughput extraction of bonding information from CIF (Crystallographic Information File) files. CBA offers Site Analysis, System Analysis for binary/ternary systems, and Coordination Analysis. The outputs are saved in `.json`, `.xlsx`, and `.png` formats. | ||
The CIF Bond Analyzer (CBA) is an interactive, command-line-based application | ||
designed for high-throughput extraction of bonding information from CIF | ||
(Crystallographic Information File) files. CBA offers Site Analysis, System | ||
Analysis for binary/ternary systems, and Coordination Analysis. The outputs are | ||
saved in `.json`, `.xlsx`, and `.png` formats. | ||
|
||
The current README.md serves as a tutorial and documentation. | ||
|
||
## Value | ||
|
||
CBA simplifies crystal structure analysis by automating the extraction of minimum bond lengths, which are crucial for understanding geometric configurations and identifying irregularities. Histograms and figures assist in identifying distinct bond lengths and structural patterns. | ||
|
||
CBA simplifies crystal structure analysis by automating the extraction of | ||
minimum bond lengths, which are crucial for understanding geometric | ||
configurations and identifying irregularities. Histograms and figures assist in | ||
identifying distinct bond lengths and structural patterns. | ||
|
||
## Demo | ||
|
||
|
@@ -30,7 +39,8 @@ $ pip install -r requirements.txt | |
$ python main.py | ||
``` | ||
|
||
Once the code is executed using `python main.py`, the following prompt will appear, asking you to choose one of the three analysis options: | ||
Once the code is executed using `python main.py`, the following prompt will | ||
appear, asking you to choose one of the three analysis options: | ||
|
||
```text | ||
Welcome! Please choose an option to proceed: | ||
|
@@ -55,51 +65,76 @@ Would you like to process each folder above sequentially? | |
(Default: Y) [Y/n]: | ||
``` | ||
|
||
You may then choose to process folders either sequentially or select specific folders by entering numbers associated with the folders prompted. | ||
For each folder, CBA generates site pair data saved in `site_pairs.json` or `site_pairs.xlsx`. | ||
You may then choose to process folders either sequentially or select specific | ||
folders by entering numbers associated with the folders prompted. For each | ||
folder, CBA generates site pair data saved in `site_pairs.json` or | ||
`site_pairs.xlsx`. | ||
|
||
## Preprocess | ||
|
||
The following discusses formatting, supercell generation, and atomic mixing information. | ||
The following discusses formatting, supercell generation, and atomic mixing | ||
information. | ||
|
||
### 1. Format files | ||
|
||
CBA uses the `CifEnsemble` object from `cifkit` to conduct preprocessing automatically. | ||
CBA uses the `CifEnsemble` object from `cifkit` to conduct preprocessing | ||
automatically. | ||
|
||
- CBA standardizes the site labels in `atom_site_label`. Some site labels may contain a comma or a symbol such as `M` due to atomic mixing. CBA reformats each `atom_site_label` so it can be parsed into an element type that matches `atom_site_type_symbol`. | ||
- CBA standardizes the site labels in `atom_site_label`. Some site labels may | ||
contain a comma or a symbol such as `M` due to atomic mixing. CBA reformats | ||
each `atom_site_label` so it can be parsed into an element type that matches | ||
`atom_site_type_symbol`. | ||
|
||
- CBA removes the content of `publ_author_address`. This section often has an incorrect format that otherwise requires manual modifications. | ||
- CBA removes the content of `publ_author_address`. This section often has an | ||
incorrect format that otherwise requires manual modifications. | ||
|
||
- CBA relocates any ill-formatted files, such as those with duplicate labels in `atom_site_label`, missing fractional coordinates, or files that require supercell generation. | ||
- CBA relocates any ill-formatted files, such as those with duplicate labels in | ||
`atom_site_label`, missing fractional coordinates, or files that require | ||
supercell generation. | ||
|
||
### 2. Supercell generation | ||
|
||
For each `.cif` file, a unit cell is generated by applying the symmetry operations. A supercell is generated by applying ±1 shifts from the unit cell. | ||
For each `.cif` file, a unit cell is generated by applying the symmetry | ||
operations. A supercell is generated by applying ±1 shifts from the unit cell. | ||
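
The ±1 shift step can be illustrated with a minimal sketch. It assumes that "±1 shifts" means translating every unit-cell atom by -1, 0, or +1 along each fractional axis (a 3 × 3 × 3 block); the function name `make_supercell` and the input format are hypothetical and are not taken from CBA or `cifkit`.

```python
from itertools import product


def make_supercell(unit_cell_frac_coords):
    """Translate each fractional coordinate by -1, 0, +1 along every axis.

    Assumption: the supercell is the 3 x 3 x 3 block of cells surrounding
    (and including) the original unit cell.
    """
    supercell = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        for x, y, z in unit_cell_frac_coords:
            supercell.append((x + dx, y + dy, z + dz))
    return supercell


# Two illustrative atoms in fractional coordinates (made-up values).
unit_cell = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
supercell = make_supercell(unit_cell)
print(len(supercell))  # 27 translations x 2 atoms = 54
```

Under this assumption, every atom in the unit cell has all of its periodic images within one cell length available for distance calculations.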
|
||
### 3. Atomic mixing info | ||
|
||
Each bonding pair is defined with one of four atomic mixing categories: | ||
|
||
- **Full occupancy** is assigned when a single atomic site occupies the fractional coordinate with an occupancy value of 1. | ||
- **Full occupancy with mixing** is assigned when multiple atomic sites collectively occupy the fractional coordinate to a sum of 1. | ||
- **Deficiency without mixing** is assigned when a single atomic site occupies the fractional coordinate with an occupancy value less than 1. | ||
- **Deficiency with atomic mixing** is assigned when multiple atomic sites occupy the fractional coordinate with a sum less than 1. | ||
- **Full occupancy** is assigned when a single atomic site occupies the | ||
fractional coordinate with an occupancy value of 1. | ||
- **Full occupancy with mixing** is assigned when multiple atomic sites | ||
collectively occupy the fractional coordinate to a sum of 1. | ||
- **Deficiency without mixing** is assigned when a single atomic site occupies | ||
the fractional coordinate with an occupancy value less than 1. | ||
- **Deficiency with atomic mixing** is assigned when multiple atomic sites | ||
occupy the fractional coordinate with a sum less than 1. | ||
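
The four categories can be expressed as a small decision rule. The sketch below classifies a single fractional coordinate from the occupancy values of the sites that share it; the function name `classify_site`, the tolerance, and the returned strings are assumptions for illustration, and how CBA combines two sites into a category for a bonding pair is not shown here.

```python
def classify_site(occupancies, tol=1e-3):
    """Classify one fractional coordinate from the occupancies sharing it.

    `occupancies` is a list with one value per atomic site located at the
    coordinate. The tolerance `tol` is an assumed value, not CBA's.
    """
    is_full = sum(occupancies) >= 1.0 - tol
    is_mixed = len(occupancies) > 1
    if is_full and not is_mixed:
        return "full occupancy"
    if is_full and is_mixed:
        return "full occupancy with mixing"
    if not is_mixed:
        return "deficiency without mixing"
    return "deficiency with atomic mixing"


print(classify_site([1.0]))       # full occupancy
print(classify_site([0.6, 0.4]))  # full occupancy with mixing
print(classify_site([0.8]))       # deficiency without mixing
print(classify_site([0.5, 0.3]))  # deficiency with atomic mixing
```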
|
||
## Analysis Options | ||
|
||
CBA provides three options for analysis. | ||
|
||
### Option 1. Site Analysis | ||
|
||
- **Purpose:** Site Analysis determines the shortest distance and its nearest neighbor for each label in `atom_site_label`. | ||
- **Purpose:** Site Analysis determines the shortest distance and its nearest | ||
neighbor for each label in `atom_site_label`. | ||
|
||
- **Process:** For each atom in the unit cell, Euclidean distances are calculated from the atom to all atoms in the supercell. The position of the atom in the unit cell for each site label is determined based on the atom with the greatest number of shortest distances to its neighbors. | ||
- **Process:** For each atom in the unit cell, Euclidean distances are | ||
calculated from the atom to all atoms in the supercell. The position of the | ||
atom in the unit cell for each site label is determined based on the atom with | ||
the greatest number of shortest distances to its neighbors. | ||
|
||
- **Example:** Suppose a `.cif` file contains four site labels under `atom_site_label`: `Er1`, `Er2`, `Er3`, and `Er4`. The bonding pair from the site label `Er4` to its nearest neighbor `Er2` is unique and recorded. The bonding pair from `Er3` to `Er2` is also considered unique. However, the pairs `Er4-Er2` and `Er2-Er4` are considered identical, so only the shorter of the two is recorded. | ||
- **Example:** Suppose a `.cif` file contains four site labels under | ||
`atom_site_label`: `Er1`, `Er2`, `Er3`, and `Er4`. The bonding pair from the | ||
site label `Er4` to its nearest neighbor `Er2` is unique and recorded. The | ||
bonding pair from `Er3` to `Er2` is also considered unique. However, the pairs | ||
`Er4-Er2` and `Er2-Er4` are considered identical, so only the shorter of the | ||
two is recorded. | ||
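
The rule in the example (treat `Er4-Er2` and `Er2-Er4` as the same pair and keep the shorter distance) can be sketched as follows. The function name `shortest_unique_pairs`, the record format, and the distances are hypothetical illustrations, not CBA's actual data structures or output.

```python
def shortest_unique_pairs(nearest_neighbors):
    """Keep one entry per unordered label pair, with the shortest distance.

    `nearest_neighbors` is an iterable of (site_label, neighbor_label,
    distance) tuples, one per site label.
    """
    best = {}
    for site, neighbor, distance in nearest_neighbors:
        # Sorting the labels makes Er4-Er2 and Er2-Er4 share one key.
        key = tuple(sorted((site, neighbor)))
        if key not in best or distance < best[key]:
            best[key] = distance
    return best


# Made-up distances, for illustration only.
records = [
    ("Er4", "Er2", 3.42),
    ("Er2", "Er4", 3.45),
    ("Er3", "Er2", 3.51),
]
pairs = shortest_unique_pairs(records)
print(pairs)  # {('Er2', 'Er4'): 3.42, ('Er2', 'Er3'): 3.51}
```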
|
||
#### Output 1.1 Excel and JSON | ||
|
||
Data for each folder is saved in `site_pairs.json` or `site_pairs.xlsx`. Below is an example of the JSON structure for bond pairs: | ||
Data for each folder is saved in `site_pairs.json` or `site_pairs.xlsx`. Below | ||
is an example of the JSON structure for bond pairs: | ||
|
||
```json | ||
{ | ||
|
@@ -133,7 +168,8 @@ Data for each folder is saved in `site_pairs.json` or `site_pairs.xlsx`. Below i | |
} | ||
``` | ||
|
||
The minimum bond pair for each file is saved in `element_pairs.json` and `element_pairs.xlsx`. | ||
The minimum bond pair for each file is saved in `element_pairs.json` and | ||
`element_pairs.xlsx`. | ||
|
||
```json | ||
{ | ||
|
@@ -166,7 +202,8 @@ Here is a screenshot of `element_pairs.xlsx`. | |
|
||
#### Output 1.2 text summary | ||
|
||
A summary text file, `summary_element.txt`, lists the shortest bonding pairs and identifies missing pairs across selected folders: | ||
A summary text file, `summary_element.txt`, lists the shortest bonding pairs and | ||
identifies missing pairs across selected folders: | ||
|
||
```txt | ||
Summary: | ||
|
@@ -194,17 +231,22 @@ Fe-Co | |
|
||
#### Output 1.3 histograms | ||
|
||
`histogram_element_pair.png` and `histogram_site_pair.png` are used to visualize data, with colors indicating atomic mixing types. | ||
`histogram_element_pair.png` and `histogram_site_pair.png` are used to visualize | ||
data, with colors indicating atomic mixing types. | ||
|
||
- To modify the x-axis, run `python plot-histogram.py`. This script allows you to interactively specify parameters such as the bin width and x-axis range: | ||
- To modify the x-axis, run `python plot-histogram.py`. This script allows you | ||
to interactively specify parameters such as the bin width and x-axis range: | ||
|
||
![Histograms for label pair](https://s9.gifyu.com/images/SViMv.png) | ||
|
||
### Option 2. System Analysis | ||
|
||
- **Purpose:** System Analysis provides an overview of bond fractions acquired from Option 1: Site Analysis, or bond fractions in coordination number geometries. | ||
- **Purpose:** System Analysis provides an overview of bond fractions acquired | ||
from Option 1: Site Analysis, or bond fractions in coordination number | ||
geometries. | ||
|
||
- **Scope:** System Analysis is applicable for folders containing either 2 or 3 unique elements. | ||
- **Scope:** System Analysis is applicable for folders containing either 2 or 3 | ||
unique elements. | ||
|
||
4 types of folders are applicable for System Analysis. | ||
|
||
|
@@ -213,7 +255,6 @@ Fe-Co | |
- Type 3. Ternary files, 3 unique elements | ||
- Type 4. Ternary and binary combined, 3 unique elements | ||
|
||
|
||
Here is an example of CBA detecting folders containing 2 or 3 unique elements. | ||
|
||
````` | ||
|
@@ -227,14 +268,14 @@ Available folders containing 2 or 3 unique elements: | |
|
||
#### Output 2.1 Binary/ternary figures | ||
|
||
|
||
For Types 2, 3, and 4: | ||
|
||
![ternary](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/7496f433-c218-49ac-8372-cb75a369e409) | ||
|
||
To customize the legend position in the ternary diagram, you may modify the values of `X_SHIFT = 0.0` and `Y_SHIFT = 0.0` in `core/configs/ternary.py`. | ||
To customize the legend position in the ternary diagram, you may modify the | ||
values of `X_SHIFT = 0.0` and `Y_SHIFT = 0.0` in `core/configs/ternary.py`. | ||
|
||
For Type 1: | ||
For Type 1: | ||
|
||
![binary_single](https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/21f25fb3-79ea-4cd1-931d-ad5b3ea55189) | ||
|
||
|
@@ -256,20 +297,25 @@ Bond count per each `cif` file is recorded in `system_analysis_files.xlsx`. | |
|
||
<img width="753" alt="SA_main" src="https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/024b9f0f-5a5f-43ae-8e70-86031db9d26a"> | ||
|
||
Average bond lengths, count, and statistical values are recorded in `system_analysis_main.xlsx`. | ||
Average bond lengths, count, and statistical values are recorded in | ||
`system_analysis_main.xlsx`. | ||
|
||
<img width="1025" alt="SA_file" src="https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/420193ec-081a-4df2-b56e-9cddcefa00cb"> | ||
|
||
|
||
### Option 3. Coordination Analysis | ||
|
||
- **Purpose:** This option determines the best coordination geometry using four methods provided in `cifkit`. Excel files and JSON are saved with nearest neighbor information. | ||
- **Purpose:** This option determines the best coordination geometry using four | ||
methods provided in `cifkit`. Excel files and JSON are saved with nearest | ||
neighbor information. | ||
|
||
- **Customization:** The Excel file contains `Δ`, which is defined as the interatomic distance minus the sum of the atomic radii. You may provide your own radii values by modifying the `radii.xlsx` file. | ||
- **Customization:** The Excel file contains `Δ`, which is defined as the | ||
interatomic distance minus the sum of the atomic radii. You may provide your | ||
own radii values by modifying the `radii.xlsx` file. | ||
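
As a worked illustration of `Δ`, the sketch below subtracts the sum of two atomic radii from an interatomic distance. The function name `delta` and the numbers are hypothetical; the layout of `radii.xlsx` is not assumed here.

```python
def delta(distance, radius_a, radius_b):
    """Return the interatomic distance minus the sum of the two atomic radii."""
    return distance - (radius_a + radius_b)


# Made-up values: a 3.0 distance between atoms with radii 1.5 and 1.4.
value = delta(3.0, 1.5, 1.4)
print(round(value, 3))  # 0.1
```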
|
||
#### Output 3.1 JSON | ||
|
||
For each site, the nearest neighbors within the coordination number geometry are recorded in `CN_connections.json`. | ||
For each site, the nearest neighbors within the coordination number geometry are | ||
recorded in `CN_connections.json`. | ||
|
||
```python | ||
{ | ||
|
@@ -311,7 +357,8 @@ For each site, the nearest neighbors within the coordination number geometry are | |
|
||
#### Output 3.2 Excel | ||
|
||
For each `.cif` file, the nearest neighbor information is written in each sheet within `CN_connections.xlsx`. | ||
For each `.cif` file, the nearest neighbor information is written in each sheet | ||
within `CN_connections.xlsx`. | ||
|
||
<img width="842" alt="CN_excel" src="https://github.com/bobleesj/cif-bond-analyzer/assets/14892262/6322cacf-5ab0-4855-90e3-56aaddf6ab1f"> | ||
|
||
|
@@ -347,9 +394,16 @@ Please feel free to reach out via [email protected] for any questions. | |
|
||
## Changelog | ||
|
||
- 20240623 - Implement CN bond fractions, add GitHub CI. See [Pull #22](https://github.com/bobleesj/cif-bond-analyzer/pull/22). | ||
- 20240330 - Add sequential folder processing and customizable histogram generation. See [Pull #16](https://github.com/bobleesj/cif-bond-analyzer/pull/16). | ||
- 20240311 - Integrate PEP8 linting with `black`. See [Pull #12](https://github.com/bobleesj/cif-bond-analyzer/pull/12). | ||
- 20240310 - Enhance output options to include both element-based and label-based data for Excel, JSON, and histograms. See [Pull #11](https://github.com/bobleesj/cif-bond-analyzer/pull/11). | ||
- 20240301 - Display atom counts and execution time per file in Terminal; add CSV logging. | ||
- 20240623 - Implement CN bond fractions, add GitHub CI. See | ||
[Pull #22](https://github.com/bobleesj/cif-bond-analyzer/pull/22). | ||
- 20240330 - Add sequential folder processing and customizable histogram | ||
generation. See | ||
[Pull #16](https://github.com/bobleesj/cif-bond-analyzer/pull/16). | ||
- 20240311 - Integrate PEP8 linting with `black`. See | ||
[Pull #12](https://github.com/bobleesj/cif-bond-analyzer/pull/12). | ||
- 20240310 - Enhance output options to include both element-based and | ||
label-based data for Excel, JSON, and histograms. See | ||
[Pull #11](https://github.com/bobleesj/cif-bond-analyzer/pull/11). | ||
- 20240301 - Display atom counts and execution time per file in Terminal; add | ||
CSV logging. | ||
- 20240229 - Expand file support to include all CIF files. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
{ | ||
"exec": "ruff format && npx prettier --write **/*.md && python -m pytest", | ||
"ext": "py", | ||
"ignore": ["*.pyc", "*__pycache__*"], | ||
"watch": ["*.*"] | ||
} | ||
|