Skip to content

Commit

Permalink
Merge pull request #58 from bobleesj/review-doc
Browse files Browse the repository at this point in the history
Use GitHub CI to conduct GitHub release, PyPI upload, CHANGELOG update
  • Loading branch information
bobleesj authored Oct 28, 2024
2 parents e523e1a + 3911701 commit f798ea7
Show file tree
Hide file tree
Showing 16 changed files with 319 additions and 74 deletions.
22 changes: 0 additions & 22 deletions .codespell/ignore_lines.txt

This file was deleted.

11 changes: 0 additions & 11 deletions .codespell/ignore_words.txt

This file was deleted.

2 changes: 0 additions & 2 deletions .github/workflows/check-news-item.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,6 @@ name: Check for News

on:
pull_request_target:
branches:
- main

jobs:
build:
Expand Down
6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -33,3 +33,9 @@ repos:
name: Prevent Commit to Main Branch
args: ["--branch", "main"]
stages: [pre-commit]
- repo: https://github.com/codespell-project/codespell
rev: v2.2.4
hooks:
- id: codespell
additional_dependencies:
- tomli
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

[![CI](https://github.com/bobleesj/cifkit/actions/workflows/tests-on-pr.yml/badge.svg?branch=main)](https://github.com/bobleesj/cifkit/actions/workflows/tests-on-pr.yml)
[![codecov](https://codecov.io/gh/bobleesj/cifkit/graph/badge.svg?token=AN2YAC337A)](https://codecov.io/gh/bobleesj/cifkit)
![Python - Version](https://img.shields.io/pypi/pyversions/quacc)
![Python - Version](https://img.shields.io/pypi/pyversions/cifkit)
[![PyPi version](https://img.shields.io/pypi/v/cifkit.svg)](https://pypi.python.org/pypi/cifkit)
[![Conda version](https://img.shields.io/conda/vn/conda-forge/cifkit)](https://anaconda.org/conda-forge/cifkit)

Expand Down
3 changes: 1 addition & 2 deletions docs/api.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
# API reference

This section contains the automatic API reference for all modules in the
`cifkit` package.
This section contains the automatic API reference for `Cif` and `CifEnsemble` modules in the `cifkit` package.

::: cifkit
55 changes: 27 additions & 28 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,25 +3,27 @@
## Statement of need

`cifkit` uses .cif files by offering higher-level functions and variables that
enable users to perform complex tasks efficiently with a few lines of code.
`cifkit` facilitates the visualization of coordination geometry from each site
but also extracts physics-based features like volume and packing efficiency,
crucial for structural analysis in ML tasks. Additionally, it extracts atomic
mixing information at the bond pair level—tasks that would otherwise require
extensive manual effort using GUI-based tools like VESTA, Diamond, and
CrystalMaker, due to the lack of readily available higher-level functions.

Further enhancing its utility, cifkit excels in sorting, preprocessing, and
understanding the distribution of underlying CIF files. Common issues in CIF
files from databases, such as incorrect loop values and missing fractional
coordinates, are systematically addressed as cifkit standardizes and filters out
ill-formatted files. It also preprocesses atomic site labels, transforming
labels such as 'M1' to 'Fe1' in files with atomic mixing. Beyond error
correction, cifkit provides functionalities to copy, move, and sort files based
on attributes like coordination numbers, space groups, unit cells, and shortest
distances. It also excels in visualizing and cataloging CIF files, organizing
them based on supercell size, tags, coordination numbers, elements, and atomic
mixing, among other parameters.
enable users to perform complex tasks efficiently with a few lines of code. `cifkit` distinguishes itself from existing libraries by offering higher-level
functions and variables that allow solid-state synthesists to obtain intuitive and
measurable properties impactful properties. It facilitates the visualization of
coordination geometry from each site using four coordination determination
methods and extracts physics-based features like volume and packing
efficiency—crucial for structural analysis in machine learning tasks. Moreover,
`cifkit` extracts atomic mixing information at the bond pair level, tasks that
would otherwise require extensive manual effort using GUI-based tools like
VESTA, Diamond, and CrystalMaker.

`cifkit` further enhances its utility by providing functions for sorting,
preprocessing, and analyzing the distribution of underlying CIF files. It
systematically addresses common issues in CIF files from databases, such as
incorrect loop values and missing fractional coordinates, by standardizing and
filtering out ill-formatted files. The package also preprocesses atomic site
labels, transforming labels like 'M1' to 'Fe1' in files with atomic mixing for
improved visualization and pattern matching. Beyond error correction, `cifkit`
offers functionalities to copy, move, and sort files based on attributes such as
coordination numbers, space groups, unit cells, and shortest distances. It
excels in visualizing and cataloging CIF files, organizing them by supercell
size, tags, coordination numbers, elements, and atomic mixing.

## TL;DR

Expand All @@ -44,10 +46,7 @@ mixing, among other parameters.

## Processing speed expectation

Based on the Apple M1 iMac chip, processing a .cif file of 150 files takes
around 50 seconds, but this depends on the size of the unit cell in the .cif
files. Processing 10,000 .cif files or more may take about an hour, but this
also depends on the laptop.
Processing approximately 10,000 .cif files on a standard laptop (iMac with M1 chip) may take about 30 to 60 minutes. At this rate, we can process nearly all .cif files within 1–2 days.

## Installation

Expand All @@ -59,14 +58,14 @@ Python 3.10, 3.11, 3.12 are supported.

Option 1. conda install

The preferred method is to install `cifkit` using Conda.
The preferred method is to install `cifkit` using Conda:

```bash
conda create cifkit_env cifkit
conda create -n cifkit_env cifkit
conda activate cifkit_env
```

If the above option does not work, please feel free to use pip install.
If the above option does not work, please feel free to use pip install:

Option 2. pip install

Expand Down Expand Up @@ -100,7 +99,7 @@ for label in site_labels:
cif.plot_polyhedron(label, is_displayed=True)
```

![Polyhedron generation](assets/img/ErCoIn_polyhedron.png)
![Polyhedron generation](assets/img/ErCoIn-polyhedron.png)

For more, visit
[https://bobleesj.github.io/cifkit/notebooks/01_cif](https://bobleesj.github.io/cifkit/notebooks/01_cif)
Expand All @@ -118,7 +117,7 @@ The following example generates a distribution of structure.
```python
from cifkit import CifEnsemble

ensemble = CifEnsemble("cif_containing_folder_path")
ensemble = CifEnsemble("your_cif_containing_folder_path")
ensemble.generate_structure_histogram()
```

Expand Down
2 changes: 1 addition & 1 deletion docs/notebooks/02_cif_ensemble.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -340,7 +340,7 @@
"ensemble.generate_site_mixing_type_histogram(output_dir=\"path/to/directory\")\n",
"\n",
"# Optional: Call plt.show() to display the histogram on screen.\n",
"ensemble.generate_site_mixing_type_histogram(display=False)\n",
"ensemble.generate_site_mixing_type_histogram(display=True)\n",
"'''"
]
}
Expand Down
82 changes: 82 additions & 0 deletions paper-review-1.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
## Response to @espottesmith

**Paper comments:**

> The summary is generally appropriate. From context, a non-expert might realize what *.cif files are (since crystal structures are mentioned in the second sentence), but this could be more explicitly stated in the first sentence.
Thank you for your suggestion. The summary now incorporates what *.cif files are:

```markdown
`cifkit` provides higher-level functions and properties for coordination
geometry and atomic site analysis from .cif files, which are standard file
formats for storing crystallographic data such as atomic fractional coordinates,
symmetry operations, and unit cell dimensions.
```

> I come from a computational background, so I'm not the target audience, but I am unconvinced that the basic functionality (i.e., parsing CIF files and extracting information like structures and formulas) is easier in cifkit than in pymatgen/ASE. Similar numbers of lines of code would be required, at least comparing pymatgen and the example given in the cifkit paper (see https://pymatgen.org/usage.html#reading-and-writing-structuresmolecules). Considering higher-level features of cifkit, including filtering groups of CIFs and visualization, I think that cifkit distinguishes itself from other codes. In this area, the authors' argument is more convincing. Perhaps a slight reframing of the Statement of Need is appropriate?
This is a valid point. I have modified the "Statement of Need" and emphasized the utility of higher-level functions by modifying the section as follow:

```markdown
`cifkit` distinguishes itself from existing libraries by offering higher-level
functions and variables that allow solid-state synthesists to obtain intuitive and
measurable properties impactful properties. It facilitates the visualization of
coordination geometry from each site using four coordination determination
methods and extracts physics-based features like volume and packing
efficiency—crucial for structural analysis in machine learning tasks. Moreover,
`cifkit` extracts atomic mixing information at the bond pair level, tasks that
would otherwise require extensive manual effort using GUI-based tools like
VESTA, Diamond, and CrystalMaker. These functions can be further developed
on-demand, as demonstrated by `cifkit`'s ability to extract coordination
geometry information based on four coordination number determination methods for
a newly discovered phase [@tyvanchuk_crystal_2024].
```

> Otherwise, the paper looks good!
Thank you!

**Code comments:**

> The folder "[occupacny"](https://github.com/bobleesj/cifkit/tree/main/src/cifkit/occupacny) should probably be "occupancy"
Modified. Thank you.

> I initially tried installing with Python 3.9 (as far as I can tell, the "pyproject.toml" and the docs don't specify a version). This caused problems, as cifkit is using type annotation syntax that is not supported in Python 3.9. I now see on PYPI that cifkit only supports Python 3.10, 3.11, and 3.12. Please make this more clear for users.
In `pyproject.toml`, I have added the following `requires-python = '>=3.10, <3.13` to support 3.10, 3.11, 3.12. Per your suggestion, I have included the supported Python badges in the documentation under the Getting Started section.

> In terms of code style, more extensive docstrings (for instance, explaining what the inputs and outputs should be) would be appreciated. It seems there are already some "TODO"s to this effect (see e.g., models/cif.py), so perhaps the authors are planning on making these changes soon. Function bodies are typically well commented (note that I have not read every file)
I have added numpy-style docstrings for the main classes `Cif` and `CifEnsemble` here: https://github.com/bobleesj/cifkit/pull/49. I have also identified some code methods that could be refactored, as noted here: https://github.com/bobleesj/cifkit/issues/53. My next step is to further refine these docstrings by consolidating them into reusable methods.

**Documentation comments:**

> Examples in the docs aren't the most consistent. For instance, some example lines of code use "ensemble" as the variable, and some use "ensemble_test". It may also be better if the commented outputs reflected the example folder ("Example.ErCoIn_big_folder_path") so users can more easily verify correctness. Finally, incorporating the visualization tools in the docs (currently they're demonstrated only in the README) would be appreciated.
Thank you for your suggestion. I have now maintained the instance name `ensemble` consistently. The actual output using the Jupyter notebook has been used for all examples.

> The statement of need could be made more clear in the docs
I have added the statement of need in the beginning of the docs: https://bobleesj.github.io/cifkit/.

> Community guidelines basically amount to "submit a PR or talk to the lead author", but for a project at this scale, perhaps that's appropriate.
Thank you. I have created a PR template https://github.com/bobleesj/cifkit/blob/main/.github/pull_request_template.md and instructions on how to contribute here https://github.com/bobleesj/cifkit/blob/main/CONTRIBUTING.md.

**Overall comments:**

> In terms of "substantial scholarly effort", this is a reasonably small, simple, and young project. That said, it is clearly already being used for scholarly research, and it is somewhat unique in terms of the features that it offers. As such, I think it should be considered substantial.
Thank you. For one of the projects mentioned, SAF (Structure/Composition Analyzer), which uses approximately 100 features generated using cifkit, we have a pre-print version available: https://chemrxiv.org/engage/chemrxiv/article-details/670aa269cec5d6c142f3b11a. I will continue to maintain the software for on-going research efforts.

> The paper does not contain original data or results
Yes, .cif files must be provided for us to conduct analysis.

> The authors do not make significant performance claims
I have added information to the documentation stating that processing approximately 10,000 .cif files on a standard laptop (iMac with M1 chip) takes about 30 to 60 minutes. At this rate, we can process nearly all .cif files within 1–2 days. However, I have decided not to include performance metrics in the manuscript. Our code may evolve further—for instance, by using matrix multiplications to compute distances, as in ASE—and performance hasn't been a major bottleneck for our research.

> With some relatively small tweaks, I think this will make a good addition to JOSS.
Thank you for your review!
Loading

0 comments on commit f798ea7

Please sign in to comment.