SLEP024 Guideline for external posts on scikit-learn blog #92
@@ -25,6 +25,7 @@ | |
slep012/proposal | ||
slep017/proposal | ||
slep019/proposal | ||
slep024/proposal | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
|
@@ -0,0 +1,93 @@ | ||
.. _slep_024: | ||
|
||
=========================================================================== | ||
SLEP024: Guidelines for external contributions to the scikit-learn blog | ||
=========================================================================== | ||
|
||
:Author: Guillaume Lemaitre, François Goupil | ||
:Status: Draft | ||
:Type: Standards Track | ||
:Created: 2024-08-09 | ||
|
||
Abstract | ||
-------- | ||
|
||
This SLEP proposes guidelines for writing and reviewing external contributions | ||
to the scikit-learn blog. | ||
|
||
Detailed description | ||
-------------------- | ||
|
||
Scikit-learn has a blog available at the following URL: | ||
https://blog.scikit-learn.org/. Since its inception, the blog has been used to relay | ||
information on diverse subjects such as sprints, interviews with contributors, | ||
collaborations, and technical content. | ||
|
||
Up to now, the technical content has been limited to the scikit-learn library | ||
itself. However, the scikit-learn community extends beyond the library and has | ||
developed compatible tools for years. For example, the scikit-learn-contrib | ||
repository [2]_ hosts a collection of tools that are not part of the main library | ||
but are still compatible with scikit-learn. | ||
|
||
This SLEP proposes to extend the scope of the blog's technical content to accept | ||
contributions related to the scikit-learn ecosystem, not only to the scikit-learn | ||
library itself. Doing so requires guidelines to manage the expectations of | ||
contributors and readers. | ||
|
||
Here, we define the guidelines that should be used to write and review external | ||
contributions to the scikit-learn blog. | ||
|
||
Guidelines | ||
---------- | ||
|
||
In this section, we provide a set of guidelines to ease discussions when reviewing | ||
external contributions to the scikit-learn blog. They should help both authors | ||
and reviewers. | ||
|
||
Inclusion criteria | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
To be accepted, an external blog post should be related to the scikit-learn | ||
ecosystem. When the post presents a compatible tool, the criteria are the | ||
following: | ||
|
||
- The tool should be compatible with scikit-learn. | ||
- The tool should be under an open-source license. | ||
- The tool should be actively maintained. | ||
- The tool should have clear documentation. | ||
- The tool should be well tested. | ||
- The tool should not be a commercial product or serve as advertisement for a company. | ||
Comment on lines +50 to +59
This is very similar to what we had when I was on NumFOCUS's affiliated project committee. These criteria are quite hard to assess, and also quite hard to maintain. They seem like a low bar, but they are actually quite a high bar to pass.

I would probably modify this to something like:

The scikit-learn blog is not an opinionated place when it comes to tools. Posts are included if somebody takes the effort of writing them. However, we don't want the blog to be flooded by companies trying to advertise their products. Therefore we have the following requirements:

The current content focusses a lot on the tool/library that the blog post is covering. Adrin's suggestion is more about the content of the blog post. I think that is the right direction to go.

I've been thinking about how to express my thoughts, and so far the best I've come up with is that we should try to highlight what we do want to see in the blog posts (as opposed to focussing on what we don't want).

What do we want? My thoughts:

- The blog, like the rest of the documentation, should aim to be recognised as authoritative and high quality. Better to not write a blog post if it would be just average or a repeat of what can be read elsewhere.
- The content should be obviously correct. By this I mean a statement should be easy to read, easy to understand, and it should be easy to arrive at its conclusion. As opposed to statements which are hard to parse, or which might make you conclude A when in reality you should be concluding B and the author would really rather you conclude A. I guess a phrase for this kind of language is "technically correct, but misleading".
- Like scikit-learn, the blog should cover things which are well established and "old". They might be less well known or have been forgotten, but it shouldn't be "trail blazing".
- Like a Wikipedia article, there should be sources you can link to, to back up your claims and give the reader more details, etc. For me it would be fine to link to code that you can run as "sources" for, say, benchmarks. You wouldn't have to publish your results elsewhere first.
- You should stick to the spirit of these guidelines; sticking to the "letter of the law" is not enough. This means there will be an element of human judgement.
- Clearly mark "paid content"? I am a bit less sure about this one.

I know that in a lot of newspapers, magazines, YouTube, Instagram, etc. content, people are obviously, or not so obviously, rewarded for talking about something. This can be a ski resort inviting a reporter and paying for the trip, with the hope that they write about their time in the resort, a free pizza oven sent to a cooking YouTuber, or straight-up paid-for advertising in a newspaper. I don't think there is a fundamental problem with this kind of "paid content" (using a broad definition here). And you could argue that it is covered under "take care that the reader understands" from above.

Depending on how big the influence is, something very prominent like "I work for company X and this post is about things we make, so take what I say with a grain of salt" or a note at the end like "Project Y invited and paid for me to travel to their sprint" is appropriate. As long as the facts are with you, there seems to be very little downside to declaring your possible conflicts. It might even make you more credible.
||
|
||
Reproducibility requirements | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
In the scikit-learn documentation, we ensure that our examples are reproducible by | ||
executing them in our continuous integration. For the scikit-learn blog, it is | ||
difficult to reach the same level of integration. | ||
|
||
However, we should ensure that the given examples or code snippets are reproducible by | ||
the readers. We therefore recommend the following: | ||
|
||
- Provide a link to a repository containing the code or notebook on which the blog | ||
  post is based. | ||
- The repository should contain a way to reproduce the environment (e.g. | ||
  ``requirements.txt``, ``environment.yml``, or ``pixi.toml``). | ||
- If possible, continuous integration should check that the code or notebook can | ||
  be executed. We understand that this step is sometimes impossible due to limited | ||
  resources. | ||
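A minimal sketch of how a reviewer might check the environment-file recommendation above, assuming a local checkout of the repository linked from the blog post. The helper names (`find_environment_files`, `check_repository`) are hypothetical and are not defined by this SLEP; the file names are the examples listed above.

```python
from pathlib import Path

# File names taken from the examples in the guideline; other environment
# specifications may be equally acceptable (assumption: this list is not exhaustive).
ENVIRONMENT_FILES = ("requirements.txt", "environment.yml", "pixi.toml")


def find_environment_files(repo_dir):
    """Return the known environment files present at the top level of repo_dir."""
    repo = Path(repo_dir)
    return [name for name in ENVIRONMENT_FILES if (repo / name).is_file()]


def check_repository(repo_dir):
    """Return a list of problems; an empty list means the basic check passes."""
    problems = []
    if not find_environment_files(repo_dir):
        problems.append(
            "no environment file found (expected one of: "
            + ", ".join(ENVIRONMENT_FILES)
            + ")"
        )
    return problems
```

Calling `check_repository(".")` from the root of a checkout returns an empty list when at least one of these files is present. It only checks that a file exists, not that the environment it describes can actually be built.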
|
||
References and Footnotes | ||
------------------------ | ||
|
||
.. [1] Each SLEP must either be explicitly labeled as placed in the public | ||
domain (see this SLEP as an example) or licensed under the `Open | ||
Publication License`_. | ||
|
||
.. [2] `scikit-learn-contrib repository <https://github.com/scikit-learn-contrib>`__ | ||
|
||
.. _Open Publication License: https://www.opencontent.org/openpub/ | ||
|
||
Copyright | ||
--------- | ||
|
||
This document has been placed in the public domain. [1]_ |
Thank you for opening the discussion.
If I can interject, some content is hosted on external websites and only linked from the blog. This is, for instance, the case for the series on performance improvements.
I think it would be better to have this be part of scikit-learn's blog directly.
This was not done originally because of lack of time and for convenience (it was easy for me to publish it on my website and to iterate on it), but I do not mean to be the sole owner of the knowledge shared there.
Should such external content be discussed as part of this SLEP?

You are faster than me writing the SLEP :).
For this particular case, this is not even a question, since this is already linked to scikit-learn internals. So it would go de facto in the scikit-learn blog, if you ask me.
But let's imagine that this is a topic that is related to scikit-learn but somehow outside of the library itself. Then, I would consider this case as part of the SLEP. The guidelines should answer some questions to some extent, notably whether this is eligible for inclusion.
Note that my first thought with this SLEP was more along the lines of: someone has a shiny compatible package and is seeking visibility; is it possible to advertise it and, if so, what are the requirements from our side?

OK, I was unsure whether my remark was relevant to the subject of this SLEP (and it is not, based on your remark).
Should I open an issue or PRs directly to integrate its content in scikit-learn's blog?

Yep I would find it relevant.

I have just opened scikit-learn/blog#191.