Weekday & time of day #177

Open
wants to merge 2 commits into
base: master

Conversation

tokee
Contributor

@tokee tokee commented May 29, 2018

We have two timestamps: crawl_date, which is authoritative and comes from the crawler, and last_modified, which Tika extracts from the source data. This pull request adds weekday (Monday, Tuesday, Wednesday...) and time_of_day (16:44:30) for both of these fields. As Solr has no concept of time without a full date, the time_of_day is prefixed with 0001-01-01. Indexing this way means that date math works as expected. An alternative would be a second_of_day field or something like that, but that would require the user interface to translate between clock times and seconds.
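
As an illustration of the derivation described above, here is a minimal Python sketch. It is not the code from this pull request: the function name `weekday_and_time_of_day` is hypothetical, and the sketch assumes the timestamp is evaluated in UTC and that the dummy date is written in Solr's ISO-8601 form.

```python
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_and_time_of_day(ts: datetime) -> tuple:
    """Return (weekday, time_of_day) for a timestamp, evaluated in UTC.

    Hypothetical helper for illustration only. Pass a timezone-aware
    datetime; a naive one would be interpreted as local time.
    """
    utc = ts.astimezone(timezone.utc)
    weekday = WEEKDAYS[utc.weekday()]  # datetime.weekday(): Monday == 0
    # Keep only the clock time and pin it to the dummy date 0001-01-01,
    # so the value is still a full date that Solr can do date math on.
    time_of_day = f"0001-01-01T{utc:%H:%M:%S}Z"
    return weekday, time_of_day


example = datetime(2018, 5, 29, 16, 44, 30, tzinfo=timezone.utc)
print(weekday_and_time_of_day(example))
```

Because every document shares the same dummy date, a plain date range on time_of_day selects a clock-time window regardless of the real date.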

The use-cases for last_modified are fairly obvious: e.g. with this, it is possible to find images taken from Friday evening to Saturday morning.
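
A rough sketch of how such a query could be assembled, in Python. The field names `last_modified_weekday` and `last_modified_time_of_day` are assumptions for illustration and may not match the names used in this pull request; the 18:00 and 06:00 boundaries for "evening" and "morning" are likewise arbitrary.

```python
# Hypothetical field names; the ones added by the pull request may differ.
WEEKDAY_FIELD = "last_modified_weekday"
TIME_FIELD = "last_modified_time_of_day"


def time_range(start: str, end: str) -> str:
    """Solr range clause on the time-of-day field, pinned to 0001-01-01."""
    return f"{TIME_FIELD}:[0001-01-01T{start}Z TO 0001-01-01T{end}Z]"


def friday_evening_to_saturday_morning(evening: str = "18:00:00",
                                       morning: str = "06:00:00") -> str:
    """Build a query string covering Friday evening through Saturday morning."""
    # The window crosses midnight, so it is split into two weekday clauses.
    friday = f"({WEEKDAY_FIELD}:Friday AND {time_range(evening, '23:59:59')})"
    saturday = f"({WEEKDAY_FIELD}:Saturday AND {time_range('00:00:00', morning)})"
    return f"{friday} OR {saturday}"


print(friday_evening_to_saturday_morning())
```

The resulting string could be combined with a content-type filter to restrict the hits to images.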

It is more dubious for crawl_date, as that timestamp does not say much about when the material was created. It might be useful for debugging crawls? I would appreciate input on whether the crawl_date additions should be included or not.

This pull request closes #161.

@tokee tokee self-assigned this May 29, 2018
@tokee
Contributor Author

tokee commented Aug 21, 2018

I just realized that weekday is highly locale dependent. In this pull request it is fixed at UTC, but that would have to be configurable to be really usable.

Or (the heavy option): Index 24 different terms (one for each 1-hour timezone offset) in the field for each document.
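
To make the heavy option concrete, here is a minimal Python sketch that computes one weekday term per whole-hour offset. The offset range (-12 to +11) and the term format (`+01:Tuesday`) are assumptions made for illustration; nothing like this is part of the pull request. The same helper with a single configured offset would correspond to the configurable variant.

```python
from datetime import datetime, timedelta, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_terms(ts: datetime, offsets=range(-12, 12)) -> list:
    """Return one 'offset:weekday' term per whole-hour UTC offset.

    Hypothetical helper: the term format and the default range of
    24 offsets (-12..+11) are assumptions, not project conventions.
    """
    terms = []
    for hours in offsets:
        # Shift the instant into a fixed-offset zone and read the weekday there.
        local = ts.astimezone(timezone(timedelta(hours=hours)))
        terms.append(f"{hours:+03d}:{WEEKDAYS[local.weekday()]}")
    return terms


example = datetime(2018, 5, 29, 16, 44, 30, tzinfo=timezone.utc)
terms = weekday_terms(example)
print(len(terms), terms[0], terms[-1])
```

For most timestamps the 24 terms contain only two distinct weekdays, so the cost is mainly in index size rather than in extra information per document.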

@tokee tokee added the question label Aug 21, 2018
@anjackson
Contributor

I think this is a good step to help explore this kind of thing. If the locale is made configurable, that should make it usable by folks outside of GMT.

Successfully merging this pull request may close these issues.

Index weekday
2 participants