Better college attendance? #9

Open
theSage21 opened this issue Jul 11, 2017 · 33 comments

@theSage21
Member

It would be cool to pull data from the college site and perform attendance analysis on it for public display on the Compsoc site.

@libhide
Member

libhide commented Jul 11, 2017

+1.
@deadbeatfour, thoughts?

@abhishekchak52

I like the idea.
Do you mean running a script and publishing a post every month?
I guess what I'm asking is: should it be updated manually?

@theSage21
Member Author

Nope nope. We integrate a page that does a weekly pull on its own. Automated end to end.

@abhishekchak52

That would be very cool indeed. Let's do this.

@theSage21
Member Author

Awesome. A new branch then? @libhide how about it?

@abhishekchak52

Branch from the level-up branch, please.

@libhide
Member

libhide commented Jul 12, 2017

@theSage21 I'm so in! How are we going about this then?

@theSage21
Member Author

theSage21 commented Jul 12, 2017

Let's set up a to-do list of sorts. Here's my proposal, @libhide @deadbeatfour. Perhaps @utk-dev would want to join in? I don't know anyone else from college on GitHub other than you guys.

  • How to get the data?
    • Server or webpage? (A server seems more efficient since we can set up a cron job or something.)
  • What is missing in the college attendance services?
    • A last updated date. That would be useful.
  • What all do we want to show in the analysis?
    • A bunk meter to measure how many classes you can safely bunk?
    • Average attendance of batches?
    • What do people care about in their attendance?
  • Personalization after some time?
    • People can create accounts and manage their attendance?
    • They might want to create priorities for classes. "I don't want to miss this class at all!"
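The bunk meter boils down to a small bit of arithmetic, so here is a minimal sketch. It assumes a 75% minimum-attendance rule (the actual college threshold isn't stated in this thread) and that every upcoming class is actually held. It finds the largest k with attended / (held + k) >= threshold, and the smallest run of attended classes needed to climb back above it. Integer percentages keep the maths exact; `safe_bunks` and `classes_needed` are hypothetical names.

```python
def safe_bunks(attended: int, held: int, percent: int = 75) -> int:
    """Largest number of upcoming classes that can be skipped while keeping
    attended / (held + skipped) >= percent / 100.

    Assumes every upcoming class is actually held; the 75% default is an
    assumption, not a confirmed college rule.
    """
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    # attended * 100 >= percent * (held + k)  =>  k <= attended * 100 / percent - held
    return max(attended * 100 // percent - held, 0)


def classes_needed(attended: int, held: int, percent: int = 75) -> int:
    """Smallest number of consecutive classes to attend to get back to percent."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    # (attended + k) * 100 >= percent * (held + k)
    #   =>  k * (100 - percent) >= percent * held - 100 * attended
    deficit = percent * held - 100 * attended
    if deficit <= 0:
        return 0
    return -(-deficit // (100 - percent))  # integer ceiling division


print(safe_bunks(36, 40))      # 8  -> 36 / 48 is exactly 75%
print(classes_needed(27, 40))  # 12 -> 39 / 52 is exactly 75%
```

Showing both numbers covers students on either side of the threshold.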

@theSage21
Member Author

I remember doing the data extraction for my analysis of college attendance. I can take that up. Could you guys make up a list of what we want to show? @libhide @deadbeatfour

@anshulabraham

anshulabraham commented Jul 13, 2017 via email

@theSage21
Member Author

theSage21 commented Jul 13, 2017

@anshulabraham done. Available at https://github.com/theSage21/notebooks/tree/master/support. The SSCattendance notebook was used to download this.

Actually, I'm thinking of keeping a history of attendance over years. That should lead to some interesting analysis too.

@ghost

ghost commented Dec 27, 2017

@anshulabraham
They moved from SSCATTENDANCE to here

@libhide
Member

libhide commented Dec 28, 2017

Of course they did 😝.
@theSage21 I get a déjà vu moment every time a new feature is added to the college site.

@theSage21
Member Author

Ha.... Ha.... Ha....

Damn. 😢 See... look how big our tree has grown, bro.

@theSage21
Member Author

Anyone working on this feature? I remember @deadbeatfour created a new repo related to this.

@libhide
Member

libhide commented Dec 28, 2017

Doubt this needs to be worked on now that the attendance stuff is being done by Koush. We can close this issue, I guess.

@abhishekchak52

Yeah. I am. I'm doing things in a separate repo first. Integration into the site shouldn't be a problem. The problem with Koush's implementation is VERY slow fetch times, especially for total semester attendance. It pretty much crashes towards the end of the semester. I assume he's calculating totals every time it's requested.
I know how to pull the data.
How do you suggest we store things in the backend? And since we can't filter by roll number anymore (the data isn't available), should we filter by name and class instead?

@libhide
Member

libhide commented Dec 28, 2017

Oh, of course Koush's approach is shit. Sigh.
Yeah, name and class should work. Or maybe we can assign students our own IDs of some sort? Idk, thoughts?

@theSage21
Member Author

The final purpose is to have people use this, right? That means they should be able to find their own attendance quickly. Hashing the name on the front end seems OK.
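A minimal sketch of the name-hashing idea, assuming we key on name plus class (since roll numbers aren't available) and normalise case and whitespace first. `student_key` and the 12-character truncation are our own choices, not anything the college site provides. This gives a stable lookup ID, not anonymity: anyone who knows a name and class can compute the same hash.

```python
import hashlib


def _normalise(text: str) -> str:
    # Collapse runs of whitespace and ignore case so minor formatting
    # differences on the college site don't produce a different key.
    return " ".join(text.split()).lower()


def student_key(name: str, course: str, length: int = 12) -> str:
    """Stable, URL-friendly ID derived from a student's name and class."""
    payload = f"{_normalise(name)}|{_normalise(course)}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:length]


# Hypothetical student; both spellings map to the same key.
print(student_key("Asha  Rao", "BSc Physics") == student_key("asha rao", "bsc physics"))  # True
```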

Plus, with SQL as the workhorse, aggregating per request should be fast. Maybe set it up as a verrrryyyy flat table?

name | lecture | month | year | total | attended

This lets us obtain anyone's attendance with a quick filter->select->aggregate query. Sprinkle some nice JS chart rendering and projections on top of that and voila!
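Here is roughly what that filter->select->aggregate query could look like, sketched with Python's built-in `sqlite3` against an in-memory copy of the flat table above. The sample rows and the `attendance_summary` helper are hypothetical; the real backend could be any SQL database.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory version of the flat table: name, lecture, month, year, total, attended."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE attendance (
               name TEXT, lecture TEXT, month INTEGER, year INTEGER,
               total INTEGER, attended INTEGER)"""
    )
    # Hypothetical sample rows, one per student / lecture / month.
    conn.executemany(
        "INSERT INTO attendance VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("Asha Rao", "Mechanics", 8, 2017, 20, 18),
            ("Asha Rao", "Mechanics", 9, 2017, 22, 15),
            ("Asha Rao", "Optics", 8, 2017, 18, 17),
            ("Ravi Sen", "Mechanics", 8, 2017, 20, 10),
        ],
    )
    return conn


def attendance_summary(conn: sqlite3.Connection, name: str, year: int):
    """filter (WHERE) -> select -> aggregate (GROUP BY) for one student."""
    return conn.execute(
        """SELECT lecture,
                  SUM(attended),
                  SUM(total),
                  ROUND(100.0 * SUM(attended) / SUM(total), 1)
           FROM attendance
           WHERE name = ? AND year = ?
           GROUP BY lecture
           ORDER BY lecture""",
        (name, year),
    ).fetchall()


for row in attendance_summary(make_demo_db(), "Asha Rao", 2017):
    print(row)
# ('Mechanics', 33, 42, 78.6)
# ('Optics', 17, 18, 94.4)
```

The JS charts would then only need these few aggregated rows per student.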

Or maybe I'm just getting carried away.. 😄

The real question is when to ping Koush's service for updates. A nightly cron job should work just fine on PythonAnywhere, unless we're planning to move to some other service.

@theSage21
Member Author

The idea is to get information that is as fine-grained as possible, since once you only have aggregated info it's very difficult to get the fine-grained info back.
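To make that concrete: monthly rows can always be rolled up into semester totals, but a semester total can't be split back into months. A minimal sketch with hypothetical rows, assuming July to December counts as the odd semester (our guess at the academic calendar, not something stated here):

```python
from collections import defaultdict

# Hypothetical fine-grained rows: (name, paper, year, month, held, attended)
MONTHLY = [
    ("Asha Rao", "Mechanics", 2017, 8, 20, 18),
    ("Asha Rao", "Mechanics", 2017, 9, 22, 15),
    ("Asha Rao", "Mechanics", 2017, 10, 21, 20),
]


def semester_of(year: int, month: int) -> str:
    # Assumption: July-December is the odd semester, January-June the even one.
    return f"{year}-odd" if month >= 7 else f"{year}-even"


def roll_up(rows):
    """Sum held/attended per (name, paper, semester).

    Going from monthly rows to totals is always possible; going back from
    the totals to the monthly split is not.
    """
    totals = defaultdict(lambda: [0, 0])
    for name, paper, year, month, held, attended in rows:
        key = (name, paper, semester_of(year, month))
        totals[key][0] += held
        totals[key][1] += attended
    return {key: tuple(counts) for key, counts in totals.items()}


print(roll_up(MONTHLY))
# {('Asha Rao', 'Mechanics', '2017-odd'): (63, 53)}
```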

@abhishekchak52

I was thinking along similar lines. So the structure of the table is where I'm stuck. 1st and 3rd years have 4 papers per semester; 2nd years have 5. The best we can do is get lecture/lab/tutorial attended and held numbers for each paper, so 6 columns per paper. Tutorial groups are also separated out by letter, like A, B, C and so on. So individual attendance analysis shouldn't be too difficult. I'm thinking about aggregate analysis per paper, if we want to do that. Also, arts papers don't have labs, so there would be an empty column there. Stuff like that. So should we just make a table like:

Subject 1 name | LD | LA | PH | PA | TH | TA | Subject 2 name | ...

Can this be done more elegantly?

@abhishekchak52

Also, if we want to move to Heroku, I think it's a good idea to port the website to the latest libraries. Django 2.0 is out. So I'm thinking that in January we rewrite the entire site from the ground up, making sure each feature works before moving on to the next one?

@theSage21
Member Author

That sounds right. When recording datasets we try to make them as flat as possible so that later analysis is not hindered. Slight change to the structure then.

name | paper | year | month | LD | LA | TH | TA | PH | PA

We enforce uniqueness for the (name, paper, year, month) tuple. A person taking multiple papers gets multiple rows with the same name. This resolves the variability in the number of papers per course. Each row represents a student's attendance in a particular paper delivered in a certain timeframe, instead of each row representing a single student's entire record for a particular time period.
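A minimal SQLite sketch of that layout. It reads the abbreviations as lectures/tutorials/practicals (L/T/P) delivered or held (D/H) and attended (A), which is our interpretation of the thread. The UNIQUE constraint enforces one row per (name, paper, year, month), and a paper without labs simply stores NULL in PH/PA. `insert_row` and the sample rows are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE attendance (
    name  TEXT    NOT NULL,
    paper TEXT    NOT NULL,
    year  INTEGER NOT NULL,
    month INTEGER NOT NULL,
    LD INTEGER, LA INTEGER,  -- lectures delivered / attended
    TH INTEGER, TA INTEGER,  -- tutorials held / attended
    PH INTEGER, PA INTEGER,  -- practicals held / attended (NULL if the paper has no labs)
    UNIQUE (name, paper, year, month)
)
"""


def insert_row(conn: sqlite3.Connection, row: tuple) -> bool:
    """Insert one (student, paper, month) record; False if it already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO attendance VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Two papers for the same student become two rows, not extra columns.
print(insert_row(conn, ("Asha Rao", "Mechanics", 2017, 8, 20, 18, 6, 5, 8, 8)))      # True
print(insert_row(conn, ("Asha Rao", "English", 2017, 8, 16, 12, 4, 4, None, None)))  # True
# The same (name, paper, year, month) again is rejected by the UNIQUE constraint.
print(insert_row(conn, ("Asha Rao", "Mechanics", 2017, 8, 20, 19, 6, 5, 8, 8)))      # False
```

For a nightly pull that may revisit the current month, the rejected insert could instead be turned into an update of the existing row.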

To facilitate search we can have the name column in this table point to a different table containing people's names/courses/food habits etc.
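Sketching the lookup-table idea, again in SQLite and assuming name plus class identifies a student: attendance rows reference a `students` table by ID, and searching by name and class becomes a join. Two students with the same name in different classes stay distinct. Table and column names are hypothetical, and only the lecture columns are shown to keep it short.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE students (
            id     INTEGER PRIMARY KEY,
            name   TEXT NOT NULL,
            course TEXT NOT NULL,
            UNIQUE (name, course)
        );
        CREATE TABLE attendance (
            student_id INTEGER NOT NULL REFERENCES students(id),
            paper      TEXT    NOT NULL,
            year       INTEGER NOT NULL,
            month      INTEGER NOT NULL,
            LD INTEGER, LA INTEGER,  -- other count columns omitted for brevity
            UNIQUE (student_id, paper, year, month)
        );
        """
    )
    return conn


def find_attendance(conn: sqlite3.Connection, name: str, course: str):
    """Search by name and class via a join on the students table."""
    return conn.execute(
        """SELECT s.name, a.paper, a.year, a.month, a.LD, a.LA
           FROM attendance AS a
           JOIN students AS s ON s.id = a.student_id
           WHERE s.name = ? AND s.course = ?
           ORDER BY a.year, a.month, a.paper""",
        (name, course),
    ).fetchall()


conn = make_db()
add_student = "INSERT INTO students (name, course) VALUES (?, ?)"
physics_id = conn.execute(add_student, ("Asha Rao", "BSc Physics")).lastrowid
english_id = conn.execute(add_student, ("Asha Rao", "BA English")).lastrowid
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?, ?, ?, ?)",
    [(physics_id, "Mechanics", 2017, 8, 20, 18), (english_id, "Poetry", 2017, 8, 16, 12)],
)
print(find_attendance(conn, "Asha Rao", "BSc Physics"))
# [('Asha Rao', 'Mechanics', 2017, 8, 20, 18)]
```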

@theSage21
Member Author

The more I use Heroku, the more I prefer it over PythonAnywhere. PyAnywhere's main appeal for me now is the ability to share a terminal over the network. For web serving we could pick Heroku, as it takes away a lotttt of the pain of load balancing and hosting.

@ghost

ghost commented Dec 28, 2017

@theSage21 how about automating deployment? Right now we have to deploy manually every time the master branch gets something new.

@theSage21
Member Author

With Heroku you can set up deploy hooks, so no issue there. On PythonAnywhere we can set up a cron job that pulls from master at midnight every day.

@abhishekchak52

Maybe we could also set up Travis CI with some tests. Deployment on Heroku is just so much more convenient. Also, I was thinking we could set up a scheduled job using Python itself? Something like Celery with RabbitMQ? Thoughts? @theSage21

@theSage21
Member Author

Scheduled job for deployment? I think Heroku auto-deploy would be a better way to do that. Cleaner and more explicit.

@abhishekchak52

I meant let's use Celery for pulling the data nightly. Heroku auto-deploys got me hooked.

@theSage21
Member Author

Ah, I see. That sounds good.

@theSage21
Member Author

I've lost touch with Django, to be honest. You guys keeping up with it? I usually use Bottle now.

@abhishekchak52

I used Bottle for a project recently and it's just so easy. Haven't written Django code in months, but I'll dive back in soon.

@libhide
Member

libhide commented Dec 28, 2017

Auto-deploy should not be an issue with both PyAnywhere and Heroku. That said, Heroku is definitely the way to go. Migrate ASAP.


4 participants