Nov 10: Following our review of the draft reports and the subsequent meetings with the project teams, here are some general points that all teams must keep in mind when writing their reports.
- Use complete sentences with correct grammar and appropriate word choices. Simpler is often better; this does not mean you should use informal language. Also, pay attention to punctuation (including spaces before references) and to the formatting of your bibliography section.
- Stick to the structure that was provided. You may create subsections, but do not create additional sections (except References).
- Make sure you include all the important technical details. Could someone read your report and reproduce your (process and) results? The answer should be yes.
- When discussing the data collection, data preprocessing and training, include descriptive statistics (number of queries, documents, indexing time, training/validation/test splits, etc.).
- Unless you use a standard retrieval method like BM25, you should include the ranking formula and discuss the parameter settings (see the formula sketch after this list for the level of detail expected). For machine learning approaches, you can simply state the algorithm, but you need to detail the features used and how your training set was constructed. (If you use an implementation of a standard method from a given toolkit/library, state the version and whether you used the default parameter settings or deviated from them.)
- Throughout the report, explain what choices you made and why.
- Any claims about performance should be substantiated either with references to literature or to your own results.
- Make sure to include a results overview table, which should minimally contain the baseline and advanced methods as rows and the official evaluation metric(s) for the task as columns. State which evaluation metrics you report and use the official evaluation scripts (see the evaluation sketch after this list).
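To illustrate the level of detail expected when reporting a ranking formula, a standard formulation of BM25 (used here purely as an example; substitute whatever ranking function you actually use) is:

```latex
\mathrm{score}(d, q) = \sum_{t \in q} \mathrm{IDF}(t) \cdot
  \frac{f_{t,d}\,(k_1 + 1)}{f_{t,d} + k_1 \left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
```

Here, f_{t,d} is the frequency of term t in document d, |d| is the document length, avgdl is the average document length in the collection, and k_1 and b are free parameters (commonly around 1.2 and 0.75, respectively). Whatever method you use, report its parameters at this level of precision.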
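Along the same lines, below is a minimal evaluation sketch using the pytrec_eval package (a Python wrapper around trec_eval). Whether this counts as the official evaluation script depends on your chosen task, so always check the task's own tooling and report with that; the qrels and run values here are toy placeholders.

```python
import pytrec_eval  # pip install pytrec_eval

# Toy ground truth and system output; in practice, load your qrels and run files.
qrels = {"q1": {"d1": 1, "d2": 0, "d3": 1}}
run = {"q1": {"d1": 1.5, "d2": 1.1, "d3": 0.2}}

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"map", "recip_rank", "ndcg"})
results = evaluator.evaluate(run)

# Average each metric over all queries, e.g., for the results overview table.
for metric in ("map", "recip_rank", "ndcg"):
    mean = sum(per_query[metric] for per_query in results.values()) / len(results)
    print(f"{metric}: {mean:.4f}")
```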
You’ll need to submit the code along with your report. It can either be a zip file that you send to [email protected], or you can share your GitHub repository with us (with @kbalog and @tlinjordet). We only need your code, not the data files. Having a README explaining the contents of your zip file/repository will contribute to the Code quality aspect of the grade.
The objective of project work is to apply the knowledge gained in the course, in a group setting, to a selected (open) information retrieval problem. Specifically, you'll first need to establish a reasonable baseline, then develop one or multiple advanced methods aiming to improve over the baseline. Using a standard test collection, you'll need to experimentally compare the baseline and advanced solutions.
Each group is allocated a dedicated 15-minute slot (during what would normally be lecture hours) to discuss their progress. It is expected that at least one member of the group is present in person.
Team | Slot |
---|---|
Team-001 | Monday 14:15-14:30 |
Team-002 | Monday 14:30-14:45 |
Team-003 | Monday 14:45-15:00 |
Team-004 | Monday 15:00-15:15 |
Team-005 | Monday 15:15-15:30 |
Team-006 | Monday 15:30-15:45 |
Team-007 | Monday 15:45-16:00 |
Team-008 | Tuesday 14:15-14:30 |
Team-009 | Tuesday 14:30-14:45 |
Team-010 | Tuesday 14:45-15:00 |
Team-011 | Tuesday 15:00-15:15 |
Team-012 | Tuesday 15:15-15:30 |
Team-013 | Tuesday 15:30-15:45 |
- Nov 6 12:00 Delivery of draft project report for feedback. The report is to be submitted in PDF format to [email protected] with "Team-xxx draft project report" as subject. Feedback will be given during the Monday/Tuesday meeting the week after.
- Nov 16 16:00 Delivery of final project report. The report is to be submitted in PDF format to [email protected] with "Team-xxx project report" as subject. The file should also be named "Team-xxx-...-report.pdf", where xxx should be changed to fit your team ID as provided, and the part shown as ellipses can be chosen freely.
You may choose one from the following projects:
- MS MARCO document re-ranking
- Given the top-100 candidate documents retrieved by BM25, re-rank them by relevance. This task has also been run as part of the TREC 2019 Deep Learning track. (A sketch of the standard run-file input/output format is given after this list of projects.)
- Document collection: MS MARCO document corpus (3.2M documents, 22GB)
- Resources
- TREC 2019 Conversational Assistance
- Given a series of conversational utterances, identify, for each user utterance, the relevant passages that satisfy the user's information need.
- Document collection: MS MARCO passage ranking collection, Wikipedia, and news (Washington Post)
- Resources
- Semantic Answer Type Prediction
- Given a question in natural language, the task is to predict the type of the answer using a set of candidate types from a target ontology (DBpedia).
- Document collection: DBpedia (dump 2016-10) and/or Wikidata
- Resources
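For the re-ranking style tasks, both the BM25 candidates you start from and your re-ranked output are typically exchanged as TREC-format run files. The sketch below uses hypothetical file names (`baseline_run.txt`, `reranked_run.txt`) and a placeholder scoring function; it only illustrates the input/output plumbing, not any particular re-ranking method.

```python
from collections import defaultdict

def rerank_score(query_id: str, doc_id: str) -> float:
    """Placeholder; replace with your own (baseline or advanced) scoring model."""
    return 0.0

# Read a baseline run in TREC format: "qid Q0 docid rank score tag".
candidates = defaultdict(list)
with open("baseline_run.txt") as f:
    for line in f:
        qid, _, docid, _, _, _ = line.split()
        candidates[qid].append(docid)

# Re-score each query's candidates and write a run file in the same format.
with open("reranked_run.txt", "w") as out:
    for qid, docids in candidates.items():
        scored = sorted(((rerank_score(qid, d), d) for d in docids), reverse=True)
        for rank, (score, docid) in enumerate(scored, start=1):
            out.write(f"{qid} Q0 {docid} {rank} {score} my-run-tag\n")
```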
- You'll need to hand in a report using the specified project template. The page limit for the report is 4 A4 pages (in double column format).
- You'll also need to submit your code and the generated output files so that we can verify your solution and results.
- There are no restrictions on the programming language, toolkits/libraries used, etc. with the default choice being Python and the packages used in the exercises/assignments.
- You are free to use any external data collections or resources (e.g., pre-trained embeddings) in addition to the 'official' problem datasets.
- Some of the problems involve working with large datasets. If you need server access, you'll need to contact the Unix system administrator.
- While you are expected to work independently as a group, you'll have the possibility to get feedback on your ideas on a regular weekly basis. For each group, there will be a dedicated weekly 15-minute slot to discuss the project with the lecturer. Also, there will be an internal intermediate deadline for getting feedback on the draft of your report.
While the following is merely a recommendation, it may help you to stay on course.
- Week 1
- Understand the problem (and ideally complete the corresponding sections of the report)
- Preprocess and index the document collection
- Implement a baseline method (a minimal indexing and retrieval sketch is given after this timeline)
- Week 2
- Run and evaluate the baseline method
- Implement an advanced method
- Write up your progress so far and submit the report for feedback
- Week 3
- Run and evaluate your advanced method
- Experiment with additional methods or refinements of your advanced method
- Finalize your report
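As a starting point for the indexing and baseline steps above, here is a minimal sketch assuming you use Elasticsearch via its official Python client; this is just one option (any toolkit is acceptable), and the index name and corpus reader below are hypothetical placeholders.

```python
from elasticsearch import Elasticsearch  # pip install elasticsearch

INDEX = "my_project_index"  # hypothetical index name

def doc_iterator():
    """Placeholder for your own corpus reader; yields (doc_id, text) pairs."""
    yield "D1", "an example document about information retrieval"
    yield "D2", "another example document"

es = Elasticsearch()  # assumes a local instance on localhost:9200 (7.x-style client calls)

# Create the index (ignoring the error if it already exists) and add the documents.
es.indices.create(index=INDEX, ignore=400)
for doc_id, text in doc_iterator():
    es.index(index=INDEX, id=doc_id, body={"body": text})
es.indices.refresh(index=INDEX)

# Retrieve the top-100 documents for a query using Elasticsearch's default BM25 ranking.
res = es.search(index=INDEX, body={"query": {"match": {"body": "information retrieval"}}}, size=100)
for rank, hit in enumerate(res["hits"]["hits"], start=1):
    print(rank, hit["_id"], hit["_score"])
```

While running such steps, record the collection statistics and indexing time, as these belong in the report.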
- Use the ACM two-column template for the report.
- It is highly recommended that you use LaTeX. The ACM template is also available on Overleaf.
- NB: Include the Team ID assigned to you in the report title or subtitle.
Structure your paper according to the following sections:
- Introduction - Explain the context of the problem that you are tackling, including references to relevant literature.
- Problem Statement - Formalize the task (in terms of input and output) and specify important details about the data collection.
- Baseline Method - Explain what you are taking as your baseline method, as well as why this is a reasonable baseline, and why you are making specific implementation choices.
- Advanced Method - Explain what you are taking as your advanced method(s), as well as why this is a promising attempt to outperform the baseline method, and why you are making specific implementation choices.
- Results - With tables and graphs, make a clear, concise, and digestible presentation of the data produced by your experiments. This includes describing the key facts and trends in your results.
- Discussion and Conclusions - Summarize and discuss the different challenges you faced and how you solved them. Include interpretations of the key facts and trends you observed and pointed out in the Results section. Which method performed best, and why? Speculate: What could you have done differently, and what consequences would that have had?
Project submissions will be graded with respect to the following categories:
Category | Points |
---|---|
Problem understanding | 5 |
Baseline method | 10 |
Advanced method(s) | 15 |
Report | 15 |
Code quality | 5 |
Total | 50 |
The evaluation will not be automated like in the assignments, and therefore it will primarily proceed based on qualitative categories, rather than operationalized criteria. On the one hand, this means less certainty for you regarding what level of quality is good enough for a certain grade. On the other hand, if your work is truly excellent in some aspects within a category, this can make up for more deficient aspects in the same category. In short, do your best and the evaluation will look out for opportunities to justify awarding points.
Nevertheless, keep in mind that the following key aspects of each grading category are some of the ways your project submission can score points:
- Problem understanding
- Demonstrating your understanding of the chosen project and associated task.
- Clearly explaining the problem at hand, and the challenges it may entail.
- Identifying main families of approaches developed for the task at hand (a literature review).
- Baseline method
- Selecting a sensible baseline, implementing and evaluating it experimentally.
- Advanced method(s)
- Selecting an interesting or performant advanced method, implementing and evaluating it experimentally.
- Motivating, designing and implementing one or multiple advanced approaches.
- Either extending the baseline,
- Employing a completely different approach found in the literature, or
- Designing a method of your own.
- Clarity of argumentation
- Creativity
- Demonstrating understanding of the advanced methods
- Extensiveness of the experiments
- Overall performance (improvements over baseline).
- Report
- Clearly explaining the motivation or rationale behind the choices made.
- Documenting key technical decisions to support future reproducibility.
- This requires providing enough detail, with accessible wording and structure, that your results can be reproduced based on the description alone.
- Producing insight into the process through analysis and discussion of results.
- Effectively using visual tools, such as illustrations, plots and tables to support and communicate your findings.
- Code quality
- Clearly structured and readable code.
- Readability includes sensible variable/method naming conventions and adding docstrings/comments where necessary.