
Clarify number of references on Authorities Page vs. Search Results #4865

Open
elisa-a-v opened this issue Dec 30, 2024 · 14 comments
@elisa-a-v
Contributor

elisa-a-v commented Dec 30, 2024

Noticed after doing #4134

On the “Authorities” page for a given docket, we display the references to opinions in the docket's documents. However, each "authority" in the list is not actually an opinion, but an element of a Docket's authorities:

    @property
    def authorities(self):
        """Returns a queryset that can be used for querying and caching
        authorities.
        """
        return OpinionsCitedByRECAPDocument.objects.filter(
            citing_document__docket_entry__docket_id=self.pk
        )

class OpinionsCitedByRECAPDocument(models.Model):
    citing_document = models.ForeignKey(
        RECAPDocument, related_name="cited_opinions", on_delete=models.CASCADE
    )
    cited_opinion = models.ForeignKey(
        Opinion, related_name="citing_documents", on_delete=models.CASCADE
    )
    depth = models.IntegerField(
        help_text="The number of times the cited opinion was cited "
        "in the citing document",
        default=1,
    )

This means an opinion cited by multiple documents in the same docket will be listed multiple times, like the authorities in this docket. Notice Norman v. United States, 429 F.3d 1081 (Fed. Cir. 2005) being listed 14 times, where all links point to the same search which yields 14 docket entries.

This also means the counts are confusing. Each "depth" represents only the number of times the opinion is cited in one citing document, not the total number of references across all documents in the docket. We could:

  1. Make the counts less confusing by making the authorities list include the citing document and changing the text to something like 5 references to <CITED_OPINION> in <CITING_DOC>, with a link to the citing doc.

  2. Group authorities by opinion, then aggregate the depth. This way we would get the summary of all references to a given opinion in a given docket, instead of having the same opinion repeated several times. This does sound like it could be a lot more work, but potentially more informative if we could also include a sub-list of all citing docs with their links.

    So instead of:

    • 5 references to Norman v. United States, 429 F.3d 1081 (Fed. Cir. 2005)
      Court of Appeals for the Federal Circuit Nov. 18, 2005
    • 5 references to Norman v. United States, 429 F.3d 1081 (Fed. Cir. 2005)
      Court of Appeals for the Federal Circuit Nov. 18, 2005
    • 2 references to Norman v. United States, 429 F.3d 1081 (Fed. Cir. 2005)
      Court of Appeals for the Federal Circuit Nov. 18, 2005

    It would be:

    • 12 references to Norman v. United States, 429 F.3d 1081 (Fed. Cir. 2005)
      Court of Appeals for the Federal Circuit Nov. 18, 2005
      • 5 references in citing doc 1
      • 5 references in citing doc 2
      • 2 references in citing doc 3
  3. Update the search results to display the depth of treatment when a cites query is made, so that each result shown says something like, "22 references to case XYZ".
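The grouping in option 2 can be sketched in plain Python, independent of Django. The sample rows below are hypothetical stand-ins for OpinionsCitedByRECAPDocument records in one docket, using the depths from the example above:

```python
from collections import defaultdict

# Hypothetical rows: (citing_document_id, cited_opinion_id, depth),
# standing in for OpinionsCitedByRECAPDocument records in a single docket.
rows = [
    (101, 9339585, 5),  # doc 101 cites this opinion 5 times
    (102, 9339585, 5),
    (103, 9339585, 2),
    (101, 7777777, 1),
]

grouped = defaultdict(lambda: {"filings": 0, "references": 0})
for citing_doc, cited_opinion, depth in rows:
    grouped[cited_opinion]["filings"] += 1         # one row per citing filing
    grouped[cited_opinion]["references"] += depth  # aggregate the depth

print(grouped[9339585])  # {'filings': 3, 'references': 12}
```

In the real view this aggregation would happen at the query level rather than in Python, but the shape of the result (one entry per opinion, with a filing count and a summed depth) is the same.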

Whatever the option, we should make sure the opinions' authorities page doesn't break: authorities_list.html is used for both docket and opinion authorities, and opinion authorities are OpinionsCited rather than OpinionsCitedByRECAPDocument, so the same attributes aren't always available.

@mlissner
Member

Fun, so this issue is indeed worse than we thought. I didn't realize that we're repeating the same case many times in the list of authorities. That's not great.

The easiest thing is to aggregate on the depth and have it say something like:

3 filings make 14 references to XYZ

We don't need to say which filings do that on this page, so we can spare ourselves that nested layout you describe.

Aggregating the depth should be easy. I can't think offhand how to get the filing count, but I'm guessing it's not too hard either and can be done at the query-level as well.

If we do the above, I think that fixes the confusion issue too, since it says the number of documents and then when you click on it, that's how many show up in the search results.

One last thing: We're introducing a second count to this page. Currently, it's ordered by the citation depth, but I think if we make this change, we should order by the number of filings citing a document instead. I guess a later feature could be to allow users to choose which ordering they prefer.
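The ordering switch could be applied to the aggregated rows themselves. A minimal sketch in plain Python (the row shape and field names here are assumptions, mirroring the proposed filing/reference counts):

```python
# Hypothetical aggregated rows, shaped like the eventual annotated queryset.
authorities = [
    {"cited_opinion_id": 1, "filing_count": 1, "total_depth": 40},
    {"cited_opinion_id": 2, "filing_count": 3, "total_depth": 12},
]

# Current behavior: order by total citation depth, descending.
by_depth = sorted(authorities, key=lambda a: a["total_depth"], reverse=True)

# Proposed: order by how many filings cite the opinion, descending.
by_filings = sorted(authorities, key=lambda a: a["filing_count"], reverse=True)

print([a["cited_opinion_id"] for a in by_filings])  # [2, 1]
```

With a queryset, the same switch would just be a different `order_by()` on the annotated field.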

@tactipus
Contributor

tactipus commented Jan 7, 2025

lol sup, i'm gonna look at this soon

@mlissner
Member

@tactipus, we're planning to fix this in our coming sprint, which starts Monday. Are you still thinking about helping with this one?

@tactipus
Contributor

> @tactipus, we're planning to fix this in our coming sprint, which starts Monday. Are you still thinking about helping with this one?

yeah. just wanted to know if elisa still wanted to do paired programming so it can be scheduled. i'm looking at the Docket class rn

@mlissner
Member

Up to you guys. I'll get out of the way. :) @elisa-a-v?

@elisa-a-v
Contributor Author

elisa-a-v commented Jan 10, 2025

Oh I'd love to! @tactipus if you're down, let me know when would be a good time for you and I can probably adjust my schedule :)

@tactipus
Contributor

@elisa-a-v I am good after 17:00 EST, usually. i can also do 13:00 to 16:00 EST

@elisa-a-v
Contributor Author

@tactipus that's great, 13:00 EST generally works for me as well, but I think we should probably move this discussion over to email so we don't keep cluttering up the issue thread 😅 my address is [email protected]—feel free to reach out!

@tactipus
Contributor

tactipus commented Jan 16, 2025

Good morning,

This comment is more a record than anything. Per our conversation yesterday, we will focus on views.py and map out the implementation @mlissner discussed. That way, we can avoid tinkering too much with models.py.

The solution is to map out the references using an array or a query set IIRC. It was all @elisa-a-v's idea, I just took notes ;p

async def docket_authorities(
    request: HttpRequest,
    docket_id: int,
    slug: str,
) -> HttpResponse:
    docket, context = await core_docket_data(request, docket_id)
    if not await docket.ahas_authorities():
        raise Http404("No authorities data for this docket at this time")

    context.update(
        {
            # Needed to show/hide parties tab.
            "parties": await docket.parties.aexists(),
            "docket_entries": await docket.docket_entries.aexists(),
            "authorities": docket.authorities_with_data,
        }
    )
    return TemplateResponse(request, "docket_authorities.html", context)

@elisa-a-v
Contributor Author

That's correct, we basically need two things:

  1. Map the authorities in the context of that view so that it's an iterable of Opinion instances instead of OpinionsCitedByRECAPDocuments, with the number of filings (how many RECAPDocuments cite this Opinion in this Docket) and the number of references (the aggregated depth of all those OpinionsCitedByRECAPDocument instances)
    • We now list Opinions instead of OpinionsCitedByRECAPDocuments so we only list each opinion once.
    • I believe the iterable could either be a QuerySet if you know your way around annotations, or simply a list. I imagine something like:
      >>> context["authorities"]
      [
          {
              "opinion": Opinion1,
          "filings": OpinionsCitedByRECAPDocument.objects.filter(cited_opinion=Opinion1, ...).count(),
          "references": OpinionsCitedByRECAPDocument.objects.filter(cited_opinion=Opinion1, ...).aggregate(...)["sum"],
          },
          {
              "opinion": Opinion2,
          "filings": OpinionsCitedByRECAPDocument.objects.filter(cited_opinion=Opinion2, ...).count(),
          "references": OpinionsCitedByRECAPDocument.objects.filter(cited_opinion=Opinion2, ...).aggregate(...)["sum"],
          },
      ]
  2. Update the template to display the info in the new format without breaking the document authorities view (they both use the same authorities_list.html template), which should probably remain unchanged.

@elisa-a-v
Contributor Author

@mlissner I did notice a small thing when looking through the authorities_list.html template: in line 14 it says {% if authority.blocked %}, but I don't see how that's ever True. I wonder if maybe that's a mistake and it should be authority.cited_opinion.cluster.blocked instead?

<a href="{{ authority.cited_opinion.cluster.get_absolute_url }}{% querystring %}" {% if authority.blocked %}rel="nofollow" {% endif %}>

@tactipus
Contributor

I just emailed @elisa-a-v about this. To recap that email, I suggested we use annotate() to get the counts: annotate() returns a QuerySet, which is iterable row by row, while aggregate() returns a single dict, which I believe is less flexible here.
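The annotate-vs-aggregate distinction can be mimicked in plain Python (the rows below are hypothetical): aggregate() collapses the whole queryset into one summary dict, while annotate() on a values() queryset yields one summary per group.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (cited_opinion_id, depth)
rows = [(1, 5), (1, 5), (1, 2), (2, 1)]

# aggregate()-style: one dict summarizing the whole queryset
aggregate_style = {"depth__sum": sum(d for _, d in rows)}

# annotate()-style: one dict per cited opinion,
# like .values("cited_opinion_id").annotate(total_depth=Sum("depth"))
rows.sort(key=itemgetter(0))
annotate_style = [
    {"cited_opinion_id": op, "total_depth": sum(d for _, d in grp)}
    for op, grp in groupby(rows, key=itemgetter(0))
]

print(aggregate_style)  # {'depth__sum': 13}
print(annotate_style)   # [{'cited_opinion_id': 1, 'total_depth': 12}, {'cited_opinion_id': 2, 'total_depth': 1}]
```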

@mlissner
Member

> I wonder if maybe that's a mistake and it should be authority.cited_opinion.cluster.blocked instead?

That's certainly possible. The idea here is to tell crawlers not to waste their time crawling pages that we already know are blocked. Google and other crawlers give you a budget of links that they'll follow (called the "Crawl Budget"), so it's important not to send them to pages they can't index anyway.

I think you can get all the information you need from the queryset with something like this:

cited = (
    OpinionsCitedByRECAPDocument.objects
        .filter(citing_document__docket_entry__docket_id=4214664)
        .values("cited_opinion_id")
        .annotate(opinion_count=Count('cited_opinion_id'), total_depth=Sum('depth'))
)

That returns objects like:

{'cited_opinion_id': 9339585, 'opinion_count': 1, 'total_depth': 1}

And if you iterate over the entire thing you get results like:

In [27]: for c in cited:
    ...:     print(f"citation_count: {c['opinion_count']}; total depth: {c['total_depth']}")
    ...
    citation_count: 1; total depth: 1
    citation_count: 4; total depth: 35
    citation_count: 1; total depth: 1
    citation_count: 1; total depth: 1
    citation_count: 1; total depth: 2
    ...

Does this work as a solution (if you pick the fields carefully, do the prefetches, etc)?

@elisa-a-v
Contributor Author

> That's certainly possible. The idea here is to tell crawlers not to waste their time crawling pages that we already know are blocked. Google and other crawlers give you a budget of links that they'll follow (called the "Crawl Budget"), so it's important not to send them to pages they can't index anyway.

I understand. Well, from what I saw, I don't think the authority actually has any blocked attribute, so I'd bet it's not working, unless I'm missing something 🤔

> Does this work as a solution (if you pick the fields carefully, do the prefetches, etc)?

Yes that makes a lot of sense to me!
