feat(lib): Enhances `get_parties_from_case_name` method #4971
This commit enhances the `get_parties_from_case_name` method by removing common strings from bankruptcy case names before extracting party information. This improves the accuracy of party identification.
- Adds a new separator character to the list of valid separators for identifying parties in bankruptcy cases.
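As a rough illustration of what "removing common strings" could look like, here is a minimal sketch that strips HTML markup and a consolidation note from a case name. The function name `strip_markup_and_notes` and both patterns are hypothetical; the actual list of strings removed by this PR is not shown in the conversation.

```python
import re

# Hypothetical patterns, chosen for illustration only. The real cleanup
# in the PR may use a different (and longer) list of strings.
HTML_TAG = re.compile(r"<[^>]+>")
CONSOLIDATION_NOTE = re.compile(
    r"case consolidated under\s+[\w-]+", re.IGNORECASE
)


def strip_markup_and_notes(case_name: str) -> str:
    """Remove HTML tags and consolidation notes, then tidy whitespace."""
    text = HTML_TAG.sub(" ", case_name)
    text = CONSOLIDATION_NOTE.sub(" ", text)
    # Collapse the runs of spaces left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

Only after a pass like this would the cleaned name be split on the valid separators.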
The approach looks good, but I think I might have explained the goal poorly. I made a handful of suggestions to the tests that should help. Sorry for the miscommunication.
"case_name": 'Saucedo and Green Dream International, LLC <b> <font color="red"> Case Consolidated under 23-03142 </font> </b>',
"output": ["Saucedo", "Green Dream International, LLC"],
This one looks like it's actually wrong, but not sure we can do much better.
Also, from #4802 (comment), it seems that there might be cases where the words `In` or `of` are not part of the party names.
For instance, something like:
In re: Advantage LLC
Is this possible in bankruptcy?
If so, in these cases, the indexed party would be `In re: Advantage LLC`, which doesn't seem correct. In district courts, we simply ignore anything that doesn't have a valid separator, but here, it seems more complicated since we're performing cleanup before splitting parties.
Perhaps, in these cases, we could completely ignore anything that contains `In` or `of`? Or we could look for examples of these case names and check if we can identify a common pattern for cleanup?
> Is this possible in bankruptcy?
I looked into how often "In re" appears in case names. After refining the dataset to include more records (2 million total records with RECAP source or a derived one) and searching, I found only 36 instances (0.0018%) where a case name begins with "In re." A few examples are:
I think we should add a step to the cleanup process that removes "In re" before we try to figure out the party names.
For reference, here's a CSV file containing these 36 instances:
@albertisfu Let me know what you think.
Thanks! Yeah, it seems like this type of case name is not very common.
I think we can try removing `in re` or `in re:` before splitting the parties; however, that would also require removing other common terms that seem to be typical in this type of case name structure but do not appear to be part of the parties, such as:
Matter of
Receivership of
Appearances of
Not sure if it's possible to compile a list of all potential terms that might appear in a bankruptcy case name but are not part of the parties.
Additionally, some case names don't seem to contain parties at all.
In re Matter of Ascendium Replacement Filings
In re: Proceedings to Review Attorney Usage of CM/ECF Filing Credentials
In re: Proceedings to Enforce Fed.R.Bankr.9036
In Re: Proceedings to Enforce Fed. R. Bankr. P. 9036 as to various high-volume paper-notice recipients relating to cases pending within the District of Connecticut.
In re Matter of Proof of Claim Replacement Filings
In re Appointments and Reappointments of Ohio Sout
In these cases, if we remove "in re," it might not be correct to treat the remaining text as a party.
Another option is to simply ignore any case name that contains `in re` or `in re:` and not index parties from those cases. Perhaps @mlissner has an opinion on this?
For bankruptcy, I'm fine with just not indexing anything that starts with `in re` or `in the matter of`, etc.
They're very helpful. I'll update the helper function based on your feedback.
I apologize, I just reread the GitHub issue and it was all there. I was just trying to use the same approach as for district cases.
@mlissner I've extracted the case name cleaning logic into a dedicated method. This incorporates your feedback, and the associated test now passes.
A couple little things. Let's see if Alberto catches anything else since he's been in this area of the code most recently.
Thanks @ERosendo this looks great! I only left some comments that might be worth confirming.
field_value = get_parties_from_case_name(
main_instance.case_name
field_value = (
get_parties_from_case_name_bankr(
Since we're adding a special method for splitting parties in bankruptcy cases both here and in `prepare_parties`, I'd suggest adding a test case similar to those in `test_index_party_from_case_name_when_parties_are_not_available` to confirm that the correct method is selected for bankruptcy.
I think two additional test cases should be enough:
- Splitting parties from the `case_name` when creating a bankruptcy docket (which will use the logic in `prepare_parties`).
- Splitting parties when updating a `case_name` (which will use the logic in `document_fields_to_update`).
Currently, in `test_index_party_from_case_name_when_parties_are_not_available`, the factory `docket_with_no_parties` comes from a bankruptcy court. To test `get_parties_from_case_name_bankr` separately, it would be necessary to change the court in this factory to a district court and create a new factory for bankruptcy. You could rely on the expected parties for the assertion, considering that `get_parties_from_case_name_bankr` performs some cleanup, or simply confirm that the correct method is being called using a mock. The same approach can be applied for the `case_name` update test case for bankruptcy.
I don't think it'd be necessary to replicate the rest of the assertions from `test_index_party_from_case_name_when_parties_are_not_available` for bankruptcy since they share common logic that hasn't changed.
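The mock-based option could look roughly like the sketch below. It is self-contained, so the two splitter functions, the `helpers` namespace, and `extract_parties` are all stand-ins invented for illustration; the real test would patch the project's own helpers where `prepare_parties` and `document_fields_to_update` look them up.

```python
from types import SimpleNamespace
from unittest import mock


# Stand-ins for the real helpers; their bodies are placeholders.
def get_parties_from_case_name(case_name):
    return case_name.split(" v. ")


def get_parties_from_case_name_bankr(case_name):
    return case_name.split(" v. ")


# Hypothetical namespace so the dispatcher resolves helpers at call time.
helpers = SimpleNamespace(
    district=get_parties_from_case_name,
    bankr=get_parties_from_case_name_bankr,
)


def extract_parties(case_name, is_bankruptcy):
    """Hypothetical dispatcher: pick the splitter based on the court type."""
    if is_bankruptcy:
        return helpers.bankr(case_name)
    return helpers.district(case_name)


def check_bankruptcy_dispatch():
    """Confirm only the bankruptcy splitter runs for a bankruptcy docket."""
    with mock.patch.object(
        helpers, "bankr", return_value=["A", "B"]
    ) as bankr_mock, mock.patch.object(helpers, "district") as district_mock:
        result = extract_parties("A v. B", is_bankruptcy=True)
    bankr_mock.assert_called_once_with("A v. B")
    district_mock.assert_not_called()
    return result
```

Asserting on which helper was called keeps the test independent of the cleanup rules, which are covered by their own tests.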
This commit introduces a new helper function, `is_bankruptcy_court`, which checks if a given court ID corresponds to a bankruptcy court.
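A helper with this shape might be sketched as follows. This is not the PR's implementation: the in-memory mapping replaces whatever lookup the project really performs, and the court IDs and jurisdiction codes shown are assumptions used only to make the example runnable.

```python
# Illustrative stand-in data. The real helper presumably consults the
# project's court records rather than a hard-coded dictionary.
COURT_JURISDICTIONS = {
    "canb": "FB",  # assumed: a bankruptcy court
    "cand": "FD",  # assumed: a district court
}

# Assumed jurisdiction codes that count as "bankruptcy".
BANKRUPTCY_JURISDICTIONS = {"FB", "FBP"}


def is_bankruptcy_court(court_id: str) -> bool:
    """Return True when the court ID maps to a bankruptcy jurisdiction."""
    return COURT_JURISDICTIONS.get(court_id) in BANKRUPTCY_JURISDICTIONS
```

Unknown court IDs fall through to `False`, so callers would default to the district-court splitting logic.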
This commit introduces logic to handle bankruptcy case names that begin with "in re" or "in the matter of". These types of case names typically don't contain party information in the standard format, so the function now returns an empty list in these cases. This prevents incorrect parsing and ensures more accurate extraction of party names.
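The early return described here can be sketched like this. The regular expression, the function names, and the `split_parties` callback are hypothetical; only the behavior (an empty list for names starting with "in re" or "in the matter of") comes from the commit message.

```python
import re

# Hypothetical pattern: "in re" or "in the matter of" at the start of the
# name, case-insensitive. The word boundary avoids matching "In Regal ...".
NON_PARTY_PREFIX = re.compile(
    r"^\s*in\s+(re\b|the\s+matter\s+of\b)", re.IGNORECASE
)


def has_non_party_prefix(case_name: str) -> bool:
    """Return True when the case name starts with a non-party prefix."""
    return NON_PARTY_PREFIX.match(case_name) is not None


def parties_or_empty(case_name, split_parties):
    """Skip party extraction for names that carry a non-party prefix."""
    if has_non_party_prefix(case_name):
        return []
    # split_parties is whatever splitter applies to regular case names.
    return split_parties(case_name)
```

Returning an empty list, instead of stripping the prefix and indexing the remainder, avoids treating names such as "In re Matter of Proof of Claim Replacement Filings" as parties.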
Thanks @ERosendo changes look great!
Just one last thing. While running a test, I noticed a case in my database with the title:
Toby Edward Torres - Adversary Proceeding
I was surprised that these words didn't appear in your sample. Maybe the reason is that there are only about 5,000 cases with this title.
The "- Adversary Proceeding" part is not actually in the docket title; it is added by Juriscraper.
So, I think we should remove "- Adversary Proceeding" before extracting the parties.
I also just realized that Mike mentioned this in one of his earlier comments.
Additionally, while reviewing Juriscraper, I noticed that some cases have the title "Unknown Case Title". I think we should ignore these completely.
Hey @albertisfu Thanks for catching that.
I investigated the development database and found 3,000 instances of "Adversary Proceeding." However, my random dataset only contained 20 instances.
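Both suggestions could be handled in one small normalization step before party extraction. The sketch below is an assumption about how that might look; the function name and the choice to signal "ignore this title" with `None` are not from the PR.

```python
from typing import Optional

# Strings taken from the discussion above; matching is case-insensitive.
ADVERSARY_SUFFIX = " - Adversary Proceeding"
UNKNOWN_TITLE = "Unknown Case Title"


def normalize_scraped_title(case_name: str) -> Optional[str]:
    """Drop the scraper-added suffix; return None for placeholder titles."""
    name = case_name.strip()
    if name.lower().endswith(ADVERSARY_SUFFIX.lower()):
        # Remove the suffix and any whitespace left before it.
        name = name[: -len(ADVERSARY_SUFFIX)].rstrip()
    if name.lower() == UNKNOWN_TITLE.lower():
        return None
    return name
```

A caller would skip party extraction entirely when the result is `None`.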
Thanks! @ERosendo this looks great now.
This PR enhances the `get_parties_from_case_name` method by removing common strings from bankruptcy case names before extracting party information. This improves the accuracy of party identification. This PR also adds a new separator character to the list of valid separators for identifying parties in bankruptcy cases.
Fixes #4802