Import from bean-extract / beancount.ingest? #23

wadabum · 2019-07-21T22:53:41Z

So, I´ve spend hours writing a bean-extract compatible parser using beancount.ingest.importer.ImporterProtocol

Can one somehow string those things together?

I do NOT mean parsing a written-our .beancount file,
but "run bean-extract" and use that result as data_source - similar to Fava?

something along the lines of the usual:

# beancount.config
from importers import my_importer
configured_my_importer = my_importer.Importer(
  currency='USD',
)
CONFIG = [
  configured_my_importer,
]
# ....
# ....
# ....
    data_sources = [
        dict(
            module='beancount_import.source.beancount-ingest',
            class=configured_my_importer,
        )
# or
    data_sources = [
        dict(
            module='beancount_import.source.beancount-extract',
            config='path/to/beancount.config', # parse&read CONFIG from there
            dirs='documents/',
        )

or however that´s gonna map out... ;-)

(think that could proxy-close #18 as well...)

The text was updated successfully, but these errors were encountered:

jbms · 2019-07-23T04:32:24Z

There isn't any existing way to do this, but some sort of convergence with beancount.ingest is something I'm very interested in exploring. However, there are some challenges to overcome:

beancount.ingest is based on generating partial transactions (that don't necessarily balance) with missing postings. beancount-import requires full transactions, with Expenses:FIXME postings to indicate unknown accounts (a single Expenses:FIXME posting is allowed to correspond to multiple unknown postings). This is simple to handle, though --- just generate Expenses:FIXME postings to balance the transaction.
With just step 1, you could have a partial integration with beancount-import, but to get the automatic tracking of which transactions have already been imported, and automatic classification of unknown accounts, the beancount-import data source needs to provide more information than is available in the beancount.ingest model.

dumbPy · 2020-07-26T06:49:26Z

How about exporting this required info in meta?
I am currently exporting transaction ref/ cheque No. and jumbled description into meta.
ML models that can use this to predict postings and use your awesome UI for manual corrections would be great. currently exploring smart_importer but it doesn't have a way to fix interactively like this project.
I too want this to work.

jbms · 2020-07-26T18:20:49Z

If you specified which metadata fields indicate a unique transaction id, and which metadata fields to use for automatic classification, then I think it would work.

dumbPy · 2020-07-26T18:37:58Z

Do you have any plans for defining API for the UI part?

I am currently using smart_importer and I think having an interactive UI makes a lot of difference.
Having a clearly defined API would mean making UI compatible with all the future ML models.
My statements have a lot of numbers and plain text and comments and I would surely love to try some simple NLP models like Spacy's classifier.

I personally loved the idea of being able to make some manual entries and later merge them effortlessly with the statement.

jbms · 2020-07-26T18:59:11Z

I guess you want to use the beancount-import UI with the account prediction provided by smart_importer and ingesters defined using the benacount.ingest framework?

I don't have plans to try to make that work myself, but if you're interested in making that happen, it does sound like it could be very useful, and I'm happy to offer advice and would be interested in merging it into beancount-import, for example.

The datasource API provided by beancount-import is already sufficient, I think, for integrating with beancount.ingest. Currently there is no API for plugging in a different account prediction mechanism, but that could be added.

Or maybe it would make sense for beancount-import to merge into smart_importer entirely, but not sure if the smart_importer authors are interested.

Zburatorul · 2020-07-27T05:47:01Z

Correct me if I'm wrong, but it sounds like the stumbling block of this program (ingester->smart_importer->beancount-import UI) is that the prediction/ML part needs to be factored out in beancount-import?
Is there any other issue related to matching existing and currently imported transactions? It looks to me like smart_importer is capable of detecting duplication.

Is it possible then to get the smart_importer to do all the deduplication and account prediction, while simply turning off this functionality in beancount-import?

Your posts from two years ago seem relevant:
https://www.mail-archive.com/[email protected]/msg01628.html
https://www.mail-archive.com/[email protected]/msg01630.html

jbms · 2020-07-27T18:51:47Z

Currently beancount-import relies on an account name starting with Expenses:FIXME to indicate an "unknown" account, which has 3 effects:

when matching to existing transactions in the journal, such an account can match any account name
automatic prediction may be done
the user may manually enter the correct account name in the UI

Individual beancount-import datasources are already responsible for doing deduplication, so it is no problem for smart_importer to do that. To let smart_importer perform the automatic prediction, while still preserving effects (1) and (3), beancount-import could be modified to support an extra metadata key on postings that would indicate the predicted account while still allowing the account to be considered "unknown".

This would be a pretty simple change to make, e.g. in _get_unknown_account_predictions in reconcile.py.

Zburatorul · 2020-07-28T19:23:59Z

@dumbPy, are you interested in trying to get your smart_importer to work with beancount-import along the lines of #60 ?
You'd have to make it add two meta field to each predicted posting.

addisonklinke · 2023-06-20T05:22:46Z

@jbms Individual beancount-import datasources are already responsible for doing deduplication

Is this still accurate?

If I understand correctly, that means an ImporterProtocol subclass needs to confirm each transaction it generates isn't a duplicate of an existing entry in the transactions.beancount file?

jbms · 2023-06-20T05:27:06Z

The generic importer handles deduplication based on (account, date, units, narration), so individual ImporterProtocol subclasses should not need to do that.

beancount-import/beancount_import/source/generic_importer_source.py

Line 122 in b8a7806

return (self.account,

addisonklinke · 2023-06-20T14:15:36Z

Hmm is there an option (aside from subclassing the source) to avoid requiring the narration match? Usually I tidy these up in the import UI, so what's in the transactions file is more concise and/or better formatted than the raw importer output

jbms · 2023-06-20T17:38:12Z

There isn't currently any such option, but I agree that matching on narration is problematic. Potentially you could match on just date, account, and amount alone. But it would be better to store the original narration as a metadata field, so that it can be used for training and for deduplication

addisonklinke · 2023-06-20T17:40:26Z

Yes currently I'm storing the original in the source_desc metadata field. I think it'd make more sense to use that as part of the deduplication key than the final narration

Zburatorul mentioned this issue Jul 27, 2020

Towards making smart-importer and beancount-import more cooperative. #60

Open

dumbPy mentioned this issue Aug 2, 2020

Add generic importer source #62

Merged

jbms closed this as completed in #62 Aug 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Import from bean-extract / beancount.ingest? #23

Import from bean-extract / beancount.ingest? #23

wadabum commented Jul 21, 2019 •

edited

Loading

jbms commented Jul 23, 2019

dumbPy commented Jul 26, 2020

jbms commented Jul 26, 2020

dumbPy commented Jul 26, 2020 •

edited

Loading

jbms commented Jul 26, 2020

Zburatorul commented Jul 27, 2020 •

edited

Loading

jbms commented Jul 27, 2020

Zburatorul commented Jul 28, 2020

addisonklinke commented Jun 20, 2023

jbms commented Jun 20, 2023

addisonklinke commented Jun 20, 2023 •

edited

Loading

jbms commented Jun 20, 2023

addisonklinke commented Jun 20, 2023

Import from bean-extract / beancount.ingest? #23

Import from bean-extract / beancount.ingest? #23

Comments

wadabum commented Jul 21, 2019 • edited Loading

Can one somehow string those things together?

jbms commented Jul 23, 2019

dumbPy commented Jul 26, 2020

jbms commented Jul 26, 2020

dumbPy commented Jul 26, 2020 • edited Loading

jbms commented Jul 26, 2020

Zburatorul commented Jul 27, 2020 • edited Loading

jbms commented Jul 27, 2020

Zburatorul commented Jul 28, 2020

addisonklinke commented Jun 20, 2023

jbms commented Jun 20, 2023

addisonklinke commented Jun 20, 2023 • edited Loading

jbms commented Jun 20, 2023

addisonklinke commented Jun 20, 2023

wadabum commented Jul 21, 2019 •

edited

Loading

dumbPy commented Jul 26, 2020 •

edited

Loading

Zburatorul commented Jul 27, 2020 •

edited

Loading

addisonklinke commented Jun 20, 2023 •

edited

Loading