Universe data frames improvements #8433

jhonabreul
Collaborator

@jhonabreul jhonabreul commented Nov 29, 2024

Description

This improves the format of universe data frames and fixes some bugs:

  • Common universes like ETFs kept a "data" column containing only empty lists; that column is now filtered out.
  • Folding-collection custom Python universes are now normalized to the same format shown in Universe data frames normalization #8385.

ETF constituents universe example:

Before:

                              data lastupdate period   sharesheld    weight
time       symbol
2020-12-01 A RPTMYV3VC57P       [] 2020-11-27 1 days    3313891.0  0.001170
           AAL VM9RIYHM8ACL     [] 2020-11-27 1 days    5331000.0  0.000247
           AAP SA48O8J43YAT     [] 2020-11-27 1 days     741951.0  0.000344
           AAPL R735QTJ8XC9X    [] 2020-11-27 1 days  172212400.0  0.062138
           ABBV VCY032R250MD    [] 2020-11-27 1 days   18911224.0  0.006139
           AAS R735QTJ8XC9X     [] 2020-11-27 1 days    1575257.0  0.000503
           ABMD R735QTJ8XC9X    [] 2020-11-27 1 days     480480.0  0.000404
           ABT R735QTJ8XC9X     [] 2020-11-27 1 days   18974188.0  0.006320
           ACN S6HA7SVNXLYD     [] 2020-11-27 1 days    6818158.0  0.005278
           ADBE R735QTJ8XC9X    [] 2020-11-27 1 days    5140614.0  0.007589
           ADI R735QTJ8XC9X     [] 2020-11-27 1 days    3937902.0  0.001679
           ADM R735QTJ8XC9X     [] 2020-11-27 1 days    5947830.0  0.000929

After:

                              lastupdate period   sharesheld    weight
time       symbol
2020-12-01 A RPTMYV3VC57P     2020-11-27 1 days    3313891.0  0.001170
           AAL VM9RIYHM8ACL   2020-11-27 1 days    5331000.0  0.000247
           AAP SA48O8J43YAT   2020-11-27 1 days     741951.0  0.000344
           AAPL R735QTJ8XC9X  2020-11-27 1 days  172212400.0  0.062138
           ABBV VCY032R250MD  2020-11-27 1 days   18911224.0  0.006139
           AAS R735QTJ8XC9X   2020-11-27 1 days    1575257.0  0.000503
           ABMD R735QTJ8XC9X  2020-11-27 1 days     480480.0  0.000404
           ABT R735QTJ8XC9X   2020-11-27 1 days   18974188.0  0.006320
           ACN S6HA7SVNXLYD   2020-11-27 1 days    6818158.0  0.005278
           ADBE R735QTJ8XC9X  2020-11-27 1 days    5140614.0  0.007589
           ADI R735QTJ8XC9X   2020-11-27 1 days    3937902.0  0.001679
           ADM R735QTJ8XC9X   2020-11-27 1 days    5947830.0  0.000929
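
To make the effect of the filtering concrete, here is a minimal standalone sketch of dropping a column that holds only empty lists. It is not the code from this PR (the actual change lives in the engine's data frame conversion); `drop_empty_list_columns` and the row dictionaries are hypothetical names used only for illustration, with values copied from the first two rows of the tables above.

```python
from typing import Any


def drop_empty_list_columns(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Remove every column whose value is an empty list in all rows.

    Hypothetical helper for illustration; not part of the LEAN API.
    """
    if not rows:
        return []
    # A column is dropped only if it is an empty list in every single row.
    empty_columns = {
        name
        for name in rows[0]
        if all(isinstance(row.get(name), list) and not row[name] for row in rows)
    }
    return [
        {key: value for key, value in row.items() if key not in empty_columns}
        for row in rows
    ]


# Two rows copied from the "Before" table.
before = [
    {"symbol": "A RPTMYV3VC57P", "data": [], "sharesheld": 3313891.0, "weight": 0.001170},
    {"symbol": "AAL VM9RIYHM8ACL", "data": [], "sharesheld": 5331000.0, "weight": 0.000247},
]
after = drop_empty_list_columns(before)
print(list(after[0].keys()))
```

A column that carries a non-empty list in even one row is left in place, so the sketch only removes columns that carry no information at all.
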

Example of a Python custom universe:

from AlgorithmImports import *

class CustomUniverseData(PythonData):

    def get_source(self, config: SubscriptionDataConfig, date: datetime, is_live_mode: bool) -> SubscriptionDataSource:
        # Define the location and format of the data file.
        return SubscriptionDataSource(
            "portfolio-targets.csv", SubscriptionTransportMedium.OBJECT_STORE, FileFormat.FOLDING_COLLECTION
        )

    def reader(self, config: SubscriptionDataConfig, line: str, date: datetime, is_live_mode: bool) -> BaseData:
        # Skip empty lines and the header row.
        if not line or not line[0].isnumeric():
            return None
        # Split the line by each comma.
        items = line.split(",")
        # Parse the data from the CSV file.
        data = CustomUniverseData()
        data.end_time = datetime.strptime(items[0], "%Y-%m-%d")
        data.time = data.end_time - timedelta(1)
        data.symbol = Symbol.create(items[1], SecurityType.EQUITY, Market.USA)
        data["weight"] = float(items[2])
        return data

qb = QuantBook()
qb.set_start_date(2015, 1, 15)

universe = qb.add_universe(CustomUniverseData, "CustomUniverse", Resolution.DAILY, lambda alt_coarse: [x.symbol for x in alt_coarse])

Sample "portfolio-targets.csv" file contents:

Date,Symbol,Weight
2015-01-13,TLT,0.6403554273566532
2015-01-13,GLD,0.2966005853128983
2015-01-13,IWM,0.06304398733044848
2015-01-14,USO,0.5873635006180897
2015-01-14,GLD,0.19451676316704644
2015-01-14,TLT,0.2181197362148639
2015-01-15,IWM,0.563722959965805
2015-01-15,SPY,0.3327542780145993
2015-01-15,TLT,0.10352276201959563
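
The parsing done by `reader` can be checked against this sample without a LEAN environment. The sketch below mirrors the same steps (skip the header, split on commas, derive `time` as one day before `end_time`) using only the standard library. `TargetRow` and `parse_line` are hypothetical stand-ins for `CustomUniverseData` and `reader`, and the ticker is kept as a plain string instead of a `Symbol`.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class TargetRow(NamedTuple):
    """Hypothetical stand-in for CustomUniverseData."""
    time: datetime
    end_time: datetime
    ticker: str
    weight: float


def parse_line(line: str) -> Optional[TargetRow]:
    # Skip empty lines and the header row, which does not start with a digit.
    if not line or not line[0].isnumeric():
        return None
    items = line.strip().split(",")
    end_time = datetime.strptime(items[0], "%Y-%m-%d")
    # As in the reader above, the bar starts one day before it ends.
    return TargetRow(end_time - timedelta(days=1), end_time, items[1], float(items[2]))


sample = """Date,Symbol,Weight
2015-01-13,TLT,0.6403554273566532
2015-01-13,GLD,0.2966005853128983
2015-01-13,IWM,0.06304398733044848"""

rows = [row for row in map(parse_line, sample.splitlines()) if row is not None]
for row in rows:
    print(row.end_time.date(), row.ticker, row.weight)
```
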

Flattened history:

qb.history(universe, 3, flatten=True)
                               weight
time       symbol                    
2015-01-13 TLT 2T            0.640355
           GLD 2T            0.296601
           IWM RV0PWMLXVHPH  0.063044
2015-01-14 USO THORT68ZZSYT  0.587364
           GLD 2T            0.194517
           TLT 2T            0.218120
2015-01-15 IWM RV0PWMLXVHPH  0.563723
           SPY R735QTJ8XC9X  0.332754
           TLT 2T            0.103523

Unflattened history:

qb.history(universe, 3, flatten=False) # or qb.history(universe, 3)
time
2015-01-13    [TLT 2T: ¤0.00, GLD 2T: ¤0.00, IWM RV0PWMLXVHP...
2015-01-14    [USO THORT68ZZSYT: ¤0.00, GLD 2T: ¤0.00, TLT 2...
2015-01-15    [IWM RV0PWMLXVHPH: ¤0.00, SPY R735QTJ8XC9X: ¤0...
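
The two result shapes carry the same rows and differ only in how they are keyed. The following sketch models that relationship with plain Python containers. It does not call `qb.history` and is only an assumption-laden illustration of the shapes: the flattened form has one entry per (time, symbol) pair, while the unflattened form has one entry per time holding the whole collection. The weights are the rounded values from the flattened output above.

```python
from collections import defaultdict

# (time, symbol, weight) triples taken from the flattened output above.
records = [
    ("2015-01-13", "TLT", 0.640355),
    ("2015-01-13", "GLD", 0.296601),
    ("2015-01-13", "IWM", 0.063044),
    ("2015-01-14", "USO", 0.587364),
    ("2015-01-14", "GLD", 0.194517),
    ("2015-01-14", "TLT", 0.218120),
    ("2015-01-15", "IWM", 0.563723),
    ("2015-01-15", "SPY", 0.332754),
    ("2015-01-15", "TLT", 0.103523),
]

# Flattened shape: one row per (time, symbol), like the MultiIndex data frame.
flattened = {(time, symbol): weight for time, symbol, weight in records}

# Unflattened shape: one row per time, each holding that day's collection.
unflattened = defaultdict(list)
for time, symbol, weight in records:
    unflattened[time].append((symbol, weight))

print(len(flattened), "flattened rows")
print(len(unflattened), "unflattened rows")
```
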

NOTES:

Requires new Pythonnet 2.0.41. See QuantConnect/pythonnet#96 and QuantConnect/pythonnet#97

Related Issue

Motivation and Context

Requires Documentation Change

How Has This Been Tested?

Unit tests

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Refactor (non-breaking change which improves implementation)
  • Performance (non-breaking change which improves performance. Please add associated performance test and results)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Non-functional change (xml comments/documentation/etc)

Checklist:

  • My code follows the code style of this project.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • My branch follows the naming convention bug-<issue#>-<description> or feature-<issue#>-<description>

The data collection is now assigned only when needed. This allows the "data" column to be filtered out of data frames, since it is always null for all constituents.
…ection

After instantiating the collection type, fall back to the base BaseDataCollection to aggregate data if the type is not a base data collection.
Member

@Martin-Molinero Martin-Molinero left a comment

Nice! 👍

@jhonabreul jhonabreul merged commit a28d1f2 into QuantConnect:master Dec 3, 2024
7 checks passed
@jhonabreul jhonabreul deleted the bug-data-column-in-universe-dataframes branch December 17, 2024 18:55