Feature Request: Online Data Loading #155

Open
GreenlandZZY opened this issue May 3, 2024 · 1 comment

@GreenlandZZY

Feature requests are well-received but will probably be answered with a
suggestion that you develop them and contribute.

Specifications

  • Problem: Feature request
  • OS: Windows 11
  • Cvxportfolio version: 1.3.1
  • Python version: 3.11.0
  • Cvxpy version: 1.4.3
  • Pandas version: 2.0.3
  • Data: User-provided Factor Model and Return Data

Description

In cvxportfolio, it currently seems necessary to load all data into pandas DataFrames before optimization and back-testing. With a huge universe and a large set of factors, exploring a long history could therefore require a very large amount of memory.

  • Is it possible to create a query class that can be fed in as data, so that the query is run online at each time t at which the data is used?
  • Could the optimization results and the log also be exported incrementally to a file at each time point, so that recovering from a crash is easier?

If base classes for implementing these features already exist in cvxportfolio, could you provide some suggestions? If not, could you add these features? Thank you so much.

@enzbus
Collaborator

enzbus commented May 4, 2024

Short answer: yes, it can be done relatively easily. Long answer: it may create problems down the line, and it limits the flexibility of the system.

First, you need to implement your data-loading mechanism, such as a query to your database table, in a subclass of MarketData. The methods you need to implement are documented here: https://www.cvxportfolio.com/en/stable/data.html#cvxportfolio.data.MarketData . The heavy lifter is the serve method. It takes the timestamp in the back-test and returns a view of the past market data (past open-to-open returns, past market volumes, ...), which is used by both the market simulator and the trading policies, together with the current data for the simulator. Two other methods are trivial; the only other tricky one might be trading_calendar, which needs to know the future trading times. Your custom MarketData server is then passed to the initializer of MarketSimulator, and that part is done.
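A minimal sketch of such a query-backed server, assuming a hypothetical SQLite table `returns(date TEXT, asset TEXT, ret REAL)` in long format with one open-to-open return per asset and trading day. The class name `SQLMarketData` and the table layout are illustrative assumptions. The class mirrors the method names described above (`serve`, `trading_calendar`, `full_universe`) but does not import cvxportfolio, so real code would subclass `cvxportfolio.data.MarketData` and should check signatures and return types against the linked documentation. It also leaves out details the real interface cares about, such as the cash account and `partial_universe_signature`.

```python
import sqlite3

import pandas as pd


class SQLMarketData:
    """Hypothetical query-backed market data server (illustrative sketch).

    Assumes a table ``returns(date TEXT, asset TEXT, ret REAL)`` with dates
    stored as 'YYYY-MM-DD'. Real code would subclass
    cvxportfolio.data.MarketData instead of standing alone.
    """

    def __init__(self, connection):
        self.conn = connection

    def serve(self, t):
        """Query past and current data at back-test time ``t``."""
        day = pd.Timestamp(t).strftime("%Y-%m-%d")
        # Past data: every return strictly before t, pivoted to wide format
        # (rows are dates, columns are assets). Only this slice is in memory.
        past = pd.read_sql_query(
            "SELECT date, asset, ret FROM returns WHERE date < ? ORDER BY date",
            self.conn, params=(day,), parse_dates=["date"])
        past_returns = past.pivot(index="date", columns="asset", values="ret")
        # Current data for the simulator: the return of the period starting at t.
        current = pd.read_sql_query(
            "SELECT asset, ret FROM returns WHERE date = ?",
            self.conn, params=(day,))
        current_returns = current.set_index("asset")["ret"]
        # This sketch stores no volumes or prices, so those slots are None.
        return past_returns, current_returns, None, None, None

    def trading_calendar(self, start_time=None, end_time=None, include_end=True):
        """All trading days in the table (future ones included), in a range."""
        dates = pd.read_sql_query(
            "SELECT DISTINCT date FROM returns ORDER BY date",
            self.conn, parse_dates=["date"])["date"]
        calendar = pd.DatetimeIndex(dates)
        if start_time is not None:
            calendar = calendar[calendar >= pd.Timestamp(start_time)]
        if end_time is not None:
            end = pd.Timestamp(end_time)
            if include_end:
                calendar = calendar[calendar <= end]
            else:
                calendar = calendar[calendar < end]
        return calendar

    @property
    def full_universe(self):
        """Every asset that appears anywhere in the table."""
        assets = pd.read_sql_query(
            "SELECT DISTINCT asset FROM returns ORDER BY asset", self.conn)
        return pd.Index(assets["asset"])
```

Wiring it in would then look roughly like `cvx.MarketSimulator(market_data=SQLMarketData(conn))`; the `market_data` keyword is an assumption here, so check it in the MarketSimulator documentation. Each call to `serve` re-runs the query, which trades memory for repeated database round trips.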

For saving incremental results, you should subclass BacktestResult and implement whatever DB logic you wish. What it currently does is essentially incremental table inserts into a few Pandas DataFrames (initial positions, target weights, ...) and some Series, such as realized costs per iteration, a few timers, and so on. You could also redirect the Python log stream to some persistent storage; right now it is also kept in memory. It should all be possible. The only issue might be that BacktestResult was opened up for extension only recently (it had a private interface before), and I might still need to do some cleaning there.
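The storage side of this can be sketched with the standard library alone. The comment above does not name which BacktestResult methods a subclass would override, so this sketch does not hook into cvxportfolio at all: `IncrementalStore` and `attach_file_log` are hypothetical helpers that such a subclass could call once per iteration. Weights are assumed to arrive as a plain dict (for a pandas Series, `to_dict()` gives one), and the logger name to pass is also an assumption; `"cvxportfolio"` would presumably work if the library logs through module-level loggers under its package name.

```python
import json
import logging
import sqlite3


class IncrementalStore:
    """Hypothetical append-only store for per-iteration back-test results."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS iterations "
            "(time TEXT PRIMARY KEY, weights TEXT, cost REAL)")
        self.conn.commit()

    def append(self, time, weights, cost):
        """Insert one row per back-test time and commit it immediately."""
        # Committing on every call means a crash loses at most the
        # iteration that was in progress.
        self.conn.execute(
            "INSERT INTO iterations VALUES (?, ?, ?)",
            (time, json.dumps(weights), cost))
        self.conn.commit()

    def last_time(self):
        """Latest saved time (None if nothing saved yet), to resume from."""
        return self.conn.execute(
            "SELECT MAX(time) FROM iterations").fetchone()[0]

    def weights_at(self, time):
        """Reload the weights saved for ``time`` as a dict, or None."""
        row = self.conn.execute(
            "SELECT weights FROM iterations WHERE time = ?", (time,)).fetchone()
        return None if row is None else json.loads(row[0])


def attach_file_log(logger_name, path):
    """Copy records of ``logger_name`` and its child loggers to a file."""
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    logging.getLogger(logger_name).addHandler(handler)
    return handler
```

After a crash, `last_time()` tells you where to restart: the back-test can be resumed from the next trading day, with `weights_at(last_time())` as the starting point.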

Now, the negatives. Some may be debatable, but they are at least my opinion. You lose multiprocessing. You probably lose reproducibility, unless you are very careful to make sure the tables you refer to aren't modified. It's harder to debug, since potentially faulty operations like DB queries happen deep in the internals.
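One possible precaution against the reproducibility issue, not something cvxportfolio provides: fingerprint the source tables when a back-test starts and check the fingerprint again when it ends. The sketch below assumes a SQLite database; `table_fingerprint` is a hypothetical helper that hashes every row in a deterministic order, so any insert, update, or delete changes the result.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table):
    """SHA-256 of every row of ``table``, read in a deterministic order.

    ``table`` is interpolated into the SQL, so it must be a trusted name.
    """
    # Order by every column so the hash does not depend on storage order.
    columns = [info[1] for info in conn.execute(f"PRAGMA table_info({table})")]
    order_by = ", ".join(columns)
    digest = hashlib.sha256()
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {order_by}"):
        digest.update(repr(row).encode())
    return digest.hexdigest()
```

This only detects modification after the fact; it does not prevent it. Querying a read-only snapshot of the tables is the stricter alternative.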

enzbus added a commit that referenced this issue May 4, 2024