Certificate in Quantitative Finance
Final Project Brief
This document outlines each available topic together with submission requirements for the project report and code. The step-by-step instructions offer a structure, not a limit, to what you can implement. Project Workshops I and II will provide focus to the implementation of each topic.
The CQF Final Project centres on numerical techniques and a backtest or sensitivity test of model output (prices, portfolio allocations, pairs trading signals) as appropriate. Some numerical techniques must be implemented in code from first principles. Others may be either too involved or merely auxiliary, and you need not implement them if good ready-made functionality is available. Re-use and adaptation of code are permitted. Marks earned will depend strongly on your coding of numerical techniques and on your presentation of how you explored and tested a quantitative model.
A capstone project requires a degree of independent study and the ability to work with documentation for packages that implement numerical methods in your coding environment, e.g., Python, R, Matlab, C#, C++, Java, Scala, or VBA with NAG libraries.
Exclusively for current CQF delegates. No distribution.
1 Topics
To complete the project, you must implement one topic from this list – according to the tasks in the up-to-date Brief. If you are continuing from a previous cohort, please review the topic description, because tasks are regularly revised. It is not possible to submit past topics.
1. Portfolio Construction with Views and Robust Covariance (PC)
2. Deep Learning for Time Series (DL)
3. Long/Short Trading Strategy Design & Backtest (TS)
4. Credit Spread for a Basket Product (CR)
1.1 Project Report and Submission Requirements
- Submit working code together with a well-written report and an originality declaration.
- There is no set page length. The report must have analytical quality, with discussion of results/robustness/sensitivity/backtesting as appropriate to the topic.
Use charts, test cases and comparison to empirical research papers where available.
- The report must present the mathematical model and numerical methods, with attention to their convergence/accuracy/computational properties.
Please feature the numerical techniques you coded – make a table.
- Mathematical sections can be prepared using LaTeX or Equation Editor (Word). Printing out Python notebook code and numbers over multiple pages, without your own analysis text, explained plots, and the relevant maths, is not an acceptable format.
Work must be submitted by 23:59 GMT on the submission date.
Work done must match the Brief. There are no extensions to the Final Project.
Projects without a declaration or working code are incomplete and will be returned.
All projects are checked for originality. We reserve the option of a viva voce before the qualification is awarded.
1.2 CQF Electives
We ask you to choose two Electives in order to preserve focus. All Electives will remain available later for your advancement. The Project Workshops remain the main resource, while the Electives cover broader knowledge areas; several viable combinations are possible for each project topic.
The effective approach is to select one ‘topical elective’ and one ‘coding elective’ (see the end of this Brief).
- For the portfolio construction topic, the Risk Budgeting elective is your primary choice. Adv Portfolio Management considers a dynamic programming approach over a set of SDEs, which was never widely taken up in asset management.
- Counterparty Credit Risk is for credit instrument pricing.
- Curve Construction and Modelling (HJM) relies on the core lectures on PCA and HJM and the tutorials (Python tutorials, Multicurve Modelling).
- The Trading Strategy topic utilises cointegration for pairs/basket trading, again relying on the core Cointegration lecture/Project Workshop and the Python Tutorial.
- The Deep Learning for Time Series topic relies on the two recent ML Modules. A recurrent NN / LSTM example is provided in the Python Tutorial.
The Electives descriptions continue at the end of this Brief: make a note to review Risk Management, Behavioural Finance, and Pricing with PDEs for your professional advancement at a later stage.
1.3 Coding for Quant Finance
- Choose a programming environment that has appropriate strengths and facilities to implement the topic (pricing model). Common choices range from VBA to Python to C++; please exercise judgement as quants: which language has libraries that allow you to code faster and validate more easily?
- Use of R/Matlab/Mathematica is encouraged – as a main environment or as an addition. Sometimes there is a need to do time series/covariance matrix/rank correlation computations, or to present results and formulae in a coherent form.
- The Project Brief gives links to useful demonstrations in Matlab, and Webex sessions demonstrate Python notebooks – this does not mean your project should be based on that ready-made code. Python with pandas, matplotlib, sklearn, and tensorflow poses a considerable challenge to Matlab, even for visualisation. The Matlab plot editor is clunky, and it is not that difficult to learn the various plot types in Python.
- ‘Scripted solution’ means that ready functionality from toolboxes and libraries is called, while the amount of your own coding of numerical methods is minimal or non-existent. This particularly applies to Matlab/R.
- Projects done using Excel spreadsheet functions only are not robust, are notoriously slow, and do not give understanding of the underlying numerical methods. CQF-supplied Excel spreadsheets are a starting point and help to validate results, but coding of numerical techniques and/or use of industry code libraries is expected.
- The aim of the project is to enable you to code numerical methods and develop model prototypes as in a production environment. Spreadsheet-only or scripted solutions are below the expected standard for completion of the project.
What should I code?
Delegates are expected to re-code the numerical methods that are central to the model and to exercise judgement in identifying them. Balanced use of libraries is at your own discretion as a quant.
- Produce a small table in the report that lists the methods you implemented/adjusted. If using ready functions/borrowed code for a technique, indicate this and describe the limitations of the numerical method implemented in that code/standard library.
- It is up to delegates to develop their own test cases, sanity checks, and validation. It is normal to observe irregularities when the model is implemented on real-life data. If in doubt, reflect on the issue in the project report.
- The code must be thoroughly tested and well-documented: each function must be described, and comments must be used. Provide instructions on how to run the code.
Portfolio Construction with Robust Covariance
A covariance matrix computed from sample returns is likely to contain exactly the kind of random noise that perturbs the mean-variance optimiser the most. First, we focus on a robust covariance input via at least one method: Marcenko-Pastur denoising OR Ledoit-Wolf nonlinear shrinkage. After stabilising the covariance matrix, we turn to the Black-Litterman framework to derive equilibrium returns and impose views. The final optimisation will use µBL and the robust covariance.
Kinds of optimisation: mean-variance, min VaR, tracking error, max Sharpe ratio, higher-order moments (min coskewness, max cokurtosis) – implement at least two. Computation can be by ready formula or by ready functions specialised for quadratic programming/cone optimisation. Adding constraints improves robustness: most investors have margin constraints, limited ability to borrow, and limited short positions. Risk contributions can be computed ex ante for any optimal allocation, whereas computing the ERC portfolio requires solving a (non-linear) system of risk budget equations. While the latter problem is not an optimisation, it can be ‘converted’ into one – the ERC solution can be found by sequential quadratic programming (SQP).
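The ERC-by-SQP conversion mentioned above can be sketched in a few lines. The following is a minimal illustration, not a prescribed implementation: the 3-asset covariance matrix is hypothetical, and the objective (squared dispersion of risk contributions RC_i = w_i (Σw)_i) is one common choice among several.

```python
import numpy as np
from scipy.optimize import minimize

def erc_weights(cov):
    """Equal Risk Contribution weights via sequential quadratic programming (SLSQP).

    Minimises the dispersion of risk contributions RC_i = w_i * (Sigma w)_i
    subject to full investment and long-only constraints.
    """
    n = cov.shape[0]

    def objective(w):
        rc = w * (cov @ w)  # risk contributions (up to a scalar)
        return np.sum((rc[:, None] - rc[None, :]) ** 2)

    cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * n
    w0 = np.full(n, 1.0 / n)
    res = minimize(objective, w0, method='SLSQP', bounds=bounds,
                   constraints=cons, options={'ftol': 1e-12, 'maxiter': 500})
    return res.x

# toy 3-asset covariance (hypothetical numbers, annualised variances)
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
w = erc_weights(cov)
rc = w * (cov @ w)   # ex-ante risk contributions, near-equal at the solution
```

The same `minimize` call with a different objective (e.g., negative Sharpe ratio) covers the other optimisation kinds listed above once inequality constraints are added.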
Portfolio Choice and Data
The choice of portfolio assets must reflect optimal diversification. Optimality depends on the criterion. For maximum decorrelation among assets, it is straightforward to choose the least correlated assets. For exposure/tilts to factor(s), you need to know factor betas a priori and include assets with either high or low beta, depending on the purpose.
A naive portfolio of S&P 500 large caps is fully exposed to one factor, the market index itself, which is not sufficient. A specialised portfolio for an industry, emerging market, or credit assets should have 5+ names and include > 3 uncorrelated assets, such as commodities, VIX, bonds, credit, or real estate.
A factor portfolio is more of a long/short strategy; e.g., a momentum factor means going long the top 5 rising stocks and short the top 5 falling. Factor portfolios imply rebalancing (time diversification) by design.
- Mean-variance optimisation was specified by Harry Markowitz for simple returns (not log) in excess of the risk-free rate. For the risk-free rate, the 3M US Treasury rate from the pandas FRED dataset, ECB website rates for EUR, some small constant rate, or a zero rate are all acceptable. Use a 2-3 year sample, which means > 500 daily returns.
- The source for price data is Yahoo!Finance (US equities and ETFs). Use code libraries to access it, or Google Finance, Quandl, Bloomberg, Reuters and others. If a benchmark index is not available, equilibrium weights can be computed from market capitalisation (dollar value).
Step-by-Step Instructions
Part I: Robust Covariance
1. Implement portfolio choice based on your approach to optimal diversification: introduce an exogenous asset, check for the least correlated assets, go long/short. See the Q&A.
2. Decide which method you will use to make the covariance matrix robust:
- Marcenko-Pastur denoising is arguably the best method to deal with noise-induced instability. The denoising and detoning recipes (see de Prado, 2020) are affine and are beneficial over shrinkage because of their superior preservation of the signal carried by the top eigenvectors. While no ready code is provided, it should not be a problem for a quant to implement the recipes.
- Ledoit-Wolf nonlinear shrinkage has ready code, which can be applied directly to asset data. In addition to that ready recipe, explore the trace and minimum covariance determinant estimators (both are in Python's sklearn package).
- Basic linear shrinkage or EGARCH+correlations are no longer acceptable methods for producing a robust covariance.
3. Produce supporting representations: heatmaps/3D plots of the covariance matrices, and plots of the eigenvalues of the naive sample covariance vs. the robust covariance.
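As an illustration of the Marcenko-Pastur route, here is a minimal eigenvalue-clipping sketch on random data. It is one reading of the de Prado-style recipe, not the full denoising/detoning procedure: the assumptions are a unit-variance MP upper edge and a trace-preserving replacement of the noise eigenvalues by their average.

```python
import numpy as np

def mp_denoise(corr, T, N):
    """Denoise a correlation matrix by Marcenko-Pastur eigenvalue clipping.

    Eigenvalues below the MP upper edge lambda_+ = (1 + sqrt(N/T))^2 are
    treated as noise and replaced by their average (trace is preserved).
    """
    q = N / T
    lam_plus = (1 + np.sqrt(q)) ** 2          # MP upper edge for unit variance
    vals, vecs = np.linalg.eigh(corr)         # ascending eigenvalues
    noise = vals < lam_plus
    if noise.any():
        vals[noise] = vals[noise].mean()      # flatten the noise band
    denoised = vecs @ np.diag(vals) @ vecs.T
    d = np.sqrt(np.diag(denoised))            # rescale back to correlation
    return denoised / np.outer(d, d)

rng = np.random.default_rng(0)
T, N = 500, 20                                # > 500 daily returns, 20 assets
X = rng.standard_normal((T, N))               # pure-noise returns for the demo
corr = np.corrcoef(X, rowvar=False)
corr_dn = mp_denoise(corr, T, N)
```

Plotting the eigenvalues of `corr` against those of `corr_dn` gives exactly the naive-vs-robust comparison asked for in step 3.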
Part II: Imposing Views. Comparative Analysis
1. Plan your Black-Litterman application. Find a ready benchmark or construct the prior: equilibrium returns can come from a broad-enough market index. Implement a computational version of the BL formulae for the posterior returns.
2. Imposing too many views will make it difficult to see the impact of each individual view.
3. Describe analytically and compute optimisations of at least two kinds. Optimisation is improved by using sensible constraints, e.g., a budget constraint or ‘no short positions in bonds’, but such inequality constraints (w_i ≥ 0 ∀i) trigger numerical computation of allocations.
4. You will end up with multiple sets of optimal allocations, even for classic mean-variance optimisation (one of your two kinds). Please make your own selection of which results to focus your Analysis and Discussion on – the most feasible and illustrative comparisons.
- Naive covariance vs. robust: remember to compute allocations for the simple covariance too (before denoising/shrinkage).
- Your optimal allocations vs. the benchmark, for active risk. Naive expected returns vs. implied equilibrium returns (similar to Table 6 in the BL Guide by T. Idzorek).
- BL views are not affected by the covariance matrix – therefore, whether to compute allocations shifted by views (through the Black-Litterman model) with the naive or the robust covariance is your choice.
- Three levels of risk aversion – while it is optional to use all three, it is recommended that you explore at least the min-variance case.
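A computational version of the BL posterior can be sketched as below. The parameter values (τ = 0.05, risk aversion δ = 2.5, and a diagonal Ω proportional to the view variances) are common illustrative choices in the Idzorek-style formulation, not prescribed values, and the two-asset inputs are hypothetical.

```python
import numpy as np

def bl_posterior(Sigma, w_mkt, P, q, tau=0.05, delta=2.5):
    """Black-Litterman posterior expected returns.

    Sigma : (robust) covariance of excess returns
    w_mkt : market-cap (equilibrium) weights
    P, q  : view pick matrix and view returns
    Omega is set proportional to the view variances, a common simplification.
    """
    pi = delta * Sigma @ w_mkt                       # implied equilibrium returns
    Omega = np.diag(np.diag(tau * P @ Sigma @ P.T))  # view uncertainty
    A = np.linalg.inv(tau * Sigma)
    B = P.T @ np.linalg.inv(Omega)
    # posterior: [ (tau Sigma)^-1 + P' Omega^-1 P ]^-1 [ (tau Sigma)^-1 pi + P' Omega^-1 q ]
    mu_bl = np.linalg.solve(A + B @ P, A @ pi + B @ q)
    return pi, mu_bl

Sigma = np.array([[0.040, 0.006], [0.006, 0.090]])   # hypothetical 2-asset covariance
w_mkt = np.array([0.6, 0.4])
P = np.array([[1.0, -1.0]])   # one relative view: asset 1 outperforms asset 2
q = np.array([0.02])          # ... by 2%
pi, mu_bl = bl_posterior(Sigma, w_mkt, P, q)
```

Comparing `pi` (implied equilibrium) with `mu_bl` (posterior) reproduces the naive-vs-implied comparison of the Idzorek guide on a toy scale.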
Part III: Backtesting (choose from)
1. P&L: optimal allocations × price series. If a holdout dataset was not separated, then backtest against a period in the past. Backtesting packages allow historic simulation (cross-validation over many periods or shuffled samples) and distributional analysis.
2. An insightful comparison: how your optimal allocations (with robust covariance) performed vs. 1/N. You can also compare to a Diversification Ratio portfolio.
3. Alternatively, Naive Risk Parity portfolio allocations are easily computed (refer to the Risk Budgeting Elective).
4. Which phenomena have you encountered? E.g., why would investors end up allocating to riskier (high-beta) assets? Does equal risk contribution work? Would a ‘most diversified portfolio’ kind of approach work?
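A minimal version of the optimal-vs-1/N comparison in item 2 can be sketched as follows. The returns are synthetic and the "optimal" weights are hypothetical placeholders; with real data they would come from your Part II optimisation, and transaction costs/rebalancing are ignored here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, n_assets = 250, 4
# synthetic daily returns standing in for a holdout (test) period
returns = rng.normal(0.0004, 0.01, size=(n_days, n_assets))

w_opt = np.array([0.4, 0.3, 0.2, 0.1])       # hypothetical optimal allocation
w_eq = np.full(n_assets, 1.0 / n_assets)     # 1/N benchmark

def backtest(w, rets):
    """Cumulative P&L and annualised Sharpe for fixed weights (no rebalancing costs)."""
    port = rets @ w
    pnl = np.cumprod(1 + port) - 1
    sharpe = np.sqrt(252) * port.mean() / port.std(ddof=1)
    return pnl, sharpe

pnl_opt, sr_opt = backtest(w_opt, returns)
pnl_eq, sr_eq = backtest(w_eq, returns)
```

Plotting `pnl_opt` against `pnl_eq`, and reporting the two Sharpe ratios, is the starting point of the comparative discussion; drawdown and rolling statistics extend the same portfolio return series.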
Deep Learning for Time Series
If one believes the data carries autoregressive structure, a recurrent neural network can be a successful alternative to time series regression. For this topic you will run long short-term memory (LSTM) classification of up/down moves, with features more advanced than in the ML Assignment (Exam 3). Volatility can be a feature in itself, together with an interesting addition: drift-independent volatility. One specific recurrent NN model is the LSTM; it can turn out to be one of the best possible predictive models built from features such as:
- financial ratios / advanced technical indicators / volatility estimators.
- OPTIONALLY, if you can access the data, you can enhance the prediction with (a) credit spreads – traded CDS or indices – and (b) news indicators – FactSet and RavenPack offer APIs under professional subscription or trial.
Dealing with an arbitrary sequence length is the major characteristic of the LSTM – that means you can attempt to use a frequency longer than 1D. Certain time series, such as interest rates or economic indicators, are characterised more by long memory and stationarity, and are therefore modelled with power-law autocorrelation/fractional (Hurst exponent)/Markov processes. If you attempt prediction of the 5D or 10D return for an equity, or 1W, 1M for a FF factor, the challenge is twofold. First is the increased data requirement, nearing 7-10 years to begin with. Second is isolating the autocorrelation in positive 5D/10D returns in the equity time series.
Before using RNNs, conduct exploratory data analysis (EDA) in order to create a features map (not the tickers map highlighted in the tutorial).1 You can proceed straight to autoencoding for NNs, or utilise self-organising maps (SOM)/a decision tree regressor (DTR). Multi-scatterplots presenting relationships among features are always a good idea; however, if you initially have 50-70 features (including repeated kinds with different lookbacks), you need to reduce dimensionality in the feature set. SOM analysis can be run on non-lagged features, with daily data and lookback computation periods of 5D/10D/21D/50D for features as appropriate – which immediately generates 4 columns of the same feature – and dense map areas should help you answer questions such as whether to choose the 5D or 10D EMA, or 50D volatility vs. 10D ATR. Alternatively, top-level splits of a DTR with large leaf counts should reveal which features to prioritise.
1 EDA helps dimensionality reduction via a better understanding of the relationships between features, uncovers underlying structure, and invites detection/explanation of outliers. EDA should save you from ‘brute force’ GridSearchCV runs calling an NN/RNN classifier again and again.
Part I: Feature Engineering
Please revisit ML Lab II (ANNs) for the basic discussion of feature scaling. Be careful about sklearn feature selection by F-test.
Past moving averages of the price, simple or exponentially weighted (decaying in time): SMA, EMA. Technical indicators, such as RSI, Stochastic %K, MACD, CCI, ATR, Acc/Dist. Interestingly, Bollinger Bands stand out as a good predictor. Remember to vary the lookback period 5D/10D/21D/50D, even 200D, for features as appropriate. Non-overlapping periods mean you need data over long periods.
Volume information and the Volume Weighted Average Price appear to be immediate-term signals, while we aim for prediction.
Use of features across assets is permitted, but be tactical about the design: e.g., features from a commodity price impacting an agricultural stock (but not the oil futures price on an integrated oil major), or features from a cointegrated equity pair. Explore distance metrics among features (KNN) and potential K-means clustering as yet another alternative to SOM.
OPTIONAL Interestingly, credit spreads (CDS) can be a good predictor of price direction. Think outside the box: what other securities have a ‘credit spread’ affecting their price?
OPTIONAL Historical data for financial ratios is good if you can obtain it via your own professional subscription. Other than that, the history of dividends, purchases/disposals by key stakeholders (director dealings) or by large funds, or Fama-French factor data is more readily available.
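A minimal sketch of the feature construction above, on a synthetic price series: SMA and EMA via pandas, a basic Wilder-style RSI, rolling volatility, and a next-day direction label as the target. Lookbacks (10D, 14D, 21D) are illustrative choices, and this is a starting skeleton rather than a full feature set.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
close = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, 300)))  # synthetic prices

def rsi(prices, lookback=14):
    """Basic RSI from average gains/losses over the lookback window."""
    delta = prices.diff()
    gain = delta.clip(lower=0).rolling(lookback).mean()
    loss = (-delta.clip(upper=0)).rolling(lookback).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)

features = pd.DataFrame({
    'sma_10': close.rolling(10).mean(),              # simple moving average
    'ema_10': close.ewm(span=10).mean(),             # exponentially weighted MA
    'rsi_14': rsi(close),                            # technical indicator
    'vol_21': close.pct_change().rolling(21).std(),  # rolling volatility feature
}).dropna()

# target: sign of the NEXT day's return (up/down move for classification)
target = np.sign(close.pct_change().shift(-1)).reindex(features.index)
```

Repeating the same columns for several lookbacks (5D/10D/21D/50D) generates exactly the "4 columns of the same feature" that the SOM/DTR step is meant to prune.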
Part II: Pipeline Formation (considerations)
- Your implementation is likely to be folded into some kind of ML pipeline, to allow re-use of code (e.g., on train/test data) and aggregation of the tasks. Ensemble methods present an example of such a pipeline: Bagging Classifier is an umbrella name for the process of trying several parametrisations of a specific classifier (e.g., Logistic Regression). AdaBoost over a Decision Tree Classifier is another case. However, please do not use these for the DL topic.
- Empirical work might find that RNNs/Reinforcement Learning work better WITHOUT past returns! Alternatively, if you are predicting a 5D/10D move there will be a significant autocorrelation effect – your prediction will work regardless of whether the model is good or not.
- Please limit your exploration to 2-3 assets and focus on the features, their SOM (if possible), and an LSTM classifier to make the direction prediction. If you are interested in an approach to choosing a few assets from a large set, you can adopt a kind of diversified portfolio selection (see the Portfolio topic Q&A).
- You are free to make study design choices to make the task achievable. Substitutions:
– present relationship between features with simple scatterplots (vs SOMs) or K-means clustering;
– use an MLP classifier if recurrent neural nets or the LSTM are a particular challenge;
– re-define task and predict Momentum sign (vs return sign) or direction of volatility.
Under this topic you do not re-code decision trees or the optimisation that computes NN weights and biases.
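The MLP substitution allowed above can be sketched with a scaler-plus-classifier sklearn pipeline. The feature matrix and labels here are synthetic stand-ins for your engineered indicators; the hidden-layer sizes are arbitrary illustrative choices, and the chronological (unshuffled) split is the point being demonstrated.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# synthetic feature matrix (stand-in for scaled indicators) and up/down labels
X = rng.standard_normal((600, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(600) > 0).astype(int)

# chronological split: never shuffle time series data into the test set
X_train, X_test = X[:480], X[480:]
y_train, y_test = y[:480], y[480:]

clf = make_pipeline(
    StandardScaler(),                      # feature scaling (see ML Lab II)
    MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000, random_state=0),
)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)            # out-of-sample direction accuracy
```

Swapping `MLPClassifier` for an LSTM (e.g., in tensorflow/keras) keeps the same pipeline shape; only the model step and the sequence-windowing of `X` change.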
Pairs Trading Strategy Design & Backtest
Estimation of a cointegrated relationship between prices allows one to arbitrage a mean-reverting spread. Put trade design and backtesting at the centre of the project; think about your signal generation and the backtesting of the P&L. You will have hands-on experience with regression but will not run the regression on returns. The numerical techniques are regression computation in matrix form, the Engle-Granger procedure, and statistical tests. You are encouraged to venture into (a) multivariate cointegration (the Johansen procedure) and (b) robustness checking of cointegration weights, i.e., adaptive estimation of your regression parameters with statistical filters.
The cointegrating weights that you use to enter the position form the long/short allocation that produces a mean-reverting spread. Signal generation and the suitability of that spread for trading depend on fitting it to an OU process recipe. For optimisation, comparative backtesting, rolling ratios, and other industry-level backtesting analytics, use the ready code libraries. However, a project that solely runs pre-programmed statistical tests and procedures on data is insufficient. Using VBA is not recommended for this topic due to its lack of facilities.
Signal Generation and Backtesting
- Be inventive beyond equity pairs: consider commodity futures, instruments on interest rates, and aggregated indices.
- The arb is realised by using the cointegrating coefficients β_Coint as the allocations ω. That creates a long-short portfolio that generates a mean-reverting spread. All project designs should include trading signal generation (from OU process fitting) and backtesting (drawdown plots, rolling SR, rolling betas).
- Does the cumulative P&L behave as expected for a cointegration arb trade? Does the P&L come from a few trades or many, and what is the half-life? What about the maximum drawdown and the behaviour of volatility/VaR?
- Introduce liquidity and algorithmic flow considerations (a model of order flow). Any rules on accumulating the position? What impact will the bid-ask spread and transaction costs make?
You can utilise ready multivariate cointegration (R package urca) to identify your cointegrated cases first, especially if you operate with a system such as four commodity futures (of different expiry, but over the period when all traded). Use 2-3 pairs if analysing separate pairs by EG.
Part I: ‘Learning’ and Cointegration in Pairs. Trade Design
1. Even if you work with pairs, re-code the regression estimation in matrix form – your own OLS implementation which you can re-use. Regression between stationary variables (such as the DF test regression/difference equations) has OPTIONAL model specification tests for (a) identifying the optimal lag p with AIC/BIC tests and (b) a stability check.
2. Implement the Engle-Granger procedure for each of your pairs. For Step 1, use the Augmented DF test for a unit root with lag 1. For Step 2, formulate both error-correction equations and decide which one is more significant.
3. Decide the signals: the common approach is to enter on the bounds µ_e ± Z σ_eq and exit on e_t reverting to about the level µ_e.
4. At first, assume Z = 1. Then change Z slightly upwards and downwards – compute the P&L for each case of widened and tightened bounds that give you a signal. Alternatively, run an optimisation that varies Z_opt for µ_e ± Z_opt σ_eq and either maximises the cumulative P&L or another criterion.
Be cautious of the trade-off: wider bounds might give you the highest P&L and the lowest N_trades; however, consider the risk of the cointegration breaking apart.
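Steps 1-3 above can be sketched end-to-end on a synthetic cointegrated pair: OLS in matrix form (β = (X'X)⁻¹X'y), the Engle-Granger Step 1 regression, and bound-based entry signals with Z = 1. The pair is simulated, so the true cointegrating slope (0.8) is known; the ADF test and Step 2 error-correction equations are omitted here for brevity.

```python
import numpy as np

def ols(y, X):
    """OLS in matrix form: beta = (X'X)^{-1} X'y, with residuals."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    return beta, resid

rng = np.random.default_rng(4)
n = 750                                          # ~3 years of daily prices
pA = np.cumsum(rng.normal(0, 1, n)) + 100        # random-walk price (I(1))
pB = 0.8 * pA + 5 + rng.normal(0, 0.5, n)        # cointegrated partner

beta, e = ols(pB, pA)                            # Engle-Granger Step 1 regression
mu_e, sigma_e = e.mean(), e.std(ddof=1)

Z = 1.0                                          # starting bound, to be varied
signal = np.where(e > mu_e + Z * sigma_e, -1,    # spread rich: short the spread
         np.where(e < mu_e - Z * sigma_e, 1, 0)) # spread cheap: long the spread
```

Varying `Z` and recomputing the P&L of the `signal` series implements step 4; the residual series `e` is also the input to the OU-process fit for half-life.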
Part II: Backtesting
It is your choice as a quant to decide which elements you need to argue successfully that your trading strategy (a) will not fall apart and (b) provides ‘uncorrelated return’.
- Industry backtesting practice includes splitting the data into train/test subsets. For your forward testing periods you can use the Quantopian platform to produce drawdown plots, rolling SR, and rolling beta vs. chosen factors.
Part III: Multivariate Cointegration OPTIONAL
- Your project can take another turn from the start: look into the Johansen procedure for multivariate cointegration and apply it to futures, rates, etc. Five ‘deterministic trends’ in the cointegration residual are possible but, in practice, you only need a constant inside the residual e_{t-1}.
- Interpret the Maximum Eigenvalue and Trace statistical tests, both based on the likelihood ratio principle: e.g., how did you decide the number of cointegrating relationships?
- An efficient implementation is outlined in Jang & Osaki (2001), but you might need Ch. 12 from the Zivot (2002) book. If you code the Johansen procedure, validate it using R/Matlab libraries.
Time Series Project Workshop, Cointegration Lecture and Pairs Trading tutorial are your key resources.
Credit Spread for a Basket Product
Price a fair spread for a portfolio of CDS on 5 reference names (basket CDS), as an expectation over the joint distribution of default times. The distribution is unknown analytically, so co-dependent uniform variables are sampled from a copula and then converted to default times using a marginal term structure of hazard rates (separately for each name). The copula is calibrated by estimating the appropriate default correlation (historical data of CDS spread differences is the natural candidate but poses a market noise issue). Initial results are histograms (uniformity checks) and scatter plots (co-dependence checks). The substantial result is a sensitivity analysis by repricing.
A successful project will implement sampling from both Gaussian and t copulae, and price all k-th-to-default instruments (1st to 5th). Spread convergence can require low-discrepancy sequences (e.g., Halton, Sobol) when sampling. Sensitivity analysis w.r.t. the inputs is required.
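The Gaussian-copula half of the sampling scheme can be sketched as follows. The hazard rates and the constant 0.3 default correlation are hypothetical placeholders (in the project they come from bootstrapped credit curves and your estimated correlation matrix), and flat hazards are a simplification of the full term structure u_i → τ_i conversion.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
n_names, n_sims = 5, 10000

# hypothetical flat hazard rates per name (stand-ins for bootstrapped curves)
hazard = np.array([0.01, 0.015, 0.02, 0.025, 0.03])

# illustrative default correlation matrix: constant 0.3 off-diagonal
corr = np.full((n_names, n_names), 0.3)
np.fill_diagonal(corr, 1.0)
L = np.linalg.cholesky(corr)                      # Cholesky factorisation

z = rng.standard_normal((n_sims, n_names)) @ L.T  # correlated standard normals
u = norm.cdf(z)                                   # co-dependent uniform samples
tau = -np.log(1 - u) / hazard                     # default times via flat hazards

T = 5.0
first_to_default = (tau.min(axis=1) <= T).mean()  # MC prob. of >=1 default in 5Y
```

Histograms of each column of `u` give the uniformity check, and pairwise scatter plots give the co-dependence check; the t copula variant replaces `z` with correlated Student-t draws and `norm.cdf` with the t CDF.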
Data Requirements
Two separate datasets are required, together with matching discounting curve data for each.
1. A snapshot of credit curves on a particular day. A debt issuer is likely to have a USD/EUR CDS curve – from which a term structure of hazard rates is bootstrapped and utilised to obtain exact default times, u_i → τ_i. In the absence of data, spread values for each tenor can be assumed or stripped visually from plots in the financial media. The typical credit curve is concave (positive slope), monotonically increasing over the 1Y, 2Y, ..., 5Y tenors.
2. Historical credit spread time series taken at the most liquid tenor (5Y) for each reference name. Therefore, for five names, one computes a 5 × 5 default correlation matrix. If choosing corporate names, it is much easier to compute the correlation matrix from equity returns.
Corporate credit spreads are unlikely to be in open access; they can be obtained from Bloomberg or Reuters terminals (via your firm or a colleague). For sovereign credit spreads, time series of ready-bootstrapped PD5Y were available from DB Research; however, open access varies. Explore data sources such as www.datagrapple.com and www.quandl.com.
Even if CDS5Y and PD5Y series are available at daily frequency, the co-movement of daily changes reflects market noise more than the correlation of default events, which are rare to observe. Weekly/monthly changes give a more appropriate input for default correlation; however, that entails using 2-3 years of historical data, given that we need at least 100 data points to estimate correlation with a degree of significance.
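The weekly-changes recipe above can be sketched with pandas on synthetic spread series (real series would come from your terminal data). Roughly three years of daily data resampled to weekly changes comfortably clears the 100-point threshold; rank (Spearman) correlation is used here as it is less sensitive to spread outliers than Pearson.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n_days = 750                                       # ~3 years of business days
dates = pd.bdate_range('2019-01-01', periods=n_days)

# synthetic 5Y spreads: a common credit factor plus idiosyncratic drift, in bps
common = np.cumsum(rng.normal(0, 1, n_days))
spreads = pd.DataFrame(
    {f'name_{i}': 100 + 2 * common + np.cumsum(rng.normal(0, 3, n_days))
     for i in range(5)},
    index=dates)

weekly = spreads.resample('W-FRI').last().diff().dropna()  # weekly changes
corr_weekly = weekly.corr(method='spearman')               # 5 x 5 rank correlation
```

`corr_weekly` (suitably mapped to a linear correlation if required by your copula) is the default correlation input; repeating with `spreads.diff()` directly shows how much noisier the daily-change estimate is.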
If access to historical credit spreads poses a problem, remember that the default correlation matrix can be estimated from historical equity returns or debt yields.
Model Validation
- The fair spread for a k-th-to-default basket CDS should be less than that of the (k-1)-th-to-default. Why?
- The Project Report on this topic should have a section on Risk and Sensitivity Analysis of the fair spread w.r.t.:
1. the default correlation among reference names: either stress-test with constant high/low correlation, or apply a ± percentage change in correlation from the actual estimated levels;
2. the credit quality of each individual name (change in credit spread, credit delta) as well as the recovery rate.
Make sure you discuss and compare sensitivities for all five instruments.
- Ensure that you explain the historical sampling of the default correlation matrix and the copula fit (uniformity of pseudo-samples) – that is, the Correlations Experiment and the Distribution Fitting Experiment as will be described at the Project Workshop. Use histograms.
Copula, CDF and Tails for Market Risk
A practical tutorial on using copulas to generate correlated samples is available at: https://www.mathworks.com/help/stats/copulas-generate-correlated-samples.html
Semi-parametric CDF fitting gives us percentile values by fitting both the middle and the tails. A Generalised Pareto Distribution is applied to model the tails, while the CDF interior is Gaussian kernel-smoothed. The approach comes from Extreme Value Theory, which suggests a correction to the empirical (kernel-fitted) CDF because of tail exceedances.
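The tail half of that semi-parametric construction can be sketched with scipy: fit a GPD to losses beyond a 5% threshold and splice the EVT tail formula onto the empirical CDF. The heavy-tailed synthetic returns, the 5% threshold, and the omission of the kernel-smoothed interior are all simplifying assumptions of this sketch.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(8)
returns = rng.standard_t(df=4, size=5000) * 0.01   # heavy-tailed synthetic returns

u = np.quantile(returns, 0.05)                     # lower-tail threshold (5%)
exceed = u - returns[returns < u]                  # positive excesses beyond u
xi, _, beta = genpareto.fit(exceed, floc=0)        # GPD shape xi and scale beta

def tail_cdf(x):
    """P(R <= x) for x below the threshold u, via the EVT tail formula:
    F(x) = F_emp(u) * GPD_survival(u - x)."""
    return 0.05 * genpareto.sf(u - x, xi, loc=0, scale=beta)

var_99 = -np.quantile(returns, 0.01)  # empirical 99% VaR, for comparison
```

Inverting `tail_cdf` at 1% gives the EVT-corrected VaR to set against `var_99`; the interior of the CDF would be filled in with a Gaussian kernel fit as described above.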
Relevant Additional Sources 2021-Jun
The list puts together some initial sources – you will find more topical resources inside Additional Material.zip, provided for each topic within the relevant Project Workshop I or II, or within the anchor core lecture, such as the one on Neural Nets. Please do not email tutors for a copy.
Reading List: Credit Portfolio
- Very likely you will revisit the CDO & Copula Lecture material, particularly slides 48-52, which illustrate elliptical copula densities and discuss Cholesky factorisation.
- The sampling-from-copula algorithm is in the relevant Workshop and in the Monte Carlo Methods in Finance textbook by Peter Jäckel (2002) – see Chapter 5.
- Rank correlation coefficients are introduced in the Correlation Sensitivity Lecture and in P. Jäckel (2002) as well. The CR topic Q&A document gives clarified formulae and explanations.
Reading List: Portfolio Construction
- CQF Lecture on Fundamentals of Optimization and Application to Portfolio Selection
- A Step-by-Step Guide to the Black-Litterman Model by Thomas Idzorek (2002) covers the basics of what you need to implement.
- The Black-Litterman Approach: Original Model and Extensions, Attilio Meucci, 2010. http://ssrn.com/abstract=1117574
- On LW nonlinear shrinkage / Marcenko-Pastur denoising – either method to make a covariance matrix robust – resources and certain code are provided with the relevant Workshop and Tutorial.
Reading List: Cointegrated Pairs
- Modeling Financial Time Series, E. Zivot & J. Wang, 2002 – one recommended textbook; we distribute Chapter 12 on Cointegration.
- Instead of a long econometrics textbook, read Explaining Cointegration Analysis: Parts I and II by David Hendry and Katarina Juselius (2000, 2001), The Energy Journal.
- The appendices of the following work explain the key econometric and OU process maths: Learning and Trusting Cointegration in Statistical Arbitrage by Richard Diamond, WILMOTT. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2220092