My Quantum Endeavors in 2023 (paper implementations, further interests to pursue in 2024)
I just realized that I am writing a blog after a whole year. Though I haven’t written anything for such a long time, I did do a few things in the Quantum Computing field in 2023. I always feel that I should do a lot more, especially when I see so many things happening in this emerging tech. If you happen to read my blogs and would like to connect with me regarding any ideas you might have, please get in touch. I will surely be interested.
In Jan, a friend asked me if we could try something of our own in Quantum Finance. I liked the idea and we started actively thinking about what we could do. At that point in time, I had come across and even read a paper titled QAOA-in-QAOA: solving large-scale MaxCut problems on small quantum machines. I came across this paper because someone in the quantum community had a question related to it. This shows how useful it is to be part of a community and how it can keep one motivated, especially when one doesn’t actively work in the field. The Quantum Computing community is extremely welcoming.
So we decided that we would do large-scale portfolio optimization using an approach similar to the one given in the paper I linked above. The paper specifically focuses on MaxCut, so it had to be tweaked a little to cater to portfolio optimization, but the approach is quite similar. Later in March 2023, we came across another paper for time series predictions, titled Optimizing quantum noise-induced reservoir computing for nonlinear and chaotic time series prediction. I just realized while writing this blog that this paper has undergone some changes and the latest revised version was published in November 2023. Based on a very brief read of the Nov 2023 version, I don’t see any major difference from the approach given in the March version. Nevertheless, I will go through the latest version more intently to check the differences.
This blog is primarily about our understanding and implementations of these two papers using Qiskit. It is important to note that Qiskit undergoes massive changes every few months, and one of the biggest is the release of Qiskit 1.0 in February 2024. So changes might have to be made to keep the code compatible with later versions of Qiskit.
Before I deep dive into these 2 papers, I want to mention a few more things I read about in 2023 and my topics of interest for 2024. One thing I really loved learning about was Tensor Networks, with a focus on Matrix Product States (MPS). It is a vast area and I still need to read and understand a lot more. I might write about this next year. The other thing I went deep into was Quantum Error Correction, specifically topics like 2D surface codes, 3D surface codes, transversal gates and toric codes. This is again a very vast area. I started reading more about Quantum Error Correction after QuEra recently released their results from a bunch of experiments. I also read several articles and news items about Post Quantum Cryptography (PQC). Cryptography is one of my favorite topics because it takes me back to my grad days, when Number Theory was one of my favorite subjects. In 2024, I specifically want to read and do more on PQC, Quantum Error Correction, Quantum Error Mitigation and maybe Quantum Benchmarking, apart from the topics I learn when I participate in Quantum events.
Coming back to the two papers, our idea was to build a large-scale portfolio optimization and also perform time series predictions for the financial instruments that got selected as a result of the optimization. For optimization, we decided to go with the QAOA-in-QAOA approach, and for time series predictions, we decided to use Quantum Reservoir Computing (links to the papers for both approaches are given above).
Briefly, the QAOA-in-QAOA approach for MaxCut problems is given below:
The first step is to divide the graph (in our case, we split our portfolio into different sub-portfolios) and perform QAOA on each sub-graph.
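For illustration, here is a minimal sketch of the per-sub-graph QAOA step, assuming the pre-1.0 Qiskit module paths we used in 2023 (the random sub-graph and the `reps` value are just examples, not the settings from our runs):

```python
import networkx as nx
from qiskit.algorithms.minimum_eigensolvers import QAOA
from qiskit.algorithms.optimizers import COBYLA
from qiskit.primitives import Sampler
from qiskit_optimization.applications import Maxcut
from qiskit_optimization.algorithms import MinimumEigenOptimizer

# One 10-node sub-graph of the partitioned problem graph (illustrative)
subgraph = nx.random_regular_graph(3, 10, seed=1)

# Cast MaxCut on the sub-graph as a quadratic program and solve with QAOA
qp = Maxcut(subgraph).to_quadratic_program()
qaoa = QAOA(Sampler(), COBYLA(), reps=2)
result = MinimumEigenOptimizer(qaoa).solve(qp)
local_solution = result.x  # one bit (class) per node of the sub-graph
```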
The merge process:
1. Each sub-graph is treated as a single node. In the example in the paper, the original graph has 100 nodes and is partitioned into 10 sub-graphs, each with 10 nodes. In the merge part of the algorithm, each sub-graph is taken as 1 node, so we have 10 nodes in the merged graph.
2. To compensate for not taking every individual node separately, as in the original graph, the graph from point 1 above is a weighted graph. Here is how the weight is calculated between any 2 sub-graphs subg_1 and subg_2 (see the sketch after this list):
- for every vertex i in subg_1 and every vertex j in subg_2, 2 types of weights are incremented:
- w_pos is incremented if i and j are not in the same class (+1 or -1, represented as 0 or 1 in bits) and have an edge between them in the original graph.
- w_neg is incremented if i and j are in the same class and have an edge between them in the original graph.
- the cumulative edge weight between subg_1 and subg_2 is taken as the difference w_pos - w_neg. This is the edge weight used in the adjacency matrix of the merged graph.
- these edge weights are used as the coefficients when framing the Problem Hamiltonian for QAOA.
3. After QAOA on the merged graph, we get a solution with 10 bits, because we are dealing with 10 nodes where each node is a sub-graph. This is what the paper refers to as the Global solution. The solutions of the sub-graphs are called Local solutions.
4. But we would like to interpret the solution in terms of the 100 nodes of the original graph. This is done by combining the solutions of the sub-graphs in the following way:
- if the bit value corresponding to a sub-graph in the solution of the merged graph is 1, then the solution of that sub-graph is taken as it is into the overall global solution;
- if it is 0, its complement is taken, i.e. if a node is 0 in the QAOA solution of the sub-graph, it becomes 1 in the global solution, and vice versa.
This can be seen in the figure from the paper: look closely at the orange and green nodes before and after arriving at the solution of the merged graph.
5. It is to be noted that because the default policy for partitioning the graph is random, one gets slightly different solutions each time one runs the code.
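Here is a minimal sketch of the merge-weight calculation (point 2) and the lifting of the merged solution back to the original nodes (point 4). The function names are mine, and I am assuming the original graph is a networkx graph and each sub-graph's QAOA result is stored as a dict mapping node to bit:

```python
def merged_edge_weight(graph, subg_1, subg_2, local_solution):
    """Edge weight between two sub-graphs in the merged graph: w_pos - w_neg."""
    w_pos, w_neg = 0, 0
    for i in subg_1:
        for j in subg_2:
            if graph.has_edge(i, j):
                if local_solution[i] != local_solution[j]:
                    w_pos += 1  # i and j are in different classes
                else:
                    w_neg += 1  # i and j are in the same class
    return w_pos - w_neg

def global_solution(subgraphs, local_solution, merged_solution):
    """Lift the merged-graph QAOA solution back to the original nodes."""
    solution = {}
    for k, nodes in enumerate(subgraphs):
        for node in nodes:
            bit = local_solution[node]
            # keep the local bit if the sub-graph's merged bit is 1, else flip it
            solution[node] = bit if merged_solution[k] == 1 else 1 - bit
    return solution
```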
Now, for portfolio optimization, we take each time series as a node. The objective function uses the mean of each time series and the covariance over the entire set of time series data. To do something similar to the approach given in the QAOA-in-QAOA paper (also described above), we first decide on the number of qubits we can use. Say we want to use hardware that has 5 qubits. We then split the entire dataset into subsets of 5 time series each and perform QAOA on each of these subsets.

Next, we replicate the merging process described above. For portfolio optimization, our approach to merging treats each subset of 5 time series as a single time series, i.e. a single node. We decided to do this by taking a day-wise average of the 5 time series, which results in a single time series representing the entire subset. For this new set of merged time series, we again perform QAOA on newer subsets of 5 time series each, and the same merging process follows until the last QAOA step is run on exactly 1 set of 5 time series. The final interpretation at the level of the original time series is carried out in the same way as described above in the MaxCut case. You can see our implementation here. The results seemed good and comparable to the results from a classical approach. We used the most accessible data, from Yahoo Finance. Some uncertainty in portfolio optimization also arises from the different risk appetites of different investors.
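A hedged sketch of the split-and-merge bookkeeping for the time series, assuming the price data sits in a pandas DataFrame with one column per instrument (the subset size of 5 matches the 5-qubit example above; function names are mine):

```python
import pandas as pd

def split_into_subsets(prices: pd.DataFrame, size: int = 5):
    """Split the instruments into subsets of `size` time series each."""
    cols = list(prices.columns)
    return [prices[cols[k:k + size]] for k in range(0, len(cols), size)]

def merge_subset(subset: pd.DataFrame) -> pd.Series:
    """Day-wise average of a subset's series -> one representative series."""
    return subset.mean(axis=1)

# Each subset gets its own QAOA run; the merged series then act as the
# instruments for the next round, until one subset of 5 remains.
```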
Now, let’s get into the other paper, on time series prediction. The authors describe Quantum Reservoir Computing in this paper. I actually wasn’t aware of the classical Reservoir Computing algorithm either, so this was a good learning experience from both the classical and the quantum standpoint. Unfortunately, after we implemented this, the results weren’t quite promising, so I kept wondering if there is some loophole in my understanding. I also cannot find my code implementation, since we stopped this endeavor in June 2023. I am planning to revisit this paper and the implementation next year. However, below is our understanding of the paper.
I would first like to talk about the classical Reservoir Computing algorithm:
We have a multivariate time series u(t) of dimension M, where t = 1, 2, 3, ..., T (discrete time steps). We want to predict the time series at time step T+1.
Steps:
- We want to map the M-dimensional time series into a high dimensional space. So we create a weight matrix (Win) of dimension N x M, where N is the dimension we want to map u(t) to, with N > M. We do a matrix multiplication between Win and u(t). This Win is called the reservoir coupler.
- Next, we also need a reservoir; let’s call that W. This is a square matrix of dimension N x N. You can think of it as having N nodes in a neural network, but the best part is that we are not going to train a neural network here. W is similar to an adjacency matrix, where we basically define the connections between the N nodes.
- For both W and Win, the values are drawn randomly from a uniform distribution, but certain conditions give good results. For W, it is important that it is sparse to some extent and that its spectral radius is < 1, as explained in this kaggle notebook: https://www.kaggle.com/code/utahrandall/reservoir-computing-and-generalized-learning/notebook
- Further, here’s how we train and predict the time series: the reservoir state is updated as x(t) = tanh(W · x(t-1) + Win · u(t)), and the prediction is v(t) = Wout · x(t), where the readout weights Wout are obtained by linear regression of the collected reservoir states against the next-step values of the series.
- In the above, v is the prediction for the next step, but it assumes we have the input u(t) for the previous step. So when we want to make predictions further out, we use the predicted output from the previous step as the u(t) in the equation.
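Below is a minimal numpy sketch of the classical algorithm under the conditions above. The sparsity level, scaling factors, and the toy sine series are illustrative choices of mine, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(42)
M, N = 1, 100                                # input and reservoir dimensions
Win = rng.uniform(-0.5, 0.5, (N, M))         # reservoir coupler
W = rng.uniform(-0.5, 0.5, (N, N))           # reservoir
W[rng.random((N, N)) > 0.1] = 0.0            # keep W sparse
W *= 0.95 / np.max(np.abs(np.linalg.eigvals(W)))  # enforce spectral radius < 1

def collect_states(u):
    """Run the reservoir update x(t) = tanh(W·x(t-1) + Win·u(t))."""
    x, states = np.zeros(N), []
    for u_t in u:
        x = np.tanh(W @ x + Win @ np.atleast_1d(u_t))
        states.append(x.copy())
    return np.array(states)

u = np.sin(np.linspace(0, 20, 500))          # toy univariate series
X = collect_states(u[:-1])
# Train the readout by least squares so that Wout·x(t) approximates u(t+1)
Wout, *_ = np.linalg.lstsq(X, u[1:], rcond=None)
one_step_predictions = X @ Wout
```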
Now, let’s talk about the Quantum analog of the above approach as described in the paper. Since I do not have my code implementation, I will provide references from the Qiskit documentation that we used while implementing. As mentioned earlier, it is worth noting that Qiskit will undergo several changes in the coming months, so if you visit this blog later, you might find that the links do not exist or point to deprecated documentation. Anyway, let me start now:
- In the Quantum analog, the multivariate regression for training and prediction remains the same as in the classical approach. The only difference is that Win and W from above are arrived at using quantum circuits. We will be working with a univariate time series here.
- For a window of N time steps, we need N quantum circuits, each with, say, n qubits. Below is how we construct the circuits (a code sketch follows this list):
● In each circuit, initialize in uniform superposition (with Hadamards on all qubits).
● For the 1st time step in our training window, we use the 1st circuit, where we apply RX(theta) and RZZ(theta), with theta being some scaled version of u(i) (our window is u(i) to u(i+N), and we want to predict from u(i+N+1) onwards). RZZ(theta) uses the pairwise entanglement scheme; the authors tried other entanglement schemes like linear and found pairwise to be the best.
● For the 2nd time step in our training window, we do the same and in addition repeat RX(theta2) and RZZ(theta2), with theta2 being the scaled version of u(i+1).
● We would repeat the previous 2 steps till u(i+N).
● Further, we add noise channels in a probabilistic manner. The authors settled on the reset noise channel after trying other types of noise channels: https://qiskit.org/documentation/stubs/qiskit_aer.noise.reset_error.html. They also mention that with Dynamic circuits, one can easily perform resets by doing a mid-circuit measurement and applying a classically controlled X gate. I think they just mentioned that and did not actually use Dynamic circuits.
● The probabilities we set for the noise channels are like hyperparameters and need to be tuned, which the authors did using Scipy’s dual annealing: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.dual_annealing.html
● Further, we calculate the expectation value of Z for every qubit in all the N circuits, i.e. ⟨Z_j⟩ = Tr(Z · ρ_j), where ρ_j is the reduced density matrix of qubit j. Our understanding is that we need to get the reduced density matrices and then find the trace: https://qiskit.org/documentation/stubs/qiskit.quantum_info.partial_trace.html.
● The authors use the density matrix simulator: ‘aer_simulator_density_matrix’.
● Once we get expectation values for all qubits in each circuit, we end up with an N x n matrix of expectation values. For example, if n=3 and N=5, the 1st circuit’s expectation values would be [z11, z12, z13] (expectation values for qubits 1, 2 and 3 in the 1st circuit, i.e. the 1st time step). Similarly, we would have [z21, z22, z23] for the 2nd time step, and so on.
● The expectation values matrix from the above step and the actual time series values are used for the multivariate linear regression.
● For further predictions, we use the previous predicted value: we create a new circuit with additional RX and RZZ layers for the angle corresponding to this predicted value, get the expectation values for all qubits, and predict with the weights we got as a result of the linear regression. The predicted values then have to be scaled back.
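Since I no longer have the original code, here is a hedged reconstruction of the circuit construction and expectation step in Qiskit (pre-1.0 APIs). The qubit count, theta scaling, and reset probability are illustrative placeholders, and the dual-annealing tuning of the reset probability is left out:

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Pauli, partial_trace
from qiskit_aer import AerSimulator
from qiskit_aer.noise import NoiseModel, reset_error

n = 3  # qubits per circuit (illustrative)

def reservoir_circuit(window):
    """Circuit for one time step: RX/RZZ layers for every value seen so far."""
    qc = QuantumCircuit(n)
    qc.h(range(n))                       # uniform superposition
    for u_t in window:
        theta = np.pi * u_t              # illustrative scaling of u into angles
        for q in range(n):
            qc.rx(theta, q)
        for a in range(n):               # pairwise RZZ entanglement
            for b in range(a + 1, n):
                qc.rzz(theta, a, b)
    qc.save_density_matrix()
    return qc

# Attach a reset error to every RX gate, with a tunable probability
noise = NoiseModel()
noise.add_all_qubit_quantum_error(reset_error(0.02), ["rx"])
sim = AerSimulator(method="density_matrix", noise_model=noise)

def z_expectations(window):
    """<Z_j> = Tr(Z·rho_j) for each qubit j, via reduced density matrices."""
    rho = sim.run(reservoir_circuit(window)).result().data()["density_matrix"]
    return [partial_trace(rho, [k for k in range(n) if k != j])
            .expectation_value(Pauli("Z")).real for j in range(n)]

# Row t of the regression matrix would be z_expectations(u[i : i + t + 1]).
```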
With this, I believe I can end the blog. As mentioned before, I will revisit this paper and implement it again next year.
I would love to receive any feedback on this blog. Hope we all have a great year ahead!