The following is the full Q&A for the interview from Phys.org.

**1. Could you explain to our readers how your recent paper published in Nature Physics came about, what main ideas or theories it was based on, and what its main objectives were?**

This work started from my intellectual pursuit of trying to understand how machines can perceive, manipulate, and process quantum systems and quantum information.

During my undergraduate studies, my research centered on statistical machine learning and deep learning. A central basis for the current machine learning era is the ability to utilize highly parallelized hardware, such as graphics processing units (GPUs) or tensor processing units (TPUs). It is natural to wonder how an even more powerful learning machine capable of harnessing quantum-mechanical processes could emerge in the far future. This has been my aspiration since I started my Ph.D. at Caltech.

In order to approach such an ambitious future on firm ground, the first step is to understand how machines can perceive, manipulate, and process quantum systems and quantum information. The standard technique, known as quantum state tomography, learns the entire description of a quantum system, which requires an exponential number of measurements, exponential memory, and exponential time. This makes machines unable to perceive quantum systems with more than tens of qubits. Recently, neural network approaches have been proposed and have demonstrated surprisingly strong empirical performance in several cases, but they lack a clear understanding of when they would work or fail.

To build a rigorous foundation for how machines can perceive quantum systems, we combined my previous knowledge of statistical learning theory with Richard Kueng and John Preskill’s expertise in a beautiful mathematical theory known as unitary t-design. Statistical learning theory underlies how a machine can learn an approximate model of how the world behaves. Unitary t-design is a mathematical theory that underlies how quantum information scrambles, which is central to understanding quantum many-body chaos, in particular quantum black holes. Together, these yield a rigorous and efficient procedure for a classical machine to construct an approximate classical description of a quantum many-body system. This classical description allows accurate prediction of many properties of the quantum system from only a minimal number of quantum measurements.

**2. In relatively simple terms, could you explain how the method for constructing an approximate classical description of a quantum state works, outlining its key advantages/unique characteristics? **

To construct an approximate classical description of the quantum state, we perform a randomized measurement procedure given as follows. We sample a few random quantum evolutions that would be applied to the unknown quantum many-body system. These random quantum evolutions are typically chaotic and would scramble the quantum information stored in the quantum system. This random quantum evolution is where the connection to the mathematical theory on unitary t-design used in the study of quantum many-body chaos, such as quantum black holes, comes about. Then we look at each of the randomly scrambled quantum systems with a measurement apparatus, which would result in a wave function collapse that turns the quantum system into a classical system. The data characterizing the random quantum evolutions and the resulting classical systems after the measurement are combined to form an approximate classical description of the quantum system.

Intuitively, one could think of this procedure as follows. We have an exponentially high-dimensional object, the quantum many-body system, that is very hard for a classical machine to grasp. We perform several random projections of this extremely high-dimensional object into a much lower-dimensional space through the use of random/chaotic quantum evolution. The set of random projections provides a rough picture of what this exponentially high-dimensional object looks like. And the classical representation allows us to predict various properties of the quantum many-body system.
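
The randomized-measurement idea above can be illustrated with a toy single-qubit sketch, assuming only numpy (the state, observable, and sample count here are illustrative choices, not the paper's actual experiments): measure an unknown state in a random Pauli basis, invert the averaged measurement channel, and average the resulting classical "snapshots".

```python
import numpy as np

rng = np.random.default_rng(7)

I2 = np.eye(2)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)          # rotates X basis to Z
HSdg = np.array([[1, -1j], [1, 1j]]) / np.sqrt(2)     # rotates Y basis to Z
Z = np.diag([1.0, -1.0])
BASES = [H, HSdg, I2]                                 # measure X, Y, or Z

rho = np.array([[1.0, 0.0], [0.0, 0.0]])              # unknown state |0><0|, so <Z> = 1

def snapshot(rho, rng):
    """One randomized measurement: scramble, collapse, invert the channel."""
    U = BASES[rng.integers(3)]
    probs = np.real(np.diag(U @ rho @ U.conj().T))    # Born-rule outcome probabilities
    b = rng.choice(2, p=probs / probs.sum())
    ket = U.conj().T[:, b].reshape(2, 1)              # post-measurement state U†|b>
    return 3 * (ket @ ket.conj().T) - I2              # inverse of the measurement channel

estimates = [np.real(np.trace(Z @ snapshot(rho, rng))) for _ in range(3000)]
print(np.mean(estimates))  # ≈ 1.0, the true <Z>
```

The same stored snapshots can be reused to estimate many different observables, which is the key point: the measurement data is collected once, and properties are inferred afterwards by classical post-processing.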

Building on statistical learning theory and the theory of quantum information scrambling, we prove a surprising statement: this procedure can accurately predict M properties of the quantum system from only log(M) measurements. For example, by measuring the quantum system a linear number of times, we can predict an exponentially large number of properties of the quantum system.

On the other hand, the traditional understanding is that when we want to measure M properties, we have to measure the quantum system M times. This is because after we measure one property of the quantum system, the quantum system collapses and becomes classical. After the quantum system has turned classical, we cannot measure other properties with the resulting classical system. Our approach avoids this by performing randomly generated measurements and inferring the desired properties by combining the measurement data.

This provides a rigorous understanding of some of the surprisingly strong performance seen in recent machine learning approaches. Furthermore, building on this theoretical foundation, the proposed method can be orders of magnitude faster than existing machine learning approaches, and it can predict various properties of the quantum many-body system more accurately than existing, highly specialized quantum information techniques.

**3. What do you feel are the most meaningful achievements of your study, and what insight do these bring to the Physics field? **

Our study rigorously shows that there is much more information hidden in the data obtained from quantum measurements than we originally expected. By suitably combining these data, we can infer this hidden information and gain significantly more knowledge about the quantum system. This implies the importance of data science techniques for the development of quantum technology. Furthermore, we show that to fully utilize the power of machine learning, an understanding of the strange behavior intrinsic to quantum physics is decisive. The direct application of standard machine learning techniques may be only partially successful; the full potential becomes evident only when we combine the mathematics underlying machine learning and quantum physics in an organic fashion.

**4. What are your plans for future research? **

With a rigorous foundation for perceiving quantum systems with classical machines, my personal plan is to take the next step towards creating a learning machine capable of manipulating and harnessing quantum-mechanical processes. In particular, we want to provide a solid understanding of how machines could learn to solve quantum many-body problems, such as classifying quantum phases of matter or finding quantum many-body ground states. The ability to construct efficient classical representations of quantum systems opens a gateway for classical machine learning to tackle these challenging quantum many-body problems on a rigorous footing. However, perceiving is not enough for solving these quantum problems. The machines would also have to learn to simulate certain computations to succeed. This would be a further synthesis of the mathematics underlying machine learning and quantum physics.

At the same time, we are also working on refining and developing new tools for inferring hidden information from the data collected by quantum experimentalists. The physical limitations of actual systems provide interesting challenges for developing more advanced techniques. This would further allow experimentalists to see what they originally could not and help advance the current state of quantum technology.

__Publication:__

Predicting many properties of a quantum system from very few measurements (Nature Physics 2020).

https://www.nature.com/articles/s41567-020-0932-7

While quantum mechanics can accurately describe our universe, the equations governing quantum systems are too complicated to solve through human analysis alone. To apply this great theory to real-world scenarios, such as investigating electronic structure or predicting the behavior of complex systems at the atomic scale, we have to resort to computational approaches. In this project, I studied and implemented two well-known approaches for solving for the properties of quantum systems: **the Fourier grid Hamiltonian method** and **the Quantum Monte Carlo method**.

The Fourier grid Hamiltonian (FGH) method elegantly discretizes the problem, through a momentum-space representation, into a finite eigenvalue problem. In my implementation, the eigenvalue decomposition is solved using the Eigen library. The artistic figures above are electron excited states computed using this method. The lowest 150 excited states for the 2D simple harmonic oscillator and for a special potential can be found here and here on Dropbox.
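
The essence of the method can be sketched in a few lines of numpy for the 1D harmonic oscillator (a minimal illustration, not my Eigen/C++ implementation; units with hbar = m = omega = 1 are assumed): build the kinetic operator in momentum space via a unitary DFT matrix, add the potential on the position grid, and diagonalize.

```python
import numpy as np

N, L = 256, 20.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)    # position grid
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)           # grid momenta

F = np.fft.fft(np.eye(N), axis=0) / np.sqrt(N)       # unitary DFT matrix
T = F.conj().T @ np.diag(0.5 * k**2) @ F             # kinetic energy, diagonal in k-space
V = np.diag(0.5 * x**2)                              # harmonic potential on the grid

energies = np.linalg.eigvalsh(T + V)
print(energies[:3])  # ≈ [0.5, 1.5, 2.5], the exact levels n + 1/2
```

Because the grid representation is spectrally accurate for smooth potentials, even this modest grid reproduces the low-lying spectrum essentially to machine precision.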

The Quantum Monte Carlo (QMC) method is an interesting algorithm that can efficiently solve for the ground-state properties of any multi-boson system. The idea is to propagate in imaginary time through Feynman's path integral formulation, viewing the imaginary-time propagation as a birth-death process with randomly walking replicas. A visualization of this method is available on YouTube. I have also implemented some real-world scenarios to demonstrate the effectiveness of this method.
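
The birth-death picture can be sketched as a toy diffusion Monte Carlo run for the 1D harmonic oscillator (a hedged, minimal sketch; the population size, time step, and feedback constant are illustrative choices): walkers diffuse in imaginary time, the potential acts as a birth-death rate, and a feedback term keeps the population near its target size.

```python
import numpy as np

rng = np.random.default_rng(0)
V = lambda x: 0.5 * x**2                   # harmonic potential, exact E0 = 0.5
n_target, dt, n_steps = 2000, 0.01, 4000

x = rng.normal(size=n_target)              # initial walker positions
e_ref = np.mean(V(x))                      # reference energy
history = []
for step in range(n_steps):
    x = x + np.sqrt(dt) * rng.normal(size=x.size)      # free diffusion step
    w = np.exp(-(V(x) - e_ref) * dt)                   # branching weights
    copies = (w + rng.random(x.size)).astype(int)      # stochastic rounding
    x = np.repeat(x, copies)                           # birth-death of replicas
    e_ref = np.mean(V(x)) + (1.0 - x.size / n_target)  # population-control feedback
    history.append(e_ref)

e0 = np.mean(history[n_steps // 2:])       # average after burn-in
print(e0)  # ≈ 0.5
```

The reference energy settles near the true ground-state energy because walkers in high-potential regions die off while those near the minimum multiply.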

Over four undergraduate years, I was fully immersed in machine learning research. From statistical learning theory and recommender system optimization to deep learning applications in natural language processing, I have touched a wide range of flavors of this field. It has been a fun ride, and it would be equally great to continue in this line of research. However, after a long period of introspection, I decided not to continue in mainstream machine learning research. This meant declining the well-regarded CMU machine learning department for my Ph.D. study.

In the past one or two years, many people have become interested in machine learning, and a large portion of them have jumped into the field. I believe one of the biggest reasons is that the entry bar for deep learning has become relatively low: no advanced math or brilliant algorithmic design is needed to do cutting-edge research. I see this as a sign of maturity in mainstream machine learning research (i.e., deep learning). This is great, as machine learning can now have the maximum possible impact. However, I do not like participating in a mature and popular field.

- As the field matures, there is not much room left for revolutionary improvement, and I would be a contributor rather than a pathfinder.
- Since many people are interested in doing machine learning (deep learning) research, there is no urgent need for me to help the field grow.

These two reasons leave me feeling less excited and motivated about the research I would be conducting in the field of deep learning. A similar opinion was also expressed by the highly respected Caltech alumnus Donald Knuth as his single piece of advice for young people.

Most people do not venture into problems that will not work instantly. But I believe true scientists should tackle these visionary and potentially revolutionary problems. The process would be arduous: a problem that does not work instantly may never work, papers would be harder to publish, and so on. But it is extremely crucial for people to tackle these futuristic mysteries. If the problems tackled by humanity degenerate into the subspace of problems that have a direct use, then we will not be able to accumulate the knowledge that would lead us to a glorious and bright future.

A new paradigm of computing, quantum computing, has recently started to emerge. Governments, including those of China and the European Union, have invested a decent amount of money in this field of research (e.g., see the Quantum Manifesto by Europe). I believe it also opens a whole new world for data analysis and machine learning. Extrapolating from history, we can see that mainstream AI techniques depend heavily on the underlying computing paradigm.

- Bulky and slow single-core machine: Expert system (1980 to 1990).
- Personal single-core CPU: Statistical machine learning (2000 to 2010).
- General-purpose GPU: Deep learning (2010 to now).
- Special-purpose small quantum computers: ?
- General-purpose large quantum computers: ?

As a result, I believe there is a lot of potential at the intersection of quantum technology and artificial intelligence. However, this is a field that not many computer science people are looking into.

- Most computer science people are absorbed in their own field, since currently people in this field can gain a lot of profit compared to people doing quantum technology.
- Most computer science people lack the interest and knowledge to conduct quantum technology research, since it is relatively obscure.

On the other hand, I think it is distinctive for me, a person with a background in both computer science and physics, to jump into this field. While I have a physicist’s heart that craves theoretical beauty, I also have an engineer’s heart that cares deeply about real-world practicality. Of course, I have never conducted quantum computing research, so I know doing such research will be extremely hard for me in the first few years. I will have to catch up with existing knowledge and get used to the way research is done in an unfamiliar field. However, I believe that through this challenge, I will grow my abilities to a whole new level.

Here are just some of the broad questions that I hope to explore during my Ph.D. study.

- What kind of machine learning paradigm would be more suitable for quantum computers? I don’t believe directly applying deep learning to quantum computers is the right thing to do.
- What kind of new capability can be achieved on quantum computers? Besides some well-known examples such as cracking RSA or quantum system simulation.
- How to transform exciting results in theoretical quantum information research into a new application in quantum technology?
- How can machine learning be used to improve the design of quantum computers?

During my Ph.D. study at Caltech, I want to challenge myself and venture into this dark forest to retrieve something precious for the advance of human technology.

**This work was done during an internship at Microsoft AI+Research, Redmond, USA.**

The goal is for machines to **read and understand** an arbitrarily given context (such as an article from Wikipedia) and **answer questions** about that context. To achieve this, we need to give the machine the ability to think about the question while reading the context. This is performed through **attention**: attending to the relevant parts of the question as we read through the context. A key concept in this work is that when humans focus our attention, we take into account different levels of meaning. Sometimes we look for exact details: did this event happen in 2016 or 2017? Sometimes we think more abstractly and treat 2016, 2017, -37, and 3.14159 all as just numbers.
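
The basic attention operation can be sketched in numpy (a minimal illustration of standard dot-product attention, not the fully-aware variant or the FusionNet architecture; the toy vector sizes are arbitrary): each context word scores every question word, the scores are normalized with a softmax, and the question vectors are averaged accordingly.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
context = rng.normal(size=(5, d))    # 5 context word vectors (toy data)
question = rng.normal(size=(3, d))   # 3 question word vectors (toy data)

scores = context @ question.T / np.sqrt(d)                   # similarity scores
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)                # row-wise softmax
attended = weights @ question                                # question-aware vectors

print(attended.shape)  # (5, 8): one summary vector per context word
```

Fully-aware attention extends this picture by scoring words using representations from multiple layers of meaning at once, rather than a single vector per word.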

Motivated by this concept, we propose a lightweight enhancement of attention, **fully-aware attention**, and an end-to-end neural architecture, **FusionNet**, as illustrated above. Replacing standard attention with fully-aware attention significantly improves performance. At the end of my internship (Sep 20th, 2017), we achieved a **new state of the art** on the Stanford Question Answering Dataset (SQuAD). Furthermore, we improved the previous best-reported number by **+5%** on an adversarial dataset for machine comprehension, and the method also shows a decent improvement when applied to a natural language inference task.

__Publication:__

**H.-Y. Huang**, C. Zhu, Y. Shen, W. Chen. FusionNet: Fusing via Fully-aware Attention with Application to Machine Comprehension. Sixth International Conference on Learning Representations (ICLR 2018).

The goal of a recommendation system is to create an algorithm capable of learning to give accurate recommendations to users, such as music recommendations on iTunes or movie recommendations on Netflix. Most traditional approaches exploit **user-item ratings** (e.g., 1 to 5 stars) to learn good recommendations. But in practice, people don't often leave a rating after watching a movie, making the traditional approach less useful. In this work, we assume a more natural setting where only **boolean ratings** are present (e.g., user A watched item B or not), and develop effective algorithms to give accurate recommendations.

The scenario is illustrated by the figure above. One approach to this problem is to learn a **latent vector** for each user and each item, where the inner product of the latent vectors characterizes the inclination of the user toward the item.
Therefore, to accurately **predict human preference**, we have to find the best latent representation for the users and the items.
In this work, we propose a general framework for incorporating feature information (e.g., the age and nationality of users or the characteristics of the items) and graph information (e.g., Facebook friendship relations among the users) to learn a better representation.
Furthermore, we investigate the use of **general convex losses** (where we are the first to design efficient optimization methods for losses other than the square loss), and find that classification losses yield a more **accurate latent representation**, leading to substantially improved performance.
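
The one-class setting can be sketched as a toy matrix factorization in numpy (a hedged sketch of the general idea, not the paper's algorithm or its side-information framework; the rank, weights, and learning rate are illustrative): observed (user, item) pairs act as positives, all other cells as down-weighted negatives, and latent vectors are fit with a logistic classification loss by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, rank = 20, 15, 4
U = 0.1 * rng.normal(size=(n_users, rank))   # user latent vectors
Vt = 0.1 * rng.normal(size=(n_items, rank))  # item latent vectors

# Synthetic boolean "watched" matrix generated from hidden rank-4 preferences.
hidden = rng.normal(size=(n_users, rank)) @ rng.normal(size=(rank, n_items))
Y = (hidden > 0).astype(float)               # 1 = interacted, 0 = unobserved

neg_weight, lr = 0.2, 0.2                    # down-weight unobserved cells
for step in range(1000):
    P = 1.0 / (1.0 + np.exp(-(U @ Vt.T)))    # predicted interaction probabilities
    W = np.where(Y == 1, 1.0, neg_weight)    # per-cell weights (one-class trick)
    G = W * (P - Y)                          # weighted logistic-loss gradient
    U -= lr * (G @ Vt) / n_items
    Vt -= lr * (G.T @ U) / n_users

acc = np.mean((U @ Vt.T > 0) == (Y == 1))
print(acc)  # training accuracy, well above the ~0.5 chance level
```

Down-weighting the unobserved cells is the crucial one-class ingredient: treating every missing entry as a confident negative would drown out the sparse positive signal.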

__Publication:__

H.-F. Yu, **H.-Y. Huang**, I. S. Dhillon, C.-J. Lin. A Unified Algorithm for One-class Structured Matrix Factorization with Side Information. AAAI Conference on Artificial Intelligence (AAAI 2017).


In this project, we created a special-effects film, Speedy X Nerdy with Photographer. To enjoy our film, please click on the project title or the film name above. The techniques we used include complex matting and compositing, surface tracking, physical simulation, camera tracking, and many others. There are 16 major special effects in this film in total; for more details, you can view the slides to the left. We also created a making-of video demonstrating the creation of some of the visual effects; click on it if you are interested in how we made this film. Feel free to contact me if you are curious about the details of the other visual effects.

While machine learning is useful in many areas, it remains challenging for both experts and non-experts to apply it to a newly encountered problem. In particular, we focus on the classification problem, where we try to learn an algorithm that classifies each feature vector into a discrete class. One method, the **kernel classifier**, achieves the best score on most problems but is extremely slow. Another method, the **linear classifier**, can sometimes be as good as the kernel classifier (while losing on other problems) but is extremely fast. In this work, we tackle the problem of deciding whether you need to train the bulky kernel classifier when you arrive at a new learning problem.

Finding the answer with a single pass over the data set is extremely difficult. So we propose to make the decision by training a **special classifier** that is efficient and satisfies the same relation with linear classifiers (i.e., if kernel >> linear, then special classifier >> linear; if kernel ~ linear, then special classifier ~ linear). Thus, by comparing the special classifier against the linear classifier, we can predict the behavior of the expensive kernel classifier. Note that the usefulness of this strategy depends on the efficiency of the special classifier; our proposed method runs in a **similar time as the linear classifier** and has excellent prediction performance in practice.
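
The decision logic can be conveyed with a toy numpy sketch (hedged: the paper's actual special classifier differs; here a cheap degree-2 feature expansion plays its role, on XOR-style toy data where only a nonlinear model can succeed): train both cheap classifiers, and if the expanded one wins by a wide margin, predict that the expensive kernel classifier is worth training.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)   # XOR-style labels: not linearly separable

def train_acc(F, y, steps=500, lr=0.5):
    """Logistic regression by gradient descent; returns training accuracy."""
    w = np.zeros(F.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-F @ w))
        w -= lr * F.T @ (p - y) / len(y)
    return np.mean((F @ w > 0) == (y == 1))

F_lin = np.hstack([X, np.ones((len(X), 1))])                      # linear features
F_deg2 = np.hstack([F_lin, X**2, (X[:, 0] * X[:, 1])[:, None]])   # degree-2 expansion

acc_lin, acc_deg2 = train_acc(F_lin, y), train_acc(F_deg2, y)
print(acc_lin, acc_deg2)  # linear stays near chance; degree-2 is nearly perfect
```

Here the large gap (degree-2 >> linear) signals that nonlinearity matters for this data, so a kernel classifier would likely pay off; a negligible gap would suggest sticking with the fast linear model.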

__Publication:__

**H.-Y. Huang**, C.-J. Lin. Linear and Kernel Classification: When to Use Which? SIAM International Conference on Data Mining (SDM 2016).

In this project, we wrote a program that automatically stitches and blends several photos to create a panorama. First, we warp the images onto a cylinder so they can be joined. Then we apply feature detection and feature matching algorithms (using MSOP and a KD-tree) to find corresponding pixels in consecutive images. Next, we use the RANSAC algorithm to robustly stitch the images by exploiting the pixel-matching information. We then perform a greedy graph-cut algorithm to find good seam lines between consecutive images. Finally, we wrote a conjugate gradient solver for the partial differential equation used in Poisson blending (with the seam line found by graph cut as the boundary condition) to seamlessly combine the images. Thanks to our sophisticated blending technique, the produced panoramas are robust and natural; for example, the method can handle people moving during the shooting of the panorama. Our artifacts **won first place** among ~130 competitors taking this course. Everything is written in C++, with OpenCV used for image I/O.
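
The RANSAC step can be sketched in numpy for a translation-only model (a hedged simplification of the full stitching pipeline; the match counts, outlier fraction, and inlier threshold are illustrative): repeatedly fit a 2D shift from one random match and keep the shift that the most matches agree with.

```python
import numpy as np

rng = np.random.default_rng(4)
true_shift = np.array([12.0, -3.0])
src = rng.uniform(0, 100, size=(60, 2))        # feature points in image 1
dst = src + true_shift                         # matched points in image 2
dst[:15] = rng.uniform(0, 100, size=(15, 2))   # 25% of the matches are outliers

best_shift, best_inliers = None, 0
for _ in range(100):
    i = rng.integers(len(src))
    shift = dst[i] - src[i]                            # model from one random match
    err = np.linalg.norm(dst - (src + shift), axis=1)  # residuals of all matches
    inliers = np.sum(err < 2.0)                        # consensus set size
    if inliers > best_inliers:
        best_shift, best_inliers = shift, inliers

print(best_shift)  # ≈ [12, -3], despite the outlier matches
```

Because a single correct match fixes the whole model, even a large fraction of bad matches cannot pull the consensus estimate away from the true shift.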

Light intensity in our world ranges across several orders of magnitude, but an ordinary camera cannot capture images with such rich information. In this project, we aim to reconstruct the actual light intensity by taking several normal images under different exposure times. We implemented the following components: (1) image alignment using Ward's MTB algorithm; (2) HDR reconstruction, where we investigated both Debevec's and Robertson's algorithms; and (3) our own ghost-removal algorithm. The resulting high dynamic range (HDR) image is stored in EXR format, and the final artifact is created by applying tone mapping to the HDR image. Everything is written in C++, with OpenCV used for image I/O.
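
The merging step can be sketched in numpy under the simplifying assumption of a linear camera response (hedged: the real pipeline calibrates the response curve, which this toy skips; the scale factor and exposure times are illustrative): divide each exposure by its time and combine with a hat-shaped weight that trusts mid-range pixels most.

```python
import numpy as np

rng = np.random.default_rng(5)
radiance = rng.uniform(0.01, 5.0, size=1000)     # synthetic scene radiance
times = np.array([1 / 30, 1 / 8, 1 / 2, 2.0])    # exposure times in seconds

# Simulated 8-bit captures: quantize and clip radiance * time into [0, 255].
Z = np.clip(np.round(radiance[None, :] * times[:, None] * 50.0), 0, 255)

w = np.minimum(Z, 255 - Z) + 1e-6                # hat weighting: distrust clipped pixels
est = np.sum(w * (Z / 50.0) / times[:, None], axis=0) / np.sum(w, axis=0)

rel_err = np.median(np.abs(est - radiance) / radiance)
print(rel_err)  # small median relative error (quantization only)
```

The hat weight is what makes the merge work: saturated pixels (Z near 255) and noisy dark pixels (Z near 0) get almost no vote, so each pixel's radiance comes mainly from the exposures that captured it in mid-range.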
