Chairman: Rita FIORESI
In this paper, we explore the fundamental role of the Monge–Ampère equation in deep learning, particularly in the context of Boltzmann machines and energy-based models. We first review the structure of Boltzmann learning and its relation to free energy minimization. We then establish a connection between optimal transport theory and deep learning, demonstrating how the Monge–Ampère equation governs probability transformations in generative models. Additionally, we provide insights from quantum geometry, showing that the space of covariance matrices arising in the learning process coincides with the Connes–Araki–Haagerup (CAH) cone in von Neumann algebra theory. Furthermore, we introduce an alternative approach based on renormalization group (RG) flow, which, while distinct from the optimal transport perspective, reveals another manifestation of the Monge–Ampère domain in learning dynamics. This dual perspective offers a deeper mathematical understanding of hierarchical feature learning, bridging concepts from statistical mechanics, quantum geometry, and deep learning theory.
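As standard background from optimal transport (Brenier's theorem, not a new result of this paper): when a transport map of gradient type $T=\nabla\varphi$, with $\varphi$ convex, pushes a source density $p$ forward to a target density $q$, the change-of-variables formula gives
\[
p(x) \;=\; q(\nabla\varphi(x))\,\det\big(\nabla^2\varphi(x)\big),
\]
which is the Monge–Ampère equation governing the probability transformations referred to above.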
A Markov chain is called lumpable if it can be aggregated into a simpler chain by merging certain states, while still preserving the Markov property in the reduced chain. In this paper, we initiate the study of characterizing exponential families of lumpable Markov chains with respect to a fixed lumping map, and we provide the first set of necessary and sufficient conditions for this characterization.
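As standard background (the Kemeny–Snell condition, not this paper's new characterization): a chain with transition matrix $P=(p_{ij})$ is lumpable with respect to a partition $\{A_1,\dots,A_m\}$ of its state space precisely when, for every pair of blocks $A_k$ and $A_l$, the aggregated probability
\[
\sum_{j\in A_l} p_{ij}
\]
takes the same value for all $i\in A_k$; this common value is then the transition probability from $A_k$ to $A_l$ in the lumped chain.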
In the context of information geometry, the notion of a left-invariant statistical structure on a Lie group was introduced by Furuhata–Inoguchi–Kobayashi [Inf. Geom. (2021)]. In this poster presentation, we introduce the notion of the moduli space of left-invariant statistical structures on a Lie group. We study the moduli spaces for three particular Lie groups, each of which has a moduli space of left-invariant Riemannian metrics that is a singleton. As applications, we classify left-invariant conjugate symmetric statistical structures and left-invariant dually flat structures on these three Lie groups. A characterization of the Amari–Chentsov $\alpha$-connection on the Takano Gaussian space is also given. This presentation is based on joint work with Yu Ohno, Takayuki Okuda (Hiroshima University), and Hiroshi Tamaru (Osaka Metropolitan University).
We consider order-preserving $C^3$ circle maps with a flat piece, Fibonacci rotation number, and negative Schwarzian derivative, where the critical exponents (the degrees of the singularities at the boundary of the flat piece) may differ.
In this work, we assume that the critical exponents belong to $(1,2)^2$. As a consequence, the geometry of the wandering (fractal) set of a map in our class is degenerate (that is, the scaling ratio goes to zero), and we prove that its renormalization diverges.
The Value of Information (VoI) framework, as developed by Ruslan Stratonovich, bridges Claude Shannon’s information theory with economics, in particular the fields of utility and decision theory. This paper revisits the VoI concept within the Boolean setting well known from hypothesis testing. Then the paper explores examples and results that extend the Bayesian VoI framework beyond the Boolean case.
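For orientation, a generic Bayesian formulation (our paraphrase, not Stratonovich's exact normalization): with states $\theta$, actions $a$, utility $u(a,\theta)$, and an observation $Y$,
\[
\mathrm{VoI}(Y) \;=\; \mathbb{E}_Y\Big[\max_a \mathbb{E}\big[u(a,\theta)\mid Y\big]\Big] \;-\; \max_a \mathbb{E}\big[u(a,\theta)\big],
\]
the expected utility gain from deciding after seeing $Y$ rather than before; Stratonovich's theory studies the best achievable gain subject to a budget on the Shannon information that $Y$ carries about $\theta$.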
The research examines how Dual Quaternion Variational Autoencoders (DQVAE) advance robotic mobility by applying them in practice. Quaternion Variational Autoencoders (QVAE) capture rotational motion but lack the capability to represent translation. Extending QVAE to DQVAE enables the model to learn motion that combines simultaneous rotation and translation.
The study aims to develop a Riemannian (Study) manifold structure on an SE(3) latent space for generating optimal end-effector trajectories through geodesics. The proposed approach applies Riemannian manifold geometry to enable flexible and accurate motion-learning tasks. Geodesics on this dual quaternion manifold can be used for optimal motion planning, and even to avoid dynamic obstacles.
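As standard background on the unit dual quaternion group (the double cover of SE(3)), not a detail specific to this study: the geodesic between two poses $\hat q_0$ and $\hat q_1$ is given by screw linear interpolation (ScLERP),
\[
\hat q(t) \;=\; \hat q_0\,\big(\hat q_0^{-1}\hat q_1\big)^{t}, \qquad t\in[0,1],
\]
which moves along a constant screw motion; trajectories of this form are the natural candidates for the optimal end-effector routes described above.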
Improvements in many kinds of human-use technology currently depend heavily on finding new materials with enhanced properties. In this paper we employ a progressive computational physics approach to continually search for and discover new materials with enhanced desired properties. Our search process is a progressive and controlled updating of a stochastic Hamiltonian that dictates ever-improving dynamical pathways, with continually enhanced explore and exploit parameters for guided discovery. We demonstrate our approach by discovering various new candidate materials with enhanced properties for optoelectronics and photovoltaics.
We present SGHD (Stochastic Gradient Hamiltonian Dynamics), a new, fast, and convergent stochastic gradient optimizer based on second-order stochastic Hamiltonian dynamics. Our SGHD optimizer couples to a potential-energy gradient estimator term to achieve a convergence rate of $O(\epsilon^{-4})$ in non-convex optimization. To achieve this convergence rate, we assume access to a queryable Laplacian operator of the potential energy function of the Hamiltonian. Furthermore, we show a new generalization error bound for non-convex machine learning objectives optimized with SGHD; such a generalization bound had not previously been fully explored with respect to stochastic Hamiltonian dynamics.
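For intuition, a minimal sketch of a generic stochastic-gradient Hamiltonian update in the spirit of SGHMC (Chen et al., 2014); this is illustrative background, not the authors' SGHD algorithm, and it omits the Laplacian-query mechanism the abstract assumes:

    import numpy as np

    def sghmc_step(theta, r, stoch_grad, eta=1e-3, friction=0.1, rng=None):
        # One SGHMC-style update: second-order (momentum) dynamics driven by a
        # mini-batch estimate stoch_grad(theta) of the potential-energy gradient.
        rng = rng if rng is not None else np.random.default_rng()
        noise = np.sqrt(2.0 * friction * eta) * rng.standard_normal(theta.shape)
        r = r - eta * stoch_grad(theta) - friction * r + noise  # momentum update
        theta = theta + r                                       # position update (unit mass)
        return theta, r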
Thanks to the interpretation of diffusion-based generative models as an ordinary differential equation (ODE), recent works have started leveraging numerical methods to accelerate inference.
An important metric for generative models is the negative log-likelihood (NLL), which compares the distribution of generated data with that of the base data via cross-entropy. One advantage of the ODE approach is that it allows for an efficient computation of this metric, but this no longer holds under discrete time-stepping, which is likely why the NLL is not reported for acceleration methods.
In this paper, we study the impact of numerical methods on this metric in the context of score matching. This requires inverting the numerical scheme and efficiently computing the log-determinant of its Jacobian matrix. We illustrate our findings on Gaussian mixtures, for which the score function is known analytically at all times, and on Fashion-MNIST, a dataset of small images.
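As standard background (the change-of-variables formulas of continuous normalizing flows, not this paper's contribution): along the probability-flow ODE $\dot x = f(x,t)$ carrying data at $t=0$ to the base distribution at $t=T$,
\[
\log p_0(x(0)) \;=\; \log p_T(x(T)) + \int_0^T \nabla\!\cdot f(x(t),t)\,\mathrm{d}t,
\]
while a discrete scheme with one-step maps $x_{k+1}=\Phi_k(x_k)$ instead gives
\[
\log p_0(x_0) \;=\; \log p_T(x_N) + \sum_{k=0}^{N-1}\log\Big|\det \frac{\partial \Phi_k}{\partial x}(x_k)\Big|,
\]
which is why the scheme must be inverted and the log-determinants of its Jacobians computed.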
Contrails, aircraft-induced cirrus clouds, likely make a large contribution to climate change. Their presence can be verified via remote sensors (e.g. geostationary satellites), which helps in establishing climate impact calculations. However, contrail detection is a challenging, highly unbalanced computer vision task, similar to medical imagery, where most of the pixels constitute the background. In this work, we propose ECoNet, a rotation-equivariant U-Net exploiting the natural symmetries present in satellite imagery. We show that equivariant networks bring benefits in detection accuracy, with fewer parameters to train and training time reduced by a factor of 3. We also explore transfer learning on European geostationary satellites to assess its scalability potential.
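For reference, the standard notion at play (not a detail specific to ECoNet): a network $\Phi$ is equivariant to a group $G$ of rotations when
\[
\Phi(\rho(g)\,x) \;=\; \rho'(g)\,\Phi(x) \qquad \text{for all } g\in G,
\]
where $\rho$ and $\rho'$ are the actions of $G$ on input and output images; for segmentation, $\rho'$ rotates the predicted mask along with the input, so the detector need not relearn each orientation from data.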
Chentsov’s Theorem is foundational in information geometry and characterizes the Riemannian metrics satisfying Markov invariance. Amari showed that f-divergences are the unique divergences that satisfy both information monotonicity and decomposability. In this paper, by relating information monotonicity to Markov invariance, and by generalizing Amari’s result, we give a short proof of Chentsov’s Theorem. To that end, we introduce two auxiliary theorems. The first auxiliary theorem (Theorem 3) introduces a concept called local decomposability (as opposed to decomposability), and uses it to characterize f-divergences. The second auxiliary theorem (Theorem 4) states that decomposability is necessary for Markov-invariant Riemannian metrics on probability simplices; its proof utilizes a key idea from Campbell’s proof of a generalization of Chentsov’s Theorem. Finally, to complete the proof, we apply an elementary result in Riemannian geometry: the bilinear form given by the second-order Taylor expansion of the squared geodesic distance recovers the underlying Riemannian metric. As an application of Chentsov’s Theorem, and as a concrete example of the interplay between divergences and Riemannian metrics on probability simplices, we introduce a new characterization of the Fisher–Rao distance (Theorem 5).
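For concreteness (standard facts, not the paper's Theorem 5): on the probability simplex, the Markov-invariant metric singled out by Chentsov's Theorem is, up to a positive scale, the Fisher information metric
\[
g_p(u,v) \;=\; \sum_i \frac{u_i v_i}{p_i},
\]
whose geodesic distance is the Fisher–Rao distance
\[
d_{\mathrm{FR}}(p,q) \;=\; 2\arccos\Big(\sum_i \sqrt{p_i q_i}\Big).
\]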
Integrating symmetry priors is crucial for robust machine learning. While equivariant neural networks, which enforce symmetry through specialized layer design (e.g., GNNs, $E(n)$-equivariant networks), represent the dominant paradigm, they can be complex to design for arbitrary groups. This paper proposes an alternative, complementary approach rooted in classical Invariant Theory. Instead of relying on intrinsically equivariant architectures, we introduce a novel symmetry loss function added to the standard supervised objective. This loss penalizes deviations from desired invariance or equivariance properties under a specified group action, effectively regularizing the model towards symmetric solutions without mandating architectural constraints. We formalize this concept, presenting a framework combining standard feature extractors with the symmetry loss, and discuss its theoretical underpinnings. This approach offers flexibility in applying symmetry principles to diverse models. This work presents a promising new direction for leveraging symmetry in supervised learning via loss-based regularization.
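As a minimal sketch of what such a symmetry loss could look like (illustrative only: the function names, the $(g_{\mathrm{in}}, g_{\mathrm{out}})$ encoding of the group action, and the squared-error penalty are our assumptions, not the paper's implementation):

    import torch

    def symmetry_loss(model, x, actions):
        # actions: list of (g_in, g_out) pairs; g_in acts on inputs, g_out is the
        # matching action on outputs. Use g_out = None to request plain invariance.
        y = model(x)
        total = 0.0
        for g_in, g_out in actions:
            target = y if g_out is None else g_out(y)
            total = total + ((model(g_in(x)) - target) ** 2).mean()
        return total / len(actions)

    # Regularize a standard supervised objective toward symmetric solutions:
    # loss = task_loss + lam * symmetry_loss(model, x, actions)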
In information geometry, a pair consisting of a primal affine connection and its dual affine connection plays an important role. It is natural to consider tensor fields that reflect the geometry of mutually dual affine connections, not just the curvature and torsion tensor fields. Motivated by this, we introduce statistical relative curvature tensor fields and statistical relative torsion tensor fields.
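As standard background on the duality referred to above: two affine connections $\nabla$ and $\nabla^*$ are mutually dual with respect to a (pseudo-)Riemannian metric $g$ when
\[
X\,g(Y,Z) \;=\; g(\nabla_X Y, Z) + g(Y, \nabla^*_X Z)
\]
for all vector fields $X, Y, Z$; the relative tensor fields introduced here aim to capture the geometry of such a pair beyond its curvature and torsion.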