88, respectively, for the A phase classification, while for the non-rapid eye movement estimation, the results were 88% and 0.95, respectively. The cyclic alternating pattern cycle classification accuracy was 79% for the same model, while the cyclic alternating pattern rate percentage error was 22%.

Gradient Boosting Machines (GBM) are among the go-to algorithms for tabular data, producing state-of-the-art results in many prediction tasks. Despite its popularity, the GBM framework suffers from a fundamental flaw in its base learners: most implementations utilize decision trees that are typically biased towards categorical variables with large cardinalities. The effect of this bias has been studied extensively over the years, mostly in terms of predictive performance. In this work, we extend the scope and study the effect of biased base learners on GBM feature importance (FI) measures. We demonstrate that although these implementations achieve highly competitive predictive performance, they still, surprisingly, suffer from bias in FI. By utilizing cross-validated (CV) unbiased base learners, we fix this flaw at a relatively low computational cost. We demonstrate the suggested framework in a variety of synthetic and real-world setups, showing a significant improvement in all GBM FI measures while maintaining roughly the same level of prediction accuracy.

Federated learning is a framework for multiple devices or institutions, called local clients, to collaboratively train a global model without sharing their data. For federated learning with a central server, an aggregation algorithm integrates model information sent from local clients to update the parameters of the global model. The sample mean is the simplest and most commonly used aggregation method. However, it is not robust for data with outliers or under the Byzantine problem, in which Byzantine clients send malicious messages to interfere with the learning process. Some robust aggregation methods have been introduced in the literature, including the marginal median, the geometric median, and the trimmed mean. In this article, we propose an alternative robust aggregation method, named γ-mean, which is the minimum divergence estimation based on a robust density power divergence. This γ-mean aggregation mitigates the influence of Byzantine clients by assigning them smaller weights. The weighting scheme is data-driven and controlled by the γ value. Robustness from the viewpoint of the influence function is discussed, and some numerical results are presented.
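To illustrate the weighting idea, the sketch below implements a γ-mean-style aggregator under the simplifying assumption of a Gaussian working model with a fixed scale; the function name gamma_mean and all constants are ours, not the authors'. Updates far from the current estimate receive exponentially smaller weights, which is what blunts the influence of Byzantine clients.

```python
import numpy as np

def gamma_mean(updates, gamma=0.1, sigma=1.0, n_iter=50, tol=1e-8):
    """Iteratively reweighted location estimate: client updates far from
    the current estimate get exponentially down-weighted (density-power
    weights under a Gaussian working model)."""
    mu = updates.mean(axis=0)                         # start from the sample mean
    for _ in range(n_iter):
        d2 = np.sum((updates - mu) ** 2, axis=1)      # squared distances to mu
        w = np.exp(-gamma * d2 / (2.0 * sigma ** 2))  # data-driven weights
        mu_new = (w[:, None] * updates).sum(axis=0) / w.sum()
        if np.linalg.norm(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

# Toy check: 8 honest clients near 0, 2 Byzantine clients sending 100s.
rng = np.random.default_rng(0)
updates = np.vstack([rng.normal(0.0, 1.0, size=(8, 5)),
                     np.full((2, 5), 100.0)])
print(updates.mean(axis=0))   # the sample mean is dragged toward 100
print(gamma_mean(updates))    # the gamma-mean stays near 0
```

As γ → 0 the weights become uniform and the ordinary sample mean is recovered, consistent with the abstract's description of the weighting being controlled by the γ value.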
A computational technique for determining the optimal conditions for hiding a digital image in a self-organizing pattern is presented in this paper. Three statistical features of the developing pattern (the Wada index based on the weighted and truncated Shannon entropy, the mean brightness of the pattern, and the p-value of the Kolmogorov-Smirnov criterion for testing the normality of the distribution function) are used for that purpose. The transition from the small-scale chaos of the initial conditions to the large-scale chaos of the developed pattern is observed during the evolution of the self-organizing system. Computational experiments are performed with stripe-type patterns, spot-type patterns, and unstable patterns. It appears that optimal image-hiding conditions are secured when the Wada index stabilizes after its initial decline, the mean brightness of the pattern remains stable before dropping significantly below the average, and the p-value indicates that the distribution becomes Gaussian.

Shannon's entropy is one of the building blocks of information theory and an essential aspect of Machine Learning (ML) methods (e.g., Random Forests). Yet it is only finitely defined for distributions with fast-decaying tails on a countable alphabet. The unboundedness of Shannon's entropy over the general class of all distributions on an alphabet prevents its potential utility from being fully realized. To fill this void in the foundation of information theory, Zhang (2020) proposed generalized Shannon's entropy, which is finitely defined everywhere. The plug-in estimator, adopted in almost all entropy-based ML method packages, is one of the most popular approaches to estimating Shannon's entropy, and its asymptotic distribution is well studied in the existing literature. This paper studies the asymptotic properties of the plug-in estimator of generalized Shannon's entropy on countable alphabets. The developed asymptotic properties require no assumptions on the original distribution, and they allow for interval estimation and statistical tests with generalized Shannon's entropy.

Purpose: In this work, we propose an implementation of the Bienenstock-Cooper-Munro (BCM) model, obtained by combining the classical framework with modern deep learning methodologies. The BCM model remains one of the most promising approaches to modeling the synaptic plasticity of neurons, but its application has remained mainly confined to neuroscience simulations and a few applications in data science. Methods: To improve the convergence efficiency of the BCM model, we combine the original plasticity rule with the optimization tools of modern deep learning. By numerical simulation on standard benchmark datasets, we demonstrate the efficiency of the BCM model in learning, memorization capacity, and feature extraction. Results: In all the numerical simulations, the visualization of neuronal synaptic weights confirms the memorization of human-interpretable subsets of patterns. We numerically show that the selectivity obtained by BCM neurons is indicative of an internal feature extraction procedure, useful for pattern clustering and classification. The introduction of competitiveness between neurons in the same BCM network allows the network to modulate its memorization capacity and the consequent model selectivity. Conclusions: The proposed improvements make the BCM model a suitable alternative to standard machine learning techniques for both feature selection and classification tasks.
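For concreteness, here is a minimal sketch of the classical BCM plasticity rule that the abstract builds on, in the familiar form with a sliding threshold tracking a running average of the squared postsynaptic activity; the deep-learning optimization machinery described under Methods is not reproduced, and all names and constants are illustrative.

```python
import numpy as np

def bcm_step(w, x, theta, eta=1e-3, tau=0.99):
    """One classical BCM update for a single rate neuron:
    dw = eta * y * (y - theta) * x  (depression below theta, potentiation
    above), with theta sliding toward a running average of y**2."""
    y = float(w @ x)                              # postsynaptic activity
    w = w + eta * y * (y - theta) * x             # BCM weight change
    theta = tau * theta + (1.0 - tau) * y ** 2    # sliding modification threshold
    return w, theta

rng = np.random.default_rng(1)
w, theta = rng.normal(0.0, 0.1, size=16), 1.0
for x in rng.normal(0.0, 1.0, size=(500, 16)):    # stream of input patterns
    w, theta = bcm_step(w, x, theta)
```

The quadratic dependence of the threshold on activity is what lets a neuron become selective: inputs that consistently drive it above θ are potentiated while the rest are depressed, matching the memorization and selectivity behavior described in the Results.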
When rotating machinery fails, the resulting vibration signal contains rich fault-feature information. However, the vibration signal is nonlinear and nonstationary and is easily disturbed by noise, so it may be difficult to accurately extract hidden fault features. To extract effective fault features from the collected vibration signals and improve the diagnostic accuracy for weak faults, a novel method for fault diagnosis of rotating machinery is proposed, based on Fast Iterative Filtering (FIF) and Parameter Adaptive Refined Composite Multiscale Fluctuation-based Dispersion Entropy (PARCMFDE). Firstly, the collected original vibration signal is decomposed by FIF to obtain a series of intrinsic mode functions (IMFs), and the IMFs with large correlation coefficients are selected for reconstruction. Then, PARCMFDE is used for fault feature extraction, with its embedding dimension and number of classes determined by a Genetic Algorithm (GA). Finally, the extracted fault features are fed into Fuzzy C-Means (FCM) clustering to classify the different states of the rotating machinery. The experimental results show that the proposed method can accurately extract weak fault features and realize reliable fault diagnosis of rotating machinery.

We present a new class of estimators of Shannon entropy for severely undersampled discrete distributions. It is based on a generalization of an estimator proposed by T. Schürmann, which is itself a generalization of an estimator proposed by myself. For a special set of parameters, these estimators are completely free of bias and have a finite variance, something which is widely believed to be impossible. We also present detailed numerical tests, in which we compare them with other recent estimators and with exact results, and we point out a clash with Bayesian estimators of mutual information.

In 2016, Steve Gull outlined a proof of Bell's theorem using Fourier theory. Gull's philosophy is that Bell's theorem (or perhaps a key lemma in its proof) can be seen as a no-go theorem for a project in distributed computing with classical, not quantum, computers. We present his argument, correcting misprints and filling gaps. In his argument, there were two completely separate computers in the network. We need three in order to fill all the gaps in his proof: a third computer supplies a stream of random numbers to the two computers representing the two measurement stations in Bell's work. One could also imagine that computer being replaced by a cloned, virtual computer generating the same pseudo-random numbers within each of Alice's and Bob's computers. Either way, we need to assume the presence of shared i.i.d. randomness, in the form of a synchronized sequence of realizations of i.i.d. hidden variables underlying the otherwise deterministic physics of the sequence of trials. Gull's proof then just needs a third step: rewriting an expectation as the expectation of a conditional expectation given the hidden variables.
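Spelled out in symbols (our notation, not Gull's), that third step is the tower property of conditional expectation combined with the locality of the two stations: for outcomes A and B at settings a and b, with shared hidden variables Λ,

E[A(a) B(b)] = E[ E[A(a) B(b) | Λ] ] = E[ E[A(a) | Λ] · E[B(b) | Λ] ],

where the first equality always holds, and the second holds because, given Λ, Alice's and Bob's computers run independently (here, in fact, deterministically).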
Learning the relationship between the parts and the whole of an object, as humans do when recognizing objects, is a challenging task. In this paper, we design a novel neural network to explore the local-to-global cognition of 3D models and the aggregation of structural contextual features in 3D space, inspired by the recent success of the Transformer in natural language processing (NLP) and by impressive strides in image analysis tasks such as image classification and object detection. We build a 3D shape Transformer based on local shape representation, which provides relation learning between local patches on 3D mesh models. Analogous to token (word) states in NLP, we propose local shape tokens to encode local geometric information. On this basis, we design a shape-Transformer-based capsule routing algorithm. By applying an iterative capsule routing algorithm, local shape information can be further aggregated into high-level capsules containing deeper contextual information, so as to realize cognition from the local to the whole. We performed classification tasks on the deformable 3D object datasets SHREC10 and SHREC15 and on the large dataset ModelNet40, and we obtained strong results, which show that our model performs well in complex 3D model recognition and large-scale feature learning.

State-of-the-art speech watermarking techniques enable speech signals to be authenticated and protected against malicious attacks, ensuring secure speech communication. In general, reliable speech watermarking methods must satisfy four requirements: inaudibility, robustness, blind-detectability, and confidentiality. We previously proposed a method of non-blind speech watermarking based on direct spread spectrum (DSS) using a linear prediction (LP) scheme to solve the first two issues (inaudibility and robustness) caused by the distortion that spread spectrum introduces. This method not only effectively embeds watermarks with small distortion but also has the same robustness as the DSS method. There are, however, two remaining issues regarding blind-detectability and confidentiality. In this work, we attempt to resolve these issues by developing an approach called the LP-DSS scheme, which takes two forms of data embedding, front-side and back-side, for blind detection and frame synchronization. We incorporate blind detection with frame synchronization into the scheme to satisfy blind-detectability, and we use the two embedding processes to satisfy confidentiality.
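To make the spread-spectrum baseline concrete, below is a minimal sketch of generic DSS embedding with correlation-based blind detection, assuming perfect frame synchronization; it includes neither the LP weighting nor the front-/back-side dual embedding of the LP-DSS scheme, and every name and constant is ours.

```python
import numpy as np

CHIP_LEN = 1024   # samples per embedded bit (one frame)
ALPHA = 0.02      # embedding strength; inaudibility needs this small or shaped

def pn_sequence(seed, n):
    """Pseudo-random +/-1 chips; the seed acts as the shared secret key."""
    return np.random.default_rng(seed).choice([-1.0, 1.0], size=n)

def dss_embed(host, bits, seed=7):
    """Spread each bit over CHIP_LEN samples and add it to the host signal."""
    pn = pn_sequence(seed, CHIP_LEN)
    marked = host.copy()
    for i, b in enumerate(bits):
        frame = slice(i * CHIP_LEN, (i + 1) * CHIP_LEN)
        marked[frame] += ALPHA * (1.0 if b else -1.0) * pn
    return marked

def dss_detect(marked, n_bits, seed=7):
    """Blind detection: correlate each frame with the PN sequence and take
    the sign; the original host signal is not needed."""
    pn = pn_sequence(seed, CHIP_LEN)
    return [int(marked[i * CHIP_LEN:(i + 1) * CHIP_LEN] @ pn > 0.0)
            for i in range(n_bits)]

rng = np.random.default_rng(2)
host = rng.normal(0.0, 0.1, size=8 * CHIP_LEN)   # stand-in for a speech excerpt
bits = [1, 0, 1, 1, 0, 0, 1, 0]
assert dss_detect(dss_embed(host, bits), len(bits)) == bits
```

In a real speech watermark the embedding strength would be shaped by a perceptual or LP model to preserve inaudibility, and the frame synchronization that the LP-DSS scheme embeds explicitly is simply assumed here.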