OZAWA Seiichi
Center for Mathematical and Data Sciences
Professor
Electro-Communication Engineering
Seiichi Ozawa received the Dr. Eng. degree in computer science from Kobe University. He is currently the deputy director of the Center for Mathematical and Data Sciences and a full professor with the Department of Electrical and Electronic Engineering, Graduate School of Engineering, and the Center for Advanced Medical Engineering Research & Development, Kobe University, Japan. His current research interests are machine learning, incremental learning, big data analytics, cybersecurity, text mining, computer vision, and privacy-preserving machine learning. He has published more than 160 journal and conference papers and book chapters/monographs. He is currently an associate editor of IEEE Trans. on Cybernetics and two international journals. He is the Vice-President of the International Neural Network Society and the Immediate-Past President of the Asia Pacific Neural Network Society. He is a member of the Neural Networks TC and the Smart World TC of the IEEE CI Society.
Oct. 2022 Information Processing Society of Japan, CSEC Excellent Research Award, Detection of Malicious TLS Communication Using Machine Learning and a Study of the Transition of Communication Features
Oct. 2020 Information Processing Society of Japan, CSS2020 Concept Research Award, Darknet Scan Packet Analysis Using Port Embedding Vector
Dec. 2019 Asia Pacific Neural Network Society, APNNS Excellent Service Award
Apr. 2011 IEEE Computational Intelligence Society, EAIS 2011 Outstanding Paper Award, Research on "Incremental Recursive Fisher Linear Discriminant for Online Feature Extraction"
Scientific journal
Privacy protection has attracted increasing attention, and privacy concerns often prevent flexible data utilization. In most industries, data are distributed across multiple organizations due to privacy concerns. Federated learning (FL), which enables cross-organizational machine learning by communicating statistical information, is a state-of-the-art technology that is used to solve this problem. However, for gradient boosting decision tree (GBDT) in FL, balancing communication efficiency and security while maintaining sufficient accuracy remains an unresolved problem. In this paper, we propose an FL scheme for GBDT, i.e., efficient FL for GBDT (eFL-Boost), which minimizes accuracy loss, communication costs, and information leakage. The proposed scheme focuses on appropriate allocation of local computation (performed individually by each organization) and global computation (performed cooperatively by all organizations) when updating a model. It is known that tree structures incur high communication costs for global computation, whereas leaf weights do not require such costs and are expected to contribute relatively more to accuracy. Thus, in the proposed eFL-Boost, a tree structure is determined locally at one of the organizations, and leaf weights are calculated globally by aggregating the local gradients of all organizations. Specifically, eFL-Boost requires only three communications per update, and only statistical information that has low privacy risk is leaked to other organizations. Through performance evaluation on public data sets (ROC AUC, Log loss, and F1-score are used as metrics), the proposed eFL-Boost outperforms existing schemes that incur low communication costs and was comparable to a scheme that offers no privacy protection.
Corresponding, IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 22 Apr. 2022, IEEE Access, 10, 43954 - 43963, English[Refereed]
Scientific journal
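As a rough illustration of the leaf-weight step described in the eFL-Boost abstract above (leaf weights computed globally by aggregating the local gradients of all organizations), the following sketch sums per-leaf gradient statistics from several organizations and derives one weight per leaf. It is not the authors' implementation: the second-order formula w = -G / (H + lambda), the log-loss gradients, and the names `local_leaf_stats`, `global_leaf_weights`, and `reg_lambda` are assumptions borrowed from common GBDT practice.

```python
import math


def local_leaf_stats(labels, raw_scores, leaf_ids, num_leaves):
    """Per-leaf sums of log-loss gradients and Hessians for ONE organization.

    labels      : 0/1 class labels held locally
    raw_scores  : current model outputs (log-odds) for the local samples
    leaf_ids    : index of the leaf each local sample falls into
    """
    grad_sum = [0.0] * num_leaves
    hess_sum = [0.0] * num_leaves
    for y, score, leaf in zip(labels, raw_scores, leaf_ids):
        p = 1.0 / (1.0 + math.exp(-score))   # predicted probability
        grad_sum[leaf] += p - y              # first derivative of log loss
        hess_sum[leaf] += p * (1.0 - p)      # second derivative of log loss
    return grad_sum, hess_sum


def global_leaf_weights(all_stats, reg_lambda=1.0):
    """Aggregate the statistics of all organizations into leaf weights.

    Assumed formula (hypothetical for this sketch): w = -G / (H + lambda),
    where G and H are the sums over every organization.
    """
    num_leaves = len(all_stats[0][0])
    weights = []
    for leaf in range(num_leaves):
        g = sum(stats[0][leaf] for stats in all_stats)
        h = sum(stats[1][leaf] for stats in all_stats)
        weights.append(-g / (h + reg_lambda))
    return weights
```

Only the per-leaf sums leave each organization in this sketch, which is one way to read the abstract's statement that only low-risk statistical information is shared.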
International conference proceedings
Scientific journal
[Refereed]
Scientific journal
[Refereed]
Scientific journal
In recent years, top referred methods on object detection like R-CNN have implemented this task as a combination of proposal region generation and supervised classification on the proposed bounding boxes. Although this pipeline has achieved state-of-the-art results in multiple datasets, it has inherent limitations that make object detection a very complex and inefficient task in computational terms. Instead of considering this standard strategy, in this paper we enhance Detection Transformers (DETR) which tackles object detection as a set-prediction problem directly in an end-to-end fully differentiable pipeline without requiring priors. In particular, we incorporate Feature Pyramids (FP) to the DETR architecture and demonstrate the effectiveness of the resulting DETR-FP approach on improving logo detection results thanks to the improved detection of small logos. So, without requiring any domain specific prior to be fed to the model, DETR-FP obtains competitive results on the OpenLogo and MS-COCO datasets offering a relative improvement of up to 30%, when compared to a Faster R-CNN baseline which strongly depends on hand-designed priors.
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 30 Jul. 2021, IEEE Access, 9, 106998 - 107011, English[Refereed]
Scientific journal
[Refereed]
International conference proceedings
[Refereed]
International conference proceedings
[Refereed]
In book
[Refereed]
In book
Social networks have attracted the attention of psychologists, as the behavior of users can be used to assess personality traits, and to detect sentiments and critical mental situations such as depression or suicidal tendencies. Recently, the increasing amount of image uploads to social networks has shifted the focus from text to image-based personality assessment. However, obtaining the ground-truth requires giving personality questionnaires to the users, making the process very costly and slow, and hindering research on large populations. In this paper, we demonstrate that it is possible to predict which images are most associated with each personality trait of the OCEAN personality model, without requiring ground-truth personality labels. Namely, we present a weakly supervised framework which shows that the personality scores obtained using specific images textually associated with particular personality traits are highly correlated with scores obtained using standard text-based personality questionnaires. We trained an OCEAN trait model based on Convolutional Neural Networks (CNNs), learned from 120K pictures posted with specific textual hashtags, to infer whether the personality scores from the images uploaded by users are consistent with those scores obtained from text. In order to validate our claims, we performed a personality test on a heterogeneous group of 280 human subjects, showing that our model successfully predicts which kind of image will match a person with a given level of a trait. Looking at the results, we obtained evidence that personality is not only correlated with text, but with image content too. Interestingly, different visual patterns emerged from those images most liked by persons with a particular personality trait: for instance, pictures most associated with high conscientiousness usually contained healthy food, while low conscientiousness pictures contained injuries, guns, and alcohol. 
These findings could pave the way to complement text-based personality questionnaires with image-based questions.
MDPI, Nov. 2020, APPLIED SCIENCES-BASEL, 10 (22), English[Refereed]
Scientific journal
Obfuscation is rampant in both benign and malicious JavaScript (JS) codes. It generates an obscure and undetectable code that hinders comprehension and analysis. Therefore, accurate detection of JS codes that masquerade as innocuous scripts is vital. The existing deobfuscation methods assume that a specific tool can recover an original JS code entirely. For a multi-layer obfuscation, general tools realize a formatted JS code, but some sections remain encoded. For the detection of such codes, this study performs Deobfuscation, Unpacking, and Decoding (DUD-preprocessing) by function redefinition using a Virtual Machine (VM), a JS code editor, and a Python int_to_str() function to facilitate feature learning by the FastText model. The learned feature vectors are passed to a classifier model that judges the maliciousness of a JS code. In the performance evaluation, the authors use Hynek Petrak's dataset for obfuscated malicious JS codes, and the SRILAB dataset and the Majestic Million service top 10,000 websites for obfuscated benign JS codes. They then compare the performance to other models on the detection of DUD-preprocessed obfuscated malicious JS codes. Their experimental results show that the proposed approach enhances feature learning and provides improved accuracy in the detection of obfuscated malicious JS codes.
Corresponding, INST ENGINEERING TECHNOLOGY-IET, Sep. 2020, CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 5 (3), 184 - 192, English[Refereed]
Scientific journal
JavaScript is a dynamic computer programming language that has been used for various cyberattacks on client-side web applications. Malicious behaviors in JavaScript are injected on purpose as the outputs of web applications, such as redirection and pop-up texts or images. They exploit vulnerabilities by using a variety of methods such as drive-by download or cross-site scripting. To protect users from such cyberattacks, we propose a deep neural network for detecting malicious JavaScript codes by examining their bytecode sequences. We use the V8 JavaScript compiler to generate a bytecode sequence, which corresponds to an abstract form of machine codes. The benefit of using bytecode representation is that we can easily break complex obfuscation in JavaScript. To identify the attacker's malicious intention, we adopt a deep pyramid convolutional neural network (DPCNN) combined with recurrent neural network models, which can handle long-range associations in a bytecode sequence. In our experiment, various recurrent networks are tested to encode temporal features of code behaviors, and our results show that the proposed approach provides high accuracy in the detection of malicious JavaScript.
Corresponding, IEEE, Jul. 2020, 2020 International Joint Conference on Neural Networks (IJCNN), 1 - 8, English[Refereed]
International conference proceedings
In this paper, a soybean flower/seedpod detection system is built for collecting growing state information by introducing convolutional neural networks, with the aim that observed plant states (e.g., #flowers and #seedpods) will be used in the future to predict the crop yields of soybeans in combination with environmental information. To predict the crop yields (i.e., quantity of seedpods) precisely, it is considered important to know how the number of flowers changes over time and how such flower transitions can affect the final yields of soybeans. However, there has been no way to measure the number of flowers in real environments. For this purpose, we propose a deep learning approach to automatically detect flower and seedpod regions from images taken in real soybean fields without environmental control. Various object detection methods are compared, including RetinaNet, Faster R-CNN, and Cascade R-CNN. Ablation studies are provided to analyze how these methods perform on both flowers and seedpods across different parameters. In our experimental results, Cascade R-CNN gives the best average precision (AP) of 89.6, while RetinaNet and Faster R-CNN give APs of 83.3 and 88.7, respectively. Cascade R-CNN also attains the highest accuracy in detecting small objects, which are not easily detected by other models. With accurate detection, the system is expected to contribute to constructing high-performance measurement for soybean flowers and seedpods, which ultimately leads to a better pipeline for evaluating plant status.
Corresponding, IEEE, Jul. 2020, 2020 International Joint Conference on Neural Networks (IJCNN), 1 - 7, English[Refereed]
International conference proceedings
Along with the proliferation of Internet of Things (IoT) devices, cyberattacks towards these devices are on the rise. In this paper, we present a study on applying Association Rule Learning to discover the regularities of these attacks from the big stream data collected on a large-scale darknet. By exploring the regularities in IoT-related indicators such as destination ports, type of service, and TCP window sizes, we succeeded in discovering the activities of attacking hosts associated with well-known classes of malware programs. As a case study, we report an interesting observation of the attack campaigns before and after the first source code release of the well-known IoT malware Mirai. The experiments show that the proposed scheme is effective and efficient in early detection and tracking of activities of new malware on the Internet and hence induces a promising approach to automate and accelerate the identification and mitigation of new cyber threats.
SPRINGER, Feb. 2020, INTERNATIONAL JOURNAL OF INFORMATION SECURITY, 19 (1), 83 - 92, English, International magazine[Refereed]
Scientific journal
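The abstract above applies association rule learning to indicators such as destination ports, type of service, and TCP window sizes. The sketch below shows the two basic quantities behind such rules, support and confidence, on a handful of made-up packet records. The packet values and the function names `support` and `confidence` are illustrative assumptions, not data or code from the study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical darknet packets, each reduced to a set of attribute=value items.
packets = [
    frozenset({"dst_port=23", "tos=0", "win=5840"}),
    frozenset({"dst_port=23", "tos=0", "win=5840"}),
    frozenset({"dst_port=23", "tos=0", "win=14600"}),
    frozenset({"dst_port=80", "tos=0", "win=5840"}),
]

rule_support = support(packets, {"dst_port=23", "win=5840"})
rule_confidence = confidence(packets, {"dst_port=23"}, {"win=5840"})
```

A rule with high support and confidence over a stream of packets points to a group of hosts that share the same header settings, which is the kind of regularity the abstract links to known malware families.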
[Refereed]
Scientific journal
[Refereed]
In book
[Refereed]
International conference proceedings
Websites attract millions of visitors due to the convenience of the services they offer, which makes them interesting targets for cyber attackers. Most of these websites use JavaScript (JS) to create dynamic content. The exploitation of vulnerabilities in servers, plugins, and other third-party systems enables the insertion of malicious codes into websites. These exploits use methods such as drive-by-downloads, pop-up ads, and phishing attacks on news, porn, piracy, torrent or free software websites, among others. Many of the recent cyber-attacks exploit JS vulnerabilities, in some cases employing obfuscation to hide their maliciousness and evade detection. It is, therefore, essential to develop an accurate detection system for malicious JS to protect users from such attacks. To address this issue, this study adopts an Abstract Syntax Tree (AST) for code structure representation and a machine learning approach to feature learning called Doc2Vec. Doc2Vec is a neural network model that can learn context information of texts with variable length. This model is a well-suited feature learning method for JS codes, which consist of text content ranging among single line sentences, paragraphs, and full-length documents. Besides, features learned with Doc2Vec are of low dimensions, which ensures faster detection. A classifier model judges the maliciousness of a JS code using the learned features. The performance of this approach is evaluated using the D3M dataset (Drive-by-Download Data by Marionette) for malicious JS codes and the JSUNPACK plus Alexa top 100 websites datasets for benign JS codes. We then compare the performance of Doc2Vec on plain JS codes (Plain-JS) and AST form of JS codes (AST-JS) to other feature learning methods. 
Our experimental results show that the proposed AST features and Doc2Vec for feature learning provide better accuracy and fast classification in malicious JS codes detection compared to conventional approaches and can flag malicious JS codes previously identified as hard-to-detect. (C) 2019 The Authors. Published by Elsevier B.V.
Corresponding, ELSEVIER, Nov. 2019, APPLIED SOFT COMPUTING, 84, English, International magazine[Refereed]
Scientific journal
[Refereed]
International conference proceedings
[Refereed]
International conference proceedings
Recently, smart agriculture, a new approach to farming using ICT, has received great attention. To control cultivation conditions precisely, it is important to capture the growth state of plants as well as environmental factors such as temperature, moisture, solar radiation, etc. In this paper, we propose an image sensing method to detect soy flowers and seedpods as growth fa
IEEE SMC, Oct. 2018, Proceedings of the 2018 IEEE International Conference on Systems, Man, and Cybernetics, 1 - 6, English[Refereed]
International conference proceedings
To add more functionality and enhance usability of web applications, JavaScript (JS) is frequently used. Even with many advantages and usefulness of JS, an annoying fact is that many recent cyberattacks such as drive-by-download attacks exploit vulnerability of JS codes. In general, malicious JS codes are not easy to detect, because they sneakily exploit vulnerabilities of brow
IEEE, Jul. 2018, Proc. of 2018 International Joint Conference on Neural Networks, 1 - 7, English[Refereed]
International conference proceedings
Our research group is working on soybeans, whose yield is difficult to predict. We focus on the common characteristics observed at multiple cultivation points, in order to examine methods to acquire new knowledge in deciding the work based on the amount of yields. Our previous study has examined a method to discover optimal patterns using qualitative value of
Apr. 2018, Journal of the Institute of Industrial Applications Engineers (Web), 6 (2), 66‐72 (WEB ONLY), English[Refereed][Invited]
Scientific journal
To add more functionality and enhance usability of web applications, JavaScript (JS) is frequently used. Even with many advantages and usefulness of JS, an annoying fact is that many recent cyberattacks such as drive-by-download attacks exploit vulnerabilities of JS codes. In general, malicious JS codes are not easy to detect, because they sneakily exploit vulnerabilities of browsers and plugin software, and attack visitors of a web site without their knowledge. To protect users from such threats, an accurate detection system for malicious JS is needed. Conventional approaches often employ signature and heuristic-based methods, which are prone to suffer from zero-day attacks, i.e., causing many false negatives and/or false positives. For this problem, this paper adopts a machine learning approach to feature learning called Doc2Vec, which is a neural network model that can learn context information of texts. The extracted features are given to a classifier model (e.g., SVM and neural networks), which judges the maliciousness of a JS code. In the performance evaluation, we use the D3M Dataset (Drive-by-Download Data by Marionette) for malicious JS codes and JSUNPACK for benign ones for both training and test purposes. We then compare the performance to other feature learning methods. Our experimental results show that the proposed Doc2Vec features provide better accuracy and fast classification in malicious JS code detection compared to conventional approaches.
Corresponding, IEEE, 2018, 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018-July, 1 - 8, English[Refereed]
International conference proceedings
We are living in an information age where all our personal data and systems are connected to the Internet and accessible from more or less anywhere in the world. Such systems can be prone to cyber-attacks; therefore the monitoring and identification of cyber-attacks play a significant role in preventing the abuse of our data and systems. The majority of such systems proposed in the literature are based on a model/classifiers built with the help of classical/off-line learning methods on a learning data set. Since cyber-attacks evolve over time such models or classifiers sooner or later become outdated. To keep a proper system functioning the models need to be updated over a period of time. When dealing with models/classifiers learned by classical off-line methods, this is an expensive and time-consuming task. One way to keep the models updated is to use evolving methodologies to learn and adapt the models in an on-line manner. Such methods have been developed, extensively studied and implemented for regression problems. The presented paper introduces a novel evolving possibilistic Cauchy clustering (eCauchy) method for classification problems. The given method is used as a basis for large-scale monitoring of cyber-attacks. By using the presented method a more flexible system for detection of attacks is obtained. The approach was tested on a database from 1999 KDD intrusion detection competition. The obtained results are promising. The presented method gives a comparable degree of accuracy on raw data to other methods found in the literature; however, it has the advantage of being able to adapt the classifier in an on-line manner. The presented method also uses less labeled data to learn the classifier than classical methods presented in the literature decreasing the costs of data labeling. The study is opening a new possible application area for evolving methodologies. 
In future research, the focus will be on implementing additional data filtering and new algorithms to optimize the classifier for detection of cyber-attacks. (C) 2017 Elsevier B.V. All rights reserved.
ELSEVIER, Jan. 2018, APPLIED SOFT COMPUTING, 62, 592 - 601, English[Refereed]
Scientific journal
Recently, smart agriculture, a new approach to farming using ICT, has received great attention. To control cultivation conditions precisely, it is important to capture the growth state of plants as well as environmental factors such as temperature, moisture, solar radiation, etc. In this paper, we propose an image sensing method to detect soy flowers and seedpods as growth factors using a state-of-the-art deep learning architecture called Single Shot MultiBox Detector (SSD). Images of soybeans were taken at Hokkaido Agricultural Research Center from 2015 to 2017, and we carry out a performance test of our system using a dataset of these soybean images. The detection accuracies for seedpods and flowers are 0.586 and 0.646 in F-measure, respectively.
IEEE, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 1693 - 1698, English[Refereed]
International conference proceedings
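The SSD abstract above reports detection accuracy as an F-measure. For readers unfamiliar with the metric, the sketch below computes it from detection counts, assuming the balanced F1 form (the harmonic mean of precision and recall); the abstract does not state which variant was used. The function name `f_measure` and the example counts are illustrative only.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Balanced F-measure (F1) from detection counts.

    true_pos  : detections that match a ground-truth object
    false_pos : detections with no matching ground-truth object
    false_neg : ground-truth objects that were missed
    """
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts: 6 correct detections, 2 spurious ones, 4 missed objects.
example_score = f_measure(6, 2, 4)
```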
In this paper, we report an interesting observation of the darknet traffic before the source code of the IoT malware Mirai was first released on September 7th 2016. In our darknet analysis, frequent pattern mining and association rule learning were applied to a large set of TCP SYN packets collected from July 1st 2016 to September 15th 2016 with the NICT /16 darknet sensor. In total, 1,840,973,403 packets were collected, sent from 17,928,006 unique hosts. In this study, we focus on the frequently appearing combinations of "window sizes" in TCP headers. We successfully extracted a certain number of frequent patterns and association rules on window sizes, and we identified source hosts that sent out SYN packets matching any of the extracted rules. In addition, we show that almost all such hosts sent SYN packets satisfying the three conditions known from the source code of Mirai. Such hosts started their scan activities on August 2nd 2016 and ended them on September 4th 2016 (i.e., 3 days before the source code was released). (C) 2018 The Authors. Published by Elsevier Ltd.
Corresponding, ELSEVIER SCIENCE BV, 2018, INNS CONFERENCE ON BIG DATA AND DEEP LEARNING, 144 (144), 118 - 123, English[Refereed]
International conference proceedings
Many services for data analysis require customers' data to be exposed, and privacy issues are critical in related fields. To address this problem, we propose a Privacy-Preserving Naive Bayes classifier (PP-NBC) model which provides classification results without leaking private information in data sources. Throughout the classification process in PP-NBC, the operations are evaluated on encrypted data by applying a fully homomorphic encryption scheme, so that service providers are able to handle customers' data without knowing their actual values. The proposed method is implemented with a homomorphic encryption library called HElib, and we carry out a preliminary performance evaluation of the proposed PP-NBC.
Corresponding, SPRINGER INTERNATIONAL PUBLISHING AG, 2018, NEURAL INFORMATION PROCESSING (ICONIP 2018), PT IV, 11304, 349 - 358, English[Refereed]
International conference proceedings
[Refereed]
International conference proceedings
There is a high demand to promote efficiency in agriculture, as the number of workers engaging in agriculture has been decreasing and the workforce has been aging in recent years. Our research group is working on soybeans, whose yield is difficult to predict. We focus on the common characteristics observed at multiple cultivation points, in order to examine methods to acquire new kn
The Institute of Industrial Applications Engineers, Jul. 2017, The 5th IIAE International Conference on Intelligent Systems and Image Processing 2017 (ICISIP2017), 209 - 216, English[Refereed]
International conference proceedings
This paper introduces a new topological clustering approach to cluster high-dimensional datasets based on the t-SNE (Stochastic Neighbor Embedding) dimensionality reduction method and spectral clustering. Spectral clustering needs to construct an adjacency matrix and calculate the eigen-decomposition of the corresponding Laplacian matrix [1], which is computationally expensive and not easy to apply to large-scale data sets. One way to address this problem is to reduce the dimensionality before clustering the dataset. The t-SNE method, which gives good results for visualization, allows a projection of the dataset into low-dimensional spaces, which makes it easy to use for very large datasets. Using t-SNE during the learning process makes it possible to reduce the dimensionality and to preserve the topology of the dataset, thereby increasing the clustering accuracy. We illustrate the power of this method with several real datasets. The results show good clustering quality and a higher speed.
IEEE, 2017, 2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017-May, 1628 - 1632, English[Refereed]
International conference proceedings
Recently, a new ICT approach to agriculture called "Smart Agriculture" has received great attention as a way to support farmers' decision-making for good final yield under various kinds of field conditions. For this purpose, this paper presents two image sensing methods that enable automatic observation to capture flowers and seedpods of soybeans in real fields. The developed image sensing methods are considered as sensors in an agricultural cyber-physical system in which big data on the growth status of agricultural plants and environmental information (e.g., weather, temperature, humidity, solar radiation, soil condition, etc.) are analyzed to mine useful rules for appropriate cultivation. The proposed image sensing methods are constructed by combining several image processing and machine learning techniques. The flower detection is realized based on a coarse-to-fine approach where candidate areas of flowers are first detected by SLIC and hue information, and the acceptance of flowers is decided by a CNN. In the seedpod detection, candidates of seedpod regions are first detected by the Viola-Jones object detection method, and we also use a CNN to make a final decision on the acceptance of detected seedpods. The performance of the proposed image sensing methods is evaluated on a data set of soybean images that were taken from a crowd of soybeans in real agricultural fields in Hokkaido, Japan.
IEEE, 2017, 2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 1787 - 1793, English[Refereed]
International conference proceedings
It is well known that products for cyber-attacks such as exploits and malware codes are illegally traded on hidden web services called the Dark Web, which are not indexed by the conventional search engines we usually use. In general, it is not easy to capture the whole picture of trade activities on the Dark Web because special browsers and tools are needed to visit such dark market sites and forums, and they usually require us to register and/or to pass a qualification test. However, to understand the trends of cyber-attacks, there is no doubt that the Dark Web is one of the useful information sources. In this paper, we try to understand the sales trends of illegal products for cyber-attacks from the largest marketplace, called AlphaBay, from which it is relatively easy to collect information without passing any qualification tests. To monitor business trades on the Dark Web, we develop an AI web-contents analyzer, which consists of a Tor crawler to collect the product information and a topic analyzer to capture the trends of what people are interested in and the popular products for cyber-attacks. For this purpose, we use a topic model called Latent Dirichlet Allocation (LDA), and we show that the topic analysis would be helpful for predicting new cyber-attacks.
SPRINGER INTERNATIONAL PUBLISHING AG, 2017, NEURAL INFORMATION PROCESSING, ICONIP 2017, PT V, 10638 (5), 888 - 896, English[Refereed]
International conference proceedings
Recently, computational outsourcing using cloud services has become popular for big data analysis, and many cloud service providers offer machine learning platforms where we can perform various prediction and classification tasks very easily. On the other hand, there still remains a big hurdle to analyzing personal big data on cloud services because the leakage of personal information is a critical issue. As a remedy for this, we propose a privacy-preserving machine learning algorithm for Extreme Learning Machine (PP-ELM), which can learn from data encrypted with an additively homomorphic encryption scheme. In the proposed outsourcing method, we consider a three-participant model consisting of data contributors, an outsourced server, and a data analyst. A data contributor preprocesses and encrypts data, and an outsourced server receives the encrypted data and calculates hidden layer outputs using additive operations. Then, a data analyst receives the hidden outputs of ELM from the outsourced server, and they are used to obtain the ELM connection weights. Since the proposed outsourcing model can learn ELM over encrypted data, it is expected to lower the hurdle to dealing with personal data on cloud services. In addition, the proposed PP-ELM allows us to learn from multiple sources of personal data in a secure way, and this might lead to a better solution for a practical problem than before.
IEEE, 2017, 2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017 (2), 1350 - 1357, English[Refereed]
International conference proceedings
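The PP-ELM abstract above relies on additively homomorphic encryption, where a server can add encrypted values without seeing them. The sketch below shows that property with a toy Paillier cryptosystem. The abstract does not name the scheme used, so Paillier is an assumption here, and the tiny primes, function names, and integer weights are purely illustrative and offer no real security.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1 is used
    return n, (lam, mu)


def encrypt(n, m, rng=random):
    """Encrypt integer m (0 <= m < n) as (n+1)^m * r^n mod n^2."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext using L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n, c, k):
    """Raising a ciphertext to k multiplies the plaintext by integer k >= 0."""
    return pow(c, k, n * n)


def encrypted_weighted_sum(n, ciphertexts, weights):
    """Server-side sum of w_i * x_i over encrypted inputs x_i."""
    acc = encrypt(n, 0)
    for c, w in zip(ciphertexts, weights):
        acc = add_encrypted(n, acc, scale_encrypted(n, c, w))
    return acc
```

In this reading, a data contributor would call `encrypt`, the outsourced server would call `encrypted_weighted_sum` with its own integer weights, and only the holder of the private key could call `decrypt` on the result.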
This paper presents the idea of large-scale monitoring for cyberattacks using evolving Cauchy possibilistic clustering (eCauchy). The idea of density-based clustering is appealing when the data samples are highly noisy and when outliers appear frequently. The basic measure of density in recursive form can be modified so that it can be applied to classification problems such as large-scale monitoring for cyberattacks. The algorithm is in on-line form to deal with data streams and is therefore appropriate for dealing with big-data problems. The development of density as a measure of similarity follows from the Cauchy density and is similar to the typicality defined in the possibilistic clustering approach. The described eCauchy clustering involves just a few tuning parameters, such as maximal density. The algorithm evolves the structure during operation by adding and removing clusters. This is appropriate for data granulation, which is of great importance when the clusters are of different sizes and shapes. In the proposed large-scale monitoring system, darknet sensor packets within a certain period are transformed into 17 traffic features and are categorized by eCauchy in an on-line fashion. To evaluate the proposed darknet monitoring system, a large set of TCP and UDP packets collected from January 2nd 2016 to March 1st 2016 (60 days) with the NICT /16 darknet sensor is used. Our experimental results demonstrate that the proposed monitoring system can detect DDoS backscatter with more than 98% accuracy for TCP packets and non-DDoS backscatter with 72.8% accuracy for UDP packets. The proposed system can learn and predict quite fast: 12.6 sec. for TCP and 312.6 sec. for UDP.
Corresponding, IEEE, 2017, 2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2018-January, 1 - 7, English[Refereed]
International conference proceedings
Recently, it has become possible to store and analyze big data with high-performance computers. In this paper, we present a method of optimal pattern mining from soybean cultivation data for knowledge discovery by introducing an evaluation function based on differences in the frequency of high-yields and low-yields. We can discover factors affecting the growth of soybeans by analyzing optimal patterns extracted using evaluation functions. In our proposed method, optimal patterns are enumerated by eliminating elements that decrease the value of evaluation functions from frequent closed patterns. Our experiment showed the efficiency of the proposed method. In addition, we can observe both general and new knowledge by analyzing extracted optimal pattern groups.
Association for Computing Machinery, 06 Dec. 2016, ACM International Conference Proceeding Series, 19 - 24, English[Refereed]
International conference proceedings
In this paper, we propose a new incremental learning algorithm for a radial basis function (RBF) network to accelerate learning for large-scale data sequences. Along with the development of the internet and sensor technologies, time series of large data chunks are continuously generated in our daily life. Thus it is usually difficult to learn all the data within a short period. A remedy for this is to select only essential data from a given data chunk and provide them to a classifier model to learn. In the proposed method, only data in untrained regions, which correspond to regions with a low output margin, are selected. The regions are formed by grouping the data based on their near neighbors using locality sensitive hashing (LSH), which was developed to search for neighbors quickly in an approximate way. As the proposed method does not use all training data to calculate the output margins, the time for data selection is expected to be shortened. In the incremental learning phase, in order to suppress catastrophic forgetting, we also exploit LSH to select neighbor RBF units quickly. In addition, we propose a method to update the hash table in LSH so that the data selection can be adaptive during learning. From the performance on nine datasets, we confirm that the proposed method can learn large-scale data sequences fast without sacrificing classification accuracy. This fact implies that the data selection and the incremental learning work effectively in the proposed method.
SPRINGER HEIDELBERG, Sep. 2016, EVOLVING SYSTEMS, 7 (3), 173 - 186, English[Refereed]
Scientific journal
In this paper, we propose a new incremental learning algorithm for a radial basis function (RBF) network to accelerate learning for large-scale data sequences. Along with the development of the internet and sensor technologies, time series of large data chunks are continuously generated in our daily life. Thus it is usually difficult to learn all the data within a short period. A remedy for this is to select only essential data from a given data chunk and provide them to a classifier model to learn. In the proposed method, only data in untrained regions, which correspond to regions with a low output margin, are selected. The regions are formed by grouping the data based on their near neighbors using locality sensitive hashing (LSH), which was developed to search for neighbors quickly in an approximate way. As the proposed method does not use all training data to calculate the output margins, the time for data selection is expected to be shortened. In the incremental learning phase, in order to suppress catastrophic forgetting, we also exploit LSH to select neighbor RBF units quickly. In addition, we propose a method to update the hash table in LSH so that the data selection can be adaptive during learning. From the performance on nine datasets, we confirm that the proposed method can learn large-scale data sequences fast without sacrificing classification accuracy. This fact implies that the data selection and the incremental learning work effectively in the proposed method.
Springer Verlag, 01 Sep. 2016, Evolving Systems, 7 (3), 173 - 186, English[Refereed]
Scientific journal
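As a rough illustration of the LSH-based grouping mentioned in the abstract above, the following minimal sketch buckets data points with random-hyperplane (sign) hashing. The abstract does not state which hash family is used, so the hash family and all names here (`make_hyperplanes`, `hash_key`, `build_table`) are assumptions chosen for illustration only.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_key(x, hyperplanes):
    """Return one bit per hyperplane: 1 if x lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(w * v for w, v in zip(plane, x)) >= 0 else 0
        for plane in hyperplanes
    )


def build_table(data, hyperplanes):
    """Group data indices by hash key; points sharing a bucket are candidate neighbors."""
    table = defaultdict(list)
    for i, x in enumerate(data):
        table[hash_key(x, hyperplanes)].append(i)
    return table


planes = make_hyperplanes(n_bits=8, dim=3)
data = [[1.0, 0.2, 0.1], [2.0, 0.4, 0.2], [-1.0, -0.2, -0.1]]
table = build_table(data, planes)
for key, members in table.items():
    print(key, members)
```

Points that fall into the same bucket can be treated as one region, so a neighbor lookup only scans a single bucket instead of the whole data set. In this toy input, the first two vectors point in the same direction and share a bucket, while the third points the opposite way and lands elsewhere.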
It is useful for many applications to discover meaningful topics from short texts, such as tweets and comments on websites. Since directly applying conventional topic models (e.g., LDA) to short texts often produces poor results, a biterm topic model (BTM) was recently proposed as a general approach to short texts. However, the original BTM implementation uses collapsed Gibbs sampling (CGS) for its inference, which requires many iterations over the entire dataset. On the other hand, many fast inference algorithms have been proposed for LDA over the past decade. Among them, the recently proposed stochastic collapsed variational Bayesian inference (SCVB0) is promising because it is applicable to an online setting and takes advantage of the collapsed representation, which results in an improved variational bound. Applying the idea of SCVB0, we develop a fast one-pass inference algorithm for BTM, which can be used to analyze large-scale general short texts and is extensible to an online setting. To evaluate the performance of the proposed algorithm, we conducted several experiments using short texts on Twitter. Experimental results showed that our algorithm found meaningful topics significantly faster than the original algorithm.
Corresponding, IEEE, Jul. 2016, 2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 60th, 3364 - 3370, English[Refereed]
International conference proceedings
Kernel principal component analysis (KPCA) is known as a nonlinear feature extraction method. Takeuchi et al. have proposed an incremental type of KPCA (IKPCA) that can update an eigenspace incrementally for a sequence of data. However, in IKPCA, the eigenvalue decomposition must be carried out for every single data item, even when a chunk of data is given at one time. To reduce the computational costs in learning chunk data, this paper proposes an extended IKPCA called Chunk IKPCA (CIKPCA), in which a chunk of multiple data is learned with a single eigenvalue decomposition. For a large data chunk, to further reduce computation time and memory usage, the chunk is first divided into several smaller chunks, and only useful data are selected based on the accumulation ratio. In the proposed CIKPCA, a small set of independent data is first selected from a reduced set of data so that eigenvectors in a high-dimensional feature space can be represented as a linear combination of such independent data. Then, the eigenvectors are incrementally updated by keeping only an eigenspace model that consists of a sextuplet including independent data, coefficients, eigenvalues, and mean information. The proposed CIKPCA can augment an eigen-feature space based on the accumulation ratio, which can also be updated without keeping all the past data, and the eigen-feature space is rotated by solving an eigenvalue problem once for each data chunk. The experimental results show that the learning time of the proposed CIKPCA is greatly reduced compared with KPCA and IKPCA without sacrificing recognition accuracy.
Corresponding, SPRINGER HEIDELBERG, Mar. 2016, EVOLVING SYSTEMS, 7 (1), 15 - 27, English[Refereed]
Scientific journal
In recent years, along with the popularization of SNS, incidents called flaming, in which the number of negative comments surges, have been on the increase. This is a problem for companies because flaming hurts their reputation. To minimize the damage to reputation, we propose a method that detects flaming by estimating the sentiment polarities of SNS comments. Because of characteristics unique to SNS, such as the repetition of identical comments, the polarities of words are sometimes wrongly estimated. To alleviate this problem, transfer learning is introduced. In this research, the sentiment polarities of words are trained in every domain. This makes it possible to extract the words that are domain-specific and dictate the polarity of comments. These words occur in retweets. Transfer learning is applied to the non-extracted words by averaging their occurrence probabilities in other domains. These processes keep the polarities of important words that dictate the polarity of comments and correct the wrongly estimated polarities of the other words. The experimental results show that the proposed method improves the performance of estimating the sentiment polarity of comments. Moreover, flaming can be detected without misses by monitoring the time course of the number of negative comments.
Corresponding, The Institute of Electrical Engineers of Japan, Mar. 2016, IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌 C), 136 (3), 340 - 347, Japanese[Refereed]
Scientific journal
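The averaging step described in the abstract above (domain-specific words keep their own probabilities, all other words take the average of their occurrence probabilities in the other domains) can be sketched as follows. This is only an illustration under stated assumptions: the function name `transfer_word_probs`, the dictionary layout, and the toy probabilities are hypothetical, and the sketch does not renormalize the result or reproduce the paper's word-extraction rule.

```python
def transfer_word_probs(domain_probs, target, keep_words):
    """Recompute word occurrence probabilities for the target domain.

    domain_probs: {domain name: {word: occurrence probability}} (assumed layout).
    keep_words:   domain-specific words whose target-domain values are kept as-is.
    Every other word gets the average of its probabilities in the other domains.
    """
    others = [probs for name, probs in domain_probs.items() if name != target]
    adjusted = {}
    for word, prob in domain_probs[target].items():
        if word in keep_words or not others:
            adjusted[word] = prob
        else:
            adjusted[word] = sum(p.get(word, 0.0) for p in others) / len(others)
    return adjusted


# Toy values: occurrence probabilities of words in negative comments, per company.
domain_probs = {
    "company_a": {"recall": 0.30, "great": 0.05},
    "company_b": {"recall": 0.02, "great": 0.20},
    "company_c": {"recall": 0.04, "great": 0.10},
}
adjusted = transfer_word_probs(domain_probs, "company_a", keep_words={"recall"})
print(adjusted)
```

Here "recall" is treated as domain-specific for `company_a` and is left untouched, while "great" is replaced by the mean of its values in the two other domains.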
This paper presents a fast and large-scale monitoring system for detecting one of the major cyber-attacks, Distributed Denial of Service (DDoS). The proposed system monitors the packet traffic on a subnet of unused IPs called a darknet. Almost all darknet packets originate from malicious activities. However, it is not obvious what traffic patterns DDoS attacks have. Therefore, we adopt a classifier and train it with traffic features of known DDoS attacks using 80/TCP and 53/UDP packets, which can be labeled based on the header information and payloads. The proposed system consists of two parts: pre-processing and a classifier. In the pre-processing part, darknet packets observed over 30 seconds are transformed into a feature vector that consists of 17 traffic features. As for the classifier part, we adopt a Resource Allocating Network with Locality Sensitive Hashing (RAN-LSH), in which the data to be trained are selected by using LSH and fast online learning is realized by training only on the selected data. The learning of RAN-LSH is carried out not only with the training data for 80/TCP and 53/UDP packets but also with new training data labeled by a supervisor. The performance of the proposed detection system is evaluated for 9,968 training data obtained from 80/TCP and 53/UDP packets and 5,933 test data obtained from darknet packets with other protocols and source/destination ports. The results indicate that the proposed system detects backscatter packets caused by DDoS attacks accurately and adapts to new attacks quickly.
Corresponding, IEEE, 2016, 2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016-, 2979 - 2985, English[Refereed]
International conference proceedings
One of the dimension reduction (DR) methods for data visualization, t-distributed stochastic neighbor embedding (t-SNE), has drawn increasing attention. t-SNE gives better visualization than conventional DR methods by relieving the so-called crowding problem. The crowding problem is one of the curses of dimensionality, which is caused by the discrepancy between high- and low-dimensional spaces. However, t-SNE assumes that the strength of the discrepancy is the same for all samples in all datasets regardless of the nonuniformity of distributions or the difference in dimensions, and this assumption sometimes ruins the visualization. Here we propose a new DR method, inhomogeneous t-SNE, in which the strength is estimated for each point and dataset. Experimental results show that such pointwise estimation is important for reasonable visualization and that the proposed method achieves better visualization than the original t-SNE.
SPRINGER INTERNATIONAL PUBLISHING AG, 2016, NEURAL INFORMATION PROCESSING, ICONIP 2016, PT III, 9949, 119 - 128, English[Refereed]
International conference proceedings
Recently, it has become possible to store and analyze big data with high-performance computers. In this paper, we present a method of optimal pattern mining from soybean cultivation data for knowledge discovery by introducing an evaluation function based on differences in the frequency of high-yields and low-yields. We can discover factors affecting the growth of soybeans by analyzing optimal patterns extracted using evaluation functions. In our proposed method, optimal patterns are enumerated by eliminating elements that decrease the value of evaluation functions from frequent closed patterns. Our experiment showed the efficiency of the proposed method. In addition, we can observe both general and new knowledge by analyzing extracted optimal pattern groups.
ASSOC COMPUTING MACHINERY, 2016, PROCEEDINGS OF THE WORKSHOP ON TIME SERIES ANALYTICS AND APPLICATIONS (TSAA'16), 19 - 24, English[Refereed]
International conference proceedings
In recent years, with the popularization of SNS, incidents called flaming, in which a large number of negative comments are retweeted and spread to many followers on SNS, have been increasing. Since a flaming event sometimes causes severe criticism from the public, it is becoming a great threat to companies, and it is therefore important for companies to protect their reputation from such flaming events. In order to protect companies from serious damage to their reputation, we propose a machine learning approach to the detection of flaming events by monitoring the sentiment polarity of SNS comments. Because of the nature of SNS comments, such as the spread of a large number of retweets with the same content in a short time, the word distributions are often strongly biased, which leads to poor performance in sentiment polarity prediction. To alleviate this problem, we introduce transfer learning into the conventional Naive Bayes classifier. More concretely, in the Naive Bayes classifier, the occurrence probabilities of words in a target domain are recalculated using those in other domains, where a domain corresponds to a company to be protected. The experimental results demonstrate that the proposed transfer learning contributes to the improvement in sentiment polarity prediction for SNS comments. In addition, we show that the proposed system can detect flaming events correctly by monitoring the number of negative comments.
IEEE, 2016, PROCEEDINGS OF 2016 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 1 - 7, English[Refereed]
International conference proceedings
Multi-dimensional Unfolding (MU) is a method to visualize relevance data between two sets (e.g., preference data) as a single scatter plot. Usually, in the analysis of relevance data, users are interested in which elements are strongly related to each other (e.g., how much an individual likes an item), and not in which elements are irrelevant to each other. However, the convent
ACM, 2016, ICMLC 2017 Proceedings of the 9th International Conference on Machine Learning and Computing, 60th, 4p - 252, Japanese
In this paper, we propose a new online system to detect malicious spam emails and to adapt to the changes of malicious URLs in the body of spam emails by updating the system daily. For this purpose, we develop an autonomous system that learns from double-bounce emails collected at a mail server. To adapt to new malicious campaigns, only new types of spam emails are learned by introducing an active learning scheme into a classifier model. Here, we adopt a Resource Allocating Network with Locality Sensitive Hashing (RAN-LSH) as a classifier model with data selection. In this data selection, the same or similar spam emails that have already been learned are quickly searched for in a hash table using Locality Sensitive Hashing, and such spam emails are discarded without learning. On the other hand, malicious spam emails sometimes change drastically with the arrival of a new malicious campaign. In this case, it is not appropriate to classify such spam emails as malicious or benign with a classifier. They should be analyzed by using a more reliable method such as a malware analyzer. In order to find new types of spam emails, an outlier detection mechanism is implemented in RAN-LSH. To analyze email contents, we adopt the Bag-of-Words (BoW) approach and generate feature vectors whose attributes are transformed based on the normalized term frequency-inverse document frequency. To evaluate the developed system, we use a dataset of double-bounce spam emails collected from March 1st, 2013 to August 29th, 2013. In the experiment, we study the effect of introducing the outlier detection in RAN-LSH. As a result, we confirm that introducing the outlier detection enhances the detection accuracy on average over the testing period.
Institute of Electrical and Electronics Engineers Inc., 28 Sep. 2015, Proceedings of the International Joint Conference on Neural Networks, 2015-, 1 - 7, English[Refereed]
International conference proceedings
This paper presents a non-destructive image sensing method to estimate the height of agricultural plants. In this method, several images are taken by moving a digital camera attached to a single-axis robot, and the two consecutive images with a plant tip are automatically matched using SIFT keypoints. Then, the plant height is estimated from the two images based on the triangul
Sep. 2015, Proc. of Int. Symposium on Applied Electromagnetics and Mechanics, 1 - 2, English[Refereed]
International conference proceedings
In this paper, we propose a new online system to detect malicious spam emails and to adapt to the changes of malicious URLs in the body of spam emails by updating the system daily. For this purpose, we develop an autonomous system that learns from double-bounce emails collected at a mail server. To adapt to new malicious campaigns, only new types of spam emails are learned by introducing an active learning scheme into a classifier model. Here, we adopt a Resource Allocating Network with Locality Sensitive Hashing (RAN-LSH) as a classifier model with data selection. In this data selection, the same or similar spam emails that have already been learned are quickly searched for in a hash table using Locality Sensitive Hashing, and such spam emails are discarded without learning. On the other hand, malicious spam emails sometimes change drastically with the arrival of a new malicious campaign. In this case, it is not appropriate to classify such spam emails as malicious or benign with a classifier. They should be analyzed by using a more reliable method such as a malware analyzer. In order to find new types of spam emails, an outlier detection mechanism is implemented in RAN-LSH. To analyze email contents, we adopt the Bag-of-Words (BoW) approach and generate feature vectors whose attributes are transformed based on the normalized term frequency-inverse document frequency. To evaluate the developed system, we use a dataset of double-bounce spam emails collected from March 1st, 2013 to August 29th, 2013. In the experiment, we study the effect of introducing the outlier detection in RAN-LSH. As a result, we confirm that introducing the outlier detection enhances the detection accuracy on average over the testing period.
IEEE, 2015, 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), English[Refereed]
International conference proceedings
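The Bag-of-Words feature vectors with normalized term frequency-inverse document frequency mentioned in the abstract above can be illustrated with a minimal sketch. The exact weighting and normalization used in the paper are not given here, so this assumes the common form tf × log(N/df) followed by L2 normalization; the function name `tfidf_vectors` and the sample texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into L2-normalized TF-IDF vectors.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    Returns the sorted vocabulary and one vector per document.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each word once per document
    vocab = sorted(df)
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        length = max(len(tokens), 1)  # guard against empty documents
        vec = [tf[t] / length * math.log(n_docs / df[t]) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm for v in vec] if norm > 0 else vec)
    return vocab, vectors


docs = [
    "click this link to win".split(),
    "click here to reset password".split(),
    "meeting agenda for monday".split(),
]
vocab, vectors = tfidf_vectors(docs)
for word, weight in zip(vocab, vectors[0]):
    if weight > 0:
        print(f"{word}: {weight:.3f}")
```

Words that appear in fewer documents receive larger weights, and a word that does not occur in an email contributes zero to that email's vector.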
In this paper, we propose a new online system that can quickly detect malicious spam emails and adapt to the changes in the email contents and the Uniform Resource Locator (URL) links leading to malicious websites by updating the system daily. We introduce an autonomous function for a server to generate training examples, in which double-bounce emails are automatically collecte
Scientific Research Publishing, 2015, Journal of Intelligent Learning Systems and Applications, 7, 42 - 57, English[Refereed]
Scientific journal
This paper presents a machine learning approach to large-scale monitoring for malicious activities on the Internet. In the proposed system, network packets sent from a subnet to a darknet (i.e., a set of unused IPs) are collected, and they are transformed into 27-dimensional TAP (Traffic Analysis Profile) feature vectors. Then, hierarchical clustering is performed to obtain clusters for typical malicious behaviors. In the monitoring phase, the malicious activities in a subnet are estimated from the closest TAP feature cluster. Then, such TAP feature clusters for all subnets are visualized on the proposed monitoring system in real time. In the experiment, we use a big data set of 303,733,994 darknet packets collected from February 1st to February 28th, 2014 (28 days) for monitoring. As a result, we successfully detect an indication of the pandemic of a new malware that attacked a vulnerability of Synology NAS (port 5000/TCP).
ELSEVIER SCIENCE BV, 2015, INNS CONFERENCE ON BIG DATA 2015 PROGRAM, 53, 175 - 182, English[Refereed]
International conference proceedings
This paper presents an adaptive large-scale monitoring system to detect Distributed Denial of Service (DDoS) attacks whose backscatter packets are observed on the darknet (i.e., unused IP space). To classify DDoS backscatter, 17 features of darknet traffic are defined from IPs/ports information for source and destination hosts. To adapt to the change of DDoS attacks, we newly implement an online learning function in the proposed monitoring system, where an SVM classifier is continuously trained with darknet features transformed from packets during a certain period. In the performance evaluation, we use the MWS Dataset 2014 that consists of darknet packets collected from 1st January 2014 to 28th February 2014 (8 weeks). We demonstrate that the proposed system keeps good test performance in the detection of DDoS backscatter (0.98 in F-measure).
SPRINGER INTERNATIONAL PUBLISHING AG, 2015, NEURAL INFORMATION PROCESSING, ICONIP 2015, PT IV, 9492, 376 - 383, English[Refereed]
International conference proceedings
In this paper, we propose a new online non-linear feature extraction method, called incremental two-dimensional kernel principal component analysis (I2DKPCA), not only to reduce the computational cost but also to provide good feature representation. Batch-type feature extraction methods such as principal component analysis (PCA) and two-dimensional PCA (2DPCA) require more computational time and memory usage, as they collect the entire training data to extract the basis vectors. Also, these linear feature extraction methods cannot effectively represent the non-linear distribution of input data. By adopting a non-linear kernel approach with a chunk concept, KPCA and 2DKPCA can effectively address the non-linear feature representation problem by adaptively changing the feature spaces. However, this kernel approach requires more computational time for processing images with high-dimensional input data. In order to solve these problems, we combine 2DKPCA with incremental learning for (1) solving the non-linear problem and (2) reducing the memory usage and computational time. In order to evaluate the performance of I2DKPCA, several experiments have been performed using well-known face and object image databases.
ELSEVIER, Jun. 2014, NEUROCOMPUTING, 134, 280 - 288, English[Refereed]
Scientific journal
In real life, data are not always generated under stationary environments. However, traditional learning systems have normally assumed that the properties of data streams are stationary over time, and this sometimes leads to degradation in system performance when there are hidden context changes (e.g. changes in class boundaries and temporal trends in time series). S
Institute of Systems, Control and Information Engineers, Apr. 2014, Trans. of Institute of Systems, Control and Information Engineers, 27 (4), 133 - 140, English[Refereed][Invited]
Scientific journal
In this paper, we propose an incremental neural network model for a general class of sequential multi-task classification problems in which the training data of a task may have not only multiple class labels but also task information. This data property originates from the uncertainty of the teaching signals given by a supervisor. To handle this type of classification problem, the proposed model consists of a three-layer feedforward neural network with long-term/short-term memories, and it has the following functions: one-pass incremental learning, task allocation, handling of multi-label data, task consolidation, and knowledge transfer. In addition to the conventional error-based task consolidation, we newly introduce two types of task consolidation functions: task consolidation based on the co-occurrence relation of class labels and task consolidation based on task information. In the experiments, we evaluate the proposed model on various kinds of data sets. The experimental results demonstrate that the proposed model has good performance in both classification and task categorization even if the task information is not always given.
IOS PRESS, 2014, SMART DIGITAL FUTURES 2014, 262, 402 - 411, English[Refereed]
International conference proceedings
Kernel Principal Component Analysis (KPCA) is a widely used feature extraction method, as it has been proven to be powerful in many areas of pattern recognition. However, the conventional KPCA must decompose a kernel matrix of all training data, which is unrealistic for data streams in real-world applications. Therefore, in this paper, we propose an online feature extraction method called Chunk Incremental Kernel Principal Component Analysis (CIKPCA) that can handle data streams in an incremental mode. In the proposed method, the training data are assumed to be given in a chunk of multiple data at one time. In CIKPCA, an eigen-feature space is updated by solving the eigenvalue decomposition once whenever a chunk of data is given. However, if the chunk size is large, the kernel matrix to be decomposed is also large, resulting in high computational time. Considering that not all the data are useful for learning the eigen-feature space, the data in a chunk are first selected based on their importance. Several benchmark data sets in the UCI Machine Learning Repository are used to evaluate the performance of the proposed method. The experimental results show that our proposed method can accelerate the learning of the eigen-feature space compared to Takeuchi et al.'s IKPCA without reducing the recognition accuracy.
IEEE, 2014, PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 3135 - 3142, English[Refereed]
International conference proceedings
In this work, we propose a method to discriminate backscatter caused by DDoS attacks from normal traffic. Since DDoS attacks are imminent threats that could cause serious economic damage to private companies and public organizations, it is quite important to detect DDoS backscatter as early as possible. To do this, 11 features of port/IP information are defined for network packets that are sent within a short time, and these features of packet traffic are classified by a Support Vector Machine (SVM). In the experiments, we use TCP packets for the evaluation because they include control flags (e.g. SYN-ACK, RST-ACK, RST, ACK) that can give label information (i.e. backscatter or non-backscatter). We confirm that the proposed method can discriminate DDoS backscatter correctly from unknown darknet TCP packets with more than 90% accuracy.
IEEE, 2014, 2014 NINTH ASIA JOINT CONFERENCE ON INFORMATION SECURITY (ASIA JCIS), 39 - 43, English[Refereed]
International conference proceedings
In this work, we propose a method to quickly discriminate DDoS backscatter packets from those of other traffic observed by darknet sensors (i.e., backscatter or non-backscatter). For the packets that are sent by a host towards the monitored darknet during a short time window, we define 12 descriptive features, which are then input to an SVM classifier for classification. In the experiments, we use TCP packets sent from port 80 and UDP packets sent from port 53 as the training and testing data, because they are easy to label based on domain knowledge. Experiments showed promising results on these two ports.
The Institute of Electronics, Information and Communication Engineers, 2014, IEICE Technical Report (電子情報通信学会技術研究報告), 114 (340(ICSS2014 51-62)), 49 - 53, Japanese
SNS is one of the most effective communication tools and it has brought about drastic changes in our lives. Recently, however, a phenomenon called flaming or backlash has become an imminent problem for private companies. A flaming incident is usually triggered by thoughtless comments/actions on SNS, and it sometimes ends up seriously damaging the company's reputation. In this paper, in order to prevent such unexpected damage to the company's reputation, we propose a new approach to sentiment analysis using a Naive Bayes classifier, in which the features of tweets/comments are selected based on entropy-based criteria and an empirical rule to capture negative expressions. In addition, we propose a semi-supervised learning approach to relabeling noisy training data, which come from various SNS media such as Twitter, Facebook, blogs and a Japanese textboard called '2-channel'. In the experiments, we use four data sets of users' comments, which were posted to different SNS media of private companies. The experimental results show that the proposed Naive Bayes classifier model has good performance for different SNS media, and that semi-supervised learning works effectively for data consisting of long comments. In addition, the proposed method is applied to detect flaming incidents, and we show that they are successfully detected.
IEEE, 2014, 2014 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIG DATA (CIBD), 20 - 25, English[Refereed]
International conference proceedings
Malicious spam is one of the major problems of the Internet nowadays. It brings financial damage to companies and security threats to governments and organizations. Most recent spam emails contain URLs that redirect spam receivers to malicious Web servers. In this paper, we propose an online machine-learning-based malicious spam email detection system. Each spam email is represented by a term-weighting scheme, and the resulting feature vectors are used as the input of the classifier. The learning is periodically performed to update the classifier so that the system can adapt to spam emails whose contents change from time to time. A real data set is labeled by the SPIKE system, which is developed by NICT. Evaluation experiments show that the detection system identifies malicious spam emails efficiently and accurately.
SPRINGER-VERLAG BERLIN, 2014, NEURAL INFORMATION PROCESSING, ICONIP 2014, PT III, 8836, 365 - 372, English[Refereed]
International conference proceedings
This paper proposes a new online feature extraction method called the Incremental Recursive Fisher Linear Discriminant (IRFLD), whose batch learning algorithm, referred to as RFLD, was proposed by Xiang and colleagues. In the conventional Linear Discriminant Analysis (LDA), the number of discriminant vectors is limited to the number of classes minus one due to the rank of the between-class covariance matrix. However, RFLD and the proposed IRFLD can break this limit; that is, an arbitrary number of discriminant vectors can be obtained. In the proposed IRFLD, the Incremental Linear Discriminant Analysis (ILDA) of Pang and colleagues is extended in such a way that effective discriminant vectors are recursively searched for in the complementary space of a conventional discriminant subspace. In addition, to estimate a suitable number of effective discriminant vectors, the classification accuracy is evaluated using the cross-validation method in an online manner. For this purpose, validation data are obtained by performing k-means clustering on incoming training data and the previous validation data. The performance of IRFLD is evaluated for 16 benchmark data sets. The experimental results show that the final classification accuracies of IRFLD are always better than those of ILDA. We also show that this performance improvement is attained by adding discriminant vectors in a complementary LDA subspace. Electron Comm Jpn, 96(4): 29-40, 2013. DOI 10.1002/ecj.10430
WILEY, Apr. 2013, ELECTRONICS AND COMMUNICATIONS IN JAPAN, 96 (4), 29 - 40, English[Refereed]
Scientific journal
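The rank limit mentioned in the abstract above can be checked numerically. The following is a minimal sketch, not the IRFLD or RFLD algorithm itself: it only builds a between-class scatter matrix for synthetic data and confirms that its rank is at most the number of classes minus one. The function name `between_class_scatter`, the synthetic Gaussian data, and the tolerance value are assumptions made for illustration.

```python
import numpy as np

def between_class_scatter(X, y):
    """Return the between-class scatter matrix of samples X with labels y."""
    overall_mean = X.mean(axis=0)
    dim = X.shape[1]
    S_b = np.zeros((dim, dim))
    for c in np.unique(y):
        X_c = X[y == c]
        diff = (X_c.mean(axis=0) - overall_mean).reshape(-1, 1)
        # Each class contributes one rank-one term weighted by its sample count.
        S_b += X_c.shape[0] * (diff @ diff.T)
    return S_b

rng = np.random.default_rng(0)
n_classes, n_per_class, dim = 3, 50, 10

# Synthetic data (an assumption for illustration): one Gaussian blob per class.
X = np.vstack([
    rng.normal(loc=5.0 * rng.normal(size=dim), size=(n_per_class, dim))
    for _ in range(n_classes)
])
y = np.repeat(np.arange(n_classes), n_per_class)

S_b = between_class_scatter(X, y)
# An absolute tolerance is used so that rounding noise is not counted as rank.
rank_S_b = np.linalg.matrix_rank(S_b, tol=1e-6)
print("input dimension:", dim)
print("rank of between-class scatter:", rank_S_b)
```

Because the weighted class-mean deviations sum to zero, the matrix is a sum of rank-one terms that span at most a subspace of dimension (number of classes minus one), which is why conventional LDA cannot return more discriminant vectors than that.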
In this paper, we propose a robust incremental principal component analysis (IPCA) for stream data that can handle missing values on an ongoing basis. In the proposed IPCA, a missing value is substituted with the value estimated from a conditional probability density function. The conditional probability density functions are incrementally updated when new data are given. In the experiments, we evaluate the performance on both artificial and real data sets through comparison with two conventional approaches to handling missing values. We first investigate the estimation errors of missing values. The experimental results demonstrate that the proposed IPCA gives lower estimation errors than the other approaches. Next, we investigate the approximation accuracy of eigenvectors. The results show that the proposed IPCA approximates eigenvectors with relatively good accuracy not only for major components but also for minor components.
IEEE, 2013, 2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 1 - 8, English[Refereed]
International conference proceedings
This paper presents a new sequential multi-task learning model with the following functions: one-pass incremental learning, task allocation, knowledge transfer, task consolidation, learning of multi-label data, and active learning. This model learns multi-label data with incomplete task information incrementally. When no task information is given, class labels are allocated to appropriate tasks based on prediction errors; thus, the task allocation sometimes fails especially at the early stage. To recover from the misallocation, the proposed model has a backup mechanism called task consolidation, which can modify the task allocation not only based on prediction errors but also based on task labels in training data (if given) and a heuristics on multi-label data. The experimental results demonstrate that the proposed model has good performance in both classification and task categorization.
SPRINGER-VERLAG BERLIN, 2013, ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2013, 8131, 162 - 169, English[Refereed]
International conference proceedings
Recently, mining knowledge from stream data such as computer access logs, commodity distribution data, sales data, and human lifelogs has been attracting much attention. As one of the techniques suitable for such an environment, active learning has been studied for a long time. In this work, we propose a fast learning technique for neural networks by introducing Locality Sensitive Hashing (LSH) and a local learning algorithm with LSH in RBF networks. © Springer-Verlag 2013.
Springer, 2013, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8226 (1), 369 - 376, English[Refereed]
International conference proceedings
In the conventional Incremental Principal Component Analysis (IPCA), an eigenvalue problem has to be solved whenever one or a small number of training data are given in sequence. Since the eigenvalue decomposition requires high computational costs in general, solving the eigenvalue problem repeatedly results in the deterioration in the real-time learning property of IPCA. Hence
THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, Jun. 2012, TECHNICAL REPORT OF IEICE, 112 (108), 1 - 6, Japanese
Symposium
Along with the development of the network technology and high-performance small devices such as surveillance cameras and smart phones, various kinds of multimodal information (texts, images, sound, etc.) are captured real-time and shared among systems through networks. Such information is given to a system as a stream of data. In a person identification system based on face recognition, for example, image frames of a face are captured by a video camera and given to the system for an identification purpose. Those face images are considered as a stream of data. Therefore, in order to identify a person more accurately under realistic environments, a high-performance feature extraction method for streaming data, which can be autonomously adapted to the change of data distributions, is solicited. In this review paper, we discuss a recent trend on online feature extraction for streaming data. There have been proposed a variety of feature extraction methods for streaming data recently. Due to the space limitation, we here focus on the incremental principal component analysis.
The Institute of Electrical Engineers of Japan, 2012, 電気学会論文誌 C, 132 (1), 6 - 13, Japanese
An incremental learning algorithm of Kernel Principal Component Analysis (KPCA) called Chunk Incremental KPCA (CIKPCA) has been proposed for online feature extraction in pattern recognition. CIKPCA can reduce the number of times the eigenvalue problem must be solved compared with the conventional incremental KPCA when a small number of data are simultaneously given as a stream of data chunks. However, our previous work suggests that the computational costs of the independent data selection in CIKPCA could dominate over those of the eigenvalue decomposition when a large chunk of data is given. To verify this, we investigate the influence of the chunk size on the learning time in CIKPCA. As a result, CIKPCA requires more learning time than IKPCA unless a large chunk of data is divided into small chunks (e.g., less than 50). © 2012 IEEE.
IEEE, 2012, 2012 IEEE Conference on Evolving and Adaptive Intelligent Systems, EAIS 2012 - Proceedings, 7 - 10, English[Refereed]
International conference proceedings
In this paper, a new approach to online feature extraction under nonstationary environments is proposed by extending Incremental Linear Discriminant Analysis (ILDA). The extended ILDA not only detects so-called "concept drifts" but also transfers the knowledge on discriminant feature spaces of past concepts to construct good feature spaces. The performance of the extended ILDA is evaluated on benchmark datasets that include sudden changes and reoccurrence in concepts.
SPRINGER-VERLAG BERLIN, 2012, NEURAL INFORMATION PROCESSING, ICONIP 2012, PT II, 7664, 640 - 647, English[Refereed]
International conference proceedings
In this work, we extend the sequential multitask learning model called Resource Allocating Network for Multi-Task Pattern Recognition (RAN-MTPR) by introducing the following new learning functions: multi-label recognition, semi-supervised task learning, and active learning. The extended RAN-MTPR can learn training data with multiple class labels, can handle a semi-supervised setting for task learning, and can actively request class labels for unsure inputs. We evaluate the performance of the extended RAN-MTPR and find that the above three functions work well to enhance the generalization performance for pattern recognition problems.
IEEE, 2012, 2012 IEEE INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING AND EPIGENETIC ROBOTICS (ICDL), 1 - 2, English[Refereed]
International conference proceedings
In this paper, we propose a new sequential multitask pattern recognition model called Resource Allocating Network for Multi-Task Learning with Metric Learning (RAN-MTLML). RAN-MTLML has the following five functions: one-pass incremental learning, task-change detection, memory/retrieval of task knowledge, reorganization of classifier, and knowledge transfer. The knowledge transfer is actualized by transferring the metrics of all source tasks to a target task based on the task relatedness. Experimental results demonstrate the effectiveness of introducing the metric learning and the knowledge transfer on metric in the proposed RAN-MTLML.
IEEE, 2012, 2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 1, 671 - 674, English[Refereed]
International conference proceedings
[Refereed]
Scientific journal
We propose a new approach to a real-time personal authentication system based on incrementally updated visual (face) and audio (voice) features of persons. The proposed system consists of real-time face detection, incremental audiovisual feature extraction, and an incremental neural classifier model with long-term memory. In the face detection part, a biologically motivated face-color preferable selective attention model first localizes face candidate regions in natural scenes, and then the Adaboost-based face detection identifies human faces from the localized face-candidate regions. The mel-frequency cepstral coefficient is used for vocal feature extraction of speakers. Moreover, incremental principal component analysis (IPCA) is used to reduce the dimensions of audiovisual features and to update them incrementally. The features extracted by IPCA are fed to the resource allocating network with long-term memory, which learns facial and vocal features incrementally and recognizes faces in real time. Experimental results show that the proposed system can enhance the test performance incrementally without serious forgetting. In addition, a multi-modal (facial and vocal) feature effectively increases the robustness of the personal authentication system in noisy environments. © 2011 Springer-Verlag.
Dec. 2011, Evolving Systems, 2 (4), 261 - 272, English[Refereed]
Scientific journal
In this paper, a novel type of radial basis function network is proposed for multitask pattern recognition. We assume that recognition tasks are switched sequentially without notice to a learner and that they are related to some extent. We further assume that training data are given to the learner one by one and are discarded after learning. To learn a recognition system incrementally in such a multitask environment, we propose Resource Allocating Network for Multi-Task Pattern Recognition (RAN-MTPR). There are five distinct functions in RAN-MTPR: one-pass incremental learning, task change detection, task categorization, knowledge consolidation, and knowledge transfer. The first three functions enable RAN-MTPR not only to acquire and accumulate knowledge of tasks stably but also to allocate classes to appropriate tasks even when task labels are not explicitly given. The fourth function enables RAN-MTPR to recover from failures in task categorization by minimizing the conflict in class allocation to tasks. The fifth function, knowledge transfer from one task to another, is realized by sharing the internal representation of a hidden layer among different tasks and by transferring class information of the most related task to a new task. The experimental results show that the recognition performance of RAN-MTPR is enhanced by introducing the two types of knowledge transfer and that the consolidation works well to reduce failures in task change detection and task categorization if the RBF width is properly set.
SPRINGER, Jun. 2011, NEURAL PROCESSING LETTERS, 33 (3), 283 - 299, English[Refereed]
Scientific journal
In this paper, we propose a new online feature extraction algorithm called Incremental Recursive Fisher Linear Discriminant (IRFLD). In the conventional Linear Discriminant Analysis (LDA), the number of discriminant vectors is limited to the number of classes minus one due to the rank of the between-class covariance matrix. However, the proposed IRFLD can remove this limitation. That is, an arbitrary number of discriminant vectors, up to the number of input dimensions, can be obtained to construct a feature space. In the proposed IRFLD, Pang et al.'s Incremental Linear Discriminant Analysis (ILDA) is extended such that effective discriminant vectors are recursively searched for in the complementary space of a conventional discriminant space. In addition, a suitable number of effective discriminant vectors is automatically determined using a cross-validation method, where several representative training data are held as validation data and are updated using k-means clustering whenever a chunk of new training data is given. The performance of IRFLD is evaluated for 5 benchmark data sets. The experimental results show that the final classification accuracies of IRFLD are always better than those of ILDA. We also reveal that this performance improvement is attained by adding discriminant vectors in a complementary discriminant space. © 2011 IEEE.
2011, IEEE SSCI 2011: Symposium Series on Computational Intelligence - EAIS 2011: 2011 IEEE Workshop on Evolving and Adaptive Intelligent Systems, 70 - 76, English[Refereed]
International conference proceedings
In this paper, we propose a new incremental two-directional two-dimensional principal component analysis (I(2D)(2)PCA) to efficiently recognize human faces. For implementing a real-time face recognition system in an embedded system, reducing the computational load as well as the memory of a feature extraction algorithm is a very important issue. The (2D)(2)PCA is faster than the conventional PCA. From the viewpoint of memory capacity, the incremental PCA is a very efficient algorithm because it adapts the eigenspace using only a new incoming sample without memorizing all of the previously trained data. In order to construct an efficient algorithm with less memory and small computational load, we propose a new feature extraction method by combining the IPCA and the (2D)(2)PCA. To evaluate the performance of the proposed I(2D)(2)PCA, a series of experiments were performed on two face image databases: the ORL and Yale face databases. The experimental results show that the proposed feature extraction method is efficient by reducing the memory while computational load is nearly similar to I(2D)(2)PCA.
IEEE, 2011, 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 1493 - 1496, English[Refereed]
International conference proceedings
This paper proposes a new online feature extraction method called Incremental Recursive Fisher Linear Discriminant (IRFLD), whose batch learning algorithm, called RFLD, has been proposed by Xiang et al. In the conventional Linear Discriminant Analysis (LDA), the number of discriminant vectors is limited to the number of classes minus one due to the rank of the between-class covariance matrix. However, RFLD and the proposed IRFLD can break this limit; that is, an arbitrary number of discriminant vectors can be obtained. In the proposed IRFLD, Pang et al.'s Incremental Linear Discriminant Analysis (ILDA) is extended such that effective discriminant vectors are recursively searched for in the complementary space of a conventional discriminant subspace. In addition, to estimate a suitable number of effective discriminant vectors, the classification accuracy is evaluated with a cross-validation method in an online manner. For this purpose, validation data are obtained by performing k-means clustering on incoming training data and previous validation data. The performance of IRFLD is evaluated for 16 benchmark data sets. The experimental results show that the final classification accuracies of IRFLD are always better than those of ILDA. We also reveal that this performance improvement is attained by adding discriminant vectors in a complementary LDA subspace.
The Institute of Electrical Engineers of Japan, 2011, 電気学会論文誌 C, 131 (7), 1368 - 1376, Japanese
In this paper, we propose an incremental 2-directional 2-dimensional linear discriminant analysis (I-(2D)(2)LDA) for multitask pattern recognition (MTPR) problems in which chunks of training data for a particular task are given sequentially and the task is switched to another related task one after another. In I-(2D)(2)LDA, a discriminant space of the current task spanned by 2 types of discriminant vectors is augmented with effective discriminant vectors that are selected from other tasks based on the class separability. We call the selective augmentation of discriminant vectors knowledge transfer of feature space. In the experiments, the proposed I-(2D)(2)LDA is evaluated for three tasks using the ORL face data set: person identification (Task 1), gender recognition (Task 2), and young-senior discrimination (Task 3). The results show that the knowledge transfer works well for Tasks 2 and 3; that is, the test performance of gender recognition and that of young-senior discrimination are enhanced.
IEEE, 2011, 2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2911 - 2916, English[Refereed]
International conference proceedings
In this paper, a new incremental learning algorithm of Kernel Principal Component Analysis (KPCA) is proposed for online feature extraction in pattern recognition problems. The proposed algorithm is derived by extending Takeuchi et al.'s Incremental KPCA (T-IKPCA), which can learn new data incrementally without keeping past training data. However, even if more than two data are given in a chunk, T-IKPCA has to learn them individually; that is, in order to update the eigen-feature space, the eigenvalue decomposition must be performed for every datum in the chunk. To alleviate this problem, we extend T-IKPCA such that eigen-feature space learning is conducted by performing the eigenvalue decomposition only once for a chunk of given data. In the proposed IKPCA, whenever a new chunk of training data is given, linearly independent data are first selected based on the cumulative proportion. Then, the eigenspace augmentation is conducted by calculating the coefficients for the selected linearly independent data, and the eigen-feature space is rotated based on the rotation matrix obtained by solving a kernel eigenvalue problem. To verify the effectiveness of the proposed IKPCA, the learning time and the accuracy of eigenvectors are evaluated using three UCI benchmark data sets. From the experimental results, we confirm that the proposed IKPCA can learn an eigen-feature space very fast without sacrificing the recognition accuracy.
IEEE, 2011, 2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2881 - 2888, English[Refereed]
International conference proceedings
In this paper, we extend the sequential multitask learning model called Resource Allocating Network for Multi-Task Pattern Recognition (RAN-MTPR) proposed by Nishikawa et al. such that it can learn a training sample with multiple class labels that originate from different classification tasks. Here, we assume that no task information is given for training samples. Therefore, the extended RAN-MTPR has to allocate multiple class labels to appropriate tasks under unsupervised settings. This is carried out based on the prediction errors in the output sections, and the most probable task is selected from the output section with the minimum error. Through computer simulations using the ORL face dataset, we show that the extended RAN-MTPR works well as a multitask learning model. © 2011 IEEE.
2011, Proceedings - 10th International Conference on Machine Learning and Applications, ICMLA 2011, 2, 35 - 40, English[Refereed]
International conference proceedings
[Refereed]
Scientific journal
[Refereed]
Scientific journal
In this paper, we propose a new incremental linear discriminant analysis (ILDA) for multitask pattern recognition (MTPR) problems in which chunks of training data for a particular task are given sequentially and the task is switched to another related task one after another. Pang et al.'s ILDA is extended such that a discriminant space of the current task is augmented with effective discriminant vectors that are selected from other tasks based on the class separability. We call this selective augmentation of discriminant vectors knowledge transfer of feature space. In the experiments, the proposed ILDA is evaluated for seven MTPR problems, each of which consists of three recognition tasks. The results demonstrate that the proposed ILDA with knowledge transfer outperforms the conventional ILDA and its naive extension to MTPR problems with regard to both class separability and recognition accuracy. We confirm that the proposed knowledge transfer works well to evolve effective feature spaces online in MTPR problems. © Springer-Verlag 2010.
Aug. 2010, Evolving Systems, 1 (1), 17 - 27, English[Refereed]
Scientific journal
This paper presents a new approach to reinforcement learning in which an optimal action policy is learned not only for primitive actions but also for deterministic state-action sequences called macro-actions. To control the exploration and exploitation of macro-actions, the temperature parameter defined by the state values and the frequency of visiting states are added to representative state-action pairs called memory items, which are stored in the long-term memory of the proposed Actor-Critic neural model. In the proposed model, no explicit form of macro-actions is defined. A macro-action is defined as a sequence of memory items with low temperature. By applying the softmax action selection to each of the memory items, an agent takes a series of actions in a deterministic way, resulting in the exploitation of a macro-action. The experimental results demonstrate that the proposed model can learn considerably faster than the conventional Actor-Critic neural models in which no macro-action is introduced.
ICIC INTERNATIONAL, Feb. 2010, INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 6 (2), 577 - 590, English[Refereed]
Scientific journal
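The link between a low temperature and near-deterministic behaviour in the abstract above can be illustrated with a generic softmax (Boltzmann) action selection. This is a minimal sketch under stated assumptions, not the Actor-Critic model of the paper: the function name `softmax_policy`, the preference values, and the two temperature settings are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax_policy(preferences, temperature):
    """Return softmax (Boltzmann) action probabilities for a given temperature."""
    z = np.asarray(preferences, dtype=float) / temperature
    z -= z.max()  # subtract the maximum for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical action preferences for one state (not taken from the paper).
prefs = [1.0, 2.0, 1.5]

p_high = softmax_policy(prefs, temperature=10.0)  # high temperature: exploratory
p_low = softmax_policy(prefs, temperature=0.01)   # low temperature: near-greedy

print("high temperature:", np.round(p_high, 3))
print("low temperature: ", np.round(p_low, 3))
```

With a high temperature the probabilities stay close to uniform, so the agent explores; with a low temperature almost all probability mass falls on the preferred action, so a chain of low-temperature items is followed almost deterministically.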
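In this paper, we propose an incremental kernel principal component analysis (IKPCA) that can extract nonlinear features online from streaming data in a "semi-supervised learning task" in which supervisory information is given only for the initial data. In the proposed IKPCA, the eigenvalue problem of kernel principal component analysis is updated every time training data are input, and the eigenvectors are updated by solving it. Because the eigenvectors are represented by data selected as linearly independent in the eigen-feature space, there is no need to judge the linear independence of the data, the number of data retained during incremental learning is reduced, and learning is accelerated. In evaluation experiments using benchmark data, we evaluated the features obtained by IKPCA in comparison with principal component analysis (PCA), incremental principal component analysis (IPCA), and kernel principal component analysis (KPCA). The results show that IKPCA achieves recognition performance equivalent to that of batch-learning KPCA and that incremental learning is performed stably. This was also confirmed by experiments that examined the degree of agreement between the eigenvectors and eigenvalues of IKPCA and those of KPCA. We also show that, for many of the evaluation data sets, IKPCA yields features with better recognition performance than PCA and IPCA.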
The Institute of Electronics, Information and Communication Engineers, 2010, The IEICE transactions on information and systems, J93-D (6), 826 - 836, Japanese
In this paper, we propose a new autonomous incremental learning algorithm for radial basis function networks called Autonomous Learning algorithm for Resource Allocating Network (AL-RAN). The proposed AL-RAN can carry out the following operations autonomously: (1) data collection for initial learning, (2) data normalization, (3) allocation of RBFs, (4) setting and adjusting RBF widths, and (5) incremental learning. In this paper, we mainly improve the first four functions in the initial learning phase, where a convergence criterion based on the class separability of collected data is adopted in order to reduce the computational costs. In AL-RAN, training data are first collected until the class separability has converged or the recognition accuracies for normalized and unnormalized data differ significantly. Then, an initial structure of AL-RAN is autonomously determined from the collected data, and AL-RAN is trained with them. After the initial learning, the incremental learning of AL-RAN is conducted whenever new training data are given. In the experiments, we evaluate AL-RAN using five benchmark datasets. The experimental results demonstrate that the above autonomous functions work well and that the number of collected data in the proposed AL-RAN is significantly decreased without sacrificing the final recognition accuracy as compared with the previous version of AL-RAN.
IEEE, 2010, 2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, pp. 706-71, English[Refereed]
International conference proceedings
To avoid the catastrophic interference in incremental learning, we have proposed Resource Allocating Network with Long Term Memory (RAN-LTM). In RAN-LTM, not only new training data but also some memory items stored in long-term memory are trained either by a gradient descent algorithm or by solving a linear regression problem. In the latter approach, radial basis function (RBF) centers are not trained but selected based on output errors when connection weights are updated. The proposed incremental learning algorithm belongs to the latter approach where the errors not only for a training data but also for several retrieved memory items and pseudo training data are minimized to suppress the catastrophic interference. The novelty of the proposed algorithm is that connection weights to be learned are restricted based on RBF activation in order to improve the efficiency in learning time and memory size. We evaluate the performance of the proposed algorithm in one-dimensional and multi-dimensional function approximation problems in terms of approximation accuracy, learning time, and average memory size. The experimental results demonstrate that the proposed algorithm can learn fast and have good performance with less memory size compared to memory-based learning methods.
The Institute of Electrical Engineers of Japan, 2010, 電気学会論文誌 C, 130 (9), 1667 - 1673, Japanese
We propose a new approach for a real-time personal authentication system, which consists of a selective face attention model, incremental feature extraction, and an incremental neural classifier model with long-term memory. In this paper, a face-color preferable selective attention combined with the Adaboost algorithm is used to detect human faces, and incremental principal component analysis (IPCA) and resource allocating network with long-term memory (RAN-LTM) are effectively combined to implement real-time personal authentication systems. The biologically motivated face-color preferable selective attention model localizes face candidate regions in a natural scene, and then the Adaboost-based face detection process identifies human faces from the localized face-candidate regions. IPCA updates an eigen-space incrementally by rotating eigen-axes and adaptively increasing the eigen-space dimensions. The features extracted by projecting inputs to the eigen-space are given to RAN-LTM, which learns facial features incrementally without unexpected forgetting and recognizes faces in real time. The experimental results show that the proposed model successfully recognizes 200 human faces through incremental learning without serious forgetting.
SPRINGER-VERLAG BERLIN, 2010, PRICAI 2010: TRENDS IN ARTIFICIAL INTELLIGENCE, 6230, 445 - +, English[Refereed]
International conference proceedings
In this paper, we present a modified version of incremental Kernel Principal Component Analysis (IKPCA), which was originally proposed by Takeuchi et al. as an online nonlinear feature extraction method. The proposed IKPCA learns a high-dimensional feature space incrementally by solving an eigenvalue problem whose matrix size is given by the power of the number of independent data. In the proposed IKPCA, independent data are used for calculating eigenvectors in a feature space, but they are selected in a low-dimensional eigen-feature space. Hence, the size of an eigenvalue problem is usually small, and this allows IKPCA to learn eigen-feature spaces very fast even though the eigenvalue decomposition has to be carried out at every learning stage. The proposed IKPCA consists of two learning phases: an initial learning phase and an incremental learning phase. In the former, some parameters are optimized and an initial eigen-feature space is computed by applying the conventional KPCA. In the latter, the eigen-feature space is incrementally updated whenever new data are given. In the experiments, we evaluate the learning time and the approximation accuracies of eigenvectors and eigenvalues. The experimental results demonstrate that the proposed IKPCA learns eigen-feature spaces very fast with good approximation accuracy.
SPRINGER-VERLAG BERLIN, 2010, PRICAI 2010: TRENDS IN ARTIFICIAL INTELLIGENCE, 6230, 487 - 497, English[Refereed]
International conference proceedings
When environments are dynamically changed for agents, the knowledge acquired in an environment might be useless in future. In such dynamic environments, agents should be able to not only acquire new knowledge but also modify old knowledge in learning. However, modifying all knowledge acquired before is not efficient because the knowledge once acquired may be useful again when similar environment reappears and some knowledge can be shared among different environments. To learn efficiently in such environments, we propose a neural network model that consists of the following modules: resource allocating network, long-term & short-term memory, and environment change detector. We evaluate the model under a class of dynamic environments where multiple function approximation tasks are sequentially given. The experimental results demonstrate that the proposed model possesses stable incremental learning, accurate environmental change detection, proper association and recall of old knowledge, and efficient knowledge transfer.
The Institute of Electrical Engineers of Japan, 2010, 電気学会論文誌 C, 130 (1), 21 - 28, Japanese
This paper presents a new learning algorithm for multitask pattern recognition (MTPR) problems. We consider learning multiple multiclass classification tasks online where no information is ever provided about the task category of a training example. The algorithm thus needs an automated task recognition capability to properly learn the different classification tasks. The learning mode is "online" where training examples for different tasks are mixed in a random fashion and given sequentially one after another. We assume that the classification tasks are related to each other and that both the tasks and their training examples appear at random during "online training." Thus, the learning algorithm has to continually switch from learning one task to another whenever the training examples change to a different task. This also implies that the learning algorithm has to detect task changes automatically and utilize knowledge of previous tasks for learning new tasks fast. The performance of the algorithm is evaluated for ten MTPR problems using five University of California at Irvine (UCI) data sets. The experiments verify that the proposed algorithm can indeed acquire and accumulate task knowledge and that the transfer of knowledge from tasks already learned enhances the speed of knowledge acquisition on new tasks and the final classification accuracy. In addition, the task categorization accuracy is greatly improved for all MTPR problems by introducing the reorganization process even if the presentation order of class training examples is fairly biased.
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, Mar. 2009, IEEE TRANSACTIONS ON NEURAL NETWORKS, 20 (3), 430 - 445, English[Refereed]
Scientific journal
A macro-action is a typical series of useful actions that brings high expected rewards to an agent. Murata et al. have proposed an Actor-Critic model which can generate macro-actions automatically based on the information on state values and visiting frequency of states. However, their model does not assume that generated macro-actions are utilized for learning different tasks. In this paper, we extend Murata et al.'s model such that generated macro-actions can help an agent learn an optimal policy quickly in multi-task Grid-World (MTGW) maze problems. The proposed model is applied to two MTGW problems, each of which consists of six different maze tasks. From the experimental results, it is concluded that the proposed model could speed up learning if macro-actions are generated in the so-called correlated regions. © 2009 The Institute of Electrical Engineers of Japan.
Institute of Electrical Engineers of Japan, 2009, IEEJ Transactions on Electronics, Information and Systems, 129 (4), 21 - 743, English[Refereed]
Scientific journal
This paper presents a novel active linear discriminant analysis (LDA) learning method in the form of curiosity-driven incremental LDA (cILDA) and multiple cILDA agents cooperative learning (mcILDA). Curiosity, in the psychological sense, is modelled mathematically here as a discriminability residue between the instance space and its corresponding eigenspace. As the learning proceeds, the curiosity of an individual agent is updated over time by two incremental learning processes: one updates the characterization of the eigenspace and the other recalculates the curiosity. In the multi-agent scenario, individual agents communicate and cooperate with each other at every learning stage to discover the discriminant characterization of the whole pattern. In the experiments, we describe how discriminative instances can be selected based on the curiosity with, at most, minor sacrifices in learning rate and classification accuracy. The experimental results show that the proposed curiosity learning performs gracefully under different levels of redundancy, and that the proposed cILDA/mcILDA learning system is capable of learning from fewer instances while more often achieving improved discrimination performance.
IEEE, 2009, IJCNN: 2009 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1- 6, pp. 2401-2408, 1616 - +, English[Refereed]
International conference proceedings
This paper presents an online feature extraction method called Incremental Recursive Fisher Linear Discriminant (IRFLD), whose batch learning algorithm, called RFLD, has been proposed by Xiang et al. In the conventional Linear Discriminant Analysis (LDA), the number of discriminant vectors is limited to the number of classes minus one due to the rank of the between-class scatter matrix. RFLD and the proposed IRFLD can eliminate this limitation. In the proposed IRFLD, Pang et al.'s Incremental Linear Discriminant Analysis (ILDA) is extended such that effective discriminant vectors are recursively searched for in the complementary space of a conventional ILDA subspace. In addition, to estimate a suitable number of effective discriminant vectors, we also propose a convergence criterion for the recursive computations which is defined by using the class separability of discriminant features projected on the complementary subspace. The experimental results suggest that the recognition accuracy of IRFLD is improved as the learning proceeds. For several datasets, we confirm that the proposed IRFLD outperforms ILDA in terms of the recognition accuracy. However, the advantage of IRFLD over ILDA depends on the dataset.
IEEE, 2009, IJCNN: 2009 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1- 6, pp. 2310-2315, 2671 - 2676, English[Refereed]
International conference proceedings
In this paper, we propose a new Chunk IPCA algorithm in which an optimal threshold of accumulation ratio is adaptively selected such that the classification accuracy is maximized for a validation data set. In order to obtain a proper set of validation data, an online clustering method called Evolving Clustering Method (ECM) is introduced into Chunk IPCA. In the proposed Chunk IPCA called CIPCA-ECM, training data are first separated into the subsets of every class; then, ECM is applied to each subset to update the validation data set. In the experiments, the evaluation of the proposed Chunk IPCA algorithm is carried out using the four UCI data sets and the effectiveness of updating the threshold is discussed. The results suggest that the incremental learning of an eigenspace in the proposed CIPCA-ECM is stably carried out, and a compact and effective eigenspace is obtained over the entire learning stages. The recognition accuracy of CIPCA-ECM is almost equal to the best performance of CIPCA-FIX in which an optimal threshold is manually predetermined.
IEEE, 2009, IJCNN: 2009 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1- 6, pp. 2394-2400, 2889 - +, English[Refereed]
International conference proceedings
We have proposed an online feature extraction method called Chunk Incremental Principal Component Analysis (Chunk IPCA) where a chunk of data is trained at a time to update an eigenspace model. In this paper, we propose an extended version of Chunk IPCA in which a proper threshold for the accumulation ratio is adaptively determined such that the highest classification accuracy is maintained for a validation data set. Whenever a new chunk of training data is given, the validation set is updated in an online fashion by using the k-means clustering or through the prototype selection based on the classification results. The experimental results show that the extended version of Chunk IPCA can determine a proper threshold on an ongoing basis, resulting in keeping higher classification accuracy than the original Chunk IPCA.
SPRINGER-VERLAG BERLIN, 2009, ADVANCES IN NEURO-INFORMATION PROCESSING, PT I, 5506, 1196 - +, English[Refereed]
International conference proceedings
In this paper, we propose a new incremental linear discriminant analysis (ILDA) for multitask pattern recognition (MTPR) problems in which training samples of a specific recognition task are given one after another for a certain period of time and the task is switched to another related task in turn. Pang et al.'s ILDA is extended such that a discriminant space of the current task is augmented with effective discriminant vectors that are selected from other related tasks based on the class separability. We call the selection and augmentation of such discriminant vectors knowledge transfer of feature subspaces. In the experiments, the proposed ILDA is evaluated for the four MTPR problems, each of which consists of three multi-class recognition tasks. The results demonstrate that the proposed ILDA outperforms the ILDA without the knowledge transfer with regard to both the class separability and recognition accuracy. From the experimental results, we confirm that the proposed knowledge transfer mechanism works well to construct effective discriminant feature spaces incrementally.
SPRINGER-VERLAG BERLIN, 2009, ADVANCES IN NEURO-INFORMATION PROCESSING, PT I, 5506, 1163 - +, English[Refereed]
International conference proceedings
In this paper, we propose a new multitask learning (MTL) model which can learn a series of multi-class pattern recognition problems stably. The knowledge transfer in the proposed MTL model is implemented by the following mechanisms: (1) transfer by sharing the internal representation of RBFs and (2) transfer of the information on class subregions from the related tasks. The proposed model can detect task changes on its own based on the output errors even though no task information is given by the environment. It also learns training samples of different tasks that are given one after another. In the experiments, the recognition performance is evaluated for the eight MTPR problems which are defined from the four UCI data sets. The experimental results demonstrate that the proposed MTL model outperforms a single-task learning model in terms of the final classification accuracy. Furthermore, we show that the transfer of class subregions contributes to enhancing the generalization performance of a new task with fewer training samples.
SPRINGER-VERLAG BERLIN, 2009, ADVANCES IN NEURO-INFORMATION PROCESSING, PT I, 5506, 821 - +, English[Refereed]
International conference proceedings
We propose two methods for tuning membership functions of a kernel fuzzy classifier based on the idea of SVM (support vector machine) training. We assume that in a kernel fuzzy classifier a fuzzy rule is defined for each class in the feature space. In the first method, we tune the slopes of the membership functions at the same time so that the margin between classes is maximized under the constraints that the degree of membership to which a data sample belongs is the maximum among all the classes. This method is similar to a linear all-at-once SVM. We call this AAO tuning. In the second method, we tune the membership function of a class one at a time. Namely, for a class the slope of the associated membership function is tuned so that the margin between the class and the remaining classes is maximized under the constraints that the degrees of membership for the data belonging to the class are large and those for the remaining data are small. This method is similar to a linear one-against-all SVM. This is called OAA tuning. According to the computer experiment for fuzzy classifiers based on kernel discriminant analysis and those with ellipsoidal regions, usually both methods improve classification performance by tuning membership functions and classification performance by AAO tuning is slightly better than that by OAA tuning. © 2009 Springer-Verlag.
2009, Memetic Computing, 1 (3), 221 - 228, English[Refereed]
Scientific journal
A macro-action is a typical series of useful actions that brings high expected rewards to an agent. Murata et al. have proposed an Actor-Critic model which can generate macro-actions automatically based on the information on state values and visiting frequency of states. However, their model has not assumed that generated macro-actions are utilized for learning different tasks. In this paper, we extend Murata's model such that generated macro-actions can help an agent learn an optimal policy quickly in multi-task Grid-World (MTGW) maze problems. The proposed model is applied to two MTGW problems, each of which consists of six different maze tasks. From the experimental results, it is concluded that the proposed model could speed up learning if macro-actions are generated in the so-called correlated regions.
IEEE, 2009, 2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 6 pages, 3088 - 3093, English[Refereed]
International conference proceedings
Selecting proper parameters of RBF networks has been a puzzling problem even for batch learning. The parameter selection is usually carried out by an external supervisor. To exclude the intervention by an external supervisor from the parameter selection, we propose a new learning scheme called Autonomous Learning algorithm for Resource Allocating Network (AL-RAN). AL-RAN is an incremental learning algorithm which consists of the following functions: automated data normalization and automated adjustment of RBF widths. In the experiments, we evaluate AL-RAN using nine benchmark datasets in terms of the decision accuracy of data normalization and the final classification accuracy. The experimental results demonstrate that the above two functions in AL-RAN work well and the final classification accuracy of AL-RAN is almost the same as that of a non-autonomous model whose parameters are manually tuned by an external supervisor.
SPRINGER-VERLAG BERLIN, 2009, INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING, PROCEEDINGS, 5788, 134 - +, English[Refereed]
International conference proceedings
To learn incrementally without catastrophic interference, we have proposed Resource Allocating Network with Long-Term Memory (RAN-LTM). In RAN-LTM, not only training data but also memory items stored in long-term memory are trained. In this paper, we propose an extended RAN-LTM called Resource Allocating Network by Local Linear Regression (RAN-LLR), in which the centers are not trained but selected based on output errors and the connections are updated by solving a linear regression problem. To reduce the computation and memory costs, the modified connections are restricted based on RBF activity. In the experiments, we first apply RAN-LLR to a one-dimensional function approximation problem to see how the negative interference is effectively suppressed. Then, the performance of RAN-LLR is evaluated for a real-world prediction problem. The experimental results demonstrate that the proposed RAN-LLR can learn fast and accurately with lower memory costs compared with the conventional models.
SPRINGER-VERLAG BERLIN, 2009, NEURAL INFORMATION PROCESSING, PT 1, PROCEEDINGS, 5863, 562 - 569, English[Refereed]
International conference proceedings
This paper presents a pattern classification system in which feature extraction and classifier learning are simultaneously carried out not only online but also in one pass where training samples are presented only once. For this purpose, we have extended incremental principal component analysis (IPCA) and some classifier models were effectively combined with it. However, there was a drawback in this approach that training samples must be learned one by one due to the limitation of IPCA. To overcome this problem, we propose another extension of IPCA called chunk IPCA in which a chunk of training samples is processed at a time. In the experiments, we evaluate the classification performance for several large-scale data sets to discuss the scalability of chunk IPCA under one-pass incremental learning environments. The experimental results suggest that chunk IPCA can reduce the training time effectively as compared with IPCA unless the number of input attributes is too large. We study the influence of the size of initial training data and the size of given chunk data on classification accuracy and learning time. We also show that chunk IPCA can obtain major eigenvectors with fairly good approximation.
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, Jun. 2008, IEEE TRANSACTIONS ON NEURAL NETWORKS, 19 (6), 1061 - 1074, English[Refereed]
Scientific journal
We have proposed an online feature extraction method called Chunk Incremental Principal Component Analysis (CIPCA) where a chunk of data is trained at a time to update an eigenspace model. This paper presents an extended version in which the threshold for accumulation ratio is adaptively determined so that the classification accuracy for validation data is always maximized. To define the validation set online, the prototypes are selected from given training samples by k-means clustering or nearest neighbor classifier. The experimental results show that the proposed CIPCA can update the threshold properly so as to maintain high classification accuracy.
IEEE, 2008, 2008 PROCEEDINGS OF SICE ANNUAL CONFERENCE, VOLS 1-7, 2370 - +, English[Refereed]
International conference proceedings
In this paper, a novel face recognition system is presented in which not only a classifier but also a feature space is learned incrementally to adapt to a chunk of incoming training samples. A distinctive feature of the proposed system is that the selection of useful features and the learning of an optimal decision boundary are conducted in an online fashion. In the proposed system, Chunk Incremental Principal Component Analysis (CIPCA) and Resource Allocating Network with Long-Term Memory are effectively combined. In the experiments, the proposed face recognition system is evaluated for a self-compiled face image database. The experimental results demonstrate that the test performance of the proposed system is consistently improved over the learning stages, and that the learning speed of a feature space is greatly enhanced by CIPCA.
SPRINGER-VERLAG BERLIN, 2008, NEURAL INFORMATION PROCESSING, PART II, 4985, 396 - +, English[Refereed]
International conference proceedings
This paper presents a learning model of multitask pattern recognition (MTPR) which is constructed from several neural classifiers, long-term memories, and a detector of task changes. In the MTPR problem, several multi-class classification tasks are sequentially given to the learning model without notifying their task categories. This implies that the learning model is supposed to detect task changes by itself and to utilize the knowledge on the previous tasks for learning of new tasks. In addition, the learning model must acquire knowledge of multiple tasks incrementally without unexpected forgetting under the condition that not only tasks but also training samples are sequentially given. The proposed model is evaluated for two artificial MTPR problems. In the experiments, we verify that the proposed model can acquire and accumulate task knowledge very stably and the speed of knowledge acquisition for new tasks is enhanced by transferring knowledge.
IEEE COMPUTER SOC, 2008, SEVENTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, pp. 747- 751, 747 - +, English[Refereed]
International conference proceedings
Independent component analysis (ICA) is a technique of transforming observation signals into their unknown independent components; hence, ICA has often been applied to blind signal separation problems. In this application, it is expected that the obtained independent components extract essential information of independent signal sources from input data in an unsupervised fashion. Based on such characteristics, ICA is currently utilized as a feature extraction method for images and sounds for recognition purposes. However, since ICA is an unsupervised learning method, the obtained independent components are not always useful in recognition. To overcome this problem, we propose a supervised approach to ICA using category information. The proposed method is implemented in a conventional three-layered neural network, but its objective function to be minimized is defined not only for the output layer but also for the hidden layer. The objective function consists of the following two terms: one evaluates the kurtosis of hidden unit outputs and the other evaluates the error between output signals and their teacher signals. The experiments are performed using several standard datasets to evaluate performance of the proposed algorithm. It is confirmed that a higher recognition accuracy is attained by the proposed method as compared with a conventional ICA algorithm. (c) 2007 Wiley Periodicals, Inc.
SCRIPTA TECHNICA-JOHN WILEY & SONS, Nov. 2007, ELECTRICAL ENGINEERING IN JAPAN, 161 (2), 25 - 32, English[Refereed]
Scientific journal
In this paper, we present a new method to enhance classification performance of a multiple classifier system by combining a boosting technique called AdaBoost.M2 and Kernel Discriminant Analysis (KDA). To reduce the dependency between classifier outputs and to speed up the learning, each classifier is trained in a different feature space, which is obtained by applying KDA to a small set of hard-to-classify training samples. The training of the system is conducted based on AdaBoost.M2, and the classifiers are implemented by Radial Basis Function networks. To perform KDA at every boosting round in a realistic time scale, a new kernel selection method based on the class separability measure is proposed. Furthermore, a new criterion of the training convergence is also proposed to acquire good classification performance with fewer boosting rounds. To evaluate the proposed method, several experiments are carried out using standard evaluation datasets. The experimental results demonstrate that the proposed method can select an optimal kernel parameter more efficiently than the conventional cross-validation method, and that the training of boosting classifiers is terminated with a fairly small number of rounds to attain good classification accuracy. For multi-class classification problems, the proposed method outperforms both Boosting Linear Discriminant Analysis (BLDA) and Radial-Basis Function Network (RBFN) with regard to the classification accuracy. On the other hand, the performance evaluation for 2-class problems shows that the advantage of the proposed BKDA over BLDA and RBFN depends on the datasets.
IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG, Nov. 2007, IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, E90D (11), 1853 - 1863, English[Refereed]
Scientific journal
In this paper, a feature extraction method for online classification problems is proposed by extending Kernel Principal Component Analysis (KPCA). In our previous work, we proposed an incremental KPCA algorithm which could learn a new input incrementally without keeping all the past training data. In this algorithm, eigenvectors are represented by a linear sum of linearly independent data which are selected from given training data. A serious drawback of the previous IKPCA is that many independent data are prone to be selected during learning and this causes large computation and memory costs. For this problem, we propose a novel approach to the selection of independent data; that is, they are not selected in the high-dimensional feature space but in the low-dimensional eigenspace spanned by the current eigenvectors. Using this method, the number of independent data is restricted to the number of eigenvectors. This restriction makes the learning of the modified IKPCA (M-IKPCA) very fast without losing the approximation accuracy against true eigenvectors. To verify the effectiveness of M-IKPCA, the learning time and the accuracy of eigenspaces are evaluated using two UCI benchmark datasets. As a result, we confirm that the learning of M-IKPCA is at least 5 times faster than the previous version of IKPCA. ©2007 IEEE.
2007, IEEE International Conference on Neural Networks - Conference Proceedings, 2346 - 2351, English[Refereed]
International conference proceedings
In this paper, a new approach to face recognition is presented in which not only a classifier but also a feature space is learned incrementally to adapt to a chunk of training samples. A benefit of this type of incremental learning is that the search for useful features and the learning of an optimal decision boundary are carried out in an online fashion. To implement this idea, Chunk Incremental Principal Component Analysis (IPCA) and Resource Allocating Network with Long-Term Memory are effectively combined. Using Chunk IPCA, a feature space is updated by rotating its eigen-axes and increasing the dimensions to adapt to a chunk of given training samples. In the experiments, the proposed incremental learning model is evaluated over a self-compiled face image database. As a result, we verify that the proposed model works well without serious forgetting and the test performance is improved as the learning stages proceed.
IEEE, 2007, PROCEEDINGS OF SICE ANNUAL CONFERENCE, VOLS 1-8, 1963 - 1966, English[Refereed]
International conference proceedings
In this paper, a feature extraction method for online classification problems is proposed by extending Kernel Principal Component Analysis (KPCA). In our previous work, we proposed an incremental KPCA algorithm which could learn a new input incrementally without keeping all the past training data. In this algorithm, eigenvectors are represented by a linear sum of linearly independent data which are selected from given training data. A serious drawback of the previous IKPCA is that many independent data are prone to be selected during learning and this causes large computation and memory costs. For this problem, we propose a novel approach to the selection of independent data; that is, they are not selected in the high-dimensional feature space but in the low-dimensional eigenspace spanned by the current eigenvectors. Using this method, the number of independent data is restricted to the number of eigenvectors. This restriction makes the learning of the modified IKPCA (M-IKPCA) very fast without losing the approximation accuracy against true eigenvectors. To verify the effectiveness of M-IKPCA, the learning time and the accuracy of eigenspaces are evaluated using two UCI benchmark datasets. As a result, we confirm that the learning of M-IKPCA is at least 5 times faster than the previous version of IKPCA.
IEEE, 2007, 2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, CD-ROM (6 pages), 2346 - 2351, English[Refereed]
International conference proceedings
This paper presents a new boosting algorithm called Boosting Kernel Discriminant Analysis (BKDA) in which the feature selection and the classifier training are conducted by Kernel Discriminant Analysis (KDA) and AdaBoost.M2, respectively. To reduce the dependency between classifier outputs and to speed up the learning, each classifier is trained in a different feature space, which is obtained by applying KDA to a small set of hard-to-classify training samples. The proposed BKDA is evaluated using standard benchmark datasets. The experimental results demonstrate that BKDA outperforms both Boosting Linear Discriminant Analysis (BLDA) and Support Vector Machine (SVM) for multi-class classification problems. On the other hand, the performance evaluation for 2-class problems shows that the advantage of the proposed BKDA over BLDA and SVM depends on the datasets.
IEEE, 2007, 2007 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS, VOLS 1 AND 2, CD-ROM (4 pages), 674 - +, English[Refereed]
International conference proceedings
Independent component analysis (ICA) is a technique of transforming observation signals into their unknown independent components; hence ICA has been often applied to blind signal separation problems. In this application, it is expected that the obtained independent components extract essential information of independent signal sources from input data in an unsupervised fashion. Based on such characteristics, ICA is recently utilized as a feature extraction method for images and sounds for recognition purposes. However, since ICA is an unsupervised learning, the obtained independent components are not always useful in recognition. To overcome this problem, we propose a supervised approach to ICA using category information. The proposed method is implemented in a conventional three-layered neural network, but its objective function to be minimized is defined for not only the output layer but also the hidden layer. The objective function consists of the following two terms: one evaluates the kurtosis of hidden unit outputs and the other evaluates the error between output signals and their teacher signals. The experiments are performed for some standard datasets to evaluate the proposed algorithm. It is verified that higher recognition accuracy is attained by the proposed method as compared with a conventional ICA algorithm.
The Institute of Electrical Engineers of Japan, 01 Apr. 2006, IEEJ Transactions on Electronics, Information and Systems, 126 (4), 542 - 547, Japanese
A new concept for pattern classification systems is proposed in which the feature selection and the learning classifier are simultaneously carried out on-line. An advantage of this concept is that classification systems can improve their performance constantly even if insufficient training samples are given when the learning starts, often resulting in inappropriate feature selection and poor classifier performance. To implement this concept, we propose an adaptive evolving connectionist model in which Incremental Principal Component Analysis and Evolving Clustering Method are effectively combined. The proposed on-line learning scheme has two major desirable properties. First, the performance is improved as the learning proceeds and it converges to an acceptable level from any initial conditions. Second, the learning is sequentially carried out without retaining all the training data given so far; thus, the learning is conducted efficiently in terms of the computation and memory costs. To evaluate the proposed model, the recognition performance is investigated using three standard datasets in the UCI machine learning repository. From the experimental results, we verify that the proposed scheme possesses the above two characteristics.
ICIC INTERNATIONAL, Feb. 2006, INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2 (1), 181 - 192, English[Refereed]
Scientific journal
In this paper, a feature extraction method for online classification problems is presented by extending Kernel Principal Component Analysis (KPCA). The proposed incremental KPCA (IKPCA) constructs a nonlinear high-dimensional feature space incrementally by not only updating eigen-axes but also adding new eigen-axes. The augmentation of a new eigen-axis is carried out when the accumulation ratio falls below a threshold value. We mathematically derive the incremental update equations of eigen-axes and the accumulation ratio without keeping all training samples. From the experimental results, we conclude that the proposed IKPCA works well as an incremental learning algorithm of a feature space in the sense that a minimum number of axes are augmented to maintain a designated accumulation ratio, and that the eigenvectors with major eigenvalues can converge closely to those of the batch type of KPCA. In addition, the recognition accuracy of IKPCA is similar to or slightly better than that of KPCA.
IEEE, 2006, INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL & AUTOMATION JOINTLY WITH INTERNATIONAL CONFERENCE ON INTELLIGENT AGENTS, WEB TECHNOLOGIES & INTERNET COMMERCE, VOL 1, PROCEEDINGS, Vol. 1, pp. 595-600, 595 - +, English[Refereed]
International conference proceedings
[Refereed]
Scientific journal
This paper presents a new algorithm for dynamic feature selection that extends the Incremental Principal Component Analysis (IPCA) algorithm originally proposed by Hall and Martin. In the proposed IPCA, a chunk of training samples can be processed at a time to update the eigenspace of a classification model without keeping all the training samples given so far. Under the assumption that L training samples are given in a chunk, we first derive a new eigenproblem whose solution gives us a rotation matrix of eigen-axes; then we introduce a new algorithm of augmenting eigen-axes based on the accumulation ratio. We also derive the one-pass incremental update formula for the accumulation ratio. The experiments are carried out to verify whether the proposed IPCA works well. Our experimental results demonstrate that it works well independent of the size of data chunk, and that the eigenvectors for major components are obtained without serious approximation errors at the final learning stage. In addition, it is shown that the proposed IPCA can maintain the designated accumulation ratio by augmenting new eigen-axes properly. This property enables a learning system to construct an informative eigenspace with minimum dimensionality.
IEEE, 2006, 2006 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-5, pp. 10493-10500, 2278 - +, English[Refereed]
International conference proceedings
In this paper, we propose an incremental learning model for ensemble classifier systems. In the proposed model, the number of classifiers is predetermined and fixed during the learning, and all classifiers are updated at every learning stage based on an extended algorithm of AdaBoost.M1. A neural network model called Resource Allocating Network with Long-Term Memory (RAN-LTM), which has been developed to realize stable incremental learning, is adopted as a classifier. We also propose a new method to update the classifier weights in the weighted majority voting under the one-pass incremental learning situations. In the experiments, first we verify that the proposed model can learn incrementally without serious forgetting and that the performance is not influenced seriously by the size of a training subset given at every learning stage. Then, through a comparison with Resource Allocating Network (RAN), RAN-LTM, and AdaBoost.M1, we demonstrate that the proposed incremental ensemble classifier system has comparable performance with a batch-learning ensemble classifier system, and that it outperforms both batch-learning and incremental-learning single-classifier systems.
IEEE, 2006, 2006 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORK PROCEEDINGS, VOLS 1-10, pp. 6453-6459, 3421 - +, English[Refereed]
International conference proceedings
This paper presents a constructive method for deriving an updated discriminant eigenspace for classification when bursts of data that contain new classes are added to an initial discriminant eigenspace in the form of random chunks. Basically, we propose an incremental linear discriminant analysis (ILDA) in two forms: a sequential ILDA and a chunk ILDA. In experiments, we have tested ILDA using datasets with a small number of classes and small-dimensional features, as well as datasets with a large number of classes and large-dimensional features. We have compared the proposed ILDA against the traditional batch LDA in terms of discriminability, execution time, and memory usage as the volume of added data increases. The results show that the proposed ILDA can effectively evolve a discriminant eigenspace over a fast and large data stream, and extract features with superior discriminability in classification, when compared with other methods.
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, Oct. 2005, IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 35 (5), 905 - 914, English[Refereed]
Scientific journal
We describe an application of independent component analysis (ICA) to pattern recognition in order to evaluate the effectiveness of features extracted by ICA. We propose a recognition method suitable for independent components that consists of modules for each category. A module has two parts: feature extraction and classification. Features are independent components estimated by ICA and outputs of modules are candidates for categories. These candidates are combined and categories are decided with a majority rule. This recognition method is applied to two tasks: hand-written digits in the MNIST database and acoustic diagnosis for a compressor as real-world tasks. A FastICA algorithm is applied to extracting independent features in the proposed method. Through recognition experiments, we demonstrate that the ICA of each category extracts useful features for these tasks and the independent components are superior to the principal components in recognition accuracy.
SPRINGER, Oct. 2005, NEURAL PROCESSING LETTERS, 22 (2), 113 - 124, English[Refereed]
Scientific journal
We have proposed a new approach to pattern recognition in which not only a classifier but also a feature space of input variables is learned incrementally. In this paper, an extended version of Incremental Principal Component Analysis (IPCA) and Resource Allocating Network with Long-Term Memory (RAN-LTM) are effectively combined to implement this idea. Since IPCA updates a feature space incrementally by rotating the eigen-axes and increasing the dimensions, the inputs of a neural classifier must also change in their values and the number of input variables. To solve this problem, we derive an approximation of the update formula for memory items, which correspond to representative training samples stored in the long-term memory of RAN-LTM. With these memory items, RAN-LTM is efficiently reconstructed and retrained to adapt to the evolution of the feature space. This function is incorporated into our face recognition system. In the experiments, the proposed incremental learning model is evaluated over a self-compiled video clip of 24 persons. The experimental results show that the incremental learning of a feature space is very effective to enhance the generalization performance of a neural classifier in a realistic face recognition task. (c) 2005 Elsevier Ltd. All rights reserved.
PERGAMON-ELSEVIER SCIENCE LTD, Jun. 2005, NEURAL NETWORKS, 18 (5-6), 575 - 584, English[Refereed]
Scientific journal
Applications of independent component analysis (ICA) to feature extraction have been a topic of research interest. However, the effectiveness of pattern features extracted by conventional ICA algorithms greatly depends on datasets in general. As one of the reasons, we have pointed out that conventional ICA features are obtained by increasing only their independence even if class information is available. In this paper, we propose a supervised learning approach to ICA to extract useful and robust features. The proposed method consists of several modules, each of which is responsible for extracting features for each class and identifying the class labels using the k nearest neighbor classifier. All the module outputs are combined to identify final results based on a majority rule. We evaluate the performance of the proposed method in several recognition tasks. From these results, we confirm the effectiveness of the recognition method using independent components for each class.
The Institute of Electrical Engineers of Japan, 01 May 2005, IEEJ Transactions on Electronics, Information and Systems, 125 (5), 807 - 812, Japanese
In this paper, we present a new method to enhance classification performance based on Boosting by introducing nonlinear discriminant analysis as feature selection. To reduce the dependency between hypotheses, each hypothesis is constructed in a different feature space formed by Kernel Discriminant Analysis (KDA). Then, these hypotheses are integrated based on AdaBoost. To conduct KDA in each Boosting iteration within realistic time, a new method of kernel selection is also proposed. Several experiments are carried out for the blood cell data and thyroid data to evaluate the proposed method. The result shows that it is almost the same as the best performance of Support Vector Machine without any time-consuming parameter search.
SPRINGER-VERLAG WIEN, 2005, Adaptive and Natural Computing Algorithms, 429 - 432, English[Refereed]
International conference proceedings
This paper presents a constructive method for deriving an updated discriminant eigenspace for classification when bursts of data from new classes are added to an initial discriminant eigenspace in the form of random chunks. The proposed chunk incremental linear discriminant analysis (I-LDA) can effectively evolve a discriminant eigenspace over a fast and large data stream, and extract features with superior discriminability in classification, when compared with other methods.
SPRINGER-VERLAG BERLIN, 2005, ADVANCES IN NEURAL NETWORKS - ISNN 2005, PT 2, PROCEEDINGS, 3497, 51 - 56, English[Refereed]
Scientific journal
In this paper, a new approach to face recognition is presented in which not only a classifier but also a feature space of input variables is learned incrementally to adapt to incoming training samples. A benefit of this type of incremental learning is that the search for useful features and the learning of an optimal decision boundary are carried out in an online fashion. To implement this idea, an extended version of Incremental Principal Component Analysis (IPCA) and Resource Allocating Network with Long-Term Memory (RAN-LTM) are effectively combined. Using IPCA, a feature space is updated by rotating its eigen-axes and increasing the dimensions to adapt to a new training sample. In RAN-LTM, a small number of training samples called memory items are selected and they are utilized for retraining a classifier to realize an excellent incremental ability. To accommodate the classifier to the evolution of the feature space, we present a way to reconstruct the neural classifier without keeping all of the training samples given previously. In the experiments, the proposed incremental learning model is evaluated over a self-compiled face image database. As the result, we verify that the proposed model works well without serious forgetting and the test performance is improved as the learning stages proceed.
IEEE, 2005, Proceedings of the International Joint Conference on Neural Networks (IJCNN), 5, 3174 - 3179, English[Refereed]
International conference proceedings
It is important to detect flammable or poisonous gas leaked from the cracks in pipes of petroleum refining plants or chemical plants. We applied a novel strategy of construction of neural network to the acoustic diagnosis technique for the gas leakage. An example of the modular neural network to realize the strategy is able to adapt its structure according to the dynamic environment. Experiments were performed for an artificial gas leakage device under various experimental conditions over about 18 months in a petroleum refining plant. Experimental results showed that the proposed network could adapt the structure to changes in environments and its performance was superior to that of feed-forward networks with the re-training strategy. From these results, we confirmed the effectiveness of the modular neural network for practical use. (C) 2004 Elsevier B.V. All rights reserved.
ELSEVIER SCIENCE BV, Dec. 2004, NEUROCOMPUTING, 62, 427 - 440, English[Refereed]
Scientific journal
When training samples are given incrementally, neural networks often suffer from catastrophic interference, which results in forgetting input-output relationships acquired in the past. To avoid catastrophic interference, we have proposed Resource Allocating Network with Long-Term Memory (RAN-LTM). In RAN-LTM, not only a new training sample but also some memory items stored in long-term memory are used for training based on a gradient descent algorithm. In general, the gradient descent algorithm is slow and can easily fall into local minima. In this paper, to alleviate these problems, we introduce a linear regression approach into the learning of RAN-LTM, in which its centers are not trained but selected based on output errors in an incremental fashion. In this approach, the regression is carried out for not only a training sample and memory items but also pseudodata that are selected around the centers of hidden units based on the complexity of an approximated function. This selection reduces the total number of pseudodata at each learning step; as a result, fast incremental learning is realized in RAN-LTM. Since only memory items are stored in memory, the proposed RAN-LTM does not need much memory capacity when the incremental learning is carried out. This property is especially useful for small-scale systems. To verify these characteristics of RAN-LTM, we apply it to several function approximation problems, in which the approximation accuracy, learning time, and required memory capacity are investigated by comparison with some conventional models. Moreover, when the learning domain is extended with time, the increasing trends in learning time and required memory capacity are investigated. From the experimental results, it is verified that the proposed model can learn fast and accurately, and that it needs rather small memory capacity as long as the learning domain is not too large.
The Society of Instrument and Control Engineers, 2004, Transactions of the Society of Instrument and Control Engineers, 40 (12), 1227 - 1235, Japanese
Recently, Independent Component Analysis (ICA) has been applied not only to problems of blind signal separation but also to feature extraction of patterns. However, the effectiveness of pattern features extracted by conventional ICA algorithms depends on pattern sets; that is, how patterns are distributed in the feature space. As one of the reasons, we have pointed out that ICA features are obtained by increasing only their independence even if the class information is available. In this context, we can expect that higher-performance features can be obtained by introducing the class information into conventional ICA algorithms.
In this paper, we propose a supervised ICA (SICA) that maximizes the Mahalanobis distance between features of different classes as well as their independence. In the first experiment, two-dimensional artificial data are applied to the proposed SICA algorithm to see how well maximizing the Mahalanobis distance works in the feature extraction. As a result, we demonstrate that the proposed SICA algorithm gives good features with high separability as compared with principal component analysis and a conventional ICA. In the second experiment, the recognition performance of features extracted by the proposed SICA is evaluated using three data sets from the UCI Machine Learning Repository. From the results, we show that better recognition accuracy is obtained using our proposed SICA. Furthermore, we show that pattern features extracted by SICA are better than those extracted by only maximizing the Mahalanobis distance.
[Refereed]
International conference proceedings
When environments vary dynamically for agents, the knowledge acquired from an environment may be useless in future environments. Thus, agents should be able to not only acquire new knowledge but also modify old knowledge in learning. However, modifying all acquired knowledge is not always efficient, because knowledge once acquired may be useful again when the same (or a similar) environment reappears. Moreover, some of the knowledge can be shared among different environments. To learn efficiently in such a situation, we propose a neural network model that consists of the following four modules: resource allocating network, long-term memory, association buffer, and environmental change detector. We apply this model to a simple dynamic environment in which several target functions to be approximated are varied in turn.
IEEE, 2004, 2004 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3, PROCEEDINGS, 437 - 442, English[Refereed]
International conference proceedings
Independent Component Analysis (ICA) is a method for transforming mixed signals into independent components. ICA has so far been applied to blind signal separation problems involving sound, speech, images, and biological signals. Recently, ICA has also been applied to feature extraction for face, speech, and image recognition. Since ICA is an unsupervised learning method, the extracted independent components are not always useful for recognition purposes. In this paper, we propose a new supervised learning approach to ICA using class information to enhance the separability of features. The proposed method is implemented by a three-layered feedforward network in which target signals are given to the output units. The defined objective function is composed of the following two terms: one evaluates the independence of hidden outputs and the other evaluates errors between output signals and their targets. Simulations are performed for some datasets in the UCI repository to evaluate the effectiveness of the proposed method. With the proposed method, we obtain higher recognition accuracies as compared with a conventional unsupervised ICA algorithm.
SPRINGER-VERLAG BERLIN, 2004, NEURAL INFORMATION PROCESSING, 3316, 1052 - 1057, English[Refereed]
Scientific journal
Real membership authentication applications require machines to learn from stream data while making a decision as accurately as possible whenever authentication is needed. To achieve this, we propose a novel algorithm that authenticates membership by one-pass incremental principal component analysis (IPCA) learning. It is demonstrated that the proposed algorithm involves a useful incremental feature construction in membership authentication, and that the incremental learning system works well in that its performance converges to the performance of a batch learning system.
SPRINGER-VERLAG BERLIN, 2004, BIOMETRIC AUTHENTICATION, PROCEEDINGS, 3072, 155 - 161, English[Refereed]
Scientific journal
We have proposed a new concept for pattern classification systems in which feature selection and classifier learning are simultaneously carried out on-line. To realize this concept, Incremental Principal Component Analysis (IPCA) and Evolving Clustering Method (ECM) were effectively combined in the previous work. However, in order to construct a desirable feature space, a threshold value that determines when a new feature is added should be properly given in the original IPCA. To alleviate this problem, we can adopt the accumulation ratio as the criterion. However, in incremental situations, the accumulation ratio must be modified every time a new sample is given. Therefore, to use this ratio as a criterion, we also need to develop a one-pass update algorithm for the ratio. In this paper, we propose an improved algorithm of IPCA in which the accumulation ratio as well as the feature space can be updated online without all the past samples. To see if correct feature construction is carried out by this new IPCA algorithm, the recognition performance is evaluated for some standard datasets when ECM is adopted as a prototype learning method in a Nearest Neighbor classifier.
SPRINGER-VERLAG BERLIN, 2004, PRICAI 2004: TRENDS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 3157, 231 - 240, English[Refereed]
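To make the role of the accumulation ratio in the abstract above concrete, the following sketch computes it in the usual batch way, as the share of total variance captured by the k largest eigenvalues, and picks the smallest dimensionality that reaches a designated ratio. It assumes all eigenvalues are available at once, so it does not reproduce the one-pass update derived in the paper; the names `accumulation_ratio`, `min_dimensions`, and `theta` are hypothetical.

```python
def accumulation_ratio(eigenvalues, k):
    """Fraction of total variance captured by the k largest eigenvalues."""
    vals = sorted(eigenvalues, reverse=True)
    return sum(vals[:k]) / sum(vals)


def min_dimensions(eigenvalues, theta):
    """Smallest number of eigen-axes whose accumulation ratio reaches theta."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    running = 0.0
    for k, v in enumerate(vals, start=1):
        running += v
        if running / total >= theta:
            return k
    return len(vals)  # theta not reached (e.g. theta > 1): keep every axis


# With eigenvalues 4, 3, 2, 1 the cumulative ratios are 0.4, 0.7, 0.9, 1.0,
# so a designated ratio of 0.8 needs three eigen-axes.
k = min_dimensions([4.0, 3.0, 2.0, 1.0], theta=0.8)
```

Using a ratio rather than a raw threshold on residual norms ties the decision to add an axis to the share of variance retained, which is the motivation stated in the abstract.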
Scientific journal
It is important to detect gas leakage sounds from pipes in petroleum refining plants and chemical plants, as the gas used in these plants is often flammable or poisonous. In order to detect the leakage accurately, we should properly select a feature extraction method for sounds. The purpose of this paper is to examine whether independent component analysis (ICA) is useful as a feature extraction method. Several experiments are performed in a plant using an artificial gas leakage device under various experimental conditions. A separating matrix that separates the independent components from collected leakage sounds and background noises is trained by an ICA algorithm. Through several simulations, we find that most basis functions acquired from this training are localized in frequency. Furthermore, there are remarkable differences in amplitude of some independent components between leakage sounds and background noises. From these results, we confirm that the feature extraction using the ICA algorithm is very useful for detecting gas leakage sounds.
THE INSTITUTE OF SYSTEMS, CONTROL AND INFORMATION ENGINEERS (ISCIE), 15 Oct. 2003, Transactions of the Institute of Systems, Control and Information Engineers, 16 (10), 539 - 547, Japanese
To avoid catastrophic interference in incremental learning, we have proposed Resource Allocating Network with Long-Term Memory (RAN-LTM). In RAN-LTM, not only a new training sample but also some memory items stored in Long-Term Memory are trained based on a gradient descent algorithm. In general, the gradient descent algorithm is slow and can easily fall into local minima. To solve these problems, we propose a fast incremental learning algorithm of RAN-LTM, in which its centers are not trained but selected based on output errors. This model does not need much memory capacity and it also realizes robust incremental learning ability. To verify these characteristics of RAN-LTM, we apply it to two function approximation problems: one-dimensional function approximation and prediction of Mackey-Glass time series. From the experimental results, it is verified that the proposed RAN-LTM can learn fast and accurately without large main memory unless incremental learning is conducted over a long period of time.
24 Sep. 2003, Proceedings of the International Joint Conference on Neural Networks, 1, 102 - 107
In reinforcement learning problems, the agent learns what to do so as to maximize numerical rewards. In many cases, the agent learns its proper actions through the estimation of an action-value function. When the agent's states are continuous, the action-value function cannot be represented by a lookup table in general. A solution for this problem is that a neural network is utilized for approximating it. However, when neural networks are trained incrementally, input-output relationships that are trained formerly tend to be collapsed by given new data. This phenomenon is called “interference”. Since the rewards are incrementally given from the environment, the interference could be also serious in reinforcement learning problems. To solve this problem, we propose a memory-based reinforcement learning model that is composed of Resource Allocating Network and memory. The distinctive feature of the proposed model is that it needs quite a small main memory to execute the accurate learning of action-value functions. To examine this feature, the proposed model is applied to the two conventional problems: Random Walk Task and Extended Mountain-Car Task. In these tasks, the learning domains are temporally expanded in order to evaluate the incremental learning ability. In the simulations, we verify that the proposed model can approximate proper action-value functions with quite a small main memory as compared with the conventional approaches.
The Society of Instrument and Control Engineers, 2003, Transactions of the Society of Instrument and Control Engineers, 39 (12), 1129 - 1135, Japanese
When the environment changes dynamically for agents, knowledge acquired from an environment might be useless in future environments. Therefore, agents should not only acquire new knowledge but also modify or delete old knowledge. However, this modification and deletion are not always efficient in learning, because knowledge acquired in the past can be useful again in the future when the same environment reappears. To learn efficiently in this situation, agents should have memory to store old knowledge. In this paper, we propose an agent architecture that consists of four modules: resource allocating network (RAN), long-term memory (LTM), association buffer (A-Buffer), and environmental change detector (ECD). To evaluate the adaptability in a class of dynamic environments, we apply this model to a simple problem in which some target functions to be approximated are changed in turn.
The Institute of Systems, Control and Information Engineers, 2003, Proceedings of the Annual Conference of the Institute of Systems, Control and Information Engineers, 3 (0), 5506 - 5506, English[Refereed]
International conference proceedings
Since the training of support vector machines needs to solve the dual problem with the number of variables equal to the number of training data, the training becomes slow when the number of training data is large. To speed up training the Sequential Minimal Optimization (SMO) technique has been proposed, in which two data are optimized simultaneously. In this paper, we propose to extend SMO so that more than two data are optimized simultaneously. Namely, we select a working set including variables, solve the equality constraint for one variable included in the working set, and substitute it into the objective function. Then we solve the subproblem related to the working set by calculating the inverse of the Hessian matrix. We evaluate our method for the five benchmark data sets and show the speed-up of training over SMO.
THE INSTITUTE OF SYSTEMS, CONTROL AND INFORMATION ENGINEERS (ISCIE), 15 Nov. 2002, Transactions of the Institute of Systems, Control and Information Engineers, 15 (11), 607 - 614, Japanese
© 2002 Nanyang Technological University. When neural networks are used for approximating action-values of Reinforcement Learning (RL) agents, the "interference" caused by incremental learning can be serious. To solve this problem, in this paper, a neural network model with incremental learning ability was applied to RL problems. In this model, correctly acquired input-output relations are stored into long-term memory, and the memorized data are effectively recalled in order to suppress the interference. In order to evaluate the incremental learning ability, the proposed model was applied to two problems: Extended Random-Walk Task and Extended Mountain-Car Task. In these tasks, the working space of agents is extended as the learning proceeds. In the simulations, we certified that the proposed model could acquire proper action-values as compared with the following three approaches to the approximation of action-value functions: tile coding, a conventional neural network model and the previously proposed neural network model.
01 Jan. 2002, ICONIP 2002 - Proceedings of the 9th International Conference on Neural Information Processing: Computational Intelligence for the E-Age, 5, 2566 - 2570
In this paper, an approach to feature extraction utilizing independent component analysis (ICA) is proposed. In our approach, input patterns are transformed into feature vectors using ICA-bases that are obtained through two-layer neural network learning. A k-NN classifier is applied to these ICA feature vectors when the recognition accuracy is evaluated. Hand-written digits in MNIST database are used as target characters. Fast ICA algorithm is applied to these images in order to learn ICA-bases. In recognition experiments, we demonstrate that the ICA approach realizes a potential feature extraction method for hand-written digits. Furthermore, we show the addition of noise patterns to training data is effective for elimination of redundant basis functions.
The Institute of Electrical Engineers of Japan, 2002, IEEJ Transactions on Electronics, Information and Systems, 122 (3), 465 - 470, Japanese
When neural networks are trained incrementally, input-output relations that are trained formerly tend to be collapsed by the learning of new data. This phenomenon is often called interference. To suppress the interference efficiently, we propose an incremental learning model, in which Long-Term Memory (LTM) is introduced into Resource Allocating Network (RAN) proposed by Platt. This type of memory is utilized for storing useful training data (called LTM data) that are generated adaptively in the learning phase. When a new training datum is given, the proposed system searches several LTM data that are useful for suppressing the interference. The retrieved LTM data as well as the new training datum are trained simultaneously in RAN. In the simulations, the proposed model is applied to various incremental learning problems to evaluate the function approximation accuracy and the learning speed. From the simulation results, we certify that the proposed model can attain good approximation accuracy with small computation costs.
The Society of Instrument and Control Engineers, 2002, Transactions of the Society of Instrument and Control Engineers, 38 (9), 792 - 799, Japanese
When the distribution of given training data is biased and temporally varied, it is well known that the learning of neural networks becomes difficult in general. In Reinforcement Learning (RL) problems, such situations often arise. In this paper, an incremental learning system, which has been devised for supervised learning, is implemented as an RL agent that can acquire an action-value function properly even in the above difficult situations. The proposed RL agent is applied to an extended mountain-car task in which learning domains are temporally expanded. Through computer simulations, we demonstrate that the proposed agent can acquire a right policy in this task.
I O S PRESS, 2001, KNOWLEDGE-BASED INTELLIGENT INFORMATION ENGINEERING SYSTEMS & ALLIED TECHNOLOGIES, PTS 1 AND 2, 69, 22 - 26, English[Refereed]
International conference proceedings
When neural networks are trained incrementally, input-output relationships that are trained formerly tend to be collapsed by the learning of new training data. This phenomenon is called "interference". To suppress the interference, we have proposed an incremental learning system (called RAN-LTM), in which Long-Term Memory (LTM) is introduced into Resource Allocating Network (RAN). Since RAN-LTM needs to train not only new data but also some LTM data to suppress the interference, if many LTM data are retrieved, large computations are required. Therefore, it is important to design appropriate procedures for producing and retrieving LTM data in RAN-LTM. In this paper, these procedures in the previous version of RAN-LTM are improved. In simulations, the improved RAN-LTM is applied to the approximation of a one-dimensional function, and the approximation error and the training speed are evaluated as compared with RAN and the previous RAN-LTM.
01 Jan. 2001, Proceedings of the International Joint Conference on Neural Networks, 3, 1989 - 1994
A main problem with dynamical associative memories (DAMs) is that when memory patterns are stored, pseudo-memories (false fixed points and limit cycles) are also generated and they hinder proper association of input patterns. To overcome this problem, Hassoun proposed a heuristic method of reducing pseudo-memories. In this method, DAMs are constructed such that a zero vector called “ground state” as well as stored patterns is stabilized and sparsely activated states (sparse patterns) converge to the ground state. Such dynamical properties of neural networks can be described with linear inequalities, and connection weights of networks are obtained by solving these inequalities using the Ho-Kashyap algorithm. In this paper, we propose an extended Hassoun model in which network dynamics are modified such that dense patterns, mixture patterns and inhibition patterns also converge to the ground state. In simulations, we compare association performance of this extended Hassoun model with conventional associative memory models, and demonstrate the usefulness of our proposed model as a dynamical associative memory.
The Institute of Electrical Engineers of Japan, 2001, IEEJ Transactions on Electronics, Information and Systems, 121 (5), 899 - 905, Japanese
The detection of gas leakage sound from pipes is important in petroleum refining plants and chemical plants, as the gas used in these plants is often flammable or poisonous. In order to establish an acoustic diagnosis technique for the leakage sound, we examined the application of modular neural networks to stable detection. The modular neural network has the ability to adapt its structure according to the environment. Experiments were performed for an artificial gas leakage device under various experimental conditions to imitate long-term changes of the environment. The discrimination accuracy with the proposed network was observed to be about 93%. From the results, we confirmed the effectiveness of applying the modular neural network to the detection of the leakage sound for practical use.
The Society of Instrument and Control Engineers, 30 Sep. 2000, Transactions of the Society of Instrument and Control Engineers, 36 (9), 797 - 803, Japanese
In this paper we discuss training of three-layered neural network classifiers by solving inequalities. Namely, first we represent each class by the center of the training data belonging to the class, and determine the set of hyperplanes that separate each class (i.e., each center) into a single region. Then according to whether the center is on the positive or negative side of the hyperplane, we determine the target values of each class for the hidden neurons (i.e., hyperplanes). Since the convergence condition of the neural network classifier is now represented by the two sets of inequalities, we solve the sets successively by the Ho-Kashyap algorithm. We demonstrate the advantage of our method over the backpropagation algorithm using several benchmark data sets.
THE INSTITUTE OF SYSTEMS, CONTROL AND INFORMATION ENGINEERS (ISCIE), 15 Jun. 2000, Transactions of the Institute of Systems, Control and Information Engineers, 13 (6), 276 - 283, Japanese
In this paper we discuss training of three-layer neural network classifiers by solving inequalities. Namely, first we represent each class by the center of the training data belonging to the class, and determine the set of hyperplanes that separate each class into a single region. Then according to whether the center is on the positive or negative side of the hyperplane, we determine the target values of each class for the hidden neurons. Since the convergence condition of the neural network classifier is now represented by the two sets of inequalities, we solve the sets successively by the Ho-Kashyap algorithm. We demonstrate the advantage of our method over the BP using three benchmark data sets.
IEEE COMPUTER SOC, 2000, IJCNN 2000: PROCEEDINGS OF THE IEEE-INNS-ENNS INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOL III, 3, 555 - 560, English
International conference proceedings
In this paper, we propose an evolutionary approach to the architecture design of modular dynamical neural networks. As one such network, we adopt Cross-Coupled Hopfield Nets (CCHN), in which plural Hopfield networks are coupled to each other. The architecture of a CCHN is represented by structural parameters such as the number of modules, the number of units per module, the module connectivity, and so forth. In the proposed design method, these structural parameters are treated as the phenotype of an individual, and a suitable modular architecture is searched for through the evolution of its genetic representation (genotype) using genetic algorithms. With a simple direct coding method, the length of the genetic representation for the structural parameters can be estimated to be O(N^2), where N is the total number of units. In contrast, the length of the genetic representation proposed here is O(N). To verify the usefulness of the proposed method, we apply a CCHN to associative memories. Here, the fitness of an individual is defined so as to be larger when a CCHN has a simpler architecture as well as when the association performance is higher. The simulation results confirm that the proposed design method can find high-performance CCHNs with simple modular architectures.
The Society of Instrument and Control Engineers, 2000, Transactions of the Society of Instrument and Control Engineers, 36 (3), 298 - 305, Japanese
We describe what characteristics independent component analysis can extract from Japanese continuous speech. Speech data uttered by a female speaker were selected from the ATR database. The data were recorded at a 20 kHz sampling frequency and pre-processed with a whitening filter. The learning algorithm of the network was the information-maximization approach proposed by Bell and Sejnowski. After learning, most of the basis functions, which are the columns of the mixing matrix, were localized in both time and frequency. Furthermore, we confirmed that some basis functions extract acoustic features such as the pitch and the formants of each vowel.
The Society of Instrument and Control Engineers, 2000, Transactions of the Society of Instrument and Control Engineers, 36 (5), 456 - 458, Japanese
It has been reported that a sparse coding algorithm produces a set of basis functions that are spatially localized, oriented, and bandpass for natural images. Applying Independent Component Analysis (ICA) to natural images has been shown to give results similar to those of sparse coding. However, ICA can be applied only when the basis function matrix is non-singular and invertible. The sparse coding algorithm has no such limitation. This property allows the code to be overcomplete, that is, the number of code elements can be greater than the effective dimensionality of the input space. The purpose of this paper is to examine what characteristics of speech the sparse coding algorithm extracts from natural sounds. The speech data were the five Japanese vowels uttered by a female speaker for about 1 s. After training, most of the basis functions were localized in frequency. Some basis functions were only shifted in time and resembled each other. Comparing each basis function with the speech data showed that some basis functions responded selectively to each vowel. Frequency analysis of the basis functions showed that some of them extracted the pitch frequency and the formants of each vowel.
The Institute of Electrical Engineers of Japan, 2000, IEEJ Transactions on Electronics, Information and Systems, 120 (12), 1996 - 2002, Japanese
This paper presents a continuous-time model of Autoassociative Neural Memories (ANMs) which correspond to a modified version of pseudoinverse-type ANMs. This ANM model is derived from minimizing the energy function for a modular neural network. Through the eigendecomposition of the connection matrix, we show that the dynamical properties of the ANM are qualitatively different in the two state subspaces: a pattern-subspace and a noise-subspace. The proposed ANM has a distinctive feature in the noise-subspace dynamics. The size of basins of attraction can be varied by controlling the contribution of the noise-subspace dynamics to the whole network. The first simulation confirms this attractive feature. In the second simulation, we investigate the performance robustness of the ANM for several kinds of correlated pattern sets. These simulation results confirm the usefulness of the proposed ANM.
Kluwer Academic Publishers, 1999, Neural Processing Letters, 10 (2), 97 - 109, English
[Refereed]
Scientific journal
This work proposes an artificial modular neural network (MNN) in which every module network exchanges input/output information with the others simultaneously. It further studies the basic dynamical characteristics of this network through both computer simulations and analytical considerations. A notable feature of this model is that it has a generic representation with regard to the number of component modules, network topologies, and classes of introduced interactions. The information processing of the MNN is described as the minimization of a total-energy function that consists of partial-energy functions for modules and their interactions, and the activity and weight dynamics are derived from the total-energy function under the Lyapunov stability condition. This concept was realized by Cross-Coupled Hopfield Nets (CCHN), which one of the authors proposed. In this paper, in order to investigate the basic dynamical properties of CCHN, we offer a representative model called Cross-Coupled Hopfield Nets with Local And Global Interactions (CCHN-LAGI), into which two distinct classes of interactions, local and global, are introduced. Through a conventional test for associative memories, it is confirmed that our energy-function-based approach gives proper dynamics of CCHN-LAGI even if the networks have different modularity. We also discuss the contribution of a single interaction and the joint contribution of the two distinct interactions through the eigenvalue analysis of connection matrices.
Springer Verlag, 1998, Biological Cybernetics, 78 (1), 19 - 36, English
[Refereed]
Scientific journal
In this paper, we propose a new autoassociative memory model derived from Cross-Coupled Hopfield Nets (CCHN). The CCHN is a modular neural network in which plural Hopfield networks are mutually connected via feedforward neural networks. The CCHN's architecture is determined by the following structural parameters: the number of modules, the number of units in each module, the contributions of the module information processing and of the interactions to the information processing of the whole network, and the module connectivity. If these parameters are changed, the network dynamics also change; therefore, it may be possible to implement a great number of autoassociative memories with different natures. Through computer simulations, we discuss the diversity of association properties in the proposed model.
THE INSTITUTE OF SYSTEMS, CONTROL AND INFORMATION ENGINEERS (ISCIE), 15 Dec. 1997, Transactions of the Institute of Systems, Control and Information Engineers, 10 (12), 668 - 678, Japanese
In this paper, the association characteristics of Cross-Coupled Hopfield Nets (CCHN) proposed as a modular neural network model are discussed in an analytical way. In the CCHN, an arbitrary number of modules (Hopfield networks) can be mutually connected via feedforward networks called “internetworks”, whose outputs generate the interactions among module networks. To evaluate the CCHN as a modular neural network, it has been applied to associative memories so far. Although its excellent association performance is supported by many simulation results, it is still difficult to compute the memory capacity exactly and examine the dynamical properties rigorously, because the information processing of the CCHN includes strong nonlinearity. Hence, as the first step to the analytical approach, this paper focuses on a 1-module CCHN whose interaction is realized by a two-layered feedforward internetwork. In this case, the connection matrix of the CCHN degenerates into a single square-matrix like a conventional auto-association type of associative memory. Through the eigenvalue analysis for the connection matrix, we reveal that the essential differences between the association characteristics of the CCHN and a conventional auto-correlation associative memory originate from the dynamics in the noise-space which is the orthogonal complement of the subspace generated from memory patterns.
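For readers unfamiliar with the baseline, the conventional auto-correlation associative memory that the CCHN is compared against can be sketched as below. This is a plain Hopfield-style memory, not the CCHN itself; the function names, the synchronous update rule, and the two 8-bit patterns are assumptions chosen for illustration.

```python
def train_autocorrelation(patterns):
    """Build the auto-correlation (Hebbian) connection matrix with zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / n
    return W


def recall(W, state, max_steps=20):
    """Synchronously update a +/-1 state until it stops changing."""
    s = list(state)
    for _ in range(max_steps):
        nxt = []
        for i, row in enumerate(W):
            h = sum(w * x for w, x in zip(row, s))
            # Keep the previous value when the net input is exactly zero.
            nxt.append(1 if h > 0 else -1 if h < 0 else s[i])
        if nxt == s:
            break
        s = nxt
    return s


def direction_cosine(a, b):
    """Overlap between two +/-1 vectors; 1.0 means identical."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


# Two orthogonal memory patterns (hypothetical) and a probe with one flipped bit.
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = train_autocorrelation([p1, p2])
probe = [-1, 1, 1, 1, -1, -1, -1, -1]
recalled = recall(W, probe)
```

In this toy setting the probe starts with a direction cosine of 0.75 to `p1` and is pulled back onto `p1`. The subspace spanned by `p1` and `p2` plays the role of the pattern subspace, and its orthogonal complement is the noise-space discussed in the abstract.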
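The Institute of Electrical Engineers of Japan, 1997, IEEJ Transactions on Electronics, Information and Systems, 117 (9), 1253 - 1258, Japanese
In this paper, as a modeling method for neural networks with a modular structure (modular neural networks), we adopt an approach that describes the mode of information processing by an energy function. As one such model, we propose Cross-Coupled Nets With Many-to-Many Mapping Internetworks (CCHN-MMMI), in which the energy functions for the information processing of the module networks and for the interactions among module networks are added linearly, and which remains applicable even when there are many-to-many mapping relations between the states of the module networks. We also derive the network dynamics of CCHN-MMMI and examine, through simulation experiments, the recall characteristics and associative memory ability of a CCHN-MMMI with two modules. In the simulation experiments, we examine the effect of giving the modular structure explicitly by comparing its recall process with that of a conventional auto-correlation associative memory model. Next, taking the association of character pattern pairs as an example, we confirm that the model operates correctly even when there are many-to-many mapping relations between the states of the module networks. We also quantitatively examine how the recall dynamics degrade as the number of randomly chosen fundamental memories increases, and evaluate the model as an associative memory. As a result, we found that, for various fundamental memories, CCHN-MMMI operates correctly even when there are many-to-many mapping relations between the states of the module networks, and that the interaction acts to prevent the recall of spurious memories. In particular, in the CCHN-MMMI in which the mapping relations between module networks are learned by multilayer networks, the association ability is greatly improved compared with the auto-correlation associative memory model, which confirms the effect of having nonlinear interactions among module networks.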
The Institute of Electronics, Information and Communication Engineers, Jun. 1994, The Transactions of the Institute of Electronics, Information and Communication Engineers, 77 (6), 1135 - 1145, Japanese
[Refereed]
[Invited]
Invited oral presentation
Privacy concerns prevent good use from being made of the huge amounts of data available, so data analysis that preserves privacy is a very important task. In this research, we propose a privacy-preserving machine learning method that efficiently computes inner products in a three-layered neural network using Ring-LWE-based homomorphic encryption. We propose a two-party model consisting of a client and a server: the client encrypts input data and receives a classification result from the server, and the server performs prediction over the encrypted data using a trained classification model. This allows the client to obtain the inference result without revealing its private data, and allows the server to keep its model from being exposed. The proposed method takes 10.549 ms per class for prediction while keeping its accuracy close to that of the sigmoid and ReLU cases.
Oral presentation
Investment trust and fund management companies have accumulated a large number of visit records that their analysts wrote up after interviewing companies. Such visit reports include crucial information about companies, such as their financial conditions and future strategies, which is used to estimate the investment value of individual companies. However, it is not easy even for skilled fund managers to derive suitable market outlooks and investment decisions from a huge number of accumulated documents. In this research, to support investment decisions, we propose a new LSTM model with a self-attention mechanism that can extract important sentences from analyst visit reports. The extraction is based on sentence scores, which are obtained as the weights of the self-attention mechanism. In experiments on a set of 1,390 visit reports, we demonstrate that the proposed model achieves about 79% extraction accuracy on average under 5-fold cross-validation.
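The idea of using attention weights as sentence scores can be sketched as follows. This is only an illustration under stated assumptions: the sentence vectors stand in for LSTM hidden states, the single `query` vector stands in for learned attention parameters, and all names and numbers are hypothetical rather than taken from the proposed model.

```python
import math


def attention_weights(sentence_vectors, query):
    """Softmax over dot-product scores; the weights sum to 1."""
    scores = [sum(h * q for h, q in zip(vec, query)) for vec in sentence_vectors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def top_sentences(sentences, weights, k=1):
    """Return the k highest-weighted sentences in their original order."""
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


# Hypothetical report with three sentences and toy 2-dimensional encodings.
sentences = ["Greeting.", "Sales are expected to grow next year.", "Meeting closed."]
vectors = [[0.1, 0.0], [2.0, 1.0], [0.5, 0.2]]
query = [1.0, 1.0]
weights = attention_weights(vectors, query)
important = top_sentences(sentences, weights, k=1)
```

Here the second sentence receives the largest weight and is the one extracted. In a trained model the encodings and the attention parameters would be learned from labeled reports rather than fixed by hand.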
Oral presentation
To decide which companies are worth investing in, investment trust and fund management companies, which manage funds deposited by investors, collect information on companies' budget status and plans. However, the number of visit reports is usually too large even for skilled fund managers to easily derive reliable business outlooks and investment decisions. In this research, to reduce the effort that fund managers and analysts spend on investigation and analysis, we propose a machine learning system that helps them make accurate predictions of business outlook from the collected visit reports. We attempt to predict business confidence for specific companies and industries using a CNN, which is expected to offer good readability and robustness to polarity perturbation. As a result, we obtain a classification accuracy of 81.4% on analysts' reports provided by Sumitomo Mitsui DS Asset Management Company, Limited. This is 5.7% higher than the accuracy of the best baseline model, which uses Word2Vec and an SVM.
Information Processing Society of Japan
Jun. 2020 - Present
ACM
Jan. 2018 - Present
Asia Pacific Neural Network Society (APNNS)
Jan. 2016 - Present
International Neural Network Society (INNS)
Mar. 2015 - Present
IEEE
Jan. 2001 - Present
The Japanese Society for Artificial Intelligence
Japanese Neural Network Society
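The Institute of Systems, Control and Information Engineers (ISCIE)
The Institute of Electronics, Information and Communication Engineers (IEICE)
The Society of Instrument and Control Engineers (SICE)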
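Task 1. We worked on the sub-task "Fusion of homomorphic computation and magnitude comparison" of "Research on secure cloud and edge computing". We studied secure comparison approaches used in privacy-preserving data mining and, to improve efficiency, security, and flexibility, proposed three schemes that improve on SK17, the most efficient secure comparison approach in prior work. Among them, the Efficiency-enhanced scheme is about 50% more efficient than the existing SK17 scheme. The Security-enhanced scheme can compute the comparison result on encrypted data with one round of (non-interactive) communication between the data owner and the cloud server, and achieves higher security in that the data are fully protected from the server. The results were presented at the international conference The 23rd International Conference on Network-Based Information Systems (NBiS2020). Task 2. We worked on the sub-task "Proposal of a serial/parallel learning mechanism that can flexibly process data from the same or different industries" of "Design of serial/parallel learning mechanisms that preserve privacy", and proposed an approach for making privacy-preserving decision tree inference more efficient. The proposed method is applicable regardless of whether the data come from the same industry or from different industries, so it is general-purpose. Because the comparison between a feature value and a threshold must be computed via the cloud server when branching at each node of the decision tree, we used the Efficiency-enhanced secure comparison scheme described above. A journal paper on these results is under submission.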
Competitive research funding
In this project, we have proposed several online learning algorithms that continuously detect, classify, and visualize cyberattacks by analyzing communication packets observed by a large-scale darknet (i.e., unused IP address range) sensor, while following ever-evolving cyberattacks. In addition, we have developed three types of adaptive attack-monitoring systems. The first is a DDoS backscatter monitoring system, which applies communication traffic features in combination with support vector machines and deep neural networks to achieve a detection accuracy of 97% or more and fast learning. Moreover, we have developed a new type of cyberattack monitoring system that can detect unknown cyber-threats and monitor the changing behavior of malware by association rule mining and by representation learning of port-number embeddings.
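As a rough illustration of the association rule mining mentioned above, the sketch below computes the two standard rule statistics, support and confidence, over sets of destination ports. The data layout (one set of scanned ports per source host) and the port values are hypothetical, and the project's actual features and mining procedure are not described in this summary.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    ante = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / ante if ante > 0 else 0.0


# Hypothetical darknet observations: destination ports scanned by each source host.
scans = [{23, 2323}, {23, 2323, 80}, {23}, {80, 443}]
rule_support = support(scans, {23, 2323})          # hosts scanning both ports
rule_confidence = confidence(scans, {23}, {2323})  # of hosts scanning 23, share also scanning 2323
```

A rule whose support or confidence shifts sharply between time windows is one possible signal of changed scanning behavior. That interpretation is an assumption of this sketch, not a result stated in the summary.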
Competitive research funding
Competitive research funding
In order to protect network users from malicious spam mail attacks that can lead to malware infections, and to conduct large-scale monitoring of malicious activities by malware, we developed three types of learning systems that introduce machine learning techniques. First, we developed a malicious spam mail detection system with the following three functions: an automatic mechanism to collect suspected malicious spam mails, automatic labelling (malicious or benign) of the collected spam mails by a crawler-type web security analyzer, and online learning from the automatically collected training data. Second, we developed a large-scale monitoring system that can observe transitions of subnet infection states by allocating the most similar typical patterns obtained by hierarchical clustering of darknet traffic features. Finally, we developed a large-scale monitoring system that can detect DDoS backscatter from observed darknet traffic features.
Competitive research funding
Competitive research funding
Competitive research funding
In environments where multiple related pattern recognition tasks are learned sequentially, it is known that learning can be conducted efficiently even with a small amount of training data by using "knowledge transfer" from one task to another. In this research project, we developed a multitask learning algorithm with an efficient knowledge transfer mechanism, in which a useful feature space is learned incrementally and efficiently by transferring part of the previously learned knowledge to an unknown task. The proposed multitask learning algorithm was implemented as a person identification system using face images, and the effectiveness of this system was verified.
Competitive research funding
Competitive research funding
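a) We developed a feature selection method that uses KDA (Kernel Discriminant Analysis) as the selection criterion. b) We developed a pattern recognition method based on KDA in the feature space. We also developed a primitive method for visualizing fuzzy classifiers. c) We developed a method for tuning the membership functions of kernel fuzzy classifiers using the SVM concept of margin maximization. d) For multitask learning problems in which multiple correlated pattern recognition problems are given sequentially, we developed a multiple classifier system that achieves high generalization ability with a small amount of training data.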
Competitive research funding
This research project developed a new learning algorithm for the multi-task pattern recognition problem. The project considers learning multiple classification tasks online where no information is ever provided about the task category of a training example. The algorithm thus needs an automated task recognition capability to properly learn the different classification tasks. The learning mode is "online": training examples for different tasks are mixed in a random fashion and given sequentially, one after another. It is assumed that the classification tasks are related to each other and that their training examples appear in random sequences during online training. Thus, the learning algorithm has to continually switch from learning one task to another whenever the training examples change to a different task. This also implies that the learning algorithm has to detect task changes automatically and quickly, and use knowledge of previous tasks to learn new tasks. Overall, automated task recognition falls in the category of unsupervised learning, since no information about the task categories of training examples is provided to the algorithm. The performance of the algorithm is evaluated using several artificially generated data sets and three UCI data sets. The experiments in this project verify that the proposed algorithm can indeed acquire and accumulate task knowledge, and that the transfer of knowledge from tasks already learned speeds up knowledge acquisition on new tasks.
Competitive research funding
We conducted research on knowledge acquisition and system development and obtained the following results: 1. Knowledge acquisition and system development under steady-state environments (1) We developed a clustering method using support vector machines (SVMs) that divides the image data into segments, and a feature selection method using SVM ensembles. (2) We developed an incremental training method that keeps only support vector candidates, and demonstrated that the generalization ability was maintained while training data were deleted. 2. Knowledge acquisition by data mining and system development under dynamic environments (1) Dynamic feature space learning: we developed incremental learning algorithms for Principal Component Analysis (PCA) and Kernel PCA (KPCA), and demonstrated that feature selection was carried out successfully by adapting to variations in the data distribution. (2) System development: we proposed a method to integrate the developed dynamic feature selection algorithm into a classifier model, in which the k-nearest neighbor method was combined with a dynamic clustering algorithm, and into a neural network model. We demonstrated that the proposed method enabled the classifier to conduct stable incremental learning, and that the developed system performed very well not only on benchmark data sets but also on facial recognition data sets. Although we tried to develop an incremental SVM system, it has not been completed yet and is left as future work. 3. Development of an image segmentation system by data mining (1) Development of a clustering method for image segmentation: we developed a basic system to detect and classify imagery of uncertain objects, and demonstrated that the detection and classification of crystal imagery was carried out properly using an image database of protein crystallizations. (2) Development of a feature extraction method for image segmentation: we developed a feature extraction method based on multiresolution spectral histograms obtained by wavelet transformation, and demonstrated that the proposed feature can be used for image retrieval.
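We devised learning algorithms that are essential for developing pattern recognition systems capable of incremental learning. The results are summarized below. (1) For incremental learning of the feature space, we improved the conventional Incremental Principal Component Analysis (IPCA). Specifically, we proposed a method based on the contribution ratio as the criterion for deciding whether to increase the dimensionality of the feature space, and derived its update formula. (2) Conventional IPCA had to solve an eigenvalue problem every time a single data item was given. In contrast, we proposed a learning algorithm (Chunk IPCA) that obtains a new eigenbasis in a single update for a chunk of multiple data items. (3) We applied the improved IPCA algorithm and the CIPCA algorithm to face image recognition, and confirmed that recognition accuracy increases and the False Positive Rate decreases as incremental learning proceeds. We also confirmed that introducing CIPCA greatly shortens the learning time. (4) As the feature space is updated, the classifier (neural network) must be updated at the same time, and it must follow not only updates of the connection weights but also changes in the number of input variables. For this problem, we developed an algorithm that updates the memory items of a neural network with long-term memory in accordance with the feature space and learns them together with the training data. (5) We proposed a method that extends conventional Independent Component Analysis (ICA) to supervised learning, and derived a learning algorithm that increases independence and class separability simultaneously. We also evaluated its performance on several benchmark data sets and confirmed that, for some data sets, it performs better than features obtained by conventional ICA or PCA.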
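The contribution-ratio criterion in item (1) can be illustrated with a small sketch. It assumes the common definition in which the ratio for k dimensions is the sum of the k largest eigenvalues divided by the sum of all eigenvalues, and it uses a fixed threshold. The project's incremental update formula is not reproduced here, and the function names, threshold, and eigenvalues are hypothetical.

```python
def accumulation_ratio(eigenvalues, k):
    """Share of total variance captured by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def min_dimensions(eigenvalues, threshold=0.9):
    """Smallest k whose accumulation ratio reaches the threshold."""
    for k in range(1, len(eigenvalues) + 1):
        if accumulation_ratio(eigenvalues, k) >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues of a covariance matrix.
eigenvalues = [5.0, 3.0, 1.5, 0.5]
k = min_dimensions(eigenvalues, threshold=0.9)
```

With these values two dimensions capture 80% of the variance and three capture 95%, so the feature space would be extended to three dimensions under a 0.9 threshold. In an incremental setting the same check would be repeated after each update of the eigenvalues.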
This research addressed motor unit decomposition from the surface electromyogram (EMG) and the visualization of motor unit activity, as follows. A) Improvement of the motor unit decomposition technique: We proposed a new motor unit decomposition technique that uses overcomplete bases to introduce a statistical model. Applying this technique to measured surface EMG signals, we confirmed that its decomposition performance was equal to that of blind deconvolution. Moreover, we confirmed that each motor unit activity may be separable even if the number of observation channels is smaller than the number of active motor units. B) Examination of the effectiveness of a recognition technique using a statistical model: We proposed a recognition system using both KDA and a boosting mechanism and confirmed that its recognition performance was comparable to that of existing techniques such as SVM. In general, maximizing generalization performance requires a parameter tuning process such as cross-validation, whose computational cost is very high. To solve this problem, we proposed a new index for parameter selection; with this index, appropriate parameters can be selected without such a high computational cost. C) Three-dimensional position estimation of each decomposed motor unit: The 3D position of the depolarization of an individual motor unit was estimated using the 3D finite element method from the potential distribution on the skin surface, which was itself estimated from surface EMG signals with the motor unit decomposition technique. As a result, we confirmed that the 3D position and dynamics of the depolarization of an individual motor unit may be estimated. Moreover, the size of the innervation zone and the temporal dynamics of the current intensity of depolarization could be estimated. This study was started in fiscal year 2003 with Dr. Kotani, Assistant Professor of Kobe University, as head researcher. However, he died in May 2004, so the head researcher was changed suddenly. We note that Dr. Kotani's contribution covers all of the above results.
We have developed multiclass support vector machines and applied them to diagnosis problems and image processing. The major results of the project are as follows: 1. Development of multiclass support vector machines ・We have developed fuzzy support vector machines that resolve unclassifiable regions in multiclass problems. ・We have developed optimal ordering of decision-tree and pairwise support vector machines to improve the generalization ability. 2. Development of fast training methods ・We have developed steepest ascent methods for pattern classification and function approximation, in which more than two data points are processed at a time. 3. Evaluation for medium to large sized data sets ・We confirmed that our methods improve the generalization ability and speed up training for large data sets. 4. Application to diagnosis ・We have examined feature extraction based on independent component analysis (ICA) to enhance the discrimination performance of support vector machines. We confirmed that ICA could extract effective features from the gas leakage sound in pipes, digit patterns, and various benchmark data sets. ・We have developed an evolutionary feature extraction method based on margin maximization. 5. Application to image processing ・We have developed a multi-resolution feature extraction method that uses 2-dimensional wavelet decomposition for the inputs of support vector machines. ・We have developed a feature extraction method as a preparation for classifying the state of protein crystals by using support vector machines.
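In a dynamically changing environment, knowledge acquired in the past is not always valid and must be modified continually to adapt to the environment. However, in cases where the same environment reappears in the future, modifying all previously acquired knowledge is not necessarily efficient. That is, it is desirable to have a mechanism by which even knowledge that has ceased to be valid at some point is stored as long-term memory and can be recalled and used when an environment in which it is useful reappears. Moreover, to have the character of life-long learning, in which the learning period never ends, knowledge must be accumulated in memory efficiently. In this research, we proposed a neural network model with these functions. The model consists of four modules: (1) a neural network part that learns input-output relations, (2) an associative buffer that stores the knowledge extracted by the neural network part, (3) a long-term memory that retains over the long term the necessary knowledge in the associative buffer, and (4) a detection part that detects changes in the environment. For the model proposed in fiscal year 2002, we developed and implemented a mechanism for robust detection of environmental changes and an associative mechanism for fast adaptation to the environment. We examined the adaptation ability of the proposed model in simulation experiments under a simple dynamic environment in which several different one-dimensional functions change over in sequence. As a result, we confirmed that the proposed model detects environmental changes accurately and follows the environment quickly by exploiting knowledge of previously experienced environments. We also showed that incremental learning can be performed stably even under dynamic environments. In fiscal year 2003, we added a knowledge transfer mechanism that distinguishes knowledge specific to each changing environment from invariant knowledge, and extracts and uses the shared knowledge. Through simulation experiments, we confirmed that this knowledge transfer function works correctly and enables even faster adaptation to the environment.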
We have developed an acoustic diagnosis system that is capable of adapting to a dynamic environment. The major results of the project are as follows: 1. Development of diagnostic networks We proposed a novel model of modular neural networks that can adapt its structure to the environment. Experiments were performed with an artificial gas leakage device under various experimental conditions to imitate long-term changes in the environment. The discrimination accuracy of the proposed network was about 93%. The results show that the proposed model is effective for detecting leakage sound in practical use. 2. Independent component analysis and evolutionary computation as feature extraction We have examined feature extraction for acoustic signals using independent component analysis (ICA) and genetic algorithms to obtain stable diagnosis. We proposed a novel recognition method using features extracted by ICA. The proposed method consists of modules for each category and a synthesizer. We evaluated the performance of the proposed method on several recognition tasks, including acoustic diagnosis. From these results, we confirmed the effectiveness of the recognition method using independent components for each class. The effectiveness of the proposed method was also confirmed for biological instruments.
We examine the application of independent component analysis (ICA) to feature extraction for signals such as digit patterns and acoustic signals. To evaluate the effectiveness of independent components as features, we compare the discrimination accuracy obtained using independent components with that obtained using principal components. Furthermore, we apply ICA to biological signal processing. We obtain the following results: 1. Acoustic diagnosis To detect leakage from pipes accurately, a feature extraction method for sounds must be selected properly. The purpose of this research is to examine whether ICA is useful as a feature extraction method for acoustic signals. We confirm that feature extraction using the ICA algorithm is very useful for detecting gas leakage sounds. 2. Digit recognition We propose a novel recognition method using features extracted by ICA. The proposed method consists of modules for each category and a synthesizer. We evaluate the performance of the proposed method on several recognition tasks. From these results, we confirm the effectiveness of the recognition method using independent components for each class. 3. Deconvolution for EMG We apply a multichannel blind deconvolution method based on ICA to surface EMG signals. We obtained a few components whose firing patterns are similar to those of motor units.
The detection of gas leakage sound from pipes is important in petroleum refining plants and chemical plants, because the gases used in these plants are often flammable or poisonous. To establish an acoustic diagnosis technique for leakage sound, we examined the application of modular neural networks to stable detection. The modular neural network can adapt its structure to the environment. Experiments were performed on an artificial gas leakage device under various experimental conditions to imitate long-term changes in the environment. We applied the fast Fourier transform (FFT) as the pre-processing method and examined the features of the power spectrum of the gas leakage sound. The distinguishing feature is that the power spectrum of the gas leakage sound is larger than that of the normal sound in the range from about 5 kHz to 20 kHz. The discrimination accuracy of the proposed network was about 93%. From these results, we confirmed that the modular neural network is effective for detecting leakage sound in practical use. Furthermore, we developed a handheld system based on this diagnostic technique.
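As a rough illustration of the FFT pre-processing described above, the following Python sketch computes the fraction of spectral power that falls in the 5 kHz to 20 kHz band. The 48 kHz sampling rate, the Hann window, the synthetic test signals, and the function name `band_power_ratio` are assumptions made for illustration and are not taken from the project.

```python
import numpy as np

def band_power_ratio(signal, fs, f_lo=5_000.0, f_hi=20_000.0):
    """Return the fraction of total spectral power lying in [f_lo, f_hi] Hz."""
    # A Hann window reduces spectral leakage (an assumed choice).
    windowed = signal * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    return power[in_band].sum() / power.sum()

# Synthetic stand-ins for recorded sounds (hypothetical data).
fs = 48_000                      # sampling rate in Hz, above 2 x 20 kHz
t = np.arange(fs) / fs           # one second of samples
rng = np.random.default_rng(0)
normal = np.sin(2 * np.pi * 500 * t) + 0.01 * rng.standard_normal(t.size)
leak = normal + 0.5 * np.sin(2 * np.pi * 10_000 * t)  # extra energy at 10 kHz

normal_ratio = band_power_ratio(normal, fs)
leak_ratio = band_power_ratio(leak, fs)
print(f"normal: {normal_ratio:.4f}, leak: {leak_ratio:.4f}")
```

A band-power feature of this kind could be fed to a classifier such as the modular neural network; the actual feature vector used in the project is not specified in the abstract.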
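As a basic technology for health monitoring, which examines the soundness of structures nondestructively, we proposed a hybrid method that combines wavelet analysis with layered neural networks. Wavelet analysis is used to extract, from the observed signals (acceleration and velocity information of the structure), the features used to judge whether the structure is normal or abnormal, and the neural network is used to estimate the deterioration of the structure. We approximated the structure by a three-degree-of-freedom dynamical system and applied the proposed hybrid method to the problem of estimating the degree of deterioration of the spring constants and damping constants, under a setting in which the solution cannot be obtained analytically from the observed signals. Computer simulations showed that the spring constants could be estimated with reasonable accuracy, but sufficient estimation accuracy was not obtained for the damping constants. The neural network used here has no modular structure; replacing it with a modular neural network that can acquire functions emergently may alleviate this problem. We therefore first took up the associative memory problem, which is often used to study the basic properties of neural networks, and examined the effectiveness of modular neural networks. Using a genetic algorithm to search for the parameters that determine the modular structure, we developed a neural system in which a modular structure suited to the desired properties (with each module having a different function) is determined automatically. Simulation experiments confirmed that this system can find high-performance modular structures that are difficult to discover by trial and error. Ultimately, it is necessary to apply this system to health monitoring and confirm that performance improves. Unfortunately, we were unable to achieve this within the research period, and it remains a topic for future work. We also note that, as a feature extraction method, we examined the use of independent component analysis, which has recently been studied actively for blind signal separation, in place of wavelet analysis.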
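As a modular neural network model, we considered a network formed by coupling several mutually connected neural networks with symmetric connections. In such a model, the activation dynamics of each module and the coupling dynamics between modules are given by defining an energy function for the whole network. The aim of this year's research was to evaluate the associative memory ability of this model quantitatively through simulation experiments. As the patterns to be memorized, we used random vectors whose bits are binary (±1), and we evaluated performance in terms of the ratio of the number of patterns to the total number of neurons (memory rate: r). As a result, we found that, compared with a conventional Hopfield network without a modular structure, associative ability improves dramatically as long as the number of modules is not too large. For example, with 500 neurons in total and 2 modules, the critical direction cosine was d_C ≈ 0.5 at r = 0.5 and d_C ≈ 0.7 at r = 1.0 (for a Hopfield network, d_C ≈ 1.0 at r ≈ 0.2). Furthermore, we observed that an optimal number of modules exists for a given problem and that associative ability degrades rapidly beyond it. A detailed investigation of the cause is left for future work. In addition, the modular neural network used here is equipped with a function that allows it to operate properly even when there are many-to-many relations between module states. This is realized by feeding the states of both modules into a layered network (the "inter-net") that determines the interaction between the two modules. In this study, we also confirmed that this function works well through experiments using character pattern pairs with many-to-many relations.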
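To make the evaluation setting concrete, here is a minimal Python sketch of the conventional Hopfield baseline only, not the modular model itself. It stores random ±1 patterns with the Hebbian rule, corrupts one pattern, and measures the direction cosine between the recalled state and the stored pattern. The synchronous update rule, the memory rate r = 0.05, the number of flipped bits, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weight matrix for +/-1 patterns, with zero self-connections."""
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)
    return w

def recall(w, state, max_sweeps=50):
    """Synchronous updates until the state stops changing (or max_sweeps)."""
    s = state.copy()
    for _ in range(max_sweeps):
        new = np.where(w @ s >= 0, 1, -1)
        if np.array_equal(new, s):
            break
        s = new
    return s

def direction_cosine(a, b):
    """Cosine of the angle between two state vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
n_neurons, n_patterns = 500, 25            # memory rate r = 25 / 500 = 0.05
patterns = rng.choice([-1, 1], size=(n_patterns, n_neurons))
w = train_hopfield(patterns)

# Corrupt the first pattern by flipping 50 of its 500 bits.
probe = patterns[0].copy()
flipped = rng.choice(n_neurons, size=50, replace=False)
probe[flipped] *= -1

initial_cos = direction_cosine(probe, patterns[0])
final_cos = direction_cosine(recall(w, probe), patterns[0])
print(f"initial cosine: {initial_cos:.2f}, final cosine: {final_cos:.2f}")
```

Sweeping the number of flipped bits downward until recall fails would give an estimate of the critical direction cosine d_C for a given r, which is the quantity compared in the abstract.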
This research project was carried out over two years (1990-1991). Its major aim was to develop an automatic seal imprint verification system that is adaptable to seal imprints of various qualities, and this aim was achieved. The results obtained in the project are summarized as follows: 1. We developed a method for identifying the quality of a seal imprint on the basis of the characteristics of its gray-level histogram. A quality identification experiment using actual seal imprints of various qualities confirmed that the results of this method coincide with those of document examiners. 2. We developed a new method for automatic seal imprint verification by adding the imprint quality identification process to our previous verification method. The quality of each partial region of an examined seal imprint is identified first, and then only the good-quality partial regions are verified against the registered imprint. A major advantage of this method is that it can verify seal imprints with various flaws. 3. On the basis of the above results, an automatic seal imprint verification system for practical applications was realized using a small hardware system consisting of a personal computer and an image processing unit. Experimental results suggest that this system is feasible for practical applications.