
Past News & Events

List

2007.03.19 Held "Interactive Glyph Analysis with R" by Alexander Gribov (Augsburg University, Germany).

2007.02.15~2007.02.16 Presented "Building a Parallel Environment with Jasp" at DSC2007 (Directions in Statistical Computing).

2006.12.22 Held "Tutorial of GAP" by Han-Ming Wu (Institute of Statistical Science, Academia Sinica).

2006.12.11 Held "GotoBLAS Tutorial" by Kazushige Goto (Univ. of Texas, US).

2006.12.08~2006.12.09 Held the cooperative research workshop "Maintenance and Use of R".

2006.12.07 Held "Visualization for data mining" by Brian Ripley (Univ. of Oxford, UK).

2006.12.01 Held "Tutorial of Bioconductor".

2006.11.30 Held "Tutorial of GGobi: Interactive and dynamic data visualization system".

2006.11.11~2006.11.17 Exhibited a booth at the supercomputing conference SC2006.

2006.06.15~2006.06.17 Presented "Web Decomp and E-Decomp - Time Series Analysis using R" at useR! The R User Conference 2006.

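2006.05.26~2006.05.27 Presented "Java library for Interactive Statistical Graphs" at the Spring Conference of the Korean Statistical Society.

"Interactive Glyph Analysis with R" by Alexander Gribov (Augsburg University, Germany)

Date: Monday, March 19, 2007, 10:30 - 11:30 (may be extended until 12:00)

Venue: The Institute of Statistical Mathematics, New Building 2F, Training Room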

Abstract:

GAUGUIN (Grouping And Using Glyphs Uncovering Individual Nuances) is a project for the interactive visual exploration of multivariate data sets, developed for use on all major platforms (Windows, Linux, Mac). It supports a variety of methods for displaying flat-form data and hierarchically clustered data.

Glyphs are defined as geometric shapes scaled by the values of multivariate data. Each glyph usually represents one high-dimensional data point (or an average of data points). GAUGUIN offers four different glyph shapes.

The number of data elements that can be displayed simultaneously is limited, because each glyph requires a minimum amount of screen space to be viewed, but hierarchical glyphs can be drawn for groups of cases. Hierarchical glyphs are composed of a highlighted case representing the group and a band around it showing the variability of all the members of the cluster. GAUGUIN also provides a variety of cluster analysis tools via Rserve and can use R to calculate MDS views of the data. All GAUGUIN displays are linked interactively and can be queried directly.
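To make the idea of a glyph "scaled by the values of multivariate data" concrete, here is a minimal sketch that is not taken from GAUGUIN itself. It assumes a star-shaped glyph with one spoke per variable (the abstract does not name the four shapes), and the helper names `normalize`, `star_glyph` and `cluster_band` are hypothetical. The band is approximated as the per-variable range over the cluster members.

```python
import math

def normalize(row, mins, maxs):
    """Scale each variable to [0, 1] using its minimum and maximum."""
    return [
        0.0 if hi == lo else (v - lo) / (hi - lo)
        for v, lo, hi in zip(row, mins, maxs)
    ]

def star_glyph(row, mins, maxs, radius=1.0):
    """Return (x, y) vertices of a star glyph: one spoke per variable,
    with spoke length proportional to the normalized value."""
    scaled = normalize(row, mins, maxs)
    k = len(scaled)
    return [
        (radius * s * math.cos(2 * math.pi * i / k),
         radius * s * math.sin(2 * math.pi * i / k))
        for i, s in enumerate(scaled)
    ]

def cluster_band(rows, mins, maxs):
    """Per-variable (low, high) range over a cluster on the [0, 1] scale,
    an assumed stand-in for the variability band of a hierarchical glyph."""
    scaled = [normalize(r, mins, maxs) for r in rows]
    return [(min(col), max(col)) for col in zip(*scaled)]

# Toy data: three cases, three variables (the third is constant).
data = [[1.0, 10.0, 5.0], [3.0, 20.0, 5.0], [2.0, 30.0, 5.0]]
mins = [min(c) for c in zip(*data)]
maxs = [max(c) for c in zip(*data)]

vertices = star_glyph(data[1], mins, maxs)  # glyph for the second case
band = cluster_band(data, mins, maxs)       # band for the whole group
```

Drawing the representative case as a polygon through `vertices` and shading the region between the low and high values in `band` would give one possible rendering of a hierarchical glyph.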

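Materials (PDF)

"Tutorial of GAP" by Han-Ming Wu (Institute of Statistical Science, Academia Sinica)

Date: Friday, December 22, 2006, 9:30 - 11:30 (may be extended until 12:00)

Venue: The Institute of Statistical Mathematics, Seminar Room 2

Materials (PDF)

"GotoBLAS Tutorial" by Kazushige Goto (Univ. of Texas, US)

Date: Monday, December 11, 2006, 10:30 - 11:30 (may be extended until 12:00)

Venue: The Institute of Statistical Mathematics, New Building 2F, Training Room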

Abstract:

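GotoBLAS is one of the optimized BLAS implementations that support multiple architectures, but the various optimization techniques applied in GotoBLAS are not widely known. This talk therefore explains the basic ideas behind optimization and discusses the characteristics of BLAS Level 1 through Level 3, as well as points to keep in mind when calling these functions.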
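As a rough illustration of why the three BLAS levels behave so differently (this sketch is not from the talk), the snippet below counts floating-point operations and operand elements for one representative routine per level: axpy, gemv and gemm with operands of size n. The function names `blas_level_costs` and `intensity` are hypothetical, and the counts are the usual leading-order estimates, not measurements of GotoBLAS.

```python
def blas_level_costs(n):
    """Approximate flops and operand elements for size-n operands.

    Level 1: axpy, y := a*x + y   (two vectors of length n)
    Level 2: gemv, y := A*x + y   (an n-by-n matrix and two vectors)
    Level 3: gemm, C := A*B + C   (three n-by-n matrices)
    """
    return {
        "level1_axpy": {"flops": 2 * n, "elements": 2 * n},
        "level2_gemv": {"flops": 2 * n * n, "elements": n * n + 2 * n},
        "level3_gemm": {"flops": 2 * n ** 3, "elements": 3 * n * n},
    }

def intensity(cost):
    """Flops performed per operand element touched."""
    return cost["flops"] / cost["elements"]

costs = blas_level_costs(1000)
for name, cost in costs.items():
    print(f"{name}: {intensity(cost):.2f} flops per element")
```

The ratio stays bounded for Level 1 and Level 2 but grows with n for Level 3, which is the usual reason cache blocking and data reuse pay off most in Level 3 routines.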

Materials (PDF)

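ISM Cooperative Research Workshop "Maintenance and Use of R"

Date: Friday, December 8, 2006, 9:30 - 17:00

 Saturday, December 9, 2006, 9:30 - 16:20

Venue: The Institute of Statistical Mathematics, Auditorium

Program: December 8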

09:30 - 10:30

Brian Ripley (University of Oxford, UK)

Title: Software for Statistical Developments

10:45 - 11:45

Kazushige Goto (The University of Texas, US)

Title: Various optimization and performance tips for processors

13:30 - 14:30

Stefano Iacus (Universita degli Studi di Milano, Italy)

Title: Two R specific optimization techniques for speed and data management

14:45 - 15:45

Sungwoo Park (Pohang University of Science and Technology, Korea)

Title: A critique of R from the perspective of programming language theory

16:00 - 17:00

Junji Nakano (ISM, Japan) and Ei-ji Nakama (COM-ONE Inc., Japan)

Title: R on super-computers at ISM

(18:00 - Dinner)

Abstracts & presentation materials

Photos of the venue

Program: December 9

09:30 - 10:15

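Nobuo Funao (舟尾暢男), Statistics Group, Statistical Analysis Department, Japan Development Center, Pharmaceutical Development Division, Takeda Pharmaceutical Co., Ltd.

"Data Handling with R: Data Frame Cooking in 30 Minutes"

Materials (PDF)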

10:15 - 11:00

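Takuya Kubo (久保拓弥), Terrestrial Ecology, Division of Environmental Biology, Faculty of Environmental Earth Science, Hokkaido University

"R Users Wandering Around MCMC Computation"

Slides & experimental programs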

11:00 - 11:45

Ryota Suzuki (鈴木了太), ef-prime, Inc.

"Starting a Business with R! The Current State of Free Software and the Data Analysis Business"

Materials (PDF)

(11:45 - 13:00 Lunch break)

13:00 - 13:45

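Osamu Ogasawara (小笠原理), Center for Information Biology and DDBJ, National Institute of Genetics

Emi Hattori (服部恵美), Information and Mathematical Science Laboratory, Inc. (情報数理研究所)

三十尾潔高, Information and Mathematical Science Laboratory, Inc. (情報数理研究所)

"Development of R graphical manuals and Future Directions"

Materials (PDF)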

13:45 - 14:30

Fumihiko Makiyama (牧山文彦), Health Management Center, Chibana Clinic, Keiaikai Specified Medical Corporation

"Google Earth and the R Language"

Materials (PDF)

(14:30 - 14:50 Break)

14:50 - 15:35

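Susumu Tanimura (谷村晋), Department of Social Environment, Institute of Tropical Medicine, Nagasaki University

"A Common Foundation for Vector Data Models in Geospatial Analysis: Classes and Methods of the sp Package"

Materials (PDF)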

15:35 - 16:20

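Shigeru Mase (間瀬茂), Department of Mathematical and Computing Sciences, Graduate School of Information Science and Engineering, Tokyo Institute of Technology

"An Open-Source Manuscript for Introducing R"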

Abstracts of talks on 8 December:

Title: Software for Statistical Developments

Speaker: Brian Ripley

To make statistical methodology available to end-users, especially to those outside statistics, we need to make available software to implement the method. That software should ideally be readily available, easy to use, flexible (as R is) and work correctly (and R has a great record for rapidly fixing bugs).

The talk will discuss the process of moving new statistical methodology from ideas to practical implementation(s) (with R providing a near-ideal vehicle for a reference implementation), illustrated by some case studies of state-of-the-art statistical analyses made possible by having code in S or R.

Materials (PDF)

Title: Various optimization and performance tips for processors

Speaker: Kazushige Goto

Most developers don't like doing optimization, because they believe it's the compiler's job or that optimization makes their code dirty. Unfortunately, they confuse algorithmic optimization with instruction optimization. You don't actually have to do instruction optimizations that make your code dirty. Instead, you need to think of a good algorithm that avoids the various traps that CPUs hate.

This talk will discuss the keys to making your application perform better and how to avoid poor coding.

Materials (PDF)

Title: Two R specific optimization techniques for speed and data management

Speaker: Stefano Iacus

There are two aspects of R programming that can make the implementation of some algorithms inefficient. One is related to the interactive "function based" OOP design of the R language (in contrast to "class based" OOP languages like Java, etc.) and the other is related to the fact that R normally stores its objects in memory (although something new is on the horizon). In Monte Carlo analysis the natural need is to iterate some procedure, which means making use of a "for" loop. Because of F-OOP this might be too inefficient with respect to speed. On the other hand, updating big objects (data frames, distance matrices, DNA sequence data) stored in memory might also become inefficient under some circumstances for reasons of speed and memory management. With the help of two concrete example packages we present two natural, efficient ways to approach the above-mentioned situations, which may be of general interest to (advanced) R users.

Materials (PDF)

Title: A critique of R from the perspective of programming language theory

Speaker: Sungwoo Park

R is a programming system which provides rich support for statistical computing and high level graphics. Despite its popularity in the statistics community, however, R, as a programming language, has quite a few flaws in its design. For example, R allows the lazy evaluation strategy in the presence of computational effects, such as assignments, vector updates, and graphic outputs, with no provisions for the resultant strange semantics. In fact, combining the lazy evaluation strategy with computational effects in an unobtrusive way has been one of the key research problems in the programming language community for many years. The definition of R is also far from acceptable from the viewpoint of programming language theory. For example, specific implementation strategies are taken as part of the definition, while part of the definition is delegated to implementation strategies.

In this talk, we give a critique of R from the perspective of programming language theory. We first criticize negative aspects of R and then highlight positive aspects of R. We analyze R as a programming language rather than as a programming system, thereby focusing on its semantic elements rather than its statistics/graphics library. As an alternative linguistic framework to R, we propose functional languages, which provide all those features provided by R and also come with formal semantics.
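The interaction between lazy evaluation and side effects described in this abstract can be sketched outside R. The following is a hypothetical Python model, not R's actual implementation: the `Promise` class stands in for a lazily evaluated argument that is computed at most once, on first use, and `effectful`, `choose` and `add_reversed` are illustrative names. It shows that an effect in an argument may never happen, or may happen in an order the caller did not write.

```python
class Promise:
    """A memoized thunk: the expression runs at most once, on first use."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._thunk()
            self._done = True
        return self._value

log = []  # records the order in which side effects actually occur

def effectful(name, value):
    """An expression with a visible side effect (appending to the log)."""
    log.append(name)
    return value

def choose(flag, a, b):
    """Forces only the selected argument, so the other effect never runs."""
    return a.force() if flag else b.force()

def add_reversed(a, b):
    """Forces b before a, so effects occur in reverse argument order."""
    return b.force() + a.force()

r1 = choose(True,
            Promise(lambda: effectful("a", 1)),
            Promise(lambda: effectful("b", 2)))
r2 = add_reversed(Promise(lambda: effectful("x", 10)),
                  Promise(lambda: effectful("y", 20)))
print(r1, r2, log)
```

The caller cannot tell from the call site alone which effects will run or in what order. That depends on the body of the callee, which is the kind of semantic surprise the abstract refers to.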

Materials (PDF)

Title: R on super-computers at ISM

Speakers: Junji Nakano and Ei-ji Nakama

The Institute of Statistical Mathematics (ISM) is a (national) research institute for statistical sciences and provides several super-computer systems for statistical research by Japanese statisticians. We use R on these super-computers and have made several parallel computing functions available on it. These include parallel BLAS replacements such as ATLAS and the GOTO library, and the snow package for implementing parallel R functions using MPI and other distributed computing techniques. We have also implemented a Web environment that makes parallel R easy to use.

Materials (PDF)

"Visualization for data mining" by Brian Ripley (Univ. of Oxford, UK)

Date: Thursday, December 7, 2006, 10:30 - 11:30 (may be extended until 12:00)

Venue: The Institute of Statistical Mathematics, New Building 2F, Training Room

Abstract:

The talk will start with an introduction to data mining, and the types of data and questions that occur in that field. It will then consider ways to visualize such high-dimensional datasets via, for example, projection pursuit and multidimensional scaling, and show some examples from real-world consulting problems using R and GGobi.

Tutorial of Bioconductor

Date: December 1, 2006, 13:30 - 17:00

Venue: The Institute of Statistical Mathematics, New Building 2F, Training Room

Speaker: Eun-kyung Lee (Seoul National University, Korea)

Program:

13:30 - 15:00 Part 1 : Overview of the Bioconductor project

15:30 - 17:00 Part 2 : Statistical analysis using Bioconductor

Abstract:

Introduction to Bioconductor

Bioconductor is an open-source and open-development software project for the analysis of biomedical and genomic data. The project repository has many packages for the analysis of genomic data, e.g. data I/O, image preprocessing, data normalization, pathway analysis, data visualization, meta-analysis, annotation facilities (MIAME, GenBank, LocusLink, UniGene, etc.), reference crawling over PubMed and GO, design of experiments, gene selection, classification, clustering, ROC curves, Cox regression and much more. In this talk, we will first browse the Bioconductor project. After that, we will talk about statistical issues in genomic data.

Part 1: Overview of the Bioconductor project

In this talk, we will discuss what the Bioconductor project is, focusing on Bioconductor software design. We will also talk about vignettes and Sweave, a special documentation paradigm in Bioconductor. We will browse Bioconductor packages.

Materials (PDF)

Part 2: Statistical analysis using Bioconductor

In this talk, we will focus on statistical issues in analyzing genomic data. Using a real example, we will discuss how to analyze genomic (microarray) data using packages in Bioconductor, and the statistical issues in each step of the analysis.

Materials (PDF)

GGobi : Interactive and dynamic data visualization system

Date: November 30, 2006, 10:30 - 12:00

Venue: The Institute of Statistical Mathematics, New Building 2F, Training Room

Speaker: Eun-kyung Lee (Seoul National University, Korea)

Abstract:

GGobi is an open source visualization program for exploring high-dimensional data. It provides dynamic and interactive graphics such as tours, linked brushing and identification. Recently, a guided tour for exploratory supervised classification has been added. We will discuss the main features of GGobi and focus on guided tours using projection pursuit indices.

Materials (PDF)

SC06

2006.11.11~2006.11.17 Exhibited a booth at the supercomputing conference SC2006.

Photos of the booth exhibit