AutoClass
Written by Rizki Noor Hidayat Wijaya

AutoClass is an unsupervised Bayesian classification system that seeks a maximum posterior probability classification. Key features:

  • determines the number of classes automatically;
  • can use mixed discrete and real valued data;
  • can handle missing values;
  • processing time is roughly linear in the amount of the data;
  • cases have probabilistic class membership;
  • allows correlation between attributes within a class;
  • generates reports describing the classes found; and
  • predicts "test" case class memberships from a "training" classification.

Inputs consist of a database of attribute vectors (cases), either real or discrete valued, and a class model. Default class models are provided. AutoClass finds the set of classes that is maximally probable with respect to the data and model. The output is a set of class descriptions and the partial membership of the cases in the classes. For more details see "Bayesian Classification (AutoClass): Theory and Results" (kdd-95.ps in ~/autoclass-c/doc/) and "Bayesian Classification Theory" (tr-fia-90-12-7-01.ps in ~/autoclass-c/doc/). A list of references is included below.
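The idea of partial (probabilistic) class membership can be illustrated with a small sketch. This is not AutoClass code and does not reproduce its class models or its search. It assumes a toy mixture in which every attribute is real valued and modelled by an independent Gaussian within each class, with class weights, means, and standard deviations already known. The names `gaussian_pdf` and `class_memberships` are hypothetical, and skipping a missing value is a simplification chosen for this example only.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def class_memberships(case, classes):
    """Return the posterior probability of each class for one case.

    case:    list of real attribute values; None marks a missing value.
    classes: list of dicts with keys 'weight', 'means', 'stds'
             (illustrative structure, not an AutoClass file format).
    """
    joint = []
    for c in classes:
        p = c["weight"]  # prior probability of the class
        for x, mean, std in zip(case, c["means"], c["stds"]):
            if x is None:
                continue  # simplification: a missing value adds no evidence
            p *= gaussian_pdf(x, mean, std)
        joint.append(p)
    total = sum(joint)
    # Normalise so the memberships of one case sum to 1 (Bayes' rule).
    return [p / total for p in joint]


# Two hypothetical classes over two real-valued attributes.
classes = [
    {"weight": 0.5, "means": [0.0, 0.0], "stds": [1.0, 1.0]},
    {"weight": 0.5, "means": [4.0, 4.0], "stds": [1.0, 1.0]},
]

halfway = class_memberships([2.0, 2.0], classes)    # equidistant from both classes
partial = class_memberships([0.5, None], classes)   # second attribute missing
print(halfway)
print(partial)
```

A case lying exactly between the two class centres is shared equally between them, while a case close to one centre is assigned mostly, but never entirely, to that class.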

WHAT IS AUTOCLASS III:

AutoClass III, programmed in Common Lisp, is the official released implementation of AutoClass available from COSMIC (NASA's software distribution agency):

  COSMIC
  University of Georgia
  382 East Broad Street
  Athens, GA 30602 USA
  voice: (706) 542-3265
  fax: (706) 542-4807
  telex: 41-190 UGA IRC ATHENS
  e-mail: cosmic@uga.bitnet or service@cossack.cosmic.uga.edu

Request "AutoClass III - Automatic Class Discovery from Data (ARC-13180)".

WHAT IS AUTOCLASS X:

AutoClass X is an experimental extension to AutoClass III, available only domestically, by means of a non-disclosure agreement. It implements hierarchical classification where attributes are associated with appropriate levels of the class hierarchy. The search methodology is currently in development. It is implemented in Common Lisp. Contact Will Taylor ( ).

WHAT IS AUTOCLASS C:

AutoClass C is a publicly available implementation of AutoClass III, with some improvements from AutoClass X, written in C. It was programmed by Dr. Diane Cook ( ) and Joseph Potts ( ) of the University of Texas at Arlington. Will Taylor ( ) "productized" the software through extensive testing, the addition of sample databases, and reworking of the user documentation.
