Data mining can be defined as the process of selection, exploration and modelling of large databases, in order to discover models and patterns. The increasing availability of data in the current information society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract such knowledge from data. Applications occur in many different fields, including statistics, computer science, machine learning, economics, marketing and finance. This book is the first to describe applied data mining methods in a consistent statistical framework, and then show how they can be applied in practice. All the methods described are either computational, or of a statistical modelling nature. Complex probabilistic models and mathematical tools are not used, so the book is accessible to a wide audience of students and industry professionals. The second half of the book consists of nine case studies, taken from the author's own work in industry, that demonstrate how the methods described can be applied to real problems. 
Provides a solid introduction to applied data mining methods in a consistent statistical framework. Includes coverage of classical, multivariate and Bayesian statistical methodology. Includes many recent developments such as web mining, sequential Bayesian analysis and memory-based reasoning. Each statistical method described is illustrated with real-life applications. Features a number of detailed case studies based on applied projects within industry. Incorporates discussion of software used in data mining, with particular emphasis on SAS. Supported by a website featuring data sets, software and additional material. Includes an extensive bibliography and pointers to further reading within the text. The author has many years' experience teaching introductory and multivariate statistics and data mining, and working on applied projects within industry. A valuable resource for advanced undergraduate and graduate students of applied statistics, data mining, computer science and economics, as well as for professionals working in industry on projects involving large volumes of data, such as in marketing or financial risk management.
With the advent of the Web and the unprecedented amount of information available in electronic format, conceptual data analysis is more useful and practical than ever, because it addresses important limitations of the systems that currently support users in their quest for information. Concept Data Analysis: Theory & Applications is the first book to provide a comprehensive treatment of the full range of algorithms available for conceptual data analysis, spanning the creation, maintenance, display and manipulation of concept lattices. The accompanying website lets you gain a greater understanding of the principles covered in the book by actively working on the topics discussed. The three main areas explored are interactive mining of documents or collections of documents (including Web documents), automatic text ranking, and rule mining from structured data. The potential of conceptual data analysis in these application areas is further illustrated by two detailed case studies. Concept Data Analysis: Theory & Applications is essential for researchers active in information processing and management, and for industry practitioners who are interested in creating a commercial product for conceptual data analysis or developing content management applications.
The purpose of this book is to give IT professionals a practical way to acquire the knowledge and expertise in data modeling that they need to function effectively. It begins with an overview of basic data modeling concepts, introduces the methods and techniques, provides a comprehensive case study to present the details of the data model components, covers the implementation of the data model with emphasis on quality components, and concludes with a presentation of a realistic approach to data modeling. It clearly describes how a generic data model is created to truly represent the enterprise's information requirements.
The new edition of the classic bestseller that launched the data warehousing industry covers new approaches and technologies, many of which have been pioneered by Inmon himself. In addition to explaining the fundamentals of data warehouse systems, the book covers new topics such as methods for handling unstructured data in a data warehouse and storing data across multiple storage media. Discusses the pros and cons of relational versus multidimensional design and how to measure return on investment in planning data warehouse projects. Covers advanced topics, including data monitoring and testing. Although the book includes an extra 100 pages' worth of valuable content, the price has actually been reduced from $65 to $55.
This is the first book to provide in-depth coverage of star schema aggregates used in dimensional modeling, from selection and design, to loading and usage, to specific tasks and deliverables for implementation projects. Covers the principles of aggregate schema design and the pros and cons of various types of commercial solutions for navigating and building aggregates. Discusses how to include aggregates in data warehouse development projects that focus on incremental development, iterative builds, and early data loads.
This book introduces the reader to methods of data mining on the web, including uncovering patterns in web content (classification, clustering, language processing), structure (graphs, hubs, metrics), and usage (modeling, sequence analysis, performance).
A goldmine of valuable tools for data modelers! Data modelers render raw data (names, addresses, and sales totals, for instance) into information such as customer profiles and seasonal buying patterns that can be used for making critical business decisions. This book brings together thirty of the most effective tools for solving common modeling problems. The author provides an example of each tool and describes what it is, why it is needed, and how it is generally used to model data for both databases and data warehouses, along with tips and warnings. Blank sample copies of all worksheets and checklists described are provided in an appendix. Companion Web site features updates on the latest tools and techniques, plus links to related sites offering automated tools.
Provides readers with the methods, algorithms, and means to perform text mining tasks. This book is devoted to the fundamentals of text mining using Perl, an open-source programming tool that is freely available via the Internet (www.perl.org). It covers mining ideas from several perspectives (statistics, data mining, linguistics, and information retrieval) and provides readers with the means to successfully complete text mining tasks on their own. The book begins with an introduction to regular expressions, a text pattern methodology, and quantitative text summaries, all of which are fundamental tools for analyzing text. Then, it builds upon this foundation to explore: probability and texts, including the bag-of-words model; information retrieval techniques such as the TF-IDF similarity measure; concordance lines and corpus linguistics; multivariate techniques such as correlation, principal components analysis, and clustering; and Perl modules, German, and permutation tests. Each chapter is devoted to a single key topic, and the author carefully and thoughtfully introduces mathematical concepts as they arise, allowing readers to learn as they go without having to refer to additional books. The inclusion of numerous exercises and worked-out examples further complements the book's student-friendly format. Practical Text Mining with Perl is ideal as a textbook for undergraduate and graduate courses in text mining and as a reference for a variety of professionals who are interested in extracting information from text documents.