1. Tech mining using open source tools -- 1.1 Why this book -- 1.2 Who would be interested -- 1.3 The state of play -- 1.4 What comes next
Text of Note
2. Python installation -- 2.1 Scripts, data, and examples -- 2.2 Different versions of Python -- 2.3 Installing Python -- 2.4 Development environment -- 2.5 Packages
3. Python basics for text mining -- 3.1 Input, strings, and output -- 3.2 Data structures -- 3.3 Compound data structures
4. Sources of science and technology information -- 4.1 Collecting and downloading the data -- 4.2 Altmetrics and the supply and demand for knowledge -- 4.3 Examples used in the text
5. Parsing collected data -- 5.1 Reading column-structured data -- 5.2 Reading row-structured data -- 5.3 Adapting the parsers for new databases -- 5.4 Reading and parsing from a directory -- 5.5 Reading and printing a JSON dictionary of dictionaries
6. Parsing tree-structured files -- 6.1 Reading an XML file -- 6.2 Web scraping using BeautifulSoup -- 6.3 Mining content from PDF files
7. Extracting and reporting on text -- 7.1 Splitting JSONs on an attribute -- 7.2 Making a counter -- 7.3 Making simple reports from the data -- 7.4 Making dictionaries of the data -- 7.5 Counting words in documents
8. Indexing and tabulating the data -- 8.1 Creating a partial index of the data -- 8.2 Making dataframes -- 8.3 Creating cross-tabs -- 8.4 Reporting on dataframes
Conclusions -- References -- Index.
SUMMARY OR ABSTRACT
This book offers practical Python tools that help students of innovation and competitive intelligence professionals track new developments in science, technology, and innovation. The book will appeal to both tech-mining and data science audiences. For tech-mining audiences, Python presents an appealing, all-in-one language for managing the tech-mining process. The book complements other introductory books on the Python language, providing recipes with which a practitioner can grow a practice of mining text. For data science audiences, this book gives a succinct overview of the most useful techniques of text mining. The book also provides relevant domain knowledge from engineering management, so that an appropriate context for analysis can be created. This is the first book of a two-book series: it discusses the mining of text, while the second describes the analysis of text. This book describes how to extract actionable intelligence from a variety of sources, including scientific articles, patents, PDFs, and web pages. A variety of tools are available within Python for mining text; in particular, we discuss the use of pandas, BeautifulSoup, and pdfminer.