Natural Language Processing with Python

Publisher: Southeast University Press
Publication date: June 2010
ISBN: 9787564122614
Authors: Bird (UK), Klein (UK), Loper (US)
Pages: 479

Excerpt

Back in elementary school you learned the difference between nouns, verbs, adjectives, and adverbs. These "word classes" are not just the idle invention of grammarians, but are useful categories for many language processing tasks. As we will see, they arise from simple analysis of the distribution of words in text. The goal of this chapter is to answer the following questions:

1. What are lexical categories, and how are they used in natural language processing?
2. What is a good Python data structure for storing words and their categories?
3. How can we automatically tag each word of a text with its word class?

Along the way, we'll cover some fundamental techniques in NLP, including sequence labeling, n-gram models, backoff, and evaluation. These techniques are useful in many areas, and tagging gives us a simple context in which to present them. We will also see how tagging is the second step in the typical NLP pipeline, following tokenization.

The process of classifying words into their parts-of-speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts-of-speech are also known as word classes or lexical categories. The collection of tags used for a particular task is known as a tagset. Our emphasis in this chapter is on exploiting tags, and tagging text automatically.
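The excerpt names a dictionary from words to tags, automatic tagging, backoff, and evaluation. As a rough illustration only, here is a minimal pure-Python sketch of a unigram tagger (each word gets the tag it received most often in training) that backs off to a default tag for unseen words, plus a token-level accuracy score. It does not use NLTK's tagger classes; the toy training sentences (TRAIN), the default tag "NN", and the helper names train_unigram, tag, and accuracy are hypothetical choices made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: sentences as lists of (word, tag) pairs.
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]

def train_unigram(tagged_sents):
    """Build a dict mapping each word to its most frequent tag in training."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag_ in sent:
            counts[word][tag_] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag(words, model, default="NN"):
    """Tag each word with the unigram model, backing off to a default tag."""
    return [(w, model.get(w, default)) for w in words]

def accuracy(tagged_sents, model, default="NN"):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    total = correct = 0
    for sent in tagged_sents:
        words = [w for w, _ in sent]
        for (_, gold), (_, pred) in zip(sent, tag(words, model, default)):
            total += 1
            correct += int(gold == pred)
    return correct / total if total else 0.0

model = train_unigram(TRAIN)
# "loudly" never appears in TRAIN, so it falls through to the default tag.
print(tag(["the", "cat", "barks", "loudly"], model))
```

The backoff step is simply the `default` argument of `dict.get`: when the more specific model has no answer, a cruder one supplies it. NLTK expresses the same idea by chaining tagger objects (for example a `UnigramTagger` constructed with `backoff=DefaultTagger('NN')`); consult the NLTK documentation for the exact constructor arguments in your version.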

Preface

This is a book about Natural Language Processing. By "natural language" we mean a language that is used for everyday communication by humans; languages such as English, Hindi, or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages have evolved as they pass from generation to generation, and are hard to pin down with explicit rules. We will take Natural Language Processing (or NLP for short) in a wide sense to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles. At the other extreme, NLP involves "understanding" complete human utterances, at least to the extent of being able to give useful responses to them.

Technologies based on NLP are becoming increasingly widespread. For example, phones and handheld computers support predictive text and handwriting recognition; web search engines give access to information locked up in unstructured text; machine translation allows us to retrieve texts written in Chinese and read them in Spanish. By providing more natural human-machine interfaces, and more sophisticated access to stored information, language processing has come to play a central role in the multilingual information society.

This book provides a highly accessible introduction to the field of NLP. It can be used for individual study or as the textbook for a course on natural language processing or computational linguistics, or as a supplement to courses in artificial intelligence, text mining, or corpus linguistics. The book is intensely practical, containing hundreds of fully worked examples and graded exercises.
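The preface's simplest example of NLP, counting word frequencies to compare writing styles, can be sketched with the Python standard library alone. This is a minimal illustration, not the book's code: the two sample sentences, the crude lowercase-letters-only regex tokenizer, and the helper names word_freqs and relative_freq are assumptions made for this example (the book itself does this kind of counting with NLTK's FreqDist).

```python
import re
from collections import Counter

def word_freqs(text):
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)

def relative_freq(counts, word):
    """Share of all tokens that are `word` (0.0 for an empty text)."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

# Hypothetical sample texts standing in for two writers' styles.
text_a = "The cat sat on the mat. The dog sat on the rug."
text_b = "A bird sang, and a bee hummed, and a breeze blew."

freq_a = word_freqs(text_a)
freq_b = word_freqs(text_b)

# Compare how heavily each text leans on a few common function words.
for w in ("the", "and", "a"):
    print(w, round(relative_freq(freq_a, w), 3), round(relative_freq(freq_b, w), 3))
```

Relative rather than raw counts are compared so that texts of different lengths remain comparable; real stylistic comparisons would use much longer texts and a proper tokenizer.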

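Media Reviews

"Rarely does a book with such clear methods and such clean code take on so difficult a computing problem... This is an excellent introduction to natural language processing." (Ken Getz, Senior Consultant, MCW Technologies)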

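About the Authors

Steven Bird is an Associate Professor in the Department of Computer Science and Software Engineering at the University of Melbourne, and a Senior Research Associate at the Linguistic Data Consortium, University of Pennsylvania.

Ewan Klein is Professor of Language Technology in the School of Informatics at the University of Edinburgh.

Edward Loper recently completed a PhD on machine learning for natural language processing at the University of Pennsylvania, and is now a researcher at BBN Technologies in Boston.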

Table of Contents

Preface

1. Language Processing and Python

1.1 Computing with Language: Texts and Words

1.2 A Closer Look at Python: Texts as Lists of Words

1.3 Computing with Language: Simple Statistics

1.4 Back to Python: Making Decisions and Taking Control

1.5 Automatic Natural Language Understanding

1.6 Summary

1.7 Further Reading

1.8 Exercises

2. Accessing Text Corpora and Lexical Resources

2.1 Accessing Text Corpora

2.2 Conditional Frequency Distributions

2.3 More Python: Reusing Code

2.4 Lexical Resources

2.5 WordNet

2.6 Summary

2.7 Further Reading

2.8 Exercises

3. Processing Raw Text

3.1 Accessing Text from the Web and from Disk

3.2 Strings: Text Processing at the Lowest Level

3.3 Text Processing with Unicode

3.4 Regular Expressions for Detecting Word Patterns

3.5 Useful Applications of Regular Expressions

3.6 Normalizing Text

3.7 Regular Expressions for Tokenizing Text

3.8 Segmentation

3.9 Formatting: From Lists to Strings

3.10 Summary

3.11 Further Reading

3.12 Exercises

4. Writing Structured Programs

4.1 Back to the Basics

4.2 Sequences

4.3 Questions of Style

4.4 Functions: The Foundation of Structured Programming

4.5 Doing More with Functions

4.6 Program Development

4.7 Algorithm Design

4.8 A Sample of Python Libraries

4.9 Summary

4.10 Further Reading

4.11 Exercises

5. Categorizing and Tagging Words

5.1 Using a Tagger

5.2 Tagged Corpora

5.3 Mapping Words to Properties Using Python Dictionaries

5.4 Automatic Tagging

5.5 N-Gram Tagging

5.6 Transformation-Based Tagging

5.7 How to Determine the Category of a Word

5.8 Summary

5.9 Further Reading

5.10 Exercises

6. Learning to Classify Text

6.1 Supervised Classification

6.2 Further Examples of Supervised Classification

6.3 Evaluation

6.4 Decision Trees

6.5 Naive Bayes Classifiers

6.6 Maximum Entropy Classifiers

6.7 Modeling Linguistic Patterns

6.8 Summary

6.9 Further Reading

6.10 Exercises

7. Extracting Information from Text

7.1 Information Extraction

7.2 Chunking

7.3 Developing and Evaluating Chunkers

7.4 Recursion in Linguistic Structure

7.5 Named Entity Recognition

7.6 Relation Extraction

7.7 Summary

7.8 Further Reading

7.9 Exercises

8. Analyzing Sentence Structure

8.1 Some Grammatical Dilemmas

8.2 What's the Use of Syntax?

8.3 Context-Free Grammar

8.4 Parsing with Context-Free Grammar

8.5 Dependencies and Dependency Grammar

8.6 Grammar Development

8.7 Summary

8.8 Further Reading

8.9 Exercises

9. Building Feature-Based Grammars

9.1 Grammatical Features

9.2 Processing Feature Structures

9.3 Extending a Feature-Based Grammar

9.4 Summary

9.5 Further Reading

9.6 Exercises

10. Analyzing the Meaning of Sentences

10.1 Natural Language Understanding

10.2 Propositional Logic

10.3 First-Order Logic

10.4 The Semantics of English Sentences

10.5 Discourse Semantics

10.6 Summary

10.7 Further Reading

10.8 Exercises

11. Managing Linguistic Data

11.1 Corpus Structure: A Case Study

11.2 The Life Cycle of a Corpus

11.3 Acquiring Data

11.4 Working with XML

11.5 Working with Toolbox Data

11.6 Describing Language Resources Using OLAC Metadata

11.7 Summary

11.8 Further Reading

11.9 Exercises

Afterword: The Language Challenge

Bibliography

NLTK Index

General Index

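Summary

Natural Language Processing with Python (reprint edition) offers a highly accessible introduction to natural language processing, a field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. With this book, you will learn to write Python programs that work with large collections of unstructured text. You will access richly annotated datasets using comprehensive linguistic data structures, and you will understand the main algorithms for analyzing the content and structure of written communication.

Packed with examples and exercises, Natural Language Processing with Python will help you:

Extract information from unstructured text, either to guess the topic or to identify "named entities";

Analyze the linguistic structure of text, including parsing and semantic analysis;

Access popular linguistic databases, including WordNet and treebanks;

Integrate techniques drawn from fields as diverse as linguistics and artificial intelligence.

This book will help you gain practical skills in natural language processing using the Python programming language and the Natural Language Toolkit (NLTK). If you're interested in developing web applications, analyzing multilingual news sources, or documenting endangered languages, or if you're simply curious to see how human language works from a programmer's perspective, you'll find Natural Language Processing with Python both fascinating and immensely useful.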

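Featured Reviews (3 in total)

My current research area is NLP. Because I had no background in it (my math is especially weak), learning it felt full of obstacles. Later I heard that Python is well suited to NLP work, so I learned the basics of Python, then found the Chinese translation of this book, printed it out, and read it carefully. Its strengths: lots of examples (though none that deal with Chinese), a clear structure, and broad coverage (it touches on all the basic areas of NLP). With its help I now feel I'm slowly getting started, and I've developed a bit of a liking for NLP. I hope that liking keeps growing.
A good introduction to NLTK. It is basically a partial collection of the official documentation (aimed at beginners, so a lot is left out), but it adds plenty of material on the Python language itself, which makes it very friendly to readers without a computer science background. Once Python's problems with Chinese text are sorted out, it's a real pleasure to use. Python 3 seems to have fixed them, but unfortunately many algorithm libraries have no Python 3 version, so I'm still stuck with all sorts of hacks and conversions, dutifully using Python 2...
That said, the book also spends some pages interleaving explanations of the most basic Python programming techniques, which is hard to understand: anyone reading a book like this surely already knows some Python. In that respect its target audience is a bit unclear. Overall it's still very good. Why won't this post? It complains that my review is too short. Isn't 150 characters long enough?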

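Featured Short Comments (38 in total)

A must-read for NLTK. Very detailed, with exercises after every section.
Fairly superficial.
A great book! It walks you through everything step by step.
I really should have finished reading it over the summer break...
[Skimmed] Read two chapters; it's best read while working through the code. Zhang Fangzhou also read it: works well as a handbook.
NLTK combined with hands-on practice is great, though its extensibility is somewhat limited. Still an essential introduction to natural language processing.
Covers English only; there seems to be nothing on Chinese.
Introduces a tool, NLTK.
A very detailed introduction to Python's natural language processing capabilities, covering how to use NLTK as well as advanced usage. Very practical!
Currently reading.
Lots of Python, far too little NLP theory...
A superb introductory book! Especially suited to someone like me who's hopeless at math. All hail NLTK, and Zen Bot's chatting ability is something else. Free Love!
An introductory book for getting to know NLTK; I used it for a course project.
Very useful for data analysis.
Read it once, but my understanding is still shallow; treat it as a first look at the framework.
Highly recommended.
Being able to write a book like this is already quite an achievement.
Mainly read the parts on text processing. The book uses a library called NLTK.
nltk is a good tool. You can train your own models, but the book covers this only briefly, so you need to study the documentation carefully. P.S. I once built a small Chinese word-segmentation tool using unigrams trained on a large corpus, and its accuracy was quite high. Machine logic has its limits; I think some of the problems come from not studying the Chinese language itself deeply enough.
The bible for doing NLP in Python, mainly about nltk. There now seems to be a Chinese edition as well.
Only scratches the surface.
The book is full of hands-on practice; learning the theory while backing it up with practice really does work surprisingly well.
Mediocre. It only gives you a rough idea; afterwards you end up reading the official API docs and tutorials.
Verdict: this should be called "An Introductory Guide to NLTK". That's all.
An nltk handbook; not much real substance.
Basically an nltk user manual... the title is somewhat misleading.
It should be renamed "An Introductory Guide to NLTK".
At best it works as a simplified idiot's guide to nltk, and I can't follow the arcane material toward the end. Mysterious (so I don't dare call it bad).
Surprisingly, this was my introductory textbook for learning Python.
The first few chapters are still about Python, so skip them; jump to Chapter 7, where the NLP begins. It counts as an introductory read.
It's just an introduction to nltk orz
A fellow reader's reading notes: http://www.cnblogs.com/yuxc/category/307122.html
Not much useful content.
Better to read the latest official documentation; the code in this book hasn't been updated.
About how to use nltk.
A good book, but this guy ruined it with his translation.
What a translation....
Currently reading.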
