This book explores the use of generative Large Language Models (LLMs) for extracting and analyzing data from natural language artefacts. Rather than focusing on more familiar applications of LLMs, such as translation and summarization, it shows how these models can convert unstructured text into structured data that can be fed into a data science pipeline to generate actionable insights.
The content is designed for professionals in diverse fields, including cognitive science, linguistics, management, and information systems. Drawing on insights from both industry and academia, it provides a comprehensive understanding of how LLMs can be used effectively for natural language analytics (NLA). The book details practical methods for running LLMs locally with open-source tools, which preserves data privacy and keeps the approach feasible without expensive infrastructure.
Key topics include interpretant analysis, mindset analysis, and cultural analysis, with an emphasis on using LLMs to derive soft data: qualitative information that is crucial for nuanced decision-making. The text also covers the technical aspects of LLMs, including their architecture, token embeddings, and the differences between encoder-based and decoder-based models. Through a case study and practical examples, the authors show how LLMs can meet a range of analytical needs, making this book a valuable resource for anyone looking to integrate advanced natural language processing techniques into their data analysis workflows.