In today's fast-paced digital landscape, the sheer volume of documents such as Data Privacy Policies and Terms of Service can be overwhelming, especially in sectors like education technology (EdTech).
Data privacy policies and terms of service are often complex and difficult to digest, especially for children and their guardians. Yet these documents are crucial: they provide the first glimpse into how vendors manage, protect, and use children's data. Transparency in these policies is essential for building trust; when companies clearly communicate their data practices, it reassures parents and educators that they prioritize the safety and privacy of young users. Conversely, a lack of transparency gives grounds to hold vendors accountable.
To tackle the challenge of reading and understanding such lengthy texts, we are exploring how Large Language Models (LLMs), such as OpenAI's GPT models, can streamline information extraction and analysis.
Our recent study evaluated the efficiency and reliability of OpenAI's APIs, including GPT-3.5 Turbo and GPT-4, in automating question-answering tasks over Data Privacy Policy (DPP) documents from various EdTech providers. We tested these models in several settings: direct API calls, LangChain, and Retrieval Augmented Generation (RAG) pipelines, as sketched below.
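To make that setup concrete, here is a minimal sketch of a RAG-style question-answering pass over a policy document using OpenAI's Python client. The model names, chunk size, file name, and example question are illustrative assumptions, not the exact configuration benchmarked in the paper.

```python
# Minimal RAG-style sketch: retrieve relevant policy chunks, then ask the model.
# Assumptions: openai>=1.0 and numpy installed, OPENAI_API_KEY set, and an
# illustrative policy file "dpp.txt"; models and parameters are placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def chunk_text(text: str, size: int = 1500) -> list[str]:
    """Split the policy into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(texts: list[str]) -> np.ndarray:
    """Embed a list of texts with OpenAI's embeddings endpoint."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])


policy = open("dpp.txt", encoding="utf-8").read()  # hypothetical DPP document
chunks = chunk_text(policy)
chunk_vecs = embed(chunks)

question = "Does the vendor share children's personal data with third parties?"
q_vec = embed([question])[0]

# Retrieve the three chunks most similar to the question (cosine similarity).
sims = chunk_vecs @ q_vec / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
context = "\n\n".join(chunks[i] for i in np.argsort(sims)[-3:])

answer = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Answer only from the provided policy excerpts."},
        {"role": "user", "content": f"Excerpts:\n{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)
```

The same question set can then be run repeatedly across many vendors' documents, which is where the speed of API-based analysis pays off.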
One key finding was that calling OpenAI's LLMs through their APIs can significantly speed up document analysis, especially when local GPU resources are limited. However, local deployments of quantized models, such as Llama-2, provide a valuable alternative for organizations concerned about data privacy. These models can handle long texts without compromising performance, showing that smaller models can still be effective.
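For organizations that prefer to keep documents on-premises, a locally run quantized model can answer the same questions without sending any text to an external API. The sketch below uses the llama-cpp-python bindings with a 4-bit GGUF build of Llama-2; the model file name, context window, and prompt are illustrative assumptions rather than the exact setup evaluated in the study.

```python
# Local question answering with a quantized Llama-2 model via llama-cpp-python.
# Assumptions: a GGUF model file downloaded locally (path is a placeholder)
# and the llama-cpp-python package installed; all parameters are illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical local 4-bit build
    n_ctx=4096,       # context window sized for long policy excerpts
    n_gpu_layers=0,   # CPU-only; raise this if a local GPU is available
)

policy_excerpt = open("dpp.txt", encoding="utf-8").read()[:8000]  # placeholder document
question = "For how long is children's data retained?"

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Answer strictly from the policy text provided."},
        {"role": "user", "content": f"Policy:\n{policy_excerpt}\n\nQuestion: {question}"},
    ],
    max_tokens=256,
    temperature=0.0,
)
print(out["choices"][0]["message"]["content"])
```

The trade-off is throughput: a local quantized model is slower than a hosted API, but the document never leaves the organization's infrastructure.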
Our research highlights the importance of choosing the right approach for document analysis, balancing efficiency with data governance. As EdTech continues to evolve, leveraging AI tools will be essential for improving how we manage and interpret critical information.
While the results are far from ideal, we see the potential of collaborating with such tools to efficiently navigate large, difficult-to-digest texts.
Reference: E. Filipovska, A. Mladenovska, M. Bajrami, J. Dobreva, V. Hillman, P. Lameski, and E. Zdravevski, "Benchmarking OpenAI's APIs and other Large Language Models for Repeatable and Efficient Question Answering Across Multiple Documents," in Preproceedings of the 19th Conference on Computer Science and Intelligence Systems (FedCSIS), pp. 107–117, 2024. Available online: https://annals-csis.org/proceedings/2024/pliks/3979.pdf