The paper discusses the importance of openness, transparency, and accountability in instruction-tuned text generators. It notes the growing number of text generators billed as ‘open source’ and questions how open they really are. The authors provide a comprehensive table that rates the openness of various projects on factors such as availability, documentation, and access. They emphasize the benefits of open alternatives, including reproducible workflows and reduced reliance on proprietary software. The paper also identifies recurring patterns in the landscape of instruction-tuned text generators, such as the lack of open data and the rise of synthetic instruction-tuning data. The conclusion argues that while openness does not solve every problem posed by text generators, it enables original research and fosters a culture of accountability.
Signal | Change | 10-year horizon | Driving force |
---|---|---|---|
Growing number of instruction-tuned text generators billing themselves as ‘open source’ | More text generators claiming openness and transparency | More open and accountable text generators | Desire for reproducible workflows and accountability |
Openness is key for fundamental research, critical computational literacy, and informed choices | Importance of openness in research and education | Greater emphasis on openness and transparency | Desire for cumulative progress and informed decision-making |
More than 15 ChatGPT alternatives with varying degrees of openness, development, and documentation | Emergence of alternative text generators | Increased availability and diversity of text generators | Desire for more options and alternatives |
Projects inherit data of dubious legality | Questionable data sources | Greater scrutiny and legal compliance in data collection | Need for ethical and legal considerations |
Few projects share their instruction-tuning data | Limited sharing of instruction-tuning data | More sharing and collaboration in instruction tuning | Desire for transparency and reproducibility |
Synthetic instruction-tuning data is on the rise | Increased use of synthetic data | Research on the consequences and implications of synthetic data | Exploration of new data generation methods |
Openness enables reproducible workflows and understanding of LLM + RLHF architectures | Facilitation of reproducibility and understanding | Advancements in reproducibility and architecture understanding | Desire for progress and transparency |
Openness enables checks and balances and fosters accountability | Promotion of accountability and transparency | Culture of accountability and responsible deployment | Desire for responsible and ethical AI development |