References

Chapter I: Introduction

  1. Dziri, N., Milton, S., Yu, M., Zaiane, O., & Reddy, S. (2022). On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models? Proceedings of NAACL 2022, 5271–5285.
  2. Vaccari, C., & Chadwick, A. (2020). Deepfakes and Disinformation: Exploring the Impact of Synthetic Political Video on Deception, Uncertainty, and Trust in News. Social Media + Society, 6(1).
  3. Zuccon, G., Koopman, B., & Shaik, R. (2023). ChatGPT Hallucinates when Attributing Answers. SIGIR-AP '23, 46–51.
  4. Gravel, J., D'Amours-Gravel, M., & Osmanlliu, E. (2023). Learning to Fake It: Limited Responses and Fabricated References Provided by ChatGPT for Medical Questions. Mayo Clinic Proceedings: Digital Health, 1(3), 226–234.
  5. Xu, Z., Jain, S., & Kankanhalli, M. (2024). Hallucination is Inevitable: An Innate Limitation of Large Language Models. arXiv:2401.11817.
  6. Yu, L., Cao, M., Cheung, J. C., & Dong, Y. (2024). Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations. Findings of EMNLP 2024, 7943–7956.
  7. Trask, A., Bluemke, E., Garfinkel, B., Ghezzou Cuervas-Mons, C., & Dafoe, A. (2020). Beyond Privacy Trade-offs with Structured Transparency. arXiv:2012.08347.
  8. Youssef, A., et al. (2023). Organizational Factors in Clinical Data Sharing for Artificial Intelligence in Health Care. JAMA Network Open, 6, e2348422.
  9. McMahan, H. B., Moore, E., Ramage, D., & Agüera y Arcas, B. (2016). Federated Learning of Deep Networks using Model Averaging. arXiv:1602.05629.
  10. Rieke, N., et al. (2020). The future of digital health with federated learning. npj Digital Medicine, 3(1).
  11. Nguyen, T. T., et al. (2024). A Survey of Machine Unlearning. arXiv:2209.02299.
  12. Amodei, D., et al. (2016). Concrete Problems in AI Safety. arXiv:1606.06565.
  13. Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and Machines, 30(3), 411–437.
  14. Ntoutsi, E., et al. (2020). Bias in data-driven artificial intelligence systems—An introductory survey. WIREs Data Mining and Knowledge Discovery, 10(3), e1356.
  15. Kaplan, J., et al. (2020). Scaling Laws for Neural Language Models. arXiv:2001.08361.
  16. Hoffmann, J., et al. (2022). Training Compute-Optimal Large Language Models. arXiv:2203.15556.
  17. Sutton, R. (2019). The Bitter Lesson. Incomplete Ideas (blog).
  18. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  19. Dunbar, R. I. M. (1993). Coevolution of neocortical size, group size and language in humans. Behavioral and Brain Sciences, 16(4), 681–735.
  20. Freeman, L. C. (1977). A set of measures of centrality based on betweenness. Sociometry, 35–41.
  21. Gabriel, I., et al. (2024). The Ethics of Advanced AI Assistants. arXiv:2404.16244.
  22. Mouton, C. A., Lucas, C., & Guest, E. (2023). The Operational Risks of AI in Large-Scale Biological Attacks: A Red-Team Approach. RAND Corporation.
  23. E. Schmidt and J. Rosenberg. 2014. How Google Works. Grand Central Publishing.
  24. M. Bhattacharyya, V. M. Miller, D. Bhattacharyya, and L. E. Miller. 2023. High rates of fabricated and inaccurate references in ChatGPT-generated medical content. Cureus, 15(5).
  25. T. Devriendt, M. Shabani, and P. Borry. 2021. Data sharing in biomedical sciences: A systematic review of incentives. Biopreservation and Biobanking, 19(3):219–227. PMID: 33926229
  26. L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. F. Christiano, J. Leike, and R. Lowe. 2022. Training language models to follow instructions with human feedback. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 27730–27744. Curran Associates, Inc.
  27. Z. Roozbahani. 2025. A review of methods for reducing hallucinations in generative artificial intelligence to enhance knowledge economy. Knowledge Economy Studies.
  28. M. A. Ahmad, I. Yaramis, and T. D. Roy. 2023. Creating trustworthy LLMs: Dealing with hallucinations in healthcare AI.
  29. P. Manakul, A. Liusie, and M. J. F. Gales. 2023. SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models.
  30. A. Trask, E. Bluemke, B. Garfinkel, C. G. Cuervas-Mons, and A. Dafoe. 2020. Beyond privacy trade-offs with structured transparency. CoRR, abs/2012.08347.
  31. A. Golodny. 2023. Senate AI forum focuses on intellectual property issues, 12.
  32. J. Jeong, B. L. Vey, A. Reddy, T. Kim, T. Santos, R. Correa, R. Dutt, M. Mosunjac, G. Oprea-Ilies, G. Smith, M. Woo, C. R. McAdams, M. S. Newell, I. Banerjee, J. Gichoya, and H. Trivedi. 2022. The Emory breast imaging dataset (EMBED): A racially diverse, granular dataset of 3.5M screening and diagnostic mammograms.
  33. P. D. Y. Trieu, C. Mello-Thoms, M. Barron, and S. Lewis. 2023. Look how far we have come: Breast cancer detection education on the international stage. Frontiers in Oncology, 12, 01.
  34. T. Onega, E. Beaber, B. Sprague, W. Barlow, J. Haas, A. Tosteson, M. Schnall, K. Armstrong, M. Schapira, B. Geller, D. Weaver, and E. Conant. 2014. Breast cancer screening in an era of personalized regimens: A conceptual model and National Cancer Institute initiative for risk-based and preference-based approaches at a population level. Cancer, 120, 10.
  35. A. Youssef, M. Ng, J. Long, T. Hernandez-Boussard, N. Shah, A. Miner, D. Larson, and C. Langlotz. 2023. Organizational factors in clinical data sharing for artificial intelligence in health care. JAMA Network Open, 6:e2348422, 12.
  36. Gould, J. (2015). Data sharing: Why it doesn't happen. Nature Jobs.
  37. D. Grybauskas. 2023. Will Twitter's new rate limits really stop scraping? Built In, July. Online article.
  38. M. O'Brien. 2025. Reddit sues AI company Perplexity and others for 'industrial-scale' scraping of user comments. Associated Press, October. Updated 4:41 PM GMT-4, October 22, 2025.
  39. P. Samuelson. 2023. Generative AI meets copyright. Science, 381(6654):158–161.
  40. M. M. Grynbaum and R. Mac. 2023. The Times sues OpenAI and Microsoft over A.I. use of copyrighted work. The New York Times, December.
  41. J. Patel. 2019. Bridging data silos using big data integration. International Journal of Database Management Systems, 11(3):01–06.
  42. K. Robison. 2024. OpenAI cofounder Ilya Sutskever says the way AI is built is about to change. The Verge, December. Accessed: December 31, 2024.
  43. H. B. McMahan, E. Moore, D. Ramage, and B. A. y Arcas. 2016. Federated learning of deep networks using model averaging. CoRR, abs/1602.05629.
  44. N. Rieke, J. Hancox, W. Li, F. Milletarì, H. R. Roth, S. Albarqouni, S. Bakas, M. N. Galtier, B. A. Landman, K. Maier-Hein, S. Ourselin, M. Sheller, R. M. Summers, A. Trask, D. Xu, M. Baust, and M. J. Cardoso. 2020. The future of digital health with federated learning. npj Digital Medicine, 3(1), September.
  45. I. Zhou, F. Tofigh, M. Piccardi, M. Abolhasan, D. Franklin, and J. Lipman. 2024. Secure multi-party computation for machine learning: A survey. IEEE Access, 12:53881–53899.
  46. A. Jadon and S. Kumar. 2023. Leveraging generative AI models for synthetic data generation in healthcare: Balancing research and privacy. In 2023 International Conference on Smart Applications, Communications and Networking (SmartNets), pages 1–4.
  47. Rubin, D. B. (1993). Statistical disclosure limitation. Journal of Official Statistics, 9(2), 461–468.
  48. A. C. Yao. 1982b. Protocols for secure computations. In FOCS 1982, pages 160–164.
  49. J. L. Lobo, S. Gil-Lopez, and J. Del Ser. 2023. The right to be forgotten in artificial intelligence: Issues, approaches, limitations and challenges. In 2023 IEEE Conference on Artificial Intelligence (CAI), pages 179–180. IEEE.
  50. T. Nguyen, T. T. Huynh, Z. Ren, P. L. Nguyen, A. W.-C. Liew, H. Yin, and Q. V. H. Nguyen. 2024. A survey of machine unlearning.
  51. D. Amodei, C. Olah, J. Steinhardt, P. Christiano, J. Schulman, and D. Mané. 2016. Concrete problems in AI safety.
  52. I. Gabriel. 2020. Artificial intelligence, values, and alignment. Minds and Machines, 30(3):411–437.
  53. C. A. Mouton, C. Lucas, and E. Guest. 2023. The Operational Risks of AI in Large-Scale Biological Attacks: A Red-Team Approach. RAND Corporation, Santa Monica, CA.
  54. Australian Government Department of Industry, Science and Resources. 2024. Australia's AI ethics principles. Accessed 12-09-2024.
  55. European Commission. 2024b. Germany AI strategy report. Accessed 12-09-2024.
  56. European Commission. 2024a. France AI strategy report. Accessed 12-09-2024.
  57. European Commission. 2020. White paper on artificial intelligence: A European approach to excellence and trust. Accessed 12-09-2024.
  58. IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. 2021. Ethically aligned design: A vision for prioritizing human well-being with artificial intelligence and autonomous systems. [Accessed 12-09-2024].
  59. Future of Life Institute. 2024. AI principles. [Accessed 12-09-2024].
  60. Access Now. 2023. The Toronto Declaration: Protecting the rights to equality and non-discrimination in machine learning systems. [Accessed 12-09-2024].
  61. AI Now Institute. 2024. Algorithmic accountability policy toolkit. [Accessed 12-09-2024].
  62. Montreal Declaration. 2024. Montreal declaration for a responsible development of artificial intelligence. [Accessed 12-09-2024].
  63. World Economic Forum. 2023. AI governance: A holistic approach to implement ethics into AI. [Accessed 12-09-2024].
  64. Harvard Berkman Klein Center for Internet & Society. 2020. Principled AI. [Accessed 12-09-2024].
  65. Alan Turing Institute. 2019. Understanding artificial intelligence ethics and safety. [Accessed 12-09-2024].
  66. Google. 2024. AI principles. [Accessed 12-09-2024].
  67. Microsoft. 2024. AI principles and approach. [Accessed 12-09-2024].
  68. IBM. 2018. IBM principles for trust and transparency. [Accessed 12-09-2024].
  69. G. Riesen. 2023. Imagine A World: What if global challenges led to more centralization? Audio podcast episode, sep. Accessed: 2025-01-03
  70. Q. Pope. 2023. AI is centralizing by default; let's not make it worse. Effective Altruism Forum, sep. Accessed: 2025-01-03.
  71. crispweed. 2024. The Alignment Trap: AI Safety as Path to Power. LessWrong, oct. Accessed: 2025-01-03.
  72. C. Summerfield, L. P. Argyle, M. Bakker, T. Collins, E. Durmus, T. Eloundou, I. Gabriel, D. Ganguli, K. Hackenburg, G. K. Hadfield, et al. 2025. The impact of advanced AI systems on democracy. Nature Human Behaviour, pages 1–11.
  73. E. Welle. 2025. Aligning those who align AI, one satirical website at a time. The Verge, September. Accessed: 2025-10-23.
  74. S. Samuel. 2024. "I lost trust": Why the OpenAI team in charge of safeguarding humanity imploded, May.
  75. C. Hu and K. Cai. 2024. Sep.
  76. J. Morales. 2024. Musk's concerns over Google DeepMind "AI dictatorship" revealed in emails from 2016 - communications released during the recent OpenAI court case, Nov.
  77. E. Ntoutsi, P. Fafalios, U. Gadiraju, V. Iosifidis, W. Nejdl, M.-E. Vidal, S. Ruggieri, F. Turini, S. Papadopoulos, E. Krasanakis, et al. 2020. Bias in data-driven artificial intelligence systems—an introductory survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(3):e1356.
  78. J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei. 2020. Scaling laws for neural language models. CoRR, abs/2001.08361.
  79. J. Hoffmann, S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. de Las Casas, L. A. Hendricks, J. Welbl, A. Clark, T. Hennigan, E. Noland, K. Millican, G. van den Driessche, B. Damoc, A. Guy, S. Osindero, K. Simonyan, E. Elsen, J. W. Rae, O. Vinyals, and L. Sifre. 2022. Training compute-optimal large language models.
  80. D. Mirvish. 2017. The Hathaway effect: How Anne gives Warren Buffett a rise, 3. Published March 2, 2011.
  81. A. Tong and M. Martina. 2024. Nov.
  82. R. I. Dunbar. 1993. Coevolution of neocortical size, group size and language in humans. Behavioral and brain sciences, 16(4):681–694.
  83. R. S. Burt. 1992. Structural Holes: The Social Structure of Competition. Harvard University Press.
  84. Burt, R. S. (2003). The social structure of competition. Networks in the Knowledge Economy, 13(2), 57–91.
  85. S. Russell and P. Norvig. 1995. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs.
  86. Y. LeCun, Y. Bengio, and G. Hinton. 2015. Deep learning. Nature, 521(7553):436–444.
  87. V. Berger. 2025. The AI copyright battle: Why OpenAI and Google are pushing for fair use. Forbes, March. Online article.
  88. Statista Market Insights. 2025. Advertising - worldwide. Market Outlook, most recent update: Aug 2025.
  89. E. L. Bernays. 1928. Propaganda. Ig Publishing.
  90. S. Adikari and K. Dutta. 2015. Real time bidding in online digital advertisement. In International Conference on Design Science Research in Information Systems, pages 19–38. Springer.

Chapter II: From Deep Learning to Deep Voting

  1. Kaplan, J., et al. (2020). Scaling Laws for Neural Language Models. arXiv:2001.08361.
  2. Hoffmann, J., et al. (2022). Training Compute-Optimal Large Language Models. arXiv:2203.15556.
  3. Robison, K. (2024). OpenAI cofounder Ilya Sutskever says the way AI is built is about to change. The Verge.
  4. Borgeaud, S., et al. (2022). Improving language models by retrieving from trillions of tokens. ICML 2022, 2206–2240.
  5. Izacard, G., et al. (2023). Atlas: Few-shot Learning with Retrieval Augmented Language Models. Journal of Machine Learning Research, 24, 1–43.
  6. Guo, Z., et al. (2023). Towards lossless dataset distillation via difficulty-aligned trajectory matching. arXiv:2310.05773.
  7. Epoch AI. (2024). Data on Notable AI Models. epoch.ai/data/notable-ai-models.
  8. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  9. Kemker, R., et al. (2018). Measuring Catastrophic Forgetting in Neural Networks. AAAI 2018.
  10. Cummins, M. (2024). How much LLM training data is there, in the limit? Educating Silicon.
  11. Le, Q. V., et al. (2013). Building high-level features using large scale unsupervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  12. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. NeurIPS 2012.
  13. Zeiler, M. D., & Fergus, R. (2014). Visualizing and Understanding Convolutional Networks. ECCV 2014.
  14. Chomsky, N. (2014). Aspects of the Theory of Syntax. MIT Press.
  15. Dwork, C., et al. (2006). Calibrating noise to sensitivity in private data analysis. TCC 2006.
  16. Abadi, M., et al. (2016). Deep Learning with Differential Privacy. CCS 2016.
  17. Feldman, V., & Zrnic, T. (2020). Individual Privacy Accounting via a Renyi Filter. arXiv:2008.11193.
  18. Papernot, N., et al. (2018). Scalable Private Learning with PATE. ICLR 2018.
  19. Zhao, Y., et al. (2018). Federated Learning with Non-IID Data. arXiv:1806.00582.
  20. Ainsworth, S. K., et al. (2022). Git Re-Basin: Merging Models modulo Permutation Symmetries. arXiv:2209.04836.
  21. Nguyen, T. T., et al. (2024). A Survey of Machine Unlearning. arXiv:2209.02299.
  22. Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780.
  23. J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei. 2020. Scaling laws for neural language models. CoRR, abs/2001.08361.
  24. J. Hoffmann, S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. de Las Casas, L. A. Hendricks, J. Welbl, A. Clark, T. Hennigan, E. Noland, K. Millican, G. van den Driessche, B. Damoc, A. Guy, S. Osindero, K. Simonyan, E. Elsen, J. W. Rae, O. Vinyals, and L. Sifre. 2022. Training compute-optimal large language models.
  25. F. Strati, P. Elvinger, T. Kerimoglu, and A. Klimovic. 2024. ML training with cloud GPU shortages: Is cross-region the answer? In Proceedings of the 4th Workshop on Machine Learning and Systems, EuroMLSys '24, pages 107–116, New York, NY, USA. Association for Computing Machinery.
  26. D. Kaye. 2025. Nvidia hits new milestone as world's first $5tn company, 10.
  27. Z. Kachwala and A. Bajwa. 2025. Nvidia faces revenue threat from new U.S. AI chip export curbs, analysts say. Reuters, January. Updated January 13, 2025 6:31 PM GMT.
  28. D. Howley. 2023. There's an AI war, and Nvidia is the only arms dealer: Analyst. Yahoo Finance, May. Updated May 25, 2023.
  29. J. You. 2025. Most of OpenAI's 2024 compute went to experiments. Accessed: 2025-11-02.
  30. S. Borgeaud, A. Mensch, J. Hoffmann, T. Cai, E. Rutherford, K. Millican, G. B. Van Den Driessche, J.-B. Lespiau, B. Damoc, A. Clark, et al. 2022. Improving language models by retrieving from trillions of tokens. In International conference on machine learning, pages 2206–2240. PMLR.
  31. J. Lederer. 2024. Statistical guarantees for sparse deep learning. AStA Advances in Statistical Analysis, 108(2):231–258.
  32. D. Dai, C. Deng, C. Zhao, R. Xu, H. Gao, D. Chen, J. Li, W. Zeng, X. Yu, Y. Wu, et al. 2024. DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models. arXiv preprint arXiv:2401.06066.
  33. Z. Guo, K. Wang, G. Cazenavette, H. Li, K. Zhang, and Y. You. 2023. Towards lossless dataset distillation via difficulty-aligned trajectory matching. arXiv preprint arXiv:2310.05773.
  34. Meta AI. 2024. Introducing Llama 3.1: Our most capable models to date. Meta AI Blog, July. Published July 23, 2024.
  35. T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
  36. K. Wiggers. 2024. OpenAI CEO Sam Altman says lack of compute capacity is delaying the company's products. TechCrunch, October. Published 12:37 PM PDT, October 31, 2024.
  37. Epoch AI. 2025. Data on AI models, 07. Accessed: 2025-11-02.
  38. I. Goodfellow, Y. Bengio, and A. Courville. 2016. Deep Learning. MIT Press.
  39. I. Mehta. 2024. Zuckerberg says Meta will need 10x more computing power to train Llama 4 than Llama 3. TechCrunch, aug. 12:53 AM PDT.
  40. OpenAI. 2024. Learning to reason with LLMs. sep. Introduces OpenAI o1, a new large language model trained with reinforcement learning for complex reasoning
  41. J. Sevilla and E. Roldán. 2024. Training compute of frontier AI models grows by 4-5x per year. Epoch AI Blog, may. Analysis of AI model compute trends showing 4-5x yearly growth from 2010 to 2024.
  42. Weights & Biases. 2025. Sweeps: An overview. Online tutorial for using W&B Sweeps for hyperparameter optimization.
  43. A. K. Veldanda, S.-X. Zhang, A. Das, S. Chakraborty, S. Rawls, S. Sahu, and M. Naphade. 2024. LLM surgery: Efficient knowledge unlearning and editing in large language models. arXiv preprint arXiv:2409.13054.
  44. I. Bratt. 2025. Why there is no AI without inference. WSJ Partner Content sponsored by Arm, Vice President of Machine-Learning Technology.
  45. S. Borgeaud, A. Mensch, J. Hoffmann, T. Cai, E. Rutherford, K. Millican, G. B. Van Den Driessche, J.-B. Lespiau, B. Damoc, A. Clark, et al. 2022. Improving language models by retrieving from trillions of tokens. In International conference on machine learning, pages 2206–2240. PMLR.
  46. G. Izacard, P. Lewis, M. Lomeli, L. Hosseini, F. Petroni, T. Schick, J. Dwivedi-Yu, A. Joulin, S. Riedel, and E. Grave. 2023. Atlas: Few-shot learning with retrieval augmented language models. Journal of Machine Learning Research, 24(251):1–43.
  47. J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526.
  48. R. Kemker, M. McClure, A. Abitino, T. Hayes, and C. Kanan. 2018. Measuring catastrophic forgetting in neural networks. In Proceedings of the AAAI conference on artificial intelligence, volume 32.
  49. K. Robison. 2024. OpenAI cofounder Ilya Sutskever says the way AI is built is about to change. The Verge, December. Accessed: December 31, 2024.
  50. Epoch AI. 2024. Data on machine learning hardware. Updated December 30, 2024.
  51. Together. 2023. Redpajama-data-v2: An open dataset with 30 trillion tokens for training large language models, 10. The dataset includes data processing tools for CommonCrawl data, focusing on five languages: English, French, Spanish, German, and Italian. It provides over 40 quality annotations for filtering and weighting data, including natural language indicators, repetitiveness measures, and content-based signals.
  52. A. S. Cummings. 2017. Democracy of sound: Music piracy and the remaking of American copyright in the twentieth century. Oxford University Press.
  53. M. Cummins. 2024. How much LLM training data is there, in the limit? Educating Silicon, May. A comprehensive analysis of available text data for LLM training, including web data, code, academic publications, books, court documents, social media, transcribed audio, and private communications. Estimates suggest current LLM training sets are approaching the limits of high-quality public text, with approximately 40-90T tokens available in English and 100-200T tokens across all languages.
  54. Wikipedia contributors. 2024. Common Crawl. Last edited on 30 December 2024, at 01:48 (UTC).
  55. B. Kahle. 2024. A message from Internet Archive founder, Brewster Kahle. Internet Archive donation page detailing the organization's mission, impact, and ways to support. The Archive hosts over 99 petabytes of data, including 625 billion webpages, 38 million texts, and 14 million audio recordings. Federal Tax ID: 94-3242767.
  56. D. Mider. 2024. Open source intelligence on the internet – categorisation and evaluation of search tools. Internal Security Review, 31:383–412.
  57. P. Taylor. 2024. Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028, 5. Accessed: December 31, 2024.
  58. T. Bogwasi. 2025. Business facts: Essential business statistics you should know in 2025.
  59. Q. V. Le. 2013. Building high-level features using large scale unsupervised learning. In 2013 IEEE international conference on acoustics, speech and signal processing, pages 8595–8598. IEEE.
  60. T. T. Nguyen, T. T. Huynh, Z. Ren, P. L. Nguyen, A. W.-C. Liew, H. Yin, and Q. V. H. Nguyen. 2024. A survey of machine unlearning.
  61. S. Hochreiter. 1998. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6(02):107–116.
  62. B. Hanin. 2018. Which neural net architectures give rise to exploding and vanishing gradients? Advances in neural information processing systems, 31.
  63. A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems, 25.
  64. C. Dwork, F. McSherry, K. Nissim, and A. Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3, pages 265–284. Springer.
  65. M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS'16. ACM, October.
  66. J. Soria-Comas, J. Domingo-Ferrer, D. Sánchez, and D. Megías. 2016. Individual differential privacy: A utility-preserving formulation of differential privacy guarantees. CoRR, abs/1612.02298.
  67. C. Dwork, A. Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407.
  68. O. Lieber, O. Sharir, B. Lenz, and Y. Shoham. 2021. Jurassic-1: Technical details and evaluation. White Paper. AI21 Labs.
  69. J. Rae, S. Borgeaud, T. Cai, K. Millican, J. Hoffmann, F. Song, J. Aslanides, S. Henderson, R. Ring, S. Young, E. Rutherford, T. Hennigan, J. Menick, A. Cassirer, R. Powell, G. van den Driessche, L. A. Hendricks, M. Rauh, P.-S. Huang, A. Glaese, J. Welbl, S. Dathathri, S. Huang, J. Uesato, J. Mellor, I. Higgins, A. Creswell, N. McAleese, A. Wu, E. Elsen, S. Jayakumar, E. Buchatskaya, D. Budden, E. Sutherland, K. Simonyan, M. Paganini,
  70. Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra. 2018. Federated learning with non-IID data.
  71. D. Grover and B. Toghi. 2018. MNIST dataset classification utilizing k-NN classifier with modified sliding-window metric. arXiv preprint arXiv:1809.06846.
  72. S. Borgeaud, A. Mensch, J. Hoffmann, T. Cai, E. Rutherford, K. Millican, G. B. Van Den Driessche, J.-B. Lespiau, B. Damoc, A. Clark, et al. 2022. Improving language models by retrieving from trillions of tokens. In International conference on machine learning, pages 2206–2240. PMLR.
  73. N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and Ú. Erlingsson. 2018. Scalable private learning with PATE.
  74. X. Hou and X. Wang. 2024. Large language model with federated retrieval-augmented generation for improved knowledge retrieval.
  75. S. K. Ainsworth, J. Hayase, and S. Srinivasa. 2022. Git re-basin: Merging models modulo permutation symmetries. arXiv preprint arXiv:2209.04836.
  76. Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra. 2018. Federated learning with non-IID data.
  77. A. Youssef, M. Ng, J. Long, T. Hernandez-Boussard, N. Shah, A. Miner, D. Larson, and C. Langlotz. 2023. Organizational factors in clinical data sharing for artificial intelligence in health care. JAMA Network Open, 6:e2348422, 12.
  78. Mironov, I. (2017). Rényi Differential Privacy. IEEE 30th Computer Security Foundations Symposium (CSF), 263–275.
  79. McSherry, F., & Talwar, K. (2007). Mechanism Design via Differential Privacy. 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 94–103.
  80. Kalai, A., & Vempala, S. (2005). Efficient Algorithms for Online Decision Problems. Journal of Computer and System Sciences, 71(3), 291–307.
  81. Freund, Y., & Schapire, R. E. (1997). A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55(1), 119–139.
  82. Phan, L., et al. (2025). Humanity's Last Exam. arXiv:2501.14249.

Chapter III: Network-Source AI

  1. Trask, A., Bluemke, E., Garfinkel, B., Ghezzou Cuervas-Mons, C., & Dafoe, A. (2020). Beyond Privacy Trade-offs with Structured Transparency. arXiv:2012.08347.
  2. Youssef, A., et al. (2023). Organizational Factors in Clinical Data Sharing for Artificial Intelligence in Health Care. JAMA Network Open, 6, e2348422.
  3. G. A. Kaissis, M. R. Makowski, D. Rückert, and R. F. Braren. (2020). Secure, privacy-preserving and federated machine learning in medical imaging. Nature Machine Intelligence, 2:305–311, 6.
  4. McMahan, H. B., et al. (2017). Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017.
  5. Gentry, C. (2009). Fully homomorphic encryption using ideal lattices. STOC 2009, 169–178.
  6. Costan, V., & Devadas, S. (2016). Intel SGX Explained. IACR Cryptology ePrint Archive, 2016/086.
  7. Yao, A. C. (1982). Protocols for Secure Computations. FOCS 1982, 160–164.
  8. Goldreich, O., Micali, S., & Wigderson, A. (1987). How to play ANY mental game. STOC 1987, 218–229.
  9. Dwork, C., & Roth, A. (2014). The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4), 211–407.
  10. Goldwasser, S., Micali, S., & Rackoff, C. (1989). The Knowledge Complexity of Interactive Proof Systems. SIAM Journal on Computing, 18(1), 186–208.
  11. Feige, U., Fiat, A., & Shamir, A. (1988). Zero-Knowledge Proofs of Identity. Journal of Cryptology, 1(2), 77–94.
  12. Laurie, B., Langley, A., & Kasper, E. (2014). Certificate Transparency. RFC 6962.
  13. Chase, M., et al. (2020). Signal: Private Group System. Signal Foundation.
  14. Shamir, A. (1979). How to Share a Secret. Communications of the ACM, 22(11), 612–613.
  15. Cummings, A. S. (2017). Democracy of Sound: Music Piracy and the Remaking of American Copyright in the Twentieth Century. Oxford University Press.
  16. Shu, K., et al. (2020). Combating Disinformation in a Social Media Age. WIREs Data Mining and Knowledge Discovery, 10(6), e1385.
  17. Schneier, B. (2015). Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World. W.W. Norton & Company.
  18. Véliz, C. (2020). Privacy Is Power: Why and How You Should Take Back Control of Your Data. Transworld Digital.
  19. A. Youssef, M. Ng, J. Long, T. Hernandez-Boussard, N. Shah, A. Miner, D. Larson, and C. Langlotz. 2023. Organizational factors in clinical data sharing for artificial intelligence in health care. JAMA Network Open, 6:e2348422, 12.
  20. B. Fecher, S. Friesike, and M. Hebing. 2015. What drives academic data sharing? PLOS ONE, 10:e0118053, 02.
  21. G. A. Ascoli. 2015. Sharing neuron data: carrots, sticks, and digital records. PLoS Biol, 13(10):e1002275.
  22. M. Sienkiewicz. 2025. From data silos to data mesh: a case study in financial data architecture. In International Conference on Database and Expert Systems Applications, pages 3–20. Springer.
  23. I. Scott and T. Gong. 2021. Coordinating government silos: challenges and opportunities. Global Public Policy and Governance, 1(1):20–38.
  24. N. Vidal. 2024. Compelling responses to NTIA's AI open model weights RFC. Open Source Initiative Blog, April. Accessed: 2025-11-04.
  25. National Telecommunications and Information Administration. 2024. NTIA AI open model weights RFC. Request for Comment, Docket No. NTIA-2023-0009, February. Comment period closed March 27, 2024.
  26. Associated Press. 2025. AI startup Anthropic agrees to pay $1.5bn to settle book piracy lawsuit. The Guardian, September. Settlement could be pivotal after authors claimed company took pirated copies of their work to train chatbots.
  27. Open Source Initiative. 2024. The open source ai definition – 1.0. Accessed: 2025-11-04.
  28. C. Owen-Jackson. 2024. Open source, open risks: The growing dangers of unregulated generative AI. IBM Think.
  29. J. Grow. 2025. The Zuckerberg-LeCun AI paradox: A tale of two visions. Medium, August. Accessed: 2025-11-04.
  30. I. Gabriel, A. Manzini, G. Keeling, L. A. Hendricks, V. Rieser, H. Iqbal, N. Tomasev, I. Ktena, Z. Kenton, M. Rodriguez, S. El-Sayed, S. Brown, C. Akbulut, A. Trask, E. Hughes, A. S. Bergman, R. Shelby, N. Marchal, C. Griffin, J. Mateos-Garcia, L. Weidinger, W. Street, B. Lange, A. Ingerman, A. Lentz, R. Enger, A. Barakat, V. Krakovna, J. O. Siy, Z. Kurth-Nelson, A. McCroskery, V. Bolina, H. Law, M. Shanahan, L. Alberts, B. Balle, S. de Haas, Y. Ibitoye, A. Dafoe, B. Goldberg, S. Krier, A. Reese, S. Witherspoon, W. Hawkins, M. Rauh, D. Wallace, M. Franklin, J. A. Goldstein, J. Lehman, M. Klenk, S. Vallor, C. Biles, M. R. Morris, H. King, B. A. y Arcas, W. Isaac, and J. Manyika. 2024. The ethics of advanced AI assistants.
  31. A. Wealand. 2025. Reducing bias in AI models through open source. Red Hat Blog, September. Accessed: 2025-11-04.
  32. F. Pasquale. 2015. The Black Box Society. Harvard University Press.
  33. G. A. Kaissis, M. R. Makowski, D. Rückert, and R. F. Braren. 2020. Secure, privacy-preserving and federated machine learning in medical imaging. Nature Machine Intelligence, 2:305–311, 6.
  34. J. Gould. 2015. Data sharing: Why it doesn't happen. Nature Jobs.
  35. M. M. Grynbaum and R. Mac. 2023. The Times sues OpenAI and Microsoft over A.I. use of copyrighted work. The New York Times, December.
  36. A. S. Cummings. 2017. Democracy of sound: Music piracy and the remaking of American copyright in the twentieth century. Oxford University Press.
  37. K. Shu, A. Bhattacharjee, F. Alatawi, T. H. Nazer, K. Ding, M. Karami, and H. Liu. 2020. Combating disinformation in a social media age. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(6):e1385.
  38. J. K. Elsea. 2006. The protection of classified information: The legal framework. Technical report.
  39. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. Agüera y Arcas. 2017. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282. PMLR.
  40. C. Gentry and D. Boneh. 2009. A fully homomorphic encryption scheme, volume 20. Stanford University.
  41. O. Goldreich. 1987. Towards a theory of software protection and simulation by oblivious RAMs. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, pages 182–194.
  42. D. Bogdanov, P. Laud, S. Laur, and P. Pullonen. 2014. From input private to universally composable secure multi-party computation primitives. In 2014 IEEE 27th Computer Security Foundations Symposium, pages 184–198. IEEE.
  43. M. Craddock, D. Archer, D. Bogdanov, A. Gascon, B. Balle, K. Laine, A. Trask, M. Raykova, M. Jug, R. McLellan, R. Jansen, O. Ohrimenko, S. Wardly, K. Lauter, N. Smart, A. Sharan, I. Saxena, R. Wright, E. Garcia, and A. Wall. 2018. UN Handbook on Privacy-Preserving Computation Techniques.
  44. C. Gentry and D. Boneh. 2009. A fully homomorphic encryption scheme, volume 20. Stanford University.
  45. D. Boneh, A. Sahai, and B. Waters. 2011. Functional encryption: Definitions and challenges. In Theory of Cryptography Conference, pages 253–273. Springer.
  46. C. Dwork, F. McSherry, K. Nissim, and A. Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3, pages 265–284. Springer.
  47. C. Dwork and A. Roth. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407.
  48. Sovrin. The Sovrin network and zero knowledge proofs. sovrin.org.
  49. F. Wang and P. De Filippi. 2020. Self-sovereign identity in a globalized world: Credentials-based identity systems as a driver for economic inclusion. Frontiers in Blockchain, 2, January.
  50. B. Laurie. 2014. Certificate transparency. Communications of the ACM, 57(10):40–46.
  51. M. Chase, T. Perrin, and G. Zaverucha. 2020. The signal private group system and anonymous credentials supporting efficient verifiable encryption. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 1445–1459.
  52. J. Loftus and N. P. Smart. 2011. Secure outsourced computation. In International Conference on Cryptology in Africa, pages 1–20. Springer.
  53. V. Costan and S. Devadas. 2016. Intel SGX explained. IACR Cryptol. ePrint Arch., 2016(86):1–118.
  54. D. Chaum. 1985. Security without identification: Transaction systems to make big brother obsolete. Commun. ACM, 28(10):1030–1044, October.
  55. S. Adler, Z. Hitzig, S. Jain, C. Brewer, V. Srivastava, B. Christian, and A. Trask. 2024. Personhood credentials: Artificial intelligence and the value of privacy-preserving tools to distinguish who is real online.

Chapter IV: Broad Listening

  1. Granovetter, M. S. (1973). The Strength of Weak Ties. American Journal of Sociology, 78(6), 1360–1380.
  2. Dunbar, R. I. M. (1993). Coevolution of neocortical size, group size and language in humans. Behavioral and Brain Sciences, 16(4), 681–735.
  3. Grybauskas, D. (2023). Will Twitter's new rate limits really stop scraping? Built In, July.
  4. O'Brien, M. (2025). Reddit sues AI company Perplexity and others for 'industrial-scale' scraping of user comments. Associated Press, October.
  5. Grynbaum, M. M. and Mac, R. (2023). The Times sues OpenAI and Microsoft over A.I. use of copyrighted work. The New York Times, December.
  6. Youssef, A., Ng, M., Long, J., Hernandez-Boussard, T., Shah, N., Miner, A., Larson, D., and Langlotz, C. (2023). Organizational factors in clinical data sharing for artificial intelligence in health care. JAMA Network Open, 6:e2348422.
  7. Rao, A., Spasojevic, N., Li, Z., and DSouza, T. (2015). Klout score: Measuring influence across multiple social networks. Proceedings of the IEEE International Conference on Big Data, 2282–2289.
  8. Mangalindan, J. P. (2014). Klout acquired for $200 million by Lithium Technologies. Fortune, March.
  9. Oremus, W. (2018). Klout is dead, just in time for Europe's GDPR privacy law. That's not a coincidence. Slate, May.
  10. Simmons, D. (2022). 17 countries with GDPR-like data privacy laws. January.
  11. Hootsuite. (2024). Digital 2024: Global Overview Report. Hootsuite & We Are Social.
  12. Imperva. (2024). 2024 Bad Bot Report. Imperva Threat Research. Reported via Forbes, April 2024.
  13. D'Onfro, J. (2013). A whopping 20% of Yelp reviews are fake. September.
  14. Cross, B. (2022). Up to 30% of online reviews are fake and most consumers can't tell the difference. November.
  15. Foreign, Commonwealth & Development Office. (2022). UK exposes sick Russian troll factory plaguing social media with Kremlin propaganda. UK Government Press Release, May.
  16. Lawrence, S., Giles, C. L., and Bollacker, K. (1999). Digital libraries and autonomous citation indexing. Computer, 32(6):67–71.
  17. Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab.
  18. Reagle, J. M. (2010). Good faith collaboration: The culture of Wikipedia. MIT Press.
  19. Goldhaber, M. H. (1997). The Attention Economy and the Net. First Monday, 2(4).
  20. Backstrom, L., et al. (2012). Four Degrees of Separation. ACM Web Science 2012, 33–42.
  21. Haveliwala, T. H. (2002). Topic-sensitive PageRank. Proceedings of the 11th International Conference on World Wide Web, 517–526.

Chapter V: Conclusion

The conclusion chapter synthesizes the thesis and contains no inline citations.

Appendix I

  1. Casson, L. (2001). Libraries in the Ancient World. Yale University Press.
  2. Galen. (c. 170 CE). Commentary on Hippocrates’ Epidemics.
  3. Canfora, L. (1989). The Vanished Library: A Wonder of the Ancient World. University of California Press.
  4. Kennedy, G. A. (2003). Progymnasmata: Greek Textbooks of Prose Composition and Rhetoric. Society of Biblical Literature.
  5. British Library / Smithsonian. The Diamond Sutra.
  6. Scott, J. C. (1998). Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. Yale University Press.
  7. Diderot, D., & d’Alembert, J. le R. (1751–1772). Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers.
  8. Stanford Encyclopedia of Philosophy. Denis Diderot.
  9. Wells, H. G. (1937). World Brain. Methuen & Co.
  10. Wikipedia. Wikipedia: Governance.
  11. Somers, J. (2017). Torching the Modern-Day Library of Alexandria. The Atlantic.
  12. Licklider, J. C. R. (1960). Man-Computer Symbiosis. IRE Transactions on Human Factors in Electronics, HFE-1(1), 4–11.
  13. Cortada, J. W. (2018). Change and Continuity at IBM: Key Themes in Histories of IBM. Business History Review, 92(1), 117–148.
  14. Licklider, J. C. R., & Taylor, R. W. (1968). The Computer as a Communication Device. Science and Technology, 76, 21–31.
  15. Simon, H. A. (1955). A Behavioral Model of Rational Choice. The Quarterly Journal of Economics, 69(1), 99–118.
  16. Simon, H. A. (1956). Rational Choice and the Structure of the Environment. Psychological Review, 63(2), 129–138.
  17. Simon, H. A. (1982). Models of Bounded Rationality. MIT Press.
  18. Hayek, F. A. (1945). The Use of Knowledge in Society. The American Economic Review, 35(4), 519–530.
  19. Sen, A. (1981). Poverty and Famines: An Essay on Entitlement and Deprivation. Oxford University Press.
  20. Anderson, B. (1983). Imagined Communities: Reflections on the Origin and Spread of Nationalism. Verso.
  21. Bernays, E. (1928). Propaganda. Horace Liveright.
  22. Tye, L. (1998). The Father of Spin: Edward L. Bernays and the Birth of Public Relations. Crown Publishers.
  23. Hootsuite. (2024). Digital 2024 Global Overview Report.
  24. CompaniesMarketCap. (2026). Largest Companies by Market Capitalization.
  25. Ostrom, E. (1990). Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge University Press.

Appendix II: Compute Distribution

  1. T. Mann. 2024. AI's rising tide lifts all chips as AMD Instinct, cloudy silicon vie for a slice of Nvidia's pie. The Register, Dec.
  2. Epoch AI. 2024. Data on machine learning hardware. Updated December 30, 2024.
  3. Ming. 2024. Nvidia AI GPU shipments to hit 4m in 2024. SMYG Limited News, Jun.
  4. Macrotrends LLC. 2024. Nvidia revenue 2010-2024 | NVDA. Accessed January 2024.

Appendix III: Model Rankings

  1. Epoch AI. 2024. Data on machine learning hardware. Updated December 30, 2024.
University of Oxford