Research data concerning the genetic basis of health and disease is accumulating rapidly, as modern, high-throughput experimental techniques deliver increasingly large data sets. Data integration efforts in the field face numerous challenges, including growing data size and complexity, quality control, data sensitivity and personal privacy, data access, and publication bias. Traditional approaches of gathering data into centralized repositories and publishing results in static paper journals, which have proved successful in the past, will not be sufficient to address the emerging and future needs of the field. As an alternative, a partially centralized and partially federated model has been proposed. This will entail a distributed, decentralized network of interconnected information sources and analysis services, the first incarnations of which are now starting to appear. A central requirement of this model is far greater use of standardization, both for data models and exchange formats and in the deployment of existing and emerging software components and network protocols. Community adoption of new database technologies, and the development of robust data standards, will be vital to achieving the global integration of genotype-to-phenotype (G2P) data in the future. This might also help to address other challenges, such as accrediting and rewarding data submitters and database managers, as we move towards the emergence of a universal G2P 'knowledge environment'.