
Harvard's Clean Energy Project: Big Data Maps To Renewable Energy

Kai Trepte (Harvard Clean Energy Project)
Connected World
Mission City M

Recognizing that today's fossil-fuel-based economy must give way to a renewable-energy-based economy, the Harvard Clean Energy Project set out to discover and design new molecular materials for the next generation of organic solar cells. These carbon-based photovoltaics offer a path to simple, cost-effective, high-volume production of renewable energy devices with exceptionally versatile features. In particular, they could bring electricity to the estimated 2.5 billion people around the world who live in rural areas without access to the power grid. The project is sponsored by the White House Materials Genome Initiative and is part of the Global Climate and Energy Project.

By harnessing the immense computing power of the IBM World Community Grid (a distributed volunteer computing platform), the research team at Harvard University performs quantum chemical calculations on millions of organic material candidates. The obtained electronic properties are used to determine which compounds are most promising for high-performance materials.

To date, Harvard's Clean Energy Project has studied 2.3 million compounds with 24 million conformers in 150 million density functional theory calculations, making it the most extensive first-principles quantum chemical investigation ever conducted. Each computational characterization of a molecular motif produces about 20-40 megabytes (MB) of data, and the project collects approximately 750 gigabytes (GB) of data each day. So far, the data archive has grown to about 400 terabytes (TB). To store the results of this massive investigation, the Harvard scientists built large data storage arrays called "Jabba", based on a design by Backblaze Inc. Each array uses forty-five 3 TB hard drives from HGST, a Western Digital company. Harvard designed its Jabba arrays with built-in redundancy (RAID and tape backup) to ensure the integrity of the valuable research data. The key to the arrays' performance is the use of reliable, high-capacity, low-power storage from HGST. Harvard has filled more than 150 HGST drives so far and recently commissioned Jabba 5 and 6, increasing total capacity to 700 TB. The project may well accumulate a petabyte of results by the time it winds down.
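The storage figures above can be cross-checked with some back-of-the-envelope arithmetic. The sketch below is our own illustration, not the project's tooling; it assumes decimal units (1 TB = 1,000 GB = 1,000,000 MB), raw drive capacity with no RAID, filesystem, or tape overhead, and a constant ingest rate of 750 GB per day. The function names are hypothetical.

```python
# Back-of-the-envelope check of the storage figures quoted above.
# Assumptions (ours, not the project's): decimal units, raw capacity only
# (no RAID/filesystem/tape overhead), and a constant daily ingest rate.

MB_PER_GB = 1000
GB_PER_TB = 1000


def characterizations_per_day(daily_gb: float, mb_per_item: float) -> float:
    """Number of molecular characterizations that one day's data volume represents."""
    return daily_gb * MB_PER_GB / mb_per_item


def raw_array_capacity_tb(drives: int, tb_per_drive: float) -> float:
    """Raw capacity of one storage array before any redundancy overhead."""
    return drives * tb_per_drive


def days_to_reach(target_tb: float, current_tb: float, daily_gb: float) -> float:
    """Days for the archive to grow from current_tb to target_tb at a constant rate."""
    return (target_tb - current_tb) * GB_PER_TB / daily_gb


if __name__ == "__main__":
    fewest = characterizations_per_day(750, 40)  # larger outputs -> fewer items per day
    most = characterizations_per_day(750, 20)
    print(f"~{fewest:,.0f} to {most:,.0f} characterizations per day")
    print(f"Raw capacity per 45-drive array: {raw_array_capacity_tb(45, 3):.0f} TB")
    print(f"Days to grow from 400 TB to 1 PB: {days_to_reach(1000, 400, 750):.0f}")
```

Under these simplifying assumptions, 750 GB per day corresponds to roughly 18,750 to 37,500 characterizations, each 45-drive array holds about 135 TB raw, and growing from 400 TB to a petabyte would take on the order of 800 days. Real usable capacity is lower once redundancy is accounted for.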

The virtual high-throughput approach of the Clean Energy Project makes it possible to study material candidates on an unprecedented scale, exceeding what experiment or traditional computational modeling can achieve by four to five orders of magnitude. Because the parameter space for high-performance materials is very narrow, the search for suitable candidates is correspondingly difficult. Even so, the large-scale screening has identified 1,000 candidates with the prerequisites for a power conversion efficiency of 11% or more and 35,000 candidates with the prerequisites for 10% or more. The most promising compounds are passed on to experimental partners. The results are also used for cheminformatics analysis of structure-property relationships, which provides a foundation for the rational design of new leads. In June 2013, the data was released as an open, free reference database for the community.
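At its simplest, the screening step described above amounts to filtering a large pool of candidates against a predicted efficiency threshold. The sketch below illustrates only that filtering idea. It assumes each candidate already carries a predicted power conversion efficiency (PCE); how the project derives that prediction from its quantum chemical results is not shown. The `Candidate` type, its fields, and the sample values are hypothetical, not the project's schema or data.

```python
# Minimal sketch of threshold-based screening on a predicted PCE value.
# Assumption: the Candidate type, its fields, and the sample values are
# hypothetical; they are not the Clean Energy Project's schema or results.
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    name: str             # hypothetical molecule identifier
    predicted_pce: float  # predicted power conversion efficiency, in percent


def screen(candidates: Iterable[Candidate], min_pce: float) -> List[Candidate]:
    """Return candidates at or above the PCE threshold, highest predicted PCE first."""
    hits = [c for c in candidates if c.predicted_pce >= min_pce]
    return sorted(hits, key=lambda c: c.predicted_pce, reverse=True)


if __name__ == "__main__":
    pool = [
        Candidate("mol-A", 11.4),
        Candidate("mol-B", 10.2),
        Candidate("mol-C", 7.9),
        Candidate("mol-D", 12.1),
    ]
    for threshold in (10.0, 11.0):
        hits = screen(pool, threshold)
        print(f">= {threshold}%: {[c.name for c in hits]}")
```

Sorting the hits by predicted PCE mirrors the idea of forwarding only the most promising compounds first; a real pipeline would apply further criteria beyond a single threshold.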

In this session, Alán Aspuru-Guzik, Professor of Chemistry and Chemical Biology at Harvard University and hands-on lead of the Harvard Clean Energy Project, will present the tools and techniques the team adopted during the project for data mining, analysis, machine learning, drug discovery, and pattern recognition. He will outline best practices and lessons learned from a big data project that aims to benefit humanity by advancing clean energy solutions, bringing electricity to billions of people around the world, and improving their quality of life.


Kai Trepte

Lead Engineer, Harvard Clean Energy Project

Kai Trepte is the lead software engineer for the Harvard Clean Energy Project. He was instrumental in turning the raw data, more than 400 TB covering 2.3 million compounds, into an online data store open to the world. Kai holds a master's degree in logistics from MIT and co-founded John Galt Solutions, Inc., a supply chain management software provider with more than 5,000 customers worldwide. As lead engineer on the Clean Energy Project, Kai applies his data warehousing and analytics skills to big data in science. He will share best practices and lessons learned from this big data project and its contribution to the quest for clean energy solutions.