NASA is leveraging the AWS Cloud to study the effects of climate change; once live, the NISAR satellite's data can be used by startups and researchers to predict vulnerabilities on Earth.
This year, the world has witnessed violent weather patterns. Much of the unpredictability stems from weather departments being unable to crunch massive amounts of data and make sense of it. But the data was right there.
The California fires in Santa Rosa, for example, had all the warning signs: NASA data showed that the skies were clear for a couple of days before the fire broke out. On October 9, 2017, as giant plumes of smoke spread across the northern part of the State of California, firefighters used this data to understand where the fire would burn most intensely and to warn people about where the smoke was heading. They predicted that the smoke would move northeast and mapped where the fire would cause the greatest disruption.
Kevin Murphy, Program Executive at Earth Science Data Systems at NASA, said: “We can understand how smog gets transported from China to the US and also understand incidents during hurricanes, thanks to open data and analysis on the cloud. This data is available to researchers and companies to make sense of.”
Kevin was speaking at AWS re:Invent, Amazon's sixth annual cloud conference. The conference, which began on Sunday night in Las Vegas, runs until December 1. It has historically been the platform where the internet giant introduces new products, partnerships, and services for its web services division.
The NISAR satellite, which ISRO and NASA plan to launch in 2021, is expected to generate 50 million petabytes of data per year. It will take images of ice sheet collapses and study ecosystem disturbances. This matters because of the significant compute power needed to crunch that data into something usable. Of the colossal volumes expected to be crunched for Indian and American users, 80 terabytes will be generated, 400TB will go into reprocessing, and 150 petabytes will need to be processed at 50GB per second.
This is where AWS comes in: this volume and variety of data is going to be crunched with AWS AI tools. What is even more interesting is that all of this data is, and will remain, publicly available on the cloud, courtesy of AWS. Today, around 367 million variables have been indexed and made searchable, and the data is available to researchers in milliseconds.
Earth data is a national asset today. It enables rigorous scientific investigation for spacecraft research, companies, geospatial scientists, and airlines. NASA holds the single largest repository of earth science data, integrating observations from diverse satellite platforms that measure gravity, the atmosphere, and water reservoirs. This data, collected over four or five decades, is now open to the public.
“We need to share this data so scientists can help us predict outcomes on earth,” Kevin says.
NASA distributes data generated by the Earth Observing System Data and Information System (EOSDIS), which transforms and delivers it to different groups through API calls. This data has helped several businesses in agriculture and transportation. NASA has 3 million users for this data and distributes 24 petabytes of it per day.
“We have visualisation and discovery tools for people to access with open APIs. Developers can build services to satisfy their clients,” Kevin says.
Analytics and processing speed are key in the open data world, and NASA has created open APIs so that any startup can connect to this data.
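As an illustration of what connecting to such open APIs looks like, NASA's Earthdata programme exposes a public metadata search service, the Common Metadata Repository (CMR). The sketch below builds and issues a collection-search query against it; the endpoint is real, but treat the exact parameters and response handling as assumptions rather than a definitive client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public search endpoint of NASA's Common Metadata Repository (CMR).
CMR_SEARCH = "https://cmr.earthdata.nasa.gov/search/collections.json"

def build_cmr_query(keyword: str, page_size: int = 10) -> str:
    """Build a CMR collection-search URL for a free-text keyword."""
    params = urlencode({"keyword": keyword, "page_size": page_size})
    return f"{CMR_SEARCH}?{params}"

def search_collections(keyword: str) -> list:
    """Fetch titles of matching collections (requires network access)."""
    with urlopen(build_cmr_query(keyword)) as resp:
        feed = json.load(resp)["feed"]
    return [entry["title"] for entry in feed.get("entry", [])]
```

A call like `search_collections("NISAR")` would return the titles of datasets whose metadata matches the keyword, which is the kind of sub-second discovery the article describes.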
Who is crunching all this data?
Dan Pilone, CTO and Co-founder of Element84, says his company did the data modelling for NASA.
“We had to forklift this heavy data to create sub-second search, which we brought down to answers in 500 milliseconds, and worked on the project in a managed-services mode on AWS,” Dan says. He adds that NASA’s aim was to open it up to as many people as possible to search and use.
However, there are a few factors to keep in mind in the open data world. Managing a multi-node environment brings permission and authentication problems, but the data will be integrated for all parties on AWS.
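On AWS, that kind of cross-party access is typically mediated by bucket policies. A hypothetical policy granting the public read access to an open-data bucket (the bucket name here is illustrative, not NASA's actual configuration) might look like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadOpenData",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-earth-open-data/*"
    }
  ]
}
```

Writes and administration would remain restricted to authenticated accounts, which is how distribution can stay open while the data itself is protected.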
“The data should be protected during distribution,” Dan says.
The reason NASA is on Amazon Web Services is that the open data project can run on cost-effective, large-scale archive storage. NASA can also use predictive analytics and machine learning to study past user behaviour and predict the future behaviour of clients who use this data, helping them search better.
Doing more in less time
Joe Flasher, Open GeoSpatial Data Lead at Amazon Web Services, says, “AWS tech allows public sector organisations to unarchive their data and structure it for crunching. Now students and researchers are using Landsat on AWS; they use its data in seconds.”
In the old days, processing 100,000 requests would take a researcher 200 days; Amazon Web Services does it in less than a week.
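One reason Landsat on AWS is fast to use is that scenes sit in a public S3 bucket with a predictable key layout, so a researcher can address a single band directly instead of downloading an archive. The sketch below constructs such URLs; the bucket name and key pattern reflect the historical `landsat-pds` public dataset and should be treated as illustrative, since the layout has changed over time.

```python
# Historical public bucket for Landsat 8 Collection 1 scenes (illustrative).
BUCKET_URL = "https://landsat-pds.s3.amazonaws.com/"

def landsat_scene_prefix(path: int, row: int, scene_id: str) -> str:
    """Key prefix for a scene, addressed by WRS-2 path/row and scene id."""
    return f"c1/L8/{path:03d}/{row:03d}/{scene_id}/"

def scene_band_url(path: int, row: int, scene_id: str, band: int) -> str:
    """URL of one GeoTIFF band within a scene."""
    prefix = landsat_scene_prefix(path, row, scene_id)
    return f"{BUCKET_URL}{prefix}{scene_id}_B{band}.TIF"
```

With this, fetching just the red band of one scene is a single HTTP GET, which is what turns a 200-day batch job into seconds-scale access.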
In the world of data, users need to understand how to use it and which tools and algorithms make it easy to crunch for insights.
A blog from Columbia University highlights the need for better tools for sustainable development. A recent report coordinated by the Sustainable Development Solutions Network estimates that the world will need to spend roughly $1 billion a year to sustain and enhance statistical systems.
The report, Data for Development: A Needs Assessment for SDG Monitoring and Statistical Capacity Development, proposes a typology of data systems and the estimated costs to measure and generate this data for 77 developing countries.
The blog adds that the world needs $300 million a year towards statistical systems and, going further, will need to leverage up to $200 million in additional development assistance to support country efforts.
The efforts of NASA and AWS to open up data and work with the startup community and researchers are commendable indeed.