The 2003 NASA shuttle disaster will live long in the memories of the space exploration community and indeed the rest of the world.
The Space Shuttle Columbia disaster occurred on February 1, 2003, when Columbia disintegrated over Texas and Louisiana as it re-entered Earth’s atmosphere, killing all seven crew members.
It was a tragedy that unfolded very much in the public eye. Could data analytics help avoid similar disasters in the future, and in turn help protect future astronauts?
NASA’s chief knowledge architect, David Meza, discussed with Information Age how analysing data on thermal protection tiles via a graph database might have helped avoid the 2003 shuttle disaster, and how it will help keep astronauts safe in the future.
Why did you decide to use graph databases — i.e., what were the specific functional advantages of this approach over other ways of working with data?
Over the years, I have worked with various database systems and early in my career worked as a SQL database developer and administrator.
While they were useful, I found the term relational in relational database management system (RDBMS) to be a misnomer.
The approach of using a graph database and the relationship structure between nodes was more intuitive to me, and allowed me to traverse patterns much more easily.
Also, doing large join queries in SQL was quite painful and time consuming. The same queries can be done in a graph database with less code, and much faster.
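The contrast Meza describes can be sketched in miniature. A multi-hop question that needs chained joins in a relational model becomes a simple walk along explicit relationships in a graph model. Below is a minimal, self-contained illustration in Python, using sqlite3 for the relational side and a plain adjacency dictionary standing in for a graph database; the schema, table names and data are invented for demonstration:

```python
import sqlite3

# Relational model: answering "which lessons belong to projects under a
# programme?" requires chaining joins across tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE programme (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project   (id INTEGER PRIMARY KEY, name TEXT, programme_id INTEGER);
    CREATE TABLE lesson    (id INTEGER PRIMARY KEY, title TEXT, project_id INTEGER);
    INSERT INTO programme VALUES (1, 'Shuttle');
    INSERT INTO project   VALUES (1, 'Thermal Protection', 1);
    INSERT INTO lesson    VALUES (1, 'Tile debris impact', 1);
""")
rows = conn.execute("""
    SELECT l.title
    FROM programme pg
    JOIN project p ON p.programme_id = pg.id
    JOIN lesson  l ON l.project_id   = p.id
    WHERE pg.name = 'Shuttle'
""").fetchall()

# Graph model: the same question is a two-hop traversal of relationships.
graph = {
    'Shuttle': ['Thermal Protection'],
    'Thermal Protection': ['Tile debris impact'],
}
titles = [lesson for project in graph['Shuttle'] for lesson in graph[project]]

print(rows)    # [('Tile debris impact',)]
print(titles)  # ['Tile debris impact']
```

Each additional hop adds another JOIN clause on the relational side, while on the graph side it is just one more step of the traversal, which is the coding and performance gap Meza alludes to.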
Prior to choosing Neo4j, what other tools did you look at (graph database or otherwise)?
My job as NASA chief knowledge architect is to develop and implement the technological roadmap to transform data into actionable knowledge.
To that end, I evaluate numerous technologies each year, too many to list. The ones I eventually employ have made it through our testing and have shown their capacity to deliver the information our end users need in a manner they require.
Can you talk briefly about how you set up and managed the project? For example, what were the biggest challenges you ran into, if any, and how did you overcome them?
Similar to other data science projects, I defined the goal, collected and cleaned the data, built the model, evaluated the model, deployed it, and visualised the results.
Fortunately, my office owned the data – by that, I mean we curated and were responsible for it. Getting access to all the data is often one of the biggest issues my team faces on a project.
In this case, our biggest challenge was getting buy-in on using a new type of database most people had not seen before.
We had to start with small demos of select data to show its capabilities and potential for finding answers quicker.
The demos led to additional questions and demo requests by management. Once buy-in was achieved, the project moved forward very quickly.
Safety is a fundamental value for all NASA work. How did that imperative impact this project? Specifically, how might using a graph database have prevented the 2003 shuttle disaster?
That is a tough question to answer; remember, I did this analysis in 2015, and hindsight is 20/20.
I cannot second guess any decisions made at that time. That being said, once the topic modelling was completed on the lessons learned, I was able to perform a trend analysis on past lessons, depicting the rise and fall of topics over the years.
In other words, it showed which topics had an increasing or decreasing number of lessons each year.
When I reviewed the past results, one topic, consisting of terms such as material, temperature, excess and contamination, showed an increase in submitted lessons from around 1998 through 2000.
The old RDBMS could not show this type of pattern. With the new system, engineers and managers could track these trends, be alerted to a rise in lessons in a given area, and react accordingly.
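The kind of trend analysis Meza describes, counting lessons per topic per year and flagging topics whose counts keep climbing, can be sketched as follows. The topic labels and counts below are invented for illustration; NASA's actual topic model and lessons-learned data are not reproduced here:

```python
from collections import defaultdict

# Each lesson has already been assigned a dominant topic by a topic model.
# The (year, topic) pairs are illustrative, not real NASA data.
lessons = [
    (1997, 'propulsion'),
    (1998, 'material/temperature'),
    (1999, 'material/temperature'), (1999, 'material/temperature'),
    (2000, 'material/temperature'), (2000, 'material/temperature'),
    (2000, 'material/temperature'), (2000, 'propulsion'),
]

# Count submitted lessons per topic per year.
counts = defaultdict(lambda: defaultdict(int))
for year, topic in lessons:
    counts[topic][year] += 1

def rising(topic):
    """True if the topic's yearly lesson count increases every year seen --
    the kind of upward trend that could prompt a closer engineering review."""
    years = sorted(counts[topic])
    series = [counts[topic][y] for y in years]
    return all(a < b for a, b in zip(series, series[1:]))

flagged = [t for t in counts if rising(t)]
print(flagged)  # ['material/temperature']
```

A real system would use a less brittle trend test (for example, a fitted slope with a significance threshold), but the principle is the same: surface topics whose lesson volume is growing so someone looks at them before an incident, not after.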
What does the resulting system do for your organisation now — how does it allow you to deliver against operational objectives?
The system allows engineers and project managers to quickly find clusters of lessons in a specific topic, allowing them to stay on target with their project goals and avoid past issues.
In the end, it is my hope this will allow projects to be completed on or before schedule and with fewer surprises.
Can you address the issues of size and complexity of data?
The data is ever growing.
NASA has over 50 years of data, dating back to the Apollo and Gemini eras. I am constantly looking at new lessons-learned databases to import and connect to the existing graph database.
Based on what I have seen, billions of nodes and relationships will not be an issue with Neo4j.
How does Neo4j operate with other key software technologies?
Neo4j operates well with other technologies I use for analysis, search and visualisation.
The drivers they have developed and maintain have made it quite easy for me to connect from many languages, such as R, Python and Java.
On top of that, many vendors have created applications that can connect to Neo4j and give you access to many visualisation options.
Where would you like to go next with graphs?
My next steps are to attack other business issues such as content management, data management, expert locator, recommendation engines and research impact. I think all these areas could benefit from graphs.