I had the opportunity to attend a few sessions at the Accelerated Computing for Innovation Conference 2018, presented by the Sydney Informatics Hub at the University of Sydney. It was great to hear about the leading-edge work that many different researchers and academics are doing, and how data science and machine learning have been used in a variety of applications, specifically with large-scale, high-performance computing. All presenters had fantastic content to share across a broad range of topics such as health, genomics, astronomy, climate and more social aspects. Below are only a few of the topics that resonated with me - particularly because they were closer to my consulting work and to the teaching and training I conduct. For a full list of presenters, see here.
Machine Learning and Big Data in Life Science and Society
I really enjoy the work that Aidan O’Brien from CSIRO is doing in applying high-performance computing and machine learning to genomic data with VariantSpark. It was also particularly interesting to see how RandomForest in R compared against it on speed and accuracy, given that it’s the more common framework.
It was also really interesting to hear from Mark Pinese from the Garvan Institute on computing the genomics of good health. He presented fascinating work on building the Genome Reference Bank, which covers over 4,000 healthy elderly (70+ years old) Australians. Some of the big data work in whole-genome sequencing he presented used over 1 petabyte of data and more than 3.2 million hours of computation. A key highlight was his comment that genome sequencing is getting cheaper, but analysing the resulting data, given its volume, is only going to get harder.
Another interesting presentation was from John Sebastian Eden, WIMR, USyd, showcasing the work from the ‘Revealing the Australian Virome’ project and some of the computational requirements of genome-related data work. The more complex the scope of the work, the more computational power is needed.
Ben Evans from NCI also gave an interesting presentation. He showcased some of the current challenges in cross-disciplinary computational and data integration that his team and other researchers face. There were some interesting constraints around big data integration across interdisciplinary computational applications. The suggested FAIR (Findable, Accessible, Interoperable, Reusable) approach looked like a sensible one.
Also, the work Cormac Purcell, MQ, presented on how machine learning spans diverse scientific fields was really interesting. He showcased some really cool and fitting examples of neural network models and TensorFlow applications, ranging from star classification in astronomy-related research to shark identification using drones over the ocean.
The work showcased by Daisee on NLP and AI for businesses looked promising, with an emphasis on AI for customer relationship management. A big issue they are trying to solve is around call centre and voice / dialog / speech recognition data. Off the back of their strategic approach, they also conducted some interesting research on AI adoption in businesses. They found that many businesses in Australia are in the ‘Passives’ group - which I guess was not very surprising.
Above are some pictures of this really good event. In summary, great presenters, great topics. There was a lot of talk around utilising cloud computing and high-performance computing to make all of the data crunching work happen, with special mentions of only two of the big three cloud platforms as well as the supercomputing capabilities of the Pawsey Supercomputing Centre and NCI. The overall impression was that the role cloud computing plays in big data and machine learning for academia, life science and social research will only continue to grow over time.