Zuhair Khayyat

About

Detail

Co-founder and Chief Technology Officer
Riyadh, Riyadh Province, Saudi Arabia


Résumé


Jobs
  • Lucidya
    Co-founder and Chief Technology Officer
    Lucidya
    Oct 2017 - Current (8 years 9 months)
    Transforming a minimum viable product into a highly scalable, fault-tolerant tool using state-of-the-art big data technologies and advanced machine learning techniques to provide sophisticated analytics
  • Qatar Computing Research Institute
    Research Intern
    Qatar Computing Research Institute
    Sep 2013 - Dec 2013 (4 months)
    Worked on distributed data cleaning
  • IBM T J Watson Research Center
    Research Intern
    IBM T J Watson Research Center
    Jul 2011 - Sep 2011 (3 months)
    Research subject: improving large-scale graph-based processing
  • SABIC
    Unix Administrator
    SABIC
    Sep 2008 - Mar 2009 (7 months)
    Managed and supported SABIC's Unix servers
  • Saudi Aramco
    Training
    Saudi Aramco
    Jul 2007 - Jan 2008 (7 months)
    Received an award for excellent performance in training
Education
  • KAUST King Abdullah University of Science and Technology
    Doctor of Philosophy - PhD, Computer Science
    KAUST King Abdullah University of Science and Technology
    Jan 2010 - Dec 2017 (8 years)
    Data cleansing approaches have usually focused on detecting and fixing errors with little attention to big data scaling. This presents a serious impediment since identifying and repairing dirty data often involves processing huge input datasets, handling sophisticated error discovery approaches and managing huge arbitrary errors. With large datasets, error detection becomes overly expensive and complicated especially when considering user-defined functions. Furthermore, a distinctive algorithm is desired to optimize inequality joins in sophisticated error discovery rather than naïvely parallelizing them. Also, when repairing large errors, their skewed distribution may obstruct effective error repairs. In this dissertation, I present solutio
  • KAUST King Abdullah University of Science and Technology
    M.S., Computer Science
    KAUST King Abdullah University of Science and Technology
    Jan 2009 - Dec 2010 (2 years)
  • King Fahd University of Petroleum and Minerals
    B.S., Computer Engineering
    King Fahd University of Petroleum and Minerals
    Jan 2003 - Dec 2008 (6 years)
Projects (professional or personal)
  • Violet
    Violet: Fast and Scalable Violation Detection
Publications
  • Proceedings of the VLDB Endowment
    A survey and experimental comparison of distributed SPARQL engines for very large RDF data
    Proceedings of the VLDB Endowment
    Sep 2017
    Distributed SPARQL engines promise to support very large RDF datasets by utilizing shared-nothing computer clusters. Some are based on distributed frameworks such as MapReduce; others implement proprietary distributed processing; and some rely on expensive preprocessing for data partitioning. These systems exhibit a variety of trade-offs that are not well-understood, due to the lack of any comprehensive quantitative and qualitative evaluation. In this paper, we present a survey of 22 state-of-the-art systems that cover the entire spectrum of distributed RDF data processing and categorize them by several characteristics. Then, we select 12 representative systems and perform extensive experimental evaluation with respect to preprocessing cost
  • The VLDB Journal
    Fast and scalable inequality joins
    The VLDB Journal
    Feb 2017
    Inequality joins, which join relations using inequality conditions, are used in various applications. Optimizing joins has been the subject of intensive research ranging from efficient join algorithms such as sort-merge join, to the use of efficient indices such as B+-tree, R*-tree and Bitmap. However, inequality joins have received little attention and queries containing such joins are notably very slow. In this paper, we introduce fast inequality join algorithms based on sorted arrays and space-efficient bit-arrays. We further introduce a simple method to estimate the selectivity of inequality joins which is then used to optimize multiple predicate queries and multi-way joins. Moreover, we study an incremental inequality join algor
  • International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
    ScaleMine: Scalable parallel frequent subgraph mining in a single large graph
    International Conference for High Performance Computing, Networking, Storage and Analysis (SC)
    Nov 2016
    Frequent Subgraph Mining is an essential operation for graph analytics and knowledge extraction. Due to its high computational cost, parallel solutions are necessary. Existing approaches suffer from either load imbalance or high communication and synchronization overheads. In this paper we propose ScaleMine, a novel parallel frequent subgraph mining system for a single large graph. ScaleMine introduces a novel two-phase approach. The first phase is approximate; it quickly identifies subgraphs that are frequent with high probability, while collecting various statistics. The second phase computes the exact solution by employing the results of the approximation to achieve good load balance; prune the search space; generate efficient
  • Springer
    Large-Scale Graph Processing Using Apache Giraph
    Springer
    Sep 2016
    This book takes its reader on a journey through Apache Giraph, a popular distributed graph processing platform designed to bring the power of big data processing to graph data. Designed as a step-by-step self-study guide for everyone interested in large-scale graph processing, it describes the fundamental abstractions of the system, its programming models and various techniques for using the system to process graph data at scale, including the implementation of several popular and advanced graph analytics algorithms.
  • SIGMOD
    BigDansing: A System for Big Data Cleansing
    SIGMOD
    Jan 2015
    Data cleansing approaches have usually focused on detecting and fixing errors with little attention to scaling to big datasets. This presents a serious impediment since data cleansing often involves costly computations such as enumerating pairs of tuples, handling inequality joins, and dealing with user-defined functions. In this paper, we present BigDansing, a Big Data Cleansing system to tackle efficiency, scalability, and ease-of-use issues in data cleansing. The system can run on top of most common general purpose data processing platforms, ranging from DBMSs to MapReduce-like frameworks. A user-friendly programming interface allows users to express data quality rules both declaratively and procedurally, with no requirement of being awa
  • EuroSys
    Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
    EuroSys
    Jan 2013
    Pregel was recently introduced as a scalable graph mining system that can provide significant performance improvements over traditional MapReduce implementations. Existing implementations focus primarily on graph partitioning as a preprocessing step to balance computation across compute nodes. In this paper, we examine the runtime characteristics of a Pregel system. We show that graph partitioning alone is insufficient for minimizing end-to-end computation. Especially where data is very large or the runtime behavior of the algorithm is unknown, an adaptive approach is needed. To this end, we introduce Mizan, a Pregel system that achieves efficient load balancing to better adapt to changes in computing needs. Unlike known implementations of