Faster Segmented Sort on GPUs