Asciiext #15
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
{-# LANGUAGE TypeApplications #-} | ||
|
||
-- | This module treats 'Bytes' data as holding ASCII text. Providing bytes | ||
-- outside the ASCII range (@U+0000@ -- @U+007F@) may cause a failure or | ||
-- unspecified results, but such bytes will never be inspected. | ||
-- | ||
-- For functions that can operate on ASCII-compatible encodings, see | ||
-- 'Data.Bytes.Text.AsciiExt'. | ||
module Data.Bytes.Text.Ascii | ||
( fromString | ||
) where | ||
|
||
import Data.Bytes.Types (Bytes) | ||
import Data.Char (ord) | ||
import Data.Word (Word8) | ||
|
||
import qualified Data.Bytes.Pure as Bytes | ||
import qualified GHC.Exts as Exts | ||
|
||
|
||
-- | Convert a 'String' consisting of only characters in the ASCII block | ||
-- to a byte sequence. Any character with a codepoint above @U+007F@ is | ||
-- replaced by @U+0000@. | ||
fromString :: String -> Bytes | ||
fromString = Bytes.fromByteArray | ||
. Exts.fromList | ||
. map (\c -> let i = ord c in if i < 128 then fromIntegral @Int @Word8 i else 0) | ||
|
||
-- TODO presumably also fromText and fromShortText |
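
The replacement rule documented for `fromString` can be sketched with `base` alone. This is a minimal model, not the module's code: it produces a plain `[Word8]` instead of `Bytes`, and the names `asciiOrNul` and `encodeAscii` are hypothetical helpers introduced only for illustration.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper (not part of the PR): map one character to a byte.
-- Code points below 128 are kept; everything else becomes 0 (U+0000),
-- mirroring the rule stated in the 'fromString' documentation.
asciiOrNul :: Char -> Word8
asciiOrNul c
  | i < 128   = fromIntegral i
  | otherwise = 0
  where
    i = ord c

-- Hypothetical list-based stand-in for 'fromString'; the real function
-- builds a 'Bytes' value rather than a list.
encodeAscii :: String -> [Word8]
encodeAscii = map asciiOrNul

main :: IO ()
main = print (encodeAscii "h\233llo") -- prints [104,0,108,108,111]
```

Because the replacement is silent, a caller that needs to detect non-ASCII input would have to check the `String` before converting it.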
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
{-# LANGUAGE BangPatterns #-} | ||
{-# LANGUAGE LambdaCase #-} | ||
{-# LANGUAGE RankNTypes #-} | ||
|
||
-- | This module contains functions which operate on 'Bytes' holding text in ASCII or any ASCII-compatible superset encoding. | ||
-- That is, none of the functions here inspect bytes with a value greater than 127, and none fail due to the presence of such bytes. | ||
-- | ||
-- For functions that can fail for bytes outside the ASCII range, see | ||
-- 'Data.Bytes.Text.Ascii'. For functions that can inspect bytes outside ASCII, see | ||
-- any of the modules for ASCII-compatible encodings (e.g. 'Data.Bytes.Utf8', | ||
-- 'Data.Bytes.Latin1', and so on). | ||
module Data.Bytes.Text.AsciiExt | ||
( -- * Line-Oriented IO | ||
hFoldLines | ||
, hForLines_ | ||
-- ** Standard Handles | ||
, forLines_ | ||
, foldLines | ||
-- * Text Manipulation | ||
, toLowerU | ||
) where | ||
|
||
import Control.Monad.ST (ST) | ||
import Control.Monad.ST.Run (runByteArrayST) | ||
import Data.Bytes.Types (Bytes(..)) | ||
import Data.Primitive (ByteArray) | ||
import Data.Word (Word8) | ||
import System.IO (Handle, hIsEOF, stdin) | ||
|
||
import qualified Data.Bytes.Pure as Bytes | ||
import qualified Data.ByteString.Char8 as BC8 | ||
import qualified Data.Primitive as PM | ||
|
||
-- | 'hForLines_' over 'stdin'. | ||
forLines_ :: (Bytes -> IO a) -> IO () | ||
{-# INLINEABLE forLines_ #-} | ||
forLines_ = hForLines_ stdin | ||
|
||
-- | 'hFoldLines' over 'stdin'. | ||
foldLines :: a -> (a -> Bytes -> IO a) -> IO a | ||
{-# INLINEABLE foldLines #-} | ||
foldLines = hFoldLines stdin | ||
|
||
-- | Perform an action on each line of the input, discarding results. | ||
-- To maintain a running state, see 'hFoldLines'. | ||
-- | ||
-- Lines are extracted with 'BC8.hGetLine', which does not document its | ||
-- detection algorithm. As of writing (bytestring v0.11.1.0), lines are | ||
-- delimited by a single @\n@ character (UNIX-style, as all things should be). | ||
hForLines_ :: Handle -> (Bytes -> IO a) -> IO () | ||
hForLines_ h body = loop | ||
where | ||
loop = hIsEOF h >>= \case | ||
False -> do | ||
line <- Bytes.fromByteString <$> BC8.hGetLine h | ||
_ <- body line | ||
loop | ||
True -> pure () | ||
|
||
-- | Perform an action on each line of the input, threading state through the computation. | ||
-- If you do not need to keep a state, see 'hForLines_'. | ||
-- | ||
-- Lines are extracted with 'BC8.hGetLine', which does not document its | ||
-- detection algorithm. As of writing (bytestring v0.11.1.0), lines are | ||
-- delimited by a single @\n@ character (UNIX-style, as all things should be). | ||
hFoldLines :: Handle -> a -> (a -> Bytes -> IO a) -> IO a | ||
hFoldLines h z body = loop z | ||
where | ||
loop !x = hIsEOF h >>= \case | ||
False -> do | ||
line <- Bytes.fromByteString <$> BC8.hGetLine h | ||
x' <- body x line | ||
loop x' | ||
True -> pure x | ||
|
||
-- | /O(n)/ Convert ASCII letters to lowercase. This adds @0x20@ to bytes in the | ||
-- range @[0x41,0x5A]@ (@A-Z@ ⇒ @a-z@) and leaves all other bytes alone. | ||
-- Unconditionally copies the bytes. | ||
toLowerU :: Bytes -> ByteArray | ||
toLowerU (Bytes src off0 len0) = | ||
runByteArrayST action | ||
where | ||
action :: forall s. ST s ByteArray | ||
action = do | ||
dst <- PM.newByteArray len0 | ||
let go !off !ix !len = if len == 0 | ||
then pure () | ||
else do | ||
let w = PM.indexByteArray src off :: Word8 | ||
w' = if w >= 0x41 && w <= 0x5A | ||
then w + 32 | ||
else w | ||
PM.writeByteArray dst ix w' | ||
go (off + 1) (ix + 1) (len - 1) | ||
go off0 0 len0 | ||
PM.unsafeFreezeByteArray dst |
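
The byte rule behind `toLowerU` can be checked in isolation with a small `base`-only sketch. It is a list-based model under stated assumptions: `lowerAscii` is a hypothetical name, and the real function copies into a fresh `ByteArray` instead of mapping over a list.

```haskell
import Data.Word (Word8)

-- Hypothetical helper (not part of the PR): the per-byte rule from the
-- 'toLowerU' documentation. Bytes in [0x41,0x5A] (A-Z) get 0x20 added;
-- every other byte, including those above 0x7F, passes through unchanged.
lowerAscii :: Word8 -> Word8
lowerAscii w
  | w >= 0x41 && w <= 0x5A = w + 0x20
  | otherwise              = w

main :: IO ()
main = print (map lowerAscii [0x48, 0x69, 0x21, 0xC9]) -- prints [104,105,33,201]
```

The last input byte, `0xC9`, is outside ASCII and is returned as-is, which is the behaviour the module header promises for ASCII-compatible encodings. For the line-oriented functions, the signatures in the diff suggest that something like `foldLines (0 :: Int) (\n _ -> pure (n + 1))` would count the lines on `stdin`.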
Review comment: "dectection" is a typo for "detection".