The smallest operation unit supported by SCT memory database is a table, that is, there is no library concept;SCT memory database is based on columnar storage and supports C++ basic data structure types, including integer(Including
u08, i08, u16, i16, u32, i32, u64, i64)、
float(Including float, double)、string(string)、date(date)、time(time)、datetime(datetime).
Any field of SCT memory database implies an index based on a balanced tree, see create,The save and load statements in the sql statement support the saving and loading of
tables.
Explanation:SCT memory database query language (hereinafter abbreviated as sql),similar to standard query language (hereinafter abbreviated as SQL),The basic syntax structure is compatible, and the overall processing flow is not compatible with SQL.
Note that the following SQL operation methods are different from standard SQL:
SCT sql count internal parameters do not support *, the expression part only supports single*
Features supported by SCT memory database:
Support sql subquery, multi-table query
Support the highest-level transactions
Support character fuzzy matching and regular expression matching
Support full utf8 character encoding (and unique)
Good C/C++/JAVA/Python language interface operation form
Support external persistence, csv exchange
Support Windows, Linux system
Support local and network modes
Feature-rich GUI client
system composition
SCT memory database contains two modes of executable programs and development SDK and examples.
Run the corresponding sct_local_mem.exe directly in the local mode; after entering the execution environment, enter the "help" command to view the help information or directly enter the sql statement to execute. For the corresponding
sql specifications, see the sql quick guide section .
The network mode server is sct_s_mem.exe, and the server uses sct_mdb.toml as the configuration file. The current configuration content is only the binding port and user group information. For the toml file specification, see Related specifications (Note that the configuration file used by SCT does not support toml table array).
The sct_s_mem.exe will auto execute "init.sql" in work directory when start, this is useful for auto load data into memory.
The network mode client is sct_c_mem.exe, the client must specify the user name, password and optional IP port to run. The following is a valid running example.
sct_c_mem.exe user user_pwd
sct_c_mem.exe user user_pwd 192.168.1.11 55555
The default network mode verification policy is that the user name and password must be specified for the first login, 3 errors stop for 1 minute, the next 2 errors stop for 30 minutes, the next error stop for 15 days, and the next 1 error stop forever
until The service restarts.
The corresponding system file maybe not include the help document, and the complete lastest help reference file is the content of this page on website.
sql query process
The actual memory storage is columnar storage
The SCT memory database data processing flow is as follows:
The content of the where statement must be the retrieval condition using the index (different from the standard SQL statement here), and the condition is applied to filter the data rows.
The row information takes out the column content and performs expression calculation to generate a data set.
The data set can be expanded horizontally and vertically (join/union) to expand into a larger single data set.
Sort, filter, group and aggregate, project, rename and recalculate several post-operations in a single set.
sort
There are five operations for filtering:
filter inner row
filter between rows, complexity n*n
filter adjacent rows
filter row range
existence/subquery filter
group
project
rename
recompute
sql basic specification
This article is used as a quick guide to write simple and effective SCT memory databasesql statements.
The sql statement is composed of DDL and DML, where DML is composed of
select,
insert,
delete,
update,
rows_of,
cols_of,
fields_of,
show tables, etc.; DDL consists of
create,
alter,
drop,
save,
load,
exec and other components.
Each sql statement non-constant expression or comment content can be connected with any number of blank or comment characters, the blank characters are 0x09 to 0x0c (note that the blank characters do not contain 0x0d, which
means that the end of the SQL statement execution line defaults to unix format) or 0x20 ASCII characters.
All SQL statements are strictly case sensitive, and the character encoding is the only encoding UTF-8.
The terminator of each sql statement is ";", which can be omitted if it does not affect the parsing of the statement.
+0. //Positive floating point number without fractional part
-0.01 //Only decimal part negative floating point number
-100.0 //Only the integer part of the negative floating point number
-100e20 //Only integers and negative floating-point numbers with positive exponents
1.25E-32 //decimal and negative exponent part positive floating point number
1.25E+20 //decimal and positive exponent part positive floating point number
1.00E20f //strictly four-byte floating point number
1.00E20d //strict eight-byte floating point number
string
"a和中文bc" //Contains valid strings in Chinese and English (UTF-8 encoding)
"ab和中文c\"\'\\\a\b\f\n\r\t\v" //Valid string containing control escape
"ab和中文c\x0a\x0001\uaabb\Uaabbccdd" //Valid string containing hexadecimal escape and Unicode escape
@"aabbcc(018d7##@!@^_""AA6)aabbcc" //Raw string separated by aabbcc
The expression uses the C language expression basic structure and retains the priority of the corresponding
operation symbol
id //Domain expression represents the corresponding field value
(id) //The bracket expression indicates the evaluation of the priority bracket content
id+10 //additive expression represents the sum of the domain field and the constant value 10
id+max(grade) //additive expression represents the sum of the maximum value of the field field and the grade field
id*sin(id) //Integral expression represents the product of the sin value of id domain and id domain
id << 10 //The shift expression represents the value of the id field shifted by 10 bits to the left
id&&sin(id)>0 //The logical AND expression represents the logical AND result of id and sin(id)>0
id/sin(id) //Integral expression represents the quotient of id and sin(id)
id&min(id) //bits and expressions represent id and min(id) bits and results
id!=0||id==10 //Logical OR expression means the logical OR result of id not equal to 0 and id equal to 10
Expression type conversion and function processing follow the following principles
If the data type can be implicitly converted to the target type according to the C++ language, then it is
implicitly converted to the target type according to C++.
The function processing process is: if the function parameter is a field, the function acts on the value of each
row of the field; if the function parameter is a constant value, the constant value is used as the function
parameter; if the number of rows of the two parameters is not equal, the longer number of rows is used As an output
result, short parameters that are insufficient are filled continuously to the same length with the last bit.
can be used for functions in expressions
Function explanation and usage example after the function
Type conversion function
x2i08 //Convert any data type to a one-byte signed integer x2i08(type), x2i08(2.0)
x2i16 //Convert any data type to a two-byte signed integer x2i16(type), x2i16(2.0)
x2i32 //Convert any data type to a four-byte signed integer x2i32(type), x2i32(2.0)
x2i64 //Convert any data type to an eight-byte signed integer x2i64(type), x2i64(2.0)
x2u08 //Convert any data type to a one-byte unsigned integer x2u08(type), x2u08(2.0)
x2u16 //Convert any data type to a two-byte unsigned integer x2u16(type), x2u16(2.0)
x2u32 //Convert any data type to a four-byte unsigned integer x2u32(type), x2u32(2.0)
x2u64 //Convert any data type to an eight-byte unsigned integer x2u64(type), x2u64(2.0)
x2flt //Convert any data type to a four-byte floating point number x2flt(id), x2flt(10)
x2dbl //Convert any data type to an eight-byte floating point number x2dbl(id), x2dbl(10)
x2str //Convert any data type to a string x2str(id), x2str(10)
x2date //Convert any data type to date x2date(b_date),x2date("2012-03-05")
x2time //Convert any data type to time x2time(b_time),x2time("18:35:24")
x2datetime //Convert any data type to date and time x2datetime(b_dt),x2datetime("2012-03-05 18:35:24")
Aggregate function
sum //Find the sum of a column sum(id)
product //Find the product of a certain column product(id)
min //Find the minimum value of a column min(id)
max //Find the maximum value of a column max(id)
avg //Find the mean of a column avg(id)
count //Find the total number of a column count(id)
sort //Sort a column sort(id)
unique //Unique a column unique(sort(id))
count_unique //Find the unique count of a column count_unique(sort(id))
Range function
first //Take the first element of a column first(id)
last //Take the last element of a column last(id)
first_k //Take the first k elements of a certai first_k(id,10)
last_k //Take the last k elements of a certain co last_k(id,20)
range //Take a column of interval elements range(id,1,5)
nth //Take the nth element of a certain column nth(id,10)
String function
lcase //Convert the string to lowercase lcase("Abd")
ucase //Convert the string to uppercase ucase("abc")
concat //Connect two strings concat("abc","012")
length //Find the string length length("NAME")
all_case //Output all case string all_case("Insert")
starts //Output is string start with starts(name, "Jim")
contain //Determine whether the string contains the target contain(desc, "error")
ends //Output is string end with ends(val, "es")
substr //Take the string starting from the index to a fixed number of substrings substr(name, 1, 4)
replace //Replace the substring of the string with the new string replace(name, 2, 4, "abc")
trim_l //Move out consecutive spaces on the left side of the string trim_l(name)
trim_r //Move out consecutive spaces on the right side of the string trim_r(name)
trim //Move out consecutive spaces on both sides of the string trim(name)
reg_match //Regular expression matching (using ECMA-262 (2011) standard) reg_match(id,"[1-9]+.*")
fuzzy_match //Fuzzy Retrieval of Differences in Matching Characters fuzzy_match(id,"what"),fuzzy_match(type,"system", 5)
Mathematical function
abs //Find the absolute value of the value abs(value)
ceil //Find the smallest integer that is not less than the given valu ceil(value)
floor //Find the largest integer that is not greater than the give floor(value)
round //Find the rounded value of the value round(3.4)
exp //Find the definite power of e exp(value)
pow //The power of the value pow(id,2.0)
pi //Constant value Π pi()
sqrt //Find the square root of the value sqrt(value)
log //Find the natural logarithm of the value log(value)
log2 //Find the base-2 logarithm of the value log2(value)
log10 //Find the base-10 logarithm of the value log10(value)
log1p //Find the natural (base e) logarithm of 1+value log1p(value)
sin //Find the sine of the value sin(value)
cos //Find the cosine of the value cos(value)
tan //Find the tangent of the value tan(value)
asin //Find the arc sine of the value asin(value)
acos //Find the arc cosine of the value acos(value)
atan //Find the arc tangent of the value atan(value)
e //The mathematical constant e e()
Internal function
raw2hex //Memory hexadecimal display raw2hex(id)
hex2raw //Convert hexadecimal to memory data representation hex2raw("00aabb")
row_id //Output line number row_id()
id //Output internal id id()
now_id //Output max id that not used. now_id()
hash_passwd //Output the string's hash password hash_passwd("mypass")
Statistical function
var //Output variance var(id)
stddev //Standard deviation stddev(id)
Sequence function
seq //Output sequence 0 to n seq(10), seq(count(id)), seq(3)*3
rand //Output n random numbers from 0 rand(10)
constants //Output n constants constants(1,10), constants("sss",10)
Date function
last_day //Return to the last day of the date month last_day(date())
is_leap //Return whether it is a leap year is_leap(date())
now //Return the current date and time now()
date //Return the current date date()
time //Return the current time time()
add_day //Return the time or date time after n days add_day(now(),100)
add_nanosec //Return the time or date time after n nanoseconds add_nanosec(time(),10000000)
sub_day //Return the time or date and time n days ago sub_day(now(),100)
sub_nanosec //Return the time or date and time before n nanoseconds sub_nanosec(time(),10000000)
day_diff //Return the difference between the date or date time day_diff(x2date(d,date())
nanosec_diff //Return the difference in nanoseconds between time or date and time nanosec_diff(dt,now())
year_of //Return the year part of the date or date time year_of(date())
month_of //Return the month part of the date or date time month_of(date())
day_of //Return the day part of the date or date time day_of(date())
hour_of //Return the time part of the time or date time hour_of(time())
minute_of //Returns the minute part of the time or date time minute_of(time())
second_of //Returns the second part of the time or date time second_of(time())
nanosec_of //Return the nanosecond part of the time or datetime nanosec_of(time())
date_of //Return the date part of the date and time date_of(now())
time_of //Return the time part of the date and time time_of(now())
weekday //Return the day of the date or date and time (0 is Monday) weekday(now())
weekday_name //Return the name of the week of the date or date and time weekday_name(now())
weekday_cname //Return the Chinese name of the week of the date or date and time weekday_cname(now())
The where condition allows the use of relational operations, range operations, in operations and logical operations
id>=10 //Greater than or equal to logic operation
id>=all (select id from t2) //The logical operation of all results greater than or equal to the subquery
id==any (select id from t2) //Logical operation equal to any result of subquery
id==any (1,3,5,7) //Logical operation equal to any result of the set
id==10 //Equal to logic operation
id!=10 //Not equal to logical operation
id>=10&&id<=20 //Logic AND operation between relational operations
id>=10||(id < 5&& id > 1) //Logical OR operation between relational operations
10 <= id <= 20 //Relational operation is less than or equal to, less than or equal to operation
10 < id <= 20 //Less than, less than or equal to relational operations
first //The first of the range operation
last //The last of the range operation
first 100 //The first 100 of the range operation
last 100 //The last 100 of the range operation
range 10 100 //The 10th to 100th range operation
nth 10 //The 10th number of the range operation
id max //The largest value range operation
id min //The smallest value range operation
id max 100 //The largest first 100 of the value range operation
id min 100 //The smallest first 100 of the value range operation
id in (1,2,3,4,7) //in operation takes the id of 1, 2, 3, 4, 7
id in (select id from t2); //Retrieve the result of id belonging to the sub-expression
id not in (select id from t2); //Remove the result that id does not belong to the sub-expression
Note that the following are invalid where conditions10 < id //The field name id should be placed first
select
select * from table1;
//Select all fields from table1
select id from tab2;
//Select the id field from tab2
select id /sin(id) from tab2 where id < 10;
//Select the id data with id less than 10 from tab2 and find the value of id divided by sin(id)
select max(id) from tab2 where id < 10&&id > 3;
//Select the largest id greater than 3 and less than 10 from tab2
select id << 10 from tab2 where id!=3
//Select the id whose id is not equal to 3 from tab2 and shift it to the left by 10 places
select sin(id),name into t2 from t1 where id < 10;
//Select from t1 with id less than 10 and insert t2 with sin(id) and name as the column
select * from t1 where id in (select id2 from t2 where id2 > 10);
//Select from t1 when id is in t2 and when id2 is greater than 10, all of t1 in id2
select * from t1 where id < 10 order by id, name asc;
//Select from t1 when the id is less than 10 and sort the dictionary in ascending order by id and name
select * from t1 where id < 10 group sum(id) by name;
//Select from t1 when id is less than 10 and use name to group sum(id)
select * from t1 where id < 10 all_each on distinct;
//Select from t1 when the id is less than 10 and filter all rows to remove duplicates
select * from t1 where id < 10 all_each on id==_id && name==_name;
//Select from t1 when the id is less than 10 and filter all rows with the same id and the same name,_ means each
of the previous set
select * from t1 where id < 10 adj on id>_id && name==_name;
//Select from t1 when the id is less than 10, then filter the row neighbor id greater than the previous data with
the same id and name,_ means the previous line
select * from t1 where id < 10 inner on id==name;
//Select from t1 when the id is less than 10 and filter the data with the same id and name in the row
select * from t1 where id < 10 first;
//Select from t1 when the id is less than 10 and filter to keep the first row
select * from t1 where id < 10 first 30;
//Select from t1 when the id is less than 10, filter and keep the first 30 rows
select * from t1 where id < 10 range 30 40;
//Select from t1 when the id is less than 10 to filter and retain the data in the range of 30 to 40 rows
select id1,name from t1 where id < 10 exists (select id from t2 where id==id1);
//Select from t1 when id is less than 10 and exist in t2 id1 and name
select id as nid,name from t1 where id < 10 not exists (select id from t2 where id==nid);
//Select from t1 when id is less than 10 and there is no nid and name with id equal to nid in t2
select * from t1 where id < 10 recompute id=id+10;
//Select from t1 when id is less than 10 and recalculate id equal to id plus 10.
select * from t1 where id < 10 recompute id+=10,name=x2str(id);
//Select from t1 when the id is less than 10, recalculate the column id equal to id plus 10, name equal to the id
converted into a string
select * from t1 where id < 10 project except id;
//Select from t1 when the id is less than 10 all post-projection to retain the column data except id
select * from t1 where id < 10 project id;
//Select all post-projection columns whose id is less than 10 from t1 to retain column id
select * from t1 where id < 10 rename id newid;
//Select from t1 when the id is less than 10 and rename the id to newid
select * from t1 where id < 10 project id range 30 40 group sum(id) by id order by id rand rename id newid;
//Select all post-projection id from t1 when id is less than 10, filter data in the range of 30 to 40, use id to
group sum(id), sort by id randomly, rename id to newid
select * from t1 where id < 10 join (select * from t2 where id2 < 10 on id1==id2);
//Select from t1 when id1 is less than 10 and expand horizontally. Select from t2 when id2 is less than 10 and id2
is equal to id1.
select sin(seq(count(id))+1), id, name from t1 where id < 10 union (select * from t2 where id < 10);
//Select from t1 when id is less than 10 sin(seq(count(id))+1), id, name vertical expansion, select all the id less
than 10 from t2
select sin(id) as sinid,id from t1 where id < 10 inner on sinid < id;
//Select sin(id) with id less than 10 from t1 as sinid and id filter to keep the data with sinid less than id in
the row
select * from t1 export as csv split by "," path = "t1.csv";
//Select all fields of t1 to export to t1.csv file separated by commas
select * from t1 export as csv split by "," path += "t1.csv";
//Select all fields of t1 to export to or replace the t1.csv file separated by commas
insert
insert into tab1 (name,id) values("Tom",100); //Single insert
insert into tab2 (val,type)values(1.00,1)(2.00,2)(3.00,3); //Multiple inserts
insert into tab2 (val,type)values(1.00,1),(2.00,2),(3.00,3); //Multiple inserts
insert into t2 (id,name) select id,name from t1; //Insert subquery result
insert into t2 (id,name) select seq(10)+1,"name"+x2str(seq(10)+1); //Insert custom data
Note that SCT memory database does not support null fields, so insert must specify a value for each field
delete
delete from tab1 where id < 10; //Delete all data with id less than 10 from tab1
update
update tab1 set name="Tom",id=10 where id==5;
//Update the name field with id equal to 5 in tab1 to Tom's id to 10.
update tab1 set value=10 where __id==5;
//Update the value of the internal system id equal to 5 in tab1 to 10 (note that __id is a reserved keyword for the system, which is the result of the id() function output, and is only used for the double equal sign query condition)
create
create table tab1 (id u08 (false, false));
//Create table tab1, the initial field is id type is one-byte unsigned integer, no repetition is allowed
create table tab1 (val float(),name string(true));
//Create table tab1, the initial field val is floating-point type and no repetition is allowed, and the field name
is string type and repetition is allowed
Note that the second parameter of the current field is a reserved parameter and has no effect in actual use
alter
alter table tab1 add id u08 (false, false); //Add id field in tab1
alter table tab1 modify id u08 (false, false); //Modify the id field type in tab1, the modification will clear
the previous field data, please backup
alter table tab1 rename id newid; //Rename the id field in tab1 to newid
alter table tab1 drop id; //Delete the id field in tab1
drop
drop table tab1 //Delete table tab1
truncate
truncate table t1; //Quickly clear t1 and retain data structure
save
save tab1 as bin path+="tab1.bin"
//Save table tab1 as a binary data file, the saved file name is tab1.bin, if there is a file, replace it
save tab1 as sql path="tab1.sql"
//Save table tab1 as a sql data file, and the saved file name is tab1.sql
load
Note: The table cannot exist when using load
load tab1 from bin path="tab1.bin"
//Load the binary data file of table tab1, load table tab1 from the file to the database
load tab1 from sql path="tab1.sql"
//Load the sql data file of table tab1, and load the table information from the file to the database (note that the
table name tab1 is ignored, and the imported table name is determined by the content of the sql file)
import
Note: The table must exist when using import
import t1 from csv split by "\t" path="t1.csv"
//Import from t1.csv to t1 with \t as the separator
import t1 from csvs path="t1.csvs"
//Import t1.csvs into t1 (csvs file indicates that the string is converted to ASCII code hexadecimal characters)
field name
abc
abc0123
ABC_abc
[Valid non-@string (see string)]
table name
abc
abc0123
ABC_abc
[Valid non-@string (see string)]
rows_of
rows_of tab1 //Get the number of rows in tab1
cols_of
cols_of tab1 //Get the number of columns in tab1
fields_of
fields_of tab1 //Get the field name of tab1
show tables
show tables //Get all table names in the database
exec
exec "data.sql"
//Execute all statements in the sql file
atom_exec
atom_exec insert into t1 (id,name) select 1,"sss";delete from t1; commit
//Transactional execution of multiple statements with ";" as the interval and ending with commit (the isolation
level is the highest level, generally speaking, either all succeed or all fail, multi-statements only support
insert, update, delete)
backup
backup as bin path="b1.sbk" //Backup the entire database as binary data and save it in the b1.sbk file
backup as sql path+="b1.sbk" //Backup the entire database as sql data and replace the b1.sbk file
restore
restore from sql path="b1.sbk" //Restore the entire database from the sql backup file b1.sbk
restore from bin path="b1.sbk" //Restore the entire database from the binary backup file b1.sbk
optimization suggestion
SCT memory database uses BST as an index internally. When querying for each column, the data is retrieved through the BST tree to construct query results, which is very suitable for small-scale interval queries; the content of the
where statement is the first-level filter condition of the search condition, which determines the output row The most important part of the number, the query conditions should give priority to ensuring that the where part is fully
filtered.
Where query conditions are given priority to use a
<=id <=b instead of id>= a&&id
<=b. The former will directly use the internal interval algorithm to complete, the latter will query the two parts of the interval separately for collection Intersection operation.
Use && in preference for where conditions, and try to avoid using || logical operations.
Try not to use != expressions in where conditions.
Use rows_of first to get the number of rows in the table, not the count field.
For the low effective data of the big data table, the query filter in is preferred for the initial filtering.
For large data sets, multiple valid data can be filtered according to conditions using exists.
When sql is parsed, the parsed syntax tree will be cached. It is recommended to merge insert statements as much as possible to avoid multiple parsing.
Use the where statement to filter the data set first, and reduce the use of data set post-operation processing.
The sql statement expression evaluation is slower than the C language, try to avoid using a large amount of expression processing; a large amount of expression processing can be reprocessed in C language after the parsed
expression is obtained.
When importing large quantities of data, the load command is preferred, the import statement is second best, and the exec or individual insert statement is used last.
development instruction
SCT memory database is written in C++, and the network side provides packaged C/C++/Java(include JDBC)/Python interface. The development library is located in the sdk directory and also provide a sample.
The Windows dynamic link library of vs2019 is provided in the installation package, located in the sdk/dll directory, also G++ 8.2.0 library in so directory. The header file is located in the sdk/inc directory.
Basic functions include connecting, exiting the server, executing SQL statements, and obtaining data set row and column information and value information. The following is a complete C language interface example (located in sdk/sql_lib_sample.c):
#include <stdlib.h>
#include <stdio.h>
#include "sql_lib.h"
int main(int argc, char** argv) {
if(!connect_to_server("192.168.5.10", 55555, "user", "user")){ return -1; }
const char* s = "create table tab1 (id u32(true), name string());";
void* d = exec(s, strlen(s));
if (d == NULL) { printf("execute sql failed\n"); return 0; }
del_datas(d);
s = "insert into tab1 (id,name) select seq(1000000), x2str(seq(1000000)+10);";
d = exec(s, strlen(s));
if (d == NULL) { printf("execute sql failed\n"); return 0; }
del_datas(d);
s = "select id,name from tab1;";
d = exec(s, strlen(s));
//d = exec("select id,name from tab1 where id < 10;");
if( d == NULL ){ printf("execute sql failed\n"); return 0; }
print(d);
//print_bymyself(d);
del_datas(d);
s = "drop table tab1;";
d = exec(s, strlen(s));
exit_to_server();
return 0;
}
common problem
Network connection is unsuccessful
1. The local end of SCT does not support network functions. Make sure that the server is turned on when using the client to connect; in the default configuration file, sct_mdb.toml, the server section
port specifies the local link port. Make sure that the corresponding port is not occupied.
2. SCT uses login verification strategy. Ensure that the number of operations does not exceed the threshold requirement.
where statement is not supported
SCT where statements are not compatible with standard SQL content. Related statements such as inline condition filtering (id==name), exist, group by, etc. need to be converted to data post-processing operations。
Function reg_match is not working properly
The SCT regular expression parsing library is an implementation of ECMA-262 (version 2011), compatible with most regular expression grammars, and the addition of \h matches Chinese \H matches non-Chinese boundary characters.
In order to avoid performance pitfalls under special circumstances, the regular expression library uses non-backtracking matching by default, that is, .* will match all, and should be avoided when using it.
fuzzy_match is particularly slow
The default fuzzy matching will match all matching results with a difference of less than half the length, so the matching algorithm is not suitable for longer strings.
Error feedback
SCT memory database has been thoroughly and fully tested, but some unknown errors may still be unavoidable. For related errors, please send the problem description to admin@shucantech.com.